Twitter/X remains one of the most valuable real-time data sources on the internet. With 550+ million monthly active users, it's where news breaks, trends emerge, and public sentiment forms in real time. For developers, researchers, and businesses, Twitter/X data powers sentiment analysis, brand monitoring, competitor research, and trend intelligence.
The problem? Getting this data has become extremely expensive since the 2023 API pricing changes. That's why scraping has become the go-to approach for most developers.
Since Elon Musk's acquisition, Twitter/X API pricing has become prohibitively expensive for most developers:
| Tier | Price | Tweet Reads | Tweet Posts | Cost per 1K Reads |
|---|---|---|---|---|
| Free | $0/mo | 0 (write-only) | 1,500/mo | N/A |
| Basic | $100/mo | 10,000/mo | 3,000/mo | $10.00 |
| Pro | $5,000/mo | 1,000,000/mo | 300,000/mo | $5.00 |
| Enterprise | $42,000+/mo | 50,000,000+/mo | Negotiable | ~$0.84 |
At $10 per 1,000 tweet reads on the Basic tier, even a simple sentiment analysis project becomes unaffordable. The free tier doesn't even allow reading tweets. This pricing gap has made web scraping the practical choice for most Twitter/X data extraction use cases.
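Those per-1K figures are simple to reproduce, which is useful when estimating a project's budget:

```python
def cost_per_1k_reads(monthly_price: float, monthly_reads: int) -> float:
    """Effective cost per 1,000 tweet reads for a given API tier."""
    if monthly_reads <= 0:
        raise ValueError("tier allows no reads")
    return monthly_price / monthly_reads * 1000

# Reproduce the table's per-1K figures
tiers = {
    "Basic": (100, 10_000),
    "Pro": (5_000, 1_000_000),
    "Enterprise": (42_000, 50_000_000),
}
for name, (price, reads) in tiers.items():
    print(f"{name}: ${cost_per_1k_reads(price, reads):.2f} per 1K reads")
```

At $10 per 1K, a 100,000-tweet dataset costs $1,000 in reads alone, and that volume exceeds the Basic tier's monthly cap tenfold.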
Here are four approaches to extract data from Twitter/X, from hands-on browser automation to production-ready API solutions:
Playwright is the best choice for scraping Twitter/X with Python because it handles JavaScript rendering, which Twitter/X requires for all content.
```shell
pip install playwright beautifulsoup4
playwright install chromium
```
```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import time

def scrape_twitter_timeline(username, max_tweets=20):
    """Scrape tweets from a public Twitter/X profile."""
    tweets = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Navigate to profile
        page.goto(f"https://x.com/{username}", wait_until="networkidle")
        time.sleep(3)  # Wait for dynamic content

        # Scroll to load more tweets
        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 10:
            # Extract tweet elements from the current DOM
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")
            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            # Scroll down to trigger lazy loading
            page.evaluate("window.scrollBy(0, 1000)")
            time.sleep(2)
            scroll_count += 1

        browser.close()
    return tweets[:max_tweets]
```
```python
def extract_tweet(article):
    """Extract structured data from a tweet article element."""
    try:
        # Tweet text
        text_el = article.find("div", {"data-testid": "tweetText"})
        text = text_el.get_text(strip=True) if text_el else ""

        # Username and display name (first profile link in the article)
        username = ""
        display_name = ""
        for link in article.find_all("a", href=True):
            href = link.get("href", "")
            if href.startswith("/") and not href.startswith("/i/"):
                username = href.strip("/")
                display_name = link.get_text(strip=True)
                break

        # Engagement metrics
        likes = get_metric(article, "like")
        retweets = get_metric(article, "retweet")
        replies = get_metric(article, "reply")

        # Timestamp
        time_el = article.find("time")
        timestamp = time_el.get("datetime", "") if time_el else ""

        return {
            "text": text,
            "username": username,
            "display_name": display_name,
            "timestamp": timestamp,
            "likes": likes,
            "retweets": retweets,
            "replies": replies,
        }
    except Exception:
        return None

def get_metric(article, metric_type):
    """Extract an engagement metric (likes, retweets, replies)."""
    el = article.find("button", {"data-testid": metric_type})
    if el:
        text = el.get_text(strip=True)
        return parse_count(text) if text else 0
    return 0

def parse_count(text):
    """Parse Twitter-style count strings (1.2K, 3.5M, etc.)."""
    text = text.strip().upper()
    if "K" in text:
        return int(float(text.replace("K", "")) * 1_000)
    if "M" in text:
        return int(float(text.replace("M", "")) * 1_000_000)
    try:
        return int(text)
    except ValueError:
        return 0
```
```python
# Usage
tweets = scrape_twitter_timeline("elonmusk", max_tweets=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:80]}...")
    print(f"  ❤️ {t['likes']} 🔁 {t['retweets']} 💬 {t['replies']}")
    print()
```
```python
import urllib.parse

def scrape_twitter_search(query, max_tweets=30):
    """Scrape tweets matching a search query."""
    encoded_query = urllib.parse.quote(query)
    tweets = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Twitter search URL ("Latest" tab for chronological results)
        url = f"https://x.com/search?q={encoded_query}&src=typed_query&f=live"
        page.goto(url, wait_until="networkidle")
        time.sleep(3)

        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 15:
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")
            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            page.evaluate("window.scrollBy(0, 1200)")
            time.sleep(2)
            scroll_count += 1

        browser.close()
    return tweets[:max_tweets]

# Search for tweets about web scraping
results = scrape_twitter_search("web scraping python", max_tweets=20)
print(f"Found {len(results)} tweets about web scraping")
```
Puppeteer is ideal for JavaScript developers who want native browser control and easy integration with Node.js web applications.
```shell
npm install puppeteer cheerio
```
```javascript
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeTrends() {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/120.0.0.0 Safari/537.36'
  );
  await page.setViewport({ width: 1280, height: 900 });

  // Navigate to the Explore page for trends
  await page.goto('https://x.com/explore/tabs/trending', {
    waitUntil: 'networkidle2',
    timeout: 30000
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  const html = await page.content();
  const $ = cheerio.load(html);
  const trends = [];

  // Extract trending topics
  $('[data-testid="trend"]').each((i, el) => {
    const trendText = $(el).find('span').text();
    const tweetCount = $(el).find('[dir="ltr"]').last().text();
    if (trendText) {
      trends.push({
        rank: i + 1,
        topic: trendText.trim(),
        tweet_count: tweetCount.trim() || 'N/A'
      });
    }
  });

  await browser.close();
  return trends;
}
```
```javascript
async function scrapeUserProfile(username) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );
  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  const profile = await page.evaluate(() => {
    const getName = () => {
      const el = document.querySelector('[data-testid="UserName"]');
      return el ? el.innerText.split('\n')[0] : '';
    };
    const getBio = () => {
      const el = document.querySelector('[data-testid="UserDescription"]');
      return el ? el.innerText : '';
    };
    const getStats = () => {
      const links = document.querySelectorAll(
        'a[href*="/followers"], a[href*="/following"]'
      );
      const stats = {};
      links.forEach(link => {
        const text = link.innerText;
        if (link.href.includes('/following')) {
          stats.following = text.split(' ')[0];
        } else if (link.href.includes('/followers')) {
          stats.followers = text.split(' ')[0];
        }
      });
      return stats;
    };
    return {
      name: getName(),
      bio: getBio(),
      ...getStats()
    };
  });

  await browser.close();
  return { username, ...profile };
}
```
```javascript
// Usage
(async () => {
  console.log('--- Trending Topics ---');
  const trends = await scrapeTrends();
  trends.slice(0, 10).forEach(t =>
    console.log(`#${t.rank}: ${t.topic} (${t.tweet_count})`)
  );

  console.log('\n--- User Profile ---');
  const profile = await scrapeUserProfile('OpenAI');
  console.log(JSON.stringify(profile, null, 2));
})();
```
Twitter/X's web app communicates with internal GraphQL API endpoints. These undocumented guest APIs can return structured JSON without browser rendering, making them significantly faster than headless browser approaches. The trade-off: because they are undocumented, endpoint hashes, required headers, and guest-token access change without notice and break frequently.
```python
import requests
import json

class TwitterGuestAPI:
    """Access Twitter/X data via guest API tokens."""

    BASE_URL = "https://api.x.com"
    # Public bearer token shipped with the Twitter/X web app
    BEARER_TOKEN = (
        "AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR"
        "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu"
        "4FA33AGWWjCpTnA"
    )

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.BEARER_TOKEN}",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        })
        self.guest_token = self._get_guest_token()
        self.session.headers["x-guest-token"] = self.guest_token

    def _get_guest_token(self):
        """Activate a guest token for unauthenticated access."""
        resp = self.session.post(f"{self.BASE_URL}/1.1/guest/activate.json")
        return resp.json()["guest_token"]

    def search_tweets(self, query, count=20):
        """Search tweets using the adaptive search endpoint."""
        params = {
            "q": query,
            "count": count,
            "tweet_search_mode": "live",
            "query_source": "typed_query",
        }
        resp = self.session.get(
            f"{self.BASE_URL}/2/search/adaptive.json",
            params=params
        )
        return self._parse_search_results(resp.json())

    def get_user_tweets(self, user_id, count=20):
        """Fetch a user timeline via GraphQL."""
        variables = json.dumps({
            "userId": user_id,
            "count": count,
            "includePromotedContent": False,
            "withQuickPromoteEligibilityTweetFields": False,
        })
        features = json.dumps({
            "rweb_lists_timeline_redesign_enabled": True,
            "responsive_web_graphql_exclude_directive_enabled": True,
            "verified_phone_label_enabled": False,
            "responsive_web_graphql_timeline_navigation_enabled": True,
        })
        params = {"variables": variables, "features": features}
        resp = self.session.get(
            f"{self.BASE_URL}/graphql/V7H0Ap3_Hh2FyS75OCDO3Q/UserTweets",
            params=params
        )
        return resp.json()

    def _parse_search_results(self, data):
        """Parse an adaptive search response into tweet objects."""
        tweets = []
        global_objects = data.get("globalObjects", {})
        tweet_data = global_objects.get("tweets", {})
        user_data = global_objects.get("users", {})

        for tweet_id, tweet in tweet_data.items():
            user = user_data.get(str(tweet["user_id_str"]), {})
            tweets.append({
                "id": tweet_id,
                "text": tweet["full_text"],
                "username": user.get("screen_name", ""),
                "display_name": user.get("name", ""),
                "created_at": tweet["created_at"],
                "likes": tweet["favorite_count"],
                "retweets": tweet["retweet_count"],
                "replies": tweet["reply_count"],
            })

        return sorted(tweets, key=lambda x: x["likes"], reverse=True)

# Usage
api = TwitterGuestAPI()
tweets = api.search_tweets("web scraping API", count=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:100]}...")
    print(f"  ❤️ {t['likes']} 🔁 {t['retweets']}")
    print()
```
For production applications, Mantis provides the most reliable way to extract Twitter/X data. One API call handles JavaScript rendering, anti-bot bypassing, proxy rotation, and structured data extraction.
```python
import requests

# Scrape a Twitter/X profile page
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "url": "https://x.com/OpenAI",
        "render_js": True,
        "wait_for": "article[data-testid='tweet']",
        "extract": {
            "profile": {
                "name": "[data-testid='UserName'] span:first-child",
                "bio": "[data-testid='UserDescription']",
            },
            "tweets": {
                "_selector": "article[data-testid='tweet']",
                "_type": "list",
                "text": "[data-testid='tweetText']",
                "time": "time@datetime",
            }
        }
    }
)

data = response.json()
print(f"Profile: {data['extracted']['profile']['name']}")
for tweet in data['extracted']['tweets'][:5]:
    print(f"  - {tweet['text'][:80]}...")
    print(f"    Posted: {tweet['time']}")
```
Extract Twitter/X data with a single API call. No headless browsers, no proxy management, no broken selectors.
Twitter/X has some of the most aggressive anti-scraping measures on the web. Understanding them is essential for any scraping approach:
Since 2023, Twitter/X requires authentication to view most content. Unauthenticated visitors see a login prompt after viewing a few tweets. This is the single biggest barrier to scraping — and the reason simple HTTP-based scraping no longer works.
Twitter/X rate limits by both IP address and account. Verified accounts get higher limits (~6,000 tweets/day read), while unverified accounts are capped at ~600 tweets/day. IP-based limits kick in even faster for unauthenticated requests.
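To stay under those caps, a scraper can pace itself instead of bursting requests. A minimal sketch that spreads a daily read budget evenly; the caps themselves are approximate and not officially documented:

```python
import time

class DailyRateLimiter:
    """Pace requests to stay under a daily read cap, e.g. ~600 reads/day
    for unverified accounts. Sleeps between calls so the budget is spread
    evenly across 24 hours. Clock and sleeper are injectable for testing."""

    def __init__(self, reads_per_day: int, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 86_400 / reads_per_day  # seconds between reads
        self._clock = clock
        self._sleep = sleeper
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous request."""
        now = self._clock()
        if self._last is not None:
            elapsed = now - self._last
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last = self._clock()

# 600 reads/day works out to one request every 144 seconds
limiter = DailyRateLimiter(600)
print(f"{limiter.min_interval:.0f}s between requests")
```

Call `limiter.wait()` before each page fetch; the first call returns immediately and subsequent calls sleep off any remaining interval.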
Twitter/X checks browser characteristics — canvas fingerprint, WebGL renderer, installed fonts, screen resolution, timezone, and language headers. Mismatches between these signals flag automated browsers.
Suspected bots receive JavaScript challenge pages that require full browser execution. Simple HTTP clients cannot pass these challenges, which is why headless browsers or API solutions are necessary.
Twitter/X monitors mouse movements, scroll patterns, click timing, and navigation sequences. Perfectly uniform scrolling (common in scrapers) triggers detection. Adding random delays and natural scroll patterns helps avoid this.
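One way to add that variation: precompute an uneven scroll plan instead of looping a fixed `scrollBy(0, 1000)`. A sketch with illustrative step sizes and pause ranges:

```python
import random

def human_scroll_plan(total_px: int, step_mean: int = 900, jitter: float = 0.3):
    """Break one long scroll into uneven steps with randomized pauses,
    avoiding the perfectly uniform scrolling that bot detection flags.
    Returns (pixels, pause_seconds) pairs; thresholds are illustrative."""
    plan, scrolled = [], 0
    while scrolled < total_px:
        step = int(step_mean * random.uniform(1 - jitter, 1 + jitter))
        step = min(step, total_px - scrolled)  # don't overshoot the target
        pause = random.uniform(0.8, 2.5)       # vary dwell time between scrolls
        plan.append((step, round(pause, 2)))
        scrolled += step
    return plan

# With Playwright, each step could be executed as:
#   page.mouse.wheel(0, pixels); time.sleep(pause)
for px, pause in human_scroll_plan(4000):
    print(f"scroll {px}px, wait {pause}s")
```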
Internal GraphQL endpoints are monitored for unusual request patterns. Sudden spikes in requests from a single IP or account trigger temporary blocks or CAPTCHA challenges.
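A common mitigation when a 429 or challenge page appears is exponential backoff with jitter, so retries slow down instead of amplifying the spike. A sketch with illustrative delay constants:

```python
import random

def backoff_schedule(attempts: int, base: float = 2.0, cap: float = 120.0):
    """Exponential backoff delays for rate-limit or challenge responses.
    Delays double per retry up to a cap, with up to 25% subtractive jitter
    so retries from multiple workers don't synchronize."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delay *= random.uniform(0.75, 1.0)  # apply jitter
        delays.append(round(delay, 1))
    return delays

print(backoff_schedule(5))
```

Sleep for `delays[n]` seconds before the nth retry, and give up once the schedule is exhausted.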
| Data Type | Fields | Auth Required? |
|---|---|---|
| Tweets | Text, timestamp, likes, retweets, replies, media URLs, hashtags, mentions | Mostly yes |
| Profiles | Name, bio, followers/following count, join date, location, verified status | Partial |
| Search | Tweets matching keywords, hashtags, from/to specific users, date ranges | Yes |
| Trends | Trending topics, tweet counts, category, location-specific trends | Yes |
| Threads | Full conversation chains, reply trees, quoted tweets | Yes |
| Lists | List members, list tweets, public lists for any user | Yes |
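Whatever extraction method you use, downstream code is simpler if scraped tweets are normalized into one record type. A sketch mirroring the fields in the table; the field names are this article's convention, not an official schema:

```python
from dataclasses import dataclass, field

@dataclass
class Tweet:
    """Normalized record for a scraped tweet."""
    id: str = ""
    text: str = ""
    username: str = ""
    timestamp: str = ""  # ISO 8601, from the <time datetime="..."> attribute
    likes: int = 0
    retweets: int = 0
    replies: int = 0
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)

    @classmethod
    def from_scraped(cls, raw: dict) -> "Tweet":
        """Build a Tweet from a raw scraper dict, deriving hashtags
        and mentions from the text."""
        words = raw.get("text", "").split()
        return cls(
            id=raw.get("id", ""),
            text=raw.get("text", ""),
            username=raw.get("username", ""),
            timestamp=raw.get("timestamp", ""),
            likes=int(raw.get("likes", 0)),
            retweets=int(raw.get("retweets", 0)),
            replies=int(raw.get("replies", 0)),
            hashtags=[w for w in words if w.startswith("#")],
            mentions=[w for w in words if w.startswith("@")],
        )

t = Tweet.from_scraped({"text": "Scraping #python with @someuser", "username": "dev", "likes": "5"})
print(t.hashtags, t.mentions)
```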
Track what people say about your brand in real time. Combine Twitter/X scraping with AI sentiment analysis to detect PR crises before they escalate.
```python
import requests
import urllib.parse

def monitor_brand_sentiment(brand_name):
    """Monitor brand mentions and classify sentiment."""
    # Scrape recent mentions via Mantis (URL-encode the query)
    query = urllib.parse.quote(brand_name)
    response = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={"x-api-key": "YOUR_API_KEY"},
        json={
            "url": f"https://x.com/search?q={query}&f=live",
            "render_js": True,
            "wait_for": "article[data-testid='tweet']",
            "scroll_count": 3,
            "extract": {
                "tweets": {
                    "_selector": "article[data-testid='tweet']",
                    "_type": "list",
                    "text": "[data-testid='tweetText']",
                    "time": "time@datetime",
                    "likes": "[data-testid='like'] span",
                }
            }
        }
    )
    tweets = response.json()["extracted"]["tweets"]

    # Simple keyword-based sentiment classification
    positive_words = {"love", "great", "amazing", "best", "awesome", "excellent"}
    negative_words = {"hate", "terrible", "worst", "awful", "broken", "scam"}

    results = {"positive": 0, "negative": 0, "neutral": 0, "alerts": []}
    for tweet in tweets:
        text_lower = tweet["text"].lower()
        pos = sum(1 for w in positive_words if w in text_lower)
        neg = sum(1 for w in negative_words if w in text_lower)
        if neg > pos:
            results["negative"] += 1
            # Alert on high-engagement negative tweets
            likes = int(tweet.get("likes", "0").replace(",", "") or 0)
            if likes > 100:
                results["alerts"].append(tweet)
        elif pos > neg:
            results["positive"] += 1
        else:
            results["neutral"] += 1
    return results

# Run monitoring
sentiment = monitor_brand_sentiment("YourBrand")
print(f"Positive: {sentiment['positive']}")
print(f"Negative: {sentiment['negative']}")
print(f"Neutral: {sentiment['neutral']}")
if sentiment["alerts"]:
    print(f"⚠️ {len(sentiment['alerts'])} high-engagement negative mentions!")
```
Monitor competitor social media campaigns — track their posting frequency, engagement rates, and top-performing content.
```javascript
const puppeteer = require('puppeteer');

async function trackCompetitor(username) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );
  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  // Scroll and collect tweets
  const tweets = [];
  for (let i = 0; i < 5; i++) {
    const newTweets = await page.evaluate(() => {
      const articles = document.querySelectorAll('article[data-testid="tweet"]');
      return Array.from(articles).map(article => {
        const text = article.querySelector('[data-testid="tweetText"]');
        const time = article.querySelector('time');
        const like = article.querySelector('[data-testid="like"]');
        const retweet = article.querySelector('[data-testid="retweet"]');
        return {
          text: text?.innerText || '',
          time: time?.getAttribute('datetime') || '',
          likes: like?.innerText || '0',
          retweets: retweet?.innerText || '0',
        };
      });
    });
    tweets.push(...newTweets);
    await page.evaluate(() => window.scrollBy(0, 1000));
    await new Promise(resolve => setTimeout(resolve, 2000));
  }

  await browser.close();

  // Deduplicate by text and analyze
  const unique = [...new Map(tweets.map(t => [t.text, t])).values()];
  const report = {
    username,
    total_tweets: unique.length,
    avg_likes: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.likes), 0) / unique.length
    ),
    avg_retweets: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.retweets), 0) / unique.length
    ),
    top_tweet: unique.sort((a, b) =>
      parseCount(b.likes) - parseCount(a.likes)
    )[0],
  };
  return report;
}

function parseCount(str) {
  str = (str || '0').toUpperCase().trim();
  if (str.includes('K')) return parseFloat(str) * 1000;
  if (str.includes('M')) return parseFloat(str) * 1000000;
  return parseInt(str, 10) || 0;
}

// Track a competitor
trackCompetitor('competitor_handle').then(report => {
  console.log(`📊 ${report.username} Analysis:`);
  console.log(`  Tweets analyzed: ${report.total_tweets}`);
  console.log(`  Avg likes: ${report.avg_likes}`);
  console.log(`  Avg retweets: ${report.avg_retweets}`);
  console.log(`  Top tweet: ${report.top_tweet.text.slice(0, 80)}...`);
});
```
Build an AI agent that monitors Twitter/X for emerging trends in your industry and generates actionable reports.
```python
import requests
import urllib.parse

def trend_intelligence_agent(topics, api_key):
    """AI agent that monitors Twitter/X trends and generates insights."""
    all_data = {}
    for topic in topics:
        # Fetch recent tweets for each topic via Mantis (URL-encode the query)
        response = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={"x-api-key": api_key},
            json={
                "url": f"https://x.com/search?q={urllib.parse.quote(topic)}&f=live",
                "render_js": True,
                "wait_for": "article[data-testid='tweet']",
                "scroll_count": 2,
                "extract": {
                    "tweets": {
                        "_selector": "article[data-testid='tweet']",
                        "_type": "list",
                        "text": "[data-testid='tweetText']",
                        "likes": "[data-testid='like'] span",
                    }
                }
            }
        )
        tweets = response.json().get("extracted", {}).get("tweets", [])
        all_data[topic] = {
            "count": len(tweets),
            "total_engagement": sum(
                int(t.get("likes", "0").replace(",", "") or 0) for t in tweets
            ),
            "sample_tweets": [t["text"][:120] for t in tweets[:3]],
        }

    # Generate a report, highest-engagement topics first
    report = "🔍 Twitter/X Trend Intelligence Report\n"
    report += "=" * 40 + "\n\n"
    for topic, data in sorted(
        all_data.items(),
        key=lambda x: x[1]["total_engagement"],
        reverse=True
    ):
        report += f"📌 {topic}\n"
        report += f"  Tweets found: {data['count']}\n"
        report += f"  Total engagement: {data['total_engagement']}\n"
        for sample in data["sample_tweets"]:
            report += f"  → {sample}...\n"
        report += "\n"
    return report

# Monitor AI industry trends
topics = ["AI agents", "LLM fine-tuning", "RAG pipeline", "AI coding assistant"]
report = trend_intelligence_agent(topics, "YOUR_API_KEY")
print(report)
```
| Feature | Twitter/X API (Basic) | DIY Scraping | Mantis API |
|---|---|---|---|
| Cost | $100/mo (10K reads) | Server + proxy costs | $29/mo (5K requests) |
| Cost per 1K reads | $10.00 | $1-5 (proxies) | $5.80 |
| Setup time | Hours (approval needed) | Days | Minutes |
| Data format | Structured JSON | Raw HTML → parse | Structured JSON |
| Rate limits | Strict (per tier) | IP-based blocks | Per plan |
| JS rendering | N/A (API) | You manage | Included |
| Anti-bot handling | N/A (API) | You manage | Included |
| Maintenance | API version updates | Constant (DOM changes) | Zero |
| Historical data | Enterprise only ($42K+) | Limited by scrolling | Current pages |
| Reliability | High | Low-medium | High |
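Plugging the table's per-1K figures into a quick calculator shows how each option scales with volume. The DIY figure uses a $3/1K midpoint of the table's $1-5 range; all numbers are illustrative and ignore overages and plan minimums:

```python
def monthly_cost(reads_needed: int, option: str) -> float:
    """Rough monthly cost for a given read volume, using the comparison
    table's per-1K figures: Basic API $10.00, DIY ~$3.00 (midpoint),
    Mantis $5.80. Illustrative only."""
    per_1k = {"api_basic": 10.00, "diy": 3.00, "mantis": 5.80}
    return reads_needed / 1000 * per_1k[option]

for opt in ("api_basic", "diy", "mantis"):
    print(f"{opt}: ${monthly_cost(50_000, opt):,.2f} for 50K reads/mo")
```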
Mantis handles JavaScript rendering, anti-bot measures, and proxy rotation. Get structured data with a single API call.
Twitter/X scraping exists in a complex legal landscape. Here's what you need to know:
Disclaimer: This article is for educational purposes only. Web scraping may violate Twitter/X's Terms of Service. Always ensure your scraping activities comply with applicable laws and regulations in your jurisdiction.
Now that you know how to scrape Twitter/X, explore more scraping guides: