Twitter/X remains one of the most valuable real-time data sources on the internet. With 550+ million monthly active users, it's where news breaks, trends emerge, and public sentiment forms in real time. For developers, researchers, and businesses, Twitter/X data powers sentiment analysis, brand monitoring, competitor research, and trend intelligence.
The problem? Getting this data has become extremely expensive since the 2023 API pricing changes. That's why scraping has become the go-to approach for most developers.
Since Elon Musk's acquisition, Twitter/X API pricing has become prohibitively expensive for most developers:
| Tier | Price | Tweet Reads | Tweet Posts | Cost per 1K Reads |
|---|---|---|---|---|
| Free | $0/mo | 0 (write-only) | 1,500/mo | N/A |
| Basic | $100/mo | 10,000/mo | 3,000/mo | $10.00 |
| Pro | $5,000/mo | 1,000,000/mo | 300,000/mo | $5.00 |
| Enterprise | $42,000+/mo | 50,000,000+/mo | Negotiable | ~$0.84 |
At $10 per 1,000 tweet reads on the Basic tier, even a simple sentiment analysis project becomes unaffordable. The free tier doesn't even allow reading tweets. This pricing gap has made web scraping the practical choice for most Twitter/X data extraction use cases.
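Those per-1K figures are simple to reproduce, which is useful when estimating a project's budget:

```python
def cost_per_1k_reads(monthly_price: float, monthly_reads: int) -> float:
    """Effective cost per 1,000 tweet reads for a given API tier."""
    if monthly_reads <= 0:
        raise ValueError("tier allows no reads")
    return monthly_price / monthly_reads * 1000

# Reproduce the table's per-1K figures
tiers = {
    "Basic": (100, 10_000),
    "Pro": (5_000, 1_000_000),
    "Enterprise": (42_000, 50_000_000),
}
for name, (price, reads) in tiers.items():
    print(f"{name}: ${cost_per_1k_reads(price, reads):.2f} per 1K reads")
```

At $10 per 1K, a 100,000-tweet dataset costs $1,000 in reads alone, and that volume exceeds the Basic tier's monthly cap tenfold.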
Here are four approaches to extract data from Twitter/X, from hands-on browser automation to production-ready API solutions:
Playwright is the best choice for scraping Twitter/X with Python because it handles JavaScript rendering, which Twitter/X requires for all content.
```shell
pip install playwright beautifulsoup4
playwright install chromium
```
```python
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import time

def scrape_twitter_timeline(username, max_tweets=20):
    """Scrape tweets from a public Twitter/X profile."""
    tweets = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Navigate to profile
        page.goto(f"https://x.com/{username}", wait_until="networkidle")
        time.sleep(3)  # Wait for dynamic content

        # Scroll to load more tweets
        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 10:
            # Extract tweet elements from the current DOM
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")
            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            # Scroll down to trigger lazy loading
            page.evaluate("window.scrollBy(0, 1000)")
            time.sleep(2)
            scroll_count += 1

        browser.close()
    return tweets[:max_tweets]
```
```python
def extract_tweet(article):
    """Extract structured data from a tweet article element."""
    try:
        # Tweet text
        text_el = article.find("div", {"data-testid": "tweetText"})
        text = text_el.get_text(strip=True) if text_el else ""

        # Username and display name (first profile link in the article)
        username = ""
        display_name = ""
        for link in article.find_all("a", href=True):
            href = link.get("href", "")
            if href.startswith("/") and not href.startswith("/i/"):
                username = href.strip("/")
                display_name = link.get_text(strip=True)
                break

        # Engagement metrics
        likes = get_metric(article, "like")
        retweets = get_metric(article, "retweet")
        replies = get_metric(article, "reply")

        # Timestamp
        time_el = article.find("time")
        timestamp = time_el.get("datetime", "") if time_el else ""

        return {
            "text": text,
            "username": username,
            "display_name": display_name,
            "timestamp": timestamp,
            "likes": likes,
            "retweets": retweets,
            "replies": replies,
        }
    except Exception:
        return None

def get_metric(article, metric_type):
    """Extract an engagement metric (likes, retweets, replies)."""
    el = article.find("button", {"data-testid": metric_type})
    if el:
        text = el.get_text(strip=True)
        return parse_count(text) if text else 0
    return 0

def parse_count(text):
    """Parse Twitter-style count strings (1.2K, 3.5M, etc.)."""
    text = text.strip().upper()
    if "K" in text:
        return int(float(text.replace("K", "")) * 1_000)
    if "M" in text:
        return int(float(text.replace("M", "")) * 1_000_000)
    try:
        return int(text)
    except ValueError:
        return 0
```
```python
# Usage
tweets = scrape_twitter_timeline("elonmusk", max_tweets=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:80]}...")
    print(f"  ❤️ {t['likes']} 🔁 {t['retweets']} 💬 {t['replies']}")
    print()
```
```python
import urllib.parse

def scrape_twitter_search(query, max_tweets=30):
    """Scrape tweets matching a search query."""
    encoded_query = urllib.parse.quote(query)
    tweets = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Twitter search URL ("Latest" tab for chronological results)
        url = f"https://x.com/search?q={encoded_query}&src=typed_query&f=live"
        page.goto(url, wait_until="networkidle")
        time.sleep(3)

        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 15:
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")
            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            page.evaluate("window.scrollBy(0, 1200)")
            time.sleep(2)
            scroll_count += 1

        browser.close()
    return tweets[:max_tweets]

# Search for tweets about web scraping
results = scrape_twitter_search("web scraping python", max_tweets=20)
print(f"Found {len(results)} tweets about web scraping")
```
Puppeteer is ideal for JavaScript developers who want native browser control and easy integration with Node.js web applications.
```shell
npm install puppeteer cheerio
```
```javascript
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeTrends() {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/120.0.0.0 Safari/537.36'
  );
  await page.setViewport({ width: 1280, height: 900 });

  // Navigate to the Explore page for trends
  await page.goto('https://x.com/explore/tabs/trending', {
    waitUntil: 'networkidle2',
    timeout: 30000
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  const html = await page.content();
  const $ = cheerio.load(html);
  const trends = [];

  // Extract trending topics
  $('[data-testid="trend"]').each((i, el) => {
    const trendText = $(el).find('span').text();
    const tweetCount = $(el).find('[dir="ltr"]').last().text();
    if (trendText) {
      trends.push({
        rank: i + 1,
        topic: trendText.trim(),
        tweet_count: tweetCount.trim() || 'N/A'
      });
    }
  });

  await browser.close();
  return trends;
}
```
```javascript
async function scrapeUserProfile(username) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );
  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  const profile = await page.evaluate(() => {
    const getName = () => {
      const el = document.querySelector('[data-testid="UserName"]');
      return el ? el.innerText.split('\n')[0] : '';
    };
    const getBio = () => {
      const el = document.querySelector('[data-testid="UserDescription"]');
      return el ? el.innerText : '';
    };
    const getStats = () => {
      const links = document.querySelectorAll(
        'a[href*="/followers"], a[href*="/following"]'
      );
      const stats = {};
      links.forEach(link => {
        const text = link.innerText;
        if (link.href.includes('/following')) {
          stats.following = text.split(' ')[0];
        } else if (link.href.includes('/followers')) {
          stats.followers = text.split(' ')[0];
        }
      });
      return stats;
    };
    return {
      name: getName(),
      bio: getBio(),
      ...getStats()
    };
  });

  await browser.close();
  return { username, ...profile };
}
```
```javascript
// Usage
(async () => {
  console.log('--- Trending Topics ---');
  const trends = await scrapeTrends();
  trends.slice(0, 10).forEach(t =>
    console.log(`#${t.rank}: ${t.topic} (${t.tweet_count})`)
  );

  console.log('\n--- User Profile ---');
  const profile = await scrapeUserProfile('OpenAI');
  console.log(JSON.stringify(profile, null, 2));
})();
```
Twitter/X's web app communicates with internal GraphQL API endpoints. These undocumented guest APIs can return structured JSON without browser rendering, making them significantly faster than headless browser approaches. The trade-off: because they are undocumented, endpoint hashes, required headers, and guest-token access change without notice and break frequently.
```python
import requests
import json

class TwitterGuestAPI:
    """Access Twitter/X data via guest API tokens."""

    BASE_URL = "https://api.x.com"
    # Public bearer token shipped with the Twitter/X web app
    BEARER_TOKEN = (
        "AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR"
        "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu"
        "4FA33AGWWjCpTnA"
    )

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.BEARER_TOKEN}",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        })
        self.guest_token = self._get_guest_token()
        self.session.headers["x-guest-token"] = self.guest_token

    def _get_guest_token(self):
        """Activate a guest token for unauthenticated access."""
        resp = self.session.post(f"{self.BASE_URL}/1.1/guest/activate.json")
        return resp.json()["guest_token"]

    def search_tweets(self, query, count=20):
        """Search tweets using the adaptive search endpoint."""
        params = {
            "q": query,
            "count": count,
            "tweet_search_mode": "live",
            "query_source": "typed_query",
        }
        resp = self.session.get(
            f"{self.BASE_URL}/2/search/adaptive.json",
            params=params
        )
        return self._parse_search_results(resp.json())

    def get_user_tweets(self, user_id, count=20):
        """Fetch a user timeline via GraphQL."""
        variables = json.dumps({
            "userId": user_id,
            "count": count,
            "includePromotedContent": False,
            "withQuickPromoteEligibilityTweetFields": False,
        })
        features = json.dumps({
            "rweb_lists_timeline_redesign_enabled": True,
            "responsive_web_graphql_exclude_directive_enabled": True,
            "verified_phone_label_enabled": False,
            "responsive_web_graphql_timeline_navigation_enabled": True,
        })
        params = {"variables": variables, "features": features}
        resp = self.session.get(
            f"{self.BASE_URL}/graphql/V7H0Ap3_Hh2FyS75OCDO3Q/UserTweets",
            params=params
        )
        return resp.json()

    def _parse_search_results(self, data):
        """Parse an adaptive search response into tweet objects."""
        tweets = []
        global_objects = data.get("globalObjects", {})
        tweet_data = global_objects.get("tweets", {})
        user_data = global_objects.get("users", {})

        for tweet_id, tweet in tweet_data.items():
            user = user_data.get(str(tweet["user_id_str"]), {})
            tweets.append({
                "id": tweet_id,
                "text": tweet["full_text"],
                "username": user.get("screen_name", ""),
                "display_name": user.get("name", ""),
                "created_at": tweet["created_at"],
                "likes": tweet["favorite_count"],
                "retweets": tweet["retweet_count"],
                "replies": tweet["reply_count"],
            })

        return sorted(tweets, key=lambda x: x["likes"], reverse=True)

# Usage
api = TwitterGuestAPI()
tweets = api.search_tweets("web scraping API", count=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:100]}...")
    print(f"  ❤️ {t['likes']} 🔁 {t['retweets']}")
    print()
```
For production applications, Mantis provides the most reliable way to extract Twitter/X data. One API call handles JavaScript rendering, anti-bot bypassing, proxy rotation, and structured data extraction.
```python
import requests

# Scrape a Twitter/X profile page
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "url": "https://x.com/OpenAI",
        "render_js": True,
        "wait_for": "article[data-testid='tweet']",
        "extract": {
            "profile": {
                "name": "[data-testid='UserName'] span:first-child",
                "bio": "[data-testid='UserDescription']",
            },
            "tweets": {
                "_selector": "article[data-testid='tweet']",
                "_type": "list",
                "text": "[data-testid='tweetText']",
                "time": "time@datetime",
            }
        }
    }
)

data = response.json()
print(f"Profile: {data['extracted']['profile']['name']}")
for tweet in data['extracted']['tweets'][:5]:
    print(f"  - {tweet['text'][:80]}...")
    print(f"    Posted: {tweet['time']}")
```
Extract Twitter/X data with a single API call. No headless browsers, no proxy management, no broken selectors.
Twitter/X has some of the most aggressive anti-scraping measures on the web. Understanding them is essential for any scraping approach:
Since 2023, Twitter/X requires authentication to view most content. Unauthenticated visitors see a login prompt after viewing a few tweets. This is the single biggest barrier to scraping — and the reason simple HTTP-based scraping no longer works.
Twitter/X rate limits by both IP address and account. Verified accounts get higher limits (~6,000 tweets/day read), while unverified accounts are capped at ~600 tweets/day. IP-based limits kick in even faster for unauthenticated requests.
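To stay under those caps, a scraper can pace itself instead of bursting requests. A minimal sketch that spreads a daily read budget evenly; the caps themselves are approximate and not officially documented:

```python
import time

class DailyRateLimiter:
    """Pace requests to stay under a daily read cap, e.g. ~600 reads/day
    for unverified accounts. Sleeps between calls so the budget is spread
    evenly across 24 hours. Clock and sleeper are injectable for testing."""

    def __init__(self, reads_per_day: int, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 86_400 / reads_per_day  # seconds between reads
        self._clock = clock
        self._sleep = sleeper
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous request."""
        now = self._clock()
        if self._last is not None:
            elapsed = now - self._last
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last = self._clock()

# 600 reads/day works out to one request every 144 seconds
limiter = DailyRateLimiter(600)
print(f"{limiter.min_interval:.0f}s between requests")
```

Call `limiter.wait()` before each page fetch; the first call returns immediately and subsequent calls sleep off any remaining interval.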
Twitter/X checks browser characteristics — canvas fingerprint, WebGL renderer, installed fonts, screen resolution, timezone, and language headers. Mismatches between these signals flag automated browsers.
Suspected bots receive JavaScript challenge pages that require full browser execution. Simple HTTP clients cannot pass these challenges, which is why headless browsers or API solutions are necessary.
Twitter/X monitors mouse movements, scroll patterns, click timing, and navigation sequences. Perfectly uniform scrolling (common in scrapers) triggers detection. Adding random delays and natural scroll patterns helps avoid this.
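One way to add that variation: precompute an uneven scroll plan instead of looping a fixed `scrollBy(0, 1000)`. A sketch with illustrative step sizes and pause ranges:

```python
import random

def human_scroll_plan(total_px: int, step_mean: int = 900, jitter: float = 0.3):
    """Break one long scroll into uneven steps with randomized pauses,
    avoiding the perfectly uniform scrolling that bot detection flags.
    Returns (pixels, pause_seconds) pairs; thresholds are illustrative."""
    plan, scrolled = [], 0
    while scrolled < total_px:
        step = int(step_mean * random.uniform(1 - jitter, 1 + jitter))
        step = min(step, total_px - scrolled)  # don't overshoot the target
        pause = random.uniform(0.8, 2.5)       # vary dwell time between scrolls
        plan.append((step, round(pause, 2)))
        scrolled += step
    return plan

# With Playwright, each step could be executed as:
#   page.mouse.wheel(0, pixels); time.sleep(pause)
for px, pause in human_scroll_plan(4000):
    print(f"scroll {px}px, wait {pause}s")
```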
Internal GraphQL endpoints are monitored for unusual request patterns. Sudden spikes in requests from a single IP or account trigger temporary blocks or CAPTCHA challenges.
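A common mitigation when a 429 or challenge page appears is exponential backoff with jitter, so retries slow down instead of amplifying the spike. A sketch with illustrative delay constants:

```python
import random

def backoff_schedule(attempts: int, base: float = 2.0, cap: float = 120.0):
    """Exponential backoff delays for rate-limit or challenge responses.
    Delays double per retry up to a cap, with up to 25% subtractive jitter
    so retries from multiple workers don't synchronize."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delay *= random.uniform(0.75, 1.0)  # apply jitter
        delays.append(round(delay, 1))
    return delays

print(backoff_schedule(5))
```

Sleep for `delays[n]` seconds before the nth retry, and give up once the schedule is exhausted.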
| Data Type | Fields | Auth Required? |
|---|---|---|
| Tweets | Text, timestamp, likes, retweets, replies, media URLs, hashtags, mentions | Mostly yes |
| Profiles | Name, bio, followers/following count, join date, location, verified status | Partial |
| Search | Tweets matching keywords, hashtags, from/to specific users, date ranges | Yes |
| Trends | Trending topics, tweet counts, category, location-specific trends | Yes |
| Threads | Full conversation chains, reply trees, quoted tweets | Yes |
| Lists | List members, list tweets, public lists for any user | Yes |
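Whatever extraction method you use, downstream code is simpler if scraped tweets are normalized into one record type. A sketch mirroring the fields in the table; the field names are this article's convention, not an official schema:

```python
from dataclasses import dataclass, field

@dataclass
class Tweet:
    """Normalized record for a scraped tweet."""
    id: str = ""
    text: str = ""
    username: str = ""
    timestamp: str = ""  # ISO 8601, from the <time datetime="..."> attribute
    likes: int = 0
    retweets: int = 0
    replies: int = 0
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)

    @classmethod
    def from_scraped(cls, raw: dict) -> "Tweet":
        """Build a Tweet from a raw scraper dict, deriving hashtags
        and mentions from the text."""
        words = raw.get("text", "").split()
        return cls(
            id=raw.get("id", ""),
            text=raw.get("text", ""),
            username=raw.get("username", ""),
            timestamp=raw.get("timestamp", ""),
            likes=int(raw.get("likes", 0)),
            retweets=int(raw.get("retweets", 0)),
            replies=int(raw.get("replies", 0)),
            hashtags=[w for w in words if w.startswith("#")],
            mentions=[w for w in words if w.startswith("@")],
        )

t = Tweet.from_scraped({"text": "Scraping #python with @someuser", "username": "dev", "likes": "5"})
print(t.hashtags, t.mentions)
```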
Track what people say about your brand in real time. Combine Twitter/X scraping with AI sentiment analysis to detect PR crises before they escalate.
```python
import requests
import urllib.parse

def monitor_brand_sentiment(brand_name):
    """Monitor brand mentions and classify sentiment."""
    # Scrape recent mentions via Mantis (URL-encode the query)
    query = urllib.parse.quote(brand_name)
    response = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={"x-api-key": "YOUR_API_KEY"},
        json={
            "url": f"https://x.com/search?q={query}&f=live",
            "render_js": True,
            "wait_for": "article[data-testid='tweet']",
            "scroll_count": 3,
            "extract": {
                "tweets": {
                    "_selector": "article[data-testid='tweet']",
                    "_type": "list",
                    "text": "[data-testid='tweetText']",
                    "time": "time@datetime",
                    "likes": "[data-testid='like'] span",
                }
            }
        }
    )
    tweets = response.json()["extracted"]["tweets"]

    # Simple keyword-based sentiment classification
    positive_words = {"love", "great", "amazing", "best", "awesome", "excellent"}
    negative_words = {"hate", "terrible", "worst", "awful", "broken", "scam"}

    results = {"positive": 0, "negative": 0, "neutral": 0, "alerts": []}
    for tweet in tweets:
        text_lower = tweet["text"].lower()
        pos = sum(1 for w in positive_words if w in text_lower)
        neg = sum(1 for w in negative_words if w in text_lower)
        if neg > pos:
            results["negative"] += 1
            # Alert on high-engagement negative tweets
            likes = int(tweet.get("likes", "0").replace(",", "") or 0)
            if likes > 100:
                results["alerts"].append(tweet)
        elif pos > neg:
            results["positive"] += 1
        else:
            results["neutral"] += 1
    return results

# Run monitoring
sentiment = monitor_brand_sentiment("YourBrand")
print(f"Positive: {sentiment['positive']}")
print(f"Negative: {sentiment['negative']}")
print(f"Neutral: {sentiment['neutral']}")
if sentiment["alerts"]:
    print(f"⚠️ {len(sentiment['alerts'])} high-engagement negative mentions!")
```
Monitor competitor social media campaigns — track their posting frequency, engagement rates, and top-performing content.
```javascript
const puppeteer = require('puppeteer');

async function trackCompetitor(username) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );
  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });
  // Plain delay (page.waitForTimeout was removed in newer Puppeteer)
  await new Promise(resolve => setTimeout(resolve, 3000));

  // Scroll and collect tweets
  const tweets = [];
  for (let i = 0; i < 5; i++) {
    const newTweets = await page.evaluate(() => {
      const articles = document.querySelectorAll('article[data-testid="tweet"]');
      return Array.from(articles).map(article => {
        const text = article.querySelector('[data-testid="tweetText"]');
        const time = article.querySelector('time');
        const like = article.querySelector('[data-testid="like"]');
        const retweet = article.querySelector('[data-testid="retweet"]');
        return {
          text: text?.innerText || '',
          time: time?.getAttribute('datetime') || '',
          likes: like?.innerText || '0',
          retweets: retweet?.innerText || '0',
        };
      });
    });
    tweets.push(...newTweets);
    await page.evaluate(() => window.scrollBy(0, 1000));
    await new Promise(resolve => setTimeout(resolve, 2000));
  }

  await browser.close();

  // Deduplicate by text and analyze
  const unique = [...new Map(tweets.map(t => [t.text, t])).values()];
  const report = {
    username,
    total_tweets: unique.length,
    avg_likes: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.likes), 0) / unique.length
    ),
    avg_retweets: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.retweets), 0) / unique.length
    ),
    top_tweet: unique.sort((a, b) =>
      parseCount(b.likes) - parseCount(a.likes)
    )[0],
  };
  return report;
}

function parseCount(str) {
  str = (str || '0').toUpperCase().trim();
  if (str.includes('K')) return parseFloat(str) * 1000;
  if (str.includes('M')) return parseFloat(str) * 1000000;
  return parseInt(str, 10) || 0;
}

// Track a competitor
trackCompetitor('competitor_handle').then(report => {
  console.log(`📊 ${report.username} Analysis:`);
  console.log(`  Tweets analyzed: ${report.total_tweets}`);
  console.log(`  Avg likes: ${report.avg_likes}`);
  console.log(`  Avg retweets: ${report.avg_retweets}`);
  console.log(`  Top tweet: ${report.top_tweet.text.slice(0, 80)}...`);
});
```
Build an AI agent that monitors Twitter/X for emerging trends in your industry and generates actionable reports.
```python
import requests
import urllib.parse

def trend_intelligence_agent(topics, api_key):
    """AI agent that monitors Twitter/X trends and generates insights."""
    all_data = {}
    for topic in topics:
        # Fetch recent tweets for each topic via Mantis (URL-encode the query)
        response = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={"x-api-key": api_key},
            json={
                "url": f"https://x.com/search?q={urllib.parse.quote(topic)}&f=live",
                "render_js": True,
                "wait_for": "article[data-testid='tweet']",
                "scroll_count": 2,
                "extract": {
                    "tweets": {
                        "_selector": "article[data-testid='tweet']",
                        "_type": "list",
                        "text": "[data-testid='tweetText']",
                        "likes": "[data-testid='like'] span",
                    }
                }
            }
        )
        tweets = response.json().get("extracted", {}).get("tweets", [])
        all_data[topic] = {
            "count": len(tweets),
            "total_engagement": sum(
                int(t.get("likes", "0").replace(",", "") or 0) for t in tweets
            ),
            "sample_tweets": [t["text"][:120] for t in tweets[:3]],
        }

    # Generate a report, highest-engagement topics first
    report = "🔍 Twitter/X Trend Intelligence Report\n"
    report += "=" * 40 + "\n\n"
    for topic, data in sorted(
        all_data.items(),
        key=lambda x: x[1]["total_engagement"],
        reverse=True
    ):
        report += f"📌 {topic}\n"
        report += f"  Tweets found: {data['count']}\n"
        report += f"  Total engagement: {data['total_engagement']}\n"
        for sample in data["sample_tweets"]:
            report += f"  → {sample}...\n"
        report += "\n"
    return report

# Monitor AI industry trends
topics = ["AI agents", "LLM fine-tuning", "RAG pipeline", "AI coding assistant"]
report = trend_intelligence_agent(topics, "YOUR_API_KEY")
print(report)
```
| Feature | Twitter/X API (Basic) | DIY Scraping | Mantis API |
|---|---|---|---|
| Cost | $100/mo (10K reads) | Server + proxy costs | $29/mo (5K requests) |
| Cost per 1K reads | $10.00 | $1-5 (proxies) | $5.80 |
| Setup time | Hours (approval needed) | Days | Minutes |
| Data format | Structured JSON | Raw HTML → parse | Structured JSON |
| Rate limits | Strict (per tier) | IP-based blocks | Per plan |
| JS rendering | N/A (API) | You manage | Included |
| Anti-bot handling | N/A (API) | You manage | Included |
| Maintenance | API version updates | Constant (DOM changes) | Zero |
| Historical data | Enterprise only ($42K+) | Limited by scrolling | Current pages |
| Reliability | High | Low-medium | High |
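Plugging the table's per-1K figures into a quick calculator shows how each option scales with volume. The DIY figure uses a $3/1K midpoint of the table's $1-5 range; all numbers are illustrative and ignore overages and plan minimums:

```python
def monthly_cost(reads_needed: int, option: str) -> float:
    """Rough monthly cost for a given read volume, using the comparison
    table's per-1K figures: Basic API $10.00, DIY ~$3.00 (midpoint),
    Mantis $5.80. Illustrative only."""
    per_1k = {"api_basic": 10.00, "diy": 3.00, "mantis": 5.80}
    return reads_needed / 1000 * per_1k[option]

for opt in ("api_basic", "diy", "mantis"):
    print(f"{opt}: ${monthly_cost(50_000, opt):,.2f} for 50K reads/mo")
```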
Mantis handles JavaScript rendering, anti-bot measures, and proxy rotation. Get structured data with a single API call.
Twitter/X scraping exists in a complex legal landscape. Here's what you need to know:
Disclaimer: This article is for educational purposes only. Web scraping may violate Twitter/X's Terms of Service. Always ensure your scraping activities comply with applicable laws and regulations in your jurisdiction.
Now that you know how to scrape Twitter/X, explore more scraping guides: