How to Scrape Twitter (X) Data in 2026: Tweets, Profiles & Trends

Published March 30, 2026 · 18 min read

Why Scrape Twitter/X?

Twitter/X remains one of the most valuable real-time data sources on the internet. With 550+ million monthly active users, it's where news breaks, trends emerge, and public sentiment forms in real time. For developers, researchers, and businesses, Twitter/X data powers sentiment analysis, brand monitoring, competitor tracking, and trend intelligence.

The problem? Getting this data has become extremely expensive since the 2023 API pricing changes. That's why scraping has become the go-to approach for most developers.

The Twitter/X API Pricing Problem

Since Elon Musk's acquisition, Twitter/X API pricing has become prohibitively expensive for most developers:

| Tier | Price | Tweet Reads | Tweet Posts | Cost per 1K Reads |
|---|---|---|---|---|
| Free | $0/mo | 0 (write-only) | 1,500/mo | N/A |
| Basic | $100/mo | 10,000/mo | 3,000/mo | $10.00 |
| Pro | $5,000/mo | 1,000,000/mo | 300,000/mo | $5.00 |
| Enterprise | $42,000+/mo | 50,000,000+/mo | Negotiable | ~$0.84 |

At $10 per 1,000 tweet reads on the Basic tier, even a simple sentiment analysis project becomes unaffordable. The free tier doesn't even allow reading tweets. This pricing gap has made web scraping the practical choice for most Twitter/X data extraction use cases.

Bottom line: The Twitter/X API free tier is write-only. Reading any data requires $100/month minimum — and you only get 10,000 tweets. For most projects, scraping is 10-100x more cost-effective.
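The per-1K figures in the table are just price divided by read quota. A quick sketch to reproduce them, with quotas copied from the table above:

```python
# Cost per 1,000 tweet reads for each paid tier.
# Values are (price $/month, tweet reads/month) from the pricing table.
TIERS = {
    "Basic": (100, 10_000),
    "Pro": (5_000, 1_000_000),
    "Enterprise": (42_000, 50_000_000),
}

for name, (price, reads) in TIERS.items():
    per_1k = price / (reads / 1_000)
    print(f"{name}: ${per_1k:.2f} per 1K reads")  # $10.00, $5.00, $0.84
```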

4 Methods to Scrape Twitter/X

Here are four approaches to extract data from Twitter/X, from hands-on browser automation to production-ready API solutions:

  1. Python + Playwright — Headless browser, full JS rendering, good for prototyping
  2. Node.js + Puppeteer — JavaScript-native browser automation, great for real-time scraping
  3. Twitter/X Guest API Endpoints — Undocumented internal APIs, fast but fragile
  4. Mantis Web Scraping API — One-call solution, handles anti-bot, production-ready

Method 1: Python + Playwright

Playwright is the best choice for scraping Twitter/X with Python because it handles JavaScript rendering, which Twitter/X requires for all content.

Setup

pip install playwright beautifulsoup4
playwright install chromium

Scrape a User's Timeline

from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import json
import time

def scrape_twitter_timeline(username, max_tweets=20):
    """Scrape tweets from a public Twitter/X profile."""
    tweets = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Navigate to profile
        page.goto(f"https://x.com/{username}", wait_until="networkidle")
        time.sleep(3)  # Wait for dynamic content

        # Scroll to load more tweets
        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 10:
            # Extract tweet elements
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")

            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            # Scroll down
            page.evaluate("window.scrollBy(0, 1000)")
            time.sleep(2)
            scroll_count += 1

        browser.close()

    return tweets[:max_tweets]


def extract_tweet(article):
    """Extract structured data from a tweet article element."""
    try:
        # Tweet text
        text_el = article.find("div", {"data-testid": "tweetText"})
        text = text_el.get_text(strip=True) if text_el else ""

        # Username and handle
        user_links = article.find_all("a", href=True)
        username = ""
        display_name = ""
        for link in user_links:
            href = link.get("href", "")
            if href.startswith("/") and not href.startswith("/i/"):
                username = href.strip("/")
                display_name = link.get_text(strip=True)
                break

        # Engagement metrics
        likes = get_metric(article, "like")
        retweets = get_metric(article, "retweet")
        replies = get_metric(article, "reply")

        # Timestamp
        time_el = article.find("time")
        timestamp = time_el.get("datetime", "") if time_el else ""

        return {
            "text": text,
            "username": username,
            "display_name": display_name,
            "timestamp": timestamp,
            "likes": likes,
            "retweets": retweets,
            "replies": replies
        }
    except Exception:
        return None


def get_metric(article, metric_type):
    """Extract engagement metric (likes, retweets, replies)."""
    el = article.find("button", {"data-testid": metric_type})
    if el:
        text = el.get_text(strip=True)
        return parse_count(text) if text else 0
    return 0


def parse_count(text):
    """Parse Twitter-style count strings (1.2K, 3.5M, etc.)."""
    text = text.strip().upper()
    if "K" in text:
        return int(float(text.replace("K", "")) * 1000)
    if "M" in text:
        return int(float(text.replace("M", "")) * 1000000)
    try:
        return int(text)
    except ValueError:
        return 0


# Usage
tweets = scrape_twitter_timeline("elonmusk", max_tweets=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:80]}...")
    print(f"  ❤️ {t['likes']}  🔁 {t['retweets']}  💬 {t['replies']}")
    print()

Scrape Twitter/X Search Results

def scrape_twitter_search(query, max_tweets=30):
    """Scrape tweets matching a search query."""
    import urllib.parse
    encoded_query = urllib.parse.quote(query)
    tweets = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1280, "height": 900},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
        )
        page = context.new_page()

        # Twitter search URL (Latest tab for chronological results)
        url = f"https://x.com/search?q={encoded_query}&src=typed_query&f=live"
        page.goto(url, wait_until="networkidle")
        time.sleep(3)

        scroll_count = 0
        while len(tweets) < max_tweets and scroll_count < 15:
            html = page.content()
            soup = BeautifulSoup(html, "html.parser")

            for article in soup.find_all("article", {"data-testid": "tweet"}):
                tweet_data = extract_tweet(article)
                if tweet_data and tweet_data not in tweets:
                    tweets.append(tweet_data)

            page.evaluate("window.scrollBy(0, 1200)")
            time.sleep(2)
            scroll_count += 1

        browser.close()

    return tweets[:max_tweets]


# Search for tweets about web scraping
results = scrape_twitter_search("web scraping python", max_tweets=20)
print(f"Found {len(results)} tweets about web scraping")

Method 2: Node.js + Puppeteer

Puppeteer is ideal for JavaScript developers who want native browser control and easy integration with Node.js web applications.

Setup

npm install puppeteer cheerio

Scrape Trending Topics

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeTrends() {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });

  const page = await browser.newPage();

  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/120.0.0.0 Safari/537.36'
  );

  await page.setViewport({ width: 1280, height: 900 });

  // Navigate to the Explore page for trends
  await page.goto('https://x.com/explore/tabs/trending', {
    waitUntil: 'networkidle2',
    timeout: 30000
  });

  // page.waitForTimeout() was removed in modern Puppeteer; sleep manually
  await new Promise(resolve => setTimeout(resolve, 3000));

  const html = await page.content();
  const $ = cheerio.load(html);

  const trends = [];

  // Extract trending topics
  $('[data-testid="trend"]').each((i, el) => {
    const trendText = $(el).find('span').text();
    const tweetCount = $(el).find('[dir="ltr"]').last().text();

    if (trendText) {
      trends.push({
        rank: i + 1,
        topic: trendText.trim(),
        tweet_count: tweetCount.trim() || 'N/A'
      });
    }
  });

  await browser.close();
  return trends;
}

async function scrapeUserProfile(username) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();

  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );

  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });

  await new Promise(resolve => setTimeout(resolve, 3000));

  const profile = await page.evaluate(() => {
    const getName = () => {
      const el = document.querySelector('[data-testid="UserName"]');
      return el ? el.innerText.split('\n')[0] : '';
    };

    const getBio = () => {
      const el = document.querySelector('[data-testid="UserDescription"]');
      return el ? el.innerText : '';
    };

    const getStats = () => {
      const links = document.querySelectorAll('a[href*="/followers"], a[href*="/following"]');
      const stats = {};
      links.forEach(link => {
        const text = link.innerText;
        if (link.href.includes('/following')) {
          stats.following = text.split(' ')[0];
        } else if (link.href.includes('/followers')) {
          stats.followers = text.split(' ')[0];
        }
      });
      return stats;
    };

    return {
      name: getName(),
      bio: getBio(),
      ...getStats()
    };
  });

  await browser.close();
  return { username, ...profile };
}

// Usage
(async () => {
  console.log('--- Trending Topics ---');
  const trends = await scrapeTrends();
  trends.slice(0, 10).forEach(t =>
    console.log(`#${t.rank}: ${t.topic} (${t.tweet_count})`)
  );

  console.log('\n--- User Profile ---');
  const profile = await scrapeUserProfile('OpenAI');
  console.log(JSON.stringify(profile, null, 2));
})();

Method 3: Twitter/X Guest API Endpoints

Twitter/X's web app communicates with internal GraphQL API endpoints. These undocumented guest APIs can return structured JSON data without browser rendering — making them significantly faster than headless browser approaches.

⚠️ Warning: These endpoints are undocumented, change frequently, and Twitter/X actively monitors for unauthorized usage. They should only be used for research and personal projects. For production use, consider the Mantis API approach below.

import requests
import json

class TwitterGuestAPI:
    """Access Twitter/X data via guest API tokens."""

    BASE_URL = "https://api.x.com"
    BEARER_TOKEN = (
        "AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR"
        "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu"
        "4FA33AGWWjCpTnA"
    )  # Public bearer token from web app

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {self.BEARER_TOKEN}",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        })
        self.guest_token = self._get_guest_token()
        self.session.headers["x-guest-token"] = self.guest_token

    def _get_guest_token(self):
        """Activate a guest token for unauthenticated access."""
        resp = self.session.post(f"{self.BASE_URL}/1.1/guest/activate.json")
        return resp.json()["guest_token"]

    def search_tweets(self, query, count=20):
        """Search tweets using the adaptive search endpoint."""
        params = {
            "q": query,
            "count": count,
            "tweet_search_mode": "live",
            "query_source": "typed_query",
        }
        resp = self.session.get(
            f"{self.BASE_URL}/2/search/adaptive.json",
            params=params
        )
        data = resp.json()
        return self._parse_search_results(data)

    def get_user_tweets(self, user_id, count=20):
        """Fetch user timeline via GraphQL."""
        variables = json.dumps({
            "userId": user_id,
            "count": count,
            "includePromotedContent": False,
            "withQuickPromoteEligibilityTweetFields": False,
        })
        features = json.dumps({
            "rweb_lists_timeline_redesign_enabled": True,
            "responsive_web_graphql_exclude_directive_enabled": True,
            "verified_phone_label_enabled": False,
            "responsive_web_graphql_timeline_navigation_enabled": True,
        })
        params = {"variables": variables, "features": features}
        resp = self.session.get(
            f"{self.BASE_URL}/graphql/V7H0Ap3_Hh2FyS75OCDO3Q/UserTweets",
            params=params
        )
        return resp.json()

    def _parse_search_results(self, data):
        """Parse adaptive search response into tweet objects."""
        tweets = []
        global_objects = data.get("globalObjects", {})
        tweet_data = global_objects.get("tweets", {})
        user_data = global_objects.get("users", {})

        for tweet_id, tweet in tweet_data.items():
            user = user_data.get(str(tweet["user_id_str"]), {})
            tweets.append({
                "id": tweet_id,
                "text": tweet["full_text"],
                "username": user.get("screen_name", ""),
                "display_name": user.get("name", ""),
                "created_at": tweet["created_at"],
                "likes": tweet["favorite_count"],
                "retweets": tweet["retweet_count"],
                "replies": tweet["reply_count"],
            })

        return sorted(tweets, key=lambda x: x["likes"], reverse=True)


# Usage
api = TwitterGuestAPI()
tweets = api.search_tweets("web scraping API", count=10)
for t in tweets:
    print(f"@{t['username']}: {t['text'][:100]}...")
    print(f"  ❤️ {t['likes']}  🔁 {t['retweets']}")
    print()

Method 4: Mantis Web Scraping API

For production applications, Mantis provides the most reliable way to extract Twitter/X data. One API call handles JavaScript rendering, anti-bot bypassing, proxy rotation, and structured data extraction.

import requests

# Scrape a Twitter/X profile page
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "url": "https://x.com/OpenAI",
        "render_js": True,
        "wait_for": "article[data-testid='tweet']",
        "extract": {
            "profile": {
                "name": "[data-testid='UserName'] span:first-child",
                "bio": "[data-testid='UserDescription']",
            },
            "tweets": {
                "_selector": "article[data-testid='tweet']",
                "_type": "list",
                "text": "[data-testid='tweetText']",
                "time": "time@datetime",
            }
        }
    }
)

data = response.json()
print(f"Profile: {data['extracted']['profile']['name']}")
for tweet in data['extracted']['tweets'][:5]:
    print(f"  - {tweet['text'][:80]}...")
    print(f"    Posted: {tweet['time']}")

Why Mantis for Twitter/X?

Skip the Anti-Bot Cat-and-Mouse Game

Extract Twitter/X data with a single API call. No headless browsers, no proxy management, no broken selectors.


Twitter/X Anti-Bot Defenses

Twitter/X has some of the most aggressive anti-scraping measures on the web. Understanding them is essential for any scraping approach:

1. Login Wall

Since 2023, Twitter/X requires authentication to view most content. Unauthenticated visitors see a login prompt after viewing a few tweets. This is the single biggest barrier to scraping — and the reason simple HTTP-based scraping no longer works.

2. Rate Limiting

Twitter/X rate limits by both IP address and account. Verified accounts get higher read limits (roughly 6,000 tweets/day), while unverified accounts are capped around 600 tweets/day. IP-based limits kick in even faster for unauthenticated requests.
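When you do hit a limit, backing off beats hammering the endpoint. A generic retry sketch, not Twitter/X-specific; `fetch` is any zero-argument callable returning a response-like object with a `status_code` attribute:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=2.0):
    """Retry a request callable, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        resp = fetch()
        if resp.status_code != 429:
            return resp
        # Exponential backoff with jitter: ~2s, 4s, 8s, ... plus noise
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("still rate-limited after all retries")
```

Jitter matters: many workers retrying on identical schedules produce synchronized spikes, which is itself a detection signal.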

3. Browser Fingerprinting

Twitter/X checks browser characteristics — canvas fingerprint, WebGL renderer, installed fonts, screen resolution, timezone, and language headers. Mismatches between these signals flag automated browsers.

4. JavaScript Challenges

Suspected bots receive JavaScript challenge pages that require full browser execution. Simple HTTP clients cannot pass these challenges, which is why headless browsers or API solutions are necessary.

5. Behavioral Analysis

Twitter/X monitors mouse movements, scroll patterns, click timing, and navigation sequences. Perfectly uniform scrolling (common in scrapers) triggers detection. Adding random delays and natural scroll patterns helps avoid this.
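The fixed `scrollBy(0, 1000)` plus `sleep(2)` loop in the Playwright example above is exactly the uniform pattern this detection targets. One way to vary it; the specific distances and delays here are illustrative, not known-safe values:

```python
import random

def humanized_scroll_steps(total_px=8000, mean_step=900):
    """Return (scroll_px, pause_s) pairs with human-ish variation."""
    steps = []
    scrolled = 0
    while scrolled < total_px:
        step = max(200, int(random.gauss(mean_step, 250)))  # variable distance
        pause = random.uniform(1.2, 3.5)                    # variable delay
        if random.random() < 0.1:
            # Occasionally scroll back up a little, as a skimming human would
            step = -random.randint(100, 300)
        steps.append((step, pause))
        scrolled += max(step, 0)
    return steps

# Drop-in for the fixed loop in the Playwright example:
# for px, pause in humanized_scroll_steps():
#     page.evaluate(f"window.scrollBy(0, {px})")
#     time.sleep(pause)
```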

6. API Endpoint Monitoring

Internal GraphQL endpoints are monitored for unusual request patterns. Sudden spikes in requests from a single IP or account trigger temporary blocks or CAPTCHA challenges.

What Data Can You Extract?

| Data Type | Fields | Auth Required? |
|---|---|---|
| Tweets | Text, timestamp, likes, retweets, replies, media URLs, hashtags, mentions | Mostly yes |
| Profiles | Name, bio, followers/following count, join date, location, verified status | Partial |
| Search | Tweets matching keywords, hashtags, from/to specific users, date ranges | Yes |
| Trends | Trending topics, tweet counts, category, location-specific trends | Yes |
| Threads | Full conversation chains, reply trees, quoted tweets | Yes |
| Lists | List members, list tweets, public lists for any user | Yes |

3 Real-World Use Cases

Use Case 1: Brand Sentiment Monitor

Track what people say about your brand in real time. Combine Twitter/X scraping with AI sentiment analysis to detect PR crises before they escalate.

import requests
from datetime import datetime

def monitor_brand_sentiment(brand_name, interval_hours=1):
    """Monitor brand mentions and classify sentiment."""

    # Scrape recent mentions via Mantis
    response = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={"x-api-key": "YOUR_API_KEY"},
        json={
            "url": f"https://x.com/search?q={brand_name}&f=live",
            "render_js": True,
            "wait_for": "article[data-testid='tweet']",
            "scroll_count": 3,
            "extract": {
                "tweets": {
                    "_selector": "article[data-testid='tweet']",
                    "_type": "list",
                    "text": "[data-testid='tweetText']",
                    "time": "time@datetime",
                    "likes": "[data-testid='like'] span",
                }
            }
        }
    )

    tweets = response.json()["extracted"]["tweets"]

    # Simple sentiment classification
    positive_words = {"love", "great", "amazing", "best", "awesome", "excellent"}
    negative_words = {"hate", "terrible", "worst", "awful", "broken", "scam"}

    results = {"positive": 0, "negative": 0, "neutral": 0, "alerts": []}

    for tweet in tweets:
        text_lower = tweet["text"].lower()
        pos = sum(1 for w in positive_words if w in text_lower)
        neg = sum(1 for w in negative_words if w in text_lower)

        if neg > pos:
            results["negative"] += 1
            # Alert on high-engagement negative tweets
            likes = int(tweet.get("likes", "0").replace(",", "") or 0)
            if likes > 100:
                results["alerts"].append(tweet)
        elif pos > neg:
            results["positive"] += 1
        else:
            results["neutral"] += 1

    return results


# Run monitoring
sentiment = monitor_brand_sentiment("YourBrand")
print(f"Positive: {sentiment['positive']}")
print(f"Negative: {sentiment['negative']}")
print(f"Neutral: {sentiment['neutral']}")
if sentiment["alerts"]:
    print(f"⚠️ {len(sentiment['alerts'])} high-engagement negative mentions!")

Use Case 2: Competitor Campaign Tracker

Monitor competitor social media campaigns — track their posting frequency, engagement rates, and top-performing content.

async function trackCompetitor(username) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();

  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' +
    'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
  );

  await page.goto(`https://x.com/${username}`, {
    waitUntil: 'networkidle2'
  });
  // page.waitForTimeout() was removed in modern Puppeteer; sleep manually
  await new Promise(resolve => setTimeout(resolve, 3000));

  // Scroll and collect tweets
  const tweets = [];
  for (let i = 0; i < 5; i++) {
    const newTweets = await page.evaluate(() => {
      const articles = document.querySelectorAll('article[data-testid="tweet"]');
      return Array.from(articles).map(article => {
        const text = article.querySelector('[data-testid="tweetText"]');
        const time = article.querySelector('time');
        const like = article.querySelector('[data-testid="like"]');
        const retweet = article.querySelector('[data-testid="retweet"]');

        return {
          text: text?.innerText || '',
          time: time?.getAttribute('datetime') || '',
          likes: like?.innerText || '0',
          retweets: retweet?.innerText || '0',
        };
      });
    });

    tweets.push(...newTweets);
    await page.evaluate(() => window.scrollBy(0, 1000));
    await new Promise(resolve => setTimeout(resolve, 2000));
  }

  await browser.close();

  // Deduplicate and analyze
  const unique = [...new Map(tweets.map(t => [t.text, t])).values()];

  const report = {
    username,
    total_tweets: unique.length,
    avg_likes: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.likes), 0) / (unique.length || 1)
    ),
    avg_retweets: Math.round(
      unique.reduce((sum, t) => sum + parseCount(t.retweets), 0) / (unique.length || 1)
    ),
    top_tweet: unique.sort((a, b) =>
      parseCount(b.likes) - parseCount(a.likes)
    )[0],
  };

  return report;
}

function parseCount(str) {
  // Strip thousands separators first: parseInt('1,234') would stop at the comma
  str = (str || '0').toUpperCase().trim().replace(/,/g, '');
  if (str.includes('K')) return parseFloat(str) * 1000;
  if (str.includes('M')) return parseFloat(str) * 1000000;
  return parseInt(str, 10) || 0;
}

// Track a competitor
trackCompetitor('competitor_handle').then(report => {
  console.log(`📊 ${report.username} Analysis:`);
  console.log(`   Tweets analyzed: ${report.total_tweets}`);
  console.log(`   Avg likes: ${report.avg_likes}`);
  console.log(`   Avg retweets: ${report.avg_retweets}`);
  console.log(`   Top tweet: ${report.top_tweet.text.slice(0, 80)}...`);
});

Use Case 3: AI Agent Trend Intelligence

Build an AI agent that monitors Twitter/X for emerging trends in your industry and generates actionable reports.

import requests
import json

def trend_intelligence_agent(topics, api_key):
    """AI agent that monitors Twitter/X trends and generates insights."""

    all_data = {}

    for topic in topics:
        # Fetch tweets via Mantis
        response = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={"x-api-key": api_key},
            json={
                "url": f"https://x.com/search?q={topic}&f=live",
                "render_js": True,
                "wait_for": "article[data-testid='tweet']",
                "scroll_count": 2,
                "extract": {
                    "tweets": {
                        "_selector": "article[data-testid='tweet']",
                        "_type": "list",
                        "text": "[data-testid='tweetText']",
                        "likes": "[data-testid='like'] span",
                    }
                }
            }
        )

        tweets = response.json().get("extracted", {}).get("tweets", [])
        all_data[topic] = {
            "count": len(tweets),
            "total_engagement": sum(
                int(t.get("likes", "0").replace(",", "") or 0) for t in tweets
            ),
            "sample_tweets": [t["text"][:120] for t in tweets[:3]],
        }

    # Generate report
    report = "🔍 Twitter/X Trend Intelligence Report\n"
    report += "=" * 40 + "\n\n"

    for topic, data in sorted(
        all_data.items(),
        key=lambda x: x[1]["total_engagement"],
        reverse=True
    ):
        report += f"📌 {topic}\n"
        report += f"   Tweets found: {data['count']}\n"
        report += f"   Total engagement: {data['total_engagement']}\n"
        for sample in data["sample_tweets"]:
            report += f"   → {sample}...\n"
        report += "\n"

    return report


# Monitor AI industry trends
topics = ["AI agents", "LLM fine-tuning", "RAG pipeline", "AI coding assistant"]
report = trend_intelligence_agent(topics, "YOUR_API_KEY")
print(report)

Twitter/X API vs Scraping vs Mantis

| Feature | Twitter/X API (Basic) | DIY Scraping | Mantis API |
|---|---|---|---|
| Cost | $100/mo (10K reads) | Server + proxy costs | $29/mo (5K requests) |
| Cost per 1K reads | $10.00 | $1-5 (proxies) | $5.80 |
| Setup time | Hours (approval needed) | Days | Minutes |
| Data format | Structured JSON | Raw HTML → parse | Structured JSON |
| Rate limits | Strict (per tier) | IP-based blocks | Per plan |
| JS rendering | N/A (API) | You manage | Included |
| Anti-bot handling | N/A (API) | You manage | Included |
| Maintenance | API version updates | Constant (DOM changes) | Zero |
| Historical data | Enterprise only ($42K+) | Limited by scrolling | Current pages |
| Reliability | High | Low-medium | High |

Extract Twitter/X Data Without the $100/mo API Tax

Mantis handles JavaScript rendering, anti-bot measures, and proxy rotation. Get structured data with a single API call.


Is Scraping Twitter/X Legal?

Twitter/X scraping exists in a complex legal landscape. Here's what you need to know:

Key Legal Precedents

  1. hiQ Labs v. LinkedIn (9th Cir. 2022): the court held that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act, though breach-of-contract claims can still apply.
  2. X Corp. v. Bright Data (N.D. Cal. 2024): X's claims against a data-collection firm over scraping public tweets were dismissed, though the ruling did not bless all scraping practices.

Best Practices

  1. Only scrape public data — Never scrape private/protected accounts or DMs
  2. Respect rate limits — Don't overwhelm Twitter/X servers
  3. Don't circumvent access controls — Avoid bypassing login walls through credential stuffing
  4. Check robots.txt — Twitter/X's robots.txt restricts many paths
  5. Comply with GDPR/CCPA — Personal data has additional restrictions in the EU and California
  6. Consult legal counsel — For commercial use, get legal advice specific to your jurisdiction
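For point 4, Python's standard library can evaluate robots.txt rules. The rules below are an illustrative sample, not X's actual robots.txt; fetch https://x.com/robots.txt to check the real rules:

```python
from urllib.robotparser import RobotFileParser

# Offline sketch: evaluate paths against robots.txt rules.
SAMPLE_RULES = """\
User-agent: *
Disallow: /search
Allow: /
"""

rp = RobotFileParser()
rp.parse(SAMPLE_RULES.splitlines())

print(rp.can_fetch("*", "https://x.com/search?q=test"))  # False
print(rp.can_fetch("*", "https://x.com/OpenAI"))         # True
```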

Disclaimer: This article is for educational purposes only. Web scraping may violate Twitter/X's Terms of Service. Always ensure your scraping activities comply with applicable laws and regulations in your jurisdiction.


Next Steps

Now that you know how to scrape Twitter/X, explore more of our scraping guides.