How to Scrape TikTok Data in 2026: Videos, Profiles & Hashtags

Q: How do I scrape TikTok without getting blocked?

TikTok uses aggressive anti-bot detection including TLS fingerprinting, CAPTCHA puzzle sliders, device ID tracking, and behavioral analysis. Use rotating residential proxies, headless browsers with stealth plugins, random delays (3-10 seconds), and rotate User-Agent strings. Alternatively, use a web scraping API like Mantis that handles anti-blocking automatically.

Q: Can I use TikTok's official API instead of scraping?

TikTok offers the Research API (requires academic affiliation and approval), the Login Kit / Display API (limited to user's own content), and the Marketing API (ads data only). None provide access to public video/profile data at scale. For trend analysis, competitor monitoring, or market research, scraping or a web scraping API is the practical option.

Q: What Python library is best for scraping TikTok?

Playwright with stealth plugins is the most reliable approach since TikTok heavily relies on JavaScript rendering and has strong bot detection. The unofficial TikTok web API endpoints work but change frequently. For production use, a web scraping API like Mantis is recommended for reliability.

📑 Table of Contents

Why Scrape TikTok Data?
What Data Can You Extract?
Method 1: Python + Requests (TikTok Web API)
Method 2: Playwright (Headless Browser)
Method 3: Node.js + Puppeteer
Method 4: Web Scraping API (Easiest)
Beating TikTok's Anti-Bot Detection
TikTok Official API vs Scraping
Method Comparison
Real-World Use Cases
Legal Considerations
FAQ

Why Scrape TikTok Data?

TikTok reports more than a billion monthly active users and is among the fastest-growing social platforms to date. It is no longer just a video app: it also functions as a search engine, a commerce platform, and a cultural hub for Gen Z and Millennials. Businesses, researchers, and developers scrape TikTok for:

Trend discovery — Identify viral trends, sounds, and formats before they peak, giving brands a first-mover advantage
Influencer marketing — Evaluate creators by engagement rate, follower growth, content quality, and audience demographics before sponsorship deals
Competitor monitoring — Track competitor content strategies, posting frequency, and engagement benchmarks
Market research — Discover consumer preferences, product reviews, and emerging niches through TikTok's organic content
Sound & music analytics — Track which sounds are trending and which creators are using them — critical for music marketing
E-commerce intelligence — Monitor TikTok Shop products, pricing, reviews, and sales performance
AI agent social intelligence — Give AI assistants the ability to research trends, creators, and cultural moments on TikTok
Academic research — Study content virality, recommendation algorithms, and social media behavior at scale

TikTok's official APIs are extremely restricted — the Research API requires academic affiliation, the Display API only shows your own content, and neither provides the public discovery data that marketers and researchers need. Scraping is often the only practical option.

What Data Can You Extract?

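TikTok profiles and videos carry rich metadata. Most of it is visible on public pages without logging in, although TikTok may put some views (comments and deep scrolling in particular) behind a login prompt: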

Data Point	Available	Source
Username & Display Name	✅	Profile page
Bio & External Links	✅	Profile page
Follower / Following / Likes Count	✅	Profile page
Verified Badge	✅	Profile page
Profile Picture (HD)	✅	Profile page
Video List (recent 30+)	✅	Profile page + scroll
Video Caption & Hashtags	✅	Video page
Video Likes / Comments / Shares / Views	✅	Video page
Video Duration	✅	Video page
Sound / Music Info	✅	Video page
Hashtag View Count	✅	Hashtag page
Hashtag Top Videos	✅	Hashtag page
Comments (text, author, likes)	✅	Video page
Trending Videos	✅	Discover / For You
Sound/Music Usage Count	✅	Sound page
TikTok Shop Products	✅	Shop pages

Method 1: Python + Requests (TikTok Web API)

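TikTok's web app loads its data as JSON, both from internal API calls and from a JSON blob embedded in the server-rendered HTML. Reading that JSON directly is faster than full browser rendering, but the internal endpoints require specific headers and signatures that TikTok rotates frequently, so the examples in this section parse the embedded blob instead.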

Install Dependencies

pip install requests

Public Profile Scraper

# tiktok_scraper.py
import requests
import json
import re

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/125.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.tiktok.com/",
}

def scrape_tiktok_profile(username: str) -> dict:
    """Scrape public TikTok profile data via server-rendered HTML."""
    url = f"https://www.tiktok.com/@{username}"

    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code == 404:
        return {"error": f"User '@{username}' not found"}
    if resp.status_code != 200:
        return {"error": f"Request failed: {resp.status_code}"}

    # TikTok embeds JSON data in a script tag
    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text
    )

    if not match:
        # Try SIGI_STATE fallback
        match = re.search(
            r'<script id="SIGI_STATE"[^>]*>(.+?)</script>',
            resp.text
        )

    if not match:
        return {"error": "Could not find embedded data — TikTok may have changed the page structure"}

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}

    # Navigate the nested data structure
    user_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
    )
    user_info = user_module.get("userInfo", {})
    user = user_info.get("user", {})
    stats = user_info.get("stats", {})

    if not user:
        return {"error": "User data not found in response"}

    # Extract recent videos if available
    item_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
        .get("itemList", [])
    )

    videos = []
    for item in item_module[:12]:
        videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", ""),
            "url": f"https://www.tiktok.com/@{username}/video/{item.get('id')}",
            "likes": item.get("stats", {}).get("diggCount", 0),
            "comments": item.get("stats", {}).get("commentCount", 0),
            "shares": item.get("stats", {}).get("shareCount", 0),
            "views": item.get("stats", {}).get("playCount", 0),
            "duration": item.get("video", {}).get("duration", 0),
            "music": item.get("music", {}).get("title", ""),
            "music_author": item.get("music", {}).get("authorName", ""),
            "create_time": item.get("createTime"),
        })

    return {
        "username": user.get("uniqueId"),
        "display_name": user.get("nickname"),
        "bio": user.get("signature"),
        "verified": user.get("verified", False),
        "followers": stats.get("followerCount", 0),
        "following": stats.get("followingCount", 0),
        "total_likes": stats.get("heartCount", 0),
        "total_videos": stats.get("videoCount", 0),
        "profile_pic": user.get("avatarLarger"),
        "region": user.get("region"),
        "recent_videos": videos,
    }

# Example usage
profile = scrape_tiktok_profile("tiktok")
print(json.dumps(profile, indent=2))
print(f"\nFollowers: {profile.get('followers', 0):,}")
print(f"Total Likes: {profile.get('total_likes', 0):,}")

⚠️ Important: Data Extraction Fragility

TikTok's embedded JSON structure (the __UNIVERSAL_DATA_FOR_REHYDRATION__ script) changes without notice. Previously it was SIGI_STATE, and before that __NEXT_DATA__. Always have fallback parsing logic and test regularly. A managed API abstracts this instability away.
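The fallback logic recommended here could look like the following sketch. It tries each script ID named in this section in order and returns the first blob that parses as JSON. The function name `extract_embedded_json` and the sample HTML are illustrative assumptions; the layout inside each blob differs between generations and is not modelled here.

```python
import json
import re
from typing import Any, Optional, Tuple

# Script IDs mentioned in this article, newest first.
CANDIDATE_SCRIPT_IDS = (
    "__UNIVERSAL_DATA_FOR_REHYDRATION__",
    "SIGI_STATE",
    "__NEXT_DATA__",
)


def extract_embedded_json(html: str) -> Optional[Tuple[str, Any]]:
    """Return (script_id, parsed_json) for the first candidate that parses."""
    for script_id in CANDIDATE_SCRIPT_IDS:
        pattern = (
            r'<script[^>]*\bid="' + re.escape(script_id) + r'"[^>]*>(.*?)</script>'
        )
        # DOTALL lets the blob span multiple lines.
        match = re.search(pattern, html, flags=re.DOTALL)
        if not match:
            continue
        try:
            return script_id, json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # Malformed blob: fall through to the next candidate.
    return None


# Hypothetical page that only carries the older SIGI_STATE blob.
sample_html = (
    '<html><script id="SIGI_STATE" type="application/json">'
    '{"ok": true}</script></html>'
)
found = extract_embedded_json(sample_html)
print(found)
```

Returning the script ID alongside the data lets the caller pick the right parser for that generation, and log which one matched so a structure change is noticed early.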

Hashtag Explorer

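# tiktok_hashtag.py
# Reuses json, re, requests, and HEADERS from tiktok_scraper.py above;
# append this to that file or copy the imports and HEADERS here.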
def scrape_tiktok_hashtag(tag: str) -> dict:
    """Scrape TikTok hashtag page for view count and top videos."""
    url = f"https://www.tiktok.com/tag/{tag}"

    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code != 200:
        return {"error": f"Failed to fetch #{tag}: {resp.status_code}"}

    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text
    )

    if not match:
        return {"error": "Could not parse hashtag page"}

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}
    challenge_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.challenge-detail", {})
    )
    challenge_info = challenge_module.get("challengeInfo", {})
    challenge = challenge_info.get("challenge", {})
    stats = challenge_info.get("stats", {})

    # Get top videos
    item_list = challenge_module.get("itemList", [])
    top_videos = []
    for item in item_list[:9]:
        top_videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", "")[:200],
            "author": item.get("author", {}).get("uniqueId", ""),
            "views": item.get("stats", {}).get("playCount", 0),
            "likes": item.get("stats", {}).get("diggCount", 0),
            "comments": item.get("stats", {}).get("commentCount", 0),
            "shares": item.get("stats", {}).get("shareCount", 0),
        })

    return {
        "hashtag": tag,
        "title": challenge.get("title", tag),
        "view_count": stats.get("viewCount", 0),
        "video_count": stats.get("videoCount", 0),
        "top_videos": top_videos,
    }

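result = scrape_tiktok_hashtag("webdevelopment")
if "error" in result:
    # Error dicts have no "hashtag" key, so check before formatting
    print(result["error"])
else:
    print(f"#{result['hashtag']}: {result['view_count']:,} views")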
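To compare the videos in `top_videos`, one option is to rank them by engagement rate. The sketch below assumes a common definition, (likes + comments + shares) / views, which is an assumption rather than a TikTok-defined metric. The helper names and the sample numbers are made up for illustration.

```python
from typing import Dict, List


def engagement_rate(video: Dict) -> float:
    """(likes + comments + shares) / views, or 0.0 when views are missing."""
    views = video.get("views", 0)
    if not views:
        return 0.0
    interactions = (
        video.get("likes", 0) + video.get("comments", 0) + video.get("shares", 0)
    )
    return interactions / views


def rank_by_engagement(videos: List[Dict]) -> List[Dict]:
    """Return videos sorted from highest to lowest engagement rate."""
    return sorted(videos, key=engagement_rate, reverse=True)


# Made-up records shaped like the top_videos entries.
sample = [
    {"id": "a", "views": 1000, "likes": 50, "comments": 5, "shares": 5},
    {"id": "b", "views": 200, "likes": 40, "comments": 0, "shares": 0},
    {"id": "c", "views": 0, "likes": 3, "comments": 0, "shares": 0},
]
ranked = rank_by_engagement(sample)
print([v["id"] for v in ranked])
```

Raw view counts favour large accounts; a ratio like this makes smaller videos with unusually active audiences stand out.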

Method 2: Playwright (Headless Browser)

TikTok is a JavaScript-heavy single-page app with aggressive bot detection. Playwright renders the full page, handles dynamic content loading, and can scroll through video feeds to collect large datasets.

Install

pip install playwright playwright-stealth
playwright install chromium

Full-Render TikTok Scraper

# playwright_tiktok.py
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async
import json
import random

async def scrape_tiktok_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )

        page = await context.new_page()
        await stealth_async(page)

        # Intercept API responses for richer data
        api_data = {}
        async def handle_response(response):
            url = response.url
            if "/api/user/detail" in url or "userInfo" in url:
                try:
                    data = await response.json()
                    api_data["user"] = data
                except Exception:
                    pass
            elif "/api/post/item_list" in url or "itemList" in url:
                try:
                    data = await response.json()
                    api_data["items"] = data
                except Exception:
                    pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="networkidle")

        # Dismiss cookie banner if present
        try:
            cookie_btn = page.locator(
                'button:has-text("Accept all")'
            )
            await cookie_btn.click(timeout=3000)
        except Exception:
            pass

        await page.wait_for_timeout(3000)

        # Extract data from the page DOM
        profile = await page.evaluate("""() => {
            // Try to get data from the embedded JSON
            const script = document.querySelector(
                '#__UNIVERSAL_DATA_FOR_REHYDRATION__'
            );
            if (script) {
                try {
                    const data = JSON.parse(script.textContent);
                    const userDetail = data?.['__DEFAULT_SCOPE__']?.['webapp.user-detail'];
                    const user = userDetail?.userInfo?.user || {};
                    const stats = userDetail?.userInfo?.stats || {};
                    const items = userDetail?.itemList || [];

                    return {
                        username: user.uniqueId,
                        display_name: user.nickname,
                        bio: user.signature,
                        verified: user.verified || false,
                        followers: stats.followerCount || 0,
                        following: stats.followingCount || 0,
                        total_likes: stats.heartCount || 0,
                        total_videos: stats.videoCount || 0,
                        profile_pic: user.avatarLarger,
                        videos: items.slice(0, 12).map(v => ({
                            id: v.id,
                            desc: v.desc,
                            views: v.stats?.playCount || 0,
                            likes: v.stats?.diggCount || 0,
                            comments: v.stats?.commentCount || 0,
                            shares: v.stats?.shareCount || 0,
                            duration: v.video?.duration || 0,
                            music: v.music?.title || '',
                        })),
                        source: 'embedded_json',
                    };
                } catch (e) {}
            }

            // Fallback: parse from visible DOM elements
            const getName = () => {
                const h1 = document.querySelector('h1[data-e2e="user-title"]');
                return h1 ? h1.textContent.trim() : null;
            };

            const getSubtitle = () => {
                const h2 = document.querySelector('h2[data-e2e="user-subtitle"]');
                return h2 ? h2.textContent.trim() : null;
            };

            const getCount = (selector) => {
                const el = document.querySelector(selector);
                if (!el) return 0;
                const text = el.textContent.replace(/,/g, '');
                const match = text.match(/([\d.]+)\s*(K|M|B)?/i);
                if (!match) return 0;
                let num = parseFloat(match[1]);
                const suffix = (match[2] || '').toUpperCase();
                if (suffix === 'K') num *= 1000;
                if (suffix === 'M') num *= 1000000;
                if (suffix === 'B') num *= 1000000000;
                return Math.round(num);
            };

            return {
                username: getSubtitle(),
                display_name: getName(),
                bio: document.querySelector(
                    'h2[data-e2e="user-bio"]'
                )?.textContent?.trim() || '',
                followers: getCount(
                    '[data-e2e="followers-count"]'
                ),
                following: getCount(
                    '[data-e2e="following-count"]'
                ),
                total_likes: getCount(
                    '[data-e2e="likes-count"]'
                ),
                videos: [],
                source: 'dom_fallback',
            };
        }""")

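        await browser.close()
        # Keep the intercepted API payloads instead of discarding them
        profile["intercepted_api"] = api_data
        return profile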

# Run it
data = asyncio.run(scrape_tiktok_profile("tiktok"))
print(json.dumps(data, indent=2))

Scroll & Collect Videos

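# tiktok_scroll.py
import random
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async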
async def scrape_user_videos(
    username: str, max_videos: int = 50
) -> list:
    """Scroll through a TikTok profile and collect video data."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await stealth_async(page)

        await page.goto(
            f"https://www.tiktok.com/@{username}",
            wait_until="networkidle",
        )

        # Dismiss cookie banner
        try:
            await page.click(
                'button:has-text("Accept all")', timeout=3000
            )
        except Exception:
            pass

        videos = []
        prev_count = 0
        scroll_attempts = 0

        while len(videos) < max_videos and scroll_attempts < 30:
            # Collect video links and metadata
            new_videos = await page.evaluate("""() => {
                const items = document.querySelectorAll(
                    '[data-e2e="user-post-item"]'
                );
                return [...items].map(item => {
                    const link = item.querySelector('a');
                    const views = item.querySelector(
                        '[data-e2e="video-views"]'
                    );
                    return {
                        url: link ? link.href : null,
                        views_text: views
                            ? views.textContent.trim() : '0',
                    };
                }).filter(v => v.url);
            }""")

            for v in new_videos:
                if v not in videos:
                    videos.append(v)

            if len(videos) == prev_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
            prev_count = len(videos)

            # Scroll down
            await page.evaluate(
                "window.scrollBy(0, window.innerHeight * 2)"
            )
            await page.wait_for_timeout(
                random.randint(1500, 3000)
            )

        await browser.close()
        return videos[:max_videos]
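The `views_text` values collected by the scroll loop are display strings such as `1.2M`. A small helper can turn them into integers. This sketch assumes the same K/M/B suffixes that the DOM fallback in the profile scraper handles, and uses `Decimal` to avoid float rounding. The name `parse_count` is illustrative.

```python
import re
from decimal import Decimal, InvalidOperation

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_COUNT_RE = re.compile(r"^\s*([\d.]+)\s*([KMB]?)\s*$", re.IGNORECASE)


def parse_count(text: str) -> int:
    """Convert '1.2M', '15.3K' or '1,234' into an integer; 0 if unparseable."""
    match = _COUNT_RE.match(text.replace(",", ""))
    if not match:
        return 0
    number, suffix = match.groups()
    try:
        value = Decimal(number)
    except InvalidOperation:  # e.g. '1.2.3' is not a valid number
        return 0
    return int(value * _MULTIPLIERS[suffix.upper()])


print([parse_count(t) for t in ["1.2M", "15.3K", "987"]])
```

Abbreviated counts are rounded by TikTok before display, so the result is an approximation of the true figure.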

💡 Pro Tip: Intercepting API Calls

TikTok's web app makes internal API calls as you scroll. Intercept these responses with page.on("response") to capture structured JSON data — far cleaner than parsing DOM elements. Look for URLs containing /api/post/item_list/ or /api/comment/list/.
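One way to organise that interception is to keep the URL matching in a small pure function, so it can be unit-tested without a browser. The sketch below uses only the two paths named in the tip. The `classify_tiktok_api_url` name and the bucket labels are assumptions, and the commented handler shows where it would plug into Playwright.

```python
from typing import Optional
from urllib.parse import urlparse

# Path prefixes taken from the tip; the labels are illustrative.
API_PATHS = {
    "/api/post/item_list": "videos",
    "/api/comment/list": "comments",
}


def classify_tiktok_api_url(url: str) -> Optional[str]:
    """Return a bucket label if the URL is one of the API paths of interest."""
    path = urlparse(url).path
    for prefix, label in API_PATHS.items():
        if path.startswith(prefix):
            return label
    return None


# Inside a Playwright script the function could drive the response handler:
#
#   captured = {"videos": [], "comments": []}
#
#   async def handle_response(response):
#       label = classify_tiktok_api_url(response.url)
#       if label:
#           captured[label].append(await response.json())
#
#   page.on("response", handle_response)

print(classify_tiktok_api_url("https://www.tiktok.com/api/post/item_list/?count=35"))
print(classify_tiktok_api_url("https://www.tiktok.com/@tiktok"))
```

Matching on the parsed path rather than the whole URL avoids false positives when one of these strings happens to appear in a query parameter.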

Method 3: Node.js + Puppeteer

Puppeteer with the stealth plugin provides excellent TikTok compatibility. The key advantage: intercepting network requests to capture TikTok's internal API responses as structured JSON.

Install

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Profile Scraper with API Interception

// tiktok-scraper.mjs
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

async function scrapeTikTokProfile(username) {
  const browser = await puppeteer.launch({
    headless: true, // current Puppeteer releases no longer use the "new" value
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--disable-blink-features=AutomationControlled",
    ],
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Intercept TikTok's internal API responses
  let userData = null;
  let videoData = [];

  page.on("response", async (resp) => {
    const url = resp.url();
    if (url.includes("/api/user/detail") ||
        url.includes("user-detail")) {
      try {
        const json = await resp.json();
        userData = json;
      } catch (e) {}
    }
    if (url.includes("/api/post/item_list") ||
        url.includes("item_list")) {
      try {
        const json = await resp.json();
        if (json.itemList) {
          videoData.push(...json.itemList);
        }
      } catch (e) {}
    }
  });

  await page.goto(`https://www.tiktok.com/@${username}`, {
    waitUntil: "networkidle0",
    timeout: 30000,
  });

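  // Dismiss cookie consent. Puppeteer has no :has-text() selector, so use
  // its text selector (::-p-text) instead.
  try {
    const acceptBtn = await page.waitForSelector(
      "::-p-text(Accept all)",
      { timeout: 3000 }
    );
    await acceptBtn?.click();
  } catch (e) {}

  // page.waitForTimeout() was removed in recent Puppeteer releases
  await new Promise((r) => setTimeout(r, 4000));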

  // If API interception didn't work, parse from page
  if (!userData) {
    userData = await page.evaluate(() => {
      const script = document.querySelector(
        "#__UNIVERSAL_DATA_FOR_REHYDRATION__"
      );
      if (!script) return null;
      try {
        const data = JSON.parse(script.textContent);
        return data?.["__DEFAULT_SCOPE__"]?.[
          "webapp.user-detail"
        ];
      } catch (e) {
        return null;
      }
    });
  }

  let result;
  if (userData?.userInfo) {
    const user = userData.userInfo.user || {};
    const stats = userData.userInfo.stats || {};
    const items = userData.itemList || videoData;

    result = {
      username: user.uniqueId || username,
      display_name: user.nickname || null,
      bio: user.signature || null,
      verified: user.verified || false,
      followers: stats.followerCount || 0,
      following: stats.followingCount || 0,
      total_likes: stats.heartCount || 0,
      total_videos: stats.videoCount || 0,
      profile_pic: user.avatarLarger || null,
      region: user.region || null,
      videos: (items || []).slice(0, 12).map((v) => ({
        id: v.id,
        desc: v.desc || "",
        views: v.stats?.playCount || 0,
        likes: v.stats?.diggCount || 0,
        comments: v.stats?.commentCount || 0,
        shares: v.stats?.shareCount || 0,
        duration: v.video?.duration || 0,
        music: v.music?.title || "",
        music_author: v.music?.authorName || "",
      })),
    };
  } else {
    // Fallback: parse from DOM
    result = await page.evaluate(() => ({
      username: document.querySelector(
        'h2[data-e2e="user-subtitle"]'
      )?.textContent?.trim(),
      display_name: document.querySelector(
        'h1[data-e2e="user-title"]'
      )?.textContent?.trim(),
      bio: document.querySelector(
        'h2[data-e2e="user-bio"]'
      )?.textContent?.trim(),
      followers: document.querySelector(
        '[data-e2e="followers-count"]'
      )?.textContent?.trim(),
      following: document.querySelector(
        '[data-e2e="following-count"]'
      )?.textContent?.trim(),
      total_likes: document.querySelector(
        '[data-e2e="likes-count"]'
      )?.textContent?.trim(),
    }));
  }

  await browser.close();
  return result;
}

// Batch scrape with rate limiting
async function scrapeMultiple(usernames, delayMs = 8000) {
  const results = [];
  for (const username of usernames) {
    try {
      const profile = await scrapeTikTokProfile(username);
      results.push(profile);
      console.log(
        `✓ @${profile.username} — ` +
        `${(profile.followers || 0).toLocaleString()} followers`
      );
    } catch (err) {
      console.error(`✗ @${username}: ${err.message}`);
      results.push({ username, error: err.message });
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return results;
}

// Usage
const profiles = await scrapeMultiple([
  "tiktok", "charlidamelio", "khaby.lame"
]);
console.log(JSON.stringify(profiles, null, 2));

Method 4: Web Scraping API (Easiest)

The most reliable approach for production TikTok scraping. A web scraping API handles TikTok's aggressive anti-bot detection, TLS fingerprinting, CAPTCHA solving, and proxy rotation — you send a URL, get structured data back.

Using the Mantis API

# One API call — structured TikTok data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.tiktok.com/@charlidamelio",
        "extract": {
            "username": "TikTok username",
            "display_name": "display name",
            "bio": "profile bio text",
            "verified": "whether account is verified",
            "followers": "follower count as integer",
            "following": "following count as integer",
            "total_likes": "total likes received as integer",
            "total_videos": "total video count",
            "recent_videos": (
                "array of recent videos with: "
                "caption, views, likes, comments, shares, "
                "duration_seconds, music_title"
            ),
        },
        "render_js": True,
    },
)

profile = resp.json()
print(f"@{profile.get('username')} — "
      f"{profile.get('followers', 0):,} followers")

Skip the Bot Detection & CAPTCHAs

Mantis handles TikTok's TLS fingerprinting, puzzle CAPTCHAs, proxy rotation, and JavaScript rendering — so you don't have to.


Node.js with Mantis

// mantis-tiktok.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.tiktok.com/@tiktok",
    extract: {
      username: "TikTok handle",
      followers: "follower count as number",
      total_likes: "total likes as number",
      recent_videos:
        "array of last 12 videos with caption, " +
        "views, likes, comments, shares, music",
    },
    render_js: true,
  }),
});

const profile = await resp.json();
console.log(profile);

cURL Example

curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.tiktok.com/tag/ai",
    "extract": {
      "hashtag": "hashtag name",
      "view_count": "total views as integer",
      "top_videos": "array of top 9 videos with: author, caption, views, likes"
    },
    "render_js": true
  }'

Beating TikTok's Anti-Bot Detection

TikTok has some of the most sophisticated anti-scraping defenses of any social platform — arguably more aggressive than Instagram. Here's what you're up against:

TikTok's Defense Layers

Defense	What It Does	Countermeasure
TLS Fingerprinting	Detects non-browser TLS handshakes (JA3/JA4)	Use real browser (Playwright/Puppeteer) or TLS-spoofing libraries
CAPTCHA (Puzzle Slider)	Slide-to-verify challenges on suspicious requests	CAPTCHA solving services, or use a managed API
Device ID Tracking	Assigns persistent device fingerprints across sessions	Fresh browser profiles, randomized fingerprints
IP Rate Limiting	Blocks IPs after rapid successive requests	Rotating residential proxies, long random delays
Browser Fingerprinting	Detects WebDriver, headless indicators, automation	Stealth plugins (playwright-stealth, puppeteer-extra)
Behavioral Analysis	Detects non-human browsing patterns	Random delays, mouse movement simulation, scroll patterns
Geo-Restrictions	Shows different content based on IP location	Geo-targeted proxies for specific markets
API Signature (X-Bogus)	Signs API requests with encrypted parameters	Reverse-engineer signing algorithm (constantly changes)
msToken Cookie	Session validation token required for API calls	Generate via real browser session, rotate frequently

Essential Anti-Detection Techniques

# tiktok_stealth.py
import random
import time

import requests

PROXY_POOL = [
    # Residential proxies are essential for TikTok
    # Datacenter IPs are blocked almost immediately
    "http://user:pass@res-proxy1.example.com:8080",
    "http://user:pass@res-proxy2.example.com:8080",
]

def tiktok_delay():
    """TikTok needs moderate delays with occasional pauses."""
    base = random.uniform(3, 10)
    # 15% chance of a longer pause (human-like browsing)
    if random.random() < 0.15:
        base += random.uniform(15, 45)
    time.sleep(base)

def is_blocked(response) -> bool:
    """Detect TikTok blocking patterns."""
    if response.status_code == 429:
        return True
    if response.status_code == 403:
        return True
    if "captcha" in response.text.lower():
        return True
    if "verify" in response.url and "tiktok.com" in response.url:
        return True
    # A near-empty body usually means a block page, not real content
    if len(response.text) < 500:
        return True
    return False

def get_tiktok_session():
    """Create a session with TikTok-appropriate headers."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice([
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.5 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36",
        ]),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,*/*",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
    })
    return session
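When `is_blocked` fires, retrying immediately tends to make things worse. A common pattern is exponential backoff with jitter. The sketch below is one way to do it, with the fetch and block-check functions injected so it does not depend on any particular HTTP library. The function name, the default delays, and the fake fetcher are assumptions for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    blocked: Callable[[T], bool],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until blocked(result) is False or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if not blocked(result):
            return result
        if attempt < max_attempts - 1:
            # Double the ceiling each attempt, then pick a random point below
            # it ("full jitter") so parallel workers do not retry in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return None


# Demo with a fake fetcher that is "blocked" twice, then succeeds.
responses = iter(["captcha", "captcha", "ok"])
waits = []
result = fetch_with_backoff(
    fetch=lambda: next(responses),
    blocked=lambda r: r == "captcha",
    sleep=waits.append,  # record the delays instead of actually sleeping
)
print(result, len(waits))
```

With the helpers from `tiktok_stealth.py`, a call might look like `fetch_with_backoff(lambda: get_tiktok_session().get(url, timeout=15), is_blocked)`, which also picks a fresh proxy on every attempt.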

⚠️ Critical: TLS Fingerprinting

TikTok actively checks TLS fingerprints (JA3/JA4 hashes). Python's requests library produces a handshake that does not match any mainstream browser, so TikTok can block it regardless of the headers you send. For direct HTTP requests, use a library such as curl_cffi or tls-client that mimics a browser's TLS handshake. Full browser automation (Playwright/Puppeteer) sidesteps the issue because a real browser performs the handshake.
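As a rough illustration of the curl_cffi route, the sketch below wraps its requests-style `get` call with the `impersonate` option, which asks the library to present a browser-like TLS handshake. It assumes `pip install curl_cffi`. The `chrome110` target is one of the library's impersonation profiles and a newer one may work better against TikTok; the helper names are made up for this example. The import is deferred so the module loads even where curl_cffi is not installed.

```python
from typing import Any, Dict


def build_request_kwargs(
    impersonate: str = "chrome110", timeout: int = 15
) -> Dict[str, Any]:
    """Keyword arguments passed to curl_cffi's requests.get()."""
    return {
        "impersonate": impersonate,  # browser TLS profile to mimic
        "timeout": timeout,
        "headers": {
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": "https://www.tiktok.com/",
        },
    }


def fetch_profile_html(username: str) -> str:
    """Fetch a profile page with a browser-like TLS fingerprint."""
    # Imported lazily: curl_cffi is a third-party dependency.
    from curl_cffi import requests as cffi_requests

    resp = cffi_requests.get(
        f"https://www.tiktok.com/@{username}", **build_request_kwargs()
    )
    resp.raise_for_status()
    return resp.text


# Example call (performs a real network request):
#   html = fetch_profile_html("tiktok")
```

A matching TLS fingerprint only removes one detection signal; proxies, delays, and block detection still apply.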

TikTok Official API vs Scraping

TikTok offers several official APIs, but they're heavily restricted compared to what's available through scraping:

Feature	Research API	Login Kit / Display API	Marketing API	Web Scraping	Mantis API
Access	Academics only (approved)	User-authorized content	Ad accounts only	Any public content	Any public content
Requires Approval	Yes (academic affiliation)	Yes (TikTok review)	Yes (ad account)	No	No
Rate Limits	1,000 requests/day	100 requests/day	Varies by spend	Depends on proxies	Based on plan
Public Profile Data	✅ (limited fields)	❌ Own account only	❌	✅ Full data	✅ Full data
Video Search	✅ (keyword/hashtag)	❌	❌	✅	✅
Trending Content	❌	❌	❌	✅	✅
Comments	✅ (limited)	❌	❌	✅	✅
Sound/Music Data	❌	❌	❌	✅	✅
Competitor Analysis	✅ (research only)	❌	❌	✅	✅
Commercial Use	❌ (research only)	Limited	Ads only	Gray area	✅
Cost	Free (if approved)	Free	Tied to ad spend	$100-500+/mo (proxies)	$0-299/mo

💡 The TikTok API Gap

TikTok's API ecosystem is among the most restrictive of any major social platform. The Research API is academic-only with a multi-month approval process. The Display API only shows your own content. There is no commercial API for public data access, which leaves scraping as essentially the only option for businesses doing market research, influencer marketing, or competitive intelligence on TikTok.

Method Comparison

Criteria	Python + Requests	Playwright	Node.js + Puppeteer	Mantis API
Setup Time	5 min	10 min	10 min	2 min
JS Rendering	❌ (parses embedded JSON)	✅	✅	✅
Anti-Detection	Low (TLS fingerprinted)	Good (with stealth)	Good (with stealth)	Built-in
Speed	Fast (when it works)	Slow	Slow	Medium
Maintenance	Very High (JSON structure changes)	High	High	None
CAPTCHA Handling	❌	Manual / external solver	Manual / external solver	Built-in
Scale	Low	Low-Medium	Low-Medium	High
Cost (5K profiles/mo)	$100-300 (proxies + TLS lib)	$200-500 (proxies + compute)	$200-500 (proxies + compute)	$99 (Pro plan)
Best For	Quick prototypes	Rich data + scrolling	Backend services	Production
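To put the cost row on a per-profile footing, the arithmetic can be checked in a few lines. The dollar figures are the ones from the table; the dictionary layout and function name are illustrative.

```python
PROFILES_PER_MONTH = 5_000

# (low, high) monthly cost in USD, copied from the comparison table.
MONTHLY_COST = {
    "Python + Requests": (100, 300),
    "Playwright": (200, 500),
    "Node.js + Puppeteer": (200, 500),
    "Mantis API": (99, 99),
}


def cost_per_profile(monthly_usd: float, profiles: int = PROFILES_PER_MONTH) -> float:
    """Monthly spend divided by profiles scraped, in USD."""
    return monthly_usd / profiles


for method, (low, high) in MONTHLY_COST.items():
    low_cost, high_cost = cost_per_profile(low), cost_per_profile(high)
    print(f"{method}: ${low_cost:.4f} to ${high_cost:.4f} per profile")
```

These figures cover only the listed line items; engineering time spent on maintenance is not included.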

Real-World Use Cases

1. Trend Discovery Engine for AI Agents

Give an AI agent the ability to discover what's trending on TikTok — a critical capability for marketing assistants, content strategists, and social media managers.

# agent_tiktok_trends.py — LangChain tool
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def discover_tiktok_trends(topic: str) -> str:
    """Discover trending TikTok content for a topic.
    Returns top videos, hashtag stats, and trending sounds."""

    # Scrape the hashtag page for the topic
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{topic}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count for this hashtag",
                "video_count": "number of videos with this hashtag",
                "top_videos": (
                    "array of top 6 videos: author username, "
                    "caption (first 100 chars), views, likes, "
                    "comments, shares, music_title"
                ),
            },
            "render_js": True,
        },
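        timeout=60,  # avoid hanging forever on a stalled request
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()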

    result = f"TikTok Trends: #{topic}\n"
    result += f"Total Views: {data.get('total_views', 'N/A')}\n"
    result += f"Videos: {data.get('video_count', 'N/A')}\n\n"

    top = data.get("top_videos", [])
    if top:
        result += "Top Performing Videos:\n"
        for i, v in enumerate(top[:5], 1):
            result += (
                f"  {i}. @{v.get('author', '?')} — "
                f"{(v.get('caption') or '[no caption]')[:80]}\n"
                f"     👁 {v.get('views', 0):,} views | "
                f"❤️ {v.get('likes', 0):,} likes | "
                f"💬 {v.get('comments', 0):,}\n"
                f"     🎵 {v.get('music_title', 'Original Sound')}\n"
            )

    return result

# Use in a LangChain agent:
# agent = create_agent(tools=[discover_tiktok_trends], ...)
# agent.run("What's trending on TikTok for AI?")
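
The tool above formats views and likes with `:,`, which only works when the extracted values are integers. If the extraction returns display strings such as "1.2M" instead (an assumption about the response, not documented behavior), they need to be converted first. A minimal sketch of such a converter follows; `parse_count` is a hypothetical helper name.

```python
import re

# Multipliers for the abbreviated suffixes TikTok shows in its UI.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(value) -> int:
    """Convert 1234, "1,234", "12.5K" or "1.2M" to an int; return 0 if unparseable."""
    if value is None or isinstance(value, bool):
        return 0
    if isinstance(value, (int, float)):
        return int(value)
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KMB])?\s*", str(value), re.IGNORECASE)
    if not match:
        return 0
    number, suffix = match.groups()
    try:
        base = float(number.replace(",", ""))
    except ValueError:
        return 0  # e.g. "1.2.3"
    return round(base * _SUFFIXES.get((suffix or "").upper(), 1))
```

In the formatter, `f"{parse_count(v.get('views')):,}"` would then work for both integer and abbreviated values.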

2. Influencer Performance Analyzer

Evaluate TikTok creators before sponsorship deals — engagement rate, content consistency, audience quality, and estimated partnership value.

# tiktok_influencer.py
import requests

MANTIS_KEY = "YOUR_API_KEY"

def analyze_tiktok_creator(username: str) -> dict:
    """Calculate performance metrics for a TikTok creator."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/@{username}",
            "extract": {
                "username": "TikTok username",
                "display_name": "display name",
                "verified": "verified status boolean",
                "followers": "follower count as integer",
                "following": "following count as integer",
                "total_likes": "total likes as integer",
                "total_videos": "total video count as integer",
                "recent_videos": (
                    "array of last 12 videos: "
                    "views (integer), likes (integer), "
                    "comments (integer), shares (integer), "
                    "duration_seconds (integer), "
                    "caption (string), music_title (string)"
                ),
            },
            "render_js": True,
        },
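        timeout=60,  # avoid hanging forever on a stalled request
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()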

    videos = data.get("recent_videos", [])
    followers = data.get("followers") or 0  # missing or None -> 0, so the guard below skips the math

    if videos and followers > 0:
        # TikTok engagement = (likes + comments + shares) / views
        total_views = sum(v.get("views", 0) for v in videos)
        total_engagement = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in videos
        )

        # View-based engagement rate (TikTok standard)
        view_eng_rate = (
            (total_engagement / total_views * 100)
            if total_views > 0 else 0
        )

        # Follower-based engagement rate (cross-platform comparison)
        follower_eng_rate = (
            total_engagement / len(videos) / followers * 100
        )

        avg_views = total_views / len(videos)
        avg_likes = sum(v.get("likes", 0) for v in videos) / len(videos)
        avg_comments = sum(v.get("comments", 0) for v in videos) / len(videos)
        avg_shares = sum(v.get("shares", 0) for v in videos) / len(videos)

        # View-to-follower ratio (viral potential)
        view_ratio = avg_views / followers if followers > 0 else 0

        # Average video duration
        avg_duration = sum(
            v.get("duration_seconds", 0) for v in videos
        ) / len(videos)
    else:
        view_eng_rate = follower_eng_rate = 0
        avg_views = avg_likes = avg_comments = avg_shares = 0
        view_ratio = avg_duration = 0

    # Performance tier (illustrative thresholds, not official TikTok benchmarks)
    if view_eng_rate > 10:
        tier = "Viral"
    elif view_eng_rate > 6:
        tier = "Excellent"
    elif view_eng_rate > 3:
        tier = "Good"
    elif view_eng_rate > 1:
        tier = "Average"
    else:
        tier = "Below Average"

    return {
        "username": data.get("username", username),
        "display_name": data.get("display_name"),
        "verified": data.get("verified", False),
        "followers": followers,
        "total_likes": data.get("total_likes", 0),
        "total_videos": data.get("total_videos", 0),
        "view_engagement_rate": round(view_eng_rate, 2),
        "follower_engagement_rate": round(follower_eng_rate, 2),
        "performance_tier": tier,
        "avg_views": round(avg_views),
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "avg_shares": round(avg_shares),
        "view_to_follower_ratio": round(view_ratio, 2),
        "avg_duration_seconds": round(avg_duration, 1),
        "estimated_cpm": "$5-15",
        "estimated_sponsored_post": (
            # avg views x CPM / 1000, matching the $5-15 CPM stated above
            f"${avg_views * 0.005:,.0f} - "
            f"${avg_views * 0.015:,.0f}"
        ),
    }

# Compare multiple creators
creators = ["charlidamelio", "khaby.lame", "bellapoarch"]
for creator in creators:
    result = analyze_tiktok_creator(creator)
    print(f"\n@{result['username']}:")
    print(f"  Followers: {result['followers']:,}")
    print(f"  Avg Views: {result['avg_views']:,}")
    print(f"  Engagement: {result['view_engagement_rate']}% "
          f"({result['performance_tier']})")
    print(f"  View/Follower: {result['view_to_follower_ratio']}x")
    print(f"  Est. Sponsored Post: {result['estimated_sponsored_post']}")
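
To sanity-check the two engagement formulas used in the analyzer, here is a worked example on made-up numbers. The function `engagement_rates` and the sample data are illustrative only; the formulas are the same ones the script applies.

```python
def engagement_rates(videos: list[dict], followers: int) -> dict:
    """Compute view-based and follower-based engagement rates in percent."""
    if not videos or followers <= 0:
        return {"view_rate": 0.0, "follower_rate": 0.0}
    total_views = sum(v["views"] for v in videos)
    total_engagement = sum(v["likes"] + v["comments"] + v["shares"] for v in videos)
    # View-based: all interactions divided by all views.
    view_rate = total_engagement / total_views * 100 if total_views else 0.0
    # Follower-based: average interactions per video divided by followers.
    follower_rate = total_engagement / len(videos) / followers * 100
    return {"view_rate": round(view_rate, 2), "follower_rate": round(follower_rate, 2)}


# Made-up sample: two videos from a creator with 50,000 followers.
sample = [
    {"views": 100_000, "likes": 8_000, "comments": 500, "shares": 500},
    {"views": 50_000, "likes": 2_500, "comments": 300, "shares": 200},
]
rates = engagement_rates(sample, followers=50_000)
print(rates)
```

Here 12,000 interactions over 150,000 views give a view-based rate of 8%, while 6,000 average interactions per video over 50,000 followers give a follower-based rate of 12%. Under the tiers in the script, 8% would land in "Excellent".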

3. Competitive Hashtag Tracker

Monitor hashtag performance over time to identify trends, measure campaign impact, and discover content gaps in your niche.

# tiktok_hashtag_tracker.py
import requests
import json
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

TRACKED_HASHTAGS = [
    "webscraping", "aiagents", "pythonprogramming",
    "automation", "apidevelopment", "datascience",
    "machinelearning", "devtools",
]

def track_hashtag(tag: str) -> dict:
    """Get current stats and top content for a hashtag."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{tag}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count as integer",
                "video_count": "number of videos as integer",
                "top_videos": (
                    "array of top 6 videos: "
                    "author, caption (first 100 chars), "
                    "views, likes, comments, shares, "
                    "music_title"
                ),
            },
            "render_js": True,
        },
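        timeout=60,  # avoid hanging forever on a stalled request
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()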

    # Calculate averages for top content
    top = data.get("top_videos", [])
    if top:
        avg_views = sum(v.get("views", 0) for v in top) / len(top)
        avg_eng = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in top
        ) / len(top)
        top_sounds = {}
        for v in top:
            sound = v.get("music_title", "Original")
            top_sounds[sound] = top_sounds.get(sound, 0) + 1
    else:
        avg_views = avg_eng = 0
        top_sounds = {}

    return {
        "hashtag": tag,
        "total_views": data.get("total_views", 0),
        "video_count": data.get("video_count", 0),
        "top_avg_views": round(avg_views),
        "top_avg_engagement": round(avg_eng),
        "trending_sounds": dict(
            sorted(top_sounds.items(), key=lambda x: -x[1])
        ),
        "top_creators": [
            v.get("author") for v in top[:3]
        ],
        "scraped_at": datetime.utcnow().isoformat(),
    }

# Run tracking for all hashtags
report = {
    "date": datetime.utcnow().strftime("%Y-%m-%d"),
    "hashtags": [],
}

for tag in TRACKED_HASHTAGS:
    result = track_hashtag(tag)
    report["hashtags"].append(result)
    print(f"#{result['hashtag']}: "
          f"{result.get('total_views', 0):,} views, "
          f"{result.get('video_count', 0):,} videos")

# Save daily snapshot (reuse the report date so the filename and payload agree)
date = report["date"]
with open(f"tiktok_hashtags_{date}.json", "w") as f:
    json.dump(report, f, indent=2)

# Identify opportunities
print("\n📊 Hashtag Opportunities (high views, low competition):")
sorted_tags = sorted(
    report["hashtags"],
    key=lambda x: (
        x.get("total_views", 0) /
        max(x.get("video_count", 1), 1)
    ),
    reverse=True,
)
for tag in sorted_tags[:3]:
    views_per_video = (
        tag.get("total_views", 0) /
        max(tag.get("video_count", 1), 1)
    )
    print(f"  #{tag['hashtag']}: "
          f"{views_per_video:,.0f} views/video avg")
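
The tracker saves one snapshot per day but does not compare them. One way to monitor performance over time is to diff two snapshot files, sketched below. It assumes the JSON layout written by the script above (`hashtags` list with `hashtag` and `total_views` fields); `load_snapshot` and `view_growth` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_snapshot(path: str) -> dict:
    """Load a daily snapshot file and index its entries by hashtag name."""
    report = json.loads(Path(path).read_text())
    return {h["hashtag"]: h for h in report["hashtags"]}


def view_growth(previous: dict, current: dict) -> dict:
    """Return the percent change in total_views per hashtag between two snapshots.

    Hashtags that are new or had zero views previously are skipped,
    because a percent change is undefined for them.
    """
    growth = {}
    for tag, now in current.items():
        before = previous.get(tag, {}).get("total_views") or 0
        after = now.get("total_views") or 0
        if before > 0:
            growth[tag] = round((after - before) / before * 100, 1)
    return growth
```

A typical call would be `view_growth(load_snapshot(old_path), load_snapshot(new_path))`, where both paths follow the `tiktok_hashtags_{date}.json` naming used by the script.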

Legal Considerations

TikTok scraping carries unique legal considerations beyond typical social media platforms, due to TikTok's regulatory environment:

TikTok's Terms of Service — Explicitly prohibit automated data collection, scraping, and crawling. Like Meta, TikTok can pursue breach of contract claims against scrapers
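hiQ Labs v. LinkedIn (2022) — The Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA. This precedent generally supports scraping public TikTok content, though the district court later found that hiQ had breached LinkedIn's user agreement, so contract claims remain a separate risk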
Van Buren v. United States (2021) — Supreme Court narrowed CFAA's "exceeds authorized access" provision — favorable for scraping public data, but doesn't address ToS/contract claims
COPPA (Children's Online Privacy Protection Act) — TikTok has a significant underage user base and paid a $5.7M FTC settlement in 2019 over COPPA violations. Scraping data from minors carries additional legal risk
National Security Concerns — TikTok faces ongoing scrutiny from US, EU, and other governments over data practices. Regulatory changes could affect data access and scraping legality
GDPR (EU) — TikTok profiles contain personal data. Scraping EU user data for commercial purposes without a legal basis likely violates GDPR
CCPA (California) — Similar obligations for California residents' personal data
Video Copyright — TikTok videos are copyrighted by creators. Scraping metadata is different from downloading and republishing video content
Music & Sound Rights — TikTok videos often use licensed music. Downloading audio content raises additional copyright concerns

⚖️ Best Practices for Legal Safety

Only scrape publicly visible metadata (don't download videos). Don't scrape data identifiable as belonging to minors. Respect private accounts. Don't store personal data of EU residents without a legal basis. Use rate limiting to avoid service disruption. Never create fake accounts. Consult a lawyer for commercial use cases — especially given TikTok's evolving regulatory landscape.
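
The rate-limiting recommendation can be sketched as a small throttle that enforces a minimum gap between requests. This is one possible approach, not a prescribed one; the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable only to make the logic easy to test.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.min_interval - (now - self._last))
        if pause:
            self._sleep(pause)
        self._last = now + pause
        return pause
```

Calling `throttle.wait()` before each request keeps the request rate bounded regardless of how fast the surrounding loop runs.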

Production-Ready TikTok Scraping

Stop fighting TLS fingerprinting, puzzle CAPTCHAs, and API signature algorithms. Mantis extracts structured TikTok data with a single API call.

Frequently Asked Questions

Is it legal to scrape TikTok data?

Scraping publicly available TikTok data is in a legal gray area. The hiQ v. LinkedIn ruling supports scraping public data, but TikTok's Terms of Service prohibit it. TikTok also faces unique regulatory scrutiny (COPPA, national security). For commercial use, consult legal counsel and consider a managed API approach.

How do I scrape TikTok without getting blocked?

TikTok uses TLS fingerprinting, puzzle CAPTCHAs, device ID tracking, and behavioral analysis. Use headless browsers with stealth plugins (not raw HTTP requests — TLS fingerprinting catches those), rotating residential proxies, and random delays of 3-10 seconds. Or use a web scraping API like Mantis that handles anti-blocking automatically.
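
The proxy rotation and random delays described above can be planned with the standard library alone. The sketch below only decides which proxy and delay to use for each URL; the proxy URLs are placeholders, `request_plan` is a hypothetical helper, and the actual fetching (browser or HTTP client) is left out. It does not address TLS fingerprinting or CAPTCHAs.

```python
import itertools
import random

# Placeholder proxy URLs; substitute the endpoints from your provider.
PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
]


def request_plan(urls, proxies=PROXIES, min_delay=3.0, max_delay=10.0, rng=random):
    """Yield (url, proxy, delay_seconds) with round-robin proxies and random delays."""
    rotation = itertools.cycle(proxies)
    for url in urls:
        yield url, next(rotation), rng.uniform(min_delay, max_delay)
```

In a scraper loop, each iteration would sleep for `delay_seconds` and then issue the request through the paired proxy.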

Can I use TikTok's official API instead of scraping?

TikTok's Research API requires academic affiliation and multi-month approval. The Login Kit / Display API only shows your own content. The Marketing API is restricted to ad accounts. None provide commercial access to public profile/video data at scale. Scraping or a web scraping API is the practical option.

What data can I extract from TikTok?

From public profiles: username, display name, bio, follower/following/likes counts, verified status, and video list. From videos: caption, views, likes, comments, shares, duration, and music/sound info. From hashtags: view count and top videos. From trending: discover page videos and popular sounds.

What Python library is best for scraping TikTok?

Playwright with stealth plugins is the most reliable since TikTok requires JavaScript rendering and uses TLS fingerprinting that blocks standard HTTP libraries. For API-level requests, curl_cffi can spoof browser TLS fingerprints. For production, use a managed API like Mantis.

How do I scrape TikTok trending videos?

TikTok's trending/discover page requires JavaScript rendering. Use Playwright or Puppeteer to load the page, scroll to load content, and extract video metadata. TikTok's internal API endpoints (like /api/recommend/item_list/) return structured data but require valid signatures. Mantis can extract trending data with a single API call.

Related Guides