🔱 Mantis

How to Scrape TikTok Data in 2026: Videos, Profiles & Hashtags

Extract videos, profiles, hashtags, trending content, and sounds using Python, Node.js, and API-based approaches, with production-ready code.


Why Scrape TikTok Data?

TikTok has over 2 billion monthly active users and is the fastest-growing social media platform in history. It's not just a video app anymore: it's a search engine, a commerce platform, and the cultural epicenter for Gen Z and Millennials. Businesses, researchers, and developers scrape TikTok for trend discovery, influencer analysis, market research, and competitive intelligence.

TikTok's official APIs are extremely restricted: the Research API requires academic affiliation, the Display API only shows your own content, and neither provides the public discovery data that marketers and researchers need. Scraping is often the only practical option.

What Data Can You Extract?

TikTok profiles and videos contain rich metadata, all publicly accessible without authentication:

| Data Point | Available | Source |
|---|---|---|
| Username & Display Name | ✅ | Profile page |
| Bio & External Links | ✅ | Profile page |
| Follower / Following / Likes Count | ✅ | Profile page |
| Verified Badge | ✅ | Profile page |
| Profile Picture (HD) | ✅ | Profile page |
| Video List (recent 30+) | ✅ | Profile page + scroll |
| Video Caption & Hashtags | ✅ | Video page |
| Video Likes / Comments / Shares / Views | ✅ | Video page |
| Video Duration | ✅ | Video page |
| Sound / Music Info | ✅ | Video page |
| Hashtag View Count | ✅ | Hashtag page |
| Hashtag Top Videos | ✅ | Hashtag page |
| Comments (text, author, likes) | ✅ | Video page |
| Trending Videos | ✅ | Discover / For You |
| Sound/Music Usage Count | ✅ | Sound page |
| TikTok Shop Products | ✅ | Shop pages |

Method 1: Python + Requests (TikTok Web API)

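TikTok's web app ships its initial page state as JSON embedded in the server-rendered HTML, then calls undocumented internal API endpoints that also return JSON. Fetching and parsing the HTML directly is much faster than full browser rendering, but the internal API endpoints require signed parameters (such as X-Bogus and msToken) that TikTok rotates frequently.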

Install Dependencies

pip install requests

Public Profile Scraper

# tiktok_scraper.py
import requests
import json
import re

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/125.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.tiktok.com/",
}

def scrape_tiktok_profile(username: str) -> dict:
    """Scrape public TikTok profile data via server-rendered HTML."""
    url = f"https://www.tiktok.com/@{username}"

    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code == 404:
        return {"error": f"User '@{username}' not found"}
    if resp.status_code != 200:
        return {"error": f"Request failed: {resp.status_code}"}

    # TikTok embeds JSON page state in a script tag (DOTALL in case it spans lines)
    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text,
        flags=re.DOTALL,
    )

    if not match:
        # Older SIGI_STATE fallback. Note: SIGI_STATE nests data differently,
        # so the __DEFAULT_SCOPE__ lookup below needs its own branch for it.
        match = re.search(
            r'<script id="SIGI_STATE"[^>]*>(.+?)</script>',
            resp.text,
            flags=re.DOTALL,
        )

    if not match:
        return {"error": "Could not find embedded data; TikTok may have changed the page structure"}

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}

    # Navigate the nested data structure
    user_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
    )
    user_info = user_module.get("userInfo", {})
    user = user_info.get("user", {})
    stats = user_info.get("stats", {})

    if not user:
        return {"error": "User data not found in response"}

    # Extract recent videos if present (this list may be empty when TikTok
    # loads videos later through its internal item_list API instead)
    item_module = user_module.get("itemList", [])

    videos = []
    for item in item_module[:12]:
        videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", ""),
            "url": f"https://www.tiktok.com/@{username}/video/{item.get('id')}",
            "likes": item.get("stats", {}).get("diggCount", 0),
            "comments": item.get("stats", {}).get("commentCount", 0),
            "shares": item.get("stats", {}).get("shareCount", 0),
            "views": item.get("stats", {}).get("playCount", 0),
            "duration": item.get("video", {}).get("duration", 0),
            "music": item.get("music", {}).get("title", ""),
            "music_author": item.get("music", {}).get("authorName", ""),
            "create_time": item.get("createTime"),
        })

    return {
        "username": user.get("uniqueId"),
        "display_name": user.get("nickname"),
        "bio": user.get("signature"),
        "verified": user.get("verified", False),
        "followers": stats.get("followerCount", 0),
        "following": stats.get("followingCount", 0),
        "total_likes": stats.get("heartCount", 0),
        "total_videos": stats.get("videoCount", 0),
        "profile_pic": user.get("avatarLarger"),
        "region": user.get("region"),
        "recent_videos": videos,
    }

# Example usage (guarded so the module can be imported without side effects)
if __name__ == "__main__":
    profile = scrape_tiktok_profile("tiktok")
    print(json.dumps(profile, indent=2))
    print(f"\nFollowers: {profile.get('followers', 0):,}")
    print(f"Total Likes: {profile.get('total_likes', 0):,}")
โš ๏ธ Important: Data Extraction Fragility

TikTok's embedded JSON structure (the __UNIVERSAL_DATA_FOR_REHYDRATION__ script) changes without notice. Previously it was SIGI_STATE, and before that __NEXT_DATA__. Always have fallback parsing logic and test regularly. A managed API abstracts this instability away.
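The note above recommends fallback parsing logic. One way to do that, sketched here under the assumption that each format lives in a `<script id="...">` tag whose body is plain JSON, is to try each known script ID in turn and keep the first payload that parses. Each format nests its data differently, so callers still need format-specific navigation after this step; `extract_embedded_state` is a hypothetical helper name, not part of any library.

```python
import json
import re
from typing import Optional

# Script-tag IDs TikTok has used for embedded page state, newest first.
# This list comes from the note above and will need updating as TikTok changes.
SCRIPT_IDS = (
    "__UNIVERSAL_DATA_FOR_REHYDRATION__",
    "SIGI_STATE",
    "__NEXT_DATA__",
)


def extract_embedded_state(html: str) -> Optional[tuple[str, dict]]:
    """Return (script_id, parsed_json) for the first known script tag that parses."""
    for script_id in SCRIPT_IDS:
        match = re.search(
            rf'<script id="{script_id}"[^>]*>(.+?)</script>',
            html,
            flags=re.DOTALL,
        )
        if not match:
            continue
        try:
            return script_id, json.loads(match.group(1))
        except json.JSONDecodeError:
            # Malformed payload: fall through to the next known format.
            continue
    return None
```

Returning the matched `script_id` alongside the data lets the caller pick the right navigation path (`__DEFAULT_SCOPE__` for the current format, something else for older ones) and log which format a page actually served.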

Hashtag Explorer

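# tiktok_hashtag.py
import json
import re

import requests

from tiktok_scraper import HEADERS  # reuse the browser-like headers from tiktok_scraper.py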
def scrape_tiktok_hashtag(tag: str) -> dict:
    """Scrape TikTok hashtag page for view count and top videos."""
    url = f"https://www.tiktok.com/tag/{tag}"

    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code != 200:
        return {"error": f"Failed to fetch #{tag}: {resp.status_code}"}

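    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text,
        flags=re.DOTALL,
    )

    if not match:
        return {"error": "Could not parse hashtag page"}

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}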
    challenge_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.challenge-detail", {})
    )
    challenge_info = challenge_module.get("challengeInfo", {})
    challenge = challenge_info.get("challenge", {})
    stats = challenge_info.get("stats", {})

    # Get top videos
    item_list = challenge_module.get("itemList", [])
    top_videos = []
    for item in item_list[:9]:
        top_videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", "")[:200],
            "author": item.get("author", {}).get("uniqueId", ""),
            "views": item.get("stats", {}).get("playCount", 0),
            "likes": item.get("stats", {}).get("diggCount", 0),
            "comments": item.get("stats", {}).get("commentCount", 0),
            "shares": item.get("stats", {}).get("shareCount", 0),
        })

    return {
        "hashtag": tag,
        "title": challenge.get("title", tag),
        "view_count": stats.get("viewCount", 0),
        "video_count": stats.get("videoCount", 0),
        "top_videos": top_videos,
    }

result = scrape_tiktok_hashtag("webdevelopment")
if "error" in result:
    print(result["error"])
else:
    print(f"#{result['hashtag']}: {result['view_count']:,} views")

Method 2: Playwright (Headless Browser)

TikTok is a JavaScript-heavy single-page app with aggressive bot detection. Playwright renders the full page, handles dynamic content loading, and can scroll through video feeds to collect large datasets.

Install

pip install playwright "playwright-stealth<2"  # the examples use the 1.x stealth_async API
playwright install chromium

Full-Render TikTok Scraper

# playwright_tiktok.py
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async
import json
import random

async def scrape_tiktok_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )

        page = await context.new_page()
        await stealth_async(page)

        # Intercept API responses for richer data
        api_data = {}
        async def handle_response(response):
            url = response.url
            if "/api/user/detail" in url or "userInfo" in url:
                try:
                    data = await response.json()
                    api_data["user"] = data
                except Exception:
                    pass
            elif "/api/post/item_list" in url or "itemList" in url:
                try:
                    data = await response.json()
                    api_data["items"] = data
                except Exception:
                    pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="networkidle")

        # Dismiss cookie banner if present
        try:
            cookie_btn = page.locator(
                'button:has-text("Accept all")'
            )
            await cookie_btn.click(timeout=3000)
        except Exception:
            pass

        await page.wait_for_timeout(3000)

        # Extract data from the page DOM
        profile = await page.evaluate("""() => {
            // Try to get data from the embedded JSON
            const script = document.querySelector(
                '#__UNIVERSAL_DATA_FOR_REHYDRATION__'
            );
            if (script) {
                try {
                    const data = JSON.parse(script.textContent);
                    const userDetail = data?.['__DEFAULT_SCOPE__']?.['webapp.user-detail'];
                    const user = userDetail?.userInfo?.user || {};
                    const stats = userDetail?.userInfo?.stats || {};
                    const items = userDetail?.itemList || [];

                    return {
                        username: user.uniqueId,
                        display_name: user.nickname,
                        bio: user.signature,
                        verified: user.verified || false,
                        followers: stats.followerCount || 0,
                        following: stats.followingCount || 0,
                        total_likes: stats.heartCount || 0,
                        total_videos: stats.videoCount || 0,
                        profile_pic: user.avatarLarger,
                        videos: items.slice(0, 12).map(v => ({
                            id: v.id,
                            desc: v.desc,
                            views: v.stats?.playCount || 0,
                            likes: v.stats?.diggCount || 0,
                            comments: v.stats?.commentCount || 0,
                            shares: v.stats?.shareCount || 0,
                            duration: v.video?.duration || 0,
                            music: v.music?.title || '',
                        })),
                        source: 'embedded_json',
                    };
                } catch (e) {}
            }

            // Fallback: parse from visible DOM elements
            const getName = () => {
                const h1 = document.querySelector('h1[data-e2e="user-title"]');
                return h1 ? h1.textContent.trim() : null;
            };

            const getSubtitle = () => {
                const h2 = document.querySelector('h2[data-e2e="user-subtitle"]');
                return h2 ? h2.textContent.trim() : null;
            };

            const getCount = (selector) => {
                const el = document.querySelector(selector);
                if (!el) return 0;
                const text = el.textContent.replace(/,/g, '');
                const match = text.match(/([\d.]+)\s*(K|M|B)?/i);
                if (!match) return 0;
                let num = parseFloat(match[1]);
                const suffix = (match[2] || '').toUpperCase();
                if (suffix === 'K') num *= 1000;
                if (suffix === 'M') num *= 1000000;
                if (suffix === 'B') num *= 1000000000;
                return Math.round(num);
            };

            return {
                username: getSubtitle(),
                display_name: getName(),
                bio: document.querySelector(
                    'h2[data-e2e="user-bio"]'
                )?.textContent?.trim() || '',
                followers: getCount(
                    '[data-e2e="followers-count"]'
                ),
                following: getCount(
                    '[data-e2e="following-count"]'
                ),
                total_likes: getCount(
                    '[data-e2e="likes-count"]'
                ),
                videos: [],
                source: 'dom_fallback',
            };
        }""")

        await browser.close()
        return profile

# Run it
data = asyncio.run(scrape_tiktok_profile("tiktok"))
print(json.dumps(data, indent=2))

Scroll & Collect Videos

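# tiktok_scroll.py
import asyncio
import random

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async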
async def scrape_user_videos(
    username: str, max_videos: int = 50
) -> list:
    """Scroll through a TikTok profile and collect video data."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await stealth_async(page)

        await page.goto(
            f"https://www.tiktok.com/@{username}",
            wait_until="networkidle",
        )

        # Dismiss cookie banner
        try:
            await page.click(
                'button:has-text("Accept all")', timeout=3000
            )
        except Exception:
            pass

        videos = []
        seen_urls = set()  # dedupe by URL; view text can change between passes
        prev_count = 0
        scroll_attempts = 0

        while len(videos) < max_videos and scroll_attempts < 30:
            # Collect video links and metadata
            new_videos = await page.evaluate("""() => {
                const items = document.querySelectorAll(
                    '[data-e2e="user-post-item"]'
                );
                return [...items].map(item => {
                    const link = item.querySelector('a');
                    const views = item.querySelector(
                        '[data-e2e="video-views"]'
                    );
                    return {
                        url: link ? link.href : null,
                        views_text: views
                            ? views.textContent.trim() : '0',
                    };
                }).filter(v => v.url);
            }""")

            for v in new_videos:
                if v["url"] not in seen_urls:
                    seen_urls.add(v["url"])
                    videos.append(v)

            if len(videos) == prev_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
            prev_count = len(videos)

            # Scroll down
            await page.evaluate(
                "window.scrollBy(0, window.innerHeight * 2)"
            )
            await page.wait_for_timeout(
                random.randint(1500, 3000)
            )

        await browser.close()
        return videos[:max_videos]
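The scroll collector returns `views_text` as TikTok displays it ("1.2M", "845K"), not as a number. A small Python counterpart to the K/M/B parsing in the Playwright DOM fallback converts those strings. This is a sketch that assumes only K, M, and B suffixes appear; localized abbreviations would need extra entries, and `parse_count` is a hypothetical helper name.

```python
import re

# Multipliers for the abbreviated counts shown in the UI ("1.2M", "845K").
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number with an optional decimal part, then an optional K/M/B suffix.
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([KMB]?)", re.IGNORECASE)


def parse_count(text: str) -> int:
    """Convert a display string such as '1.2M' or '3,456' to an integer."""
    match = _COUNT_RE.search(text.replace(",", ""))
    if not match:
        return 0
    number = float(match.group(1))
    # round() absorbs float error such as 1.2 * 1_000_000 == 1199999.9999999998
    return round(number * _SUFFIXES[match.group(2).upper()])
```

After scrolling, something like `for v in videos: v["views"] = parse_count(v["views_text"])` turns the collected strings into sortable integers.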
💡 Pro Tip: Intercepting API Calls

TikTok's web app makes internal API calls as you scroll. Intercept these responses with page.on("response") to capture structured JSON data, which is far cleaner than parsing DOM elements. Look for URLs containing /api/post/item_list/ or /api/comment/list/.
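Each intercepted `/api/post/item_list/` response carries one page of videos, and consecutive pages can overlap. The helper below (a hypothetical name) flattens the captured payloads into unique videos. It assumes each payload is a dict with an `itemList` array of items carrying an `id`, the same shape the Puppeteer handler in Method 3 reads via `json.itemList`.

```python
from typing import Iterable


def merge_item_pages(pages: Iterable[dict]) -> list[dict]:
    """Flatten intercepted item_list payloads into unique videos, in first-seen order."""
    seen_ids: set[str] = set()
    videos: list[dict] = []
    for page in pages:
        # Assumed payload shape: {"itemList": [{"id": ..., ...}, ...], ...}
        for item in page.get("itemList") or []:
            video_id = item.get("id")
            if video_id is None or video_id in seen_ids:
                continue  # skip malformed entries and items repeated across pages
            seen_ids.add(video_id)
            videos.append(item)
    return videos
```

In the Playwright handler, that means appending each `await response.json()` to a list instead of overwriting `api_data["items"]`, then calling `merge_item_pages` on the list once scrolling ends.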

Method 3: Node.js + Puppeteer

Puppeteer with the stealth plugin (puppeteer-extra-plugin-stealth) is a solid Node.js option for TikTok. The key advantage: intercepting network requests to capture TikTok's internal API responses as structured JSON.

Install

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Profile Scraper with API Interception

// tiktok-scraper.mjs
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

async function scrapeTikTokProfile(username) {
  const browser = await puppeteer.launch({
    headless: true, // in Puppeteer v22+, `true` selects the new headless mode
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--disable-blink-features=AutomationControlled",
    ],
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Intercept TikTok's internal API responses
  let userData = null;
  let videoData = [];

  page.on("response", async (resp) => {
    const url = resp.url();
    if (url.includes("/api/user/detail") ||
        url.includes("user-detail")) {
      try {
        const json = await resp.json();
        userData = json;
      } catch (e) {}
    }
    if (url.includes("/api/post/item_list") ||
        url.includes("item_list")) {
      try {
        const json = await resp.json();
        if (json.itemList) {
          videoData.push(...json.itemList);
        }
      } catch (e) {}
    }
  });

  await page.goto(`https://www.tiktok.com/@${username}`, {
    waitUntil: "networkidle0",
    timeout: 30000,
  });

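  // Dismiss cookie consent if shown. ":has-text" is Playwright-only syntax,
  // so find the button by its text inside the page instead.
  await page.evaluate(() => {
    const btn = [...document.querySelectorAll("button")].find(
      (b) => b.textContent.trim() === "Accept all"
    );
    btn?.click();
  });

  // page.waitForTimeout() was removed in Puppeteer v22; use a plain timer
  await new Promise((r) => setTimeout(r, 4000));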

  // If API interception didn't work, parse from page
  if (!userData) {
    userData = await page.evaluate(() => {
      const script = document.querySelector(
        "#__UNIVERSAL_DATA_FOR_REHYDRATION__"
      );
      if (!script) return null;
      try {
        const data = JSON.parse(script.textContent);
        return data?.["__DEFAULT_SCOPE__"]?.[
          "webapp.user-detail"
        ];
      } catch (e) {
        return null;
      }
    });
  }

  let result;
  if (userData?.userInfo) {
    const user = userData.userInfo.user || {};
    const stats = userData.userInfo.stats || {};
    const items = userData.itemList || videoData;

    result = {
      username: user.uniqueId || username,
      display_name: user.nickname || null,
      bio: user.signature || null,
      verified: user.verified || false,
      followers: stats.followerCount || 0,
      following: stats.followingCount || 0,
      total_likes: stats.heartCount || 0,
      total_videos: stats.videoCount || 0,
      profile_pic: user.avatarLarger || null,
      region: user.region || null,
      videos: (items || []).slice(0, 12).map((v) => ({
        id: v.id,
        desc: v.desc || "",
        views: v.stats?.playCount || 0,
        likes: v.stats?.diggCount || 0,
        comments: v.stats?.commentCount || 0,
        shares: v.stats?.shareCount || 0,
        duration: v.video?.duration || 0,
        music: v.music?.title || "",
        music_author: v.music?.authorName || "",
      })),
    };
  } else {
    // Fallback: parse from DOM
    result = await page.evaluate(() => ({
      username: document.querySelector(
        'h2[data-e2e="user-subtitle"]'
      )?.textContent?.trim(),
      display_name: document.querySelector(
        'h1[data-e2e="user-title"]'
      )?.textContent?.trim(),
      bio: document.querySelector(
        'h2[data-e2e="user-bio"]'
      )?.textContent?.trim(),
      followers: document.querySelector(
        '[data-e2e="followers-count"]'
      )?.textContent?.trim(),
      following: document.querySelector(
        '[data-e2e="following-count"]'
      )?.textContent?.trim(),
      total_likes: document.querySelector(
        '[data-e2e="likes-count"]'
      )?.textContent?.trim(),
    }));
  }

  await browser.close();
  return result;
}

// Batch scrape with rate limiting
async function scrapeMultiple(usernames, delayMs = 8000) {
  const results = [];
  for (const username of usernames) {
    try {
      const profile = await scrapeTikTokProfile(username);
      results.push(profile);
      console.log(
        `✓ @${profile.username}: ` +
        `${(profile.followers || 0).toLocaleString()} followers`
      );
    } catch (err) {
      console.error(`✗ @${username}: ${err.message}`);
      results.push({ username, error: err.message });
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return results;
}

// Usage
const profiles = await scrapeMultiple([
  "tiktok", "charlidamelio", "khaby.lame"
]);
console.log(JSON.stringify(profiles, null, 2));

Method 4: Web Scraping API (Easiest)

This is the most reliable approach for production TikTok scraping. A web scraping API handles TikTok's aggressive anti-bot detection, TLS fingerprinting, CAPTCHA solving, and proxy rotation: you send a URL and get structured data back.

Using the Mantis API

# One API call โ€” structured TikTok data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.tiktok.com/@charlidamelio",
        "extract": {
            "username": "TikTok username",
            "display_name": "display name",
            "bio": "profile bio text",
            "verified": "whether account is verified",
            "followers": "follower count as integer",
            "following": "following count as integer",
            "total_likes": "total likes received as integer",
            "total_videos": "total video count",
            "recent_videos": (
                "array of recent videos with: "
                "caption, views, likes, comments, shares, "
                "duration_seconds, music_title"
            ),
        },
        "render_js": True,
    },
    timeout=90,
)
resp.raise_for_status()

profile = resp.json()
print(f"@{profile.get('username')}: "
      f"{profile.get('followers', 0):,} followers")

Skip the Bot Detection & CAPTCHAs

Mantis handles TikTok's TLS fingerprinting, puzzle CAPTCHAs, proxy rotation, and JavaScript rendering, so you don't have to.


Node.js with Mantis

// mantis-tiktok.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.tiktok.com/@tiktok",
    extract: {
      username: "TikTok handle",
      followers: "follower count as number",
      total_likes: "total likes as number",
      recent_videos:
        "array of last 12 videos with caption, " +
        "views, likes, comments, shares, music",
    },
    render_js: true,
  }),
});

if (!resp.ok) {
  throw new Error(`Mantis API error: ${resp.status} ${await resp.text()}`);
}
const profile = await resp.json();
console.log(profile);

cURL Example

curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.tiktok.com/tag/ai",
    "extract": {
      "hashtag": "hashtag name",
      "view_count": "total views as integer",
      "top_videos": "array of top 9 videos with: author, caption, views, likes"
    },
    "render_js": true
  }'

Beating TikTok's Anti-Bot Detection

TikTok has some of the most sophisticated anti-scraping defenses of any social platform, arguably more aggressive than Instagram's. Here's what you're up against:

TikTok's Defense Layers

| Defense | What It Does | Countermeasure |
|---|---|---|
| TLS Fingerprinting | Detects non-browser TLS handshakes (JA3/JA4) | Use real browser (Playwright/Puppeteer) or TLS-spoofing libraries |
| CAPTCHA (Puzzle Slider) | Slide-to-verify challenges on suspicious requests | CAPTCHA solving services, or use a managed API |
| Device ID Tracking | Assigns persistent device fingerprints across sessions | Fresh browser profiles, randomized fingerprints |
| IP Rate Limiting | Blocks IPs after rapid successive requests | Rotating residential proxies, long random delays |
| Browser Fingerprinting | Detects WebDriver, headless indicators, automation | Stealth plugins (playwright-stealth, puppeteer-extra) |
| Behavioral Analysis | Detects non-human browsing patterns | Random delays, mouse movement simulation, scroll patterns |
| Geo-Restrictions | Shows different content based on IP location | Geo-targeted proxies for specific markets |
| API Signature (X-Bogus) | Signs API requests with encrypted parameters | Reverse-engineer signing algorithm (constantly changes) |
| msToken Cookie | Session validation token required for API calls | Generate via real browser session, rotate frequently |
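The table's countermeasure for IP rate limiting is long random delays. A common complement, not something this guide's code does, is exponential backoff with "full jitter" after a 429 or CAPTCHA response: each retry waits a random time between zero and a ceiling that doubles per attempt. This is a minimal sketch; `backoff_delays` is a hypothetical helper, and the `base` and `cap` values are illustrative assumptions rather than known TikTok thresholds.

```python
import random
from typing import Optional


def backoff_delays(
    attempts: int,
    base: float = 5.0,
    cap: float = 300.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return full-jitter exponential backoff delays (seconds) for `attempts` retries."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # 5, 10, 20, 40, ... up to cap
        delays.append(rng.uniform(0, ceiling))  # random point in [0, ceiling]
    return delays
```

Randomizing the whole interval, rather than adding a small jitter to a fixed doubling, keeps parallel workers from retrying in lockstep after a shared block.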

Essential Anti-Detection Techniques

# tiktok_stealth.py
import random
import time

import requests
PROXY_POOL = [
    # Residential proxies are essential for TikTok
    # Datacenter IPs are blocked almost immediately
    "http://user:pass@res-proxy1.example.com:8080",
    "http://user:pass@res-proxy2.example.com:8080",
]

def tiktok_delay():
    """TikTok needs moderate delays with occasional pauses."""
    base = random.uniform(3, 10)
    # 15% chance of a longer pause (human-like browsing)
    if random.random() < 0.15:
        base += random.uniform(15, 45)
    time.sleep(base)

def is_blocked(response) -> bool:
    """Detect TikTok blocking patterns."""
    if response.status_code == 429:
        return True
    if response.status_code == 403:
        return True
    if "captcha" in response.text.lower():
        return True
    if "verify" in response.url and "tiktok.com" in response.url:
        return True
    # A near-empty body usually signals a silent block rather than real content
    if len(response.text) < 500:
        return True
    return False

def get_tiktok_session():
    """Create a session with TikTok-appropriate headers."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice([
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.5 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
        ]),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,*/*",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
    })
    return session
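The `is_blocked` and `get_tiktok_session` helpers above can be combined into a retry loop that moves to the next proxy whenever a block is detected. The sketch keeps the network call injectable: `fetch_with_rotation`, `fetch`, and `blocked` are hypothetical names, not part of any library, so the loop can be wired to `requests` or exercised with a stub.

```python
from typing import Callable, Optional, Sequence, TypeVar

R = TypeVar("R")


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], R],
    blocked: Callable[[R], bool],
) -> Optional[R]:
    """Try each proxy in order and return the first response that is not blocked."""
    for proxy in proxies:
        response = fetch(url, proxy)
        if not blocked(response):
            return response
    return None  # every proxy was blocked; the caller should back off
```

For example, a `fetch` that builds a session with `get_tiktok_session()`, overrides `session.proxies` with the given proxy, and calls `session.get(url, timeout=15)` pairs naturally with `blocked=is_blocked`. Wrap that `fetch` in `try`/`except requests.RequestException` if dead proxies should be skipped, since connection failures raise instead of returning a response.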
โš ๏ธ Critical: TLS Fingerprinting

TikTok is one of the few platforms that actively checks TLS fingerprints (JA3/JA4 hashes). Python's requests library has a distinctive TLS fingerprint that TikTok blocks. For direct HTTP requests, you need libraries like curl_cffi or tls-client that spoof browser TLS handshakes. Full browser automation (Playwright/Puppeteer) avoids this issue entirely.

TikTok Official API vs Scraping

TikTok offers several official APIs, but they're heavily restricted compared to what's available through scraping:

| Feature | Research API | Login Kit / Display API | Marketing API | Web Scraping | Mantis API |
|---|---|---|---|---|---|
| Access | Academics only (approved) | User-authorized content | Ad accounts only | Any public content | Any public content |
| Requires Approval | Yes (academic affiliation) | Yes (TikTok review) | Yes (ad account) | No | No |
| Rate Limits | 1,000 requests/day | 100 requests/day | Varies by spend | Depends on proxies | Based on plan |
| Public Profile Data | ✅ (limited fields) | ❌ Own account only | ❌ | ✅ Full data | ✅ Full data |
| Video Search | ✅ (keyword/hashtag) | ❌ | ❌ | ✅ | ✅ |
| Trending Content | ❌ | ❌ | ❌ | ✅ | ✅ |
| Comments | ✅ (limited) | ❌ | ❌ | ✅ | ✅ |
| Sound/Music Data | ❌ | ❌ | ❌ | ✅ | ✅ |
| Competitor Analysis | ✅ (research only) | ❌ | ❌ | ✅ | ✅ |
| Commercial Use | ❌ (research only) | Limited | Ads only | Gray area | ✅ |
| Cost | Free (if approved) | Free | Tied to ad spend | $100-500+/mo (proxies) | $0-299/mo |
💡 The TikTok API Gap

TikTok's API ecosystem is the most restrictive of any major social platform. The Research API is academic-only with a multi-month approval process. The Display API only shows your own content. There's no commercial API for public data access, which makes scraping essentially the only option for businesses doing market research, influencer marketing, or competitive intelligence on TikTok.

Method Comparison

| Criteria | Python + Requests | Playwright | Node.js + Puppeteer | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 10 min | 2 min |
| JS Rendering | ❌ (parses embedded JSON) | ✅ | ✅ | ✅ |
| Anti-Detection | Low (TLS fingerprinted) | Good (with stealth) | Good (with stealth) | Built-in |
| Speed | Fast (when it works) | Slow | Slow | Medium |
| Maintenance | Very High (JSON structure changes) | High | High | None |
| CAPTCHA Handling | ❌ | Manual / external solver | Manual / external solver | Built-in |
| Scale | Low | Low-Medium | Low-Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies + TLS lib) | $200-500 (proxies + compute) | $200-500 (proxies + compute) | $99 (Pro plan) |
| Best For | Quick prototypes | Rich data + scrolling | Backend services | Production |

Real-World Use Cases

1. Trend Discovery Engine for AI Agents

Give an AI agent the ability to discover what's trending on TikTok, a critical capability for marketing assistants, content strategists, and social media managers.

# agent_tiktok_trends.py: LangChain tool
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def discover_tiktok_trends(topic: str) -> str:
    """Discover trending TikTok content for a topic.
    Returns top videos, hashtag stats, and trending sounds."""

    # Scrape the hashtag page for the topic
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{topic}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count for this hashtag",
                "video_count": "number of videos with this hashtag",
                "top_videos": (
                    "array of top 6 videos: author username, "
                    "caption (first 100 chars), views, likes, "
                    "comments, shares, music_title"
                ),
            },
            "render_js": True,
        },
        timeout=90,
    )
    resp.raise_for_status()
    data = resp.json()

    result = f"TikTok Trends: #{topic}\n"
    result += f"Total Views: {data.get('total_views', 'N/A')}\n"
    result += f"Videos: {data.get('video_count', 'N/A')}\n\n"

    top = data.get("top_videos", [])
    if top:
        result += "Top Performing Videos:\n"
        for i, v in enumerate(top[:5], 1):
            result += (
                f"  {i}. @{v.get('author', '?')}: "
                f"{v.get('caption', '[no caption]')[:80]}\n"
                f"     👁 {v.get('views', 0):,} views | "
                f"❤️ {v.get('likes', 0):,} likes | "
                f"💬 {v.get('comments', 0):,}\n"
                f"     🎵 {v.get('music_title', 'Original Sound')}\n"
            )

    return result

# Use in a LangChain agent:
# agent = create_agent(tools=[discover_tiktok_trends], ...)
# agent.run("What's trending on TikTok for AI?")

2. Influencer Performance Analyzer

Evaluate TikTok creators before sponsorship deals: engagement rate, content consistency, audience quality, and estimated partnership value.
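The analyzer computes two engagement rates from the same scraped fields, and a tiny worked example makes the difference concrete. With two videos totaling 40,000 views and 4,000 interactions (likes + comments + shares), the view-based rate is 4,000 / 40,000 = 10%. For an account with 50,000 followers, the follower-based rate is (4,000 / 2 videos) / 50,000 = 4%. The standalone helper here reproduces those two formulas; `engagement_rates` is a hypothetical name.

```python
def engagement_rates(videos: list[dict], followers: int) -> dict:
    """View-based and follower-based engagement rates (percent) for a set of videos."""
    if not videos or followers <= 0:
        return {"view_rate": 0.0, "follower_rate": 0.0}
    interactions = sum(
        v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0) for v in videos
    )
    views = sum(v.get("views", 0) for v in videos)
    return {
        # (likes + comments + shares) / views
        "view_rate": interactions / views * 100 if views else 0.0,
        # average interactions per video / followers
        "follower_rate": interactions / len(videos) / followers * 100,
    }


videos = [
    {"views": 10_000, "likes": 800, "comments": 50, "shares": 150},
    {"views": 30_000, "likes": 2_500, "comments": 200, "shares": 300},
]
print(engagement_rates(videos, followers=50_000))
```

The view-based rate measures how compelling each video is to whoever saw it, which suits TikTok's algorithmic distribution; the follower-based rate is the one to use when comparing against Instagram or YouTube creators.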

# tiktok_influencer.py
import requests

MANTIS_KEY = "YOUR_API_KEY"

def analyze_tiktok_creator(username: str) -> dict:
    """Calculate performance metrics for a TikTok creator."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/@{username}",
            "extract": {
                "username": "TikTok username",
                "display_name": "display name",
                "verified": "verified status boolean",
                "followers": "follower count as integer",
                "following": "following count as integer",
                "total_likes": "total likes as integer",
                "total_videos": "total video count as integer",
                "recent_videos": (
                    "array of last 12 videos: "
                    "views (integer), likes (integer), "
                    "comments (integer), shares (integer), "
                    "duration_seconds (integer), "
                    "caption (string), music_title (string)"
                ),
            },
            "render_js": True,
        },
    )
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = resp.json()

    videos = data.get("recent_videos", [])
    followers = data.get("followers", 1)

    if videos and followers > 0:
        # TikTok engagement = (likes + comments + shares) / views
        total_views = sum(v.get("views", 0) for v in videos)
        total_engagement = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in videos
        )

        # View-based engagement rate (TikTok standard)
        view_eng_rate = (
            (total_engagement / total_views * 100)
            if total_views > 0 else 0
        )

        # Follower-based engagement rate (cross-platform comparison)
        follower_eng_rate = (
            total_engagement / len(videos) / followers * 100
        )

        avg_views = total_views / len(videos)
        avg_likes = sum(v.get("likes", 0) for v in videos) / len(videos)
        avg_comments = sum(v.get("comments", 0) for v in videos) / len(videos)
        avg_shares = sum(v.get("shares", 0) for v in videos) / len(videos)

        # View-to-follower ratio (viral potential)
        view_ratio = avg_views / followers if followers > 0 else 0

        # Average video duration
        avg_duration = sum(
            v.get("duration_seconds", 0) for v in videos
        ) / len(videos)
    else:
        view_eng_rate = follower_eng_rate = 0
        avg_views = avg_likes = avg_comments = avg_shares = 0
        view_ratio = avg_duration = 0

    # Performance tier (TikTok benchmarks)
    if view_eng_rate > 10:
        tier = "Viral"
    elif view_eng_rate > 6:
        tier = "Excellent"
    elif view_eng_rate > 3:
        tier = "Good"
    elif view_eng_rate > 1:
        tier = "Average"
    else:
        tier = "Below Average"

    return {
        "username": data.get("username", username),
        "display_name": data.get("display_name"),
        "verified": data.get("verified", False),
        "followers": followers,
        "total_likes": data.get("total_likes", 0),
        "total_videos": data.get("total_videos", 0),
        "view_engagement_rate": round(view_eng_rate, 2),
        "follower_engagement_rate": round(follower_eng_rate, 2),
        "performance_tier": tier,
        "avg_views": round(avg_views),
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "avg_shares": round(avg_shares),
        "view_to_follower_ratio": round(view_ratio, 2),
        "avg_duration_seconds": round(avg_duration, 1),
        "estimated_cpm": "$5-15",
        # Sponsored-post range derived from the CPM above: avg_views / 1000 * CPM
        "estimated_sponsored_post": (
            f"${avg_views / 1000 * 5:,.0f} - "
            f"${avg_views / 1000 * 15:,.0f}"
        ),
    }

# Compare multiple creators
creators = ["charlidamelio", "khaby.lame", "bellapoarch"]
for creator in creators:
    result = analyze_tiktok_creator(creator)
    print(f"\n@{result['username']}:")
    print(f"  Followers: {result['followers']:,}")
    print(f"  Avg Views: {result['avg_views']:,}")
    print(f"  Engagement: {result['view_engagement_rate']}% "
          f"({result['performance_tier']})")
    print(f"  View/Follower: {result['view_to_follower_ratio']}x")
    print(f"  Est. Sponsored Post: {result['estimated_sponsored_post']}")
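The engagement math in `analyze_tiktok_creator` is easier to verify when it is separated from the HTTP call. The sketch below reuses the same formulas (view-based rate, follower-based rate, view-to-follower ratio) on plain dictionaries, so it can be unit-tested with fixture data; `summarize_engagement` and `EngagementSummary` are hypothetical names, not part of the Mantis API.

```python
from dataclasses import dataclass


@dataclass
class EngagementSummary:
    view_engagement_rate: float      # (likes + comments + shares) / views, in %
    follower_engagement_rate: float  # avg engagement per video / followers, in %
    view_to_follower_ratio: float    # avg views per video / followers


def summarize_engagement(videos: list[dict], followers: int) -> EngagementSummary:
    """Apply the analyzer's formulas to already-scraped video dicts."""
    if not videos or followers <= 0:
        return EngagementSummary(0.0, 0.0, 0.0)

    n = len(videos)
    total_views = sum(v.get("views", 0) for v in videos)
    total_engagement = sum(
        v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
        for v in videos
    )

    view_rate = total_engagement / total_views * 100 if total_views > 0 else 0.0
    follower_rate = total_engagement / n / followers * 100
    ratio = total_views / n / followers

    return EngagementSummary(
        round(view_rate, 2), round(follower_rate, 2), round(ratio, 2)
    )
```

For instance, two videos with 10,000 and 30,000 views and 1,000 and 2,000 total interactions, on an account with 50,000 followers, give a 7.5% view-based rate, a 3.0% follower-based rate, and a 0.4x view-to-follower ratio.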

3. Competitive Hashtag Tracker

Monitor hashtag performance over time to identify trends, measure campaign impact, and discover content gaps in your niche.

# tiktok_hashtag_tracker.py
import requests
import json
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

TRACKED_HASHTAGS = [
    "webscraping", "aiagents", "pythonprogramming",
    "automation", "apidevelopment", "datascience",
    "machinelearning", "devtools",
]

def track_hashtag(tag: str) -> dict:
    """Get current stats and top content for a hashtag."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{tag}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count as integer",
                "video_count": "number of videos as integer",
                "top_videos": (
                    "array of top 6 videos: "
                    "author, caption (first 100 chars), "
                    "views, likes, comments, shares, "
                    "music_title"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()

    # Calculate averages for top content
    top = data.get("top_videos", [])
    if top:
        avg_views = sum(v.get("views", 0) for v in top) / len(top)
        avg_eng = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in top
        ) / len(top)
        top_sounds = {}
        for v in top:
            sound = v.get("music_title", "Original")
            top_sounds[sound] = top_sounds.get(sound, 0) + 1
    else:
        avg_views = avg_eng = 0
        top_sounds = {}

    return {
        "hashtag": tag,
        "total_views": data.get("total_views", 0),
        "video_count": data.get("video_count", 0),
        "top_avg_views": round(avg_views),
        "top_avg_engagement": round(avg_eng),
        "trending_sounds": dict(
            sorted(top_sounds.items(), key=lambda x: -x[1])
        ),
        "top_creators": [
            v.get("author") for v in top[:3]
        ],
        "scraped_at": datetime.utcnow().isoformat(),
    }

# Run tracking for all hashtags
report = {
    "date": datetime.utcnow().strftime("%Y-%m-%d"),
    "hashtags": [],
}

for tag in TRACKED_HASHTAGS:
    result = track_hashtag(tag)
    report["hashtags"].append(result)
    print(f"#{result['hashtag']}: "
          f"{result.get('total_views', 0):,} views, "
          f"{result.get('video_count', 0):,} videos")

# Save daily snapshot (reuse the report date so the filename matches its contents)
date = report["date"]
with open(f"tiktok_hashtags_{date}.json", "w") as f:
    json.dump(report, f, indent=2)

# Identify opportunities
def views_per_video(entry: dict) -> float:
    """Average views per video: a rough proxy for demand vs. competition."""
    return entry.get("total_views", 0) / max(entry.get("video_count", 1), 1)

print("\n📊 Hashtag Opportunities (high views, low competition):")
sorted_tags = sorted(report["hashtags"], key=views_per_video, reverse=True)
for tag in sorted_tags[:3]:
    print(f"  #{tag['hashtag']}: "
          f"{views_per_video(tag):,.0f} views/video avg")
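Each run writes a dated snapshot, so tracking performance over time comes down to diffing two of those files. The sketch below assumes the snapshot layout produced by the tracker (a `hashtags` list whose entries carry `hashtag`, `total_views`, and `video_count`); `load_snapshot` and `hashtag_growth` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_snapshot(path: str) -> dict[str, dict]:
    """Load one tiktok_hashtags_<date>.json report, keyed by hashtag."""
    report = json.loads(Path(path).read_text())
    return {entry["hashtag"]: entry for entry in report["hashtags"]}


def hashtag_growth(old: dict[str, dict], new: dict[str, dict]) -> list[dict]:
    """Rank hashtags present in both snapshots by absolute view growth."""
    rows = []
    for tag, current in new.items():
        previous = old.get(tag)
        if previous is None:
            continue  # not tracked in the older snapshot
        prev_views = previous.get("total_views") or 0
        view_delta = (current.get("total_views") or 0) - prev_views
        video_delta = (current.get("video_count") or 0) - (previous.get("video_count") or 0)
        rows.append({
            "hashtag": tag,
            "view_delta": view_delta,
            "video_delta": video_delta,
            # Percent growth is undefined when the older view count was 0
            "view_growth_pct": round(view_delta / prev_views * 100, 2) if prev_views else None,
        })
    return sorted(rows, key=lambda row: row["view_delta"], reverse=True)
```

For example, `hashtag_growth(load_snapshot("tiktok_hashtags_2026-01-01.json"), load_snapshot("tiktok_hashtags_2026-01-08.json"))` would rank tags by week-over-week view growth (the file names are illustrative). Tags missing from the older snapshot are skipped rather than treated as zero, so newly added tags do not automatically top the list.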

TikTok scraping carries legal considerations beyond those of typical social media platforms, largely because of TikTok's regulatory environment:

⚖️ Best Practices for Legal Safety

Only scrape publicly visible metadata (don't download videos). Don't scrape data identifiable as belonging to minors. Respect private accounts. Don't store personal data of EU residents without a legal basis. Use rate limiting to avoid service disruption. Never create fake accounts. Consult a lawyer for commercial use cases, especially given TikTok's evolving regulatory landscape.
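One way to reduce the personal data the hashtag tracker keeps (its `top_creators` field stores raw usernames) is to replace handles with keyed hashes before saving. The sketch below uses HMAC-SHA256 from Python's standard library; `pseudonymize` is a hypothetical helper, and the 16-character truncation is an arbitrary choice. Pseudonymized data still counts as personal data under the GDPR, so this lowers exposure but does not replace a legal basis for processing.

```python
import hashlib
import hmac


def pseudonymize(handle: str, secret_key: bytes) -> str:
    """Map a creator handle to a stable keyed token (HMAC-SHA256, truncated).

    The same handle and key always give the same token, so records can still
    be counted and joined; without the key, tokens cannot be matched to
    usernames by hashing a list of candidate handles.
    """
    # Treat "@Name" and "name" as the same creator
    normalized = handle.strip().lstrip("@").lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Keep the key outside the snapshot files (for example in an environment variable): anyone who holds it can re-link tokens to handles by hashing candidate usernames. Entries where `author` is missing should be skipped before calling the helper.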

Production-Ready TikTok Scraping

Stop fighting TLS fingerprinting, puzzle CAPTCHAs, and API signature algorithms. Mantis extracts structured TikTok data with a single API call.


Frequently Asked Questions

Is it legal to scrape TikTok data?

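Scraping publicly available TikTok data is in a legal gray area. In hiQ v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the U.S. Computer Fraud and Abuse Act, but hiQ was later found to have breached LinkedIn's User Agreement, and TikTok's Terms of Service likewise prohibit scraping. TikTok also faces unique regulatory scrutiny (COPPA, national security). For commercial use, consult legal counsel and consider a managed API approach.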

How do I scrape TikTok without getting blocked?

TikTok uses TLS fingerprinting, puzzle CAPTCHAs, device ID tracking, and behavioral analysis. Use headless browsers with stealth plugins (not raw HTTP requests, which TLS fingerprinting catches), rotating residential proxies, and random delays of 3-10 seconds. Or use a web scraping API like Mantis that handles anti-blocking automatically.
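The delay and proxy guidance above can be sketched as a request schedule that is independent of whichever client does the fetching. This is a minimal sketch under stated assumptions: `polite_schedule` is a hypothetical helper, proxies are rotated round-robin, delays are drawn uniformly from 3-10 seconds, and the proxy URLs are placeholders.

```python
import itertools
import random
from typing import Iterator, Optional


def polite_schedule(
    urls: list[str],
    proxies: list[str],
    min_delay: float = 3.0,
    max_delay: float = 10.0,
    rng: Optional[random.Random] = None,
) -> Iterator[tuple[str, str, float]]:
    """Yield (url, proxy, delay_seconds) for each URL in order.

    Proxies are assigned round-robin; each request gets a uniformly random
    delay between min_delay and max_delay seconds.
    """
    if not proxies:
        raise ValueError("polite_schedule needs at least one proxy")
    rng = rng or random.Random()
    proxy_cycle = itertools.cycle(proxies)
    for url in urls:
        yield url, next(proxy_cycle), rng.uniform(min_delay, max_delay)


# Hypothetical proxy endpoints; substitute your provider's URLs.
PROXIES = [
    "http://user:pass@proxy-1.example:8000",
    "http://user:pass@proxy-2.example:8000",
]
```

In a scraping loop you would call `time.sleep(delay)` before each request and route it through `proxy`, for example via the `proxy` option of Playwright's `browser.new_context()`. Passing a seeded `random.Random` makes the schedule reproducible in tests.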

Can I use TikTok's official API instead of scraping?

TikTok's Research API requires academic affiliation and multi-month approval. The Login Kit / Display API only shows your own content. The Marketing API is restricted to ad accounts. None provide commercial access to public profile/video data at scale. Scraping or a web scraping API is the practical option.

What data can I extract from TikTok?

From public profiles: username, display name, bio, follower/following/likes counts, verified status, and video list. From videos: caption, views, likes, comments, shares, duration, and music/sound info. From hashtags: view count and top videos. From trending: discover page videos and popular sounds.

What Python library is best for scraping TikTok?

Playwright with stealth plugins is the most reliable since TikTok requires JavaScript rendering and uses TLS fingerprinting that blocks standard HTTP libraries. For API-level requests, curl_cffi can spoof browser TLS fingerprints. For production, use a managed API like Mantis.

How do I scrape TikTok trending videos?

TikTok's trending/discover page requires JavaScript rendering. Use Playwright or Puppeteer to load the page, scroll to load content, and extract video metadata. TikTok's internal API endpoints (like /api/recommend/item_list/) return structured data but require valid signatures. Mantis can extract trending data with a single API call.
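When scrolling a discover or hashtag page, the same video cards are re-rendered across passes, so it helps to deduplicate as you go and stop once a pass yields nothing new. The sketch below assumes video links follow TikTok's `/@username/video/<id>` URL pattern and that you collect the `href` values yourself after each scroll (for example with Playwright's `page.eval_on_selector_all('a[href*="/video/"]', 'els => els.map(e => e.href)')`, where the selector is an assumption about the page markup). `merge_scroll_pass` is a hypothetical helper.

```python
import re
from typing import Iterable

# Matches video links such as https://www.tiktok.com/@user.name/video/7301234567890
VIDEO_URL_RE = re.compile(r"/@([A-Za-z0-9._]+)/video/(\d+)")


def merge_scroll_pass(hrefs: Iterable[str], seen: dict[str, str]) -> int:
    """Add videos from one scroll pass to `seen` (video_id -> username).

    Returns the number of videos that were new in this pass; 0 is a
    reasonable signal that scrolling has stopped loading content.
    """
    added = 0
    for href in hrefs:
        match = VIDEO_URL_RE.search(href or "")
        if match is None:
            continue  # not a video link (hashtag, sound, profile, ...)
        username, video_id = match.group(1), match.group(2)
        if video_id not in seen:
            seen[video_id] = username
            added += 1
    return added
```

Keying on the numeric video ID rather than the full URL means query strings such as `?lang=en` do not create duplicates.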

Related Guides