Table of Contents
- Why Scrape TikTok Data?
- What Data Can You Extract?
- Method 1: Python + Requests (TikTok Web API)
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Puppeteer
- Method 4: Web Scraping API (Easiest)
- Beating TikTok's Anti-Bot Detection
- TikTok Official API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape TikTok Data?
TikTok has more than 1 billion monthly active users and is one of the fastest-growing social media platforms in history. It's not just a video app anymore: it's a search engine, a commerce platform, and the cultural epicenter for Gen Z and Millennials. Businesses, researchers, and developers scrape TikTok for:
- Trend discovery: Identify viral trends, sounds, and formats before they peak, giving brands a first-mover advantage
- Influencer marketing: Evaluate creators by engagement rate, follower growth, content quality, and audience demographics before sponsorship deals
- Competitor monitoring: Track competitor content strategies, posting frequency, and engagement benchmarks
- Market research: Discover consumer preferences, product reviews, and emerging niches through TikTok's organic content
- Sound & music analytics: Track which sounds are trending and which creators are using them, which is critical for music marketing
- E-commerce intelligence: Monitor TikTok Shop products, pricing, reviews, and sales performance
- AI agent social intelligence: Give AI assistants the ability to research trends, creators, and cultural moments on TikTok
- Academic research: Study content virality, recommendation algorithms, and social media behavior at scale
TikTok's official APIs are extremely restricted: the Research API requires academic affiliation, the Display API only shows your own content, and neither provides the public discovery data that marketers and researchers need. Scraping is often the only practical option.
What Data Can You Extract?
TikTok profiles and videos contain rich metadata, all publicly accessible without authentication:
| Data Point | Available | Source |
|---|---|---|
| Username & Display Name | Yes | Profile page |
| Bio & External Links | Yes | Profile page |
| Follower / Following / Likes Count | Yes | Profile page |
| Verified Badge | Yes | Profile page |
| Profile Picture (HD) | Yes | Profile page |
| Video List (recent 30+) | Yes | Profile page + scroll |
| Video Caption & Hashtags | Yes | Video page |
| Video Likes / Comments / Shares / Views | Yes | Video page |
| Video Duration | Yes | Video page |
| Sound / Music Info | Yes | Video page |
| Hashtag View Count | Yes | Hashtag page |
| Hashtag Top Videos | Yes | Hashtag page |
| Comments (text, author, likes) | Yes | Video page |
| Trending Videos | Yes | Discover / For You |
| Sound/Music Usage Count | Yes | Sound page |
| TikTok Shop Products | Yes | Shop pages |
Method 1: Python + Requests (TikTok Web API)
TikTok's web app makes internal API calls that return JSON data. These undocumented endpoints are faster than full browser rendering, but they require specific headers and signatures that TikTok rotates frequently.
Install Dependencies
pip install requests
Public Profile Scraper
# tiktok_scraper.py
import json
import re

import requests

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.tiktok.com/",
}


def scrape_tiktok_profile(username: str) -> dict:
    """Scrape public TikTok profile data via server-rendered HTML."""
    url = f"https://www.tiktok.com/@{username}"
    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code == 404:
        return {"error": f"User '@{username}' not found"}
    if resp.status_code != 200:
        return {"error": f"Request failed: {resp.status_code}"}

    # TikTok embeds JSON data in a script tag
    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text,
    )
    if not match:
        # Try SIGI_STATE fallback
        match = re.search(
            r'<script id="SIGI_STATE"[^>]*>(.+?)</script>',
            resp.text,
        )
    if not match:
        return {
            "error": "Could not find embedded data; "
                     "TikTok may have changed the page structure"
        }

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}

    # Navigate the nested data structure
    user_module = data.get("__DEFAULT_SCOPE__", {}).get("webapp.user-detail", {})
    user_info = user_module.get("userInfo", {})
    user = user_info.get("user", {})
    stats = user_info.get("stats", {})

    if not user:
        return {"error": "User data not found in response"}

    # Extract recent videos if available
    item_list = user_module.get("itemList", [])
    videos = []
    for item in item_list[:12]:
        item_stats = item.get("stats", {})
        music = item.get("music", {})
        videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", ""),
            "url": f"https://www.tiktok.com/@{username}/video/{item.get('id')}",
            "likes": item_stats.get("diggCount", 0),
            "comments": item_stats.get("commentCount", 0),
            "shares": item_stats.get("shareCount", 0),
            "views": item_stats.get("playCount", 0),
            "duration": item.get("video", {}).get("duration", 0),
            "music": music.get("title", ""),
            "music_author": music.get("authorName", ""),
            "create_time": item.get("createTime"),
        })

    return {
        "username": user.get("uniqueId"),
        "display_name": user.get("nickname"),
        "bio": user.get("signature"),
        "verified": user.get("verified", False),
        "followers": stats.get("followerCount", 0),
        "following": stats.get("followingCount", 0),
        "total_likes": stats.get("heartCount", 0),
        "total_videos": stats.get("videoCount", 0),
        "profile_pic": user.get("avatarLarger"),
        "region": user.get("region"),
        "recent_videos": videos,
    }


# Example usage
profile = scrape_tiktok_profile("tiktok")
print(json.dumps(profile, indent=2))
print(f"\nFollowers: {profile.get('followers', 0):,}")
print(f"Total Likes: {profile.get('total_likes', 0):,}")
TikTok's embedded JSON structure (the __UNIVERSAL_DATA_FOR_REHYDRATION__ script) changes without notice. Previously it was SIGI_STATE, and before that __NEXT_DATA__. Always have fallback parsing logic and test regularly. A managed API abstracts this instability away.
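The fallback idea can be sketched as a small helper that tries each known script ID in turn. This is a minimal sketch, not TikTok's documented contract: the three IDs come from the paragraph above, while the helper name `extract_embedded_json` and the sample HTML are made up for illustration. The JSON layout under each ID differs, so the caller still has to branch on which ID matched.

```python
import json
import re

# Script IDs named in the text above, newest first.
EMBEDDED_SCRIPT_IDS = (
    "__UNIVERSAL_DATA_FOR_REHYDRATION__",
    "SIGI_STATE",
    "__NEXT_DATA__",
)


def extract_embedded_json(html: str):
    """Return (script_id, parsed_data) for the first tag that parses, else None."""
    for script_id in EMBEDDED_SCRIPT_IDS:
        pattern = (
            r'<script[^>]*\bid="' + re.escape(script_id) + r'"[^>]*>(.*?)</script>'
        )
        match = re.search(pattern, html, re.DOTALL)
        if not match:
            continue
        try:
            return script_id, json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # tag present but unparseable; try the next known ID
    return None


# Hypothetical page fragment, only to exercise the helper
sample_html = (
    '<html><script id="SIGI_STATE" type="application/json">'
    '{"example": 1}</script></html>'
)
print(extract_embedded_json(sample_html))
```

Returning the matched ID alongside the data lets a scheduled test notice when the primary ID stops matching, before the fallbacks also break.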
Hashtag Explorer
# tiktok_hashtag.py
import json
import re

import requests

# HEADERS is the same dict defined in tiktok_scraper.py above;
# copy it into this file or a shared module before running.


def scrape_tiktok_hashtag(tag: str) -> dict:
    """Scrape TikTok hashtag page for view count and top videos."""
    url = f"https://www.tiktok.com/tag/{tag}"
    resp = requests.get(url, headers=HEADERS, timeout=15)

    if resp.status_code != 200:
        return {"error": f"Failed to fetch #{tag}: {resp.status_code}"}

    match = re.search(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.+?)</script>',
        resp.text,
    )
    if not match:
        return {"error": "Could not parse hashtag page"}

    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {"error": "Failed to parse embedded JSON"}

    challenge_module = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.challenge-detail", {})
    )
    challenge_info = challenge_module.get("challengeInfo", {})
    challenge = challenge_info.get("challenge", {})
    stats = challenge_info.get("stats", {})

    # Get top videos
    item_list = challenge_module.get("itemList", [])
    top_videos = []
    for item in item_list[:9]:
        item_stats = item.get("stats", {})
        top_videos.append({
            "id": item.get("id"),
            "desc": item.get("desc", "")[:200],
            "author": item.get("author", {}).get("uniqueId", ""),
            "views": item_stats.get("playCount", 0),
            "likes": item_stats.get("diggCount", 0),
            "comments": item_stats.get("commentCount", 0),
            "shares": item_stats.get("shareCount", 0),
        })

    return {
        "hashtag": tag,
        "title": challenge.get("title", tag),
        "view_count": stats.get("viewCount", 0),
        "video_count": stats.get("videoCount", 0),
        "top_videos": top_videos,
    }


result = scrape_tiktok_hashtag("webdevelopment")
if "error" in result:
    # Error dicts have no "hashtag" key, so handle them separately
    print(result["error"])
else:
    print(f"#{result['hashtag']}: {result.get('view_count', 0):,} views")
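The hashtag scraper interpolates the tag straight into the URL, so input such as `#web dev` or a non-ASCII tag would produce a malformed request. The following is a minimal sketch of input cleanup; `hashtag_url` is a hypothetical helper, and stripping spaces is an assumption about how users type tags, not documented TikTok behavior.

```python
from urllib.parse import quote


def hashtag_url(tag: str) -> str:
    """Build a TikTok hashtag URL from loosely formatted user input."""
    # Drop surrounding whitespace, a leading '#', and inner spaces
    cleaned = tag.strip().lstrip("#").replace(" ", "")
    if not cleaned:
        raise ValueError("hashtag is empty")
    # quote() percent-encodes non-ASCII characters as UTF-8
    return f"https://www.tiktok.com/tag/{quote(cleaned, safe='')}"


print(hashtag_url(" #webdevelopment "))
print(hashtag_url("café"))
```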
Method 2: Playwright (Headless Browser)
TikTok is a JavaScript-heavy single-page app with aggressive bot detection. Playwright renders the full page, handles dynamic content loading, and can scroll through video feeds to collect large datasets.
Install
pip install playwright playwright-stealth
playwright install chromium
Full-Render TikTok Scraper
# playwright_tiktok.py
import asyncio
import json

from playwright.async_api import async_playwright
# stealth_async is the playwright-stealth 1.x entry point;
# check your installed version if this import fails
from playwright_stealth import stealth_async

# Runs inside the page. A raw string keeps the JS regex escapes (\d, \s) intact.
EXTRACT_PROFILE_JS = r"""() => {
    // Try to get data from the embedded JSON
    const script = document.querySelector(
        '#__UNIVERSAL_DATA_FOR_REHYDRATION__'
    );
    if (script) {
        try {
            const data = JSON.parse(script.textContent);
            const userDetail =
                data?.['__DEFAULT_SCOPE__']?.['webapp.user-detail'];
            const user = userDetail?.userInfo?.user || {};
            const stats = userDetail?.userInfo?.stats || {};
            const items = userDetail?.itemList || [];
            return {
                username: user.uniqueId,
                display_name: user.nickname,
                bio: user.signature,
                verified: user.verified || false,
                followers: stats.followerCount || 0,
                following: stats.followingCount || 0,
                total_likes: stats.heartCount || 0,
                total_videos: stats.videoCount || 0,
                profile_pic: user.avatarLarger,
                videos: items.slice(0, 12).map(v => ({
                    id: v.id,
                    desc: v.desc,
                    views: v.stats?.playCount || 0,
                    likes: v.stats?.diggCount || 0,
                    comments: v.stats?.commentCount || 0,
                    shares: v.stats?.shareCount || 0,
                    duration: v.video?.duration || 0,
                    music: v.music?.title || '',
                })),
                source: 'embedded_json',
            };
        } catch (e) {}
    }

    // Fallback: parse from visible DOM elements
    const getName = () => {
        const h1 = document.querySelector('h1[data-e2e="user-title"]');
        return h1 ? h1.textContent.trim() : null;
    };
    const getSubtitle = () => {
        const h2 = document.querySelector('h2[data-e2e="user-subtitle"]');
        return h2 ? h2.textContent.trim() : null;
    };
    // Convert abbreviated counts such as "1.2M" to integers
    const getCount = (selector) => {
        const el = document.querySelector(selector);
        if (!el) return 0;
        const text = el.textContent.replace(/,/g, '');
        const match = text.match(/([\d.]+)\s*(K|M|B)?/i);
        if (!match) return 0;
        let num = parseFloat(match[1]);
        const suffix = (match[2] || '').toUpperCase();
        if (suffix === 'K') num *= 1000;
        if (suffix === 'M') num *= 1000000;
        if (suffix === 'B') num *= 1000000000;
        return Math.round(num);
    };
    return {
        username: getSubtitle(),
        display_name: getName(),
        bio: document.querySelector(
            'h2[data-e2e="user-bio"]'
        )?.textContent?.trim() || '',
        followers: getCount('[data-e2e="followers-count"]'),
        following: getCount('[data-e2e="following-count"]'),
        total_likes: getCount('[data-e2e="likes-count"]'),
        videos: [],
        source: 'dom_fallback',
    };
}"""


async def scrape_tiktok_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()
        await stealth_async(page)

        # Intercept API responses for richer data.
        # api_data is captured for inspection; it is not merged
        # into the profile returned below.
        api_data = {}

        async def handle_response(response):
            url = response.url
            if "/api/user/detail" in url or "userInfo" in url:
                try:
                    api_data["user"] = await response.json()
                except Exception:
                    pass
            elif "/api/post/item_list" in url or "itemList" in url:
                try:
                    api_data["items"] = await response.json()
                except Exception:
                    pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="networkidle")

        # Dismiss cookie banner if present
        try:
            cookie_btn = page.locator('button:has-text("Accept all")')
            await cookie_btn.click(timeout=3000)
        except Exception:
            pass

        await page.wait_for_timeout(3000)

        # Extract data from the embedded JSON, falling back to the DOM
        profile = await page.evaluate(EXTRACT_PROFILE_JS)

        await browser.close()
        return profile


# Run it
data = asyncio.run(scrape_tiktok_profile("tiktok"))
print(json.dumps(data, indent=2))
Scroll & Collect Videos
# tiktok_scroll.py
import random

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async


async def scrape_user_videos(username: str, max_videos: int = 50) -> list:
    """Scroll through a TikTok profile and collect video data."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await stealth_async(page)

        await page.goto(
            f"https://www.tiktok.com/@{username}",
            wait_until="networkidle",
        )

        # Dismiss cookie banner
        try:
            await page.click('button:has-text("Accept all")', timeout=3000)
        except Exception:
            pass

        videos = []
        seen_urls = set()
        prev_count = 0
        scroll_attempts = 0

        while len(videos) < max_videos and scroll_attempts < 30:
            # Collect video links and metadata
            new_videos = await page.evaluate("""() => {
                const items = document.querySelectorAll(
                    '[data-e2e="user-post-item"]'
                );
                return [...items].map(item => {
                    const link = item.querySelector('a');
                    const views = item.querySelector(
                        '[data-e2e="video-views"]'
                    );
                    return {
                        url: link ? link.href : null,
                        views_text: views ? views.textContent.trim() : '0',
                    };
                }).filter(v => v.url);
            }""")

            # Deduplicate by URL so a changed view count on a later pass
            # does not add the same video twice
            for v in new_videos:
                if v["url"] not in seen_urls:
                    seen_urls.add(v["url"])
                    videos.append(v)

            # Stop after 30 consecutive scrolls that yield nothing new
            if len(videos) == prev_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
            prev_count = len(videos)

            # Scroll down
            await page.evaluate("window.scrollBy(0, window.innerHeight * 2)")
            await page.wait_for_timeout(random.randint(1500, 3000))

        await browser.close()
        return videos[:max_videos]
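The scroll collector returns `views_text` as displayed on the page (for example `1.2M`), so the values need converting before they can be summed or sorted. Here is a small Python counterpart to the `getCount` logic used in the DOM fallback, as a sketch; `parse_count` is a hypothetical helper name, and the K/M/B suffixes are the only ones assumed.

```python
import re

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '1.2M' or '15.3K' to an integer."""
    match = re.search(r"([\d.]+)\s*([KMB])?", text.replace(",", ""), re.IGNORECASE)
    if not match:
        return 0
    try:
        number = float(match.group(1))
    except ValueError:
        return 0  # e.g. a stray "..." with no digits
    suffix = (match.group(2) or "").upper()
    return round(number * _MULTIPLIERS[suffix])


for raw in ("1.2M", "15.3K", "987", "1,234"):
    print(raw, parse_count(raw))
```

Abbreviated counts are already rounded by TikTok, so the result is approximate: `1.2M` could be anywhere from roughly 1.15 to 1.25 million.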
TikTok's web app makes internal API calls as you scroll. Intercept these responses with page.on("response") to capture structured JSON data, which is far cleaner than parsing DOM elements. Look for URLs containing /api/post/item_list/ or /api/comment/list/.
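A response handler sees every request the page makes, so it helps to decide cheaply which URLs are worth parsing. The sketch below routes URLs by path prefix. The path fragments come from this article's text and code; `classify_api_url` and the example query strings are hypothetical, and TikTok may rename these endpoints at any time.

```python
from urllib.parse import urlparse

# Path prefixes mentioned in this article; treat them as unstable.
ENDPOINT_KINDS = {
    "/api/post/item_list": "videos",
    "/api/comment/list": "comments",
    "/api/user/detail": "user",
}


def classify_api_url(url: str):
    """Return 'videos', 'comments', 'user', or None for unrelated URLs."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host != "tiktok.com" and not host.endswith(".tiktok.com"):
        return None
    for prefix, kind in ENDPOINT_KINDS.items():
        if parsed.path.startswith(prefix):
            return kind
    return None


# Hypothetical URLs for illustration
print(classify_api_url("https://www.tiktok.com/api/post/item_list/?count=35"))
print(classify_api_url("https://www.tiktok.com/@tiktok"))
```

Inside a `page.on("response")` handler, call `classify_api_url(response.url)` first and only await `response.json()` when it returns a kind.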
Method 3: Node.js + Puppeteer
Puppeteer with the stealth plugin provides excellent TikTok compatibility. The key advantage: intercepting network requests to capture TikTok's internal API responses as structured JSON.
Install
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Profile Scraper with API Interception
// tiktok-scraper.mjs
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

// page.waitForTimeout() was removed from recent Puppeteer releases,
// so use a plain promise-based sleep instead
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeTikTokProfile(username) {
  const browser = await puppeteer.launch({
    headless: true, // older Puppeteer releases used "new" here
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--disable-blink-features=AutomationControlled",
    ],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Intercept TikTok's internal API responses
  let userData = null;
  const videoData = [];

  page.on("response", async (resp) => {
    const url = resp.url();
    if (url.includes("/api/user/detail") || url.includes("user-detail")) {
      try {
        userData = await resp.json();
      } catch (e) {}
    }
    if (url.includes("/api/post/item_list") || url.includes("item_list")) {
      try {
        const json = await resp.json();
        if (json.itemList) {
          videoData.push(...json.itemList);
        }
      } catch (e) {}
    }
  });

  await page.goto(`https://www.tiktok.com/@${username}`, {
    waitUntil: "networkidle0",
    timeout: 30000,
  });

  // Dismiss cookie consent. :has-text() is a Playwright selector and is
  // not valid in Puppeteer, so find the button by its text in the page.
  try {
    await page.evaluate(() => {
      const btn = [...document.querySelectorAll("button")].find((b) =>
        b.textContent.includes("Accept all")
      );
      if (btn) btn.click();
    });
  } catch (e) {}

  await sleep(4000);

  // If API interception didn't work, parse from page
  if (!userData) {
    userData = await page.evaluate(() => {
      const script = document.querySelector(
        "#__UNIVERSAL_DATA_FOR_REHYDRATION__"
      );
      if (!script) return null;
      try {
        const data = JSON.parse(script.textContent);
        return data?.["__DEFAULT_SCOPE__"]?.["webapp.user-detail"];
      } catch (e) {
        return null;
      }
    });
  }

  let result;
  if (userData?.userInfo) {
    const user = userData.userInfo.user || {};
    const stats = userData.userInfo.stats || {};
    const items = userData.itemList || videoData;

    result = {
      username: user.uniqueId || username,
      display_name: user.nickname || null,
      bio: user.signature || null,
      verified: user.verified || false,
      followers: stats.followerCount || 0,
      following: stats.followingCount || 0,
      total_likes: stats.heartCount || 0,
      total_videos: stats.videoCount || 0,
      profile_pic: user.avatarLarger || null,
      region: user.region || null,
      videos: (items || []).slice(0, 12).map((v) => ({
        id: v.id,
        desc: v.desc || "",
        views: v.stats?.playCount || 0,
        likes: v.stats?.diggCount || 0,
        comments: v.stats?.commentCount || 0,
        shares: v.stats?.shareCount || 0,
        duration: v.video?.duration || 0,
        music: v.music?.title || "",
        music_author: v.music?.authorName || "",
      })),
    };
  } else {
    // Fallback: parse from DOM (counts stay as display strings, e.g. "1.2M")
    result = await page.evaluate(() => ({
      username: document
        .querySelector('h2[data-e2e="user-subtitle"]')
        ?.textContent?.trim(),
      display_name: document
        .querySelector('h1[data-e2e="user-title"]')
        ?.textContent?.trim(),
      bio: document
        .querySelector('h2[data-e2e="user-bio"]')
        ?.textContent?.trim(),
      followers: document
        .querySelector('[data-e2e="followers-count"]')
        ?.textContent?.trim(),
      following: document
        .querySelector('[data-e2e="following-count"]')
        ?.textContent?.trim(),
      total_likes: document
        .querySelector('[data-e2e="likes-count"]')
        ?.textContent?.trim(),
    }));
  }

  await browser.close();
  return result;
}

// Batch scrape with rate limiting
async function scrapeMultiple(usernames, delayMs = 8000) {
  const results = [];
  for (const username of usernames) {
    try {
      const profile = await scrapeTikTokProfile(username);
      results.push(profile);
      console.log(
        `OK @${profile.username}: ` +
          `${(profile.followers || 0).toLocaleString()} followers`
      );
    } catch (err) {
      console.error(`FAILED @${username}: ${err.message}`);
      results.push({ username, error: err.message });
    }
    await sleep(delayMs);
  }
  return results;
}

// Usage
const profiles = await scrapeMultiple([
  "tiktok",
  "charlidamelio",
  "khaby.lame",
]);
console.log(JSON.stringify(profiles, null, 2));
Method 4: Web Scraping API (Easiest)
A web scraping API is the most reliable approach for production TikTok scraping. It handles TikTok's aggressive anti-bot detection, TLS fingerprinting, CAPTCHA solving, and proxy rotation: you send a URL and get structured data back.
Using the Mantis API
# One API call returns structured TikTok data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.tiktok.com/@charlidamelio",
        "extract": {
            "username": "TikTok username",
            "display_name": "display name",
            "bio": "profile bio text",
            "verified": "whether account is verified",
            "followers": "follower count as integer",
            "following": "following count as integer",
            "total_likes": "total likes received as integer",
            "total_videos": "total video count",
            "recent_videos": (
                "array of recent videos with: "
                "caption, views, likes, comments, shares, "
                "duration_seconds, music_title"
            ),
        },
        "render_js": True,
    },
)
profile = resp.json()
print(
    f"@{profile.get('username')}: "
    f"{profile.get('followers', 0):,} followers"
)
Node.js with Mantis
// mantis-tiktok.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.tiktok.com/@tiktok",
    extract: {
      username: "TikTok handle",
      followers: "follower count as number",
      total_likes: "total likes as number",
      recent_videos:
        "array of last 12 videos with caption, " +
        "views, likes, comments, shares, music",
    },
    render_js: true,
  }),
});
const profile = await resp.json();
console.log(profile);
cURL Example
curl -X POST https://api.mantisapi.com/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.tiktok.com/tag/ai",
"extract": {
"hashtag": "hashtag name",
"view_count": "total views as integer",
"top_videos": "array of top 9 videos with: author, caption, views, likes"
},
"render_js": true
}'
Beating TikTok's Anti-Bot Detection
TikTok has some of the most sophisticated anti-scraping defenses of any social platform, arguably more aggressive than Instagram's. Here's what you're up against:
TikTok's Defense Layers
| Defense | What It Does | Countermeasure |
|---|---|---|
| TLS Fingerprinting | Detects non-browser TLS handshakes (JA3/JA4) | Use real browser (Playwright/Puppeteer) or TLS-spoofing libraries |
| CAPTCHA (Puzzle Slider) | Slide-to-verify challenges on suspicious requests | CAPTCHA solving services, or use a managed API |
| Device ID Tracking | Assigns persistent device fingerprints across sessions | Fresh browser profiles, randomized fingerprints |
| IP Rate Limiting | Blocks IPs after rapid successive requests | Rotating residential proxies, long random delays |
| Browser Fingerprinting | Detects WebDriver, headless indicators, automation | Stealth plugins (playwright-stealth, puppeteer-extra) |
| Behavioral Analysis | Detects non-human browsing patterns | Random delays, mouse movement simulation, scroll patterns |
| Geo-Restrictions | Shows different content based on IP location | Geo-targeted proxies for specific markets |
| API Signature (X-Bogus) | Signs API requests with encrypted parameters | Reverse-engineer signing algorithm (constantly changes) |
| msToken Cookie | Session validation token required for API calls | Generate via real browser session, rotate frequently |
Essential Anti-Detection Techniques
# tiktok_stealth.py
import random
import time

import requests

PROXY_POOL = [
    # Residential proxies are essential for TikTok
    # Datacenter IPs are blocked almost immediately
    "http://user:pass@res-proxy1.example.com:8080",
    "http://user:pass@res-proxy2.example.com:8080",
]


def tiktok_delay():
    """TikTok needs moderate delays with occasional pauses."""
    base = random.uniform(3, 10)
    # 15% chance of a longer pause (human-like browsing)
    if random.random() < 0.15:
        base += random.uniform(15, 45)
    time.sleep(base)


def is_blocked(response) -> bool:
    """Detect TikTok blocking patterns."""
    if response.status_code == 429:
        return True
    if response.status_code == 403:
        return True
    if "captcha" in response.text.lower():
        return True
    if "verify" in response.url and "tiktok.com" in response.url:
        return True
    # Empty response often means shadowban
    if len(response.text) < 500:
        return True
    return False


def get_tiktok_session():
    """Create a session with TikTok-appropriate headers."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice([
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.5 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
        ]),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,*/*",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
    })
    return session
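Detecting a block is only half the job; the scraper also needs to slow down when it happens. One common approach is exponential backoff with jitter, sketched below. The `base` and `cap` values are illustrative assumptions rather than measured TikTok thresholds, and `fetch_with_backoff` is a hypothetical wrapper that takes any fetch callable plus a block detector such as `is_blocked` above.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly below the ceiling so retries do not align
    return random.uniform(0, ceiling)


def fetch_with_backoff(fetch, is_blocked, max_attempts: int = 4, sleep=time.sleep):
    """Call fetch() until is_blocked(response) is False or attempts run out."""
    for attempt in range(max_attempts):
        response = fetch()
        if not is_blocked(response):
            return response
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return None  # still blocked after max_attempts
```

With a real session this would be called as `fetch_with_backoff(lambda: session.get(url, timeout=15), is_blocked)`. Switching to a fresh proxy between attempts is a natural extension, since waiting alone rarely clears an IP-level block.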
TikTok is one of the few platforms that actively checks TLS fingerprints (JA3/JA4 hashes). Python's requests library has a distinctive TLS fingerprint that TikTok blocks. For direct HTTP requests, you need libraries like curl_cffi or tls-client that spoof browser TLS handshakes. Full browser automation (Playwright/Puppeteer) avoids this issue entirely.
TikTok Official API vs Scraping
TikTok offers several official APIs, but they're heavily restricted compared to what's available through scraping:
| Feature | Research API | Login Kit / Display API | Marketing API | Web Scraping | Mantis API |
|---|---|---|---|---|---|
| Access | Academics only (approved) | User-authorized content | Ad accounts only | Any public content | Any public content |
| Requires Approval | Yes (academic affiliation) | Yes (TikTok review) | Yes (ad account) | No | No |
| Rate Limits | 1,000 requests/day | 100 requests/day | Varies by spend | Depends on proxies | Based on plan |
| Public Profile Data | โ (limited fields) | โ Own account only | โ | โ Full data | โ Full data |
| Video Search | โ (keyword/hashtag) | โ | โ | โ | โ |
| Trending Content | โ | โ | โ | โ | โ |
| Comments | โ (limited) | โ | โ | โ | โ |
| Sound/Music Data | โ | โ | โ | โ | โ |
| Competitor Analysis | โ (research only) | โ | โ | โ | โ |
| Commercial Use | โ (research only) | Limited | Ads only | Gray area | โ |
| Cost | Free (if approved) | Free | Tied to ad spend | $100-500+/mo (proxies) | $0-299/mo |
TikTok's API ecosystem is among the most restrictive of any major social platform. The Research API is academic-only with a multi-month approval process. The Display API only shows your own content. There's no commercial API for public data access, which makes scraping essentially the only option for businesses doing market research, influencer marketing, or competitive intelligence on TikTok.
Method Comparison
| Criteria | Python + Requests | Playwright | Node.js + Puppeteer | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 10 min | 2 min |
| JS Rendering | No (parses embedded JSON) | Yes | Yes | Yes |
| Anti-Detection | Low (TLS fingerprinted) | Good (with stealth) | Good (with stealth) | Built-in |
| Speed | Fast (when it works) | Slow | Slow | Medium |
| Maintenance | Very High (JSON structure changes) | High | High | None |
| CAPTCHA Handling | No | Manual / external solver | Manual / external solver | Built-in |
| Scale | Low | Low-Medium | Low-Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies + TLS lib) | $200-500 (proxies + compute) | $200-500 (proxies + compute) | $99 (Pro plan) |
| Best For | Quick prototypes | Rich data + scrolling | Backend services | Production |
Real-World Use Cases
1. Trend Discovery Engine for AI Agents
Give an AI agent the ability to discover what's trending on TikTok, a critical capability for marketing assistants, content strategists, and social media managers.
# agent_tiktok_trends.py: LangChain tool
import requests
from langchain.tools import tool

MANTIS_KEY = "YOUR_API_KEY"


@tool
def discover_tiktok_trends(topic: str) -> str:
    """Discover trending TikTok content for a topic.
    Returns top videos, hashtag stats, and trending sounds."""
    # Scrape the hashtag page for the topic
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{topic}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count for this hashtag",
                "video_count": "number of videos with this hashtag",
                "top_videos": (
                    "array of top 6 videos: author username, "
                    "caption (first 100 chars), views, likes, "
                    "comments, shares, music_title"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()

    result = f"TikTok Trends: #{topic}\n"
    result += f"Total Views: {data.get('total_views', 'N/A')}\n"
    result += f"Videos: {data.get('video_count', 'N/A')}\n\n"

    top = data.get("top_videos", [])
    if top:
        result += "Top Performing Videos:\n"
        for i, v in enumerate(top[:5], 1):
            result += (
                f"  {i}. @{v.get('author', '?')}: "
                f"{v.get('caption', '[no caption]')[:80]}\n"
                f"     Views: {v.get('views', 0):,} | "
                f"Likes: {v.get('likes', 0):,} | "
                f"Comments: {v.get('comments', 0):,}\n"
                f"     Sound: {v.get('music_title', 'Original Sound')}\n"
            )
    return result


# Use in a LangChain agent:
# agent = create_agent(tools=[discover_tiktok_trends], ...)
# agent.run("What's trending on TikTok for AI?")
2. Influencer Performance Analyzer
Evaluate TikTok creators before sponsorship deals: engagement rate, content consistency, audience quality, and estimated partnership value.
# tiktok_influencer.py
import requests

MANTIS_KEY = "YOUR_API_KEY"


def analyze_tiktok_creator(username: str) -> dict:
    """Calculate performance metrics for a TikTok creator."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/@{username}",
            "extract": {
                "username": "TikTok username",
                "display_name": "display name",
                "verified": "verified status boolean",
                "followers": "follower count as integer",
                "following": "following count as integer",
                "total_likes": "total likes as integer",
                "total_videos": "total video count as integer",
                "recent_videos": (
                    "array of last 12 videos: "
                    "views (integer), likes (integer), "
                    "comments (integer), shares (integer), "
                    "duration_seconds (integer), "
                    "caption (string), music_title (string)"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()

    videos = data.get("recent_videos") or []
    # Treat a missing or null follower count as 0 rather than inventing a value
    followers = data.get("followers") or 0

    if videos and followers > 0:
        n = len(videos)
        # TikTok engagement = (likes + comments + shares) / views
        total_views = sum(v.get("views", 0) for v in videos)
        total_engagement = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in videos
        )
        # View-based engagement rate (TikTok standard)
        view_eng_rate = (
            total_engagement / total_views * 100 if total_views > 0 else 0
        )
        # Follower-based engagement rate (cross-platform comparison)
        follower_eng_rate = total_engagement / n / followers * 100

        avg_views = total_views / n
        avg_likes = sum(v.get("likes", 0) for v in videos) / n
        avg_comments = sum(v.get("comments", 0) for v in videos) / n
        avg_shares = sum(v.get("shares", 0) for v in videos) / n

        # View-to-follower ratio (viral potential)
        view_ratio = avg_views / followers
        # Average video duration
        avg_duration = sum(v.get("duration_seconds", 0) for v in videos) / n
    else:
        view_eng_rate = follower_eng_rate = 0
        avg_views = avg_likes = avg_comments = avg_shares = 0
        view_ratio = avg_duration = 0

    # Performance tier (TikTok benchmarks)
    if view_eng_rate > 10:
        tier = "Viral"
    elif view_eng_rate > 6:
        tier = "Excellent"
    elif view_eng_rate > 3:
        tier = "Good"
    elif view_eng_rate > 1:
        tier = "Average"
    else:
        tier = "Below Average"

    return {
        "username": data.get("username", username),
        "display_name": data.get("display_name"),
        "verified": data.get("verified", False),
        "followers": followers,
        "total_likes": data.get("total_likes", 0),
        "total_videos": data.get("total_videos", 0),
        "view_engagement_rate": round(view_eng_rate, 2),
        "follower_engagement_rate": round(follower_eng_rate, 2),
        "performance_tier": tier,
        "avg_views": round(avg_views),
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "avg_shares": round(avg_shares),
        "view_to_follower_ratio": round(view_ratio, 2),
        "avg_duration_seconds": round(avg_duration, 1),
        "estimated_cpm": "$5-15",
        # Rough heuristic: $0.01 to $0.03 per average view
        "estimated_sponsored_post": (
            f"${avg_views * 0.01:,.0f} - "
            f"${avg_views * 0.03:,.0f}"
        ),
    }


# Compare multiple creators
creators = ["charlidamelio", "khaby.lame", "bellapoarch"]
for creator in creators:
    result = analyze_tiktok_creator(creator)
    print(f"\n@{result['username']}:")
    print(f"  Followers: {result['followers']:,}")
    print(f"  Avg Views: {result['avg_views']:,}")
    print(f"  Engagement: {result['view_engagement_rate']}% "
          f"({result['performance_tier']})")
    print(f"  View/Follower: {result['view_to_follower_ratio']}x")
    print(f"  Est. Sponsored Post: {result['estimated_sponsored_post']}")
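To make the view-based engagement formula concrete, here is the same arithmetic run offline on two made-up videos. The numbers are invented for illustration, and the tier thresholds are the ones used in the analyzer above.

```python
def view_engagement_rate(videos: list) -> float:
    """(likes + comments + shares) / views across all videos, as a percentage."""
    total_views = sum(v["views"] for v in videos)
    if total_views == 0:
        return 0.0
    interactions = sum(v["likes"] + v["comments"] + v["shares"] for v in videos)
    return interactions / total_views * 100


def performance_tier(rate: float) -> str:
    """Map an engagement percentage to the tiers used in the analyzer."""
    if rate > 10:
        return "Viral"
    if rate > 6:
        return "Excellent"
    if rate > 3:
        return "Good"
    if rate > 1:
        return "Average"
    return "Below Average"


# Invented sample data: 9,250 interactions over 150,000 views
sample = [
    {"views": 100_000, "likes": 6_000, "comments": 300, "shares": 200},
    {"views": 50_000, "likes": 2_500, "comments": 150, "shares": 100},
]
rate = view_engagement_rate(sample)
print(f"{rate:.2f}% -> {performance_tier(rate)}")
```

Here 9,250 / 150,000 is about 6.17%, which lands just inside the "Excellent" tier. Because the rate is pooled across videos, one viral video with a low interaction ratio can pull the whole figure down.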
3. Competitive Hashtag Tracker
Monitor hashtag performance over time to identify trends, measure campaign impact, and discover content gaps in your niche.
# tiktok_hashtag_tracker.py
import json
from datetime import datetime, timezone

import requests

MANTIS_KEY = "YOUR_API_KEY"

TRACKED_HASHTAGS = [
    "webscraping", "aiagents", "pythonprogramming",
    "automation", "apidevelopment", "datascience",
    "machinelearning", "devtools",
]


def track_hashtag(tag: str) -> dict:
    """Get current stats and top content for a hashtag."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.tiktok.com/tag/{tag}",
            "extract": {
                "hashtag": "hashtag name",
                "total_views": "total view count as integer",
                "video_count": "number of videos as integer",
                "top_videos": (
                    "array of top 6 videos: "
                    "author, caption (first 100 chars), "
                    "views, likes, comments, shares, "
                    "music_title"
                ),
            },
            "render_js": True,
        },
        timeout=60,  # don't hang forever on a slow render
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    data = resp.json()

    # Calculate averages for top content
    top = data.get("top_videos") or []
    if top:
        avg_views = sum(v.get("views", 0) for v in top) / len(top)
        avg_eng = sum(
            v.get("likes", 0) + v.get("comments", 0) + v.get("shares", 0)
            for v in top
        ) / len(top)
        # Count how often each sound appears among the top videos
        top_sounds = {}
        for v in top:
            sound = v.get("music_title", "Original")
            top_sounds[sound] = top_sounds.get(sound, 0) + 1
    else:
        avg_views = avg_eng = 0
        top_sounds = {}

    return {
        "hashtag": tag,
        "total_views": data.get("total_views", 0),
        "video_count": data.get("video_count", 0),
        "top_avg_views": round(avg_views),
        "top_avg_engagement": round(avg_eng),
        "trending_sounds": dict(
            sorted(top_sounds.items(), key=lambda x: -x[1])
        ),
        "top_creators": [v.get("author") for v in top[:3]],
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }


# Run tracking for all hashtags
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
report = {"date": today, "hashtags": []}

for tag in TRACKED_HASHTAGS:
    result = track_hashtag(tag)
    report["hashtags"].append(result)
    print(f"#{result['hashtag']}: "
          f"{result.get('total_views', 0):,} views, "
          f"{result.get('video_count', 0):,} videos")

# Save daily snapshot
with open(f"tiktok_hashtags_{today}.json", "w") as f:
    json.dump(report, f, indent=2)


def views_per_video(entry: dict) -> float:
    """Total views divided by video count (guarding against zero)."""
    return entry.get("total_views", 0) / max(entry.get("video_count", 1), 1)


# Identify opportunities
print("\nHashtag Opportunities (high views, low competition):")
sorted_tags = sorted(report["hashtags"], key=views_per_video, reverse=True)
for tag in sorted_tags[:3]:
    print(f"  #{tag['hashtag']}: "
          f"{views_per_video(tag):,.0f} views/video avg")
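The tracker saves one JSON snapshot per day, but a trend only becomes visible when two snapshots are compared. Here is a minimal sketch of that comparison, assuming the snapshot layout written above (a `hashtags` list whose entries carry `hashtag`, `total_views`, and `video_count`). The function name `compare_snapshots` and the sample numbers are hypothetical and only illustrate the idea.

```python
def compare_snapshots(previous: dict, current: dict) -> list[dict]:
    """Return per-hashtag growth between two daily snapshots."""
    prev_by_tag = {h["hashtag"]: h for h in previous.get("hashtags", [])}
    changes = []
    for entry in current.get("hashtags", []):
        old = prev_by_tag.get(entry["hashtag"])
        if old is None:
            continue  # hashtag was not tracked in the earlier snapshot
        old_views = old.get("total_views", 0)
        delta = entry.get("total_views", 0) - old_views
        # Growth is undefined when the earlier snapshot reported zero views
        growth = round(delta / old_views * 100, 2) if old_views else None
        changes.append({
            "hashtag": entry["hashtag"],
            "views_delta": delta,
            "views_growth_pct": growth,
            "new_videos": entry.get("video_count", 0) - old.get("video_count", 0),
        })

    def sort_key(change: dict) -> float:
        # Entries with unknown growth sort last
        pct = change["views_growth_pct"]
        return pct if pct is not None else float("-inf")

    return sorted(changes, key=sort_key, reverse=True)


# Made-up snapshots for illustration only (not real TikTok figures)
yesterday = {"date": "day-1", "hashtags": [
    {"hashtag": "webscraping", "total_views": 1000, "video_count": 100},
    {"hashtag": "datascience", "total_views": 5000, "video_count": 400},
]}
today_snapshot = {"date": "day-2", "hashtags": [
    {"hashtag": "webscraping", "total_views": 1200, "video_count": 110},
    {"hashtag": "datascience", "total_views": 5250, "video_count": 420},
]}

for change in compare_snapshots(yesterday, today_snapshot):
    print(f"#{change['hashtag']}: {change['views_delta']:+,} views "
          f"({change['views_growth_pct']}%), {change['new_videos']:+} videos")
```

In practice the two dictionaries would come from `json.load` on consecutive `tiktok_hashtags_<date>.json` files. Sorting by percentage growth rather than raw delta keeps small but fast-moving hashtags from being buried under large, stable ones.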
Legal Considerations
TikTok scraping carries unique legal considerations beyond typical social media platforms, due to TikTok's regulatory environment:
- TikTok's Terms of Service: Explicitly prohibit automated data collection, scraping, and crawling. Like Meta, TikTok can pursue breach-of-contract claims against scrapers
- hiQ Labs v. LinkedIn (2022): The Ninth Circuit ruled that scraping publicly accessible data likely doesn't violate the CFAA. This precedent generally supports scraping public TikTok content
- Van Buren v. United States (2021): The Supreme Court narrowed the CFAA's "exceeds authorized access" provision. This is favorable for scraping public data, but it doesn't address ToS/contract claims
- COPPA (Children's Online Privacy): TikTok has a significant underage user base. TikTok paid a $5.7M FTC settlement in 2019 for COPPA violations. Scraping data from minors carries additional legal risk
- National Security Concerns: TikTok faces ongoing scrutiny from US, EU, and other governments over data practices. Regulatory changes could affect data access and scraping legality
- GDPR (EU): TikTok profiles contain personal data. Scraping EU user data for commercial purposes without a legal basis likely violates GDPR
- CCPA (California): Similar obligations for California residents' personal data
- Video Copyright: TikTok videos are copyrighted by creators. Scraping metadata is different from downloading and republishing video content
- Music & Sound Rights: TikTok videos often use licensed music. Downloading audio content raises additional copyright concerns
Only scrape publicly visible metadata (don't download videos). Don't scrape data identifiable as belonging to minors. Respect private accounts. Don't store personal data of EU residents without a legal basis. Use rate limiting to avoid service disruption. Never create fake accounts. Consult a lawyer for commercial use cases, especially given TikTok's evolving regulatory landscape.
Production-Ready TikTok Scraping
Stop fighting TLS fingerprinting, puzzle CAPTCHAs, and API signature algorithms. Mantis extracts structured TikTok data with a single API call.
Frequently Asked Questions
Is it legal to scrape TikTok data?
Scraping publicly available TikTok data is in a legal gray area. The hiQ v. LinkedIn ruling supports scraping public data, but TikTok's Terms of Service prohibit it. TikTok also faces unique regulatory scrutiny (COPPA, national security). For commercial use, consult legal counsel and consider a managed API approach.
How do I scrape TikTok without getting blocked?
TikTok uses TLS fingerprinting, puzzle CAPTCHAs, device ID tracking, and behavioral analysis. Use headless browsers with stealth plugins (not raw HTTP requests, which TLS fingerprinting catches), rotating residential proxies, and random delays of 3-10 seconds. Or use a web scraping API like Mantis that handles anti-blocking automatically.
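The "random delays of 3-10 seconds" advice can be sketched as a small generator that pauses between items. This is an illustrative helper, not part of any library: the name `paced` and its `sleep` parameter are assumptions, and the demo passes a fake sleep function so it runs instantly.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def paced(
    items: Iterable[str],
    min_delay: float = 3.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items one by one, waiting a random interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            # Uniform jitter avoids the fixed cadence that is easy to spot
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Dry run: print the wait instead of actually sleeping
urls = [
    "https://www.tiktok.com/tag/webscraping",
    "https://www.tiktok.com/tag/datascience",
]
for url in paced(urls, sleep=lambda s: print(f"(would wait {s:.1f}s)")):
    print("fetch", url)
```

Dropping the `sleep=` argument makes the loop wait for real. Pacing alone won't defeat fingerprinting or CAPTCHAs; it only keeps request timing from looking mechanical and limits load on the service.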
Can I use TikTok's official API instead of scraping?
TikTok's Research API requires academic affiliation and multi-month approval. The Login Kit / Display API only shows your own content. The Marketing API is restricted to ad accounts. None provide commercial access to public profile/video data at scale. Scraping or a web scraping API is the practical option.
What data can I extract from TikTok?
From public profiles: username, display name, bio, follower/following/likes counts, verified status, and video list. From videos: caption, views, likes, comments, shares, duration, and music/sound info. From hashtags: view count and top videos. From trending: discover page videos and popular sounds.
What Python library is best for scraping TikTok?
Playwright with stealth plugins is the most reliable since TikTok requires JavaScript rendering and uses TLS fingerprinting that blocks standard HTTP libraries. For API-level requests, curl_cffi can spoof browser TLS fingerprints. For production, use a managed API like Mantis.
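For the curl_cffi route, a minimal sketch might look like the following. It assumes a recent curl_cffi release in which `impersonate="chrome"` is accepted (older releases expect a pinned target such as `"chrome110"`). The helper names `profile_url` and `fetch_with_browser_tls` are hypothetical. Matching a browser's TLS fingerprint does not by itself get past CAPTCHAs or signed API endpoints.

```python
def profile_url(username: str) -> str:
    """Build a public profile URL, accepting 'name' or '@name'."""
    return f"https://www.tiktok.com/@{username.strip().lstrip('@')}"


def fetch_with_browser_tls(url: str, timeout: float = 30.0) -> str:
    """Fetch a page while presenting a Chrome-like TLS fingerprint."""
    # Imported lazily so this module still loads without curl_cffi installed
    from curl_cffi import requests as cffi_requests

    resp = cffi_requests.get(url, impersonate="chrome", timeout=timeout)
    resp.raise_for_status()  # surface blocks (403/429) instead of parsing them
    return resp.text


# Example call (needs network access and `pip install curl_cffi`):
# html = fetch_with_browser_tls(profile_url("tiktok"))
print(profile_url("@tiktok"))
```

The response is raw HTML, so any content that TikTok renders client-side still requires a browser-based approach such as Playwright.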
How do I scrape TikTok trending videos?
TikTok's trending/discover page requires JavaScript rendering. Use Playwright or Puppeteer to load the page, scroll to load content, and extract video metadata. TikTok's internal API endpoints (like /api/recommend/item_list/) return structured data but require valid signatures. Mantis can extract trending data with a single API call.
Related Guides
- How to Scrape Google Search Results in 2026
- How to Scrape Amazon Product Data in 2026
- How to Scrape LinkedIn Profiles & Jobs in 2026
- How to Scrape Twitter (X) Data in 2026
- How to Scrape Instagram Data in 2026
- How to Scrape YouTube Data in 2026
- How to Scrape Reddit Data in 2026
- Web Scraping with Python: Complete Guide
- Web Scraping with Node.js: Complete Guide
- Anti-Blocking Web Scraping Guide