Table of Contents
- Why Scrape Instagram Data?
- What Data Can You Extract?
- Method 1: Python + Requests (Public API Endpoints)
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Puppeteer
- Method 4: Web Scraping API (Easiest)
- Beating Instagram's Anti-Bot Detection
- Instagram Graph API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape Instagram Data?
Instagram has over 2 billion monthly active users, making it one of the richest sources of social data on the internet. Businesses, researchers, and developers scrape Instagram for:
- Influencer analytics – Evaluate engagement rates, follower growth, and content performance before sponsorship deals
- Brand monitoring – Track mentions, hashtags, and competitor activity across Instagram in real time
- Market research – Discover trending products, aesthetics, and consumer preferences through visual content analysis
- Competitive intelligence – Monitor competitor posting strategies, engagement metrics, and audience growth
- Lead generation – Find business accounts in specific niches along with the contact info (email, website) listed in their bios
- Content curation – Aggregate user-generated content (UGC) for marketing campaigns, with proper attribution
- AI agent social intelligence – Give AI assistants the ability to research brands, people, and trends on Instagram
- Academic research – Study social media behavior, visual trends, and platform dynamics at scale

Instagram's official API is extremely limited: it only works with accounts you own or that authorize your app. For public data at scale, scraping is often the only viable option.
What Data Can You Extract?
Instagram profiles and posts contain rich data, though access depends on privacy settings:
| Data Point | Public Profiles | Private Profiles |
|---|---|---|
| Username & Full Name | ✓ | ✓ |
| Bio & External URL | ✓ | ✓ |
| Profile Picture (HD) | ✓ | ✓ |
| Follower / Following Count | ✓ | ✓ |
| Post Count | ✓ | ✓ |
| Verified Badge | ✓ | ✓ |
| Business Category | ✓ | ✓ |
| Recent Posts (12-50) | ✓ | ✗ |
| Post Captions | ✓ | ✗ |
| Like / Comment Counts | ✓ | ✗ |
| Post Images / Videos | ✓ | ✗ |
| Reels | ✓ | ✗ |
| Tagged Locations | ✓ | ✗ |
| Hashtags Used | ✓ | ✗ |
| Comment Text | ✓ (limited) | ✗ |
Method 1: Python + Requests (Public API Endpoints)
Instagram serves profile data through internal GraphQL endpoints that return JSON. These endpoints are undocumented and change frequently, but they're faster than full browser rendering.
Install Dependencies
```bash
pip install requests
```
Public Profile Scraper
```python
# instagram_scraper.py
import requests
import json
import time
import random

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
    "X-IG-App-ID": "936619743392459",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://www.instagram.com/",
}

def scrape_instagram_profile(username: str) -> dict:
    """Scrape public Instagram profile data via the web API."""
    url = "https://www.instagram.com/api/v1/users/web_profile_info/"
    params = {"username": username}
    resp = requests.get(url, headers=HEADERS, params=params, timeout=15)
    if resp.status_code == 404:
        return {"error": f"User '{username}' not found"}
    if resp.status_code != 200:
        return {"error": f"Request failed: {resp.status_code}"}

    data = resp.json()
    user = data.get("data", {}).get("user", {})
    if not user:
        return {"error": "Could not parse user data"}

    # Extract recent posts
    posts = []
    edges = user.get("edge_owner_to_timeline_media", {}).get("edges", [])
    for edge in edges[:12]:
        node = edge.get("node", {})
        caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
        posts.append({
            "id": node.get("shortcode"),
            "url": f"https://www.instagram.com/p/{node.get('shortcode')}/",
            "image": node.get("display_url"),
            "caption": caption_edges[0]["node"]["text"] if caption_edges else None,
            "likes": node.get("edge_liked_by", {}).get("count", 0),
            "comments": node.get("edge_media_to_comment", {}).get("count", 0),
            "timestamp": node.get("taken_at_timestamp"),
            "is_video": node.get("is_video", False),
            "video_views": node.get("video_view_count"),
        })

    return {
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "bio": user.get("biography"),
        "external_url": user.get("external_url"),
        "followers": user.get("edge_followed_by", {}).get("count", 0),
        "following": user.get("edge_follow", {}).get("count", 0),
        "posts_count": user.get("edge_owner_to_timeline_media", {}).get("count", 0),
        "is_verified": user.get("is_verified", False),
        "is_business": user.get("is_business_account", False),
        "business_category": user.get("category_name"),
        "profile_pic_hd": user.get("profile_pic_url_hd"),
        "is_private": user.get("is_private", False),
        "recent_posts": posts,
    }

# Example usage
profile = scrape_instagram_profile("natgeo")
print(json.dumps(profile, indent=2))
print(f"\nFollowers: {profile.get('followers', 0):,}")
print(f"Posts: {profile.get('posts_count', 0):,}")
```
Instagram's internal API endpoints change frequently without notice. The web_profile_info endpoint works as of early 2026, but Meta may modify or remove it at any time. Always have a fallback strategy – browser-based scraping or a managed API.
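That fallback strategy can be as simple as trying each scraper in order until one returns usable data. The sketch below uses placeholder scraper functions to stand in for the real methods in this guide; the function names are illustrative, not part of any library:

```python
def scrape_with_fallbacks(username, scrapers):
    """Try each (name, callable) pair in order; return the first
    result that isn't an error dict, tagged with which method won."""
    errors = {}
    for name, fn in scrapers:
        try:
            result = fn(username)
        except Exception as exc:
            errors[name] = str(exc)
            continue
        if result and "error" not in result:
            return {"method": name, **result}
        errors[name] = (result or {}).get("error", "empty result")
    return {"error": "all methods failed", "details": errors}

# Placeholder scrapers standing in for Methods 1, 2, and 4:
def api_endpoint(u):
    return {"error": "endpoint changed"}  # simulate a dead endpoint

def headless_browser(u):
    return {"username": u, "followers": 284_000_000}

print(scrape_with_fallbacks("natgeo", [
    ("api_endpoint", api_endpoint),
    ("headless_browser", headless_browser),
]))
```

The dispatcher records why each method failed, which makes it easy to alert when the primary endpoint breaks rather than silently falling back forever.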
Hashtag Explorer
```python
# hashtag_scraper.py
import json

import requests

from instagram_scraper import HEADERS  # reuse the headers from Method 1

def scrape_hashtag(tag: str) -> dict:
    """Scrape top posts from an Instagram hashtag page."""
    url = f"https://www.instagram.com/explore/tags/{tag}/"
    params = {"__a": "1", "__d": "dis"}
    resp = requests.get(url, headers=HEADERS, params=params, timeout=15)
    if resp.status_code != 200:
        return {"error": f"Failed to fetch #{tag}"}
    try:
        data = resp.json()
    except json.JSONDecodeError:
        return {"error": "Instagram returned HTML – likely blocked"}
    hashtag_data = (
        data.get("graphql", {}).get("hashtag", {})
        or data.get("data", {}).get("hashtag", {})
    )
    if not hashtag_data:
        return {"error": "Could not parse hashtag data"}
    top_posts = []
    edges = hashtag_data.get("edge_hashtag_to_top_posts", {}).get("edges", [])
    for edge in edges[:9]:
        node = edge["node"]
        caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
        top_posts.append({
            "shortcode": node.get("shortcode"),
            "likes": node.get("edge_liked_by", {}).get("count", 0),
            "comments": node.get("edge_media_to_comment", {}).get("count", 0),
            "caption": (
                caption_edges[0]["node"]["text"][:200]
                if caption_edges else None
            ),
            "is_video": node.get("is_video", False),
        })
    return {
        "hashtag": tag,
        "post_count": hashtag_data.get("edge_hashtag_to_media", {}).get("count", 0),
        "top_posts": top_posts,
    }

result = scrape_hashtag("webdevelopment")
if "error" not in result:
    print(f"#{result['hashtag']}: {result.get('post_count', 0):,} posts")
else:
    print(result["error"])
```
Method 2: Playwright (Headless Browser)
Instagram is a JavaScript-heavy single-page app. Playwright renders the full page, handles login walls and infinite scroll, and gives you access to all visible content – including stories, reels, and dynamically loaded posts.
Install
```bash
pip install playwright playwright-stealth
playwright install chromium
```
Full-Render Instagram Scraper
```python
# playwright_instagram.py
import asyncio
import json

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scrape_instagram_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()
        await stealth_async(page)

        # Block media for speed
        await page.route("**/*.{mp4,webm,ogg}", lambda route: route.abort())

        url = f"https://www.instagram.com/{username}/"
        await page.goto(url, wait_until="networkidle")

        # Dismiss login popup if it appears
        try:
            close_btn = page.locator(
                '[aria-label="Close"], button:has-text("Not Now")'
            )
            await close_btn.first.click(timeout=3000)
        except Exception:
            pass

        await page.wait_for_timeout(2000)

        # Extract profile data from the rendered page
        profile = await page.evaluate("""() => {
            const getMeta = (prop) => {
                const el = document.querySelector(`meta[property="${prop}"]`);
                return el ? el.content : null;
            };
            // Parse follower counts from header
            const stats = document.querySelectorAll('header section ul li');
            const parseCount = (el) => {
                if (!el) return 0;
                const text = el.textContent.replace(/,/g, '');
                const match = text.match(/([\d.]+)\s*(K|M|B)?/i);
                if (!match) return 0;
                let num = parseFloat(match[1]);
                const suffix = (match[2] || '').toUpperCase();
                if (suffix === 'K') num *= 1000;
                if (suffix === 'M') num *= 1000000;
                if (suffix === 'B') num *= 1000000000;
                return Math.round(num);
            };
            // Get post thumbnails
            const posts = [];
            const articles = document.querySelectorAll(
                'article img[srcset], main img[srcset]'
            );
            articles.forEach((img, i) => {
                if (i < 12) {
                    posts.push({ image: img.src, alt: img.alt || '' });
                }
            });
            return {
                title: getMeta('og:title'),
                description: getMeta('og:description'),
                image: getMeta('og:image'),
                posts_count: stats[0] ? parseCount(stats[0]) : 0,
                followers: stats[1] ? parseCount(stats[1]) : 0,
                following: stats[2] ? parseCount(stats[2]) : 0,
                posts: posts,
            };
        }""")

        # Parse bio from the meta description
        desc = profile.get("description", "") or ""
        bio_parts = desc.split("on Instagram: ")
        bio = bio_parts[1] if len(bio_parts) > 1 else ""
        if bio.startswith('"') and bio.endswith('"'):
            bio = bio[1:-1]

        profile["username"] = username
        profile["bio"] = bio
        profile["url"] = url

        await browser.close()
        return profile

# Run it
data = asyncio.run(scrape_instagram_profile("natgeo"))
print(json.dumps(data, indent=2))
```
Scroll & Collect Posts
```python
# scroll_posts.py
import asyncio
import random

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scrape_all_posts(username: str, max_posts: int = 50) -> list:
    """Scroll through a profile and collect post links."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await stealth_async(page)
        await page.goto(
            f"https://www.instagram.com/{username}/",
            wait_until="networkidle",
        )
        # Dismiss popups
        try:
            await page.click('button:has-text("Not Now")', timeout=3000)
        except Exception:
            pass

        posts = set()
        prev_count = 0
        scroll_attempts = 0
        while len(posts) < max_posts and scroll_attempts < 20:
            # Collect post links currently in the DOM
            links = await page.evaluate("""() => {
                return [...document.querySelectorAll('a[href*="/p/"]')]
                    .map(a => a.href);
            }""")
            posts.update(links)
            if len(posts) == prev_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
                prev_count = len(posts)
            # Scroll down to trigger lazy loading
            await page.evaluate("window.scrollBy(0, window.innerHeight * 2)")
            await page.wait_for_timeout(random.randint(1500, 3000))

        await browser.close()
        return list(posts)[:max_posts]
```
Instagram shows more data to logged-in users. You can export cookies from a real browser session (use a burner account) and load them into Playwright with context.add_cookies(). This unlocks additional post data and higher rate limits – but it also increases the risk of account suspension.
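The cookie hand-off needs a small translation step, since browser-extension exports don't quite match Playwright's cookie schema. The helper below is a hypothetical converter, assuming the common export format where expiry is stored under an `expirationDate` key:

```python
def to_playwright_cookies(exported: list) -> list:
    """Convert browser-extension cookie exports into the shape
    context.add_cookies() expects: name, value, domain, path,
    plus optional expires/secure/httpOnly."""
    cookies = []
    for c in exported:
        cookie = {
            "name": c["name"],
            "value": c["value"],
            "domain": c.get("domain", ".instagram.com"),
            "path": c.get("path", "/"),
        }
        if "expirationDate" in c:  # extension exports use this key
            cookie["expires"] = int(c["expirationDate"])
        if c.get("secure"):
            cookie["secure"] = True
        if c.get("httpOnly"):
            cookie["httpOnly"] = True
        cookies.append(cookie)
    return cookies

# Usage inside an async Playwright session:
# with open("cookies.json") as f:
#     await context.add_cookies(to_playwright_cookies(json.load(f)))
```

The sessionid cookie is the one that actually authenticates you; treat the exported file like a password.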
Method 3: Node.js + Puppeteer
Puppeteer provides headless Chrome control from Node.js – ideal for building Instagram scraping into backend services, serverless functions, or data pipelines.
Install
```bash
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
```
Profile Scraper with Stealth
```javascript
// instagram-scraper.mjs
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

async function scrapeProfile(username) {
  const browser = await puppeteer.launch({
    headless: "new",
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Block heavy resources
  await page.setRequestInterception(true);
  page.on("request", (req) => {
    const type = req.resourceType();
    if (["video", "font"].includes(type)) {
      req.abort();
    } else {
      req.continue();
    }
  });

  // Intercept the GraphQL response
  let profileData = null;
  page.on("response", async (resp) => {
    const url = resp.url();
    if (url.includes("web_profile_info") || url.includes("graphql/query")) {
      try {
        const json = await resp.json();
        if (json?.data?.user) {
          profileData = json.data.user;
        }
      } catch (e) {
        // Not JSON
      }
    }
  });

  await page.goto(`https://www.instagram.com/${username}/`, {
    waitUntil: "networkidle0",
    timeout: 30000,
  });

  // Dismiss login modal (Puppeteer's ::-p-text selector, v19+)
  try {
    const notNow = await page.waitForSelector("::-p-text(Not Now)", {
      timeout: 3000,
    });
    if (notNow) await notNow.click();
  } catch (e) {}

  await new Promise((r) => setTimeout(r, 3000));

  if (!profileData) {
    // Fallback: parse from the page's meta tags
    profileData = await page.evaluate(() => {
      const desc =
        document.querySelector('meta[property="og:description"]')?.content ||
        "";
      const match = desc.match(
        /([\d,.KMB]+) Followers, ([\d,.KMB]+) Following, ([\d,.KMB]+) Posts/i
      );
      return {
        username:
          document
            .querySelector('meta[property="og:title"]')
            ?.content?.split("(")[0]
            .trim() || "",
        description: desc,
        followers_text: match?.[1] || "0",
        following_text: match?.[2] || "0",
        posts_text: match?.[3] || "0",
      };
    });
  }

  const result = {
    username: profileData.username || username,
    full_name: profileData.full_name || null,
    bio: profileData.biography || null,
    external_url: profileData.external_url || null,
    followers:
      profileData.edge_followed_by?.count || profileData.follower_count || 0,
    following:
      profileData.edge_follow?.count || profileData.following_count || 0,
    posts_count:
      profileData.edge_owner_to_timeline_media?.count ||
      profileData.media_count ||
      0,
    is_verified: profileData.is_verified || false,
    is_private: profileData.is_private || false,
    is_business: profileData.is_business_account || false,
    category: profileData.category_name || null,
    profile_pic: profileData.profile_pic_url_hd || null,
  };

  await browser.close();
  return result;
}

// Batch scrape with rate limiting
async function scrapeMultiple(usernames, delayMs = 8000) {
  const results = [];
  for (const username of usernames) {
    try {
      const profile = await scrapeProfile(username);
      results.push(profile);
      console.log(
        `✓ @${profile.username} – ` +
          `${profile.followers.toLocaleString()} followers`
      );
    } catch (err) {
      console.error(`✗ @${username}: ${err.message}`);
      results.push({ username, error: err.message });
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return results;
}

// Usage
const profiles = await scrapeMultiple(["natgeo", "nasa", "nike"]);
console.log(JSON.stringify(profiles, null, 2));
```
Method 4: Web Scraping API (Easiest)
The most reliable approach for production Instagram scraping. A web scraping API handles proxy rotation, login walls, browser rendering, and anti-bot detection – you send a URL and get structured data back.
Using the Mantis API
```python
# One API call – structured Instagram data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.instagram.com/natgeo/",
        "extract": {
            "username": "Instagram username",
            "full_name": "display name",
            "bio": "profile biography text",
            "followers": "follower count as integer",
            "following": "following count as integer",
            "posts_count": "total number of posts",
            "is_verified": "whether account is verified",
            "is_business": "whether it's a business account",
            "category": "business category if applicable",
            "external_url": "website URL from bio",
            "recent_posts": (
                "array of recent posts with: "
                "caption, likes, comments, image_url, "
                "is_video, timestamp"
            ),
        },
        "render_js": True,
    },
)
profile = resp.json()
print(f"@{profile.get('username')} – "
      f"{profile.get('followers', 0):,} followers")
```
Skip the Login Walls & Blocks
Mantis handles Instagram's anti-bot detection, login popups, proxy rotation, and JavaScript rendering – so you don't have to.
Node.js with Mantis
```javascript
// mantis-instagram.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.instagram.com/nike/",
    extract: {
      username: "Instagram handle",
      followers: "follower count as number",
      bio: "profile bio text",
      recent_posts:
        "array of last 12 posts with caption, " +
        "likes, comments, is_video",
    },
    render_js: true,
  }),
});
const profile = await resp.json();
console.log(profile);
```
Beating Instagram's Anti-Bot Detection
Instagram (Meta) has some of the most sophisticated anti-scraping defenses of any social platform. Here's what you're up against:
Instagram's Defense Layers
| Defense | What It Does | Countermeasure |
|---|---|---|
| Login Wall | Redirects to login after a few page views | Cookie rotation, session management |
| Rate Limiting | 429 errors after rapid requests | Rotating residential proxies, long delays |
| Checkpoint Challenges | Phone/email verification prompts | Avoid logged-in scraping, use APIs |
| Device ID Tracking | Fingerprints devices across sessions | Fresh browser contexts, randomized fingerprints |
| IP Reputation | Blocks datacenter IPs and known proxy ranges | Residential or mobile proxies only |
| Browser Fingerprinting | Detects automation via WebDriver, plugins | Stealth plugins (playwright-stealth, puppeteer-extra) |
| API Endpoint Changes | Moves/renames internal API endpoints | Monitor changes, maintain multiple fallbacks |
| GraphQL Query Hash Rotation | Changes query hashes for GraphQL endpoints | Extract hashes from page source dynamically |
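The last row – extracting GraphQL query hashes from the page source – can be sketched with a regex over Instagram's bundled JavaScript. The key names (`queryId`, `query_hash`) below reflect patterns that have historically appeared in the bundles, but treat them as assumptions that need re-verifying against the current page source:

```python
import re

def extract_query_hashes(page_source: str) -> list:
    """Pull 32-character hex query hashes out of inline JS,
    preserving order and de-duplicating."""
    pattern = r'(?:queryId|query_hash)\s*:\s*"([0-9a-f]{32})"'
    return list(dict.fromkeys(re.findall(pattern, page_source)))

# Toy example with a fabricated hash:
html = (
    'a({queryId:"472f257a40c653c64c666ce877d59d2b"});'
    'b({query_hash:"472f257a40c653c64c666ce877d59d2b"})'
)
print(extract_query_hashes(html))  # -> ['472f257a40c653c64c666ce877d59d2b']
```

In practice you would fetch the profile page HTML, locate the linked JS bundles, and run this over each bundle, refreshing the hashes whenever a query starts returning errors.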
Essential Anti-Detection Techniques
```python
# instagram_stealth.py
import random
import time

import requests

PROXY_POOL = [
    # Use RESIDENTIAL proxies only –
    # datacenter IPs are instantly blocked
    "http://user:pass@res-proxy1.example.com:8080",
    "http://user:pass@res-proxy2.example.com:8080",
]

def instagram_delay():
    """Instagram requires longer delays than most sites."""
    base = random.uniform(5, 15)
    # Occasionally take a longer break
    if random.random() < 0.1:
        base += random.uniform(30, 60)
    time.sleep(base)

def is_rate_limited(response) -> bool:
    """Detect Instagram rate limiting."""
    if response.status_code == 429:
        return True
    if response.status_code == 401:
        return True
    if "checkpoint_required" in response.text:
        return True
    if "login" in response.url and "instagram.com" in response.url:
        return True
    return False

def get_fresh_session():
    """Create a session with a residential proxy."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice([
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.5 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
        ]),
        "Accept-Language": "en-US,en;q=0.9",
        "X-IG-App-ID": "936619743392459",
    })
    return session
```
Instagram blocks virtually all datacenter IP ranges. You must use residential or mobile proxies for any meaningful scraping. This is the #1 reason DIY Instagram scrapers fail. A managed API like Mantis handles proxy infrastructure for you.
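When a request does get rate-limited, sleep with exponential backoff plus jitter before retrying on a fresh session. A minimal sketch – the base and cap values here are illustrative, not tuned recommendations:

```python
import random

def backoff_delay(attempt: int, base: float = 10.0, cap: float = 300.0) -> float:
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# attempt 0 -> up to 10s, attempt 3 -> up to 80s, attempt 5+ -> capped at 300s
for attempt in range(6):
    print(f"attempt {attempt}: sleep up to {min(300.0, 10.0 * 2 ** attempt)}s, "
          f"e.g. {backoff_delay(attempt):.1f}s")
```

Full jitter (randomizing over the whole window rather than adding a small offset) spreads retries out so a fleet of scrapers doesn't hammer Instagram in synchronized waves.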
Instagram Graph API vs Scraping
Meta offers the Instagram Graph API; its other official option, the Basic Display API, was deprecated in December 2024. Here's how they compare:
| Feature | Graph API | Basic Display (Deprecated) | Web Scraping | Mantis API |
|---|---|---|---|---|
| Access | Business/Creator accounts only | Shut down Dec 2024 | Any public profile | Any public profile |
| Requires App Review | Yes (Meta approval) | N/A | No | No |
| Rate Limits | 200 calls/user/hour | N/A | Depends on proxies | Based on plan |
| Discover/Search Profiles | ✗ Only owned accounts | N/A | ✓ Any public profile | ✓ Any public profile |
| Follower Count | ✓ Own account only | N/A | ✓ Any public profile | ✓ Any public profile |
| Post Content | ✓ Own posts only | N/A | ✓ Any public posts | ✓ Any public posts |
| Competitor Data | ✗ | N/A | ✓ | ✓ |
| Hashtag Search | Limited (30 unique/7 days) | N/A | ✓ Unlimited | ✓ Unlimited |
| Reliability | High (official) | N/A | Medium (endpoints change) | High (maintained) |
| Cost | Free (but limited) | N/A | $100-500+/mo (proxies) | $0-299/mo |
Instagram's official API is designed for managing your own account – not for discovering or analyzing other accounts. If you need competitor data, influencer analytics, or market research across multiple profiles, scraping or a web scraping API is your only option.
Method Comparison
| Criteria | Python + Requests | Playwright | Node.js + Puppeteer | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 10 min | 2 min |
| JS Rendering | ✗ (API endpoints) | ✓ | ✓ | ✓ |
| Anti-Detection | Low (easily blocked) | Good (with stealth) | Good (with stealth) | Built-in |
| Speed | Fast | Slow | Slow | Medium |
| Maintenance | Very High (endpoints change) | High | High | None |
| Scale | Low | Low-Medium | Low-Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies) | $200-500 (proxies + compute) | $200-500 (proxies + compute) | $99 (Pro plan) |
| Best For | Quick prototypes | Rich data extraction | Backend services | Production |
Real-World Use Cases
1. Influencer Analytics Dashboard
Build a tool that evaluates influencer accounts โ engagement rate, posting frequency, audience quality โ before sponsorship deals.
```python
# influencer_analytics.py
import requests

MANTIS_KEY = "YOUR_API_KEY"

def analyze_influencer(username: str) -> dict:
    """Calculate engagement metrics for an influencer."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "username": "Instagram username",
                "followers": "follower count as integer",
                "following": "following count as integer",
                "posts_count": "total posts as integer",
                "is_verified": "boolean",
                "bio": "biography text",
                "recent_posts": (
                    "array of last 12 posts with: "
                    "likes (integer), comments (integer), "
                    "caption (string), is_video (boolean)"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()
    posts = data.get("recent_posts", [])
    followers = data.get("followers", 1)

    if posts and followers > 0:
        total_engagement = sum(
            p.get("likes", 0) + p.get("comments", 0) for p in posts
        )
        avg_engagement = total_engagement / len(posts)
        engagement_rate = (avg_engagement / followers) * 100
        avg_likes = sum(p.get("likes", 0) for p in posts) / len(posts)
        avg_comments = sum(p.get("comments", 0) for p in posts) / len(posts)
        video_ratio = sum(1 for p in posts if p.get("is_video")) / len(posts)
    else:
        engagement_rate = 0
        avg_likes = 0
        avg_comments = 0
        video_ratio = 0

    # Engagement rate benchmarks
    if engagement_rate > 6:
        tier = "Excellent"
    elif engagement_rate > 3:
        tier = "Good"
    elif engagement_rate > 1:
        tier = "Average"
    else:
        tier = "Below Average"

    # Follower/following ratio (health signal)
    ff_ratio = followers / max(data.get("following", 1), 1)

    return {
        "username": username,
        "followers": followers,
        "engagement_rate": round(engagement_rate, 2),
        "engagement_tier": tier,
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "video_ratio": round(video_ratio * 100, 1),
        "ff_ratio": round(ff_ratio, 1),
        "verified": data.get("is_verified", False),
        "estimated_post_value": (
            f"${round(followers * engagement_rate / 100 * 0.05, 2)}"
        ),
    }

# Evaluate multiple influencers
influencers = ["natgeo", "nike", "airbnb"]
for username in influencers:
    result = analyze_influencer(username)
    print(f"\n@{result['username']}:")
    print(f"  Followers: {result['followers']:,}")
    print(f"  Engagement: {result['engagement_rate']}% "
          f"({result['engagement_tier']})")
    print(f"  Avg Likes: {result['avg_likes']:,}")
    print(f"  Video Mix: {result['video_ratio']}%")
    print(f"  Est. Post Value: {result['estimated_post_value']}")
```
2. Brand Mention Monitor
Track when your brand is mentioned in Instagram posts and hashtags – essential for reputation management and identifying UGC opportunities.
```python
# brand_monitor.py
import json
from datetime import datetime, timezone

import requests

MANTIS_KEY = "YOUR_API_KEY"
BRAND_HASHTAGS = ["mantisapi", "mantis_api", "webscrapingapi"]
COMPETITOR_ACCOUNTS = ["scrapingbee", "brightdata", "apify_official"]

def monitor_hashtag(tag: str) -> dict:
    """Check a hashtag for recent brand mentions."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/explore/tags/{tag}/",
            "extract": {
                "post_count": "total posts with this hashtag",
                "top_posts": (
                    "array of top 9 posts with: "
                    "author, caption, likes, comments, post_url"
                ),
            },
            "render_js": True,
        },
    )
    return {"hashtag": tag, **resp.json()}

def monitor_competitor(username: str) -> dict:
    """Check a competitor's recent posts."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "followers": "follower count",
                "recent_posts": "last 6 posts: caption, likes, comments",
            },
            "render_js": True,
        },
    )
    return {"competitor": username, **resp.json()}

# Daily monitoring run
now = datetime.now(timezone.utc)
report = {
    "timestamp": now.isoformat(),
    "hashtags": [monitor_hashtag(tag) for tag in BRAND_HASHTAGS],
    "competitors": [
        monitor_competitor(acc) for acc in COMPETITOR_ACCOUNTS
    ],
}

# Save the daily report
date = now.strftime("%Y-%m-%d")
with open(f"instagram_report_{date}.json", "w") as f:
    json.dump(report, f, indent=2)
print(f"Report saved: instagram_report_{date}.json")
```
3. AI Agent Social Intelligence
Give an AI agent the ability to research brands, influencers, and trends on Instagram – a key capability for marketing AI assistants.
```python
# agent_instagram.py – LangChain tool
import requests
from langchain.tools import tool

MANTIS_KEY = "YOUR_API_KEY"

@tool
def research_instagram_account(username: str) -> str:
    """Research an Instagram account. Returns profile info,
    engagement metrics, and recent post activity."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "full_name": "display name",
                "bio": "biography",
                "followers": "follower count",
                "following": "following count",
                "posts_count": "total posts",
                "is_verified": "verified status",
                "category": "business category",
                "website": "external URL",
                "recent_posts": (
                    "last 6 posts: caption (first 100 chars), "
                    "likes, comments, is_video"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()
    posts = data.get("recent_posts", [])
    followers = data.get("followers", 0)

    if posts and followers:
        avg_eng = sum(
            p.get("likes", 0) + p.get("comments", 0) for p in posts
        ) / len(posts)
        eng_rate = round((avg_eng / followers) * 100, 2)
    else:
        eng_rate = "N/A"

    result = f"Instagram Profile: @{username}\n"
    result += f"Name: {data.get('full_name', 'N/A')}\n"
    result += f"Bio: {data.get('bio', 'N/A')}\n"
    result += (
        f"Followers: {data.get('followers', 0):,} | "
        f"Following: {data.get('following', 0):,}\n"
    )
    result += f"Posts: {data.get('posts_count', 0):,}\n"
    result += f"Verified: {data.get('is_verified', False)}\n"
    result += f"Category: {data.get('category', 'N/A')}\n"
    result += f"Website: {data.get('website', 'N/A')}\n"
    result += f"Engagement Rate: {eng_rate}%\n\n"

    if posts:
        result += "Recent Posts:\n"
        for i, p in enumerate(posts[:3], 1):
            caption = (
                p.get("caption", "")[:80] + "..."
                if p.get("caption") else "[no caption]"
            )
            result += (
                f"  {i}. {caption}\n"
                f"     ❤️ {p.get('likes', 0):,} | "
                f"💬 {p.get('comments', 0):,}\n"
            )
    return result

# Use in a LangChain agent
# agent = create_agent(
#     tools=[research_instagram_account], ...
# )
```
Legal Considerations
Instagram scraping carries more legal risk than most platforms due to Meta's aggressive enforcement. Key considerations:
- Meta's Terms of Service – Explicitly prohibit automated data collection. Meta has filed lawsuits against multiple scraping companies (including a $500M+ judgment against Voyager Labs in 2023)
- hiQ Labs v. LinkedIn (2022) – The Ninth Circuit ruled that scraping public data doesn't violate the CFAA, but this case involved LinkedIn, not Meta, and Meta's ToS are more aggressive
- Van Buren v. United States (2021) – The Supreme Court narrowed the CFAA's "exceeds authorized access" clause, which supports scraping public data but doesn't address contract/ToS claims
- Meta v. Voyager Labs (2023) – Meta won a massive judgment against a company that created fake accounts to scrape Instagram. Creating fake accounts crosses clear legal lines
- GDPR (EU) – Instagram profiles contain personal data (names, photos, locations). Scraping EU user data for commercial purposes without consent likely violates the GDPR
- CCPA (California) – Similar obligations apply to California residents' personal data
- Photos & Videos – Visual content is copyrighted by its creator. Scraping metadata is different from republishing content
Only scrape publicly visible profile data (don't create fake accounts). Don't store or republish photos/videos. Respect private accounts. Don't scrape personal data of EU residents for commercial use without a legal basis. Use rate limiting to avoid disrupting the service. Consult a lawyer for commercial use cases. Consider a web scraping API that handles compliance considerations.
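The "use rate limiting" advice above can be made concrete with a simple token-bucket throttle. The class below is a generic sketch; the capacity and refill values in the usage line are illustrative, not recommended limits for Instagram:

```python
import time

class Throttle:
    """Token bucket: allow at most `rate` requests per `per` seconds.
    The clock is injectable so the bucket can be tested deterministically."""

    def __init__(self, rate: int, per: float, clock=time.monotonic):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_per_sec = rate / per
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        # Refill tokens based on time elapsed, then spend one if available
        now = self.clock()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Illustrative: at most 10 profile requests per minute
throttle = Throttle(rate=10, per=60)
allowed = sum(throttle.allow() for _ in range(15))
print(f"{allowed} of 15 burst requests allowed")
```

Wrap each scrape call in `if throttle.allow():` and sleep otherwise; the bucket smooths bursts without needing a background scheduler.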
Production-Ready Instagram Scraping
Stop fighting login walls, proxy blocks, and endpoint changes. Mantis extracts structured Instagram data with a single API call.
Frequently Asked Questions
Is it legal to scrape Instagram data?
Scraping publicly available Instagram data is in a legal gray area. While hiQ v. LinkedIn supports scraping public data, Meta aggressively enforces its ToS and has sued scrapers. Avoid creating fake accounts, respect private profiles, and consult legal counsel for commercial use.
How do I scrape Instagram without getting blocked?
Use rotating residential proxies (datacenter IPs are instantly blocked), headless browsers with stealth plugins, delays of 5-20 seconds between requests, and fresh browser contexts. Or use a web scraping API like Mantis that handles anti-blocking automatically.
Can I use Instagram's official API instead of scraping?
The Instagram Graph API only works with accounts you own or that authorize your app, requires Meta app review, and doesn't support discovering or analyzing other public profiles. The Basic Display API was deprecated in December 2024.
What data can I extract from Instagram?
From public profiles: username, full name, bio, follower/following counts, post count, profile picture, recent posts with captions/likes/comments, reels, tagged locations, and hashtags. Private accounts only show basic profile info.
What Python library is best for scraping Instagram?
Playwright with stealth plugins is most reliable since Instagram requires JS rendering. For quick prototypes, the web_profile_info API endpoint works but changes frequently. For production, use a managed API like Mantis.
How many Instagram profiles can I scrape per day?
Without proxies: 20-50 before blocking. With residential proxies: 500-2,000. With Mantis API: up to 100,000/month on the Scale plan without managing any infrastructure.
Related Guides
- How to Scrape Google Search Results in 2026
- How to Scrape Amazon Product Data in 2026
- How to Scrape LinkedIn Profiles & Jobs in 2026
- How to Scrape Twitter (X) Data in 2026
- Web Scraping with Python: Complete Guide
- Web Scraping with Node.js: Complete Guide
- Anti-Blocking Web Scraping Guide
- Best Web Scraping APIs in 2026