🔱 Mantis

How to Scrape Instagram Data in 2026: Posts, Profiles & Reels

Extract profiles, posts, reels, hashtags, and follower data using Python, Node.js, and API-based approaches, with production-ready code.


Why Scrape Instagram Data?

Instagram has over 2 billion monthly active users, making it one of the richest sources of social data on the internet. Businesses, researchers, and developers scrape Instagram for:

- Influencer analytics: vetting engagement rates and audience quality before sponsorship deals
- Brand monitoring: tracking mentions, hashtags, and user-generated content
- Competitor research: following rivals' posting cadence and engagement
- Market and trend research: analyzing hashtag volume and content performance
- AI agents: giving assistants social intelligence capabilities

Instagram's official API is extremely limited: it only works with accounts you own or that authorize your app. For public data at scale, scraping is often the only viable option.

What Data Can You Extract?

Instagram profiles and posts contain rich data, though access depends on privacy settings:

| Data Point | Public Profiles | Private Profiles |
| --- | --- | --- |
| Username & Full Name | ✅ | ✅ |
| Bio & External URL | ✅ | ✅ |
| Profile Picture (HD) | ✅ | ✅ |
| Follower / Following Count | ✅ | ✅ |
| Post Count | ✅ | ✅ |
| Verified Badge | ✅ | ✅ |
| Business Category | ✅ | ✅ |
| Recent Posts (12-50) | ✅ | ❌ |
| Post Captions | ✅ | ❌ |
| Like / Comment Counts | ✅ | ❌ |
| Post Images / Videos | ✅ | ❌ |
| Reels | ✅ | ❌ |
| Tagged Locations | ✅ | ❌ |
| Hashtags Used | ✅ | ❌ |
| Comment Text | ✅ (limited) | ❌ |

Method 1: Python + Requests (Public API Endpoints)

Instagram serves profile data through internal GraphQL endpoints that return JSON. These endpoints are undocumented and change frequently, but they're faster than full browser rendering.

Install Dependencies

pip install requests

Public Profile Scraper

# instagram_scraper.py
import requests
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/125.0.0.0 Safari/537.36",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
    "X-IG-App-ID": "936619743392459",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://www.instagram.com/",
}

def scrape_instagram_profile(username: str) -> dict:
    """Scrape public Instagram profile data via web API."""
    url = "https://www.instagram.com/api/v1/users/web_profile_info/"
    params = {"username": username}

    resp = requests.get(
        url, headers=HEADERS, params=params, timeout=15
    )

    if resp.status_code == 404:
        return {"error": f"User '{username}' not found"}
    if resp.status_code != 200:
        return {"error": f"Request failed: {resp.status_code}"}

    data = resp.json()
    user = data.get("data", {}).get("user", {})

    if not user:
        return {"error": "Could not parse user data"}

    # Extract recent posts
    posts = []
    edges = (
        user.get("edge_owner_to_timeline_media", {})
        .get("edges", [])
    )
    for edge in edges[:12]:
        node = edge.get("node", {})
        caption_edges = (
            node.get("edge_media_to_caption", {})
            .get("edges", [])
        )
        posts.append({
            "id": node.get("shortcode"),
            "url": (
                f"https://www.instagram.com/p/"
                f"{node.get('shortcode')}/"
            ),
            "image": node.get("display_url"),
            "caption": (
                caption_edges[0]["node"]["text"]
                if caption_edges else None
            ),
            "likes": (
                node.get("edge_liked_by", {}).get("count", 0)
            ),
            "comments": (
                node.get("edge_media_to_comment", {})
                .get("count", 0)
            ),
            "timestamp": node.get("taken_at_timestamp"),
            "is_video": node.get("is_video", False),
            "video_views": node.get("video_view_count"),
        })

    return {
        "username": user.get("username"),
        "full_name": user.get("full_name"),
        "bio": user.get("biography"),
        "external_url": user.get("external_url"),
        "followers": (
            user.get("edge_followed_by", {}).get("count", 0)
        ),
        "following": (
            user.get("edge_follow", {}).get("count", 0)
        ),
        "posts_count": (
            user.get("edge_owner_to_timeline_media", {})
            .get("count", 0)
        ),
        "is_verified": user.get("is_verified", False),
        "is_business": user.get("is_business_account", False),
        "business_category": user.get("category_name"),
        "profile_pic_hd": user.get("profile_pic_url_hd"),
        "is_private": user.get("is_private", False),
        "recent_posts": posts,
    }

# Example usage
profile = scrape_instagram_profile("natgeo")
print(json.dumps(profile, indent=2))
print(f"\nFollowers: {profile.get('followers', 0):,}")
print(f"Posts: {profile.get('posts_count', 0):,}")
โš ๏ธ Important: Endpoint Instability

Instagram's internal API endpoints change frequently without notice. The web_profile_info endpoint works as of early 2026, but Meta may modify or remove it at any time. Always have a fallback strategy, such as browser-based scraping or a managed API.
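One way to structure that fallback is a small wrapper that tries the fast endpoint first (a sketch: `primary` and `fallback` stand in for whichever two scrapers you maintain, e.g. the function above plus a browser-based one):

```python
def scrape_with_fallback(username, primary, fallback):
    """Try the fast endpoint-based scraper first; on an error dict
    or an exception, fall back to the slower strategy."""
    try:
        result = primary(username)
        if isinstance(result, dict) and "error" not in result:
            return result
    except Exception:
        pass  # endpoint changed, got blocked, or returned HTML
    return fallback(username)
```

Usage might look like `scrape_with_fallback("natgeo", scrape_instagram_profile, browser_scraper)`, where `browser_scraper` is a hypothetical Playwright-based equivalent.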

Hashtag Explorer

# hashtag_scraper.py
import json

import requests

# Reuse the browser-like headers defined in instagram_scraper.py
from instagram_scraper import HEADERS

def scrape_hashtag(tag: str) -> dict:
    """Scrape top posts from an Instagram hashtag page."""
    url = f"https://www.instagram.com/explore/tags/{tag}/"
    params = {"__a": "1", "__d": "dis"}

    resp = requests.get(
        url, headers=HEADERS, params=params, timeout=15
    )

    if resp.status_code != 200:
        return {"error": f"Failed to fetch #{tag}"}

    try:
        data = resp.json()
    except json.JSONDecodeError:
        return {"error": "Instagram returned HTML (likely blocked)"}

    hashtag_data = (
        data.get("graphql", {}).get("hashtag", {})
        or data.get("data", {}).get("hashtag", {})
    )

    if not hashtag_data:
        return {"error": "Could not parse hashtag data"}

    top_posts = []
    edges = (
        hashtag_data.get("edge_hashtag_to_top_posts", {})
        .get("edges", [])
    )
    for edge in edges[:9]:
        node = edge["node"]
        caption_edges = (
            node.get("edge_media_to_caption", {})
            .get("edges", [])
        )
        top_posts.append({
            "shortcode": node.get("shortcode"),
            "likes": (
                node.get("edge_liked_by", {}).get("count", 0)
            ),
            "comments": (
                node.get("edge_media_to_comment", {})
                .get("count", 0)
            ),
            "caption": (
                caption_edges[0]["node"]["text"][:200]
                if caption_edges else None
            ),
            "is_video": node.get("is_video", False),
        })

    return {
        "hashtag": tag,
        "post_count": (
            hashtag_data.get("edge_hashtag_to_media", {})
            .get("count", 0)
        ),
        "top_posts": top_posts,
    }

result = scrape_hashtag("webdevelopment")
if "error" in result:
    print(result["error"])
else:
    print(f"#{result['hashtag']}: {result['post_count']:,} posts")

Method 2: Playwright (Headless Browser)

Instagram is a JavaScript-heavy single-page app. Playwright renders the full page, handles login walls and infinite scroll, and gives you access to all visible content, including stories, reels, and dynamically loaded posts.

Install

pip install playwright playwright-stealth
playwright install chromium

Full-Render Instagram Scraper

# playwright_instagram.py
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async
import json

async def scrape_instagram_profile(username: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )

        page = await context.new_page()
        await stealth_async(page)

        # Block media for speed
        await page.route(
            "**/*.{mp4,webm,ogg}",
            lambda route: route.abort()
        )

        url = f"https://www.instagram.com/{username}/"
        await page.goto(url, wait_until="networkidle")

        # Dismiss login popup if it appears
        try:
            close_btn = page.locator(
                '[aria-label="Close"],'
                'button:has-text("Not Now")'
            )
            await close_btn.click(timeout=3000)
        except Exception:
            pass

        await page.wait_for_timeout(2000)

        # Extract profile data from the page
        profile = await page.evaluate(r"""() => {
            const getMeta = (prop) => {
                const el = document.querySelector(
                    `meta[property="${prop}"]`
                );
                return el ? el.content : null;
            };

            // Parse follower counts from header
            const stats = document.querySelectorAll(
                'header section ul li'
            );
            const parseCount = (el) => {
                if (!el) return 0;
                const text = el.textContent.replace(/,/g, '');
                const match = text.match(
                    /([\d.]+)\s*(K|M|B)?/i
                );
                if (!match) return 0;
                let num = parseFloat(match[1]);
                const suffix = (match[2] || '').toUpperCase();
                if (suffix === 'K') num *= 1000;
                if (suffix === 'M') num *= 1000000;
                if (suffix === 'B') num *= 1000000000;
                return Math.round(num);
            };

            // Get post thumbnails
            const posts = [];
            const articles = document.querySelectorAll(
                'article img[srcset], main img[srcset]'
            );
            articles.forEach((img, i) => {
                if (i < 12) {
                    posts.push({
                        image: img.src,
                        alt: img.alt || '',
                    });
                }
            });

            return {
                title: getMeta('og:title'),
                description: getMeta('og:description'),
                image: getMeta('og:image'),
                posts_count: stats[0]
                    ? parseCount(stats[0]) : 0,
                followers: stats[1]
                    ? parseCount(stats[1]) : 0,
                following: stats[2]
                    ? parseCount(stats[2]) : 0,
                posts: posts,
            };
        }""")

        # Parse bio from meta description
        desc = profile.get("description", "") or ""
        bio_parts = desc.split("on Instagram: ")
        bio = bio_parts[1] if len(bio_parts) > 1 else ""
        if bio.startswith('"') and bio.endswith('"'):
            bio = bio[1:-1]

        profile["username"] = username
        profile["bio"] = bio
        profile["url"] = url

        await browser.close()
        return profile

# Run it
data = asyncio.run(scrape_instagram_profile("natgeo"))
print(json.dumps(data, indent=2))

Scroll & Collect Posts

# scroll_posts.py
import asyncio
import random

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async
async def scrape_all_posts(
    username: str, max_posts: int = 50
) -> list:
    """Scroll through a profile and collect post data."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await stealth_async(page)

        await page.goto(
            f"https://www.instagram.com/{username}/",
            wait_until="networkidle",
        )

        # Dismiss popups
        try:
            await page.click(
                'button:has-text("Not Now")', timeout=3000
            )
        except Exception:
            pass

        posts = set()
        prev_count = 0
        scroll_attempts = 0

        while len(posts) < max_posts and scroll_attempts < 20:
            # Collect post links
            links = await page.evaluate("""() => {
                return [...document.querySelectorAll(
                    'a[href*="/p/"]'
                )].map(a => a.href);
            }""")
            posts.update(links)

            if len(posts) == prev_count:
                scroll_attempts += 1
            else:
                scroll_attempts = 0
            prev_count = len(posts)

            # Scroll down
            await page.evaluate(
                "window.scrollBy(0, window.innerHeight * 2)"
            )
            await page.wait_for_timeout(
                random.randint(1500, 3000)
            )

        await browser.close()
        return list(posts)[:max_posts]
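The scraper above returns full post URLs; a small helper can pull out the shortcodes for follow-up requests (a sketch; public post URLs follow the /p/&lt;shortcode&gt;/ pattern):

```python
import re

def extract_shortcodes(post_urls):
    """Pull the shortcode out of each /p/<shortcode>/ post URL."""
    codes = []
    for url in post_urls:
        match = re.search(r"/p/([A-Za-z0-9_-]+)", url)
        if match:
            codes.append(match.group(1))
    return codes
```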
💡 Pro Tip: Session Cookies

Instagram shows more data to logged-in users. You can export cookies from a real browser session (use a burner account) and load them into Playwright with context.add_cookies(). This unlocks additional post data and higher rate limits, but increases the risk of account suspension.
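If you go the cookie route, note that browser extensions usually export cookies with fields Playwright's context.add_cookies() doesn't accept, such as expirationDate. A hedged sketch of a converter, assuming that common export format:

```python
def to_playwright_cookies(exported):
    """Reshape browser-extension cookie exports into the dicts
    Playwright's context.add_cookies() accepts."""
    cookies = []
    for c in exported:
        cookies.append({
            "name": c["name"],
            "value": c["value"],
            "domain": c.get("domain", ".instagram.com"),
            "path": c.get("path", "/"),
            # Playwright expects a unix timestamp; -1 means session cookie
            "expires": int(c.get("expirationDate", -1)),
            "httpOnly": bool(c.get("httpOnly", False)),
            "secure": bool(c.get("secure", True)),
        })
    return cookies
```

Then, inside an async context: `await context.add_cookies(to_playwright_cookies(json.load(open("cookies.json"))))`.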

Method 3: Node.js + Puppeteer

Puppeteer provides headless Chrome control from Node.js, ideal for building Instagram scraping into backend services, serverless functions, or data pipelines.

Install

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Profile Scraper with Stealth

// instagram-scraper.mjs
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

async function scrapeProfile(username) {
  const browser = await puppeteer.launch({
    headless: "new",
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });

  // Block heavy resources
  await page.setRequestInterception(true);
  page.on("request", (req) => {
    const type = req.resourceType();
    if (["video", "font"].includes(type)) {
      req.abort();
    } else {
      req.continue();
    }
  });

  // Intercept the GraphQL response
  let profileData = null;
  page.on("response", async (resp) => {
    const url = resp.url();
    if (
      url.includes("web_profile_info") ||
      url.includes("graphql/query")
    ) {
      try {
        const json = await resp.json();
        if (json?.data?.user) {
          profileData = json.data.user;
        }
      } catch (e) {
        // Not JSON
      }
    }
  });

  await page.goto(
    `https://www.instagram.com/${username}/`,
    { waitUntil: "networkidle0", timeout: 30000 }
  );

  // Dismiss login modal (Puppeteer uses ::-p-text, not :has-text)
  try {
    const notNow = await page.waitForSelector(
      "::-p-text(Not Now)", { timeout: 3000 }
    );
    await notNow.click();
  } catch (e) {}

  // page.waitForTimeout was removed in newer Puppeteer versions
  await new Promise((r) => setTimeout(r, 3000));

  if (!profileData) {
    // Fallback: parse from page
    profileData = await page.evaluate(() => {
      const desc =
        document
          .querySelector('meta[property="og:description"]')
          ?.content || "";
      const match = desc.match(
        /([\d,.KMB]+) Followers, ([\d,.KMB]+) Following, ([\d,.KMB]+) Posts/i
      );
      return {
        username:
          document
            .querySelector('meta[property="og:title"]')
            ?.content?.split("(")[0]
            .trim() || "",
        description: desc,
        followers_text: match?.[1] || "0",
        following_text: match?.[2] || "0",
        posts_text: match?.[3] || "0",
      };
    });
  }

  const result = {
    username: profileData.username || username,
    full_name: profileData.full_name || null,
    bio: profileData.biography || null,
    external_url: profileData.external_url || null,
    followers:
      profileData.edge_followed_by?.count ||
      profileData.follower_count ||
      0,
    following:
      profileData.edge_follow?.count ||
      profileData.following_count ||
      0,
    posts_count:
      profileData.edge_owner_to_timeline_media?.count ||
      profileData.media_count ||
      0,
    is_verified: profileData.is_verified || false,
    is_private: profileData.is_private || false,
    is_business: profileData.is_business_account || false,
    category: profileData.category_name || null,
    profile_pic: profileData.profile_pic_url_hd || null,
  };

  await browser.close();
  return result;
}

// Batch scrape with rate limiting
async function scrapeMultiple(usernames, delayMs = 8000) {
  const results = [];
  for (const username of usernames) {
    try {
      const profile = await scrapeProfile(username);
      results.push(profile);
      console.log(
        `✓ @${profile.username} - ` +
        `${profile.followers.toLocaleString()} followers`
      );
    } catch (err) {
      console.error(`✗ @${username}: ${err.message}`);
      results.push({ username, error: err.message });
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return results;
}

// Usage
const profiles = await scrapeMultiple([
  "natgeo", "nasa", "nike"
]);
console.log(JSON.stringify(profiles, null, 2));

Method 4: Web Scraping API (Easiest)

The most reliable approach for production Instagram scraping. A web scraping API handles proxy rotation, login walls, browser rendering, and anti-bot detection: you send a URL, you get structured data back.

Using the Mantis API

# One API call โ€” structured Instagram data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.instagram.com/natgeo/",
        "extract": {
            "username": "Instagram username",
            "full_name": "display name",
            "bio": "profile biography text",
            "followers": "follower count as integer",
            "following": "following count as integer",
            "posts_count": "total number of posts",
            "is_verified": "whether account is verified",
            "is_business": "whether it's a business account",
            "category": "business category if applicable",
            "external_url": "website URL from bio",
            "recent_posts": (
                "array of recent posts with: "
                "caption, likes, comments, image_url, "
                "is_video, timestamp"
            ),
        },
        "render_js": True,
    },
)

profile = resp.json()
print(f"@{profile.get('username')} - "
      f"{profile.get('followers', 0):,} followers")

Skip the Login Walls & Blocks

Mantis handles Instagram's anti-bot detection, login popups, proxy rotation, and JavaScript rendering, so you don't have to.

View Pricing Get Started Free

Node.js with Mantis

// mantis-instagram.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.instagram.com/nike/",
    extract: {
      username: "Instagram handle",
      followers: "follower count as number",
      bio: "profile bio text",
      recent_posts:
        "array of last 12 posts with caption, " +
        "likes, comments, is_video",
    },
    render_js: true,
  }),
});

const profile = await resp.json();
console.log(profile);

Beating Instagram's Anti-Bot Detection

Instagram (Meta) has some of the most sophisticated anti-scraping defenses of any social platform. Here's what you're up against:

Instagram's Defense Layers

| Defense | What It Does | Countermeasure |
| --- | --- | --- |
| Login Wall | Redirects to login after a few page views | Cookie rotation, session management |
| Rate Limiting | 429 errors after rapid requests | Rotating residential proxies, long delays |
| Checkpoint Challenges | Phone/email verification prompts | Avoid logged-in scraping, use APIs |
| Device ID Tracking | Fingerprints devices across sessions | Fresh browser contexts, randomized fingerprints |
| IP Reputation | Blocks datacenter IPs and known proxy ranges | Residential or mobile proxies only |
| Browser Fingerprinting | Detects automation via WebDriver, plugins | Stealth plugins (playwright-stealth, puppeteer-extra) |
| API Endpoint Changes | Moves/renames internal API endpoints | Monitor changes, maintain multiple fallbacks |
| GraphQL Query Hash Rotation | Changes query hashes for GraphQL endpoints | Extract hashes from page source dynamically |
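For the last row, the query hashes are embedded in Instagram's bundled JavaScript, so a regex over a fetched page or bundle source can recover them (a sketch; the queryId pattern reflects past bundle formats and may need updating as Meta changes its bundler):

```python
import re

def extract_query_hashes(page_source):
    """Find 32-hex GraphQL query hashes embedded in page/bundle JS,
    which have historically appeared as queryId:"<32 hex chars>"."""
    pattern = r'queryId\s*[:=]\s*"([0-9a-f]{32})"'
    return sorted(set(re.findall(pattern, page_source)))
```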

Essential Anti-Detection Techniques

# instagram_stealth.py
import random
import time

import requests

PROXY_POOL = [
    # Use RESIDENTIAL proxies only
    # Datacenter IPs are instantly blocked
    "http://user:pass@res-proxy1.example.com:8080",
    "http://user:pass@res-proxy2.example.com:8080",
]

def instagram_delay():
    """Instagram requires longer delays than most sites."""
    base = random.uniform(5, 15)
    # Occasionally take a longer break
    if random.random() < 0.1:
        base += random.uniform(30, 60)
    time.sleep(base)

def is_rate_limited(response) -> bool:
    """Detect Instagram rate limiting."""
    if response.status_code == 429:
        return True
    if response.status_code == 401:
        return True
    if "checkpoint_required" in response.text:
        return True
    if "login" in response.url and "instagram.com" in response.url:
        return True
    return False

def get_fresh_session():
    """Create a session with residential proxy."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice([
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.5 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
        ]),
        "Accept-Language": "en-US,en;q=0.9",
        "X-IG-App-ID": "936619743392459",
    })
    return session
โš ๏ธ Critical: Residential Proxies Only

Instagram blocks virtually all datacenter IP ranges. You must use residential or mobile proxies for any meaningful scraping. This is the #1 reason DIY Instagram scrapers fail. A managed API like Mantis handles proxy infrastructure for you.
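Rate-limit handling usually pairs detection with exponential backoff. A minimal sketch, assuming `fetch` is any zero-argument callable returning a response-like object and `is_rate_limited` is a detector like the one defined earlier:

```python
import random
import time

def fetch_with_backoff(fetch, is_rate_limited, max_retries=4, base_delay=10.0):
    """Retry a zero-argument request callable with exponential backoff
    plus jitter whenever rate limiting is detected."""
    delay = base_delay
    for _ in range(max_retries):
        resp = fetch()
        if not is_rate_limited(resp):
            return resp
        time.sleep(delay + random.uniform(0, delay * 0.5))
        delay *= 2  # back off harder on each consecutive block
    return None  # still blocked: rotate proxy/session before retrying
```

On a `None` result, rotating to a fresh session (e.g. with get_fresh_session() above) before the next attempt is usually more effective than waiting longer.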

Instagram Graph API vs Scraping

Meta offers the Instagram Graph API and formerly the Basic Display API (deprecated December 2024). Here's how they compare:

| Feature | Graph API | Basic Display (Deprecated) | Web Scraping | Mantis API |
| --- | --- | --- | --- | --- |
| Access | Business/Creator accounts only | Shut down Dec 2024 | Any public profile | Any public profile |
| Requires App Review | Yes (Meta approval) | N/A | No | No |
| Rate Limits | 200 calls/user/hour | N/A | Depends on proxies | Based on plan |
| Discover/Search Profiles | ❌ Only owned accounts | N/A | ✅ Any public profile | ✅ Any public profile |
| Follower Count | ✅ Own account only | N/A | ✅ Any public profile | ✅ Any public profile |
| Post Content | ✅ Own posts only | N/A | ✅ Any public posts | ✅ Any public posts |
| Competitor Data | ❌ | N/A | ✅ | ✅ |
| Hashtag Search | Limited (30 unique/7 days) | N/A | ✅ Unlimited | ✅ Unlimited |
| Reliability | High (official) | N/A | Medium (endpoints change) | High (maintained) |
| Cost | Free (but limited) | N/A | $100-500+/mo (proxies) | $0-299/mo |
💡 The Instagram API Gap

Instagram's official API is designed for managing your own account, not for discovering or analyzing other accounts. If you need competitor data, influencer analytics, or market research across multiple profiles, scraping or a web scraping API is your only option.

Method Comparison

CriteriaPython + RequestsPlaywrightNode.js + PuppeteerMantis API
Setup Time5 min10 min10 min2 min
JS RenderingโŒ (API endpoints)โœ…โœ…โœ…
Anti-DetectionLow (easily blocked)Good (with stealth)Good (with stealth)Built-in
SpeedFastSlowSlowMedium
MaintenanceVery High (endpoints change)HighHighNone
ScaleLowLow-MediumLow-MediumHigh
Cost (5K profiles/mo)$100-300 (proxies)$200-500 (proxies + compute)$200-500 (proxies + compute)$99 (Pro plan)
Best ForQuick prototypesRich data extractionBackend servicesProduction

Real-World Use Cases

1. Influencer Analytics Dashboard

Build a tool that evaluates influencer accounts (engagement rate, posting frequency, audience quality) before sponsorship deals.

# influencer_analytics.py
import requests
from datetime import datetime, timezone

MANTIS_KEY = "YOUR_API_KEY"

def analyze_influencer(username: str) -> dict:
    """Calculate engagement metrics for an influencer."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "username": "Instagram username",
                "followers": "follower count as integer",
                "following": "following count as integer",
                "posts_count": "total posts as integer",
                "is_verified": "boolean",
                "bio": "biography text",
                "recent_posts": (
                    "array of last 12 posts with: "
                    "likes (integer), comments (integer), "
                    "caption (string), is_video (boolean)"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()

    posts = data.get("recent_posts", [])
    followers = data.get("followers", 1)

    if posts and followers > 0:
        total_engagement = sum(
            p.get("likes", 0) + p.get("comments", 0)
            for p in posts
        )
        avg_engagement = total_engagement / len(posts)
        engagement_rate = (avg_engagement / followers) * 100

        avg_likes = sum(
            p.get("likes", 0) for p in posts
        ) / len(posts)
        avg_comments = sum(
            p.get("comments", 0) for p in posts
        ) / len(posts)

        video_ratio = sum(
            1 for p in posts if p.get("is_video")
        ) / len(posts)
    else:
        engagement_rate = 0
        avg_likes = 0
        avg_comments = 0
        video_ratio = 0

    # Engagement rate benchmarks
    if engagement_rate > 6:
        tier = "Excellent"
    elif engagement_rate > 3:
        tier = "Good"
    elif engagement_rate > 1:
        tier = "Average"
    else:
        tier = "Below Average"

    # Follower/following ratio (health signal)
    ff_ratio = followers / max(data.get("following", 1), 1)

    return {
        "username": username,
        "followers": followers,
        "engagement_rate": round(engagement_rate, 2),
        "engagement_tier": tier,
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "video_ratio": round(video_ratio * 100, 1),
        "ff_ratio": round(ff_ratio, 1),
        "verified": data.get("is_verified", False),
        "estimated_post_value": (
            f"${round(followers * engagement_rate / 100 * 0.05, 2)}"
        ),
    }

# Evaluate multiple influencers
influencers = ["natgeo", "nike", "airbnb"]
for username in influencers:
    result = analyze_influencer(username)
    print(f"\n@{result['username']}:")
    print(f"  Followers: {result['followers']:,}")
    print(f"  Engagement: {result['engagement_rate']}% "
          f"({result['engagement_tier']})")
    print(f"  Avg Likes: {result['avg_likes']:,}")
    print(f"  Video Mix: {result['video_ratio']}%")
    print(f"  Est. Post Value: {result['estimated_post_value']}")

2. Brand Mention Monitor

Track when your brand is mentioned in Instagram posts and hashtags, which is essential for reputation management and for identifying UGC opportunities.

# brand_monitor.py
import requests
import json
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"
BRAND_HASHTAGS = [
    "mantisapi", "mantis_api", "webscrapingapi"
]
COMPETITOR_ACCOUNTS = [
    "scrapingbee", "brightdata", "apify_official"
]

def monitor_hashtag(tag: str) -> dict:
    """Check a hashtag for recent brand mentions."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                f"https://www.instagram.com/explore/tags/{tag}/"
            ),
            "extract": {
                "post_count": "total posts with this hashtag",
                "top_posts": (
                    "array of top 9 posts with: "
                    "author, caption, likes, comments, "
                    "post_url"
                ),
            },
            "render_js": True,
        },
    )
    return {"hashtag": tag, **resp.json()}

def monitor_competitor(username: str) -> dict:
    """Check competitor's recent posts."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "followers": "follower count",
                "recent_posts": (
                    "last 6 posts: caption, likes, comments"
                ),
            },
            "render_js": True,
        },
    )
    return {"competitor": username, **resp.json()}

# Daily monitoring run
report = {
    "timestamp": datetime.utcnow().isoformat(),
    "hashtags": [
        monitor_hashtag(tag) for tag in BRAND_HASHTAGS
    ],
    "competitors": [
        monitor_competitor(acc) for acc in COMPETITOR_ACCOUNTS
    ],
}

# Save daily report
date = datetime.utcnow().strftime("%Y-%m-%d")
with open(f"instagram_report_{date}.json", "w") as f:
    json.dump(report, f, indent=2)

print(f"Report saved: instagram_report_{date}.json")
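Once a few daily reports exist, comparing them surfaces trends. A sketch of a day-over-day follower diff, assuming the report layout produced above and numeric followers fields (the API may return them as strings, so cast first if needed):

```python
def follower_deltas(today, yesterday):
    """Compare two daily reports and return per-competitor
    follower-count changes."""
    prev = {c.get("competitor"): c.get("followers", 0)
            for c in yesterday.get("competitors", [])}
    return {
        c.get("competitor"):
            c.get("followers", 0) - prev.get(c.get("competitor"), 0)
        for c in today.get("competitors", [])
    }
```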

3. AI Agent Social Intelligence

Give an AI agent the ability to research brands, influencers, and trends on Instagram, a key capability for marketing AI assistants.

# agent_instagram.py - LangChain tool
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def research_instagram_account(username: str) -> str:
    """Research an Instagram account. Returns profile info,
    engagement metrics, and recent post activity."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.instagram.com/{username}/",
            "extract": {
                "full_name": "display name",
                "bio": "biography",
                "followers": "follower count",
                "following": "following count",
                "posts_count": "total posts",
                "is_verified": "verified status",
                "category": "business category",
                "website": "external URL",
                "recent_posts": (
                    "last 6 posts: caption (first 100 chars), "
                    "likes, comments, is_video"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()

    posts = data.get("recent_posts", [])
    followers = data.get("followers", 0)
    if posts and followers:
        avg_eng = sum(
            p.get("likes", 0) + p.get("comments", 0)
            for p in posts
        ) / len(posts)
        eng_rate = round((avg_eng / followers) * 100, 2)
    else:
        eng_rate = "N/A"

    result = f"Instagram Profile: @{username}\n"
    result += f"Name: {data.get('full_name', 'N/A')}\n"
    result += f"Bio: {data.get('bio', 'N/A')}\n"
    result += (
        f"Followers: {data.get('followers', 0):,} | "
        f"Following: {data.get('following', 0):,}\n"
    )
    result += f"Posts: {data.get('posts_count', 0):,}\n"
    result += f"Verified: {data.get('is_verified', False)}\n"
    result += f"Category: {data.get('category', 'N/A')}\n"
    result += f"Website: {data.get('website', 'N/A')}\n"
    result += f"Engagement Rate: {eng_rate}%\n\n"

    if posts:
        result += "Recent Posts:\n"
        for i, p in enumerate(posts[:3], 1):
            caption = (
                p.get("caption", "")[:80] + "..."
                if p.get("caption") else "[no caption]"
            )
            result += (
                f"  {i}. {caption}\n"
                f"     โค๏ธ {p.get('likes', 0):,} | "
                f"๐Ÿ’ฌ {p.get('comments', 0):,}\n"
            )

    return result

# Use in a LangChain agent
# agent = create_agent(
#     tools=[research_instagram_account], ...
# )

Is Instagram Scraping Legal?

Instagram scraping carries more legal risk than scraping most other platforms, due to Meta's aggressive enforcement of its Terms of Service.

โš–๏ธ Best Practices for Legal Safety

- Only scrape publicly visible profile data (don't create fake accounts).
- Don't store or republish photos and videos.
- Respect private accounts.
- Don't scrape personal data of EU residents for commercial use without a legal basis.
- Use rate limiting to avoid disrupting the service.
- Consult a lawyer for commercial use cases.
- Consider a web scraping API that handles compliance considerations.

Production-Ready Instagram Scraping

Stop fighting login walls, proxy blocks, and endpoint changes. Mantis extracts structured Instagram data with a single API call.

View Pricing Get Started Free

Frequently Asked Questions

Is it legal to scrape Instagram data?

Scraping publicly available Instagram data is in a legal gray area. While hiQ v. LinkedIn supports scraping public data, Meta aggressively enforces its ToS and has sued scrapers. Avoid creating fake accounts, respect private profiles, and consult legal counsel for commercial use.

How do I scrape Instagram without getting blocked?

Use rotating residential proxies (datacenter IPs are instantly blocked), headless browsers with stealth plugins, delays of 5-20 seconds between requests, and fresh browser contexts. Or use a web scraping API like Mantis that handles anti-blocking automatically.

Can I use Instagram's official API instead of scraping?

The Instagram Graph API only works with accounts you own or that authorize your app, requires Meta app review, and doesn't support discovering or analyzing other public profiles. The Basic Display API was deprecated in December 2024.

What data can I extract from Instagram?

From public profiles: username, full name, bio, follower/following counts, post count, profile picture, recent posts with captions/likes/comments, reels, tagged locations, and hashtags. Private accounts only show basic profile info.

What Python library is best for scraping Instagram?

Playwright with stealth plugins is most reliable since Instagram requires JS rendering. For quick prototypes, the web_profile_info API endpoint works but changes frequently. For production, use a managed API like Mantis.

How many Instagram profiles can I scrape per day?

Without proxies: 20-50 before blocking. With residential proxies: 500-2,000. With Mantis API: up to 100,000/month on the Scale plan without managing any infrastructure.
