Web Scraping for Market Research: How AI Agents Analyze Competitors, Trends & Opportunities
Market research firms charge $5,000–$50,000 per report. Enterprise teams spend months manually tracking competitors. What if an AI agent could do 80% of that work in minutes?
In this guide, you'll build an AI-powered market research system that scrapes competitor websites, tracks industry trends, identifies market opportunities, and generates executive-ready reports, all running autonomously with Python and the Mantis WebPerception API.
Why Traditional Market Research Is Broken
Traditional market research has three fundamental problems:
- It's slow. By the time a 40-page report is published, the market has moved.
- It's expensive. Analyst time, data subscriptions, and consulting fees add up fast.
- It's shallow. Most research covers the top 3–5 competitors. The real threats come from the 50 startups you're not tracking.
AI agents flip this model. They scrape hundreds of competitor pages in parallel, extract structured data with LLMs, identify patterns humans would miss, and deliver fresh insights on demand, for a fraction of the cost.
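The "in parallel" part is easy to realize with a thread pool. Here is a minimal sketch, assuming the same hypothetical Mantis `/scrape` endpoint and placeholder API key used throughout this guide; pages that fail to fetch are simply skipped.

```python
import concurrent.futures
import requests

MANTIS_API_KEY = "your-mantis-api-key"  # placeholder
MANTIS_BASE = "https://api.mantisapi.com"

def scrape_url(url: str) -> dict:
    """Fetch one page through the (hypothetical) Mantis scrape endpoint."""
    resp = requests.post(
        f"{MANTIS_BASE}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "render_js": True},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

def scrape_many(urls: list[str], max_workers: int = 10) -> dict[str, dict]:
    """Scrape many competitor pages concurrently; failures are skipped."""
    results: dict[str, dict] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_url, u): u for u in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception:
                continue  # a production agent would log and retry here
    return results
```

Network-bound scraping is I/O-heavy, so threads (rather than processes) are usually the right concurrency model here.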
The AI Market Research Stack
Here's what we're building:
- Competitor Discovery – find and catalog competitors automatically
- Deep Profiling – scrape pricing, features, positioning, and messaging
- Trend Tracking – monitor changes over time (new features, pricing shifts, messaging pivots)
- Opportunity Analysis – LLM-powered gap analysis and strategic recommendations
- Report Generation – executive-ready market intelligence reports
Step 1: Competitor Discovery Agent
First, we build an agent that discovers competitors in your market. It scrapes search results, directories, and review sites to build a comprehensive competitive landscape.
```python
import requests
import json
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional

MANTIS_API_KEY = "your-mantis-api-key"
MANTIS_BASE = "https://api.mantisapi.com"

client = OpenAI()

class Competitor(BaseModel):
    name: str
    url: str
    tagline: Optional[str] = None
    category: str  # direct, indirect, adjacent
    estimated_size: Optional[str] = None  # startup, mid-market, enterprise

class CompetitorList(BaseModel):
    competitors: list[Competitor]

def scrape_url(url: str) -> dict:
    """Scrape a URL using the Mantis WebPerception API."""
    resp = requests.post(
        f"{MANTIS_BASE}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "render_js": True},
    )
    resp.raise_for_status()
    return resp.json()

def extract_competitors(url: str, market_description: str) -> list[Competitor]:
    """Scrape a page and extract competitor information with AI."""
    page = scrape_url(url)
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""You are a market research analyst.
Extract competitor companies from this page content.
Market context: {market_description}
Classify each as: direct (same product), indirect (different approach, same problem),
or adjacent (related market, potential threat)."""},
            {"role": "user", "content": page.get("text", "")[:8000]},
        ],
        # A page lists many competitors, so parse into a wrapper model
        # rather than a single Competitor.
        response_format=CompetitorList,
    )
    return response.choices[0].message.parsed.competitors

# Discover competitors from multiple sources
sources = [
    "https://www.g2.com/categories/web-scraping",
    "https://alternativeto.net/software/scrapy/",
    "https://www.producthunt.com/topics/web-scraping",
]

all_competitors = []
for source in sources:
    competitors = extract_competitors(source, "Web scraping APIs for AI agents")
    all_competitors.extend(competitors)

# Deduplicate by domain
seen = set()
unique = []
for c in all_competitors:
    domain = c.url.split("//")[-1].split("/")[0]
    if domain not in seen:
        seen.add(domain)
        unique.append(c)
        print(f"  [{c.category}] {c.name}: {c.url}")

print(f"\nDiscovered {len(unique)} unique competitors")
```
Step 2: Deep Competitor Profiling
Once we have a list of competitors, we scrape their websites for pricing, features, positioning, and messaging. This is where AI extraction really shines โ every competitor structures their site differently, but AI handles the variation effortlessly.
```python
from pydantic import BaseModel
from typing import Optional

class CompetitorProfile(BaseModel):
    name: str
    tagline: str
    value_proposition: str
    target_audience: str
    key_features: list[str]
    pricing_model: str  # free, freemium, usage-based, subscription, enterprise
    starting_price: Optional[str] = None
    enterprise_plan: bool
    free_tier: bool
    differentiators: list[str]
    weaknesses: list[str]  # based on messaging gaps or reviews
    tech_stack_signals: list[str]  # any tech mentioned (APIs, SDKs, languages)
    recent_changes: list[str]  # new features, pivots, announcements

def profile_competitor(competitor_url: str) -> CompetitorProfile:
    """Build a deep profile of a competitor by scraping key pages."""
    # Scrape multiple pages for comprehensive data
    pages_to_scrape = [
        competitor_url,                 # Homepage
        f"{competitor_url}/pricing",    # Pricing
        f"{competitor_url}/features",   # Features
        f"{competitor_url}/about",      # About/team
        f"{competitor_url}/changelog",  # Recent changes
    ]
    combined_content = ""
    for page_url in pages_to_scrape:
        try:
            result = scrape_url(page_url)
            combined_content += f"\n\n--- {page_url} ---\n"
            combined_content += result.get("text", "")[:4000]
        except Exception:
            continue
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a competitive intelligence analyst.
Build a comprehensive competitor profile from these web pages.
Be specific with pricing. Identify real differentiators vs marketing fluff.
For weaknesses, infer from what they DON'T mention or emphasize."""},
            {"role": "user", "content": combined_content[:12000]},
        ],
        response_format=CompetitorProfile,
    )
    return response.choices[0].message.parsed

# Profile each competitor
profiles = {}
for competitor in unique[:10]:  # Top 10 competitors
    print(f"Profiling {competitor.name}...")
    profile = profile_competitor(competitor.url)
    profiles[competitor.name] = profile
    print(f"  Pricing: {profile.pricing_model} (starts at {profile.starting_price})")
    print(f"  Features: {len(profile.key_features)} identified")
    print(f"  Differentiators: {', '.join(profile.differentiators[:3])}")
```
Step 3: Trend Tracking with Change Detection
Market research isn't a one-time event. The real value comes from tracking how competitors evolve over time. This system stores snapshots and uses AI to detect meaningful changes.
```python
import sqlite3
import hashlib
from datetime import datetime

def init_db():
    conn = sqlite3.connect("market_research.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            id INTEGER PRIMARY KEY,
            competitor TEXT,
            page_type TEXT,
            content_hash TEXT,
            content TEXT,
            scraped_at TEXT,
            ai_summary TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS changes (
            id INTEGER PRIMARY KEY,
            competitor TEXT,
            change_type TEXT,
            severity TEXT,
            description TEXT,
            detected_at TEXT,
            strategic_impact TEXT
        )
    """)
    conn.commit()
    return conn

def track_competitor_changes(competitor_name: str, url: str, page_type: str, conn):
    """Scrape, compare to the last snapshot, and detect meaningful changes."""
    result = scrape_url(url)
    content = result.get("text", "")
    content_hash = hashlib.sha256(content.encode()).hexdigest()

    # Compare against the stored hash, not a rehash of the stored text:
    # snapshots are truncated, so rehashing them would always look like a change.
    cursor = conn.execute(
        "SELECT content_hash, content FROM snapshots WHERE competitor=? AND page_type=? ORDER BY scraped_at DESC LIMIT 1",
        (competitor_name, page_type),
    )
    last = cursor.fetchone()
    if last and last[0] == content_hash:
        return None  # No change

    # Store the new snapshot with an AI summary
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Summarize this {page_type} page for {competitor_name} in 2-3 sentences. Focus on key data points."},
            {"role": "user", "content": content[:6000]},
        ],
    ).choices[0].message.content
    conn.execute(
        "INSERT INTO snapshots (competitor, page_type, content_hash, content, scraped_at, ai_summary) VALUES (?, ?, ?, ?, ?, ?)",
        (competitor_name, page_type, content_hash, content[:10000], datetime.now().isoformat(), summary),
    )

    if last:
        # Detect what changed using AI
        change_analysis = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """You are a competitive intelligence analyst.
Compare the old and new versions of this page. Identify:
1. What specifically changed
2. Severity: critical (pricing/positioning shift), important (new feature), minor (copy tweak)
3. Strategic impact for competitors
Be specific. Focus on business-relevant changes only."""},
                {"role": "user", "content": f"PREVIOUS:\n{last[1][:5000]}\n\nCURRENT:\n{content[:5000]}"},
            ],
        ).choices[0].message.content
        conn.execute(
            "INSERT INTO changes (competitor, change_type, severity, description, detected_at, strategic_impact) VALUES (?, ?, ?, ?, ?, ?)",
            (competitor_name, page_type, "pending", change_analysis, datetime.now().isoformat(), ""),
        )

    conn.commit()
    return summary
```
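One practical pitfall with raw content hashes: pages often embed dynamic fragments (dates, visitor counts, rotating testimonials) that make every scrape look like a change. A hedged sketch of pre-hash normalization follows; the regex patterns are illustrative assumptions you would tune per site, not part of any API.

```python
import hashlib
import re

def normalize_for_hashing(text: str) -> str:
    """Reduce scrape-to-scrape noise before hashing page text.

    The patterns below are illustrative: they mask ISO dates and
    comma-formatted counters, then collapse whitespace, so cosmetic
    churn does not trigger a false "change detected".
    """
    text = text.lower()
    text = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)    # ISO dates
    text = re.sub(r"\b\d{1,3}(,\d{3})+\b", "<num>", text)  # 12,345-style counts
    text = re.sub(r"\s+", " ", text).strip()               # collapse whitespace
    return text

def content_fingerprint(text: str) -> str:
    """Hash the normalized text instead of the raw scrape."""
    return hashlib.sha256(normalize_for_hashing(text).encode()).hexdigest()
```

Swapping `content_fingerprint` in for the plain SHA-256 in the tracker above keeps the snapshot table quiet until something substantive moves.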
Step 4: AI-Powered Opportunity Analysis
This is where market research becomes strategy. Feed all your competitive profiles into an LLM for gap analysis and opportunity identification.
```python
def generate_opportunity_analysis(profiles: dict, your_product: dict) -> str:
    """Generate strategic opportunities based on the competitive landscape."""
    profiles_summary = ""
    for name, profile in profiles.items():
        profiles_summary += f"""
{name}:
- Pricing: {profile.pricing_model} (starts at {profile.starting_price})
- Target: {profile.target_audience}
- Differentiators: {', '.join(profile.differentiators)}
- Weaknesses: {', '.join(profile.weaknesses)}
- Features: {', '.join(profile.key_features[:5])}
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a strategic market analyst.
Analyze the competitive landscape and identify:
1. MARKET GAPS - Features or segments no one serves well
2. PRICING OPPORTUNITIES - Underserved price points or models
3. POSITIONING ANGLES - Unique positioning no competitor owns
4. TIMING PLAYS - Trends that create windows of opportunity
5. THREATS - Competitive moves that could hurt our position
Be specific, actionable, and honest. No generic advice."""},
            {"role": "user", "content": f"""
OUR PRODUCT:
{json.dumps(your_product, indent=2)}

COMPETITIVE LANDSCAPE:
{profiles_summary}

Identify the top 5 strategic opportunities and top 3 threats."""},
        ],
    )
    return response.choices[0].message.content

our_product = {
    "name": "Mantis WebPerception API",
    "focus": "Web scraping API built for AI agents",
    "differentiator": "AI-powered data extraction, agent framework integrations",
    "pricing": "Usage-based, starts free, Pro at $99/mo",
}

analysis = generate_opportunity_analysis(profiles, our_product)
print(analysis)
```
Step 5: Executive Report Generation
Automate the final deliverable: a market intelligence report that any stakeholder can read.
```python
def generate_market_report(
    profiles: dict,
    changes: list,
    opportunities: str,
    market_name: str,
) -> str:
    """Generate an executive-ready market intelligence report."""
    changes_text = "\n".join(
        f"- [{c['competitor']}] {c['description'][:200]}"
        for c in changes[:20]
    )
    profiles_text = ""
    for name, p in profiles.items():
        profiles_text += f"""
### {name}
- **Value Prop:** {p.value_proposition}
- **Pricing:** {p.pricing_model} ({p.starting_price or 'not public'})
- **Target:** {p.target_audience}
- **Key Differentiators:** {', '.join(p.differentiators[:3])}
"""
    report = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a market intelligence analyst writing
an executive report. Structure it as:
1. EXECUTIVE SUMMARY (3-4 sentences)
2. MARKET OVERVIEW (size, growth, key trends)
3. COMPETITIVE LANDSCAPE (organized by tier)
4. RECENT COMPETITIVE MOVES (what changed this period)
5. OPPORTUNITY ANALYSIS (where to invest)
6. THREAT ASSESSMENT (what to watch)
7. RECOMMENDED ACTIONS (specific next steps)
Write for a CEO audience. Be concise, specific, and actionable.
Use markdown formatting."""},
            {"role": "user", "content": f"""
MARKET: {market_name}
DATE: {datetime.now().strftime('%B %Y')}

COMPETITOR PROFILES:
{profiles_text}

RECENT CHANGES:
{changes_text}

OPPORTUNITY ANALYSIS:
{opportunities}
"""},
        ],
    ).choices[0].message.content
    return report

# Generate and save the report
report = generate_market_report(
    profiles=profiles,
    changes=[],  # Would come from your changes DB
    opportunities=analysis,
    market_name="Web Scraping APIs for AI Agents",
)

# Save as markdown
with open(f"market_report_{datetime.now().strftime('%Y%m')}.md", "w") as f:
    f.write(report)

print("Report generated!")
```
Cost Comparison: Traditional vs AI Agent Market Research
| Approach | Cost | Time | Depth | Freshness |
|---|---|---|---|---|
| Consulting firm | $10,000–$50,000 | 4–8 weeks | 5–10 competitors | Stale on delivery |
| In-house analyst team | $5,000–$15,000/mo | Ongoing | 10–20 competitors | Weekly updates |
| Market research platforms | $500–$2,000/mo | Instant (limited) | Pre-built reports only | Monthly |
| AI agent + Mantis API | $99–$299/mo | Minutes | 50+ competitors | Real-time |
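The AI-agent row can be sanity-checked with quick request-volume arithmetic. The competitor count, pages per competitor, and cadence below are illustrative assumptions, not Mantis pricing data:

```python
# Back-of-envelope request volume for daily competitor monitoring.
# All counts are illustrative assumptions; plug in your own landscape.
competitors = 50
pages_per_competitor = 3   # e.g. pricing, features, changelog
days_per_month = 30

monthly_requests = competitors * pages_per_competitor * days_per_month
print(monthly_requests)  # 4500
```

A few thousand scrapes a month comfortably fits a low usage-based tier, which is where the $99–$299/mo figure comes from.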
Real-World Use Cases
1. VC Due Diligence
Before investing, VCs need to understand the competitive landscape. An AI agent can profile every competitor in a market within hours by scraping pricing pages, feature lists, team pages, and press releases, then generate a competitive overview that would take an analyst a week.
2. Product Launch Intelligence
Before launching a new product or feature, scrape every competitor's messaging and positioning. Identify the white space: the language and positioning nobody owns yet. Launch into the gap.
3. Pricing Strategy
Track competitor pricing changes in real time. When a competitor raises prices, that's your window to capture price-sensitive customers. When they launch a free tier, you need to respond. AI agents catch these changes within hours, not weeks.
4. Trend Monitoring for Strategic Planning
Scrape industry blogs, Product Hunt, GitHub trending, and Hacker News to identify emerging trends before they hit the mainstream. Feed this into quarterly strategy sessions with real data instead of gut feelings.
Production Architecture
For a production market research system, structure your agents like this:
```python
# Production architecture
# ├── discovery_agent.py   → Finds new competitors weekly
# ├── profiling_agent.py   → Deep profiles on new competitors
# ├── monitoring_agent.py  → Daily change detection on key pages
# ├── analysis_agent.py    → Weekly opportunity/threat analysis
# ├── report_agent.py      → Monthly executive reports
# └── alert_agent.py       → Real-time Slack alerts for critical changes
#
# Schedule:
#   Daily:   monitoring_agent (scrape competitor pricing + features pages)
#   Weekly:  discovery_agent + analysis_agent
#   Monthly: profiling_agent (re-profile all) + report_agent
#
# Storage: SQLite or PostgreSQL for snapshots and changes
# Alerts:  Slack webhook for critical competitive moves
# Reports: Generated as Markdown → converted to PDF for stakeholders
```
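The alert_agent is the simplest piece to sketch. Below is a minimal version using Slack's standard incoming-webhook pattern; the webhook URL is a placeholder, and the severity-to-emoji mapping is my own illustrative convention, not part of any API.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def format_change_alert(competitor: str, severity: str, description: str) -> dict:
    """Build a Slack message payload for a detected competitive change."""
    emoji = {"critical": ":rotating_light:", "important": ":warning:"}.get(
        severity, ":memo:"
    )
    return {"text": f"{emoji} *{competitor}* ({severity}): {description[:300]}"}

def send_alert(payload: dict) -> bool:
    """POST the payload to the Slack incoming webhook; True on success."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False  # a production agent would log and retry here
```

Wiring this to the change-detection step is one line: call `send_alert(format_change_alert(...))` whenever the AI grades a change as critical.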
Best Practices for AI Market Research
- Start narrow, expand later. Profile your top 5 direct competitors before trying to track 100. Get the system right, then scale.
- Focus on decisions, not data. Every insight should answer "so what should we do?" If it doesn't, it's noise.
- Validate AI analysis. LLMs can hallucinate competitive details. Cross-reference critical claims with the source data.
- Track your own site too. Monitor how competitors might see you. Are your differentiators clear? Is your pricing competitive?
- Respect robots.txt. Scrape responsibly. Use reasonable rate limits. Don't scrape content behind authentication.
- Version your reports. Store every generated report. Being able to look back at "what did the market look like 6 months ago?" is incredibly valuable.
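On the robots.txt point: a hosted scraping API may handle politeness server-side (an assumption about Mantis, not a documented guarantee), but if your agent fetches any pages directly, the stdlib makes the check cheap. A minimal sketch with one cached parser per host:

```python
from urllib import robotparser
from urllib.parse import urlparse

# One cached parser per host; None means robots.txt was unreachable.
_parsers: dict[str, "robotparser.RobotFileParser | None"] = {}

def allowed_by_robots(url: str, user_agent: str = "market-research-bot") -> bool:
    """Check robots.txt before scraping a URL, caching per host."""
    parts = urlparse(url)
    host = f"{parts.scheme}://{parts.netloc}"
    if host not in _parsers:
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{host}/robots.txt")
        try:
            rp.read()
            _parsers[host] = rp
        except Exception:
            # Unreachable robots.txt: this sketch defaults to permissive;
            # a stricter policy would default to disallow.
            _parsers[host] = None
    rp = _parsers[host]
    return True if rp is None else rp.can_fetch(user_agent, url)
```

Call `allowed_by_robots(page_url)` before each direct fetch, and pair it with a per-domain rate limit.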
Build Your Market Research Agent
Start scraping competitor data with AI-powered extraction. Free tier includes 100 requests/month.
Get Your API Key →
What's Next
With a market research agent running, you'll have competitive intelligence that would cost thousands from a consulting firm, updated in real time for a fraction of the price. The key is to go beyond data collection and into automated analysis. Let the AI tell you what matters.
For related guides, check out: