Web Scraping for Market Research: How AI Agents Analyze Competitors, Trends & Opportunities
Market research firms charge $5,000–$50,000 per report. Enterprise teams spend months manually tracking competitors. What if an AI agent could do 80% of that work in minutes?
In this guide, you'll build an AI-powered market research system that scrapes competitor websites, tracks industry trends, identifies market opportunities, and generates executive-ready reports, all running autonomously with Python and the Mantis WebPerception API.
Why Traditional Market Research Is Broken
Traditional market research has three fundamental problems:
- It's slow. By the time a 40-page report is published, the market has moved.
- It's expensive. Analyst time, data subscriptions, and consulting fees add up fast.
- It's shallow. Most research covers the top 3–5 competitors. The real threats come from the 50 startups you're not tracking.
AI agents flip this model. They scrape hundreds of competitor pages in parallel, extract structured data with LLMs, identify patterns humans would miss, and deliver fresh insights on demand, for a fraction of the cost.
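The "in parallel" part is easy to realize with a thread pool. Here is a minimal sketch, assuming the same hypothetical Mantis `/scrape` endpoint and placeholder API key used throughout this guide; pages that fail to fetch are simply skipped.

```python
import concurrent.futures
import requests

MANTIS_API_KEY = "your-mantis-api-key"  # placeholder
MANTIS_BASE = "https://api.mantisapi.com"

def scrape_url(url: str) -> dict:
    """Fetch one page through the (hypothetical) Mantis scrape endpoint."""
    resp = requests.post(
        f"{MANTIS_BASE}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "render_js": True},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

def scrape_many(urls: list[str], max_workers: int = 10) -> dict[str, dict]:
    """Scrape many competitor pages concurrently; failures are skipped."""
    results: dict[str, dict] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_url, u): u for u in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception:
                continue  # a production agent would log and retry here
    return results
```

Network-bound scraping is I/O-heavy, so threads (rather than processes) are usually the right concurrency model here.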
The AI Market Research Stack
Here's what we're building:
- Competitor Discovery – find and catalog competitors automatically
- Deep Profiling – scrape pricing, features, positioning, and messaging
- Trend Tracking – monitor changes over time (new features, pricing shifts, messaging pivots)
- Opportunity Analysis – LLM-powered gap analysis and strategic recommendations
- Report Generation – executive-ready market intelligence reports
Step 1: Competitor Discovery Agent
First, we build an agent that discovers competitors in your market. It scrapes search results, directories, and review sites to build a comprehensive competitive landscape.
```python
import requests
import json
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional

MANTIS_API_KEY = "your-mantis-api-key"
MANTIS_BASE = "https://api.mantisapi.com"

client = OpenAI()

class Competitor(BaseModel):
    name: str
    url: str
    tagline: Optional[str] = None
    category: str  # direct, indirect, adjacent
    estimated_size: Optional[str] = None  # startup, mid-market, enterprise

class CompetitorList(BaseModel):
    competitors: list[Competitor]

def scrape_url(url: str) -> dict:
    """Scrape a URL using the Mantis WebPerception API."""
    resp = requests.post(
        f"{MANTIS_BASE}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "render_js": True},
    )
    resp.raise_for_status()
    return resp.json()

def extract_competitors(url: str, market_description: str) -> list[Competitor]:
    """Scrape a page and extract competitor information with AI."""
    page = scrape_url(url)
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""You are a market research analyst.
Extract competitor companies from this page content.
Market context: {market_description}
Classify each as: direct (same product), indirect (different approach, same problem),
or adjacent (related market, potential threat)."""},
            {"role": "user", "content": page.get("text", "")[:8000]},
        ],
        # A page lists many competitors, so parse into a wrapper model
        # rather than a single Competitor.
        response_format=CompetitorList,
    )
    return response.choices[0].message.parsed.competitors

# Discover competitors from multiple sources
sources = [
    "https://www.g2.com/categories/web-scraping",
    "https://alternativeto.net/software/scrapy/",
    "https://www.producthunt.com/topics/web-scraping",
]

all_competitors = []
for source in sources:
    competitors = extract_competitors(source, "Web scraping APIs for AI agents")
    all_competitors.extend(competitors)

# Deduplicate by domain
seen = set()
unique = []
for c in all_competitors:
    domain = c.url.split("//")[-1].split("/")[0]
    if domain not in seen:
        seen.add(domain)
        unique.append(c)
        print(f"  [{c.category}] {c.name}: {c.url}")

print(f"\nDiscovered {len(unique)} unique competitors")
```
Step 2: Deep Competitor Profiling
Once we have a list of competitors, we scrape their websites for pricing, features, positioning, and messaging. This is where AI extraction really shines โ every competitor structures their site differently, but AI handles the variation effortlessly.
```python
from pydantic import BaseModel
from typing import Optional

class CompetitorProfile(BaseModel):
    name: str
    tagline: str
    value_proposition: str
    target_audience: str
    key_features: list[str]
    pricing_model: str  # free, freemium, usage-based, subscription, enterprise
    starting_price: Optional[str] = None
    enterprise_plan: bool
    free_tier: bool
    differentiators: list[str]
    weaknesses: list[str]  # based on messaging gaps or reviews
    tech_stack_signals: list[str]  # any tech mentioned (APIs, SDKs, languages)
    recent_changes: list[str]  # new features, pivots, announcements

def profile_competitor(competitor_url: str) -> CompetitorProfile:
    """Build a deep profile of a competitor by scraping key pages."""
    # Scrape multiple pages for comprehensive data
    pages_to_scrape = [
        competitor_url,                 # Homepage
        f"{competitor_url}/pricing",    # Pricing
        f"{competitor_url}/features",   # Features
        f"{competitor_url}/about",      # About/team
        f"{competitor_url}/changelog",  # Recent changes
    ]
    combined_content = ""
    for page_url in pages_to_scrape:
        try:
            result = scrape_url(page_url)
            combined_content += f"\n\n--- {page_url} ---\n"
            combined_content += result.get("text", "")[:4000]
        except Exception:
            continue
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a competitive intelligence analyst.
Build a comprehensive competitor profile from these web pages.
Be specific with pricing. Identify real differentiators vs marketing fluff.
For weaknesses, infer from what they DON'T mention or emphasize."""},
            {"role": "user", "content": combined_content[:12000]},
        ],
        response_format=CompetitorProfile,
    )
    return response.choices[0].message.parsed

# Profile each competitor
profiles = {}
for competitor in unique[:10]:  # Top 10 competitors
    print(f"Profiling {competitor.name}...")
    profile = profile_competitor(competitor.url)
    profiles[competitor.name] = profile
    print(f"  Pricing: {profile.pricing_model} (starts at {profile.starting_price})")
    print(f"  Features: {len(profile.key_features)} identified")
    print(f"  Differentiators: {', '.join(profile.differentiators[:3])}")
```
Step 3: Trend Tracking with Change Detection
Market research isn't a one-time event. The real value comes from tracking how competitors evolve over time. This system stores snapshots and uses AI to detect meaningful changes.
```python
import sqlite3
import hashlib
from datetime import datetime

def init_db():
    conn = sqlite3.connect("market_research.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            id INTEGER PRIMARY KEY,
            competitor TEXT,
            page_type TEXT,
            content_hash TEXT,
            content TEXT,
            scraped_at TEXT,
            ai_summary TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS changes (
            id INTEGER PRIMARY KEY,
            competitor TEXT,
            change_type TEXT,
            severity TEXT,
            description TEXT,
            detected_at TEXT,
            strategic_impact TEXT
        )
    """)
    conn.commit()
    return conn

def track_competitor_changes(competitor_name: str, url: str, page_type: str, conn):
    """Scrape, compare to the last snapshot, and detect meaningful changes."""
    result = scrape_url(url)
    content = result.get("text", "")
    content_hash = hashlib.sha256(content.encode()).hexdigest()

    # Compare against the stored hash, not a rehash of the stored text:
    # snapshots are truncated, so rehashing them would always look like a change.
    cursor = conn.execute(
        "SELECT content_hash, content FROM snapshots WHERE competitor=? AND page_type=? ORDER BY scraped_at DESC LIMIT 1",
        (competitor_name, page_type),
    )
    last = cursor.fetchone()
    if last and last[0] == content_hash:
        return None  # No change

    # Store the new snapshot with an AI summary
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Summarize this {page_type} page for {competitor_name} in 2-3 sentences. Focus on key data points."},
            {"role": "user", "content": content[:6000]},
        ],
    ).choices[0].message.content
    conn.execute(
        "INSERT INTO snapshots (competitor, page_type, content_hash, content, scraped_at, ai_summary) VALUES (?, ?, ?, ?, ?, ?)",
        (competitor_name, page_type, content_hash, content[:10000], datetime.now().isoformat(), summary),
    )

    if last:
        # Detect what changed using AI
        change_analysis = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """You are a competitive intelligence analyst.
Compare the old and new versions of this page. Identify:
1. What specifically changed
2. Severity: critical (pricing/positioning shift), important (new feature), minor (copy tweak)
3. Strategic impact for competitors
Be specific. Focus on business-relevant changes only."""},
                {"role": "user", "content": f"PREVIOUS:\n{last[1][:5000]}\n\nCURRENT:\n{content[:5000]}"},
            ],
        ).choices[0].message.content
        conn.execute(
            "INSERT INTO changes (competitor, change_type, severity, description, detected_at, strategic_impact) VALUES (?, ?, ?, ?, ?, ?)",
            (competitor_name, page_type, "pending", change_analysis, datetime.now().isoformat(), ""),
        )

    conn.commit()
    return summary
```
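One practical pitfall with raw content hashes: pages often embed dynamic fragments (dates, visitor counts, rotating testimonials) that make every scrape look like a change. A hedged sketch of pre-hash normalization follows; the regex patterns are illustrative assumptions you would tune per site, not part of any API.

```python
import hashlib
import re

def normalize_for_hashing(text: str) -> str:
    """Reduce scrape-to-scrape noise before hashing page text.

    The patterns below are illustrative: they mask ISO dates and
    comma-formatted counters, then collapse whitespace, so cosmetic
    churn does not trigger a false "change detected".
    """
    text = text.lower()
    text = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", text)    # ISO dates
    text = re.sub(r"\b\d{1,3}(,\d{3})+\b", "<num>", text)  # 12,345-style counts
    text = re.sub(r"\s+", " ", text).strip()               # collapse whitespace
    return text

def content_fingerprint(text: str) -> str:
    """Hash the normalized text instead of the raw scrape."""
    return hashlib.sha256(normalize_for_hashing(text).encode()).hexdigest()
```

Swapping `content_fingerprint` in for the plain SHA-256 in the tracker above keeps the snapshot table quiet until something substantive moves.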
Step 4: AI-Powered Opportunity Analysis
This is where market research becomes strategy. Feed all your competitive profiles into an LLM for gap analysis and opportunity identification.
```python
def generate_opportunity_analysis(profiles: dict, your_product: dict) -> str:
    """Generate strategic opportunities based on the competitive landscape."""
    profiles_summary = ""
    for name, profile in profiles.items():
        profiles_summary += f"""
{name}:
- Pricing: {profile.pricing_model} (starts at {profile.starting_price})
- Target: {profile.target_audience}
- Differentiators: {', '.join(profile.differentiators)}
- Weaknesses: {', '.join(profile.weaknesses)}
- Features: {', '.join(profile.key_features[:5])}
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a strategic market analyst.
Analyze the competitive landscape and identify:
1. MARKET GAPS - Features or segments no one serves well
2. PRICING OPPORTUNITIES - Underserved price points or models
3. POSITIONING ANGLES - Unique positioning no competitor owns
4. TIMING PLAYS - Trends that create windows of opportunity
5. THREATS - Competitive moves that could hurt our position
Be specific, actionable, and honest. No generic advice."""},
            {"role": "user", "content": f"""
OUR PRODUCT:
{json.dumps(your_product, indent=2)}

COMPETITIVE LANDSCAPE:
{profiles_summary}

Identify the top 5 strategic opportunities and top 3 threats."""},
        ],
    )
    return response.choices[0].message.content

our_product = {
    "name": "Mantis WebPerception API",
    "focus": "Web scraping API built for AI agents",
    "differentiator": "AI-powered data extraction, agent framework integrations",
    "pricing": "Usage-based, starts free, Pro at $99/mo",
}

analysis = generate_opportunity_analysis(profiles, our_product)
print(analysis)
```
Step 5: Executive Report Generation
Automate the final deliverable: a market intelligence report that any stakeholder can read.
```python
def generate_market_report(
    profiles: dict,
    changes: list,
    opportunities: str,
    market_name: str,
) -> str:
    """Generate an executive-ready market intelligence report."""
    changes_text = "\n".join(
        f"- [{c['competitor']}] {c['description'][:200]}"
        for c in changes[:20]
    )
    profiles_text = ""
    for name, p in profiles.items():
        profiles_text += f"""
### {name}
- **Value Prop:** {p.value_proposition}
- **Pricing:** {p.pricing_model} ({p.starting_price or 'not public'})
- **Target:** {p.target_audience}
- **Key Differentiators:** {', '.join(p.differentiators[:3])}
"""
    report = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a market intelligence analyst writing
an executive report. Structure it as:
1. EXECUTIVE SUMMARY (3-4 sentences)
2. MARKET OVERVIEW (size, growth, key trends)
3. COMPETITIVE LANDSCAPE (organized by tier)
4. RECENT COMPETITIVE MOVES (what changed this period)
5. OPPORTUNITY ANALYSIS (where to invest)
6. THREAT ASSESSMENT (what to watch)
7. RECOMMENDED ACTIONS (specific next steps)
Write for a CEO audience. Be concise, specific, and actionable.
Use markdown formatting."""},
            {"role": "user", "content": f"""
MARKET: {market_name}
DATE: {datetime.now().strftime('%B %Y')}

COMPETITOR PROFILES:
{profiles_text}

RECENT CHANGES:
{changes_text}

OPPORTUNITY ANALYSIS:
{opportunities}
"""},
        ],
    ).choices[0].message.content
    return report

# Generate and save the report
report = generate_market_report(
    profiles=profiles,
    changes=[],  # Would come from your changes DB
    opportunities=analysis,
    market_name="Web Scraping APIs for AI Agents",
)

# Save as markdown
with open(f"market_report_{datetime.now().strftime('%Y%m')}.md", "w") as f:
    f.write(report)

print("Report generated!")
```
Cost Comparison: Traditional vs AI Agent Market Research
| Approach | Cost | Time | Depth | Freshness |
|---|---|---|---|---|
| Consulting firm | $10,000–$50,000 | 4–8 weeks | 5–10 competitors | Stale on delivery |
| In-house analyst team | $5,000–$15,000/mo | Ongoing | 10–20 competitors | Weekly updates |
| Market research platforms | $500–$2,000/mo | Instant (limited) | Pre-built reports only | Monthly |
| AI agent + Mantis API | $99–$299/mo | Minutes | 50+ competitors | Real-time |
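The AI-agent row can be sanity-checked with quick request-volume arithmetic. The competitor count, pages per competitor, and cadence below are illustrative assumptions, not Mantis pricing data:

```python
# Back-of-envelope request volume for daily competitor monitoring.
# All counts are illustrative assumptions; plug in your own landscape.
competitors = 50
pages_per_competitor = 3   # e.g. pricing, features, changelog
days_per_month = 30

monthly_requests = competitors * pages_per_competitor * days_per_month
print(monthly_requests)  # 4500
```

A few thousand scrapes a month comfortably fits a low usage-based tier, which is where the $99–$299/mo figure comes from.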
Real-World Use Cases
1. VC Due Diligence
Before investing, VCs need to understand the competitive landscape. An AI agent can profile every competitor in a market within hours by scraping pricing pages, feature lists, team pages, and press releases, then generate a competitive overview that would take an analyst a week.
2. Product Launch Intelligence
Before launching a new product or feature, scrape every competitor's messaging and positioning. Identify the white space: the language and positioning nobody owns yet. Launch into the gap.
3. Pricing Strategy
Track competitor pricing changes in real time. When a competitor raises prices, that's your window to capture price-sensitive customers. When they launch a free tier, you need to respond. AI agents catch these changes within hours, not weeks.
4. Trend Monitoring for Strategic Planning
Scrape industry blogs, Product Hunt, GitHub trending, and Hacker News to identify emerging trends before they hit the mainstream. Feed this into quarterly strategy sessions with real data instead of gut feelings.
Production Architecture
For a production market research system, structure your agents like this:
```python
# Production architecture
# ├── discovery_agent.py   → Finds new competitors weekly
# ├── profiling_agent.py   → Deep profiles on new competitors
# ├── monitoring_agent.py  → Daily change detection on key pages
# ├── analysis_agent.py    → Weekly opportunity/threat analysis
# ├── report_agent.py      → Monthly executive reports
# └── alert_agent.py       → Real-time Slack alerts for critical changes
#
# Schedule:
#   Daily:   monitoring_agent (scrape competitor pricing + features pages)
#   Weekly:  discovery_agent + analysis_agent
#   Monthly: profiling_agent (re-profile all) + report_agent
#
# Storage: SQLite or PostgreSQL for snapshots and changes
# Alerts:  Slack webhook for critical competitive moves
# Reports: Generated as Markdown → converted to PDF for stakeholders
```
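The alert_agent is the simplest piece to sketch. Below is a minimal version using Slack's standard incoming-webhook pattern; the webhook URL is a placeholder, and the severity-to-emoji mapping is my own illustrative convention, not part of any API.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def format_change_alert(competitor: str, severity: str, description: str) -> dict:
    """Build a Slack message payload for a detected competitive change."""
    emoji = {"critical": ":rotating_light:", "important": ":warning:"}.get(
        severity, ":memo:"
    )
    return {"text": f"{emoji} *{competitor}* ({severity}): {description[:300]}"}

def send_alert(payload: dict) -> bool:
    """POST the payload to the Slack incoming webhook; True on success."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False  # a production agent would log and retry here
```

Wiring this to the change-detection step is one line: call `send_alert(format_change_alert(...))` whenever the AI grades a change as critical.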
Best Practices for AI Market Research
- Start narrow, expand later. Profile your top 5 direct competitors before trying to track 100. Get the system right, then scale.
- Focus on decisions, not data. Every insight should answer "so what should we do?" If it doesn't, it's noise.
- Validate AI analysis. LLMs can hallucinate competitive details. Cross-reference critical claims with the source data.
- Track your own site too. Monitor how competitors might see you. Are your differentiators clear? Is your pricing competitive?
- Respect robots.txt. Scrape responsibly. Use reasonable rate limits. Don't scrape content behind authentication.
- Version your reports. Store every generated report. Being able to look back at "what did the market look like 6 months ago?" is incredibly valuable.
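On the robots.txt point: a hosted scraping API may handle politeness server-side (an assumption about Mantis, not a documented guarantee), but if your agent fetches any pages directly, the stdlib makes the check cheap. A minimal sketch with one cached parser per host:

```python
from urllib import robotparser
from urllib.parse import urlparse

# One cached parser per host; None means robots.txt was unreachable.
_parsers: dict[str, "robotparser.RobotFileParser | None"] = {}

def allowed_by_robots(url: str, user_agent: str = "market-research-bot") -> bool:
    """Check robots.txt before scraping a URL, caching per host."""
    parts = urlparse(url)
    host = f"{parts.scheme}://{parts.netloc}"
    if host not in _parsers:
        rp = robotparser.RobotFileParser()
        rp.set_url(f"{host}/robots.txt")
        try:
            rp.read()
            _parsers[host] = rp
        except Exception:
            # Unreachable robots.txt: this sketch defaults to permissive;
            # a stricter policy would default to disallow.
            _parsers[host] = None
    rp = _parsers[host]
    return True if rp is None else rp.can_fetch(user_agent, url)
```

Call `allowed_by_robots(page_url)` before each direct fetch, and pair it with a per-domain rate limit.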
Build Your Market Research Agent
Start scraping competitor data with AI-powered extraction. Free tier includes 100 requests/month.
Get Your API Key →
What's Next
With a market research agent running, you'll have competitive intelligence that would cost thousands from a consulting firm, updated in real time for a fraction of the price. The key is to go beyond data collection and into automated analysis. Let the AI tell you what matters.
For related guides, check out: