Web Scraping for SEO Monitoring: Track Rankings, Competitors & Content Changes with AI
SEO teams spend hours manually checking rankings, reviewing competitor pages, and hunting for content gaps. Traditional SEO tools like Ahrefs and SEMrush cost $100–500/month and still miss the nuance — they tell you what changed, but not why it matters or what to do about it.
What if an AI agent could monitor your SEO landscape 24/7, detect meaningful changes, analyze their strategic impact, and send you actionable recommendations — all for a fraction of the cost?
In this guide, you'll build exactly that: an AI-powered SEO monitoring system using Python and the Mantis WebPerception API.
What You'll Build
By the end of this tutorial, you'll have a system that:
- Tracks keyword rankings — scrapes search results for your target keywords daily
- Monitors competitor content — detects when competitors publish, update, or restructure pages
- Finds content gaps — uses AI to identify topics your competitors cover that you don't
- Analyzes SERP changes — understands featured snippets, People Also Ask, and SERP feature shifts
- Sends strategic alerts — LLM-powered analysis tells you what changed and what to do about it
Why AI Changes SEO Monitoring
Traditional SEO tools track metrics — positions, traffic estimates, backlink counts. But they can't understand the content itself. AI-powered monitoring adds a layer that traditional tools miss:
| Capability | Traditional SEO Tools | AI-Powered Monitoring |
|---|---|---|
| Ranking tracking | ✅ Position numbers | ✅ Position + context (why it moved) |
| Content change detection | ❌ Diff-based (noisy) | ✅ Semantic understanding of changes |
| Competitor analysis | ⚠️ Keyword overlap only | ✅ Full content gap + intent analysis |
| SERP feature tracking | ✅ Feature presence | ✅ Feature analysis + optimization tips |
| Actionable recommendations | ❌ Raw data only | ✅ Prioritized action items |
| Cost | $100–500/mo | ~$40–80/mo |
Architecture Overview
The system runs on a simple loop: scrape → store → compare → analyze → alert.
# System architecture
#
# ┌──────────────┐     ┌──────────────┐     ┌──────────┐
# │ SERP Scraper │────▶│   Content    │────▶│  SQLite  │
# │ (Mantis API) │     │  Extractor   │     │ Database │
# └──────────────┘     └──────────────┘     └────┬─────┘
#                                                │
# ┌────────────────┐     ┌───────────┐           │
# │  Alert System  │◀────│ AI Change │◀──────────┘
# │ (Slack/Email)  │     │ Analyzer  │
# └────────────────┘     └───────────┘
Step 1: Set Up the SERP Scraper
First, we'll build a function that scrapes Google search results for any keyword and extracts structured ranking data using AI.
import requests
import json
from datetime import datetime
from pydantic import BaseModel
from typing import Optional
MANTIS_API_KEY = "your-api-key"
BASE_URL = "https://api.mantisapi.com/v1"
class SERPResult(BaseModel):
position: int
title: str
url: str
description: str
domain: str
is_featured_snippet: bool = False
serp_feature: Optional[str] = None # "featured_snippet", "paa", "video", etc.
class SERPData(BaseModel):
keyword: str
results: list[SERPResult]
total_results_estimate: Optional[str] = None
featured_snippet_present: bool = False
people_also_ask: list[str] = []
scraped_at: str
def scrape_serp(keyword: str, num_results: int = 20) -> SERPData:
"""Scrape Google SERP for a keyword and extract structured data."""
    # Use Mantis to scrape the search results page (URL-encode the keyword so
    # multi-word and special-character queries are handled correctly)
    search_url = f"https://www.google.com/search?q={requests.utils.quote(keyword)}&num={num_results}"
response = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"url": search_url,
"prompt": f"""Extract all organic search results for the keyword "{keyword}".
For each result, provide:
- position (1-indexed ranking)
- title (the blue link text)
- url (the destination URL)
- description (the snippet text)
- domain (just the domain name)
- is_featured_snippet (true if this is the featured snippet)
- serp_feature (if this result has a special SERP feature like "video", "paa", "image_pack", etc.)
Also extract:
- total_results_estimate (the "About X results" text)
- featured_snippet_present (true/false)
- people_also_ask (list of PAA questions if present)""",
"schema": SERPData.model_json_schema()
}
)
    response.raise_for_status()
    data = response.json()["data"]
data["keyword"] = keyword
data["scraped_at"] = datetime.now().isoformat()
return SERPData(**data)
Step 2: Store Rankings in SQLite
We need historical data to detect changes. SQLite keeps it simple and portable.
import sqlite3
def init_db(db_path: str = "seo_monitor.db"):
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS rankings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
keyword TEXT NOT NULL,
position INTEGER NOT NULL,
title TEXT,
url TEXT NOT NULL,
domain TEXT,
description TEXT,
is_featured_snippet BOOLEAN DEFAULT FALSE,
serp_feature TEXT,
scraped_at TEXT NOT NULL,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS content_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
url TEXT NOT NULL,
content_hash TEXT,
title TEXT,
word_count INTEGER,
headings TEXT, -- JSON array of h1/h2/h3
main_topics TEXT, -- AI-extracted topics
scraped_at TEXT NOT NULL
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS alerts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
alert_type TEXT NOT NULL,
severity TEXT NOT NULL,
keyword TEXT,
url TEXT,
summary TEXT NOT NULL,
recommendation TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
acknowledged BOOLEAN DEFAULT FALSE
)
""")
conn.commit()
return conn
def save_rankings(conn, serp_data: SERPData):
for result in serp_data.results:
conn.execute("""
INSERT INTO rankings (keyword, position, title, url, domain,
description, is_featured_snippet, serp_feature, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (serp_data.keyword, result.position, result.title, result.url,
result.domain, result.description, result.is_featured_snippet,
result.serp_feature, serp_data.scraped_at))
conn.commit()
Step 3: Detect Ranking Changes
Compare today's rankings against yesterday's to find meaningful movements.
from dataclasses import dataclass
@dataclass
class RankingChange:
keyword: str
url: str
domain: str
old_position: int | None
new_position: int | None
change_type: str # "improved", "dropped", "new_entry", "fell_off", "stable"
positions_changed: int
def detect_ranking_changes(conn, keyword: str) -> list[RankingChange]:
"""Compare latest two scrapes for a keyword to detect rank changes."""
# Get the two most recent scrape timestamps
cursor = conn.execute("""
SELECT DISTINCT scraped_at FROM rankings
WHERE keyword = ? ORDER BY scraped_at DESC LIMIT 2
""", (keyword,))
timestamps = [row[0] for row in cursor.fetchall()]
if len(timestamps) < 2:
return [] # Need at least two snapshots
latest, previous = timestamps[0], timestamps[1]
# Get rankings for both snapshots
def get_rankings(ts):
cursor = conn.execute("""
SELECT url, domain, position FROM rankings
WHERE keyword = ? AND scraped_at = ? ORDER BY position
""", (keyword, ts))
return {row[0]: {"domain": row[1], "position": row[2]} for row in cursor}
current = get_rankings(latest)
old = get_rankings(previous)
changes = []
all_urls = set(list(current.keys()) + list(old.keys()))
for url in all_urls:
in_current = url in current
in_old = url in old
if in_current and in_old:
old_pos = old[url]["position"]
new_pos = current[url]["position"]
diff = old_pos - new_pos # positive = improved
change_type = "improved" if diff > 0 else "dropped" if diff < 0 else "stable"
changes.append(RankingChange(
keyword=keyword, url=url, domain=current[url]["domain"],
old_position=old_pos, new_position=new_pos,
change_type=change_type, positions_changed=abs(diff)
))
elif in_current and not in_old:
changes.append(RankingChange(
keyword=keyword, url=url, domain=current[url]["domain"],
old_position=None, new_position=current[url]["position"],
change_type="new_entry", positions_changed=0
))
elif not in_current and in_old:
changes.append(RankingChange(
keyword=keyword, url=url, domain=old[url]["domain"],
old_position=old[url]["position"], new_position=None,
change_type="fell_off", positions_changed=0
))
# Sort by significance: drops and new entries first
changes.sort(key=lambda c: (
0 if c.change_type in ("dropped", "fell_off") else 1,
-c.positions_changed
))
return changes
Step 4: Monitor Competitor Content Changes
Rankings shift because content changes. Monitor your top competitors' pages for updates.
import hashlib
class ContentAnalysis(BaseModel):
title: str
word_count: int
headings: list[str]
main_topics: list[str]
key_points: list[str]
content_type: str # "guide", "comparison", "tutorial", "listicle", etc.
content_quality_signals: list[str] # "has_code_examples", "has_images", "has_video", etc.
def monitor_competitor_page(url: str) -> ContentAnalysis:
"""Scrape a competitor page and extract structured content analysis."""
response = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"url": url,
"prompt": """Analyze this page for SEO monitoring. Extract:
- title: the page title / H1
- word_count: approximate word count of main content
- headings: all H1, H2, H3 headings as a flat list
- main_topics: the 5-8 key topics/themes covered
- key_points: the 3-5 most important claims or arguments
- content_type: classify as guide/comparison/tutorial/listicle/news/review
- content_quality_signals: list signals like has_code_examples, has_images,
has_video, has_table, has_schema_markup, has_author_bio, recently_updated""",
"schema": ContentAnalysis.model_json_schema()
}
)
    response.raise_for_status()
    return ContentAnalysis(**response.json()["data"])
def detect_content_changes(conn, url: str, new_analysis: ContentAnalysis) -> dict | None:
"""Compare new content analysis against stored snapshot."""
cursor = conn.execute("""
SELECT title, word_count, headings, main_topics FROM content_snapshots
WHERE url = ? ORDER BY scraped_at DESC LIMIT 1
""", (url,))
row = cursor.fetchone()
if not row:
# First snapshot β store and return
save_content_snapshot(conn, url, new_analysis)
return None
old_title, old_word_count, old_headings_json, old_topics_json = row
old_headings = json.loads(old_headings_json) if old_headings_json else []
old_topics = json.loads(old_topics_json) if old_topics_json else []
changes = {}
if new_analysis.title != old_title:
changes["title_changed"] = {"old": old_title, "new": new_analysis.title}
word_diff = new_analysis.word_count - old_word_count
if abs(word_diff) > 100:
changes["word_count_changed"] = {
"old": old_word_count, "new": new_analysis.word_count, "diff": word_diff
}
new_headings_set = set(new_analysis.headings)
old_headings_set = set(old_headings)
added_headings = new_headings_set - old_headings_set
removed_headings = old_headings_set - new_headings_set
if added_headings or removed_headings:
changes["headings_changed"] = {
"added": list(added_headings), "removed": list(removed_headings)
}
new_topics_set = set(new_analysis.main_topics)
old_topics_set = set(old_topics)
if new_topics_set != old_topics_set:
changes["topics_changed"] = {
"added": list(new_topics_set - old_topics_set),
"removed": list(old_topics_set - new_topics_set)
}
# Store the new snapshot
save_content_snapshot(conn, url, new_analysis)
return changes if changes else None
def save_content_snapshot(conn, url: str, analysis: ContentAnalysis):
content_str = json.dumps(analysis.model_dump())
conn.execute("""
INSERT INTO content_snapshots (url, content_hash, title, word_count,
headings, main_topics, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (url, hashlib.md5(content_str.encode()).hexdigest(),
analysis.title, analysis.word_count,
json.dumps(analysis.headings), json.dumps(analysis.main_topics),
datetime.now().isoformat()))
conn.commit()
Step 5: AI-Powered Strategic Analysis
This is where AI monitoring truly shines. Instead of raw data, you get strategic recommendations.
from openai import OpenAI
openai_client = OpenAI()
def analyze_seo_changes(
ranking_changes: list[RankingChange],
content_changes: dict[str, dict],
your_domain: str = "mantisapi.com"
) -> dict:
"""Use GPT-4o to analyze SEO changes and provide strategic recommendations."""
# Format changes for the LLM
changes_summary = []
for change in ranking_changes:
if change.change_type != "stable":
changes_summary.append(
f"- [{change.keyword}] {change.domain}: "
                f"{change.old_position or 'NEW'} → {change.new_position or 'GONE'} "
f"({change.change_type})"
)
for url, changes in content_changes.items():
changes_summary.append(f"- Content change at {url}: {json.dumps(changes)}")
if not changes_summary:
return {"severity": "none", "summary": "No significant changes detected."}
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[{
"role": "system",
"content": f"""You are an SEO analyst monitoring changes for {your_domain}.
Analyze ranking and content changes, then provide:
1. severity: "critical" (your rankings dropped significantly or competitor made major move),
"important" (notable changes worth investigating), "minor" (small fluctuations), "noise"
2. summary: 2-3 sentence overview of what happened
3. recommendations: List of 3-5 specific, actionable steps to take
4. competitor_moves: What competitors did that you should pay attention to
5. opportunities: Content gaps or ranking opportunities you should exploit
Be specific and tactical. Don't give generic advice."""
}, {
"role": "user",
"content": f"Here are the SEO changes detected:\n\n" + "\n".join(changes_summary)
}],
response_format={"type": "json_object"}
)
return json.loads(response.choices[0].message.content)
Step 6: Alert System
Send alerts to Slack when the AI detects meaningful changes.
def send_slack_alert(webhook_url: str, analysis: dict, keyword: str | None = None):
"""Send a formatted SEO alert to Slack."""
    severity_emoji = {
        "critical": "🚨", "important": "⚠️", "minor": "ℹ️", "noise": "📊"
    }
    emoji = severity_emoji.get(analysis.get("severity", "minor"), "📊")
blocks = [
{
"type": "header",
"text": {"type": "plain_text", "text": f"{emoji} SEO Alert: {analysis['severity'].upper()}"}
},
{
"type": "section",
"text": {"type": "mrkdwn", "text": f"*Summary:*\n{analysis['summary']}"}
}
]
if analysis.get("recommendations"):
        recs = "\n".join(f"• {r}" for r in analysis["recommendations"])
blocks.append({
"type": "section",
"text": {"type": "mrkdwn", "text": f"*Recommendations:*\n{recs}"}
})
if analysis.get("opportunities"):
        opps = "\n".join(f"• {o}" for o in analysis["opportunities"])
blocks.append({
"type": "section",
"text": {"type": "mrkdwn", "text": f"*Opportunities:*\n{opps}"}
})
requests.post(webhook_url, json={"blocks": blocks})
Step 7: Content Gap Analysis
One of the highest-value SEO monitoring tasks: find what competitors rank for that you don't.
def find_content_gaps(conn, your_domain: str, competitor_domains: list[str]) -> list[dict]:
"""Identify keywords where competitors rank but you don't."""
    # Get each domain's best position in the latest snapshot per keyword.
    # (Each keyword is scraped at its own timestamp, so a single global
    # MAX(scraped_at) would only match one keyword's results.)
    cursor = conn.execute("""
        SELECT keyword, domain, MIN(position) as best_position
        FROM rankings r
        WHERE scraped_at = (
            SELECT MAX(scraped_at) FROM rankings WHERE keyword = r.keyword
        )
        GROUP BY keyword, domain
    """)
keyword_rankings = {}
for keyword, domain, position in cursor:
if keyword not in keyword_rankings:
keyword_rankings[keyword] = {}
keyword_rankings[keyword][domain] = position
gaps = []
for keyword, domains in keyword_rankings.items():
your_position = domains.get(your_domain)
for comp_domain in competitor_domains:
comp_position = domains.get(comp_domain)
if comp_position and comp_position <= 10 and (not your_position or your_position > 20):
gaps.append({
"keyword": keyword,
"competitor": comp_domain,
"competitor_position": comp_position,
"your_position": your_position or "Not ranking",
                    "opportunity_score": 10 - comp_position  # higher when the competitor ranks nearer the top
})
# Sort by opportunity score
gaps.sort(key=lambda g: g["opportunity_score"], reverse=True)
return gaps
Step 8: Run the Full Monitor
Tie everything together into a scheduled monitoring job.
def run_seo_monitor(
    keywords: list[str],
    competitor_urls: dict[str, list[str]],  # keyword -> list of competitor URLs to monitor
    your_domain: str = "mantisapi.com",
    competitor_domains: list[str] | None = None,
    slack_webhook: str | None = None,
):
"""Run a complete SEO monitoring cycle."""
conn = init_db()
all_ranking_changes = []
all_content_changes = {}
# 1. Scrape SERPs for each keyword
    print(f"🔍 Scraping SERPs for {len(keywords)} keywords...")
for keyword in keywords:
serp_data = scrape_serp(keyword)
save_rankings(conn, serp_data)
# Detect ranking changes
changes = detect_ranking_changes(conn, keyword)
        # New entries and fell-offs carry positions_changed=0, so treat them as
        # significant explicitly rather than filtering them out by the threshold
        significant = [
            c for c in changes
            if c.change_type in ("new_entry", "fell_off")
            or (c.change_type in ("improved", "dropped") and c.positions_changed >= 2)
        ]
all_ranking_changes.extend(significant)
if significant:
            print(f"  📈 {keyword}: {len(significant)} significant changes")
# 2. Monitor competitor content
    print("\n👀 Monitoring competitor content...")
for keyword, urls in competitor_urls.items():
for url in urls:
try:
analysis = monitor_competitor_page(url)
changes = detect_content_changes(conn, url, analysis)
if changes:
all_content_changes[url] = changes
                    print(f"  📝 Content change: {url}")
except Exception as e:
                print(f"  ❌ Error monitoring {url}: {e}")
# 3. AI analysis
if all_ranking_changes or all_content_changes:
        print("\n🤖 Running AI analysis...")
analysis = analyze_seo_changes(
all_ranking_changes, all_content_changes, your_domain
)
print(f" Severity: {analysis.get('severity', 'unknown')}")
print(f" Summary: {analysis.get('summary', 'N/A')}")
# 4. Send alerts for important+ changes
if slack_webhook and analysis.get("severity") in ("critical", "important"):
send_slack_alert(slack_webhook, analysis)
            print("  📢 Alert sent to Slack")
# Store the alert
conn.execute("""
INSERT INTO alerts (alert_type, severity, summary, recommendation)
VALUES (?, ?, ?, ?)
""", ("seo_change", analysis.get("severity", "minor"),
analysis.get("summary", ""),
json.dumps(analysis.get("recommendations", []))))
conn.commit()
    else:
        print("✅ No significant changes detected.")
# 5. Content gap analysis
if competitor_domains:
        print("\n📊 Running content gap analysis...")
gaps = find_content_gaps(conn, your_domain, competitor_domains)
if gaps:
print(f" Found {len(gaps)} content gaps:")
for gap in gaps[:5]:
print(f" - {gap['keyword']}: {gap['competitor']} at #{gap['competitor_position']}, you: {gap['your_position']}")
conn.close()
    print("\n✅ SEO monitoring cycle complete.")
# Example usage
if __name__ == "__main__":
run_seo_monitor(
keywords=[
"web scraping API",
"AI data extraction",
"web scraping for AI agents",
"best web scraping tool 2026",
"automated web scraping python",
],
competitor_urls={
"web scraping API": [
"https://scrapingbee.com",
"https://apify.com",
"https://brightdata.com",
],
},
your_domain="mantisapi.com",
competitor_domains=["scrapingbee.com", "apify.com", "brightdata.com"],
slack_webhook="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
)
Scheduling: Run Daily with Cron or Lambda
Set up automated daily monitoring:
| Method | Best For | Setup |
|---|---|---|
| Cron (Linux) | VPS/dedicated server | 0 7 * * * python seo_monitor.py |
| AWS Lambda | Serverless, cost-efficient | EventBridge rule, daily trigger |
| GitHub Actions | Free for public repos | Scheduled workflow, daily |
| Docker + systemd | Self-hosted, reliable | Timer unit, daily |
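For the Lambda route, a thin handler around `run_seo_monitor` is enough. The sketch below is a minimal example, not a prescribed layout: the `SEO_KEYWORDS` and `SLACK_WEBHOOK_URL` environment-variable names and the `seo_monitor` module name are assumptions, and `monitor_fn` is injectable purely so the handler can be tested without scraping anything.

```python
import json
import os


def lambda_handler(event, context, monitor_fn=None):
    """Entry point for a daily EventBridge-triggered Lambda.

    Keywords and the Slack webhook come from environment variables, so you
    can reconfigure monitoring without redeploying the function.
    """
    keywords = [
        k.strip()
        for k in os.environ.get("SEO_KEYWORDS", "").split(",")
        if k.strip()
    ]
    webhook = os.environ.get("SLACK_WEBHOOK_URL")  # optional

    if monitor_fn is None:
        # Lazy import of the module built in this guide (name assumed)
        from seo_monitor import run_seo_monitor
        monitor_fn = run_seo_monitor

    monitor_fn(keywords=keywords, competitor_urls={}, slack_webhook=webhook)
    return {
        "statusCode": 200,
        "body": json.dumps({"keywords_checked": len(keywords)}),
    }
```

One Lambda-specific caveat: the filesystem is read-only except `/tmp`, so point `init_db` at `/tmp/seo_monitor.db` or, better, a mounted EFS volume so ranking history survives between invocations.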
Cost Comparison
Here's what this system costs vs. traditional SEO tools:
| Solution | Monthly Cost | Keyword Limit | AI Analysis |
|---|---|---|---|
| Ahrefs Standard | $199/mo | 1,500 keywords | ❌ |
| SEMrush Pro | $140/mo | 500 keywords | ❌ |
| Moz Pro | $99/mo | 300 keywords | ❌ |
| AI Agent + Mantis | ~$40–80/mo | Unlimited | ✅ |
Breakdown: the Mantis Pro plan is $99/mo for 25K API calls, but daily monitoring of 50–100 keywords typically fits the Starter plan at $29/mo. Add OpenAI GPT-4o for the analysis step (~$10–50/mo depending on volume).
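The arithmetic is easy to adapt to your own keyword set. Here's a rough back-of-envelope calculator; the included-call quota, per-call overage rate, and daily LLM cost below are placeholder assumptions, so check current Mantis and OpenAI pricing before relying on the numbers.

```python
def estimate_monthly_cost(
    keywords: int,
    competitor_pages: int = 10,
    scrapes_per_day: int = 1,
    plan_cost: float = 29.0,         # Starter plan, per the breakdown above
    included_calls: int = 5000,      # assumption: check your plan's quota
    overage_per_call: float = 0.01,  # assumption: check current pricing
    llm_cost_per_day: float = 0.50,  # assumption: ~one GPT-4o analysis/day
) -> float:
    """Rough monthly cost of daily SERP + competitor-page monitoring."""
    calls_per_month = (keywords + competitor_pages) * scrapes_per_day * 30
    overage = max(0, calls_per_month - included_calls)
    return plan_cost + overage * overage_per_call + llm_cost_per_day * 30

# 50 keywords + 10 competitor pages, scraped once daily
print(f"${estimate_monthly_cost(50):.2f}/mo")  # $44.00/mo
```

Scraping twice daily doubles the call volume, so pass `scrapes_per_day=2` to see whether your workload crosses the included-call quota into overage territory.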
Use Cases
1. SaaS Competitive Intelligence
Track how competitor landing pages evolve. Detect when they add features, change pricing, or update positioning. Get alerts when a competitor starts ranking for your core keywords.
2. Content Marketing Teams
Monitor your content's ranking performance over time. Identify when articles need refreshing. Find content gaps to fill before competitors do.
3. Agency Client Reporting
Automate SEO reporting for clients with AI-generated insights. Instead of raw data dumps, send strategic analysis with clear recommendations.
4. E-commerce Category Monitoring
Track product category rankings. Detect when competitors change product descriptions, pricing pages, or category structures that affect search visibility.
Start Monitoring Your SEO with AI
The Mantis WebPerception API gives your AI agents the ability to scrape, screenshot, and extract data from any webpage. Build your SEO monitor in minutes.
Get Your Free API Key →
Next Steps
- Read the Complete Guide to Web Scraping with AI for the full landscape
- See Automate Website Monitoring with AI Agents for general monitoring patterns
- Learn about Structured Data Extraction with AI for cleaner data pipelines
- Check out Price Monitoring with AI for a similar monitoring use case
- Explore the Mantis API documentation to get started