Web Scraping for SEO Monitoring: Track Rankings, Competitors & Content Changes with AI

March 10, 2026 · 12 min read · Tags: SEO, AI Agents, Monitoring

SEO teams spend hours manually checking rankings, reviewing competitor pages, and hunting for content gaps. Traditional SEO tools like Ahrefs and SEMrush cost $100–500/month and still miss the nuance: they tell you what changed, but not why it matters or what to do about it.

What if an AI agent could monitor your SEO landscape 24/7, detect meaningful changes, analyze their strategic impact, and send you actionable recommendations, all for a fraction of the cost?

In this guide, you'll build exactly that: an AI-powered SEO monitoring system using Python and the Mantis WebPerception API.

What You'll Build

By the end of this tutorial, you'll have a system that:

- Scrapes Google SERPs for your target keywords and extracts structured ranking data
- Stores ranking history in SQLite and flags meaningful position changes
- Monitors competitor pages for content updates: headings, topics, word count
- Uses GPT-4o to analyze changes and produce prioritized, actionable recommendations
- Sends Slack alerts when something critical or important happens
- Surfaces content gaps where competitors rank and you don't

Why AI Changes SEO Monitoring

Traditional SEO tools track metrics: positions, traffic estimates, backlink counts. But they can't understand the content itself. AI-powered monitoring adds a layer that traditional tools miss:

| Capability | Traditional SEO Tools | AI-Powered Monitoring |
| --- | --- | --- |
| Ranking tracking | ✅ Position numbers | ✅ Position + context (why it moved) |
| Content change detection | ❌ Diff-based (noisy) | ✅ Semantic understanding of changes |
| Competitor analysis | ⚠️ Keyword overlap only | ✅ Full content gap + intent analysis |
| SERP feature tracking | ✅ Feature presence | ✅ Feature analysis + optimization tips |
| Actionable recommendations | ❌ Raw data only | ✅ Prioritized action items |
| Cost | $100–500/mo | ~$40–80/mo |

Architecture Overview

The system runs on a simple loop: scrape → store → compare → analyze → alert.

# System architecture
#
# ┌─────────────┐    ┌───────────────┐    ┌───────────┐
# │ SERP Scraper│───▶│ Content       │───▶│ SQLite    │
# │ (Mantis API)│    │ Extractor     │    │ Database  │
# └─────────────┘    └───────────────┘    └─────┬─────┘
#                                               │
#                     ┌───────────────┐    ┌────▼──────┐
#                     │ Alert System  │◀───│ AI Change │
#                     │ (Slack/Email) │    │ Analyzer  │
#                     └───────────────┘    └───────────┘

Step 1: Set Up the SERP Scraper

First, we'll build a function that scrapes Google search results for any keyword and extracts structured ranking data using AI.

import requests
import json
from datetime import datetime
from urllib.parse import quote_plus
from pydantic import BaseModel
from typing import Optional

MANTIS_API_KEY = "your-api-key"
BASE_URL = "https://api.mantisapi.com/v1"

class SERPResult(BaseModel):
    position: int
    title: str
    url: str
    description: str
    domain: str
    is_featured_snippet: bool = False
    serp_feature: Optional[str] = None  # "featured_snippet", "paa", "video", etc.

class SERPData(BaseModel):
    keyword: str
    results: list[SERPResult]
    total_results_estimate: Optional[str] = None
    featured_snippet_present: bool = False
    people_also_ask: list[str] = []
    scraped_at: str

def scrape_serp(keyword: str, num_results: int = 20) -> SERPData:
    """Scrape Google SERP for a keyword and extract structured data."""

    # Use Mantis to scrape the search results page
    search_url = f"https://www.google.com/search?q={keyword.replace(' ', '+')}&num={num_results}"

    response = requests.post(
        f"{BASE_URL}/extract",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={
            "url": search_url,
            "prompt": f"""Extract all organic search results for the keyword "{keyword}".
For each result, provide:
- position (1-indexed ranking)
- title (the blue link text)
- url (the destination URL)
- description (the snippet text)
- domain (just the domain name)
- is_featured_snippet (true if this is the featured snippet)
- serp_feature (if this result has a special SERP feature like "video", "paa", "image_pack", etc.)

Also extract:
- total_results_estimate (the "About X results" text)
- featured_snippet_present (true/false)
- people_also_ask (list of PAA questions if present)""",
            "schema": SERPData.model_json_schema()
        }
    )

    response.raise_for_status()
    data = response.json()["data"]
    data["keyword"] = keyword
    data["scraped_at"] = datetime.now().isoformat()
    return SERPData(**data)
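
A quick sanity check before wiring anything else up. The keyword here is just an example, and the exact fields you get back depend on what Google serves for that query:

serp = scrape_serp("web scraping API", num_results=10)
print(f"{serp.keyword}: {len(serp.results)} results at {serp.scraped_at}")
for r in serp.results[:3]:
    print(f"  #{r.position} {r.domain}: {r.title}")
if serp.people_also_ask:
    print("PAA:", "; ".join(serp.people_also_ask[:3]))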

Step 2: Store Rankings in SQLite

We need historical data to detect changes. SQLite keeps it simple and portable.

import sqlite3

def init_db(db_path: str = "seo_monitor.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS rankings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            keyword TEXT NOT NULL,
            position INTEGER NOT NULL,
            title TEXT,
            url TEXT NOT NULL,
            domain TEXT,
            description TEXT,
            is_featured_snippet BOOLEAN DEFAULT FALSE,
            serp_feature TEXT,
            scraped_at TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS content_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL,
            content_hash TEXT,
            title TEXT,
            word_count INTEGER,
            headings TEXT,  -- JSON array of h1/h2/h3
            main_topics TEXT,  -- AI-extracted topics
            scraped_at TEXT NOT NULL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            alert_type TEXT NOT NULL,
            severity TEXT NOT NULL,
            keyword TEXT,
            url TEXT,
            summary TEXT NOT NULL,
            recommendation TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            acknowledged BOOLEAN DEFAULT FALSE
        )
    """)
    conn.commit()
    return conn

def save_rankings(conn, serp_data: SERPData):
    for result in serp_data.results:
        conn.execute("""
            INSERT INTO rankings (keyword, position, title, url, domain, 
                                  description, is_featured_snippet, serp_feature, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (serp_data.keyword, result.position, result.title, result.url,
              result.domain, result.description, result.is_featured_snippet,
              result.serp_feature, serp_data.scraped_at))
    conn.commit()
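
Once a few days of history accumulate, trend checks become a one-query job. Here's a small helper of our own (not required by later steps) that returns your best position per scrape for one keyword:

def get_position_history(conn, keyword: str, domain: str) -> list[tuple[str, int]]:
    """Return (scraped_at, best_position) pairs for a domain on a keyword."""
    cursor = conn.execute("""
        SELECT scraped_at, MIN(position) FROM rankings
        WHERE keyword = ? AND domain = ?
        GROUP BY scraped_at ORDER BY scraped_at
    """, (keyword, domain))
    return cursor.fetchall()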

Step 3: Detect Ranking Changes

Compare today's rankings against yesterday's to find meaningful movements.

from dataclasses import dataclass

@dataclass
class RankingChange:
    keyword: str
    url: str
    domain: str
    old_position: int | None
    new_position: int | None
    change_type: str  # "improved", "dropped", "new_entry", "fell_off", "stable"
    positions_changed: int

def detect_ranking_changes(conn, keyword: str) -> list[RankingChange]:
    """Compare latest two scrapes for a keyword to detect rank changes."""

    # Get the two most recent scrape timestamps
    cursor = conn.execute("""
        SELECT DISTINCT scraped_at FROM rankings 
        WHERE keyword = ? ORDER BY scraped_at DESC LIMIT 2
    """, (keyword,))
    timestamps = [row[0] for row in cursor.fetchall()]

    if len(timestamps) < 2:
        return []  # Need at least two snapshots

    latest, previous = timestamps[0], timestamps[1]

    # Get rankings for both snapshots
    def get_rankings(ts):
        cursor = conn.execute("""
            SELECT url, domain, position FROM rankings 
            WHERE keyword = ? AND scraped_at = ? ORDER BY position
        """, (keyword, ts))
        return {row[0]: {"domain": row[1], "position": row[2]} for row in cursor}

    current = get_rankings(latest)
    old = get_rankings(previous)

    changes = []
    all_urls = set(list(current.keys()) + list(old.keys()))

    for url in all_urls:
        in_current = url in current
        in_old = url in old

        if in_current and in_old:
            old_pos = old[url]["position"]
            new_pos = current[url]["position"]
            diff = old_pos - new_pos  # positive = improved
            change_type = "improved" if diff > 0 else "dropped" if diff < 0 else "stable"
            changes.append(RankingChange(
                keyword=keyword, url=url, domain=current[url]["domain"],
                old_position=old_pos, new_position=new_pos,
                change_type=change_type, positions_changed=abs(diff)
            ))
        elif in_current and not in_old:
            changes.append(RankingChange(
                keyword=keyword, url=url, domain=current[url]["domain"],
                old_position=None, new_position=current[url]["position"],
                change_type="new_entry", positions_changed=0
            ))
        elif not in_current and in_old:
            changes.append(RankingChange(
                keyword=keyword, url=url, domain=old[url]["domain"],
                old_position=old[url]["position"], new_position=None,
                change_type="fell_off", positions_changed=0
            ))

    # Sort by significance: drops and new entries first
    changes.sort(key=lambda c: (
        0 if c.change_type in ("dropped", "fell_off") else 1,
        -c.positions_changed
    ))

    return changes
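
Putting Steps 1–3 together, a minimal check for one keyword looks like this. It only reports changes once you have at least two snapshots:

conn = init_db()
serp_data = scrape_serp("web scraping API")
save_rankings(conn, serp_data)

for change in detect_ranking_changes(conn, "web scraping API"):
    if change.change_type != "stable":
        print(f"{change.domain}: {change.old_position} -> {change.new_position} "
              f"({change.change_type}, {change.positions_changed} positions)")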

Step 4: Monitor Competitor Content Changes

Rankings shift because content changes. Monitor your top competitors' pages for updates.

import hashlib

class ContentAnalysis(BaseModel):
    title: str
    word_count: int
    headings: list[str]
    main_topics: list[str]
    key_points: list[str]
    content_type: str  # "guide", "comparison", "tutorial", "listicle", etc.
    content_quality_signals: list[str]  # "has_code_examples", "has_images", "has_video", etc.

def monitor_competitor_page(url: str) -> ContentAnalysis:
    """Scrape a competitor page and extract structured content analysis."""

    response = requests.post(
        f"{BASE_URL}/extract",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={
            "url": url,
            "prompt": """Analyze this page for SEO monitoring. Extract:
- title: the page title / H1
- word_count: approximate word count of main content
- headings: all H1, H2, H3 headings as a flat list
- main_topics: the 5-8 key topics/themes covered
- key_points: the 3-5 most important claims or arguments
- content_type: classify as guide/comparison/tutorial/listicle/news/review
- content_quality_signals: list signals like has_code_examples, has_images, 
  has_video, has_table, has_schema_markup, has_author_bio, recently_updated""",
            "schema": ContentAnalysis.model_json_schema()
        }
    )

    response.raise_for_status()
    return ContentAnalysis(**response.json()["data"])

def detect_content_changes(conn, url: str, new_analysis: ContentAnalysis) -> dict | None:
    """Compare new content analysis against stored snapshot."""

    cursor = conn.execute("""
        SELECT title, word_count, headings, main_topics FROM content_snapshots
        WHERE url = ? ORDER BY scraped_at DESC LIMIT 1
    """, (url,))
    row = cursor.fetchone()

    if not row:
        # First snapshot: store it and return
        save_content_snapshot(conn, url, new_analysis)
        return None

    old_title, old_word_count, old_headings_json, old_topics_json = row
    old_headings = json.loads(old_headings_json) if old_headings_json else []
    old_topics = json.loads(old_topics_json) if old_topics_json else []

    changes = {}

    if new_analysis.title != old_title:
        changes["title_changed"] = {"old": old_title, "new": new_analysis.title}

    word_diff = new_analysis.word_count - old_word_count
    if abs(word_diff) > 100:
        changes["word_count_changed"] = {
            "old": old_word_count, "new": new_analysis.word_count, "diff": word_diff
        }

    new_headings_set = set(new_analysis.headings)
    old_headings_set = set(old_headings)
    added_headings = new_headings_set - old_headings_set
    removed_headings = old_headings_set - new_headings_set
    if added_headings or removed_headings:
        changes["headings_changed"] = {
            "added": list(added_headings), "removed": list(removed_headings)
        }

    new_topics_set = set(new_analysis.main_topics)
    old_topics_set = set(old_topics)
    if new_topics_set != old_topics_set:
        changes["topics_changed"] = {
            "added": list(new_topics_set - old_topics_set),
            "removed": list(old_topics_set - new_topics_set)
        }

    # Store the new snapshot
    save_content_snapshot(conn, url, new_analysis)

    return changes if changes else None

def save_content_snapshot(conn, url: str, analysis: ContentAnalysis):
    content_str = json.dumps(analysis.model_dump())
    conn.execute("""
        INSERT INTO content_snapshots (url, content_hash, title, word_count, 
                                        headings, main_topics, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?)
    """, (url, hashlib.md5(content_str.encode()).hexdigest(),
          analysis.title, analysis.word_count,
          json.dumps(analysis.headings), json.dumps(analysis.main_topics),
          datetime.now().isoformat()))
    conn.commit()
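
A single monitoring pass for one competitor page then looks like this (the URL is illustrative):

url = "https://scrapingbee.com/blog"  # illustrative competitor page
conn = init_db()
analysis = monitor_competitor_page(url)
changes = detect_content_changes(conn, url, analysis)
if changes:
    print(f"Changes detected at {url}:")
    print(json.dumps(changes, indent=2))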

Step 5: AI-Powered Strategic Analysis

This is where AI monitoring truly shines. Instead of raw data, you get strategic recommendations.

from openai import OpenAI

openai_client = OpenAI()

def analyze_seo_changes(
    ranking_changes: list[RankingChange],
    content_changes: dict[str, dict],
    your_domain: str = "mantisapi.com"
) -> dict:
    """Use GPT-4o to analyze SEO changes and provide strategic recommendations."""

    # Format changes for the LLM
    changes_summary = []

    for change in ranking_changes:
        if change.change_type != "stable":
            changes_summary.append(
                f"- [{change.keyword}] {change.domain}: "
                f"{change.old_position or 'NEW'} β†’ {change.new_position or 'GONE'} "
                f"({change.change_type})"
            )

    for url, changes in content_changes.items():
        changes_summary.append(f"- Content change at {url}: {json.dumps(changes)}")

    if not changes_summary:
        return {"severity": "none", "summary": "No significant changes detected."}

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""You are an SEO analyst monitoring changes for {your_domain}.
Analyze ranking and content changes, then provide:
1. severity: "critical" (your rankings dropped significantly or competitor made major move), 
   "important" (notable changes worth investigating), "minor" (small fluctuations), "noise"
2. summary: 2-3 sentence overview of what happened
3. recommendations: List of 3-5 specific, actionable steps to take
4. competitor_moves: What competitors did that you should pay attention to
5. opportunities: Content gaps or ranking opportunities you should exploit

Be specific and tactical. Don't give generic advice."""
        }, {
            "role": "user",
            "content": f"Here are the SEO changes detected:\n\n" + "\n".join(changes_summary)
        }],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)

Step 6: Alert System

Send alerts to Slack when the AI detects meaningful changes.

def send_slack_alert(webhook_url: str, analysis: dict):
    """Send a formatted SEO alert to Slack."""

    severity_emoji = {
        "critical": "🚨", "important": "⚠️", "minor": "ℹ️", "noise": "🔇"
    }
    emoji = severity_emoji.get(analysis.get("severity", "minor"), "📊")

    blocks = [
        {
            "type": "header",
            "text": {"type": "plain_text", "text": f"{emoji} SEO Alert: {analysis['severity'].upper()}"}
        },
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Summary:*\n{analysis['summary']}"}
        }
    ]

    if analysis.get("recommendations"):
        recs = "\n".join(f"• {r}" for r in analysis["recommendations"])
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Recommendations:*\n{recs}"}
        })

    if analysis.get("opportunities"):
        opps = "\n".join(f"• {o}" for o in analysis["opportunities"])
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Opportunities:*\n{opps}"}
        })

    requests.post(webhook_url, json={"blocks": blocks})

Step 7: Content Gap Analysis

One of the highest-value SEO monitoring tasks: find what competitors rank for that you don't.

def find_content_gaps(conn, your_domain: str, competitor_domains: list[str]) -> list[dict]:
    """Identify keywords where competitors rank but you don't."""

    # Get each domain's best position from the most recent scrape of each keyword.
    # (Each keyword has its own scraped_at timestamp, so we can't use a single global MAX.)
    cursor = conn.execute("""
        SELECT keyword, domain, MIN(position) as best_position
        FROM rankings r
        WHERE scraped_at = (
            SELECT MAX(scraped_at) FROM rankings WHERE keyword = r.keyword
        )
        GROUP BY keyword, domain
    """)

    keyword_rankings = {}
    for keyword, domain, position in cursor:
        if keyword not in keyword_rankings:
            keyword_rankings[keyword] = {}
        keyword_rankings[keyword][domain] = position

    gaps = []
    for keyword, domains in keyword_rankings.items():
        your_position = domains.get(your_domain)
        for comp_domain in competitor_domains:
            comp_position = domains.get(comp_domain)
            if comp_position and comp_position <= 10 and (not your_position or your_position > 20):
                gaps.append({
                    "keyword": keyword,
                    "competitor": comp_domain,
                    "competitor_position": comp_position,
                    "your_position": your_position or "Not ranking",
                    "opportunity_score": 10 - comp_position  # Higher = easier to compete
                })

    # Sort by opportunity score
    gaps.sort(key=lambda g: g["opportunity_score"], reverse=True)
    return gaps
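
Run against the domains you track, it prints a prioritized hit list. The score is a deliberately simple heuristic; swap in search volume or difficulty data if you have it:

conn = init_db()
gaps = find_content_gaps(conn, "mantisapi.com", ["scrapingbee.com", "apify.com"])
for gap in gaps[:10]:
    print(f"{gap['keyword']}: {gap['competitor']} at #{gap['competitor_position']}, "
          f"you: {gap['your_position']} (score {gap['opportunity_score']})")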

Step 8: Run the Full Monitor

Tie everything together into a scheduled monitoring job.

def run_seo_monitor(
    keywords: list[str],
    competitor_urls: dict[str, list[str]],  # keyword -> list of competitor URLs to monitor
    your_domain: str = "mantisapi.com",
    competitor_domains: list[str] = None,
    slack_webhook: str = None
):
    """Run a complete SEO monitoring cycle."""

    conn = init_db()
    all_ranking_changes = []
    all_content_changes = {}

    # 1. Scrape SERPs for each keyword
    print(f"πŸ“Š Scraping SERPs for {len(keywords)} keywords...")
    for keyword in keywords:
        serp_data = scrape_serp(keyword)
        save_rankings(conn, serp_data)

        # Detect ranking changes
        changes = detect_ranking_changes(conn, keyword)
        significant = [c for c in changes if c.change_type != "stable" and c.positions_changed >= 2]
        all_ranking_changes.extend(significant)

        if significant:
            print(f"  πŸ”„ {keyword}: {len(significant)} significant changes")

    # 2. Monitor competitor content
    print(f"\nπŸ“ Monitoring competitor content...")
    for keyword, urls in competitor_urls.items():
        for url in urls:
            try:
                analysis = monitor_competitor_page(url)
                changes = detect_content_changes(conn, url, analysis)
                if changes:
                    all_content_changes[url] = changes
                    print(f"  πŸ“ Content change: {url}")
            except Exception as e:
                print(f"  ❌ Error monitoring {url}: {e}")

    # 3. AI analysis
    if all_ranking_changes or all_content_changes:
        print(f"\nπŸ€– Running AI analysis...")
        analysis = analyze_seo_changes(
            all_ranking_changes, all_content_changes, your_domain
        )
        print(f"  Severity: {analysis.get('severity', 'unknown')}")
        print(f"  Summary: {analysis.get('summary', 'N/A')}")

        # 4. Send alerts for important+ changes
        if slack_webhook and analysis.get("severity") in ("critical", "important"):
            send_slack_alert(slack_webhook, analysis)
            print(f"  πŸ“’ Alert sent to Slack")

        # Store the alert
        conn.execute("""
            INSERT INTO alerts (alert_type, severity, summary, recommendation)
            VALUES (?, ?, ?, ?)
        """, ("seo_change", analysis.get("severity", "minor"),
              analysis.get("summary", ""), 
              json.dumps(analysis.get("recommendations", []))))
        conn.commit()
    else:
        print("βœ… No significant changes detected.")

    # 5. Content gap analysis
    if competitor_domains:
        print(f"\nπŸ” Running content gap analysis...")
        gaps = find_content_gaps(conn, your_domain, competitor_domains)
        if gaps:
            print(f"  Found {len(gaps)} content gaps:")
            for gap in gaps[:5]:
                print(f"    - {gap['keyword']}: {gap['competitor']} at #{gap['competitor_position']}, you: {gap['your_position']}")

    conn.close()
    print("\nβœ… SEO monitoring cycle complete.")

# Example usage
if __name__ == "__main__":
    run_seo_monitor(
        keywords=[
            "web scraping API",
            "AI data extraction",
            "web scraping for AI agents",
            "best web scraping tool 2026",
            "automated web scraping python",
        ],
        competitor_urls={
            "web scraping API": [
                "https://scrapingbee.com",
                "https://apify.com",
                "https://brightdata.com",
            ],
        },
        your_domain="mantisapi.com",
        competitor_domains=["scrapingbee.com", "apify.com", "brightdata.com"],
        slack_webhook="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    )

Scheduling: Run Daily with Cron or Lambda

Set up automated daily monitoring:

| Method | Best For | Setup |
| --- | --- | --- |
| Cron (Linux) | VPS/dedicated server | 0 7 * * * python seo_monitor.py |
| AWS Lambda | Serverless, cost-efficient | EventBridge rule, daily trigger |
| GitHub Actions | Free for public repos | Scheduled workflow, daily |
| Docker + systemd | Self-hosted, reliable | Timer unit, daily |
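
For the Lambda route, a thin handler around run_seo_monitor is enough. This is a sketch: the environment variable names are our assumptions, and since SQLite in Lambda's /tmp doesn't persist between invocations, you'd point the database at EFS or swap in a hosted store:

import os

def lambda_handler(event, context):
    """Daily EventBridge-triggered entry point (sketch)."""
    run_seo_monitor(
        keywords=os.environ["SEO_KEYWORDS"].split(","),  # e.g. "web scraping API,AI data extraction"
        competitor_urls={},  # or load from S3/DynamoDB
        your_domain=os.environ.get("YOUR_DOMAIN", "mantisapi.com"),
        slack_webhook=os.environ.get("SLACK_WEBHOOK"),
    )
    return {"status": "ok"}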

Cost Comparison

Here's what this system costs vs. traditional SEO tools:

| Solution | Monthly Cost | Keyword Limit | AI Analysis |
| --- | --- | --- | --- |
| Ahrefs Standard | $199/mo | 1,500 keywords | ❌ |
| SEMrush Pro | $140/mo | 500 keywords | ❌ |
| Moz Pro | $99/mo | 300 keywords | ❌ |
| AI Agent + Mantis | ~$40–80/mo | Unlimited | ✅ |

Breakdown: for daily monitoring of 50–100 keywords, the Mantis Starter plan ($29/mo) is usually enough; the Pro plan ($99/mo, 25K API calls) covers heavier workloads. Add OpenAI GPT-4o for analysis (~$10–50/mo depending on volume).
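
The arithmetic behind that estimate, with our own assumed workload (not published pricing):

keywords = 75           # keywords tracked daily
competitor_pages = 15   # competitor URLs checked daily
days = 30

mantis_calls = (keywords + competitor_pages) * days  # one /extract call each
print(mantis_calls)  # 2700 calls/month, comfortably under a 25K-call quota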

Use Cases

1. SaaS Competitive Intelligence

Track how competitor landing pages evolve. Detect when they add features, change pricing, or update positioning. Get alerts when a competitor starts ranking for your core keywords.

2. Content Marketing Teams

Monitor your content's ranking performance over time. Identify when articles need refreshing. Find content gaps to fill before competitors do.

3. Agency Client Reporting

Automate SEO reporting for clients with AI-generated insights. Instead of raw data dumps, send strategic analysis with clear recommendations.

4. E-commerce Category Monitoring

Track product category rankings. Detect when competitors change product descriptions, pricing pages, or category structures that affect search visibility.

Start Monitoring Your SEO with AI

The Mantis WebPerception API gives your AI agents the ability to scrape, screenshot, and extract data from any webpage. Build your SEO monitor in minutes.

Get Your Free API Key →

Next Steps