Automate Website Monitoring with AI Agents: Detect Changes That Matter
Traditional website monitoring tools tell you something changed. AI-powered monitoring tells you what changed and whether you should care.
A competitor updates their pricing page? Traditional monitors fire an alert for every CSS tweak, cookie banner update, and footer change. An AI monitor tells you: "Competitor X raised their Pro plan from $79 to $99/month and removed the annual discount."
In this guide, you'll build a Python website monitor that uses AI to understand changes semantically, not just detect them.
Architecture Overview
The AI website monitor has four components:
- Scraper – fetches current page content via the Mantis API
- Store – saves timestamped snapshots to SQLite
- Analyzer – an LLM compares the current and previous snapshots and explains the difference
- Alerter – sends notifications when meaningful changes are detected
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Scraper  │────▶│  Store   │────▶│ Analyzer │────▶│ Alerter  │
│ (Mantis) │     │ (SQLite) │     │  (LLM)   │     │ (Slack)  │
└──────────┘     └──────────┘     └──────────┘     └──────────┘
Setup
pip install requests openai schedule sqlite-utils
The Complete Monitor
import requests
import hashlib
import json
import sqlite3
from datetime import datetime
from openai import OpenAI
MANTIS_KEY = "sk_live_your_mantis_key"
OPENAI_KEY = "sk-your_openai_key"
MANTIS_API = "https://api.mantisapi.com/v1"
openai = OpenAI(api_key=OPENAI_KEY)
# --- Database ---
def init_db():
    conn = sqlite3.connect("monitor.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS snapshots (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL,
            content TEXT NOT NULL,
            content_hash TEXT NOT NULL,
            extracted_data TEXT,
            timestamp TEXT NOT NULL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS alerts (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL,
            change_summary TEXT NOT NULL,
            severity TEXT NOT NULL,
            timestamp TEXT NOT NULL
        )
    """)
    conn.commit()
    return conn
# --- Scraper ---
def scrape_page(url: str) -> dict:
    """Scrape a page and extract structured data."""
    # Get page content (timeout and status check guard against hung or failed requests)
    resp = requests.post(f"{MANTIS_API}/scrape", json={
        "url": url,
        "format": "markdown",
        "js_rendering": True
    }, headers={"Authorization": f"Bearer {MANTIS_KEY}"}, timeout=60)
    resp.raise_for_status()
    content = resp.json()["content"]

    # Extract key data points
    extract_resp = requests.post(f"{MANTIS_API}/extract", json={
        "url": url,
        "prompt": "Extract all key information: prices, features, "
                  "plan names, limits, dates, announcements, "
                  "and any notable claims or statistics",
        "schema": {
            "key_data": [{"label": "string", "value": "string"}],
            "pricing": [{"plan": "string", "price": "string", "features": ["string"]}],
            "announcements": ["string"]
        }
    }, headers={"Authorization": f"Bearer {MANTIS_KEY}"}, timeout=120)
    extract_resp.raise_for_status()
    extracted = extract_resp.json().get("extracted", {})

    return {
        "content": content,
        "extracted": extracted,
        "hash": hashlib.sha256(content.encode()).hexdigest()
    }
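The `hash` field is what later runs compare to decide whether anything changed at all. A standalone illustration of the scheme (no API calls; the page strings are made up):

```python
import hashlib

def content_hash(content: str) -> str:
    # Same scheme scrape_page uses: SHA-256 over the page text
    return hashlib.sha256(content.encode()).hexdigest()

old = content_hash("Pro plan: $79/month")
new = content_hash("Pro plan: $99/month")

print(old == new)   # False: the price change produces a different hash
print(content_hash("Pro plan: $79/month ") == old)   # False: even a trailing space changes it
```

That second comparison is exactly why a hash alone over-alerts: any byte difference fires, which is the noise the AI analyzer is there to filter out.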
The AI Analyzer
This is where it gets interesting. Instead of diffing text, we ask an LLM to understand the difference:
def analyze_change(url: str, old_data: dict, new_data: dict) -> dict:
    """Use AI to analyze what changed and whether it matters."""
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a website change analyst. Compare two snapshots
of the same webpage and respond in JSON with exactly these keys:
1. "summary": a concise description of what changed
2. "severity": one of "critical", "important", "minor", or "noise"
3. "action_required": what, if anything, a human should do about it

Critical = pricing changes, product discontinuation, major policy changes
Important = new features, significant content updates, competitive moves
Minor = small text edits, new blog posts, UI tweaks
Noise = cookie banners, timestamps, session-specific content"""
        }, {
            "role": "user",
            "content": f"""URL: {url}

PREVIOUS SNAPSHOT:
{json.dumps(old_data['extracted'], indent=2)}
---
CURRENT SNAPSHOT:
{json.dumps(new_data['extracted'], indent=2)}
---
What changed? How important is it?"""
        }],
        response_format={"type": "json_object"},
        temperature=0.1
    )
    return json.loads(response.choices[0].message.content)
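Even in JSON mode the model can return unexpected keys, casing, or severity labels, so it is safer to normalize the result before the alerter acts on it. A defensive sketch (`normalize_analysis` is an added helper, not part of any API):

```python
VALID_SEVERITIES = {"critical", "important", "minor", "noise"}

def normalize_analysis(raw: dict) -> dict:
    # Coerce the LLM's JSON into the exact shape send_alert expects
    severity = str(raw.get("severity", "minor")).lower().strip()
    if severity not in VALID_SEVERITIES:
        severity = "minor"   # unknown labels are treated as low priority
    return {
        "summary": raw.get("summary", "Unknown change"),
        "severity": severity,
        "action_required": raw.get("action_required", "Review the change"),
    }

result = normalize_analysis({"severity": "CRITICAL", "summary": "Pro plan now $99"})
print(result["severity"])   # critical
```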
Alert System
def send_alert(url: str, analysis: dict):
    """Send alert via Slack webhook for important changes."""
    severity = analysis.get("severity", "minor")
    if severity in ("noise", "minor"):
        return  # only alert on important or critical changes

    emoji = {"critical": "🚨", "important": "📢"}.get(severity, "📝")
    color = {"critical": "#ff0000", "important": "#ff9900"}.get(severity, "#00ff88")

    # Slack webhook
    webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    requests.post(webhook_url, json={
        "attachments": [{
            "color": color,
            "title": f"{emoji} Website Change Detected",
            "fields": [
                {"title": "URL", "value": url, "short": True},
                {"title": "Severity", "value": severity.upper(), "short": True},
                {"title": "Summary", "value": analysis.get("summary", "Unknown change")},
                {"title": "Action Required", "value": analysis.get("action_required", "Review the change")}
            ]
        }]
    }, timeout=30)
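Since every alert also lands in SQLite, a small query helper makes it easy to review history without digging through Slack. A sketch (`recent_alerts` is a hypothetical addition, demonstrated here against an in-memory database with the same `alerts` schema):

```python
import sqlite3

def recent_alerts(conn: sqlite3.Connection, limit: int = 10) -> list:
    # Newest alerts first: (url, summary, severity, timestamp)
    cur = conn.execute(
        "SELECT url, change_summary, severity, timestamp FROM alerts "
        "ORDER BY timestamp DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL,
        change_summary TEXT NOT NULL,
        severity TEXT NOT NULL,
        timestamp TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO alerts (url, change_summary, severity, timestamp) VALUES (?, ?, ?, ?)",
    ("https://example.com/pricing", "Pro plan raised to $99", "critical", "2025-01-01T00:00:00"),
)
print(recent_alerts(conn))
```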
Putting It Together
import schedule
import time
def check_url(conn: sqlite3.Connection, url: str):
    """Check a URL for meaningful changes."""
    print(f"Checking {url}...")

    # Scrape current state
    current = scrape_page(url)

    # Get previous snapshot
    cursor = conn.execute(
        "SELECT content, extracted_data, content_hash FROM snapshots "
        "WHERE url = ? ORDER BY timestamp DESC LIMIT 1",
        (url,)
    )
    row = cursor.fetchone()

    if row is None:
        # First time seeing this URL
        conn.execute(
            "INSERT INTO snapshots (url, content, content_hash, extracted_data, timestamp) "
            "VALUES (?, ?, ?, ?, ?)",
            (url, current["content"], current["hash"],
             json.dumps(current["extracted"]), datetime.utcnow().isoformat())
        )
        conn.commit()
        print(f"  First snapshot saved for {url}")
        return

    old_hash = row[2]
    if current["hash"] == old_hash:
        print("  No changes detected")
        return

    # Content changed – analyze with AI
    old_data = {"extracted": json.loads(row[1]) if row[1] else {}}
    analysis = analyze_change(url, old_data, current)
    print(f"  Change detected: {analysis.get('severity', '?')} – {analysis.get('summary', '?')}")

    # Save new snapshot
    conn.execute(
        "INSERT INTO snapshots (url, content, content_hash, extracted_data, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, current["content"], current["hash"],
         json.dumps(current["extracted"]), datetime.utcnow().isoformat())
    )

    # Save alert
    conn.execute(
        "INSERT INTO alerts (url, change_summary, severity, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (url, analysis.get("summary", ""), analysis.get("severity", "minor"),
         datetime.utcnow().isoformat())
    )
    conn.commit()

    # Send notification
    send_alert(url, analysis)
# --- Main ---
WATCH_URLS = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://news-site.com/industry-news",
    "https://regulatory-body.gov/guidelines",
]

conn = init_db()
def run_checks():
    for url in WATCH_URLS:
        try:
            check_url(conn, url)
        except Exception as e:
            print(f"  Error checking {url}: {e}")

# Check every hour
schedule.every(1).hour.do(run_checks)

# Initial run
run_checks()

while True:
    schedule.run_pending()
    time.sleep(60)
Use Cases
1. Competitor Price Monitoring
Watch competitor pricing pages. Get alerted when they change prices, add/remove plans, or adjust feature limits. AI tells you "Competitor raised Enterprise plan by 20% and added SOC2 compliance" instead of "Page changed."
2. Regulatory Compliance
Monitor government and regulatory websites for policy updates. AI classifies changes by relevance to your business and flags action items.
3. Job Board Tracking
Monitor career pages of target companies. Know when they're hiring for roles that signal strategic direction – a new AI team lead means they're investing in AI.
4. News and PR Monitoring
Watch industry news pages and press release sections. AI summarizes new articles and rates their relevance to your business.
5. Supply Chain Alerts
Monitor supplier websites for stock changes, pricing updates, or discontinuation notices before they hit your inbox.
Traditional vs AI Monitoring
| Feature | Hash-Based Monitoring | AI Monitoring |
|---|---|---|
| Detects changes | ✅ Yes | ✅ Yes |
| Understands changes | ❌ No | ✅ Yes |
| Filters noise | ❌ No | ✅ Yes |
| Severity classification | ❌ No | ✅ Yes |
| Natural language summaries | ❌ No | ✅ Yes |
| False positive rate | High | Low |
| Cost per check | ~$0.001 | ~$0.01 |
| Setup complexity | Low | Medium |
Cost Breakdown
For monitoring 20 URLs every hour:
- Scraping: 20 URLs Γ 24 hours Γ 30 days = 14,400 API calls/month
- Extraction: 14,400 calls (one per scrape)
- Total Mantis calls: 28,800/month – Scale plan ($299) or Pro ($99) with selective extraction
- LLM analysis: only when changes are detected (~5-10% of checks) – minimal cost
Optimization tip: Use hash comparison first (free), then only run AI extraction and analysis when the hash changes. This cuts API usage by 90%+.
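One way to implement that tip is to split `scrape_page` in two: a cheap content-only fetch every hour, with the `/extract` call and LLM analysis gated behind a hash check. The gate itself is a pure function, sketched here (function and variable names are illustrative):

```python
import hashlib
from typing import Optional

def should_analyze(previous_hash: Optional[str], content: str) -> bool:
    # Run extraction + LLM only when the page is new or its content moved
    current = hashlib.sha256(content.encode()).hexdigest()
    return previous_hash is None or current != previous_hash

print(should_analyze(None, "Pro: $79"))   # True: first snapshot, extract everything
stored = hashlib.sha256("Pro: $79".encode()).hexdigest()
print(should_analyze(stored, "Pro: $79"))   # False: unchanged, skip the paid calls
print(should_analyze(stored, "Pro: $99"))   # True: changed, worth analyzing
```

With only ~5-10% of checks actually changing, this keeps the expensive `/extract` and LLM traffic at a small fraction of the 14,400 monthly scrapes.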
Deploying to Production
Run as a systemd service, Docker container, or cron job:
# crontab – check every hour
0 * * * * cd /opt/monitor && python monitor.py --once
# Docker
FROM python:3.12-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "monitor.py"]
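The cron line above passes a `--once` flag that the script as written does not parse yet. A minimal way to support both cron and long-running modes (a sketch; the flag name simply matches the cron example):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="AI website monitor")
    parser.add_argument(
        "--once", action="store_true",
        help="run one check pass and exit (for cron); default is the schedule loop",
    )
    return parser.parse_args(argv)

print(parse_args(["--once"]).once)   # True
print(parse_args([]).once)           # False
```

In `monitor.py`, call `run_checks()` and exit when `args.once` is set; otherwise fall through to the `schedule` loop.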
For serverless, deploy the check function to AWS Lambda with EventBridge scheduling – pay only when checks run.
Start Monitoring with AI
100 API calls/month free. Build an intelligent website monitor in minutes.
Get Your API Key →