Web Scraping for E-Commerce: How AI Agents Monitor Products, Prices & Reviews in 2026
E-commerce runs on data. Which products are trending? What are competitors charging? What do customers love — or hate — about similar products? The brands that answer these questions fastest win.
Traditional e-commerce intelligence tools charge $500–$5,000/month for dashboard access and stale data refreshed once a day. With AI agents and a web scraping API, you can build custom intelligence systems that monitor exactly what you need, in real time, for a fraction of the cost.
In this guide, you'll build a complete e-commerce intelligence system that scrapes product listings, tracks price changes, analyzes customer reviews with AI, and alerts you to competitive opportunities — all running autonomously.
Why E-Commerce Teams Need AI-Powered Scraping
E-commerce data challenges are unique:
- Product pages change constantly — prices, availability, and descriptions update multiple times per day
- Catalogs are massive — competitors may have thousands to millions of SKUs
- Reviews contain gold — unstructured customer feedback reveals product gaps, feature requests, and quality issues
- Anti-bot protection is aggressive — Amazon, Shopify stores, and marketplaces invest heavily in blocking scrapers
AI agents solve these problems by understanding page context (not just HTML selectors), extracting structured data from any layout, and analyzing unstructured text like reviews at scale.
The E-Commerce Intelligence Stack
Here's what we'll build:
- Product Discovery — Find competitor products and catalog pages
- AI Data Extraction — Pull structured product data from any e-commerce site
- Price Tracking — Monitor price changes and detect patterns
- Review Analysis — AI-powered sentiment analysis and insight extraction
- Competitive Alerts — Automated notifications for price drops, new products, stockouts
- Strategic Reports — LLM-generated competitive intelligence summaries
Step 1: Product Discovery & Catalog Scraping
Start by discovering competitor products. The agent scrapes category pages and extracts product URLs:
```python
import httpx
from pydantic import BaseModel
from typing import Optional

MANTIS_API_KEY = "your-api-key"
MANTIS_BASE = "https://api.mantisapi.com/v1"

class Product(BaseModel):
    name: str
    url: str
    price: float
    currency: str = "USD"
    rating: Optional[float] = None
    review_count: Optional[int] = None
    availability: str = "in_stock"
    image_url: Optional[str] = None
    brand: Optional[str] = None
    sku: Optional[str] = None

class CatalogPage(BaseModel):
    products: list[Product]
    next_page_url: Optional[str] = None
    total_results: Optional[int] = None

async def discover_products(category_url: str) -> list[Product]:
    """Scrape a category page and extract all product listings."""
    all_products = []
    current_url = category_url
    async with httpx.AsyncClient() as client:
        while current_url:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": current_url,
                    "schema": CatalogPage.model_json_schema(),
                    "prompt": "Extract all product listings from this e-commerce category page. Include name, URL, price, rating, review count, and availability status."
                }
            )
            page = CatalogPage(**response.json()["data"])
            all_products.extend(page.products)
            # Follow pagination, but cap the crawl at 500 products
            current_url = page.next_page_url
            if len(all_products) > 500:
                break
    return all_products
```
Step 2: Deep Product Data Extraction
Once you have product URLs, extract detailed data from each product page:
```python
class ProductDetail(BaseModel):
    name: str
    price: float
    original_price: Optional[float] = None  # Before discount
    discount_pct: Optional[float] = None
    currency: str = "USD"
    availability: str  # in_stock, out_of_stock, limited, preorder
    description: str
    features: list[str]
    specifications: dict[str, str]
    rating: Optional[float] = None
    review_count: Optional[int] = None
    brand: str
    sku: Optional[str] = None
    category: str
    images: list[str]
    seller: Optional[str] = None
    shipping_info: Optional[str] = None
    return_policy: Optional[str] = None

async def extract_product_detail(product_url: str) -> ProductDetail:
    """Extract comprehensive product data from a product page."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": product_url,
                "schema": ProductDetail.model_json_schema(),
                "prompt": "Extract complete product information including price (current and original if discounted), all features, specifications as key-value pairs, rating, reviews, shipping, and return policy."
            }
        )
    return ProductDetail(**response.json()["data"])
```
Step 3: Price Tracking & Change Detection
Store prices over time and detect meaningful changes:
```python
import sqlite3
from datetime import datetime, timedelta

def init_price_db(db_path: str = "ecommerce_intel.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_url TEXT NOT NULL,
            product_name TEXT,
            price REAL NOT NULL,
            original_price REAL,
            availability TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_url TEXT,
            product_name TEXT,
            alert_type TEXT,  -- price_drop, price_increase, stockout, restock, new_discount
            old_value TEXT,
            new_value TEXT,
            change_pct REAL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn

def track_price(conn, product: ProductDetail, url: str) -> list[dict]:
    """Record price and detect changes. Returns list of alerts."""
    alerts = []
    # Get previous price
    prev = conn.execute(
        "SELECT price, availability FROM price_history WHERE product_url = ? ORDER BY scraped_at DESC LIMIT 1",
        (url,)
    ).fetchone()
    # Record current price
    conn.execute(
        "INSERT INTO price_history (product_url, product_name, price, original_price, availability) VALUES (?, ?, ?, ?, ?)",
        (url, product.name, product.price, product.original_price, product.availability)
    )
    if prev:
        old_price, old_avail = prev
        # Price drop
        if product.price < old_price:
            change_pct = ((old_price - product.price) / old_price) * 100
            if change_pct >= 5:  # Only alert on 5%+ drops
                alerts.append({
                    "type": "price_drop",
                    "product": product.name,
                    "url": url,
                    "old_price": old_price,
                    "new_price": product.price,
                    "change_pct": round(change_pct, 1)
                })
        # Price increase
        elif product.price > old_price:
            change_pct = ((product.price - old_price) / old_price) * 100
            if change_pct >= 10:
                alerts.append({
                    "type": "price_increase",
                    "product": product.name,
                    "url": url,
                    "old_price": old_price,
                    "new_price": product.price,
                    "change_pct": round(change_pct, 1)
                })
        # Stock changes
        if old_avail == "in_stock" and product.availability == "out_of_stock":
            alerts.append({"type": "stockout", "product": product.name, "url": url})
        elif old_avail == "out_of_stock" and product.availability == "in_stock":
            alerts.append({"type": "restock", "product": product.name, "url": url, "price": product.price})
    # Record alerts
    for alert in alerts:
        conn.execute(
            "INSERT INTO price_alerts (product_url, product_name, alert_type, old_value, new_value, change_pct) VALUES (?, ?, ?, ?, ?, ?)",
            (url, alert.get("product"), alert["type"],
             str(alert.get("old_price", "")), str(alert.get("new_price", "")),
             alert.get("change_pct"))
        )
    conn.commit()
    return alerts
```
Step 4: AI-Powered Review Analysis
This is where AI agents truly shine. Instead of simple star ratings, extract actionable insights from customer reviews:
```python
from openai import OpenAI

class ReviewInsights(BaseModel):
    total_reviews_analyzed: int
    avg_sentiment: float  # -1.0 to 1.0
    top_praised_features: list[str]
    top_complaints: list[str]
    feature_requests: list[str]
    quality_issues: list[str]
    competitor_mentions: list[str]
    purchase_drivers: list[str]  # Why people bought it
    summary: str

openai_client = OpenAI()

async def analyze_reviews(product_url: str) -> ReviewInsights:
    """Scrape reviews and analyze with AI."""
    # Step 1: Scrape review pages
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": product_url,
                "schema": {
                    "type": "object",
                    "properties": {
                        "reviews": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "rating": {"type": "number"},
                                    "title": {"type": "string"},
                                    "text": {"type": "string"},
                                    "date": {"type": "string"},
                                    "verified": {"type": "boolean"}
                                }
                            }
                        }
                    }
                },
                "prompt": "Extract all customer reviews from this product page including rating, title, full review text, date, and whether the purchase is verified."
            }
        )
    reviews = response.json()["data"]["reviews"]

    # Step 2: AI analysis
    reviews_text = "\n\n".join([
        f"[{r.get('rating', '?')}/5] {r.get('title', '')}: {r.get('text', '')}"
        for r in reviews[:50]  # Analyze up to 50 reviews
    ])
    completion = openai_client.beta.chat.completions.parse(
        model="gpt-4o",
        response_format=ReviewInsights,
        messages=[
            {"role": "system", "content": "You are an e-commerce analyst. Analyze these product reviews and extract actionable intelligence. Focus on patterns, not individual reviews."},
            {"role": "user", "content": f"Analyze these {len(reviews)} reviews:\n\n{reviews_text}"}
        ]
    )
    return completion.choices[0].message.parsed
```
Step 5: Competitive Alerts via Slack
Wire up alerts so your team knows instantly when competitors make moves:
```python
import httpx

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

async def send_ecommerce_alert(alerts: list[dict]):
    """Send formatted alerts to Slack."""
    if not alerts:
        return
    emoji_map = {
        "price_drop": "📉",
        "price_increase": "📈",
        "stockout": "🚫",
        "restock": "✅",
        "new_product": "🆕",
    }
    blocks = [{"type": "header", "text": {"type": "plain_text", "text": "🛒 E-Commerce Intelligence Alert"}}]
    for alert in alerts:
        emoji = emoji_map.get(alert["type"], "⚡")
        if alert["type"] == "price_drop":
            text = f"{emoji} *Price Drop* — {alert['product']}\n${alert['old_price']:.2f} → ${alert['new_price']:.2f} (-{alert['change_pct']}%)\n<{alert['url']}|View Product>"
        elif alert["type"] == "stockout":
            text = f"{emoji} *Stockout* — {alert['product']}\nCompetitor product is now out of stock — opportunity to capture demand\n<{alert['url']}|View Product>"
        elif alert["type"] == "restock":
            text = f"{emoji} *Restock* — {alert['product']}\nBack in stock at ${alert['price']:.2f}\n<{alert['url']}|View Product>"
        else:
            text = f"{emoji} *{alert['type'].replace('_', ' ').title()}* — {alert.get('product', 'Unknown')}"
        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": text}})
    async with httpx.AsyncClient() as client:
        await client.post(SLACK_WEBHOOK, json={"blocks": blocks})
```
Step 6: AI Strategic Reports
Generate weekly competitive intelligence reports using an LLM to interpret the raw data:
```python
async def generate_weekly_report(conn) -> str:
    """Generate an AI-powered competitive intelligence report."""
    # Match SQLite's CURRENT_TIMESTAMP format ("YYYY-MM-DD HH:MM:SS") so
    # string comparisons line up; isoformat()'s "T" separator would skew
    # same-day comparisons.
    week_ago = (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")

    # Gather the week's alerts
    price_changes = conn.execute("""
        SELECT product_name, alert_type, old_value, new_value, change_pct
        FROM price_alerts WHERE created_at > ? ORDER BY created_at DESC
    """, (week_ago,)).fetchall()

    # Get price trends
    trends = conn.execute("""
        SELECT product_name, MIN(price) as low, MAX(price) as high,
               AVG(price) as avg, COUNT(*) as datapoints
        FROM price_history WHERE scraped_at > ?
        GROUP BY product_url
        HAVING COUNT(*) > 1
    """, (week_ago,)).fetchall()

    report_data = f"""
Price changes this week: {len(price_changes)}
Changes: {[{"product": r[0], "type": r[1], "from": r[2], "to": r[3], "pct": r[4]} for r in price_changes]}
Price trends: {[{"product": r[0], "low": r[1], "high": r[2], "avg": r[3], "points": r[4]} for r in trends]}
"""

    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a senior e-commerce analyst. Write a concise weekly competitive intelligence report. Include:
1. Executive summary (2-3 sentences)
2. Key price movements and what they signal
3. Stock availability patterns
4. Strategic recommendations (what should WE do?)
Keep it actionable. No fluff."""},
            {"role": "user", "content": f"Generate the weekly report from this data:\n{report_data}"}
        ]
    )
    return completion.choices[0].message.content
```
Putting It All Together: The Daily Pipeline
Here's the complete daily pipeline that ties everything together:
```python
import asyncio

async def daily_ecommerce_pipeline():
    """Run the complete e-commerce intelligence pipeline."""
    conn = init_price_db()

    # 1. Define competitor URLs to monitor
    competitors = [
        {"name": "Competitor A", "category_urls": [
            "https://competitor-a.com/products/category-1",
            "https://competitor-a.com/products/category-2",
        ]},
        {"name": "Competitor B", "category_urls": [
            "https://competitor-b.com/collections/all",
        ]},
    ]

    all_alerts = []
    for competitor in competitors:
        print(f"\n📡 Scanning {competitor['name']}...")
        for cat_url in competitor["category_urls"]:
            # Discover products
            products = await discover_products(cat_url)
            print(f"  Found {len(products)} products in {cat_url}")
            # Track prices for each product
            for product_listing in products[:50]:  # Limit per category
                try:
                    detail = await extract_product_detail(product_listing.url)
                    alerts = track_price(conn, detail, product_listing.url)
                    all_alerts.extend(alerts)
                except Exception as e:
                    print(f"  ⚠️ Error on {product_listing.url}: {e}")

    # 2. Send alerts
    if all_alerts:
        await send_ecommerce_alert(all_alerts)
        print(f"\n🔔 Sent {len(all_alerts)} alerts")

    # 3. Weekly report (on Mondays)
    if datetime.now().weekday() == 0:
        report = await generate_weekly_report(conn)
        print(f"\n📊 Weekly Report:\n{report}")

    conn.close()
    print("\n✅ Pipeline complete")

# Run daily via cron: 0 8 * * * python ecommerce_pipeline.py
if __name__ == "__main__":
    asyncio.run(daily_ecommerce_pipeline())
```
Cost Comparison: Traditional vs AI Agent Approach
| Approach | Monthly Cost | Products Tracked | Update Frequency | Review Analysis |
|---|---|---|---|---|
| Prisync / Competera | $500–$2,000 | 500–5,000 | Daily | ❌ None |
| Jungle Scout / Helium 10 | $50–$200 | Amazon only | Daily | Basic keywords |
| Custom scrapers (DIY) | $200–$500+ | Unlimited | Any | Manual |
| AI Agent + Mantis API | $29–$99 | Unlimited | Any | ✅ AI-powered |
With Mantis's Starter plan ($29/month for 5,000 API calls), you can track ~150 products daily with full price history, review analysis, and AI-powered alerts — capabilities that cost $500+/month with traditional e-commerce intelligence platforms.
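The arithmetic behind that "~150 products" estimate is worth making explicit. A rough budget sketch, assuming one `/extract` call per product per day (the quota figure comes from the plan described above; it is an illustration, not official pricing):

```python
# Back-of-the-envelope API budget for daily price tracking.
# Assumes one /extract call per product per day; the 5,000-call
# quota is the Starter plan figure cited in this article.
MONTHLY_CALLS = 5_000
DAYS_PER_MONTH = 30
CALLS_PER_PRODUCT_PER_DAY = 1

daily_budget = MONTHLY_CALLS // DAYS_PER_MONTH           # 166 calls/day
trackable = daily_budget // CALLS_PER_PRODUCT_PER_DAY    # products per day

# Leave ~10% headroom for category-page discovery and retries
trackable_with_headroom = int(trackable * 0.9)           # ~150 products
```

If you also re-scrape category pages or run review analysis, budget those calls separately; they come out of the same quota.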
Use Cases by E-Commerce Role
1. Brand Owners & D2C
Monitor MAP (Minimum Advertised Price) compliance across authorized resellers. Detect unauthorized sellers listing your products. Track how competitors price similar products in your category.
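As a minimal sketch of the MAP check, assuming reseller prices have already been scraped (the `ResellerListing` type, example URLs, and the 1% tolerance are all hypothetical; in practice the listings would come from `extract_product_detail` runs across reseller sites):

```python
from dataclasses import dataclass

@dataclass
class ResellerListing:
    reseller: str
    url: str
    advertised_price: float

def find_map_violations(listings: list[ResellerListing],
                        map_price: float,
                        tolerance: float = 0.01) -> list[ResellerListing]:
    """Flag resellers advertising below the MAP floor.

    `tolerance` absorbs rounding noise (e.g. $99.99 against a $100.00 MAP
    is not a violation worth escalating).
    """
    floor = map_price * (1 - tolerance)
    return [l for l in listings if l.advertised_price < floor]

listings = [
    ResellerListing("StoreA", "https://store-a.example/p/1", 100.00),
    ResellerListing("StoreB", "https://store-b.example/p/1", 84.99),
]
violations = find_map_violations(listings, map_price=100.00)
print([v.reseller for v in violations])  # ['StoreB']
```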
2. Marketplace Sellers (Amazon, Shopify)
Track competitor pricing in real time for repricing strategies. Monitor review sentiment to identify product improvement opportunities. Detect when competitors go out of stock — perfect time to increase ad spend.
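A repricing rule built on that competitor data can be as simple as match-minus with a floor. This is a sketch with illustrative numbers, not a production repricing engine (real ones also weigh Buy Box share, velocity, and inventory):

```python
def reprice(competitor_price: float, floor: float,
            undercut: float = 0.01) -> float:
    """Match-minus repricing: undercut the competitor by `undercut`,
    but never drop below our cost-based floor."""
    return round(max(competitor_price - undercut, floor), 2)

print(reprice(27.50, floor=25.00))  # 27.49
print(reprice(24.50, floor=25.00))  # 25.0 (floor wins)
```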
3. Dropshippers & Arbitrage
Scan supplier sites for price drops automatically. Compare prices across multiple marketplaces to find arbitrage opportunities. Monitor supplier stock levels to avoid listing out-of-stock items.
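The margin math for an arbitrage candidate can be sketched like this; the 15% fee rate and flat shipping cost are placeholder assumptions, so substitute the marketplace's actual fee schedule:

```python
def arbitrage_margin(supplier_price: float, marketplace_price: float,
                     fee_rate: float = 0.15, shipping: float = 4.00) -> float:
    """Net margin per unit after marketplace fees and shipping.

    fee_rate and shipping are illustrative defaults; real fee schedules
    vary by category and fulfillment method.
    """
    revenue = marketplace_price * (1 - fee_rate)
    return revenue - supplier_price - shipping

# Only list candidates that clear a minimum margin threshold
margin = arbitrage_margin(supplier_price=12.50, marketplace_price=24.99)
print(round(margin, 2))  # 4.74
```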
4. Category Managers & Buyers
Track pricing trends across an entire product category. Analyze which features drive positive reviews (and which cause returns). Generate competitive assortment reports for buying decisions.
Build Your E-Commerce Intelligence System
Mantis WebPerception API handles JavaScript rendering, anti-bot protection, and AI data extraction — so you can focus on the intelligence layer.
Start Free — 100 API Calls/Month
Best Practices for E-Commerce Scraping
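The percentage-threshold practice described below can be captured in a small helper; the 5% threshold and $0.50 absolute floor are illustrative defaults, not recommendations for every catalog:

```python
def should_alert(old_price: float, new_price: float,
                 pct_threshold: float = 5.0, abs_floor: float = 0.50) -> bool:
    """Percentage-based alerting with a small absolute floor.

    A $0.50 move on a $5 item (10%) alerts; the same move on a
    $500 item (0.1%) is noise and stays quiet.
    """
    if old_price <= 0:
        return False
    abs_change = abs(new_price - old_price)
    pct_change = abs_change / old_price * 100
    return pct_change >= pct_threshold and abs_change >= abs_floor

print(should_alert(5.00, 4.50))      # True: 10% shift
print(should_alert(500.00, 499.50))  # False: 0.1% noise
```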
- Respect rate limits — Space requests 2-5 seconds apart. E-commerce sites are aggressive about blocking scrapers.
- Track SKUs, not URLs — Product URLs change. Use SKU or product ID as the stable identifier for price history.
- Store raw + structured data — Keep the raw scraped HTML alongside extracted data. When extraction schemas change, you can re-process historical data.
- Set meaningful thresholds — A $0.50 price change on a $500 product is noise. A $0.50 change on a $5 product is a 10% shift. Use percentage-based alerts.
- Monitor availability separately from price — Stockouts are often more actionable than price changes. A competitor stockout is your opportunity.
- Schedule strategically — Run price checks during business hours when prices are most likely to change. Run review analysis weekly (reviews don't change fast).
What's Next
This guide gives you a production-ready e-commerce intelligence system. To go deeper:
- Web Scraping for Price Monitoring — Advanced price tracking with AI change analysis
- Structured Data Extraction with AI — Master AI-powered data extraction techniques
- The Complete Guide to Web Scraping with AI — Our comprehensive overview
- Automate Website Monitoring with AI — Semantic change detection beyond e-commerce
- Web Scraping for Market Research — Broader competitive intelligence systems