Web Scraping for E-Commerce: How AI Agents Monitor Products, Prices & Reviews in 2026

Published: March 10, 2026 · 12 min read · By the Mantis Team

E-commerce runs on data. Which products are trending? What are competitors charging? What do customers love — or hate — about similar products? The brands that answer these questions fastest win.

Traditional e-commerce intelligence tools charge $500–$5,000/month for dashboard access and stale data refreshed once a day. With AI agents and a web scraping API, you can build custom intelligence systems that monitor exactly what you need, in real time, for a fraction of the cost.

In this guide, you'll build a complete e-commerce intelligence system that scrapes product listings, tracks price changes, analyzes customer reviews with AI, and alerts you to competitive opportunities — all running autonomously.

Why E-Commerce Teams Need AI-Powered Scraping

E-commerce data challenges are unique:

  - Layouts vary wildly: every store structures product pages differently, so hand-written CSS selectors break constantly
  - Prices move fast: dashboards refreshed once a day miss flash sales and repricing moves
  - Reviews are unstructured: star ratings hide the reasons behind customer sentiment
  - Anti-bot protection: major retailers actively block traditional scrapers

AI agents solve these problems by understanding page context (not just HTML selectors), extracting structured data from any layout, and analyzing unstructured text like reviews at scale.

The E-Commerce Intelligence Stack

Here's what we'll build:

  1. Product Discovery — Find competitor products and catalog pages
  2. AI Data Extraction — Pull structured product data from any e-commerce site
  3. Price Tracking — Monitor price changes and detect patterns
  4. Review Analysis — AI-powered sentiment analysis and insight extraction
  5. Competitive Alerts — Automated notifications for price drops, new products, stockouts
  6. Strategic Reports — LLM-generated competitive intelligence summaries

Step 1: Product Discovery & Catalog Scraping

Start by discovering competitor products. The agent scrapes category pages and extracts product URLs:

import httpx
from pydantic import BaseModel
from typing import Optional

MANTIS_API_KEY = "your-api-key"  # in production, load this from an environment variable
MANTIS_BASE = "https://api.mantisapi.com/v1"

class Product(BaseModel):
    name: str
    url: str
    price: float
    currency: str = "USD"
    rating: Optional[float] = None
    review_count: Optional[int] = None
    availability: str = "in_stock"
    image_url: Optional[str] = None
    brand: Optional[str] = None
    sku: Optional[str] = None

class CatalogPage(BaseModel):
    products: list[Product]
    next_page_url: Optional[str] = None
    total_results: Optional[int] = None

async def discover_products(category_url: str) -> list[Product]:
    """Scrape a category page and extract all product listings."""
    all_products = []
    current_url = category_url

    async with httpx.AsyncClient() as client:
        while current_url:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": current_url,
                    "schema": CatalogPage.model_json_schema(),
                    "prompt": "Extract all product listings from this e-commerce category page. Include name, URL, price, rating, review count, and availability status."
                }
            )
            response.raise_for_status()
            page = CatalogPage(**response.json()["data"])
            all_products.extend(page.products)

            # Follow pagination; cap at 500 products as a safety limit
            current_url = page.next_page_url
            if len(all_products) >= 500:
                break

    return all_products

Step 2: Deep Product Data Extraction

Once you have product URLs, extract detailed data from each product page:

class ProductDetail(BaseModel):
    name: str
    price: float
    original_price: Optional[float] = None  # Before discount
    discount_pct: Optional[float] = None
    currency: str = "USD"
    availability: str  # in_stock, out_of_stock, limited, preorder
    description: str
    features: list[str]
    specifications: dict[str, str]
    rating: Optional[float] = None
    review_count: Optional[int] = None
    brand: str
    sku: Optional[str] = None
    category: str
    images: list[str]
    seller: Optional[str] = None
    shipping_info: Optional[str] = None
    return_policy: Optional[str] = None

async def extract_product_detail(product_url: str) -> ProductDetail:
    """Extract comprehensive product data from a product page."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": product_url,
                "schema": ProductDetail.model_json_schema(),
                "prompt": "Extract complete product information including price (current and original if discounted), all features, specifications as key-value pairs, rating, reviews, shipping, and return policy."
            }
        )
        return ProductDetail(**response.json()["data"])

Step 3: Price Tracking & Change Detection

Store prices over time and detect meaningful changes:

import sqlite3
from datetime import datetime, timedelta

def init_price_db(db_path: str = "ecommerce_intel.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_url TEXT NOT NULL,
            product_name TEXT,
            price REAL NOT NULL,
            original_price REAL,
            availability TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_url TEXT,
            product_name TEXT,
            alert_type TEXT,  -- price_drop, price_increase, stockout, restock, new_discount
            old_value TEXT,
            new_value TEXT,
            change_pct REAL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn

def track_price(conn, product: ProductDetail, url: str) -> list[dict]:
    """Record price and detect changes. Returns list of alerts."""
    alerts = []

    # Get previous price
    prev = conn.execute(
        "SELECT price, availability FROM price_history WHERE product_url = ? ORDER BY scraped_at DESC LIMIT 1",
        (url,)
    ).fetchone()

    # Record current price
    conn.execute(
        "INSERT INTO price_history (product_url, product_name, price, original_price, availability) VALUES (?, ?, ?, ?, ?)",
        (url, product.name, product.price, product.original_price, product.availability)
    )

    if prev:
        old_price, old_avail = prev

        # Price drop
        if product.price < old_price:
            change_pct = ((old_price - product.price) / old_price) * 100
            if change_pct >= 5:  # Only alert on 5%+ drops
                alert = {
                    "type": "price_drop",
                    "product": product.name,
                    "url": url,
                    "old_price": old_price,
                    "new_price": product.price,
                    "change_pct": round(change_pct, 1)
                }
                alerts.append(alert)

        # Price increase
        elif product.price > old_price:
            change_pct = ((product.price - old_price) / old_price) * 100
            if change_pct >= 10:
                alert = {
                    "type": "price_increase",
                    "product": product.name,
                    "url": url,
                    "old_price": old_price,
                    "new_price": product.price,
                    "change_pct": round(change_pct, 1)
                }
                alerts.append(alert)

        # Stock changes
        if old_avail == "in_stock" and product.availability == "out_of_stock":
            alerts.append({"type": "stockout", "product": product.name, "url": url})
        elif old_avail == "out_of_stock" and product.availability == "in_stock":
            alerts.append({"type": "restock", "product": product.name, "url": url, "price": product.price})

    # Record alerts
    for alert in alerts:
        conn.execute(
            "INSERT INTO price_alerts (product_url, product_name, alert_type, old_value, new_value, change_pct) VALUES (?, ?, ?, ?, ?, ?)",
            (url, alert.get("product"), alert["type"],
             str(alert.get("old_price", "")), str(alert.get("new_price", "")),
             alert.get("change_pct"))
        )

    conn.commit()
    return alerts
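The alerting thresholds above (5%+ for drops, 10%+ for increases) can be sanity-checked in isolation. Here's a minimal sketch — `classify_change` is a hypothetical helper for illustration, not part of the pipeline:

```python
from typing import Optional

def classify_change(old_price: float, new_price: float) -> Optional[str]:
    """Return 'price_drop', 'price_increase', or None per track_price's thresholds."""
    if new_price < old_price:
        change_pct = (old_price - new_price) / old_price * 100
        return "price_drop" if change_pct >= 5 else None  # 5%+ drops alert
    if new_price > old_price:
        change_pct = (new_price - old_price) / old_price * 100
        return "price_increase" if change_pct >= 10 else None  # 10%+ increases alert
    return None

print(classify_change(100.0, 94.0))   # 6% drop -> price_drop
print(classify_change(100.0, 97.0))   # 3% drop -> None (below threshold)
print(classify_change(100.0, 112.0))  # 12% increase -> price_increase
```

The asymmetric thresholds are deliberate: competitor price drops are more actionable than increases, so they warrant a lower trigger.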

Step 4: AI-Powered Review Analysis

This is where AI agents truly shine. Instead of simple star ratings, extract actionable insights from customer reviews:

from openai import OpenAI

class ReviewInsights(BaseModel):
    total_reviews_analyzed: int
    avg_sentiment: float  # -1.0 to 1.0
    top_praised_features: list[str]
    top_complaints: list[str]
    feature_requests: list[str]
    quality_issues: list[str]
    competitor_mentions: list[str]
    purchase_drivers: list[str]  # Why people bought it
    summary: str

openai_client = OpenAI()

async def analyze_reviews(product_url: str) -> ReviewInsights:
    """Scrape reviews and analyze with AI."""

    # Step 1: Scrape review pages
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": product_url,
                "schema": {
                    "type": "object",
                    "properties": {
                        "reviews": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "rating": {"type": "number"},
                                    "title": {"type": "string"},
                                    "text": {"type": "string"},
                                    "date": {"type": "string"},
                                    "verified": {"type": "boolean"}
                                }
                            }
                        }
                    }
                },
                "prompt": "Extract all customer reviews from this product page including rating, title, full review text, date, and whether the purchase is verified."
            }
        )
        reviews = response.json()["data"]["reviews"]

    # Step 2: AI analysis
    reviews_text = "\n\n".join([
        f"[{r.get('rating', '?')}/5] {r.get('title', '')}: {r.get('text', '')}"
        for r in reviews[:50]  # Analyze up to 50 reviews
    ])

    completion = openai_client.beta.chat.completions.parse(
        model="gpt-4o",
        response_format=ReviewInsights,
        messages=[
            {"role": "system", "content": "You are an e-commerce analyst. Analyze these product reviews and extract actionable intelligence. Focus on patterns, not individual reviews."},
            {"role": "user", "content": f"Analyze these {len(reviews)} reviews:\n\n{reviews_text}"}
        ]
    )

    return completion.choices[0].message.parsed

💡 Why AI review analysis beats keyword counting: Traditional tools count mentions of words like "broken" or "great." AI agents understand context — "the battery life is great for the price but terrible compared to the Pro model" contains both praise AND a competitive insight that keyword tools miss entirely.
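To make the keyword-counting failure concrete, here's a toy counter applied to the mixed review above — `naive_keyword_sentiment` is a deliberately simplistic illustration, not any real tool's logic:

```python
# Toy keyword counter: +1 per positive keyword, -1 per negative keyword.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"broken", "terrible", "awful"}

def naive_keyword_sentiment(text: str) -> int:
    words = text.lower().replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

review = "the battery life is great for the price but terrible compared to the Pro model"
print(naive_keyword_sentiment(review))  # 0 -> scored "neutral"; the competitive insight is lost
```

The counter scores the review as neutral ("great" cancels "terrible") and has no way to surface the comparison to the Pro model, which is exactly the kind of signal the structured `ReviewInsights.competitor_mentions` field captures.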

Step 5: Competitive Alerts via Slack

Wire up alerts so your team knows instantly when competitors make moves:

import httpx

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

async def send_ecommerce_alert(alerts: list[dict]):
    """Send formatted alerts to Slack."""
    if not alerts:
        return

    emoji_map = {
        "price_drop": "📉",
        "price_increase": "📈",
        "stockout": "🚫",
        "restock": "✅",
        "new_product": "🆕",
    }

    blocks = [{"type": "header", "text": {"type": "plain_text", "text": "🛒 E-Commerce Intelligence Alert"}}]

    for alert in alerts:
        emoji = emoji_map.get(alert["type"], "⚡")

        if alert["type"] == "price_drop":
            text = f"{emoji} *Price Drop* — {alert['product']}\n${alert['old_price']:.2f} → ${alert['new_price']:.2f} (-{alert['change_pct']}%)\n<{alert['url']}|View Product>"
        elif alert["type"] == "stockout":
            text = f"{emoji} *Stockout* — {alert['product']}\nCompetitor product is now out of stock — opportunity to capture demand\n<{alert['url']}|View Product>"
        elif alert["type"] == "restock":
            text = f"{emoji} *Restock* — {alert['product']}\nBack in stock at ${alert['price']:.2f}\n<{alert['url']}|View Product>"
        else:
            text = f"{emoji} *{alert['type'].replace('_', ' ').title()}* — {alert.get('product', 'Unknown')}"

        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text": text}})

    async with httpx.AsyncClient() as client:
        await client.post(SLACK_WEBHOOK, json={"blocks": blocks})

Step 6: AI Strategic Reports

Generate weekly competitive intelligence reports using an LLM to interpret the raw data:

async def generate_weekly_report(conn) -> str:
    """Generate an AI-powered competitive intelligence report."""
    week_ago = (datetime.now() - timedelta(days=7)).isoformat()

    # Gather week's data
    price_changes = conn.execute("""
        SELECT product_name, alert_type, old_value, new_value, change_pct
        FROM price_alerts WHERE created_at > ? ORDER BY created_at DESC
    """, (week_ago,)).fetchall()

    # Get price trends
    trends = conn.execute("""
        SELECT product_name, MIN(price) as low, MAX(price) as high,
               AVG(price) as avg, COUNT(*) as datapoints
        FROM price_history WHERE scraped_at > ?
        GROUP BY product_url
        HAVING COUNT(*) > 1
    """, (week_ago,)).fetchall()

    report_data = f"""
    Price changes this week: {len(price_changes)}
    Changes: {[{"product": r[0], "type": r[1], "from": r[2], "to": r[3], "pct": r[4]} for r in price_changes]}

    Price trends: {[{"product": r[0], "low": r[1], "high": r[2], "avg": r[3], "points": r[4]} for r in trends]}
    """

    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a senior e-commerce analyst. Write a concise weekly competitive intelligence report. Include:
            1. Executive summary (2-3 sentences)
            2. Key price movements and what they signal
            3. Stock availability patterns
            4. Strategic recommendations (what should WE do?)
            Keep it actionable. No fluff."""},
            {"role": "user", "content": f"Generate the weekly report from this data:\n{report_data}"}
        ]
    )

    return completion.choices[0].message.content

Putting It All Together: The Daily Pipeline

Here's the complete daily pipeline that ties everything together:

import asyncio

async def daily_ecommerce_pipeline():
    """Run the complete e-commerce intelligence pipeline."""
    conn = init_price_db()

    # 1. Define competitor URLs to monitor
    competitors = [
        {"name": "Competitor A", "category_urls": [
            "https://competitor-a.com/products/category-1",
            "https://competitor-a.com/products/category-2",
        ]},
        {"name": "Competitor B", "category_urls": [
            "https://competitor-b.com/collections/all",
        ]},
    ]

    all_alerts = []

    for competitor in competitors:
        print(f"\n📡 Scanning {competitor['name']}...")

        for cat_url in competitor["category_urls"]:
            # Discover products
            products = await discover_products(cat_url)
            print(f"  Found {len(products)} products in {cat_url}")

            # Track prices for each product
            for product_listing in products[:50]:  # Limit per category
                try:
                    detail = await extract_product_detail(product_listing.url)
                    alerts = track_price(conn, detail, product_listing.url)
                    all_alerts.extend(alerts)
                except Exception as e:
                    print(f"  ⚠️ Error on {product_listing.url}: {e}")

    # 2. Send alerts
    if all_alerts:
        await send_ecommerce_alert(all_alerts)
        print(f"\n🔔 Sent {len(all_alerts)} alerts")

    # 3. Weekly report (on Mondays)
    if datetime.now().weekday() == 0:
        report = await generate_weekly_report(conn)
        print(f"\n📊 Weekly Report:\n{report}")

    conn.close()
    print("\n✅ Pipeline complete")

# Run daily via cron: 0 8 * * * python ecommerce_pipeline.py
if __name__ == "__main__":
    asyncio.run(daily_ecommerce_pipeline())

Cost Comparison: Traditional vs AI Agent Approach

| Approach | Monthly Cost | Products Tracked | Update Frequency | Review Analysis |
|---|---|---|---|---|
| Prisync / Competera | $500–$2,000 | 500–5,000 | Daily | ❌ None |
| Jungle Scout / Helium 10 | $50–$200 | Amazon only | Daily | Basic keywords |
| Custom scrapers (DIY) | $200–$500+ | Unlimited | Any | Manual |
| AI Agent + Mantis API | $29–$99 | Unlimited | Any | ✅ AI-powered |

With Mantis's Starter plan ($29/month for 5,000 API calls), you can track ~150 products daily with full price history, review analysis, and AI-powered alerts — capabilities that cost $500+/month with traditional e-commerce intelligence platforms.
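The ~150-products-per-day figure is simple budget arithmetic. A rough sketch, assuming one /extract call per product page and a modest reserve for category/pagination pages (the 10% reserve is an assumption, not a billing rule):

```python
# Back-of-the-envelope call budget for the Starter plan quoted above.
calls_per_month = 5_000
days_per_month = 30

daily_budget = calls_per_month // days_per_month  # 166 calls/day
catalog_overhead = 16                             # ~10% reserved for category pages
products_per_day = daily_budget - catalog_overhead

print(daily_budget, products_per_day)  # 166 150
```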

Use Cases by E-Commerce Role

1. Brand Owners & D2C

Monitor MAP (Minimum Advertised Price) compliance across authorized resellers. Detect unauthorized sellers listing your products. Track how competitors price similar products in your category.
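A MAP-compliance check reduces to a query over tracked prices. A minimal sketch with an illustrative `listings` table — note the article's `price_history` table keys on `product_url` rather than seller, so the `seller` column here is an assumption:

```python
import sqlite3

def find_map_violations(conn, map_floor: float) -> list:
    """Return (seller, price) rows advertising below the MAP floor."""
    return conn.execute(
        "SELECT seller, price FROM listings WHERE price < ?", (map_floor,)
    ).fetchall()

# Demo with an in-memory database and made-up reseller data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (seller TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("AuthorizedRetailer", 99.99), ("GrayMarketShop", 79.99), ("BigBox", 99.99)],
)
print(find_map_violations(conn, 99.99))  # [('GrayMarketShop', 79.99)]
```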

2. Marketplace Sellers (Amazon, Shopify)

Track competitor pricing in real time for repricing strategies. Monitor review sentiment to identify product improvement opportunities. Detect when competitors go out of stock — perfect time to increase ad spend.
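The stockout-to-ad-spend play can be wired straight off the alert dicts `track_price` produces. A hedged sketch — `suggest_ad_boosts` is a hypothetical helper, and what you do with its output (bump bids, notify the ads team) is up to you:

```python
def suggest_ad_boosts(alerts: list) -> list:
    """Return product names whose competitor listing just went out of stock."""
    return [a["product"] for a in alerts if a["type"] == "stockout"]

# Alert dicts in the shape track_price emits
alerts = [
    {"type": "stockout", "product": "Wireless Earbuds X", "url": "https://example.com/p/1"},
    {"type": "price_drop", "product": "Smart Scale", "url": "https://example.com/p/2"},
]
print(suggest_ad_boosts(alerts))  # ['Wireless Earbuds X']
```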

3. Dropshippers & Arbitrage

Scan supplier sites for price drops automatically. Compare prices across multiple marketplaces to find arbitrage opportunities. Monitor supplier stock levels to avoid listing out-of-stock items.
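Cross-marketplace comparison is a min/max over per-site prices for the same SKU. A minimal sketch, assuming the prices come from `extract_product_detail` calls against each listing — `best_arbitrage` and the 15% margin threshold are illustrative choices, not the article's method:

```python
def best_arbitrage(prices: dict, min_margin_pct: float = 15.0):
    """Return (buy_site, sell_site, margin_pct) if the spread clears the threshold."""
    buy_site = min(prices, key=prices.get)
    sell_site = max(prices, key=prices.get)
    margin_pct = (prices[sell_site] - prices[buy_site]) / prices[buy_site] * 100
    if margin_pct >= min_margin_pct:
        return (buy_site, sell_site, round(margin_pct, 1))
    return None  # spread too thin once fees and shipping eat into it

prices = {"supplier.example": 20.00, "marketplace-a.example": 24.50, "marketplace-b.example": 31.00}
print(best_arbitrage(prices))  # ('supplier.example', 'marketplace-b.example', 55.0)
```

In practice you'd set `min_margin_pct` above your marketplace fees plus shipping, since the raw spread overstates the real profit.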

4. Category Managers & Buyers

Track pricing trends across an entire product category. Analyze which features drive positive reviews (and which cause returns). Generate competitive assortment reports for buying decisions.

Build Your E-Commerce Intelligence System

Mantis WebPerception API handles JavaScript rendering, anti-bot protection, and AI data extraction — so you can focus on the intelligence layer.

Start Free — 100 API Calls/Month

Best Practices for E-Commerce Scraping

  - Cap your crawl: the pipeline above limits pagination and products per category so a single run can't burn your API budget
  - Fail soft: wrap per-product extraction in try/except so one bad page doesn't kill the whole run
  - Keep history: store every observation — trends, alerts, and weekly reports all depend on prior data points
  - Schedule off-peak: a daily cron at a fixed hour keeps data fresh without hammering target sites

What's Next

This guide gives you a production-ready e-commerce intelligence system. To go deeper: