Web Scraping for Retail & CPG: How AI Agents Track Pricing, Inventory, Reviews & Shelf Data in 2026

Published: March 12, 2026 · 16 min read · By the Mantis Team

The global retail industry generates over $28 trillion annually, with e-commerce alone exceeding $6.3 trillion in 2025. For retailers and CPG (consumer packaged goods) brands, competitive intelligence isn't optional; it's survival. Who changed their price on Amazon this morning? Is your product in stock at Walmart.com? What are customers saying in Target reviews? How does your digital shelf compare to the competition?

Traditional retail intelligence platforms like Profitero, Salsify, and Nielsen charge $3,000–$50,000+ per month for digital shelf analytics. Yet the data they sell is publicly available on retailer websites. AI agents powered by web scraping APIs can build equivalent intelligence systems at a fraction of the cost.

In this guide, you'll build a complete retail and CPG intelligence system using Python, the Mantis WebPerception API, and GPT-4o, covering competitor pricing, inventory monitoring, review analysis, and digital shelf optimization.

Why Retail & CPG Teams Need Web Scraping

Retail moves fast. Prices change hourly. Products go in and out of stock. New competitors launch daily. The brands that win are the ones with the best real-time intelligence.

Build Retail Intelligence Agents with Mantis

Scrape pricing, reviews, inventory, and digital shelf data from any retailer with one API call. AI-powered extraction handles every product page format.

Get Free API Key →

Architecture: The 6-Step Retail Intelligence Pipeline

  1. Competitor price scraping – Monitor pricing across Amazon, Walmart, Target, and DTC competitors at scale
  2. Inventory & availability tracking – Detect out-of-stock events, low inventory signals, and fulfillment changes
  3. Review & sentiment monitoring – Track new reviews, rating trends, and customer sentiment across platforms
  4. Digital shelf analytics – Monitor search rankings, Buy Box ownership, content quality, and share of shelf
  5. GPT-4o competitive analysis – Generate pricing recommendations, identify threats, and predict competitor moves
  6. Alert delivery – Route price drops, OOS events, and negative reviews to the right team via Slack or email

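Before diving into each step, here is how the pieces fit together. A minimal orchestration sketch with every step stubbed out (the stub names here are placeholders; the real functions are built in the steps below):

```python
import asyncio

# Hypothetical orchestration of the six steps; each step is stubbed so the
# sketch runs standalone. Steps 2-6 of this guide build the real versions.
async def scrape_prices(): return []
async def scrape_reviews(): return []
async def track_shelf(): return []
async def check_map(): return []

async def analyze(prices, reviews, shelf, violations):
    # Stand-in for the GPT-4o briefing built in Step 5
    return {"briefing": "...", "counts": (len(prices), len(reviews), len(shelf), len(violations))}

async def send_alerts(briefing):
    # Stand-in for Step 6's Slack delivery
    pass

async def run_daily_pipeline() -> dict:
    # Steps 1-4 are independent of each other, so they can run concurrently
    prices, reviews, shelf, violations = await asyncio.gather(
        scrape_prices(), scrape_reviews(), track_shelf(), check_map()
    )
    briefing = await analyze(prices, reviews, shelf, violations)
    await send_alerts(briefing)
    return briefing

result = asyncio.run(run_daily_pipeline())
```

Running this once a day (cron, Airflow, a GitHub Action) is usually enough for pricing; review and shelf scraping can run on a slower weekly cadence.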
Step 1: Define Your Retail Data Models

from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
from enum import Enum

class RetailerName(str, Enum):
    AMAZON = "amazon"
    WALMART = "walmart"
    TARGET = "target"
    BESTBUY = "bestbuy"
    COSTCO = "costco"
    KROGER = "kroger"
    OTHER = "other"

class ProductPrice(BaseModel):
    """Competitor product pricing data."""
    sku: str
    product_name: str
    brand: str
    retailer: RetailerName
    current_price: float
    original_price: Optional[float] = None  # MSRP or list price
    sale_price: Optional[float] = None
    price_per_unit: Optional[str] = None  # e.g., "$0.15/oz"
    currency: str = "USD"
    in_stock: bool
    fulfillment: Optional[str] = None  # "shipped by amazon", "store pickup", "3P seller"
    seller_name: Optional[str] = None
    buy_box_winner: Optional[bool] = None
    coupon: Optional[str] = None  # e.g., "Save 20% with coupon"
    scraped_at: datetime
    source_url: str
    price_change_pct: Optional[float] = None  # vs. last scrape

class ProductReview(BaseModel):
    """Customer review from a retailer."""
    product_sku: str
    retailer: RetailerName
    rating: float  # 1-5
    title: str
    body: str
    author: str
    date: str  # raw date string as shown on the retailer page
    verified_purchase: bool
    helpful_votes: int
    sentiment: Optional[str] = None  # positive, negative, neutral, mixed
    key_themes: Optional[List[str]] = None  # AI-extracted: "quality", "shipping", "value"
    source_url: str

class DigitalShelfPosition(BaseModel):
    """Product ranking and visibility on retailer search."""
    product_sku: str
    retailer: RetailerName
    search_term: str
    organic_rank: Optional[int] = None
    sponsored_rank: Optional[int] = None
    page_number: int
    total_results: int
    buy_box_owned: Optional[bool] = None
    content_score: Optional[float] = None  # image count, description length, A+ content
    review_count: int
    average_rating: float
    scraped_at: datetime

class MAPViolation(BaseModel):
    """Minimum Advertised Price violation detected."""
    product_sku: str
    product_name: str
    brand: str
    map_price: float
    advertised_price: float
    violation_amount: float
    seller_name: str
    retailer: RetailerName
    source_url: str
    detected_at: datetime

Step 2: Scrape Competitor Pricing at Scale

from mantis import MantisClient
import asyncio

mantis = MantisClient(api_key="your-mantis-api-key")

async def scrape_product_prices(
    product_urls: List[dict],  # [{"sku": "ABC", "retailer": "amazon", "url": "..."}]
    previous_prices: Optional[dict] = None  # sku -> last_price for change detection
) -> List[ProductPrice]:
    """
    Scrape current pricing from multiple retailer product pages.
    Mantis handles JS rendering and anti-bot for Amazon, Walmart, etc.
    """
    prices = []
    
    for product in product_urls:
        result = await mantis.scrape(
            url=product["url"],
            extract={
                "product_name": "string",
                "brand": "string",
                "current_price": "number",
                "original_price": "number or null",
                "sale_price": "number or null",
                "price_per_unit": "string or null",
                "in_stock": "boolean",
                "fulfillment": "string",
                "seller_name": "string or null",
                "buy_box_winner": "boolean or null",
                "coupon_text": "string or null"
            }
        )
        
        # Calculate price change vs. previous scrape
        prev_price = (previous_prices or {}).get(product["sku"])
        change_pct = None
        if prev_price and result.get("current_price"):
            change_pct = ((result["current_price"] - prev_price) / prev_price) * 100
        
        price = ProductPrice(
            sku=product["sku"],
            product_name=result.get("product_name", ""),
            brand=result.get("brand", ""),
            retailer=product["retailer"],
            current_price=result.get("current_price", 0),
            original_price=result.get("original_price"),
            sale_price=result.get("sale_price"),
            price_per_unit=result.get("price_per_unit"),
            in_stock=result.get("in_stock", False),
            fulfillment=result.get("fulfillment"),
            seller_name=result.get("seller_name"),
            buy_box_winner=result.get("buy_box_winner"),
            coupon=result.get("coupon_text"),
            scraped_at=datetime.now(),
            source_url=product["url"],
            price_change_pct=change_pct
        )
        prices.append(price)
    
    return prices

# Monitor your category across major retailers
competitor_products = [
    {"sku": "COMP-001", "retailer": "amazon", "url": "https://amazon.com/dp/B0EXAMPLE1"},
    {"sku": "COMP-001", "retailer": "walmart", "url": "https://walmart.com/ip/example1"},
    {"sku": "COMP-002", "retailer": "amazon", "url": "https://amazon.com/dp/B0EXAMPLE2"},
    {"sku": "OWN-001", "retailer": "amazon", "url": "https://amazon.com/dp/B0OWNPROD1"},
    # ... hundreds of SKUs
]

prices = asyncio.run(scrape_product_prices(competitor_products))
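
Between runs, `previous_prices` has to come from somewhere. A minimal sketch that persists the last observed price per SKU to a local JSON file (the file name is arbitrary; a database works just as well):

```python
import json
from pathlib import Path

def load_previous_prices(path: str = "last_prices.json") -> dict:
    """Load {sku: last_price} from the previous run, or {} on the first run."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def save_current_prices(prices: list, path: str = "last_prices.json") -> None:
    """Persist {sku: current_price} for the next run's change detection."""
    snapshot = {p["sku"]: p["current_price"] for p in prices}
    Path(path).write_text(json.dumps(snapshot, indent=2))

# Demo with plain dicts; with ProductPrice objects you'd use p.sku / p.current_price
save_current_prices([{"sku": "COMP-001", "current_price": 24.99}])
print(load_previous_prices())
```

Load before scraping, pass the result as `previous_prices`, then save after, so every run computes `price_change_pct` against the prior snapshot.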

Detecting Price Drops and Promotions

async def detect_price_anomalies(
    current_prices: List[ProductPrice],
    historical_db: str = "retail_intelligence.db"
) -> dict:
    """
    Compare current prices against historical data to detect
    significant changes, promotions, and pricing strategies.
    """
    import sqlite3
    
    conn = sqlite3.connect(historical_db)
    # Ensure the history table exists on first run
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            sku TEXT, retailer TEXT, current_price REAL,
            in_stock INTEGER, scraped_at TEXT
        )
    """)
    alerts = {"price_drops": [], "price_increases": [], "new_promotions": [], "oos_events": []}
    
    for price in current_prices:
        # Get 30-day average for this SKU + retailer
        avg = conn.execute("""
            SELECT AVG(current_price), MIN(current_price), MAX(current_price)
            FROM price_history
            WHERE sku = ? AND retailer = ?
            AND scraped_at > datetime('now', '-30 days')
        """, (price.sku, price.retailer)).fetchone()
        
        if avg and avg[0]:
            avg_price, min_price, max_price = avg
            
            # Significant price drop (>10% below 30-day average)
            if price.current_price < avg_price * 0.9:
                alerts["price_drops"].append({
                    "sku": price.sku,
                    "product": price.product_name,
                    "retailer": price.retailer,
                    "current": price.current_price,
                    "avg_30d": round(avg_price, 2),
                    "drop_pct": round(((avg_price - price.current_price) / avg_price) * 100, 1),
                    "url": price.source_url
                })
            
            # Price increase (>5% above average)
            if price.current_price > avg_price * 1.05:
                alerts["price_increases"].append({
                    "sku": price.sku,
                    "product": price.product_name,
                    "increase_pct": round(((price.current_price - avg_price) / avg_price) * 100, 1)
                })
        
        # Out-of-stock detection
        if not price.in_stock:
            alerts["oos_events"].append({
                "sku": price.sku,
                "product": price.product_name,
                "retailer": price.retailer,
                "url": price.source_url
            })
        
        # Store current price
        conn.execute(
            "INSERT INTO price_history (sku, retailer, current_price, in_stock, scraped_at) VALUES (?, ?, ?, ?, ?)",
            (price.sku, price.retailer, price.current_price, price.in_stock, price.scraped_at.isoformat())
        )
    
    conn.commit()
    conn.close()
    return alerts
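
The 30-day-average logic is easy to sanity-check with an in-memory database. A self-contained demo of the >10% drop rule (the sample prices are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history (sku TEXT, retailer TEXT, current_price REAL, "
    "in_stock INTEGER, scraped_at TEXT)"
)
# Three historical observations for one SKU: average = $30.00
for p in (29.0, 30.0, 31.0):
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, 1, datetime('now'))",
        ("COMP-001", "amazon", p),
    )

avg_price = conn.execute(
    "SELECT AVG(current_price) FROM price_history WHERE sku = ? AND retailer = ?",
    ("COMP-001", "amazon"),
).fetchone()[0]

current = 26.50                       # today's scraped price
is_drop = current < avg_price * 0.9   # more than 10% below the 30-day average
print(avg_price, is_drop)
```

$26.50 against a $30.00 average is an 11.7% drop, so it clears the 10% threshold and would land in `alerts["price_drops"]`.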

Step 3: Monitor Reviews & Customer Sentiment

async def scrape_product_reviews(
    product_urls: List[dict],
    max_reviews_per_product: int = 50,
    sort_by: str = "recent"
) -> List[ProductReview]:
    """
    Scrape customer reviews from retailer product pages.
    Focus on recent reviews to catch emerging issues.
    """
    reviews = []
    
    for product in product_urls:
        result = await mantis.scrape(
            url=product["url"],
            extract={
                "reviews": [{
                    "rating": "number (1-5)",
                    "title": "string",
                    "body": "string",
                    "author": "string",
                    "date": "string",
                    "verified_purchase": "boolean",
                    "helpful_votes": "number"
                }],
                "overall_rating": "number",
                "total_review_count": "number",
                "rating_distribution": {
                    "5_star_pct": "number",
                    "4_star_pct": "number",
                    "3_star_pct": "number",
                    "2_star_pct": "number",
                    "1_star_pct": "number"
                }
            }
        )
        
        for review in result.get("reviews", [])[:max_reviews_per_product]:
            r = ProductReview(
                product_sku=product["sku"],
                retailer=product["retailer"],
                rating=review.get("rating", 0),
                title=review.get("title", ""),
                body=review.get("body", ""),
                author=review.get("author", ""),
                date=review.get("date", ""),
                verified_purchase=review.get("verified_purchase", False),
                helpful_votes=review.get("helpful_votes", 0),
                source_url=product["url"]
            )
            reviews.append(r)
    
    return reviews
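
Retailers render review dates as free text (e.g. Amazon's "Reviewed in the United States on March 3, 2026"), so the raw `date` string usually needs normalizing before any trend analysis. A best-effort parser covering a few common formats (the format list is an assumption, not exhaustive):

```python
from datetime import datetime
from typing import Optional

def parse_review_date(raw: str) -> Optional[datetime]:
    """Best-effort parse of a retailer review date; returns None if unrecognized."""
    # Strip "Reviewed in <country> on" style prefixes
    cleaned = raw.split(" on ")[-1].strip()
    for fmt in ("%B %d, %Y", "%b %d, %Y", "%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None

print(parse_review_date("Reviewed in the United States on March 3, 2026"))
```

Returning `None` instead of raising keeps one oddly formatted review from killing a batch; unparsed dates can be logged and handled later.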

async def analyze_review_sentiment(reviews: List[ProductReview]) -> dict:
    """Use GPT-4o to analyze review sentiment and extract themes."""
    from openai import OpenAI
    client = OpenAI()
    
    review_texts = [
        f"Rating: {r.rating}/5 | {r.title}: {r.body[:300]}"
        for r in reviews[:50]
    ]
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """Analyze these product reviews and provide:
            1. OVERALL SENTIMENT: positive/negative/mixed with confidence %
            2. TOP POSITIVE THEMES: What customers love (with frequency)
            3. TOP NEGATIVE THEMES: What customers complain about (with frequency)
            4. TRENDING ISSUES: New problems appearing in recent reviews
            5. COMPETITIVE INSIGHTS: What customers compare this product to
            6. IMPROVEMENT PRIORITIES: Top 3 product/listing improvements to make
            
            Be data-driven. Count occurrences. Prioritize actionable insights."""
        }, {
            "role": "user",
            "content": f"Reviews to analyze:\n\n" + "\n---\n".join(review_texts)
        }],
        temperature=0.2
    )
    
    return {
        "analysis": response.choices[0].message.content,
        "reviews_analyzed": len(reviews),
        "avg_rating": sum(r.rating for r in reviews) / len(reviews) if reviews else 0
    }
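
Before spending LLM tokens, a cheap rating-based pre-screen can decide which products even need a GPT-4o pass. A simple heuristic (the thresholds are arbitrary; tune them for your category):

```python
def rating_prescreen(ratings: list) -> dict:
    """Bucket recent ratings and flag products that warrant a deeper LLM analysis."""
    if not ratings:
        return {"avg": None, "pct_negative": 0.0, "needs_review": False}
    negative = sum(1 for r in ratings if r <= 2)
    pct_negative = negative / len(ratings) * 100
    avg = sum(ratings) / len(ratings)
    return {
        "avg": round(avg, 2),
        "pct_negative": round(pct_negative, 1),
        # Flag if the average slips below 4.0 or >15% of recent reviews are 1-2 star
        "needs_review": avg < 4.0 or pct_negative > 15,
    }

print(rating_prescreen([5, 5, 4, 2, 1]))
```

Only products flagged `needs_review` get sent through `analyze_review_sentiment`, which keeps API costs proportional to actual problems rather than catalog size.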

Step 4: Digital Shelf Analytics

Where your product appears in retailer search results directly impacts sales. Monitor your digital shelf position and compare against competitors:

async def track_digital_shelf(
    search_terms: List[str],
    your_skus: List[str],
    retailers: List[str] = ["amazon", "walmart", "target"]
) -> List[DigitalShelfPosition]:
    """
    Track where your products rank in retailer search results
    for key category terms.
    """
    positions = []
    
    retailer_search_urls = {
        "amazon": "https://www.amazon.com/s?k={}",
        "walmart": "https://www.walmart.com/search?q={}",
        "target": "https://www.target.com/s?searchTerm={}"
    }
    
    for retailer in retailers:
        for term in search_terms:
            url = retailer_search_urls[retailer].format(term.replace(" ", "+"))
            
            result = await mantis.scrape(
                url=url,
                extract={
                    "products": [{
                        "position": "number",
                        "product_name": "string",
                        "asin_or_sku": "string",
                        "price": "number",
                        "rating": "number",
                        "review_count": "number",
                        "is_sponsored": "boolean",
                        "seller": "string",
                        "url": "string"
                    }],
                    "total_results": "number"
                }
            )
            
            for product in result.get("products", []):
                pos = DigitalShelfPosition(
                    product_sku=product.get("asin_or_sku", ""),
                    retailer=retailer,
                    search_term=term,
                    organic_rank=product.get("position") if not product.get("is_sponsored") else None,
                    sponsored_rank=product.get("position") if product.get("is_sponsored") else None,
                    page_number=1,
                    total_results=result.get("total_results", 0),
                    review_count=product.get("review_count", 0),
                    average_rating=product.get("rating", 0),
                    scraped_at=datetime.now()
                )
                positions.append(pos)
    
    return positions

def calculate_share_of_shelf(positions: List[DigitalShelfPosition], your_skus: List[str]) -> dict:
    """Calculate share of shelf β€” % of top results that are your products."""
    results = {}
    
    for term in set(p.search_term for p in positions):
        # Sort by rank so "top 20" actually means the first 20 results
        term_positions = sorted(
            (p for p in positions if p.search_term == term),
            key=lambda p: p.organic_rank or p.sponsored_rank or 999,
        )
        top_20 = term_positions[:20]
        your_count = sum(1 for p in top_20 if p.product_sku in your_skus)
        
        results[term] = {
            "share_of_shelf": round(your_count / len(top_20) * 100, 1) if top_20 else 0,
            "your_positions": [p.organic_rank for p in top_20 if p.product_sku in your_skus and p.organic_rank],
            "top_competitor": next(
                (p.product_sku for p in top_20 if p.product_sku not in your_skus),
                None
            )
        }
    
    return results
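
Raw share of shelf treats rank 1 and rank 20 equally, but clicks concentrate at the top. A position-weighted variant (the 1/rank decay is a simplifying assumption, not an industry standard):

```python
def weighted_share_of_shelf(positions: list, your_skus: set, top_n: int = 20) -> float:
    """Share of position-weighted visibility your SKUs own in the top N results.

    Each result at rank r contributes weight 1/r, so rank 1 dominates.
    `positions` is a list of {"sku": ..., "rank": ...} dicts.
    """
    top = [p for p in positions if p["rank"] <= top_n]
    total = sum(1 / p["rank"] for p in top)
    yours = sum(1 / p["rank"] for p in top if p["sku"] in your_skus)
    return round(yours / total * 100, 1) if total else 0.0

# Rank 1 is yours, ranks 2-3 are competitors:
print(weighted_share_of_shelf(
    [{"sku": "OWN-1", "rank": 1}, {"sku": "C-1", "rank": 2}, {"sku": "C-2", "rank": 3}],
    {"OWN-1"},
))
```

Owning rank 1 of 3 gives a 33% raw share but a 54.5% weighted share, which better reflects where shoppers actually look.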

Step 5: MAP Compliance Monitoring

For brands with MAP (Minimum Advertised Price) policies, automated monitoring catches unauthorized price cuts from resellers:

async def monitor_map_compliance(
    products: List[dict],  # [{"sku": "X", "name": "Y", "map_price": 29.99, "urls": [...]}]
) -> List[MAPViolation]:
    """
    Check authorized and unauthorized sellers for MAP violations.
    Scans Amazon 3P sellers, eBay, Google Shopping, and DTC sites.
    """
    violations = []
    
    for product in products:
        for url in product["urls"]:
            result = await mantis.scrape(
                url=url,
                extract={
                    "sellers": [{
                        "seller_name": "string",
                        "price": "number",
                        "condition": "string",
                        "fulfillment": "string"
                    }]
                }
            )
            
            for seller in result.get("sellers", []):
                if seller["price"] < product["map_price"]:
                    violation = MAPViolation(
                        product_sku=product["sku"],
                        product_name=product["name"],
                        brand=product.get("brand", ""),
                        map_price=product["map_price"],
                        advertised_price=seller["price"],
                        violation_amount=round(product["map_price"] - seller["price"], 2),
                        seller_name=seller["seller_name"],
                        retailer=detect_retailer(url),
                        source_url=url,
                        detected_at=datetime.now()
                    )
                    violations.append(violation)
    
    return violations
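
The `detect_retailer` helper used above isn't defined elsewhere in this guide; one possible implementation maps the URL's hostname to a retailer key (the keys match the `RetailerName` enum values, so pydantic coerces them directly):

```python
from urllib.parse import urlparse

def detect_retailer(url: str) -> str:
    """Map a product URL's hostname to a retailer key, falling back to 'other'."""
    host = urlparse(url).netloc.lower()
    for key in ("amazon", "walmart", "target", "bestbuy", "costco", "kroger"):
        if key in host:
            return key
    return "other"

print(detect_retailer("https://www.amazon.com/dp/B0EXAMPLE1"))
```

Substring matching on the hostname is deliberately loose; add exact-domain rules if you monitor marketplaces whose names collide.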

Step 6: AI-Powered Competitive Analysis & Alerts

from openai import OpenAI
import json

openai_client = OpenAI()

async def generate_retail_intelligence(
    prices: List[ProductPrice],
    reviews: List[ProductReview],
    shelf_positions: List[DigitalShelfPosition],
    violations: List[MAPViolation]
) -> dict:
    """
    Generate a comprehensive retail intelligence briefing
    combining pricing, reviews, shelf data, and MAP compliance.
    """
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a retail analytics expert. Analyze the following data
            and produce an actionable intelligence briefing:
            
            1. PRICING INTELLIGENCE
               - Significant price changes (>5%)
               - Pricing trends by category/retailer
               - Promotional activity detected
               - Pricing recommendations (match, undercut, or hold)
            
            2. INVENTORY & AVAILABILITY
               - Out-of-stock events (opportunity to capture share)
               - Low stock signals
               - Fulfillment changes (1P vs 3P shifts)
            
            3. REVIEW & SENTIMENT
               - Rating trend direction
               - Emerging negative themes to address
               - Competitive sentiment comparison
            
            4. DIGITAL SHELF
               - Ranking changes for key terms
               - Share of shelf trends
               - Content optimization opportunities
            
            5. MAP COMPLIANCE
               - New violations detected
               - Repeat offenders
               - Recommended enforcement actions
            
            6. TOP 3 ACTIONS FOR THIS WEEK
               - Prioritized, specific, actionable
            
            Use data, not opinions. Include specific numbers."""
        }, {
            "role": "user",
            "content": f"""Pricing data: {len(prices)} SKUs tracked
            {json.dumps([p.model_dump() for p in prices[:30]], default=str)}
            
            Reviews: {len(reviews)} new reviews
            {json.dumps([r.model_dump() for r in reviews[:20]], default=str)}
            
            Shelf positions: {len(shelf_positions)} tracked
            {json.dumps([s.model_dump() for s in shelf_positions[:20]], default=str)}
            
            MAP violations: {len(violations)} detected
            {json.dumps([v.model_dump() for v in violations], default=str)}"""
        }],
        temperature=0.2
    )
    
    return {
        "briefing": response.choices[0].message.content,
        "generated_at": datetime.now().isoformat(),
        "data_summary": {
            "skus_tracked": len(prices),
            "reviews_analyzed": len(reviews),
            "search_terms_tracked": len(set(s.search_term for s in shelf_positions)),
            "map_violations": len(violations)
        }
    }

async def deliver_alerts(alerts: dict, slack_webhook: str):
    """Route retail alerts by urgency."""
    import httpx
    
    # Critical: Large competitor price drops, OOS on own products
    if alerts.get("price_drops") or alerts.get("oos_events"):
        critical_msg = "🚨 *Retail Alert*\n\n"
        
        for drop in alerts.get("price_drops", [])[:5]:
            critical_msg += f"• *{drop['product']}* on {drop['retailer']}: ${drop['current']} (↓{drop['drop_pct']}% from ${drop['avg_30d']} avg)\n"
        
        for oos in alerts.get("oos_events", [])[:5]:
            critical_msg += f"• ⚠️ *OOS:* {oos['product']} on {oos['retailer']}\n"
        
        async with httpx.AsyncClient() as client:
            await client.post(slack_webhook, json={
                "text": critical_msg,
                "unfurl_links": False
            })
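
Not every alert deserves an immediate Slack ping. A simple severity classifier for routing (the thresholds are illustrative, not prescriptive):

```python
def alert_severity(alert_type: str, magnitude: float = 0.0) -> str:
    """Classify an alert as 'critical', 'warning', or 'digest' for routing.

    magnitude: percentage change for price alerts; ignored for OOS events.
    """
    if alert_type == "oos_event":
        return "critical"  # stock-outs always page the channel immediately
    if alert_type == "price_drop":
        if magnitude >= 20:
            return "critical"
        if magnitude >= 10:
            return "warning"
    return "digest"  # everything else rolls into the daily briefing

print(alert_severity("price_drop", 23.5))
```

Critical alerts go to Slack in real time, warnings to an hourly batch, and the rest into the GPT-4o daily briefing, which keeps the channel from training people to ignore it.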

Advanced: Cross-Retailer Price Optimization

Build a pricing recommendation engine that considers competitor pricing across all retailers simultaneously:

async def price_optimization_engine(
    own_products: List[ProductPrice],
    competitor_prices: List[ProductPrice],
    margins: dict,  # sku -> {"cost": X, "min_margin": Y}
    strategy: str = "competitive"  # "competitive", "premium", "value"
) -> dict:
    """
    Generate pricing recommendations based on competitive landscape,
    margins, and pricing strategy.
    """
    recommendations = {}
    
    for own in own_products:
        sku = own.sku
        
        # Competitor listings at the same retailer (matching products to a
        # category is assumed to happen when competitor SKUs are mapped upstream)
        competitors = [
            c for c in competitor_prices
            if c.retailer == own.retailer and c.sku != sku
        ]
        
        if not competitors:
            continue
        
        comp_prices = [c.current_price for c in competitors if c.in_stock]
        if not comp_prices:
            continue
        
        avg_comp = sum(comp_prices) / len(comp_prices)
        min_comp = min(comp_prices)
        max_comp = max(comp_prices)
        
        # Calculate floor based on margin requirements
        cost = margins.get(sku, {}).get("cost", 0)
        min_margin = margins.get(sku, {}).get("min_margin", 0.2)
        floor_price = cost / (1 - min_margin) if cost else 0
        
        # Strategy-based recommendation
        if strategy == "competitive":
            target = avg_comp * 0.97  # 3% below average
        elif strategy == "premium":
            target = avg_comp * 1.10  # 10% above average
        elif strategy == "value":
            target = min_comp * 0.95  # 5% below cheapest
        else:
            target = avg_comp  # unknown strategy: match the market average
        
        recommended = max(target, floor_price)
        
        recommendations[sku] = {
            "current_price": own.current_price,
            "recommended_price": round(recommended, 2),
            "change": round(recommended - own.current_price, 2),
            "competitor_avg": round(avg_comp, 2),
            "competitor_range": f"${min_comp:.2f} - ${max_comp:.2f}",
            "competitors_in_stock": len(comp_prices),
            "margin_at_recommended": round((recommended - cost) / recommended * 100, 1) if cost else None,
            "rationale": generate_pricing_rationale(own, competitors, recommended, strategy)
        }
    
    return recommendations
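
The engine's margin floor comes from treating margin as a fraction of the selling price: price = cost / (1 - min_margin). So a $12 cost with a 20% minimum margin gives a $15.00 floor. As a standalone check of the arithmetic:

```python
def margin_floor(cost: float, min_margin: float) -> float:
    """Lowest price that still preserves min_margin (margin as a fraction of price)."""
    return round(cost / (1 - min_margin), 2)

# $12 cost at 20% margin: selling at $15 leaves $3, and 3/15 = 20% of the price
print(margin_floor(12.00, 0.20))
```

Note this is margin on price, not markup on cost; a 20% markup on $12 would be $14.40, which only yields a 16.7% margin.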

Cost Comparison: AI Agents vs. Retail Intelligence Platforms

| Platform | Monthly Cost | Best For |
|---|---|---|
| Profitero | $5,000–$50,000 | Enterprise digital shelf analytics, retailer scorecards |
| Salsify | $3,000–$25,000 | Product content management + shelf analytics |
| Stackline | $5,000–$30,000 | Connected commerce analytics, market share |
| Jungle Scout | $49–$399 | Amazon-specific product research and tracking |
| Keepa / CamelCamelCamel | Free–$20 | Amazon price history (single retailer) |
| Prisync | $99–$399 | Competitor price tracking (limited scale) |
| AI Agent + Mantis | $29–$299 | Multi-retailer pricing, reviews, shelf, MAP – fully custom |

Honest caveat: Enterprise platforms like Profitero and Salsify offer pre-built retailer integrations, historical benchmarking databases, and category-level market share data that's difficult to replicate with scraping alone. Their value is strongest for brands selling through 20+ retailers at enterprise scale. For brands tracking 5–10 retailers and wanting custom intelligence tailored to their specific competitive set, an AI agent approach delivers 80–90% of the value at 5–10% of the cost.

Use Cases by Retail Segment

1. CPG Brands – Competitive Pricing & Digital Shelf

Track how your products are priced, positioned, and reviewed across Amazon, Walmart, Target, Kroger, and other key retailers. Detect when competitors launch promotions, when your Buy Box is lost, and when new negative reviews spike. Essential for brand managers and trade marketing teams managing dozens to thousands of SKUs.

2. DTC & E-commerce Brands

Monitor competitor DTC sites for pricing changes, new product launches, promotional activity, and customer review trends. Track SEO positioning for key category terms. Identify when competitors' products go out of stock – opportunities to capture search traffic with targeted ads.

3. Retailers & Marketplace Sellers

Automated repricing based on competitive data. Monitor MAP compliance from authorized dealers. Track category trends to inform buying decisions. Build assortment intelligence – which products are trending up across competitor catalogs?

4. Private Label & Amazon FBA Sellers

Product research at scale – track bestseller rankings, review velocity, and pricing trends across entire categories. Identify product opportunities where demand is high but competition is weak. Monitor your listings for hijackers and unauthorized sellers.

Compliance & Best Practices

Keep your scraping defensible: collect only publicly available product-page data, respect each retailer's robots.txt and terms of service, throttle request rates so monitoring never degrades a site's performance, and avoid storing personal information from reviews beyond what your analysis needs. For large-scale or MAP-enforcement programs, have legal counsel review your collection practices.
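
One concrete best practice is to cap concurrency and pace requests so monitoring never hammers a retailer. A minimal throttling wrapper (the limits are placeholders; tune them per site):

```python
import asyncio

async def scrape_politely(urls, scrape_fn, max_concurrent: int = 3, delay: float = 1.0):
    """Run scrape_fn over urls with bounded concurrency and a per-request pause."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(url):
        async with sem:
            result = await scrape_fn(url)
            await asyncio.sleep(delay)  # pause between requests on each slot
            return result

    return await asyncio.gather(*(one(u) for u in urls))

# Demo with a stub in place of a real scraper:
async def fake_scrape(url):
    return {"url": url}

results = asyncio.run(scrape_politely(
    ["https://example.com/a", "https://example.com/b"], fake_scrape, delay=0.01
))
print(len(results))
```

Wrapping the Mantis calls from the earlier steps in `scrape_politely` keeps a thousand-SKU sweep from turning into a burst of simultaneous requests.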

Getting Started

  1. Define your competitive set – which SKUs, brands, and retailers matter most to track?
  2. Set up Mantis API access – sign up for a free API key (100 calls/month free)
  3. Start with pricing – competitor price monitoring delivers the fastest ROI and is easiest to implement
  4. Add reviews – batch review scraping weekly to track sentiment trends without excessive API usage
  5. Build digital shelf tracking – monitor your top 10 search terms across your top 3 retailers
  6. Automate alerts – route price drops >10%, OOS events, and negative review spikes to Slack