Web Scraping for Retail & CPG: How AI Agents Track Pricing, Inventory, Reviews & Shelf Data in 2026
The global retail industry generates over $28 trillion annually, with e-commerce alone exceeding $6.3 trillion in 2025. For retailers and CPG (consumer packaged goods) brands, competitive intelligence isn't optional; it's survival. Who changed their price on Amazon this morning? Is your product in stock at Walmart.com? What are customers saying in Target reviews? How does your digital shelf compare to the competition?
Traditional retail intelligence platforms like Profitero, Salsify, and Nielsen charge $3,000–$50,000+ per month for digital shelf analytics. Yet the data they sell is publicly available on retailer websites. AI agents powered by web scraping APIs can build equivalent intelligence systems at a fraction of the cost.
In this guide, you'll build a complete retail and CPG intelligence system using Python, the Mantis WebPerception API, and GPT-4o, covering competitor pricing, inventory monitoring, review analysis, and digital shelf optimization.
Why Retail & CPG Teams Need Web Scraping
Retail moves fast. Prices change hourly. Products go in and out of stock. New competitors launch daily. The brands that win are the ones with the best real-time intelligence:
- Competitor pricing: Amazon, Walmart, Target, Best Buy, and thousands of DTC brands change prices multiple times per day
- Stock availability: out-of-stock events cost CPG brands billions annually; knowing when competitors are OOS creates opportunities
- Customer reviews & sentiment: new negative reviews can tank conversion rates; catching them early lets you respond
- Digital shelf position: where your product ranks in search results on Amazon, Walmart.com, and other retailers determines sales velocity
- MAP compliance: brands need to monitor authorized and unauthorized resellers for Minimum Advertised Price violations
- Assortment & catalog changes: new product launches, discontinued items, and category restructuring by competitors
- Promotional activity: flash sales, coupons, bundle deals, and seasonal promotions across retailer sites
Architecture: The 6-Step Retail Intelligence Pipeline
- Competitor price scraping: monitor pricing across Amazon, Walmart, Target, and DTC competitors at scale
- Inventory & availability tracking: detect out-of-stock events, low inventory signals, and fulfillment changes
- Review & sentiment monitoring: track new reviews, rating trends, and customer sentiment across platforms
- Digital shelf analytics: monitor search rankings, Buy Box ownership, content quality, and share of shelf
- GPT-4o competitive analysis: generate pricing recommendations, identify threats, and predict competitor moves
- Alert delivery: route price drops, OOS events, and negative reviews to the right team via Slack or email
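Before diving into each step, here is a minimal sketch of how the finished pieces fit together. The step functions are passed in as callables so each can be swapped or stubbed for testing; the config key names are assumptions chosen to match the helpers built later in this guide, not a fixed API:

```python
import asyncio


async def run_daily_intelligence(config: dict) -> dict:
    """Run the 6-step pipeline end to end. Each callable in `config`
    corresponds to one step; results collect in a shared dict."""
    results = {}
    results["prices"] = await config["scrape_prices"](config["products"])        # Step 2
    results["alerts"] = await config["detect_anomalies"](results["prices"])      # Step 2
    results["reviews"] = await config["scrape_reviews"](config["products"])      # Step 3
    results["shelf"] = await config["track_shelf"](
        config["search_terms"], config["your_skus"]                              # Step 4
    )
    results["violations"] = await config["check_map"](config["map_products"])    # Step 5
    await config["deliver"](results["alerts"])                                   # Step 6
    return results
```

Injecting the steps this way also makes the nightly job trivial to unit-test with stubs before pointing it at live retailer pages.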
Step 1: Define Your Retail Data Models
```python
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
from enum import Enum


class RetailerName(str, Enum):
    AMAZON = "amazon"
    WALMART = "walmart"
    TARGET = "target"
    BESTBUY = "bestbuy"
    COSTCO = "costco"
    KROGER = "kroger"
    OTHER = "other"


class ProductPrice(BaseModel):
    """Competitor product pricing data."""
    sku: str
    product_name: str
    brand: str
    retailer: RetailerName
    current_price: float
    original_price: Optional[float] = None  # MSRP or list price
    sale_price: Optional[float] = None
    price_per_unit: Optional[str] = None  # e.g., "$0.15/oz"
    currency: str = "USD"
    in_stock: bool
    fulfillment: Optional[str] = None  # "shipped by amazon", "store pickup", "3P seller"
    seller_name: Optional[str] = None
    buy_box_winner: Optional[bool] = None
    coupon: Optional[str] = None  # e.g., "Save 20% with coupon"
    scraped_at: datetime
    source_url: str
    price_change_pct: Optional[float] = None  # vs. last scrape


class ProductReview(BaseModel):
    """Customer review from a retailer."""
    product_sku: str
    retailer: RetailerName
    rating: float  # 1-5
    title: str
    body: str
    author: str
    date: str  # raw date string as displayed; formats vary by retailer
    verified_purchase: bool
    helpful_votes: int
    sentiment: Optional[str] = None  # positive, negative, neutral, mixed
    key_themes: Optional[List[str]] = None  # AI-extracted: "quality", "shipping", "value"
    source_url: str


class DigitalShelfPosition(BaseModel):
    """Product ranking and visibility on retailer search."""
    product_sku: str
    retailer: RetailerName
    search_term: str
    organic_rank: Optional[int] = None
    sponsored_rank: Optional[int] = None
    page_number: int
    total_results: int
    buy_box_owned: Optional[bool] = None
    content_score: Optional[float] = None  # image count, description length, A+ content
    review_count: int
    average_rating: float
    scraped_at: datetime


class MAPViolation(BaseModel):
    """Minimum Advertised Price violation detected."""
    product_sku: str
    product_name: str
    brand: str
    map_price: float
    advertised_price: float
    violation_amount: float
    seller_name: str
    retailer: RetailerName
    source_url: str
    detected_at: datetime
```

Note the explicit `= None` defaults: in Pydantic v2, an `Optional[...]` annotation alone still makes the field required.
Step 2: Scrape Competitor Pricing at Scale
```python
import asyncio

from mantis import MantisClient

mantis = MantisClient(api_key="your-mantis-api-key")


async def scrape_product_prices(
    product_urls: List[dict],  # [{"sku": "ABC", "retailer": "amazon", "url": "..."}]
    previous_prices: Optional[dict] = None,  # sku -> last_price for change detection
) -> List[ProductPrice]:
    """
    Scrape current pricing from multiple retailer product pages.
    Mantis handles JS rendering and anti-bot for Amazon, Walmart, etc.
    """
    prices = []
    for product in product_urls:
        result = await mantis.scrape(
            url=product["url"],
            extract={
                "product_name": "string",
                "brand": "string",
                "current_price": "number",
                "original_price": "number or null",
                "sale_price": "number or null",
                "price_per_unit": "string or null",
                "in_stock": "boolean",
                "fulfillment": "string",
                "seller_name": "string or null",
                "buy_box_winner": "boolean or null",
                "coupon_text": "string or null",
            },
        )
        # Calculate price change vs. previous scrape
        prev_price = (previous_prices or {}).get(product["sku"])
        change_pct = None
        if prev_price and result.get("current_price"):
            change_pct = ((result["current_price"] - prev_price) / prev_price) * 100
        price = ProductPrice(
            sku=product["sku"],
            product_name=result.get("product_name", ""),
            brand=result.get("brand", ""),
            retailer=product["retailer"],
            current_price=result.get("current_price", 0),
            original_price=result.get("original_price"),
            sale_price=result.get("sale_price"),
            price_per_unit=result.get("price_per_unit"),
            in_stock=result.get("in_stock", False),
            fulfillment=result.get("fulfillment"),
            seller_name=result.get("seller_name"),
            buy_box_winner=result.get("buy_box_winner"),
            coupon=result.get("coupon_text"),
            scraped_at=datetime.now(),
            source_url=product["url"],
            price_change_pct=change_pct,
        )
        prices.append(price)
    return prices


# Monitor your category across major retailers
competitor_products = [
    {"sku": "COMP-001", "retailer": "amazon", "url": "https://amazon.com/dp/B0EXAMPLE1"},
    {"sku": "COMP-001", "retailer": "walmart", "url": "https://walmart.com/ip/example1"},
    {"sku": "COMP-002", "retailer": "amazon", "url": "https://amazon.com/dp/B0EXAMPLE2"},
    {"sku": "OWN-001", "retailer": "amazon", "url": "https://amazon.com/dp/B0OWNPROD1"},
    # ... hundreds of SKUs
]

prices = asyncio.run(scrape_product_prices(competitor_products))
```
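The loop above scrapes one product at a time, which is fine for dozens of SKUs but slow for hundreds. A small, self-contained pattern for bounding concurrency with `asyncio.Semaphore` follows; the cap of 5 is an illustrative assumption you should tune to stay under retailer rate limits:

```python
import asyncio
from typing import Awaitable, Callable, List


async def scrape_concurrently(
    items: List[dict],
    scrape_one: Callable[[dict], Awaitable[dict]],
    max_concurrency: int = 5,
) -> List[dict]:
    """Run scrapes concurrently with a hard cap, preserving input order.
    The cap doubles as crude rate limiting against anti-bot systems."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(item: dict) -> dict:
        async with sem:  # at most max_concurrency scrapes in flight
            return await scrape_one(item)

    return await asyncio.gather(*(bounded(i) for i in items))
```

Wrapping each `mantis.scrape` call in a function like this turns the sequential loop into a batched run without changing the downstream parsing code.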
Detecting Price Drops and Promotions
```python
import sqlite3


async def detect_price_anomalies(
    current_prices: List[ProductPrice],
    historical_db: str = "retail_intelligence.db",
) -> dict:
    """
    Compare current prices against historical data to detect
    significant changes, promotions, and pricing strategies.
    """
    conn = sqlite3.connect(historical_db)
    alerts = {"price_drops": [], "price_increases": [], "new_promotions": [], "oos_events": []}
    for price in current_prices:
        # Get 30-day average for this SKU + retailer
        avg = conn.execute("""
            SELECT AVG(current_price), MIN(current_price), MAX(current_price)
            FROM price_history
            WHERE sku = ? AND retailer = ?
              AND scraped_at > datetime('now', '-30 days')
        """, (price.sku, price.retailer)).fetchone()
        if avg and avg[0]:
            avg_price, min_price, max_price = avg
            # Significant price drop (>10% below 30-day average)
            if price.current_price < avg_price * 0.9:
                alerts["price_drops"].append({
                    "sku": price.sku,
                    "product": price.product_name,
                    "retailer": price.retailer,
                    "current": price.current_price,
                    "avg_30d": round(avg_price, 2),
                    "drop_pct": round(((avg_price - price.current_price) / avg_price) * 100, 1),
                    "url": price.source_url,
                })
            # Price increase (>5% above average)
            if price.current_price > avg_price * 1.05:
                alerts["price_increases"].append({
                    "sku": price.sku,
                    "product": price.product_name,
                    "increase_pct": round(((price.current_price - avg_price) / avg_price) * 100, 1),
                })
        # Out-of-stock detection
        if not price.in_stock:
            alerts["oos_events"].append({
                "sku": price.sku,
                "product": price.product_name,
                "retailer": price.retailer,
                "url": price.source_url,
            })
        # Store current price
        conn.execute(
            "INSERT INTO price_history (sku, retailer, current_price, in_stock, scraped_at) VALUES (?, ?, ?, ?, ?)",
            (price.sku, price.retailer, price.current_price, price.in_stock, price.scraped_at.isoformat()),
        )
    conn.commit()
    conn.close()
    return alerts
```
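`detect_price_anomalies()` assumes a `price_history` table already exists. A one-time bootstrap is all it takes; the column names below mirror the `INSERT` statement above, and the index is an optional addition to speed up the 30-day-average query:

```python
import sqlite3


def init_price_history(db_path: str = "retail_intelligence.db") -> None:
    """Create the price_history table that detect_price_anomalies()
    reads and writes. Safe to call repeatedly (IF NOT EXISTS)."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            sku TEXT NOT NULL,
            retailer TEXT NOT NULL,
            current_price REAL,
            in_stock INTEGER,
            scraped_at TEXT NOT NULL
        )
    """)
    # Speeds up the per-SKU, per-retailer 30-day lookback
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_price_sku ON price_history (sku, retailer, scraped_at)"
    )
    conn.commit()
    conn.close()
```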
Step 3: Monitor Reviews & Customer Sentiment
```python
async def scrape_product_reviews(
    product_urls: List[dict],
    max_reviews_per_product: int = 50,
    sort_by: str = "recent",
) -> List[ProductReview]:
    """
    Scrape customer reviews from retailer product pages.
    Focus on recent reviews to catch emerging issues.
    """
    reviews = []
    for product in product_urls:
        result = await mantis.scrape(
            url=product["url"],
            extract={
                "reviews": [{
                    "rating": "number (1-5)",
                    "title": "string",
                    "body": "string",
                    "author": "string",
                    "date": "string",
                    "verified_purchase": "boolean",
                    "helpful_votes": "number",
                }],
                "overall_rating": "number",
                "total_review_count": "number",
                "rating_distribution": {
                    "5_star_pct": "number",
                    "4_star_pct": "number",
                    "3_star_pct": "number",
                    "2_star_pct": "number",
                    "1_star_pct": "number",
                },
            },
        )
        for review in result.get("reviews", [])[:max_reviews_per_product]:
            r = ProductReview(
                product_sku=product["sku"],
                retailer=product["retailer"],
                rating=review.get("rating", 0),
                title=review.get("title", ""),
                body=review.get("body", ""),
                author=review.get("author", ""),
                date=review.get("date", ""),
                verified_purchase=review.get("verified_purchase", False),
                helpful_votes=review.get("helpful_votes", 0),
                source_url=product["url"],
            )
            reviews.append(r)
    return reviews


async def analyze_review_sentiment(reviews: List[ProductReview]) -> dict:
    """Use GPT-4o to analyze review sentiment and extract themes."""
    from openai import OpenAI

    client = OpenAI()
    review_texts = [
        f"Rating: {r.rating}/5 | {r.title}: {r.body[:300]}"
        for r in reviews[:50]
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """Analyze these product reviews and provide:
1. OVERALL SENTIMENT: positive/negative/mixed with confidence %
2. TOP POSITIVE THEMES: What customers love (with frequency)
3. TOP NEGATIVE THEMES: What customers complain about (with frequency)
4. TRENDING ISSUES: New problems appearing in recent reviews
5. COMPETITIVE INSIGHTS: What customers compare this product to
6. IMPROVEMENT PRIORITIES: Top 3 product/listing improvements to make
Be data-driven. Count occurrences. Prioritize actionable insights."""
        }, {
            "role": "user",
            "content": "Reviews to analyze:\n\n" + "\n---\n".join(review_texts),
        }],
        temperature=0.2,
    )
    return {
        "analysis": response.choices[0].message.content,
        "reviews_analyzed": len(reviews),
        "avg_rating": sum(r.rating for r in reviews) / len(reviews) if reviews else 0,
    }
```
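Beyond GPT-4o theme extraction, a cheap numeric check catches rating drops before anyone reads a single review. The sketch below flags a product when the recent-window average falls well below the long-run average; the 7-day window and 0.5-star threshold are illustrative assumptions, and it takes plain `{"rating", "date"}` dicts rather than the `ProductReview` model so it stays self-contained:

```python
from datetime import datetime, timedelta
from typing import List


def detect_rating_spike(
    reviews: List[dict],  # [{"rating": float, "date": datetime}, ...]
    window_days: int = 7,
    threshold: float = 0.5,
) -> dict:
    """Flag a negative-review spike: recent-window average rating more
    than `threshold` stars below the all-time average."""
    if not reviews:
        return {"spike": False, "recent_avg": None, "overall_avg": None}
    newest = max(r["date"] for r in reviews)
    cutoff = newest - timedelta(days=window_days)
    recent = [r["rating"] for r in reviews if r["date"] >= cutoff]
    overall = [r["rating"] for r in reviews]
    recent_avg = sum(recent) / len(recent) if recent else None
    overall_avg = sum(overall) / len(overall)
    spike = recent_avg is not None and (overall_avg - recent_avg) > threshold
    return {
        "spike": spike,
        "recent_avg": recent_avg,
        "overall_avg": round(overall_avg, 2),
    }
```

Run this on every scrape and only send the expensive GPT-4o analysis (or a Slack alert) when `spike` is true.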
Step 4: Digital Shelf Analytics
Where your product appears in retailer search results directly impacts sales. Monitor your digital shelf position and compare against competitors:
```python
async def track_digital_shelf(
    search_terms: List[str],
    your_skus: List[str],
    retailers: List[str] = ["amazon", "walmart", "target"],
) -> List[DigitalShelfPosition]:
    """
    Track where your products rank in retailer search results
    for key category terms.
    """
    positions = []
    retailer_search_urls = {
        "amazon": "https://www.amazon.com/s?k={}",
        "walmart": "https://www.walmart.com/search?q={}",
        "target": "https://www.target.com/s?searchTerm={}",
    }
    for retailer in retailers:
        for term in search_terms:
            url = retailer_search_urls[retailer].format(term.replace(" ", "+"))
            result = await mantis.scrape(
                url=url,
                extract={
                    "products": [{
                        "position": "number",
                        "product_name": "string",
                        "asin_or_sku": "string",
                        "price": "number",
                        "rating": "number",
                        "review_count": "number",
                        "is_sponsored": "boolean",
                        "seller": "string",
                        "url": "string",
                    }],
                    "total_results": "number",
                },
            )
            for product in result.get("products", []):
                pos = DigitalShelfPosition(
                    product_sku=product.get("asin_or_sku", ""),
                    retailer=retailer,
                    search_term=term,
                    organic_rank=product["position"] if not product.get("is_sponsored") else None,
                    sponsored_rank=product["position"] if product.get("is_sponsored") else None,
                    page_number=1,
                    total_results=result.get("total_results", 0),
                    review_count=product.get("review_count", 0),
                    average_rating=product.get("rating", 0),
                    scraped_at=datetime.now(),
                )
                positions.append(pos)
    return positions


def calculate_share_of_shelf(positions: List[DigitalShelfPosition], your_skus: List[str]) -> dict:
    """Calculate share of shelf: the % of top results that are your products."""
    results = {}
    for term in set(p.search_term for p in positions):
        term_positions = [p for p in positions if p.search_term == term]
        top_20 = term_positions[:20]
        your_count = sum(1 for p in top_20 if p.product_sku in your_skus)
        results[term] = {
            "share_of_shelf": round(your_count / len(top_20) * 100, 1) if top_20 else 0,
            "your_positions": [p.organic_rank for p in top_20 if p.product_sku in your_skus and p.organic_rank],
            "top_competitor": next(
                (p.product_sku for p in top_20 if p.product_sku not in your_skus),
                None,
            ),
        }
    return results
```
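The `content_score` field in `DigitalShelfPosition` was left unfilled above. One way to populate it is a weighted heuristic over listing attributes scraped from the product page; the weights, caps, and attribute names below are illustrative assumptions, not an industry standard:

```python
def score_listing_content(listing: dict) -> float:
    """Heuristic digital-shelf content score in [0, 100].
    Rewards image count, description length, bullet points, and A+ content."""
    score = 0.0
    score += min(listing.get("image_count", 0), 7) / 7 * 30             # up to 30 pts
    score += min(len(listing.get("description", "")), 2000) / 2000 * 30  # up to 30 pts
    score += min(listing.get("bullet_count", 0), 5) / 5 * 20             # up to 20 pts
    score += 20 if listing.get("has_aplus_content") else 0               # 20 pts
    return round(score, 1)
```

Feed the result into `content_score` when building each `DigitalShelfPosition`, and the GPT-4o briefing in Step 6 can then call out content-optimization opportunities with numbers behind them.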
Step 5: MAP Compliance Monitoring
For brands with MAP (Minimum Advertised Price) policies, automated monitoring catches unauthorized price cuts from resellers:
```python
def detect_retailer(url: str) -> RetailerName:
    """Map a product URL to a RetailerName by hostname keyword."""
    for name in RetailerName:
        if name != RetailerName.OTHER and name.value in url:
            return name
    return RetailerName.OTHER


async def monitor_map_compliance(
    products: List[dict],  # [{"sku": "X", "name": "Y", "map_price": 29.99, "urls": [...]}]
) -> List[MAPViolation]:
    """
    Check authorized and unauthorized sellers for MAP violations.
    Scans Amazon 3P sellers, eBay, Google Shopping, and DTC sites.
    """
    violations = []
    for product in products:
        for url in product["urls"]:
            result = await mantis.scrape(
                url=url,
                extract={
                    "sellers": [{
                        "seller_name": "string",
                        "price": "number",
                        "condition": "string",
                        "fulfillment": "string",
                    }],
                },
            )
            for seller in result.get("sellers", []):
                if seller["price"] < product["map_price"]:
                    violation = MAPViolation(
                        product_sku=product["sku"],
                        product_name=product["name"],
                        brand=product.get("brand", ""),
                        map_price=product["map_price"],
                        advertised_price=seller["price"],
                        violation_amount=round(product["map_price"] - seller["price"], 2),
                        seller_name=seller["seller_name"],
                        retailer=detect_retailer(url),
                        source_url=url,
                        detected_at=datetime.now(),
                    )
                    violations.append(violation)
    return violations
```
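Once violations accumulate, the useful view is per-seller rather than per-event: repeat offenders warrant escalation, while one-offs may be pricing errors. A sketch that groups violations by seller; it takes plain dicts (e.g. from `MAPViolation.model_dump()`), and the 3-violation threshold is an assumption:

```python
from collections import defaultdict
from typing import List


def summarize_map_offenders(violations: List[dict], repeat_threshold: int = 3) -> dict:
    """Group MAP violations by seller to surface repeat offenders."""
    by_seller = defaultdict(list)
    for v in violations:
        by_seller[v["seller_name"]].append(v)
    return {
        seller: {
            "violation_count": len(vs),
            "total_undercut": round(sum(v["violation_amount"] for v in vs), 2),
            "repeat_offender": len(vs) >= repeat_threshold,
            "skus": sorted({v["product_sku"] for v in vs}),
        }
        for seller, vs in by_seller.items()
    }
```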
Step 6: AI-Powered Competitive Analysis & Alerts
```python
import json

from openai import OpenAI

openai_client = OpenAI()


async def generate_retail_intelligence(
    prices: List[ProductPrice],
    reviews: List[ProductReview],
    shelf_positions: List[DigitalShelfPosition],
    violations: List[MAPViolation],
) -> dict:
    """
    Generate a comprehensive retail intelligence briefing
    combining pricing, reviews, shelf data, and MAP compliance.
    """
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a retail analytics expert. Analyze the following data
and produce an actionable intelligence briefing:
1. PRICING INTELLIGENCE
- Significant price changes (>5%)
- Pricing trends by category/retailer
- Promotional activity detected
- Pricing recommendations (match, undercut, or hold)
2. INVENTORY & AVAILABILITY
- Out-of-stock events (opportunity to capture share)
- Low stock signals
- Fulfillment changes (1P vs 3P shifts)
3. REVIEW & SENTIMENT
- Rating trend direction
- Emerging negative themes to address
- Competitive sentiment comparison
4. DIGITAL SHELF
- Ranking changes for key terms
- Share of shelf trends
- Content optimization opportunities
5. MAP COMPLIANCE
- New violations detected
- Repeat offenders
- Recommended enforcement actions
6. TOP 3 ACTIONS FOR THIS WEEK
- Prioritized, specific, actionable
Use data, not opinions. Include specific numbers."""
        }, {
            "role": "user",
            "content": f"""Pricing data: {len(prices)} SKUs tracked
{json.dumps([p.model_dump() for p in prices[:30]], default=str)}
Reviews: {len(reviews)} new reviews
{json.dumps([r.model_dump() for r in reviews[:20]], default=str)}
Shelf positions: {len(shelf_positions)} tracked
{json.dumps([s.model_dump() for s in shelf_positions[:20]], default=str)}
MAP violations: {len(violations)} detected
{json.dumps([v.model_dump() for v in violations], default=str)}"""
        }],
        temperature=0.2,
    )
    return {
        "briefing": response.choices[0].message.content,
        "generated_at": datetime.now().isoformat(),
        "data_summary": {
            "skus_tracked": len(prices),
            "reviews_analyzed": len(reviews),
            "search_terms_tracked": len(set(s.search_term for s in shelf_positions)),
            "map_violations": len(violations),
        },
    }


async def deliver_alerts(alerts: dict, slack_webhook: str):
    """Route retail alerts by urgency."""
    import httpx

    # Critical: large competitor price drops, OOS on own products
    if alerts.get("price_drops") or alerts.get("oos_events"):
        critical_msg = "🚨 *Retail Alert*\n\n"
        for drop in alerts.get("price_drops", [])[:5]:
            critical_msg += (
                f"• *{drop['product']}* on {drop['retailer']}: ${drop['current']} "
                f"(down {drop['drop_pct']}% from ${drop['avg_30d']} avg)\n"
            )
        for oos in alerts.get("oos_events", [])[:5]:
            critical_msg += f"• ⚠️ *OOS:* {oos['product']} on {oos['retailer']}\n"
        async with httpx.AsyncClient() as client:
            await client.post(slack_webhook, json={
                "text": critical_msg,
                "unfurl_links": False,
            })
```
Advanced: Cross-Retailer Price Optimization
Build a pricing recommendation engine that considers competitor pricing across all retailers simultaneously:
```python
def generate_pricing_rationale(own, competitors, recommended, strategy) -> str:
    """Placeholder: a short human-readable justification for the recommendation.
    In production this could be a GPT-4o call with the full competitive context."""
    return (f"{strategy} strategy: ${recommended:.2f} "
            f"against {len(competitors)} in-category competitors")


async def price_optimization_engine(
    own_products: List[ProductPrice],
    competitor_prices: List[ProductPrice],
    margins: dict,  # sku -> {"cost": X, "min_margin": Y}
    strategy: str = "competitive",  # "competitive", "premium", "value"
) -> dict:
    """
    Generate pricing recommendations based on competitive landscape,
    margins, and pricing strategy.
    """
    recommendations = {}
    for own in own_products:
        sku = own.sku
        # Find competing products at the same retailer
        competitors = [
            c for c in competitor_prices
            if c.retailer == own.retailer and c.sku != sku
        ]
        if not competitors:
            continue
        comp_prices = [c.current_price for c in competitors if c.in_stock]
        if not comp_prices:
            continue
        avg_comp = sum(comp_prices) / len(comp_prices)
        min_comp = min(comp_prices)
        max_comp = max(comp_prices)
        # Calculate floor based on margin requirements
        cost = margins.get(sku, {}).get("cost", 0)
        min_margin = margins.get(sku, {}).get("min_margin", 0.2)
        floor_price = cost / (1 - min_margin) if cost else 0
        # Strategy-based recommendation
        if strategy == "premium":
            target = avg_comp * 1.10  # 10% above average
        elif strategy == "value":
            target = min_comp * 0.95  # 5% below cheapest
        else:  # "competitive" (default)
            target = avg_comp * 0.97  # 3% below average
        recommended = max(target, floor_price)
        recommendations[sku] = {
            "current_price": own.current_price,
            "recommended_price": round(recommended, 2),
            "change": round(recommended - own.current_price, 2),
            "competitor_avg": round(avg_comp, 2),
            "competitor_range": f"${min_comp:.2f} - ${max_comp:.2f}",
            "competitors_in_stock": len(comp_prices),
            "margin_at_recommended": round((recommended - cost) / recommended * 100, 1) if cost else None,
            "rationale": generate_pricing_rationale(own, competitors, recommended, strategy),
        }
    return recommendations
```
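The floor calculation inside the engine deserves a worked example, because margin on selling price is easy to confuse with markup on cost. From margin = (price - cost) / price it follows that price = cost / (1 - margin):

```python
def margin_floor(cost: float, min_margin: float) -> float:
    """Lowest price that still preserves min_margin, where margin is
    measured on the selling price: (price - cost) / price."""
    if not 0 <= min_margin < 1:
        raise ValueError("min_margin must be in [0, 1)")
    return round(cost / (1 - min_margin), 2)


# A $10 cost with a 20% minimum margin needs a $12.50 floor, not $12.00:
# (12.50 - 10) / 12.50 = 0.20, while (12.00 - 10) / 12.00 is only ~0.167.
```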
Cost Comparison: AI Agents vs. Retail Intelligence Platforms
| Platform | Monthly Cost | Best For |
|---|---|---|
| Profitero | $5,000–$50,000 | Enterprise digital shelf analytics, retailer scorecards |
| Salsify | $3,000–$25,000 | Product content management + shelf analytics |
| Stackline | $5,000–$30,000 | Connected commerce analytics, market share |
| Jungle Scout | $49–$399 | Amazon-specific product research and tracking |
| Keepa / CamelCamelCamel | Free–$20 | Amazon price history (single retailer) |
| Prisync | $99–$399 | Competitor price tracking (limited scale) |
| AI Agent + Mantis | $29–$299 | Multi-retailer pricing, reviews, shelf, MAP; fully custom |
Honest caveat: Enterprise platforms like Profitero and Salsify offer pre-built retailer integrations, historical benchmarking databases, and category-level market share data that's difficult to replicate with scraping alone. Their value is strongest for brands selling through 20+ retailers at enterprise scale. For brands tracking 5–10 retailers and wanting custom intelligence tailored to their specific competitive set, an AI agent approach delivers 80–90% of the value at 5–10% of the cost.
Use Cases by Retail Segment
1. CPG Brands β Competitive Pricing & Digital Shelf
Track how your products are priced, positioned, and reviewed across Amazon, Walmart, Target, Kroger, and other key retailers. Detect when competitors launch promotions, when your Buy Box is lost, and when new negative reviews spike. Essential for brand managers and trade marketing teams managing dozens to thousands of SKUs.
2. DTC & E-commerce Brands
Monitor competitor DTC sites for pricing changes, new product launches, promotional activity, and customer review trends. Track SEO positioning for key category terms. Identify when competitors' products go out of stock, creating opportunities to capture search traffic with targeted ads.
3. Retailers & Marketplace Sellers
Automated repricing based on competitive data. Monitor MAP compliance from authorized dealers. Track category trends to inform buying decisions. Build assortment intelligence: which products are trending up across competitor catalogs?
4. Private Label & Amazon FBA Sellers
Product research at scale: track bestseller rankings, review velocity, and pricing trends across entire categories. Identify product opportunities where demand is high but competition is weak. Monitor your listings for hijackers and unauthorized sellers.
Compliance & Best Practices
- Product pricing is public data: prices displayed on retailer websites are public by nature
- Retailer ToS vary: Amazon, Walmart, and others have terms about automated access; use rate limiting and respect robots.txt
- Review scraping: publicly posted reviews are generally scrapeable; don't collect reviewer PII beyond what's displayed
- Rate limiting is essential: space requests to avoid IP blocks; Mantis handles this automatically
- MAP enforcement: MAP policies are legal agreements between brands and retailers; monitoring violations is standard business practice
- Data freshness: pricing data degrades quickly; scrape high-priority SKUs multiple times daily
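The "scrape high-priority SKUs multiple times daily" advice can be expressed as a tiered schedule: an hourly cron job asks which SKUs are due this hour. The tier names and intervals below are illustrative assumptions:

```python
from typing import List


def skus_due_for_scrape(skus: List[dict], hour: int) -> List[str]:
    """Tiered scrape scheduling: 'high' priority every 4 hours,
    'medium' twice daily, 'low' once daily (at hour 0)."""
    intervals = {"high": 4, "medium": 12, "low": 24}
    due = []
    for s in skus:
        interval = intervals.get(s.get("priority", "low"), 24)
        if hour % interval == 0:
            due.append(s["sku"])
    return due
```

Hourly jobs then scrape only what is due, keeping fast-moving SKUs fresh without burning API calls on slow movers.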
Getting Started
- Define your competitive set: which SKUs, brands, and retailers matter most to track?
- Set up Mantis API access: sign up for a free API key (100 calls/month free)
- Start with pricing: competitor price monitoring delivers the fastest ROI and is easiest to implement
- Add reviews weekly: batch review scraping weekly to track sentiment trends without excessive API usage
- Build digital shelf tracking: monitor your top 10 search terms across your top 3 retailers
- Automate alerts: route price drops >10%, OOS events, and negative review spikes to Slack