Table of Contents
- Why Scrape Amazon Product Data?
- What Data Can You Extract?
- Method 1: Python + BeautifulSoup
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Cheerio
- Method 4: Web Scraping API (Easiest)
- Beating Amazon's Anti-Bot Detection
- Amazon PA-API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape Amazon Product Data?
Amazon is the world's largest online marketplace, with over 350 million products and 300 million active customers. That product data powers some of the most valuable business intelligence in e-commerce:
- Price monitoring – Track competitor prices in real time and adjust your pricing strategy automatically
- Market research – Discover trending products, market gaps, and demand signals before competitors
- Review analysis – Aggregate customer sentiment across thousands of reviews to inform product development
- Competitive intelligence – Monitor competitor listings, BSR rankings, and new product launches
- Dropshipping & arbitrage – Find price discrepancies between Amazon and other marketplaces
- AI agent shopping tools – Give AI assistants the ability to search, compare, and recommend products
- Investment research – Track product trends and brand performance as market indicators
Whether you're building a price tracker, a product research tool, or an AI shopping agent, scraping Amazon is a foundational data capability.
What Data Can You Extract?
Amazon product pages contain rich structured data across multiple sections:
| Data Point | Location | CSS Selector Hint |
|---|---|---|
| Product Title | Top of page | #productTitle |
| Price | Buy box | .a-price .a-offscreen |
| List Price | Buy box (strikethrough) | .basisPrice .a-offscreen |
| Rating | Below title | #acrPopover |
| Review Count | Below title | #acrCustomerReviewText |
| Images | Left gallery | #imgTagWrapperId img |
| Bullet Features | Feature section | #feature-bullets li |
| ASIN | Product details | th:contains("ASIN")+td |
| BSR (Best Seller Rank) | Product details | #SalesRank |
| Availability | Buy box | #availability |
| Seller | Buy box | #sellerProfileTriggerId |
| Category Breadcrumbs | Top of page | #wayfinding-breadcrumbs_feature_div |
Method 1: Python + BeautifulSoup
The simplest approach for scraping individual Amazon product pages. Works well for small-scale data collection and prototyping.
Install Dependencies
pip install requests beautifulsoup4 lxml
Basic Product Scraper
# amazon_scraper.py
import requests
from bs4 import BeautifulSoup
import json
import time    # used by the search scraper below
import random  # used by the search scraper below

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml",
    "Referer": "https://www.google.com/",
}

def scrape_amazon_product(url: str) -> dict:
    """Scrape product data from an Amazon product page."""
    resp = requests.get(url, headers=HEADERS, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")

    def text(selector):
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None

    # Pull the ASIN from the URL, dropping any trailing path or query string
    asin = None
    if "/dp/" in url:
        asin = url.split("/dp/")[1].split("/")[0].split("?")[0]

    return {
        "title": text("#productTitle"),
        "price": text(".a-price .a-offscreen"),
        "list_price": text(".basisPrice .a-offscreen"),
        "rating": text("#acrPopover .a-icon-alt"),
        "review_count": text("#acrCustomerReviewText"),
        "availability": text("#availability span"),
        "features": [
            li.get_text(strip=True)
            for li in soup.select("#feature-bullets li span.a-list-item")
        ],
        "images": [
            img.get("src")
            for img in soup.select("#altImages img")
            if img.get("src") and "sprite" not in img["src"]
        ],
        "asin": asin,
        "url": url,
    }

# Example usage
product = scrape_amazon_product(
    "https://www.amazon.com/dp/B0CHX3QBCH"
)
print(json.dumps(product, indent=2))
Amazon changes their HTML structure frequently. CSS selectors that work today may break tomorrow. Always test your selectors and build in error handling. For production use, consider an API-based approach that maintains selectors for you.
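One way to soften selector rot is to validate that required fields actually came back and retry before trusting a result: an empty title or price usually means a changed selector, a CAPTCHA interstitial, or a partially rendered page. A minimal sketch (function names are ours; pass the scraper above as `scrape_fn`):

```python
import time

REQUIRED_FIELDS = ("title", "price")

def missing_fields(data: dict, required=REQUIRED_FIELDS) -> list:
    """Names of required fields that came back empty or absent."""
    return [f for f in required if not data.get(f)]

def scrape_with_retries(scrape_fn, url, attempts=3, backoff=5.0):
    """Call scrape_fn(url), retrying when required fields are missing."""
    missing = list(REQUIRED_FIELDS)
    for attempt in range(1, attempts + 1):
        data = scrape_fn(url)
        missing = missing_fields(data)
        if not missing:
            return data
        time.sleep(backoff * attempt)  # linear backoff between attempts
    raise RuntimeError(
        f"fields still missing after {attempts} attempts: {missing}"
    )
```

A failed validation is also a useful monitoring signal: log it, and you'll know the day Amazon changes its markup instead of silently collecting nulls.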
Scraping Search Results
# search_scraper.py
# Extends amazon_scraper.py above: reuses requests, BeautifulSoup,
# HEADERS, time, and random from that file.

def scrape_amazon_search(keyword: str, pages: int = 3) -> list:
    """Scrape Amazon search results for a keyword."""
    products = []
    for page in range(1, pages + 1):
        url = (
            f"https://www.amazon.com/s?k={keyword.replace(' ', '+')}"
            f"&page={page}"
        )
        resp = requests.get(url, headers=HEADERS, timeout=15)
        soup = BeautifulSoup(resp.text, "lxml")
        for item in soup.select('[data-component-type="s-search-result"]'):
            title_el = item.select_one("h2 a span")
            price_el = item.select_one(".a-price .a-offscreen")
            rating_el = item.select_one(".a-icon-alt")
            reviews_el = item.select_one(
                '[aria-label*="stars"] + span'
            )
            link_el = item.select_one("h2 a")
            products.append({
                "title": title_el.text.strip() if title_el else None,
                "price": price_el.text.strip() if price_el else None,
                "rating": rating_el.text.strip() if rating_el else None,
                "reviews": reviews_el.text.strip() if reviews_el else None,
                "url": (
                    "https://www.amazon.com" + link_el["href"]
                    if link_el else None
                ),
                "asin": item.get("data-asin"),
            })
        # Random delay between pages
        time.sleep(random.uniform(3, 8))
    return products

results = scrape_amazon_search("wireless earbuds", pages=2)
print(f"Found {len(results)} products")
Method 2: Playwright (Headless Browser)
Amazon relies heavily on JavaScript for dynamic content: lazy-loaded images, price updates, variant selectors, and review widgets. Playwright renders the full page like a real browser, giving you access to all dynamic content.
Install
pip install playwright
playwright install chromium
Full-Render Amazon Scraper
# playwright_amazon.py
import asyncio
import json
from playwright.async_api import async_playwright

async def scrape_amazon_product(asin: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Block unnecessary resources for speed
        await page.route(
            "**/*.{png,jpg,jpeg,gif,svg,ico}",
            lambda route: route.abort(),
        )
        await page.route("**/ads/**", lambda route: route.abort())

        url = f"https://www.amazon.com/dp/{asin}"
        await page.goto(url, wait_until="domcontentloaded")
        await page.wait_for_timeout(2000)

        product = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            return {
                title: text('#productTitle'),
                price: text('.a-price .a-offscreen'),
                list_price: text('.basisPrice .a-offscreen'),
                rating: text('#acrPopover .a-icon-alt'),
                review_count: text('#acrCustomerReviewText'),
                availability: text('#availability span'),
                features: [...document.querySelectorAll(
                    '#feature-bullets li span.a-list-item'
                )].map(el => el.textContent.trim()).filter(Boolean),
                description: text('#productDescription p'),
                seller: text('#sellerProfileTriggerId'),
            };
        }""")

        # Extract all high-res images
        images = await page.evaluate("""() => {
            const imgs = document.querySelectorAll(
                '#altImages .a-button-thumbnail img'
            );
            return [...imgs]
                .map(img => img.src)
                .filter(src => src && !src.includes('sprite'))
                .map(src => src.replace(/\._.*_\./, '.'));
        }""")
        product["images"] = images
        product["asin"] = asin
        product["url"] = url

        await browser.close()
        return product

# Run it
data = asyncio.run(scrape_amazon_product("B0CHX3QBCH"))
print(json.dumps(data, indent=2))
Installing playwright-stealth can help you get past Amazon's bot detection. It patches common browser fingerprint checks such as navigator.webdriver, Chrome plugin arrays, and WebGL rendering differences, though no stealth plugin is foolproof against Amazon's full detection stack.
Method 3: Node.js + Cheerio
Lightweight and fast: ideal for scraping Amazon at moderate scale from a Node.js backend or serverless function.
Install
npm install cheerio node-fetch
Product Scraper
// amazon-scraper.mjs
import fetch from "node-fetch";
import * as cheerio from "cheerio";

const HEADERS = {
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
  Accept: "text/html",
};

async function scrapeProduct(asin) {
  const url = `https://www.amazon.com/dp/${asin}`;
  const resp = await fetch(url, { headers: HEADERS });
  const html = await resp.text();
  const $ = cheerio.load(html);
  const text = (sel) => $(sel).first().text().trim() || null;

  return {
    title: text("#productTitle"),
    price: text(".a-price .a-offscreen"),
    listPrice: text(".basisPrice .a-offscreen"),
    rating: text("#acrPopover .a-icon-alt"),
    reviewCount: text("#acrCustomerReviewText"),
    availability: text("#availability span"),
    features: $("#feature-bullets li span.a-list-item")
      .map((_, el) => $(el).text().trim())
      .get()
      .filter(Boolean),
    asin,
    url,
  };
}

// Batch scrape with rate limiting
async function scrapeMultiple(asins, delayMs = 5000) {
  const results = [];
  for (const asin of asins) {
    try {
      const product = await scrapeProduct(asin);
      results.push(product);
      console.log(`✓ ${product.title?.slice(0, 50)}`);
    } catch (err) {
      console.error(`✗ ${asin}: ${err.message}`);
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  return results;
}

// Usage
const products = await scrapeMultiple([
  "B0CHX3QBCH",
  "B0BSHF7WHW",
  "B09V3KXJPB",
]);
console.log(JSON.stringify(products, null, 2));
Method 4: Web Scraping API (Easiest)
The most reliable approach for production. A web scraping API handles proxies, CAPTCHAs, browser rendering, and selector maintenance; you just send a URL and get structured data back.
Using the Mantis API
# One API call – structured Amazon data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.amazon.com/dp/B0CHX3QBCH",
        "extract": {
            "title": "product title",
            "price": "current price",
            "original_price": "list/original price",
            "rating": "star rating",
            "review_count": "number of reviews",
            "features": "bullet point features (array)",
            "availability": "in stock status",
            "seller": "seller name",
            "images": "product image URLs (array)",
            "description": "product description",
        },
        "render_js": True,
    },
)
product = resp.json()
print(product)
Skip the Proxy Headaches
Mantis handles Amazon's anti-bot detection, proxy rotation, CAPTCHA solving, and JavaScript rendering, so you don't have to.
Node.js with Mantis
// mantis-amazon.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.amazon.com/dp/B0CHX3QBCH",
    extract: {
      title: "product title",
      price: "current price",
      rating: "star rating out of 5",
      review_count: "total number of reviews",
      features: "key product features (array)",
    },
    render_js: true,
  }),
});

const product = await resp.json();
console.log(product);
Beating Amazon's Anti-Bot Detection
Amazon has some of the most aggressive anti-scraping measures on the web. Here's what you're up against and how to handle it:
Amazon's Defense Layers
| Defense | What It Does | Countermeasure |
|---|---|---|
| IP Rate Limiting | Blocks IPs making too many requests | Rotating residential proxies |
| CAPTCHA Challenges | Serves CAPTCHA on suspicious requests | CAPTCHA solving services or API |
| Browser Fingerprinting | Detects headless browsers via JS | Stealth plugins, real browser profiles |
| Behavioral Analysis | Detects non-human browsing patterns | Random delays, scroll simulation |
| Session Tracking | Correlates requests across sessions | Fresh sessions, cookie rotation |
| Dynamic Selectors | Changes CSS class names periodically | Semantic selectors, AI extraction |
Essential Anti-Detection Techniques
# anti_detection.py
import random
import time

import requests

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) "
    "Gecko/20100101 Firefox/126.0",
]

def get_session():
    """Create a requests session with random proxy and UA."""
    session = requests.Session()
    session.proxies = {
        "http": random.choice(PROXY_POOL),
        "https": random.choice(PROXY_POOL),
    }
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept": "text/html,application/xhtml+xml",
    })
    return session

def polite_delay():
    """Random delay to mimic human browsing."""
    time.sleep(random.uniform(4, 12))

def handle_captcha(response):
    """Detect Amazon CAPTCHA/block pages."""
    if "captcha" in response.text.lower() or response.status_code == 503:
        print("⚠️ CAPTCHA detected – rotating proxy")
        return True
    return False
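These helpers compose into a simple fetch loop: make a request, check for a block, and if blocked, back off and come back with a fresh identity. A sketch with the request and block check injected as callables, so it slots in over `get_session().get` and `handle_captcha` above and stays easy to test (the function name is ours):

```python
import random
import time

def fetch_with_rotation(fetch, url, is_blocked, max_attempts=4,
                        base_delay=2.0, jitter=2.0):
    """Fetch url, rotating identity when a block is detected.

    fetch(url) performs one request with a fresh session/proxy
    (e.g. lambda u: get_session().get(u, timeout=15));
    is_blocked(resp) returns True on a CAPTCHA or block page
    (e.g. handle_captcha).
    """
    for attempt in range(1, max_attempts + 1):
        resp = fetch(url)
        if not is_blocked(resp):
            return resp
        # Blocked: exponential backoff with jitter, then retry.
        # fetch() builds a new session each call, so the retry
        # arrives from a different proxy/UA combination.
        time.sleep(min(60, base_delay ** attempt) + random.uniform(0, jitter))
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")
```

Capping the backoff at 60 seconds keeps a long run from stalling indefinitely while still giving Amazon's rate limiter time to cool off.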
Amazon PA-API vs Scraping
Amazon offers the Product Advertising API (PA-API 5.0) as an official data source. Here's how it compares:
| Feature | PA-API 5.0 | Web Scraping | Mantis API |
|---|---|---|---|
| Setup Difficulty | Medium (Associates account required) | High (proxies, CAPTCHAs, selectors) | Low (API key) |
| Rate Limit | 1 req/sec (scales with sales) | Depends on proxy pool | Based on plan (up to 100K/mo) |
| Data Coverage | Basic product info, prices, images | Everything visible on the page | Everything visible on the page |
| Reviews | Rating + count only | Full review text + individual ratings | Full review text + individual ratings |
| Q&A Content | Not available | Full Q&A text | Full Q&A text |
| Seller Details | Limited | Full seller info | Full seller info |
| BSR History | Current rank only | Current rank (track over time) | Current rank (track over time) |
| Reliability | Very high (official) | Breaks when Amazon changes HTML | High (maintained selectors) |
| Cost | Free (requires qualifying sales) | Proxy costs ($50-500+/mo) | $0-299/mo |
| Legal Risk | None (authorized) | ToS violation risk | API handles compliance |
Use PA-API when you're an Amazon affiliate and need basic product data (prices, images, ratings). Use scraping or the Mantis API when you need full review text, Q&A, seller data, BSR tracking, or anything else PA-API doesn't expose.
Method Comparison
| Criteria | Python + BS4 | Playwright | Node.js + Cheerio | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 5 min | 2 min |
| JS Rendering | ❌ | ✅ | ❌ | ✅ |
| Anti-Detection | Basic | Good (with stealth) | Basic | Built-in |
| Speed | Fast | Slow (browser overhead) | Fast | Medium |
| Maintenance | High (selectors break) | High (selectors break) | High (selectors break) | None |
| Scale | Low-Medium | Low | Medium | High |
| Cost (10K pages/mo) | $50-200 (proxies) | $100-300 (proxies + compute) | $50-200 (proxies) | $99 (Pro plan) |
| Best For | Prototyping | Dynamic content | Serverless / APIs | Production |
Real-World Use Cases
1. Price Tracker Bot
Monitor product prices and alert when they drop below a threshold: perfect for deal sites, purchasing agents, or personal shopping bots.
# price_tracker.py
import requests
import json
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

WATCHLIST = [
    {"asin": "B0CHX3QBCH", "target_price": 249.99},
    {"asin": "B0BSHF7WHW", "target_price": 89.99},
    {"asin": "B09V3KXJPB", "target_price": 349.00},
]

def check_prices():
    alerts = []
    for item in WATCHLIST:
        resp = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={
                "Authorization": f"Bearer {MANTIS_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "url": f"https://www.amazon.com/dp/{item['asin']}",
                "extract": {
                    "title": "product title",
                    "price": "current price as number",
                },
                "render_js": True,
            },
        )
        data = resp.json()
        price = float(
            str(data.get("price", "0")).replace("$", "").replace(",", "")
        )

        # Log price history
        with open(f"prices_{item['asin']}.jsonl", "a") as f:
            f.write(json.dumps({
                "asin": item["asin"],
                "price": price,
                "timestamp": datetime.utcnow().isoformat(),
            }) + "\n")

        if price <= item["target_price"] and price > 0:
            alerts.append({
                "title": data.get("title", item["asin"]),
                "price": price,
                "target": item["target_price"],
                "url": f"https://www.amazon.com/dp/{item['asin']}",
            })
    return alerts

# Run on a schedule (cron, Lambda, etc.)
alerts = check_prices()
for alert in alerts:
    print(f"🚨 PRICE DROP: {alert['title']}")
    print(f"   ${alert['price']} (target: ${alert['target']})")
    print(f"   {alert['url']}\n")
2. Product Comparison Engine
Build a comparison tool that aggregates data across multiple products: useful for review sites, affiliate content, or internal procurement tools.
# comparison_engine.py
import requests
import json

MANTIS_KEY = "YOUR_API_KEY"

def compare_products(asins: list) -> dict:
    """Compare multiple Amazon products side by side."""
    products = []
    for asin in asins:
        resp = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={
                "Authorization": f"Bearer {MANTIS_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "url": f"https://www.amazon.com/dp/{asin}",
                "extract": {
                    "title": "product title",
                    "price": "current price",
                    "rating": "star rating as number",
                    "review_count": "number of reviews as integer",
                    "features": "top 5 key features (array)",
                    "availability": "stock status",
                },
                "render_js": True,
            },
        )
        data = resp.json()
        data["asin"] = asin
        products.append(data)

    # Rank by value (rating * reviews / price)
    for p in products:
        try:
            price = float(
                str(p.get("price", "0")).replace("$", "").replace(",", "")
            )
            rating = float(p.get("rating", "0"))
            reviews = int(
                str(p.get("review_count", "0")).replace(",", "")
            )
            p["value_score"] = round(
                (rating * reviews) / max(price, 1), 2
            )
        except (ValueError, TypeError):
            p["value_score"] = 0

    products.sort(key=lambda x: x["value_score"], reverse=True)
    return {"comparison": products, "winner": products[0]["asin"]}

result = compare_products([
    "B0CHX3QBCH", "B0BSHF7WHW", "B09V3KXJPB"
])
print(json.dumps(result, indent=2))
3. AI Agent Shopping Assistant
Give an AI agent the ability to search Amazon and recommend products: a core building block for e-commerce AI assistants.
# agent_shopping.py – LangChain tool for Amazon search
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def search_amazon(query: str) -> str:
    """Search Amazon for products and return top results
    with prices, ratings, and links."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                f"https://www.amazon.com/s?k="
                f"{query.replace(' ', '+')}"
            ),
            "extract": {
                "products": (
                    "array of top 5 products with: "
                    "title, price, rating, review_count, url, asin"
                ),
            },
            "render_js": True,
        },
    )
    data = resp.json()
    products = data.get("products", [])
    if not products:
        return f"No products found for '{query}'"

    result = f"Top Amazon results for '{query}':\n\n"
    for i, p in enumerate(products, 1):
        result += (
            f"{i}. {p.get('title', 'N/A')}\n"
            f"   Price: {p.get('price', 'N/A')} | "
            f"Rating: {p.get('rating', 'N/A')} "
            f"({p.get('review_count', '0')} reviews)\n"
            f"   https://www.amazon.com/dp/"
            f"{p.get('asin', '')}\n\n"
        )
    return result

# Use in a LangChain agent
# agent = create_agent(tools=[search_amazon], ...)
Legal Considerations
Amazon scraping exists in a legal gray area. Key precedents and considerations:
- Van Buren v. United States (2021) – The Supreme Court narrowed the CFAA, ruling that accessing publicly available data (even against ToS) isn't "exceeding authorized access" under federal law
- hiQ Labs v. LinkedIn (2022) – The Ninth Circuit ruled that scraping publicly accessible data doesn't violate the CFAA, strengthening the case for scraping public product listings
- Amazon's Terms of Service – Explicitly prohibit scraping and automated access. Violating ToS is a contract issue, not criminal, but Amazon can pursue civil action
- Robots.txt – Amazon's robots.txt disallows many paths. While not legally binding, respecting it demonstrates good faith
- GDPR/CCPA – Product data is generally not personal data, but review scraping may involve personal information (reviewer names, profiles)
- Rate & Volume – Excessive scraping that degrades service could be considered tortious interference or trespass to chattels
Only scrape publicly available data. Respect rate limits. Don't circumvent explicit access controls. Don't scrape personal data. Consult legal counsel for commercial use cases. Consider using Amazon's official PA-API for basic product data, and a web scraping API for data PA-API doesn't cover.
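"Respect rate limits" can be enforced mechanically rather than left to discipline. A minimal token-bucket limiter sketch (the class name is ours, not from any library):

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per `per` seconds, absorbing short bursts."""

    def __init__(self, rate: int, per: float):
        self.capacity = rate
        self.tokens = float(rate)
        self.per = per
        self.updated = time.monotonic()

    def acquire(self) -> None:
        """Block until a request token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time
            refill = (now - self.updated) * self.capacity / self.per
            self.tokens = min(self.capacity, self.tokens + refill)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to accrue
            time.sleep((1 - self.tokens) * self.per / self.capacity)

# e.g. one request every two seconds on average
limiter = TokenBucket(rate=1, per=2.0)
```

Call `limiter.acquire()` before each request: an initial burst up to `rate` goes through immediately, after which the loop throttles to the sustained rate regardless of how fast the surrounding code runs.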
Production-Ready Amazon Scraping
Stop fighting proxies, CAPTCHAs, and broken selectors. Mantis extracts structured Amazon data with a single API call.
Frequently Asked Questions
Is it legal to scrape Amazon product data?
Scraping publicly available Amazon product pages is in a legal gray area. The Van Buren v. United States (2021) decision narrowed the CFAA, and hiQ v. LinkedIn affirmed that scraping public data isn't a federal crime. However, Amazon's ToS prohibit automated access. For commercial use, consider an API-based approach.
How do I scrape Amazon without getting blocked?
Use rotating residential proxies, randomize delays (3-15 seconds), rotate User-Agent strings, handle CAPTCHAs, and use headless browsers with stealth plugins. Or use a web scraping API like Mantis that handles all anti-blocking automatically.
What Python library is best for scraping Amazon?
For prototyping: requests + BeautifulSoup. For JS-rendered content: Playwright with stealth plugins. For production: a web scraping API that maintains selectors and handles anti-detection.
Can I use Amazon's official API instead of scraping?
Yes โ the PA-API 5.0 provides basic product data (prices, images, ratings) but requires an Associates account with qualifying sales and doesn't include full reviews, Q&A, or seller details.
What data can I extract from Amazon product pages?
Product title, price, rating, review count, images, bullet features, description, ASIN, BSR, category, seller info, availability, Q&A content, and individual reviews with ratings.
How many Amazon pages can I scrape per day?
Without proxies: 20-50 before blocking. With rotating residential proxies: 5,000-10,000. With Mantis API: up to 100,000/month on the Scale plan.