Web Scraping with JavaScript and Node.js: The Complete Guide for 2026
JavaScript isn't just for building websites anymore. It's one of the most popular languages for web scraping — and with Node.js, you can build scrapers that handle everything from static HTML to JavaScript-heavy single-page apps.
But scraping in 2026 is harder than it used to be. Anti-bot systems, CAPTCHAs, and dynamic rendering make DIY scraping a constant battle. This guide covers every approach — from simple HTML parsing to headless browsers to API-based scraping — so you can pick the right tool for your project.
Why JavaScript for Web Scraping?
- Same language as the web. You're scraping websites built with JavaScript, using JavaScript. DOM manipulation feels natural.
- Massive ecosystem. npm has libraries for everything: HTTP clients, HTML parsers, headless browsers, proxies.
- Async by default. Node.js handles concurrent requests effortlessly with async/await.
- Full-stack capability. Build your scraper and your data pipeline in the same language.
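That async advantage is easy to demonstrate. As an illustration (the helper below is hypothetical, not from any library), here's a small worker pool that runs scraping tasks with a fixed concurrency cap:

```javascript
// Illustrative helper: run `fn` over `items` with at most `limit` tasks in flight.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index; safe because JS is single-threaded
      results[i] = await fn(items[i], i);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Usage: fetch 50 pages, 5 at a time.
// const pages = await mapWithConcurrency(urls, 5, url => fetch(url).then(r => r.text()));
```

Python or Java would need a thread pool or an async framework for this; in Node, it's a dozen lines of plain language features.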
The 4 Approaches to Web Scraping in JavaScript
| Approach | Best For | Handles JS? | Speed | Complexity |
|---|---|---|---|---|
| Cheerio + Axios | Static HTML pages | ❌ | ⚡ Fast | Low |
| Puppeteer | Chrome-rendered pages | ✅ | 🐢 Slow | Medium |
| Playwright | Cross-browser, complex SPAs | ✅ | 🐢 Slow | Medium |
| WebPerception API | Production scraping at scale | ✅ | ⚡ Fast | Very Low |
Approach 1: Cheerio + Axios (Static Pages)
The lightest option. Fetch raw HTML and parse it with jQuery-like syntax.
Setup
npm install axios cheerio
Example: Scrape Product Listings
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeProducts(url) {
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);

  const products = [];
  $('.product-card').each((i, el) => {
    products.push({
      name: $(el).find('.title').text().trim(),
      price: $(el).find('.price').text().trim(),
      url: $(el).find('a').attr('href'),
    });
  });

  return products;
}

scrapeProducts('https://example.com/products')
  .then(console.log)
  .catch(console.error);
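Scraped fields arrive as raw strings, and `href` attributes are often relative. Two small cleanup helpers (hypothetical names, plain Node, no extra packages) handle the common cases:

```javascript
// Turn a scraped price string like "$1,299.99" into a number (null if unparseable).
function parsePrice(text) {
  const cleaned = String(text).replace(/[^0-9.,-]/g, '').replace(/,/g, '');
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value;
}

// Resolve a possibly-relative href against the page URL.
function absoluteUrl(href, base) {
  try {
    return new URL(href, base).toString();
  } catch {
    return null; // missing or malformed href
  }
}
```

Doing this normalization at scrape time keeps messy strings out of your data pipeline.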
When Cheerio Works
- Server-rendered HTML (most blogs, news sites, e-commerce)
- Pages where the data is in the initial HTML response
- High-speed scraping where you need thousands of pages fast
When Cheerio Fails
- Single-page apps (React, Vue, Angular) — the HTML is empty until JavaScript runs
- Pages behind login walls that require cookie management
- Sites with anti-bot protection (Cloudflare, DataDome)
Approach 2: Puppeteer (Headless Chrome)
When pages need JavaScript to render, you need a real browser. Puppeteer controls Chrome programmatically.
Setup
npm install puppeteer
Example: Scrape a JavaScript-Rendered Page
const puppeteer = require('puppeteer');

async function scrapeSPA(url) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  const data = await page.evaluate(() => {
    const items = document.querySelectorAll('.listing-item');
    return Array.from(items).map(item => ({
      title: item.querySelector('h2')?.textContent?.trim(),
      price: item.querySelector('.price')?.textContent?.trim(),
      description: item.querySelector('.desc')?.textContent?.trim(),
    }));
  });

  await browser.close();
  return data;
}
Handling Infinite Scroll
async function scrapeInfiniteScroll(url) {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  let previousHeight;
  while (true) {
    previousHeight = await page.evaluate('document.body.scrollHeight');
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    await new Promise(r => setTimeout(r, 2000));
    const newHeight = await page.evaluate('document.body.scrollHeight');
    if (newHeight === previousHeight) break;
  }

  const data = await page.evaluate(() => {
    // Extract all loaded items
    return Array.from(document.querySelectorAll('.item'))
      .map(el => el.textContent.trim());
  });

  await browser.close();
  return data;
}
Puppeteer Challenges
- Memory hungry. Each browser instance uses 100-300MB RAM.
- Slow. Launching Chrome, loading pages, waiting for JS — it adds up.
- Detection. Many sites detect Puppeteer via `navigator.webdriver` and other signals.
- Infrastructure. Running headless Chrome in production requires careful resource management.
Approach 3: Playwright (Cross-Browser)
Playwright is Microsoft's answer to Puppeteer. It supports Chromium, Firefox, and WebKit, with a more modern API.
Setup
npm install playwright
Example: Scrape with Auto-Waiting
const { chromium } = require('playwright');

async function scrapeWithPlaywright(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  await page.waitForSelector('.results-loaded');

  const results = await page.$$eval('.result-card', cards =>
    cards.map(card => ({
      title: card.querySelector('h3')?.textContent?.trim(),
      link: card.querySelector('a')?.href,
      snippet: card.querySelector('.snippet')?.textContent?.trim(),
    }))
  );

  await browser.close();
  return results;
}
Playwright vs Puppeteer
| Feature | Puppeteer | Playwright |
|---|---|---|
| Browsers | Primarily Chromium | Chromium, Firefox, WebKit |
| Auto-waiting | Manual | Built-in |
| API design | Older | Newer, more ergonomic |
| Parallelism | Page-level | Context-level (lighter) |
| Maintained by | Google | Microsoft |
Both have the same fundamental limitations: they're slow, resource-heavy, and detectable.
Approach 4: WebPerception API (Production-Grade)
All three approaches above share the same problems at scale:
- Anti-bot arms race. You're constantly updating your scraper to bypass new protections.
- Infrastructure costs. Running headless browsers in production is expensive.
- Maintenance burden. Selectors break when sites change. Someone has to fix them.
The WebPerception API eliminates all of this. One API call replaces your entire scraping infrastructure.
Setup
npm install node-fetch # or use built-in fetch in Node 18+
Example: Scrape Any Page
const response = await fetch('https://api.mantisapi.com/v1/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    render_js: true,
  }),
});

const { content, metadata } = await response.json();
// content = fully rendered HTML, ready for parsing
Example: AI-Powered Data Extraction
Skip the selectors entirely. Tell the API what you want in plain English:
const response = await fetch('https://api.mantisapi.com/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    prompt: 'Extract all products with name, price, rating, and availability',
    schema: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' },
          rating: { type: 'number' },
          availability: { type: 'string' },
        },
      },
    },
  }),
});

const { data } = await response.json();
// data = structured JSON matching your schema
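Even with a schema, it's wise to validate the response before trusting it downstream; pages and extraction output both change. A minimal defensive filter (hypothetical helper, not part of the API):

```javascript
// Keep only items that carry the fields downstream code relies on.
function validProducts(items) {
  if (!Array.isArray(items)) return [];
  return items.filter(p =>
    p && typeof p.name === 'string' && typeof p.price === 'number'
  );
}

// const products = validProducts(data);
```

Silently dropping malformed items (or logging them) beats crashing mid-pipeline on one bad record.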
Example: Take Screenshots
const response = await fetch('https://api.mantisapi.com/v1/screenshot', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com',
    full_page: true,
    format: 'png',
  }),
});

// response.buffer() is node-fetch-only; arrayBuffer() also works with built-in fetch
const screenshot = Buffer.from(await response.arrayBuffer());
Why Use an API?
| DIY Scraping | WebPerception API |
|---|---|
| Manage browser infrastructure | One HTTP call |
| Fight anti-bot systems | Handled automatically |
| Fix broken selectors | AI extracts data by intent |
| Scale = more servers | Scale = more API calls |
| Hours of maintenance/week | Zero maintenance |
Pricing
- Free: 100 calls/month (perfect for testing)
- Starter: $29/month — 5,000 calls
- Pro: $99/month — 25,000 calls
- Scale: $299/month — 100,000 calls
Start free at mantisapi.com.
Common Patterns
Handling Pagination
// DIY with Cheerio
async function scrapeAllPages(baseUrl) {
  let page = 1;
  let allResults = [];

  while (true) {
    const { data } = await axios.get(`${baseUrl}?page=${page}`);
    const $ = cheerio.load(data);
    const items = $('.item').map((i, el) => $(el).text().trim()).get();
    if (items.length === 0) break;
    allResults.push(...items);
    page++;
  }

  return allResults;
}

// With WebPerception API — just pass each URL
async function scrapeAllPagesAPI(baseUrl, totalPages) {
  const results = await Promise.all(
    Array.from({ length: totalPages }, (_, i) =>
      fetch('https://api.mantisapi.com/v1/extract', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer YOUR_API_KEY',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url: `${baseUrl}?page=${i + 1}`,
          prompt: 'Extract all product listings',
        }),
      }).then(r => r.json())
    )
  );

  return results.flatMap(r => r.data);
}
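One caveat with the `Promise.all` version: it fires every request at once, which can trip rate limits on your side or the API's. A batching helper (illustrative, not a library function) processes URLs in fixed-size groups instead:

```javascript
// Run `fn` over `items` in sequential batches of `size`.
async function inBatches(items, size, fn) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    // Each batch runs concurrently; batches themselves run one after another.
    out.push(...await Promise.all(items.slice(i, i + size).map(fn)));
  }
  return out;
}

// Usage: scrape 40 pages, 5 at a time.
// const results = await inBatches(pageUrls, 5, url => scrapePage(url));
```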
Rate Limiting
function rateLimit(fn, delayMs) {
  let nextSlot = 0; // earliest time the next call may start

  return async (...args) => {
    const now = Date.now();
    const wait = Math.max(0, nextSlot - now);
    // Reserve a slot immediately, so concurrent callers stay spaced out too.
    nextSlot = Math.max(now, nextSlot) + delayMs;
    await new Promise(r => setTimeout(r, wait));
    return fn(...args);
  };
}

const scrapePage = rateLimit(async (url) => {
  // your scraping logic
}, 1000); // at most 1 request per second
Error Handling & Retries
async function scrapeWithRetry(url, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('https://api.mantisapi.com/v1/scrape', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer YOUR_API_KEY',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ url, render_js: true }),
      });
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.json();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      await new Promise(r => setTimeout(r, 1000 * attempt)); // linear backoff: 1s, 2s, 3s
    }
  }
}
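The delay in that loop grows linearly (1s, 2s, 3s). For heavier workloads, exponential backoff with jitter spreads retries out and keeps many clients from retrying in lockstep. A sketch of the delay calculation (hypothetical helper):

```javascript
// Delay before retry `attempt` (1-based): exponential curve, capped at `maxMs`,
// with "equal jitter" — half fixed, half random.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  const exp = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  return exp / 2 + Math.random() * (exp / 2);
}

// In the retry loop:
// await new Promise(r => setTimeout(r, backoffDelay(attempt)));
```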
Which Approach Should You Use?
Choose Cheerio + Axios if:
- You're scraping static HTML pages
- Speed matters more than complexity
- You're comfortable writing CSS selectors
Choose Puppeteer/Playwright if:
- Pages require JavaScript rendering
- You need to interact with the page (click, scroll, type)
- You're scraping a small number of pages and can manage the infrastructure
Choose WebPerception API if:
- You're building a production application
- You don't want to manage browser infrastructure
- You need to handle anti-bot protection automatically
- You want AI-powered data extraction instead of brittle selectors
- You're building an AI agent that needs web perception
Building a Web Scraper for AI Agents
If you're building an AI agent that needs to read the web, WebPerception is the natural choice. Here's how to integrate it as a tool:
// LangChain-style tool definition
const webPerceptionTool = {
  name: 'web_perception',
  description: 'Fetch and extract structured data from any webpage',
  async execute({ url, query }) {
    const response = await fetch('https://api.mantisapi.com/v1/extract', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANTIS_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, prompt: query }),
    });
    return response.json();
  },
};
Your agent can now perceive any webpage — no browser infrastructure, no broken selectors, no anti-bot headaches.
Conclusion
JavaScript has excellent tools for web scraping, from lightweight HTML parsing with Cheerio to full browser automation with Puppeteer and Playwright. But in 2026, the smartest approach for production applications is to let an API handle the hard parts.
The WebPerception API gives you rendered HTML, AI-powered extraction, and screenshot capabilities — all without managing a single browser instance.
Building an AI agent? Read our guide on how to build your first AI agent and learn how WebPerception fits into the agent stack.