JavaScript is the language of the web — and Node.js makes it the language of web scraping too. Here's every major tool in the Node.js scraping ecosystem:
| Tool | Type | Best For | JS Rendering | Guide |
|---|---|---|---|---|
| Cheerio | HTML parser | Fast HTML parsing (jQuery-style) | ❌ | Full guide → |
| Puppeteer | Browser automation | Headless Chrome, screenshots | ✅ | Full guide → |
| Playwright | Browser automation | Multi-browser, modern API | ✅ | Playwright guide → |
| Axios | HTTP client | Simple HTTP requests | ❌ | — |
| node-fetch | HTTP client | Fetch API for Node.js | ❌ | — |
| Got | HTTP client | Advanced HTTP (retries, streams) | ❌ | — |
| Crawlee | Framework | Large-scale crawling | ✅ (via Puppeteer/Playwright) | — |
| Mantis API | Web scraping API | Production scraping, AI agents | ✅ | Full guide → |
Let's build a working scraper in under 20 lines using Axios to fetch pages and Cheerio to parse HTML:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeHN() {
  // 1. Fetch the page
  const { data } = await axios.get('https://news.ycombinator.com', {
    headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36' }
  });

  // 2. Parse the HTML
  const $ = cheerio.load(data);

  // 3. Extract data
  $('.titleline > a').slice(0, 10).each((i, el) => {
    console.log($(el).text(), '→', $(el).attr('href'));
  });
}

scrapeHN();
```
```bash
npm install axios cheerio
node scraper.js
```
That's it — a working scraper in 15 lines. For the complete jQuery-style API, DOM traversal, table scraping, and pagination, see our complete Cheerio guide.
Cheerio is the Node.js equivalent of Python's BeautifulSoup. It implements a subset of jQuery for fast, memory-efficient HTML parsing — no browser needed:
```javascript
const cheerio = require('cheerio');

const html = `
<div class="products">
  <div class="product">
    <h2 class="name">Widget Pro</h2>
    <span class="price">$49.99</span>
    <a href="/products/widget-pro">Details</a>
  </div>
  <div class="product">
    <h2 class="name">Gadget Max</h2>
    <span class="price">$79.99</span>
    <a href="/products/gadget-max">Details</a>
  </div>
</div>
`;

const $ = cheerio.load(html);

// CSS selectors — just like jQuery
$('.product').each((i, el) => {
  const name = $(el).find('.name').text();
  const price = $(el).find('.price').text();
  const url = $(el).find('a').attr('href');
  console.log({ name, price, url });
});

// DOM traversal
$('.product').first().next().find('.name').text(); // "Gadget Max"
$('.name').parent().attr('class'); // "product"
```
Cheerio is 10-20x faster than browser-based scraping because it only parses HTML — no DOM rendering, no JavaScript execution. Use it whenever the page content is in the raw HTML. See the complete Cheerio guide for tables, pagination, and production patterns.
When pages render content with JavaScript (React, Angular, Vue), you need a real browser. Puppeteer controls headless Chrome:
```javascript
const puppeteer = require('puppeteer');

async function scrapeSPA() {
  const browser = await puppeteer.launch({ headless: 'new' });
  const page = await browser.newPage();

  // Set a realistic User-Agent
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');

  await page.goto('https://example.com/spa-app', {
    waitUntil: 'networkidle2'
  });

  // Wait for dynamic content
  await page.waitForSelector('.product-card');

  // Extract data from the rendered page
  const products = await page.$$eval('.product-card', cards =>
    cards.map(card => ({
      name: card.querySelector('.name').textContent,
      price: card.querySelector('.price').textContent,
    }))
  );
  console.log(products);

  // Take a screenshot
  await page.screenshot({ path: 'products.png', fullPage: true });

  await browser.close();
}

scrapeSPA();
```
```bash
npm install puppeteer
node scraper.js
```
Puppeteer excels at screenshots, PDF generation, and form interaction. For stealth mode, proxy rotation, network interception, and concurrent scraping with puppeteer-cluster, see our complete Puppeteer guide.
Playwright is the newer alternative to Puppeteer, created at Microsoft by engineers from the original Puppeteer team. It supports Chromium, Firefox, and WebKit, with a more modern API:
```javascript
const { chromium } = require('playwright');

async function scrapeWithPlaywright() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Intercept network requests — register before navigating
  // so the rules apply to the initial page load
  await page.route('**/*.{png,jpg,gif}', route => route.abort());

  // Capture API responses — also register before navigating
  page.on('response', async response => {
    if (response.url().includes('/api/products')) {
      const json = await response.json();
      console.log('API data:', json);
    }
  });

  await page.goto('https://example.com/products');

  // Playwright auto-waits for elements
  const products = await page.locator('.product-card').all();
  for (const product of products) {
    const name = await product.locator('.name').textContent();
    const price = await product.locator('.price').textContent();
    console.log({ name, price });
  }

  await browser.close();
}

scrapeWithPlaywright();
```
```bash
npm install playwright
npx playwright install chromium
```
Node.js was built for concurrency. Here are the key patterns for scraping many pages at once:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

// Process URLs in batches of N
async function scrapeBatch(urls, batchSize = 5) {
  const results = [];

  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);

    const batchResults = await Promise.all(
      batch.map(async (url) => {
        try {
          const { data } = await axios.get(url, {
            headers: { 'User-Agent': 'Mozilla/5.0' },
            timeout: 10000
          });
          const $ = cheerio.load(data);
          return { url, title: $('h1').text(), status: 'ok' };
        } catch (err) {
          return { url, error: err.message, status: 'error' };
        }
      })
    );

    results.push(...batchResults);

    // Rate limit: wait 1 second between batches
    if (i + batchSize < urls.length) {
      await new Promise(r => setTimeout(r, 1000));
    }
  }

  return results;
}

// Usage
const urls = Array.from({ length: 50 }, (_, i) => `https://example.com/page/${i + 1}`);
scrapeBatch(urls, 5).then(console.log);
```
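One caveat with fixed batches: every batch waits for its slowest request before the next one starts. A worker-pool variant keeps exactly N requests in flight at all times. Here's a minimal sketch using only built-in promises — `scrapeOne` is a hypothetical stand-in for your per-URL axios/cheerio logic:

```javascript
// Minimal promise pool: keeps up to `limit` tasks running at once.
async function promisePool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function run() {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: no await between read and increment)
      results[i] = await worker(items[i], i);
    }
  }

  // Start `limit` workers that pull from the shared queue
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}

// Usage with a stand-in task — replace with real request logic
async function scrapeOne(url) {
  return { url, status: 'ok' };
}

promisePool(['https://example.com/1', 'https://example.com/2'], 2, scrapeOne)
  .then(console.log);
```

As soon as any worker finishes a URL, it pulls the next one, so throughput isn't gated by the slowest request in a batch.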
```javascript
const { Cluster } = require('puppeteer-cluster');

async function scrapeWithCluster() {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 4,
    puppeteerOptions: { headless: 'new' }
  });

  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url, { waitUntil: 'networkidle2' });
    const title = await page.title();
    console.log(`${url} → ${title}`);
  });

  for (let i = 1; i <= 50; i++) {
    cluster.queue(`https://example.com/page/${i}`);
  }

  await cluster.idle();
  await cluster.close();
}

scrapeWithCluster();
```
For large-scale structured crawling, Crawlee (by Apify) is the most complete Node.js framework:
```javascript
const { CheerioCrawler } = require('crawlee');

const crawler = new CheerioCrawler({
  maxConcurrency: 10,
  maxRequestsPerMinute: 60,

  async requestHandler({ $, request, enqueueLinks }) {
    const title = $('h1').text();
    const price = $('.price').text();
    console.log({ url: request.url, title, price });

    // Auto-discover and follow links
    await enqueueLinks({
      selector: 'a.next-page',
    });
  },
});

crawler.run(['https://example.com/products']);
```
Crawlee handles retries, request queues, data storage, proxy rotation, and both HTTP and browser-based crawling. It's what you reach for when a simple script isn't enough.
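If you're not ready for a full framework, the single most valuable of those features — automatic retries — can be approximated in a few lines. A sketch of retry with exponential backoff (the failing task below is a stand-in for a real request):

```javascript
// Retry an async function with exponential backoff: base, 2×base, 4×base, ...
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err; // out of attempts — propagate
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(r => setTimeout(r, delay));
    }
  }
}

// Usage: a task that fails twice, then succeeds
let calls = 0;
withRetry(async () => {
  calls++;
  if (calls < 3) throw new Error('transient');
  return 'ok';
}, { retries: 3, baseDelayMs: 10 }).then(console.log); // logs "ok"
```

Production frameworks layer more on top (per-status-code policies, request queues that persist across restarts), but the backoff loop is the core of it.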
The same anti-bot systems that block Python scrapers block Node.js scrapers. Here's how to stay under the radar:
```javascript
const headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/131.0.0.0 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Referer': 'https://www.google.com/',
  'Connection': 'keep-alive',
};
```
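A single static User-Agent is itself a fingerprint once you send thousands of requests. A common refinement is rotating through a small pool of real browser UA strings per request — a minimal sketch (the strings below are examples; keep yours current with real browser releases):

```javascript
// Pool of realistic User-Agent strings (examples — refresh periodically)
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/131.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/131.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/131.0.0.0 Safari/537.36',
];

// Build request headers with a randomly chosen User-Agent
function randomHeaders() {
  return {
    'User-Agent': userAgents[Math.floor(Math.random() * userAgents.length)],
    'Accept-Language': 'en-US,en;q=0.9',
  };
}

console.log(randomHeaders()['User-Agent']);
```

Pass `randomHeaders()` as the `headers` option on each request instead of reusing one object.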
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({ headless: 'new' });
  // Now passes most bot detection tests
  await browser.close();
})();
```
```javascript
const axios = require('axios');
const puppeteer = require('puppeteer');
// Note: https-proxy-agent v7+ uses a named export
const { HttpsProxyAgent } = require('https-proxy-agent');

const proxies = [
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
];

(async () => {
  const proxy = proxies[Math.floor(Math.random() * proxies.length)];
  const url = 'https://example.com';

  // With Axios
  const { data } = await axios.get(url, {
    httpsAgent: new HttpsProxyAgent(proxy)
  });

  // With Puppeteer
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy}`]
  });
  await browser.close();
})();
```
For a comprehensive deep dive into anti-blocking for all languages, see our guide to scraping without getting blocked.
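Beyond headers and proxies, request timing matters: requests fired at perfectly regular intervals look robotic. A jittered delay between requests is a simple, widely used countermeasure — a sketch:

```javascript
// Sleep for a random duration between minMs and maxMs
function randomDelay(minMs, maxMs) {
  const ms = minMs + Math.random() * (maxMs - minMs);
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Usage: pause 500–1500ms between requests
(async () => {
  const start = Date.now();
  await randomDelay(500, 1500);
  console.log(`waited ${Date.now() - start}ms`);
})();
```

Drop a `await randomDelay(...)` call into any request loop (like the batch scraper above) to replace its fixed one-second pause.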
Mantis handles proxy rotation, JavaScript rendering, and anti-blocking automatically. One API call, clean data back.
Building and maintaining scraping infrastructure is expensive. Here's the real cost:
| Component | DIY Cost (Monthly) | Mantis API |
|---|---|---|
| Proxy rotation | $50–500 | ✅ Included |
| Headless browsers | $100–300 | ✅ Included |
| CAPTCHA solving | $50–200 | ✅ Included |
| Anti-bot bypass | Engineering time | ✅ Included |
| Maintenance | Ongoing dev hours | ✅ Managed |
| Total | $200–1,000+ | From $29/mo |
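The table's totals follow directly from the component figures. As a sanity check, here's the arithmetic at the low end of each range (illustrative only, not a pricing calculator):

```javascript
// Low-end monthly DIY costs from the table above
const diy = {
  proxies: 50,   // low end of $50–500
  browsers: 100, // low end of $100–300
  captcha: 50,   // low end of $50–200
};

const diyTotal = Object.values(diy).reduce((a, b) => a + b, 0);
const apiTotal = 29; // Mantis entry plan from the table

console.log(`DIY (low end): $${diyTotal}/mo vs API: $${apiTotal}/mo`);
// DIY (low end): $200/mo vs API: $29/mo
```

And that low end excludes the two rows priced in engineering time, which usually dominate.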
Use a web scraping API when you're fighting anti-bot systems, need JavaScript rendering at scale, or don't want to own proxy and browser infrastructure. With Mantis, a scrape is a single request:
```javascript
const axios = require('axios');

const API_KEY = process.env.MANTIS_API_KEY;

(async () => {
  const response = await axios.post('https://api.mantisapi.com/v1/scrape', {
    url: 'https://example.com/products',
    render_js: true,
    extract: {
      products: '.product-card',
      fields: {
        name: '.name',
        price: '.price'
      }
    }
  }, {
    headers: { 'Authorization': `Bearer ${API_KEY}` }
  });

  console.log(response.data.products);
  // [{ name: "Widget Pro", price: "$49.99" }, ...]
})();
```
One API call replaces Puppeteer + proxies + stealth plugins + error handling. See our API comparison guide for details.
| Criteria | Axios + Cheerio | Puppeteer | Playwright | Crawlee | Mantis API |
|---|---|---|---|---|---|
| Learning curve | ⭐ Easy | ⭐⭐ Medium | ⭐⭐ Medium | ⭐⭐ Medium | ⭐ Easy |
| Speed | Very fast | Slow | Slow | Fast | Fast |
| JS rendering | ❌ | ✅ | ✅ | ✅ (plugin) | ✅ |
| Concurrency | Promise.all | puppeteer-cluster | Manual | Built-in | Built-in |
| Anti-bot bypass | Manual | Stealth plugin | Manual | Built-in | Automatic |
| Best for | Quick scripts | Screenshots, PDFs | JS-heavy sites | Large crawls | Production / AI |
The eternal debate. Here's a fair comparison:
| Factor | JavaScript / Node.js | Python |
|---|---|---|
| Browser automation | ⭐⭐⭐ (Puppeteer/Playwright were built here) | ⭐⭐ (good bindings) |
| HTML parsing | ⭐⭐ Cheerio | ⭐⭐⭐ BeautifulSoup, lxml |
| Crawling frameworks | ⭐⭐ Crawlee | ⭐⭐⭐ Scrapy (mature) |
| Async/concurrency | ⭐⭐⭐ Native event loop | ⭐⭐ asyncio (added later) |
| Data processing | ⭐ Limited | ⭐⭐⭐ pandas, NumPy |
| Community/tutorials | ⭐⭐ Growing | ⭐⭐⭐ Dominant |
| AI/agent integration | ⭐⭐ Vercel AI SDK | ⭐⭐⭐ LangChain, CrewAI |
Choose JavaScript when: Your stack is already JS, you need browser automation, or you want native async concurrency.
Choose Python when: You need Scrapy-level crawling, data science integration, or access to the larger scraping community.
Choose Mantis API when: You don't want to worry about language-specific infrastructure at all.
See our Python scraping guide for the Python side of the comparison.
Mantis WebPerception API: scraping, screenshots, and AI extraction — one API call. Works with any language.
**Is Node.js good for web scraping?**
Yes. Node.js is one of the best platforms for web scraping. Cheerio parses HTML fast, while Puppeteer and Playwright automate full browsers for JavaScript-rendered pages. Node's async-first design makes it naturally suited for concurrent scraping.
**Which Node.js library should I use for web scraping?**
For static HTML pages, use Cheerio with Axios — it's fast and lightweight. For JavaScript-rendered pages (SPAs, React, Angular), use Puppeteer or Playwright. For production workloads at scale, a web scraping API like Mantis handles everything automatically.
**Is Puppeteer or Playwright better for scraping?**
Playwright is generally better for scraping in 2026. It supports Chromium, Firefox, and WebKit, has built-in auto-waiting, better network interception, and more reliable selectors. Puppeteer is Chromium-only but has a larger community. For new projects, we recommend Playwright.
**Is Node.js or Python better for web scraping?**
Python has more scraping libraries and a larger community. JavaScript/Node.js excels at browser automation since Puppeteer and Playwright were built for it. If you already write JavaScript, Node.js is excellent. If you need large-scale crawling frameworks, Python (Scrapy) has the edge.
**How do I scrape JavaScript-rendered pages with Node.js?**
Use Puppeteer or Playwright to launch a headless browser, navigate to the page, wait for content to render, then extract the data. You can also feed the rendered HTML into Cheerio for fast parsing. Alternatively, use a web scraping API like Mantis that handles JavaScript rendering server-side.
**How much does web scraping with Node.js cost?**
Node.js libraries are free, but production scraping has hidden costs: proxy services ($50–500/month), headless browser servers ($100–300/month), CAPTCHA solving ($1–3 per 1,000), and maintenance time. A web scraping API like Mantis starts free (100 calls/month) with paid plans from $29/month for 5,000 calls.
© 2026 Mantis · Web scraping, screenshots, and AI data extraction for agents and developers.