Web Scraping with Cheerio and Node.js in 2026: The Complete Guide
Cheerio is the go-to HTML parsing library for Node.js developers — fast, lightweight, and built around the jQuery API you already know. If you're coming from a JavaScript background, Cheerio is the most natural way to scrape and extract data from web pages.
Think of Cheerio as jQuery for the server. It parses HTML into a traversable data structure and gives you a familiar $() API to select elements, extract text, and manipulate the DOM — all without running a browser. This guide takes you from zero to production-ready scraper.
Table of Contents
- Installation & Setup
- Your First Cheerio Scraper
- CSS Selectors: The Core of Cheerio
- DOM Traversal: parent, children, siblings
- Extracting Text, Attributes, and HTML
- Scraping Tables and Lists
- Handling Pagination
- Concurrent Scraping with Promise.all
- Error Handling and Retries
- Production-Ready Scraper Class
- When Pages Need JavaScript
- Cheerio vs Puppeteer vs Playwright vs API
- The API Shortcut: Skip the Parsing
- FAQ
1. Installation & Setup
Initialize a Node.js project and install Cheerio with a modern HTTP client:
mkdir my-scraper && cd my-scraper
npm init -y
npm install cheerio axios
We'll use axios for HTTP requests and cheerio for HTML parsing. You can also use node-fetch or the built-in fetch (Node 18+).
Verify your installation:
import * as cheerio from 'cheerio';
const html = '<h1>Hello, Cheerio!</h1>';
const $ = cheerio.load(html);
console.log($('h1').text()); // "Hello, Cheerio!"
This guide uses ESM import syntax, so add "type": "module" to your package.json (or name your files .mjs). In CommonJS projects, const cheerio = require('cheerio') also works with current Cheerio releases, or you can use a dynamic import().
2. Your First Cheerio Scraper
Let's scrape article titles from Hacker News — a classic first target:
import * as cheerio from 'cheerio';
import axios from 'axios';
async function scrapeHackerNews() {
// 1. Fetch the HTML
const { data: html } = await axios.get('https://news.ycombinator.com', {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
});
// 2. Load into Cheerio
const $ = cheerio.load(html);
// 3. Extract data
const stories = [];
$('.titleline > a').each((i, el) => {
stories.push({
rank: i + 1,
title: $(el).text(),
url: $(el).attr('href')
});
});
console.log(`Found ${stories.length} stories`);
stories.slice(0, 5).forEach(s =>
console.log(`${s.rank}. ${s.title}`)
);
return stories;
}
scrapeHackerNews();
That's the core pattern: fetch HTML → load into Cheerio → select elements → extract data. Everything else builds on this.
3. CSS Selectors: The Core of Cheerio
Cheerio supports all standard CSS selectors — the same ones you use in browser DevTools and jQuery:
Basic Selectors
// By tag
$('h1') // All h1 elements
$('p') // All paragraphs
// By class
$('.product-card') // Elements with class "product-card"
$('.price.sale') // Elements with BOTH classes
// By ID
$('#main-content') // Element with id "main-content"
// By attribute
$('a[href]') // All links with href attribute
$('img[alt="logo"]') // Images with alt="logo"
$('a[href^="https"]') // Links starting with "https"
$('a[href$=".pdf"]') // Links ending with ".pdf"
$('a[href*="mantis"]') // Links containing "mantis"
Combinators
// Descendant (any depth)
$('div.products .price') // .price anywhere inside div.products
// Direct child
$('ul > li') // Only direct li children of ul
// Adjacent sibling
$('h2 + p') // First p immediately after h2
// General sibling
$('h2 ~ p') // All p siblings after h2
Pseudo-selectors
$('tr:first-child') // First tr in each group
$('tr:last-child') // Last tr
$('li:nth-child(2)') // Second li in each group
$('li:nth-child(odd)') // Odd-numbered list items
$('td:not(.hidden)') // td without class "hidden"
$('p:contains("price")') // p elements containing "price" text
Tip: test any selector in your browser's DevTools console with document.querySelectorAll('.your-selector') to confirm it matches before writing scraper code.
4. DOM Traversal: parent, children, siblings
Sometimes CSS selectors alone aren't enough. Cheerio gives you jQuery-style traversal methods:
// Parent & ancestors
$('.price').parent() // Direct parent
$('.price').closest('.product-card') // Nearest ancestor matching selector
$('.price').parents('div') // All div ancestors
// Children
$('.product-card').children() // All direct children
$('.product-card').children('.title') // Direct children matching selector
$('.product-card').find('.price') // Descendants (any depth) matching selector
// Siblings
$('.active').next() // Next sibling
$('.active').prev() // Previous sibling
$('.active').nextAll('li') // All following li siblings
$('.active').siblings() // All siblings
Chaining Methods
// Chain traversals like jQuery
const prices = $('table.products')
.find('tbody tr')
.not('.out-of-stock')
.find('td.price')
.map((i, el) => $(el).text().trim())
.get(); // .get() converts Cheerio object to regular array
5. Extracting Text, Attributes, and HTML
Text Extraction
// Get text content
$('h1').text() // Text of first match
$('.description').text() // All text, concatenated
$('.description').first().text() // Explicitly first element
// Trim whitespace (common need)
$('.price').text().trim()
// Get text from multiple elements
const titles = $('h2').map((i, el) => $(el).text().trim()).get();
console.log(titles); // ['Title 1', 'Title 2', ...]
Attribute Extraction
// Get attributes
$('a').attr('href') // href of first link
$('img').attr('src') // src of first image
$('img').attr('alt') // alt text
$('input').attr('value') // input value
// Get all links
const links = $('a[href]').map((i, el) => ({
text: $(el).text().trim(),
url: $(el).attr('href')
})).get();
// Get data attributes
$('.product').attr('data-id')
$('.product').data('id') // Shorthand for data-* attributes
HTML Extraction
// Get inner HTML
$('.content').html() // Inner HTML of first match
// Get outer HTML (the element itself + its content)
$.html($('.content')) // Outer HTML
// Get all HTML
$.html() // Entire parsed document
6. Scraping Tables and Lists
Tables are one of the most common scraping targets. Here's how to extract tabular data:
function scrapeTable($, tableSelector) {
const headers = [];
const rows = [];
// Extract headers
$(`${tableSelector} thead th`).each((i, el) => {
headers.push($(el).text().trim());
});
// Extract rows
$(`${tableSelector} tbody tr`).each((i, tr) => {
const row = {};
$(tr).find('td').each((j, td) => {
row[headers[j] || `col_${j}`] = $(td).text().trim();
});
rows.push(row);
});
return { headers, rows };
}
// Usage
const { headers, rows } = scrapeTable($, 'table.pricing');
console.log(headers); // ['Plan', 'Price', 'API Calls']
console.log(rows); // [{Plan: 'Free', Price: '$0', ...}, ...]
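The heart of scrapeTable is zipping header names with cell values. Isolated from Cheerio, with already-extracted cell arrays standing in for parsed td text, the mapping looks like this:

```javascript
// Pair each cell with its header; fall back to col_N when a row
// has more cells than the table has headers
function zipRow(headers, cells) {
  const row = {};
  cells.forEach((cell, j) => {
    row[headers[j] || `col_${j}`] = cell;
  });
  return row;
}

const headers = ['Plan', 'Price'];
console.log(zipRow(headers, ['Free', '$0']));
// { Plan: 'Free', Price: '$0' }
console.log(zipRow(headers, ['Pro', '$49', 'extra']));
// { Plan: 'Pro', Price: '$49', col_2: 'extra' }
```

The col_N fallback matters on real sites, where colspan tricks and malformed markup often leave rows and headers misaligned.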
Scraping Nested Lists
function scrapeList($, list) {
// `list` is a ul/ol element or selector; returns nested { text, children } items
return $(list).children('li').map((i, li) => {
const $li = $(li);
const nested = $li.children('ul, ol');
// The item's own text, with any nested list's text removed
const text = $li.clone().children('ul, ol').remove().end().text().trim();
return nested.length
? { text, children: scrapeList($, nested.first()) }
: { text };
}).get();
}
// Usage: scrapeList($, 'ul.menu')
7. Handling Pagination
Most websites spread data across multiple pages. Here's how to handle common pagination patterns:
Next-Page Links
async function scrapeAllPages(startUrl) {
const allItems = [];
let url = startUrl;
while (url) {
console.log(`Scraping: ${url}`);
const { data: html } = await axios.get(url, {
headers: { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' }
});
const $ = cheerio.load(html);
// Extract items from current page
$('.product-card').each((i, el) => {
allItems.push({
title: $(el).find('.title').text().trim(),
price: $(el).find('.price').text().trim(),
url: $(el).find('a').attr('href')
});
});
// Find next page link
const nextLink = $('a.next-page').attr('href');
url = nextLink ? new URL(nextLink, url).href : null;
// Be polite — wait between requests
if (url) await sleep(1000);
}
console.log(`Total items: ${allItems.length}`);
return allItems;
}
const sleep = (ms) => new Promise(r => setTimeout(r, ms));
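One detail worth calling out in the loop above: next-page links are often relative (for example /products?page=2), which is why the code resolves them against the current page with the WHATWG URL constructor:

```javascript
// new URL(relative, base) resolves relative hrefs the way a browser would
const base = 'https://example.com/products?page=1';

console.log(new URL('/products?page=2', base).href);
// https://example.com/products?page=2
console.log(new URL('?page=3', base).href);
// https://example.com/products?page=3
console.log(new URL('https://other.example/next', base).href);
// https://other.example/next (absolute URLs pass through untouched)
```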
Page Number Pagination
async function scrapePages(baseUrl, totalPages) {
const allItems = [];
for (let page = 1; page <= totalPages; page++) {
const url = `${baseUrl}?page=${page}`;
console.log(`Page ${page}/${totalPages}: ${url}`);
const { data: html } = await axios.get(url);
const $ = cheerio.load(html);
$('.item').each((i, el) => {
allItems.push({
name: $(el).find('.name').text().trim(),
price: $(el).find('.price').text().trim()
});
});
await sleep(1000 + Math.random() * 1000); // Random delay
}
return allItems;
}
8. Concurrent Scraping with Promise.all
Node.js excels at concurrent I/O. Scrape multiple pages simultaneously while respecting rate limits:
async function scrapeConcurrent(urls, concurrency = 5) {
const results = [];
// Process in batches
for (let i = 0; i < urls.length; i += concurrency) {
const batch = urls.slice(i, i + concurrency);
console.log(`Batch ${Math.floor(i / concurrency) + 1}: ${batch.length} URLs`);
const batchResults = await Promise.allSettled(
batch.map(async (url) => {
const { data: html } = await axios.get(url, {
headers: { 'User-Agent': 'Mozilla/5.0' },
timeout: 10000
});
const $ = cheerio.load(html);
return {
url,
title: $('h1').text().trim(),
description: $('meta[name="description"]').attr('content') || ''
};
})
);
// Collect successful results
for (const result of batchResults) {
if (result.status === 'fulfilled') {
results.push(result.value);
} else {
console.error(`Failed: ${result.reason.message}`);
}
}
// Delay between batches
if (i + concurrency < urls.length) {
await sleep(2000);
}
}
return results;
}
// Usage
const urls = Array.from({ length: 50 }, (_, i) =>
`https://example.com/products?page=${i + 1}`
);
const data = await scrapeConcurrent(urls, 5);
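One tradeoff with batching: each batch waits for its slowest request before the next batch starts. An alternative is a small worker pool that keeps exactly N tasks in flight at all times. Here's a minimal sketch; the setTimeout tasks are placeholders for your per-URL scrape functions:

```javascript
// Run async task functions with at most `concurrency` in flight at once
async function runPool(tasks, concurrency) {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker claims the next unstarted task until none remain
  const worker = async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(concurrency, tasks.length) }, worker)
  );
  return results;
}

// Demo with fake "requests"
const tasks = [1, 2, 3, 4, 5].map(
  (n) => () => new Promise((r) => setTimeout(() => r(n * 10), 10))
);

(async () => {
  console.log(await runPool(tasks, 2)); // [10, 20, 30, 40, 50]
})();
```

Libraries like p-limit implement the same idea with more polish, but the pattern itself fits in a dozen lines.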
9. Error Handling and Retries
Production scrapers need robust error handling. Network failures, rate limits, and changed HTML are inevitable:
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
const { delay = 1000, backoffMultiplier = 2 } = options;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const response = await axios.get(url, {
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept': 'text/html,application/xhtml+xml',
'Accept-Language': 'en-US,en;q=0.9'
},
timeout: 15000,
validateStatus: (status) => status < 500 // Don't retry on 4xx
});
if (response.status === 429) {
// Rate limited — back off significantly
const retryAfter = parseInt(response.headers['retry-after'] || '60');
console.log(`Rate limited. Waiting ${retryAfter}s...`);
await sleep(retryAfter * 1000);
continue;
}
return response;
} catch (error) {
console.error(`Attempt ${attempt}/${maxRetries} failed: ${error.message}`);
if (attempt === maxRetries) throw error;
await sleep(delay * Math.pow(backoffMultiplier, attempt - 1));
}
}
// Only reached if every attempt was rate limited (429)
throw new Error(`Request failed after ${maxRetries} attempts: ${url}`);
}
// Usage
const response = await fetchWithRetry('https://example.com/data');
const $ = cheerio.load(response.data);
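With the defaults above (delay = 1000, backoffMultiplier = 2), the wait before retry n is delay * multiplier^(n-1). Laid out as a helper, purely for illustration:

```javascript
// Exponential backoff schedule: the wait doubles after each failed attempt
function backoffDelays(baseMs, multiplier, maxRetries) {
  return Array.from(
    { length: maxRetries - 1 }, // no wait after the final attempt
    (_, i) => baseMs * Math.pow(multiplier, i)
  );
}

console.log(backoffDelays(1000, 2, 3)); // [1000, 2000]
console.log(backoffDelays(500, 3, 4));  // [500, 1500, 4500]
```

Adding random jitter on top of these delays (as the pagination example does) helps avoid retry storms when many workers fail at once.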
10. Production-Ready Scraper Class
Here's a complete, reusable scraper class with logging, retries, concurrency control, and data export:
import * as cheerio from 'cheerio';
import axios from 'axios';
import { writeFile } from 'fs/promises';
class CheerioScraper {
constructor(options = {}) {
this.concurrency = options.concurrency || 5;
this.delay = options.delay || 1000;
this.maxRetries = options.maxRetries || 3;
this.timeout = options.timeout || 15000;
this.results = [];
this.errors = [];
this.userAgents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0'
];
}
randomUA() {
return this.userAgents[Math.floor(Math.random() * this.userAgents.length)];
}
async fetch(url) {
for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
try {
const { data } = await axios.get(url, {
headers: {
'User-Agent': this.randomUA(),
'Accept': 'text/html,application/xhtml+xml',
'Accept-Language': 'en-US,en;q=0.9'
},
timeout: this.timeout
});
return cheerio.load(data);
} catch (err) {
if (attempt === this.maxRetries) {
this.errors.push({ url, error: err.message });
return null;
}
await this.sleep(this.delay * Math.pow(2, attempt - 1));
}
}
}
async scrapeUrls(urls, extractor) {
for (let i = 0; i < urls.length; i += this.concurrency) {
const batch = urls.slice(i, i + this.concurrency);
const batchNum = Math.floor(i / this.concurrency) + 1;
const totalBatches = Math.ceil(urls.length / this.concurrency);
console.log(`Batch ${batchNum}/${totalBatches} (${batch.length} URLs)`);
const promises = batch.map(async (url) => {
const $ = await this.fetch(url);
if ($) {
const data = extractor($, url);
// Extractors may return one record or an array of records
if (Array.isArray(data)) this.results.push(...data);
else if (data) this.results.push(data);
}
});
await Promise.allSettled(promises);
if (i + this.concurrency < urls.length) {
await this.sleep(this.delay + Math.random() * 1000);
}
}
console.log(`Done: ${this.results.length} results, ${this.errors.length} errors`);
return this.results;
}
async exportJSON(filename) {
await writeFile(filename, JSON.stringify(this.results, null, 2));
console.log(`Exported ${this.results.length} items to ${filename}`);
}
async exportCSV(filename) {
if (!this.results.length) return;
const headers = Object.keys(this.results[0]);
const csv = [
headers.join(','),
...this.results.map(row =>
headers.map(h => `"${String(row[h] || '').replace(/"/g, '""')}"`).join(',')
)
].join('\n');
await writeFile(filename, csv);
console.log(`Exported ${this.results.length} items to ${filename}`);
}
sleep(ms) {
return new Promise(r => setTimeout(r, ms));
}
}
// Usage example: scrape product listings
const scraper = new CheerioScraper({ concurrency: 3, delay: 1500 });
const urls = Array.from({ length: 20 }, (_, i) =>
`https://example.com/products?page=${i + 1}`
);
await scraper.scrapeUrls(urls, ($, url) => {
const products = [];
$('.product-card').each((i, el) => {
products.push({
name: $(el).find('.name').text().trim(),
price: $(el).find('.price').text().trim(),
rating: $(el).find('.rating').attr('data-score'),
url: $(el).find('a').attr('href'),
source: url
});
});
return products;
});
await scraper.exportJSON('products.json');
await scraper.exportCSV('products.csv');
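The quoting in exportCSV follows the RFC 4180 convention: wrap every field in double quotes and double any embedded quotes. Isolated as a helper (the field values are illustrative):

```javascript
// RFC 4180-style field quoting: wrap in quotes, double embedded quotes
function csvField(value) {
  return `"${String(value ?? '').replace(/"/g, '""')}"`;
}

function csvLine(values) {
  return values.map(csvField).join(',');
}

console.log(csvLine(['Acme "Pro" Widget', '$49,99', '']));
// "Acme ""Pro"" Widget","$49,99",""
```

Quoting every field unconditionally is slightly verbose but means embedded commas, quotes, and newlines never corrupt the output.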
🚀 Need Data at Scale? Skip the Infrastructure
Building scrapers is fun — maintaining proxy rotation, handling CAPTCHAs, and managing rate limits is not. Mantis API handles all of that for you.
Get 100 Free API Calls →
11. When Pages Need JavaScript
Cheerio's biggest limitation: it cannot execute JavaScript. If a page loads content dynamically (React, Angular, Vue, infinite scroll), you won't see that content in Cheerio.
How to Detect JavaScript-Rendered Content
// Quick check: compare what Cheerio sees vs what browser sees
async function detectJSContent(url) {
const { data: html } = await axios.get(url);
const $ = cheerio.load(html);
// If the body is nearly empty or has a root div with no content,
// the page likely uses JavaScript rendering
const bodyText = $('body').text().trim();
const rootDiv = $('#root, #app, #__next').html();
console.log(`Body text length: ${bodyText.length}`);
console.log(`Root div content: ${rootDiv ? rootDiv.length : 'N/A'} chars`);
if (bodyText.length < 100 || (rootDiv && rootDiv.length < 50)) {
console.log('⚠️ This page likely requires JavaScript rendering');
return true;
}
return false;
}
Option 1: Check for Hidden APIs
Many SPAs fetch data from JSON APIs. Intercept these in DevTools (Network tab → XHR/Fetch) and call them directly — much faster than browser automation:
// Instead of scraping the rendered page, call the API directly
const { data } = await axios.get('https://example.com/api/products', {
headers: { 'Accept': 'application/json' },
params: { page: 1, limit: 50 }
});
// data is already structured — no parsing needed!
console.log(data.products);
Option 2: Puppeteer + Cheerio Combo
When you must render JavaScript, use Puppeteer to render the page, then hand the HTML to Cheerio for fast parsing:
import puppeteer from 'puppeteer';
import * as cheerio from 'cheerio';
async function scrapeJSPage(url) {
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2' });
// Get rendered HTML and parse with Cheerio (much faster than Puppeteer's $ methods)
const html = await page.content();
await browser.close();
const $ = cheerio.load(html);
// Now use Cheerio as normal — much faster than page.evaluate()
return $('.product').map((i, el) => ({
name: $(el).find('.name').text().trim(),
price: $(el).find('.price').text().trim()
})).get();
}
Option 3: Use Mantis API
The simplest solution — Mantis handles JavaScript rendering, anti-bot detection, and proxy rotation automatically:
import axios from 'axios';
// One API call replaces Puppeteer + Cheerio + proxy rotation + CAPTCHA handling
const { data } = await axios.post('https://api.mantisapi.com/extract', {
url: 'https://example.com/products',
selectors: {
products: {
selector: '.product-card',
type: 'list',
fields: {
name: '.name',
price: '.price',
rating: { selector: '.rating', attr: 'data-score' }
}
}
}
}, {
headers: { 'x-api-key': 'YOUR_API_KEY' }
});
console.log(data.products); // Clean, structured data
12. Cheerio vs Puppeteer vs Playwright vs API
| Feature | Cheerio | Puppeteer | Playwright | Mantis API |
|---|---|---|---|---|
| Language | Node.js | Node.js | Node.js/Python/Java | Any (REST API) |
| JavaScript rendering | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Speed (per page) | ~50-200ms | ~2-10s | ~2-8s | ~200ms |
| Memory usage | ~20MB | ~300MB+ | ~300MB+ | ~0 (server-side) |
| Anti-bot bypass | ❌ Manual | ⚠️ With plugins | ⚠️ With plugins | ✅ Built-in |
| Proxy rotation | ❌ Manual | ⚠️ Manual config | ⚠️ Manual config | ✅ Built-in |
| Screenshots | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Learning curve | Low (jQuery) | Medium | Medium | Low (REST API) |
| Best for | Static HTML pages | JS-heavy sites | Cross-browser testing | Production at scale |
| Cost at scale | $50-200/mo (servers) | $200-800/mo (infra) | $200-800/mo (infra) | $29-299/mo |
13. The API Shortcut: Skip the Parsing
Here's the truth about web scraping at scale: the hard part isn't parsing HTML. Cheerio handles that beautifully. The hard parts are:
- Anti-bot detection — CAPTCHAs, fingerprinting, IP bans
- JavaScript rendering — SPAs, dynamic content, infinite scroll
- Proxy rotation — Residential proxies cost $5-15/GB
- Infrastructure — Headless browsers eat RAM and CPU
- Maintenance — Sites change layouts, selectors break
A web scraping API handles all of this. Here's the cost comparison:
| Component | DIY (Cheerio + Puppeteer) | Mantis API |
|---|---|---|
| Proxy rotation | $100-500/mo | Included |
| Headless browser servers | $50-200/mo | Included |
| CAPTCHA solving | $50-200/mo | Included |
| Engineering time | $$$ | None |
| Total | $200-900/mo | $29-299/mo |
// Cheerio: ~50 lines of code, manual proxy rotation, manual error handling
// Mantis API: 5 lines of code, everything handled
const { data } = await axios.get('https://api.mantisapi.com/screenshot', {
params: { url: 'https://example.com', format: 'png' },
headers: { 'x-api-key': 'YOUR_API_KEY' }
});
// Done. Screenshot captured. JavaScript rendered. Anti-bot bypassed.
📣 From Cheerio to Production in Minutes
Prototype with Cheerio. Ship with Mantis. Get 100 free API calls/month — no credit card required.
Start Free →
14. FAQ
The structured FAQ data on this page covers common Cheerio web scraping questions: typical use cases, Cheerio vs Puppeteer, JavaScript rendering limitations, speed benchmarks, Cheerio vs JSDOM, and legal considerations.
What's Next?
You now have everything you need to build production-grade scrapers with Cheerio and Node.js. Here are your next steps:
- For JavaScript-heavy pages: Read our Puppeteer Complete Guide
- For Python developers: Check our BeautifulSoup Guide or Scrapy Guide
- Getting blocked? Read How to Scrape Without Getting Blocked
- Comparing tools? See our Best Web Scraping APIs for AI Agents
- Ready to scale? Try Mantis API free — 100 calls/month, no credit card