Web Scraping with Rust in 2026: The Complete Guide

Published March 28, 2026 · 20 min read · By the Mantis Team

📑 Table of Contents

  1. Why Rust for Web Scraping?
  2. The Rust Web Scraping Stack
  3. Quick Start: Your First Rust Scraper
  4. HTML Parsing with the Scraper Crate
  5. Async Scraping with Tokio + Reqwest
  6. Handling JavaScript-Rendered Pages
  7. Large-Scale Crawling with Spider
  8. Anti-Blocking: Headers, Proxies & Rate Limiting
  9. When to Use a Web Scraping API Instead
  10. Rust vs Python vs Node.js vs Go
  11. Production Tips

Web scraping with Rust combines the language's legendary performance with its safety guarantees to build scrapers that are fast, memory-efficient, and virtually crash-free. If you're already writing Rust — or you need to process thousands of pages with minimal resources — Rust is an excellent choice for web scraping in 2026.

This guide covers every major Rust scraping tool: reqwest for HTTP, scraper for HTML parsing, headless_chrome for JavaScript-rendered pages, and the spider framework for large-scale crawling. You'll get working code examples you can copy and run today.

Why Rust for Web Scraping?

Rust isn't the most common choice for web scraping — Python and JavaScript dominate that space. But Rust has unique advantages:

💡 When Rust shines: High-volume scraping (10K+ pages), embedded scraping in larger Rust applications, resource-constrained environments, and anywhere you need maximum reliability.

The Rust Web Scraping Stack

Crate | Purpose | Downloads/mo
reqwest | Async HTTP client (like Python's requests) | ~15M
scraper | HTML parsing with CSS selectors | ~2M
select.rs | Alternative HTML parser | ~500K
headless_chrome | Chrome DevTools Protocol (headless browser) | ~200K
chromiumoxide | Modern async Chrome automation | ~150K
spider | Full crawling framework | ~100K
tokio | Async runtime | ~40M
serde | Serialization (JSON/CSV output) | ~50M

The core combo: reqwest + scraper + tokio + serde. This covers 90% of scraping use cases.

Quick Start: Your First Rust Scraper

Let's build a simple scraper that fetches a page and extracts all links. First, set up your project:

cargo new web_scraper
cd web_scraper

Add dependencies to Cargo.toml:

[dependencies]
reqwest = { version = "0.12", features = ["json"] }
scraper = "0.20"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Now write the scraper in src/main.rs:

use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the page
    let url = "https://news.ycombinator.com";
    let response = reqwest::get(url).await?.text().await?;

    // Parse HTML
    let document = Html::parse_document(&response);
    let selector = Selector::parse(".titleline > a").unwrap();

    // Extract titles and links
    for element in document.select(&selector) {
        let title = element.text().collect::<String>();
        let href = element.value().attr("href").unwrap_or("#");
        println!("{title} → {href}");
    }

    Ok(())
}
Run it:

cargo run

That's it. Three crates, ~15 lines of code, and you have a working scraper. The compiled binary is a single file under 10MB that runs anywhere.

HTML Parsing with the Scraper Crate

The scraper crate provides CSS selector-based HTML parsing — similar to BeautifulSoup or Cheerio but faster:

use scraper::{Html, Selector};
use serde::Serialize;

#[derive(Debug, Serialize)]
struct Product {
    name: String,
    price: String,
    url: String,
}

fn parse_products(html: &str) -> Vec<Product> {
    let document = Html::parse_document(html);
    let card_sel = Selector::parse(".product-card").unwrap();
    let name_sel = Selector::parse("h3.product-name").unwrap();
    let price_sel = Selector::parse(".price").unwrap();
    let link_sel = Selector::parse("a").unwrap();

    document.select(&card_sel).filter_map(|card| {
        let name = card.select(&name_sel).next()?.text().collect::<String>();
        let price = card.select(&price_sel).next()?.text().collect::<String>();
        let url = card.select(&link_sel).next()?
            .value().attr("href")?.to_string();

        Some(Product { name, price, url })
    }).collect()
}

Key scraper features:

- Selector::parse compiles a CSS selector once so it can be reused across documents
- element.text() returns an iterator over all descendant text nodes
- element.value().attr("...") reads attributes as an Option instead of panicking
- Html::parse_fragment parses an HTML snippet without wrapping it in a full document

Async Scraping with Tokio + Reqwest

Rust's async ecosystem makes concurrent scraping elegant and efficient. Here's how to scrape multiple pages simultaneously:

use reqwest::Client;
use scraper::{Html, Selector};
use std::time::Duration;
use tokio::time::sleep;

async fn scrape_page(client: &Client, url: &str) -> Result<Vec<String>, reqwest::Error> {
    let response = client.get(url)
        .header("User-Agent", "Mozilla/5.0 (compatible; MantisBot/1.0)")
        .send()
        .await?
        .text()
        .await?;

    let document = Html::parse_document(&response);
    let selector = Selector::parse("h2").unwrap();

    Ok(document.select(&selector)
        .map(|el| el.text().collect::<String>())
        .collect())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .timeout(Duration::from_secs(30))
        .build()?;

    let urls = vec![
        "https://example.com/page/1",
        "https://example.com/page/2",
        "https://example.com/page/3",
        "https://example.com/page/4",
        "https://example.com/page/5",
    ];

    // Scrape all pages concurrently
    let mut handles = vec![];
    for url in urls {
        let client = client.clone();
        let url = url.to_string();
        handles.push(tokio::spawn(async move {
            // Polite delay
            sleep(Duration::from_millis(500)).await;
            scrape_page(&client, &url).await
        }));
    }

    for handle in handles {
        match handle.await? {
            Ok(titles) => {
                for title in titles {
                    println!("  {title}");
                }
            }
            Err(e) => eprintln!("Error: {e}"),
        }
    }

    Ok(())
}
💡 Concurrency control: Use tokio::sync::Semaphore to limit concurrent requests. Without it, you might fire thousands of requests simultaneously and get blocked.

Semaphore-Based Rate Limiting

Building on the previous example:

use std::sync::Arc;
use tokio::sync::Semaphore;

let semaphore = Arc::new(Semaphore::new(10)); // At most 10 requests in flight
let mut handles = vec![];

for url in urls {
    // Each task holds a permit for the duration of its request
    let permit = semaphore.clone().acquire_owned().await.unwrap();
    let client = client.clone();
    let url = url.to_string();
    handles.push(tokio::spawn(async move {
        let result = scrape_page(&client, &url).await;
        drop(permit); // Release the slot so the next task can start
        result
    }));
}

for handle in handles {
    let _ = handle.await?;
}

Handling JavaScript-Rendered Pages

Many modern websites render content with JavaScript. reqwest fetches only the raw HTML and won't execute JS. You have three options:

Option 1: headless_chrome

use headless_chrome::Browser;

fn scrape_spa(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let browser = Browser::default()?;
    let tab = browser.new_tab()?;

    tab.navigate_to(url)?;
    tab.wait_for_element(".dynamic-content")?;

    let content = tab.get_content()?;
    Ok(content)
}

Option 2: chromiumoxide (async-native)

use chromiumoxide::{Browser, BrowserConfig};
use futures::StreamExt;

async fn scrape_spa_async(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let (mut browser, mut handler) = Browser::launch(
        BrowserConfig::builder()
            .build()
            .map_err(|e| format!("Config error: {e}"))?,
    )
    .await?;

    // The handler stream must be polled for the browser to make progress
    let handle = tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            let _ = event;
        }
    });

    let page = browser.new_page(url).await?;
    page.wait_for_navigation().await?;
    let html = page.content().await?;

    browser.close().await?;
    handle.await?;
    Ok(html)
}

Option 3: Use an API (Recommended for Production)

Headless browsers in Rust add complexity — binary dependencies, memory overhead, and flaky waits. For production, a web scraping API handles rendering server-side:

use reqwest::Client;
use serde::Deserialize;

#[derive(Deserialize)]
struct ScrapeResponse {
    content: String,
    metadata: serde_json::Value,
}

async fn scrape_with_api(client: &Client, url: &str) -> Result<ScrapeResponse, reqwest::Error> {
    client.post("https://api.mantisapi.com/v1/scrape")
        .header("Authorization", "Bearer YOUR_API_KEY")
        .json(&serde_json::json!({
            "url": url,
            "format": "markdown",
            "wait_for": "networkidle"
        }))
        .send()
        .await?
        .json::<ScrapeResponse>()
        .await
}

🦀 Scrape Any Page from Rust — No Headless Browser Needed

Mantis handles JavaScript rendering, anti-blocking, and proxies. Your Rust code stays clean.

Try Mantis Free →

Large-Scale Crawling with Spider

The spider crate is Rust's answer to Scrapy — a full crawling framework with built-in concurrency, link following, and robots.txt compliance:

use spider::website::Website;
use spider::tokio;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_limit(100)           // Max 100 pages
        .with_delay(500)           // 500ms between requests
        .with_respect_robots_txt(true)
        .build()
        .unwrap();

    website.crawl().await;

    for page in website.get_pages().unwrap() {
        println!("URL: {}", page.get_url());
        println!("Status: {}", page.get_status_code());
        // Parse page.get_html() with scraper crate
    }
}

Spider handles the crawling infrastructure — request scheduling, deduplication, concurrent fetching — while you focus on parsing.
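URL deduplication alone is real work if you build it yourself. Here's a simplified, std-only sketch of the idea (the Frontier type and its trailing-slash normalization are illustrative, not spider's API; spider's internals are more sophisticated):

```rust
use std::collections::HashSet;

/// Tracks visited URLs so each page is fetched at most once.
/// Hypothetical helper for illustration only.
struct Frontier {
    seen: HashSet<String>,
    queue: Vec<String>,
}

impl Frontier {
    fn new() -> Self {
        Frontier { seen: HashSet::new(), queue: Vec::new() }
    }

    /// Enqueue a URL only if it hasn't been seen before.
    /// Returns true if the URL was new.
    fn push(&mut self, url: &str) -> bool {
        // Trivial normalization: treat "/a" and "/a/" as the same page
        let key = url.trim_end_matches('/').to_string();
        if self.seen.insert(key) {
            self.queue.push(url.to_string());
            true
        } else {
            false
        }
    }

    fn pop(&mut self) -> Option<String> {
        self.queue.pop()
    }
}

fn main() {
    let mut f = Frontier::new();
    assert!(f.push("https://example.com/a"));
    assert!(!f.push("https://example.com/a/")); // duplicate after normalization
    assert!(f.push("https://example.com/b"));
    println!("queued: {}", f.queue.len());
}
```

A real frontier would also normalize query parameters and fragments, and persist state so a crashed crawl can resume.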

Anti-Blocking: Headers, Proxies & Rate Limiting

Getting blocked is the #1 scraping challenge. Here's a production-ready Rust client with anti-blocking measures:

use std::time::Duration;
use rand::seq::SliceRandom;
use reqwest::{Client, Proxy};

fn build_stealth_client(proxy_url: Option<&str>) -> Result<Client, reqwest::Error> {
    let mut builder = Client::builder()
        .timeout(Duration::from_secs(30))
        .cookie_store(true)
        .gzip(true);

    if let Some(proxy) = proxy_url {
        builder = builder.proxy(Proxy::all(proxy)?);
    }

    builder.build()
}

fn random_user_agent() -> &'static str {
    let agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ];
    agents.choose(&mut rand::thread_rng()).unwrap()
}

async fn stealth_get(client: &Client, url: &str) -> Result<String, reqwest::Error> {
    client.get(url)
        .header("User-Agent", random_user_agent())
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
        .header("Accept-Language", "en-US,en;q=0.9")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Connection", "keep-alive")
        .send()
        .await?
        .text()
        .await
}

For more anti-blocking techniques, see our complete anti-blocking guide.

When to Use a Web Scraping API Instead

Building scrapers in Rust is powerful, but it comes with ongoing maintenance: proxy rotation, browser updates, and an anti-bot arms race. Here's how the costs compare:

Cost Component | DIY Rust Scraping | Mantis API
HTTP Client | Free (reqwest) | Included
JS Rendering | Chromium binary (~300MB RAM/instance) | Included
Proxies | $50-500/month | Included
CAPTCHA Solving | $1-3 per 1,000 | Included
Maintenance | Ongoing engineering time | Zero
Total (5K pages/mo) | $100-800/mo + dev time | $29/month

Need Data at Scale? Skip the Infrastructure.

100 free API calls/month. Paid plans from $29/month for 5,000 calls.

View Pricing →

Rust vs Python vs Node.js vs Go

Feature | Rust | Python | Node.js | Go
Performance | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐
Memory Usage | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐
Ecosystem Size | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐
Ease of Use | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐
Concurrency | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐
Deployment | Single binary | Virtual env | node_modules | Single binary
Best For | High-volume, embedded | Quick scripts, ML | Browser automation | Infrastructure tools

Explore our language-specific guides: Python · JavaScript/Node.js · Go & Colly

Production Tips

Use structured error handling. Replace unwrap() with proper error types. The anyhow and thiserror crates make this easy.

Serialize output with serde. Define your scraped data as structs with #[derive(Serialize)], then output JSON or CSV natively.

Use connection pooling. Reqwest's Client maintains a connection pool internally — create one Client and reuse it across all requests.

Compile in release mode. cargo build --release enables optimizations that make HTML parsing 5-10x faster than debug builds.

Handle retries with exponential backoff. Use the reqwest-retry or backon crate for automatic retry logic.
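The delay schedule itself is simple to sketch with the standard library (backoff_delay is an illustrative helper, not an API from either crate):

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Production code should also add random jitter so failing clients
/// don't retry in lockstep.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

fn main() {
    let base = Duration::from_millis(250);
    let max = Duration::from_secs(10);
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", backoff_delay(base, max, attempt));
    }
}
```

With a 250ms base this yields 250ms, 500ms, 1s, 2s, 4s, 8s, then stays capped at 10s.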

Monitor with tracing. The tracing crate gives you structured logging — essential for debugging scrapers that process thousands of pages.

⚠️ Legal note: Always respect robots.txt, don't overload servers, and avoid scraping personal data without consent. See our guide on legal web scraping for details.

Next Steps

Rust gives you unmatched performance and safety for web scraping. Whether you're building a small data collector or a high-volume crawling pipeline, the Rust ecosystem has the tools you need. And when the infrastructure burden gets too heavy, Mantis handles it all with a single API call.