Web Scraping with Rust in 2026: The Complete Guide

Published March 28, 2026 · 20 min read · By the Mantis Team

📑 Table of Contents

  1. Why Rust for Web Scraping?
  2. The Rust Web Scraping Stack
  3. Quick Start: Your First Rust Scraper
  4. HTML Parsing with the Scraper Crate
  5. Async Scraping with Tokio + Reqwest
  6. Handling JavaScript-Rendered Pages
  7. Large-Scale Crawling with Spider
  8. Anti-Blocking: Headers, Proxies & Rate Limiting
  9. When to Use a Web Scraping API Instead
  10. Rust vs Python vs Node.js vs Go
  11. Production Tips

Web scraping with Rust combines the language's legendary performance with its safety guarantees to build scrapers that are fast, memory-efficient, and virtually crash-free. If you're already writing Rust — or you need to process thousands of pages with minimal resources — Rust is an excellent choice for web scraping in 2026.

This guide covers every major Rust scraping tool: reqwest for HTTP, scraper for HTML parsing, headless_chrome for JavaScript-rendered pages, and the spider framework for large-scale crawling. You'll get working code examples you can copy and run today.

Why Rust for Web Scraping?

Rust isn't the most common choice for web scraping — Python and JavaScript dominate that space. But Rust has unique advantages:

💡 When Rust shines: High-volume scraping (10K+ pages), embedded scraping in larger Rust applications, resource-constrained environments, and anywhere you need maximum reliability.

The Rust Web Scraping Stack

Crate | Purpose | Downloads/mo
reqwest | Async HTTP client (like Python's requests) | ~15M
scraper | HTML parsing with CSS selectors | ~2M
select.rs | Alternative HTML parser | ~500K
headless_chrome | Chrome DevTools Protocol (headless browser) | ~200K
chromiumoxide | Modern async Chrome automation | ~150K
spider | Full crawling framework | ~100K
tokio | Async runtime | ~40M
serde | Serialization (JSON/CSV output) | ~50M

The core combo: reqwest + scraper + tokio + serde. This covers 90% of scraping use cases.

Quick Start: Your First Rust Scraper

Let's build a simple scraper that fetches a page and extracts all links. First, set up your project:

cargo new web_scraper
cd web_scraper

Add dependencies to Cargo.toml:

[dependencies]
reqwest = { version = "0.12", features = ["json"] }
scraper = "0.20"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Now write the scraper in src/main.rs:

use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the page
    let url = "https://news.ycombinator.com";
    let response = reqwest::get(url).await?.text().await?;

    // Parse HTML
    let document = Html::parse_document(&response);
    let selector = Selector::parse(".titleline > a").unwrap();

    // Extract titles and links
    for element in document.select(&selector) {
        let title = element.text().collect::<String>();
        let href = element.value().attr("href").unwrap_or("#");
        println!("{title} → {href}");
    }

    Ok(())
}
Run it:

cargo run

That's it. Three crates, ~15 lines of code, and you have a working scraper. The compiled binary is a single file under 10MB that runs anywhere.

HTML Parsing with the Scraper Crate

The scraper crate provides CSS selector-based HTML parsing — similar to BeautifulSoup or Cheerio but faster:

use scraper::{Html, Selector};
use serde::Serialize;

#[derive(Debug, Serialize)]
struct Product {
    name: String,
    price: String,
    url: String,
}

fn parse_products(html: &str) -> Vec<Product> {
    let document = Html::parse_document(html);
    let card_sel = Selector::parse(".product-card").unwrap();
    let name_sel = Selector::parse("h3.product-name").unwrap();
    let price_sel = Selector::parse(".price").unwrap();
    let link_sel = Selector::parse("a").unwrap();

    document.select(&card_sel).filter_map(|card| {
        let name = card.select(&name_sel).next()?.text().collect::<String>();
        let price = card.select(&price_sel).next()?.text().collect::<String>();
        let url = card.select(&link_sel).next()?
            .value().attr("href")?.to_string();

        Some(Product { name, price, url })
    }).collect()
}

Key scraper features:

- Selector::parse compiles a CSS selector once so it can be reused across documents
- element.text() returns an iterator over all descendant text nodes
- element.value().attr("...") reads attributes as an Option instead of panicking
- Html::parse_fragment parses an HTML snippet without wrapping it in a full document

Async Scraping with Tokio + Reqwest

Rust's async ecosystem makes concurrent scraping elegant and efficient. Here's how to scrape multiple pages simultaneously:

use reqwest::Client;
use scraper::{Html, Selector};
use std::time::Duration;
use tokio::time::sleep;

async fn scrape_page(client: &Client, url: &str) -> Result<Vec<String>, reqwest::Error> {
    let response = client.get(url)
        .header("User-Agent", "Mozilla/5.0 (compatible; MantisBot/1.0)")
        .send()
        .await?
        .text()
        .await?;

    let document = Html::parse_document(&response);
    let selector = Selector::parse("h2").unwrap();

    Ok(document.select(&selector)
        .map(|el| el.text().collect::<String>())
        .collect())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        .timeout(Duration::from_secs(30))
        .build()?;

    let urls = vec![
        "https://example.com/page/1",
        "https://example.com/page/2",
        "https://example.com/page/3",
        "https://example.com/page/4",
        "https://example.com/page/5",
    ];

    // Scrape all pages concurrently
    let mut handles = vec![];
    for url in urls {
        let client = client.clone();
        let url = url.to_string();
        handles.push(tokio::spawn(async move {
            // Polite delay
            sleep(Duration::from_millis(500)).await;
            scrape_page(&client, &url).await
        }));
    }

    for handle in handles {
        match handle.await? {
            Ok(titles) => {
                for title in titles {
                    println!("  {title}");
                }
            }
            Err(e) => eprintln!("Error: {e}"),
        }
    }

    Ok(())
}
💡 Concurrency control: Use tokio::sync::Semaphore to limit concurrent requests. Without it, you might fire thousands of requests simultaneously and get blocked.

Semaphore-Based Rate Limiting

Building on the previous example:

use std::sync::Arc;
use tokio::sync::Semaphore;

let semaphore = Arc::new(Semaphore::new(10)); // At most 10 requests in flight
let mut handles = vec![];

for url in urls {
    // Each task holds a permit for the duration of its request
    let permit = semaphore.clone().acquire_owned().await.unwrap();
    let client = client.clone();
    let url = url.to_string();
    handles.push(tokio::spawn(async move {
        let result = scrape_page(&client, &url).await;
        drop(permit); // Release the slot so the next task can start
        result
    }));
}

for handle in handles {
    let _ = handle.await?;
}

Handling JavaScript-Rendered Pages

Many modern websites render content with JavaScript. reqwest fetches only the raw HTML and won't execute JS. You have three options:

Option 1: headless_chrome

use headless_chrome::Browser;

fn scrape_spa(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let browser = Browser::default()?;
    let tab = browser.new_tab()?;

    tab.navigate_to(url)?;
    tab.wait_for_element(".dynamic-content")?;

    let content = tab.get_content()?;
    Ok(content)
}

Option 2: chromiumoxide (async-native)

use chromiumoxide::{Browser, BrowserConfig};
use futures::StreamExt;

async fn scrape_spa_async(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let (mut browser, mut handler) = Browser::launch(
        BrowserConfig::builder()
            .build()
            .map_err(|e| format!("Config error: {e}"))?,
    )
    .await?;

    // The handler stream must be polled for the browser to make progress
    let handle = tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            let _ = event;
        }
    });

    let page = browser.new_page(url).await?;
    page.wait_for_navigation().await?;
    let html = page.content().await?;

    browser.close().await?;
    handle.await?;
    Ok(html)
}

Option 3: Use an API (Recommended for Production)

Headless browsers in Rust add complexity — binary dependencies, memory overhead, and flaky waits. For production, a web scraping API handles rendering server-side:

use reqwest::Client;
use serde::Deserialize;

#[derive(Deserialize)]
struct ScrapeResponse {
    content: String,
    metadata: serde_json::Value,
}

async fn scrape_with_api(client: &Client, url: &str) -> Result<ScrapeResponse, reqwest::Error> {
    client.post("https://api.mantisapi.com/v1/scrape")
        .header("Authorization", "Bearer YOUR_API_KEY")
        .json(&serde_json::json!({
            "url": url,
            "format": "markdown",
            "wait_for": "networkidle"
        }))
        .send()
        .await?
        .json::<ScrapeResponse>()
        .await
}

🦀 Scrape Any Page from Rust — No Headless Browser Needed

Mantis handles JavaScript rendering, anti-blocking, and proxies. Your Rust code stays clean.

Try Mantis Free →

Large-Scale Crawling with Spider

The spider crate is Rust's answer to Scrapy — a full crawling framework with built-in concurrency, link following, and robots.txt compliance:

use spider::website::Website;
use spider::tokio;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_limit(100)           // Max 100 pages
        .with_delay(500)           // 500ms between requests
        .with_respect_robots_txt(true)
        .build()
        .unwrap();

    website.crawl().await;

    for page in website.get_pages().unwrap() {
        println!("URL: {}", page.get_url());
        println!("Status: {}", page.get_status_code());
        // Parse page.get_html() with scraper crate
    }
}

Spider handles the crawling infrastructure — request scheduling, deduplication, concurrent fetching — while you focus on parsing.
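URL deduplication alone is real work if you build it yourself. Here's a simplified, std-only sketch of the idea (the Frontier type and its trailing-slash normalization are illustrative, not spider's API; spider's internals are more sophisticated):

```rust
use std::collections::HashSet;

/// Tracks visited URLs so each page is fetched at most once.
/// Hypothetical helper for illustration only.
struct Frontier {
    seen: HashSet<String>,
    queue: Vec<String>,
}

impl Frontier {
    fn new() -> Self {
        Frontier { seen: HashSet::new(), queue: Vec::new() }
    }

    /// Enqueue a URL only if it hasn't been seen before.
    /// Returns true if the URL was new.
    fn push(&mut self, url: &str) -> bool {
        // Trivial normalization: treat "/a" and "/a/" as the same page
        let key = url.trim_end_matches('/').to_string();
        if self.seen.insert(key) {
            self.queue.push(url.to_string());
            true
        } else {
            false
        }
    }

    fn pop(&mut self) -> Option<String> {
        self.queue.pop()
    }
}

fn main() {
    let mut f = Frontier::new();
    assert!(f.push("https://example.com/a"));
    assert!(!f.push("https://example.com/a/")); // duplicate after normalization
    assert!(f.push("https://example.com/b"));
    println!("queued: {}", f.queue.len());
}
```

A real frontier would also normalize query parameters and fragments, and persist state so a crashed crawl can resume.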

Anti-Blocking: Headers, Proxies & Rate Limiting

Getting blocked is the #1 scraping challenge. Here's a production-ready Rust client with anti-blocking measures:

use std::time::Duration;
use rand::seq::SliceRandom;
use reqwest::{Client, Proxy};

fn build_stealth_client(proxy_url: Option<&str>) -> Result<Client, reqwest::Error> {
    let mut builder = Client::builder()
        .timeout(Duration::from_secs(30))
        .cookie_store(true)
        .gzip(true);

    if let Some(proxy) = proxy_url {
        builder = builder.proxy(Proxy::all(proxy)?);
    }

    builder.build()
}

fn random_user_agent() -> &'static str {
    let agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ];
    agents.choose(&mut rand::thread_rng()).unwrap()
}

async fn stealth_get(client: &Client, url: &str) -> Result<String, reqwest::Error> {
    client.get(url)
        .header("User-Agent", random_user_agent())
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
        .header("Accept-Language", "en-US,en;q=0.9")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Connection", "keep-alive")
        .send()
        .await?
        .text()
        .await
}

For more anti-blocking techniques, see our complete anti-blocking guide.

When to Use a Web Scraping API Instead

Building scrapers in Rust is powerful, but it comes with ongoing maintenance: proxy rotation, browser updates, and an anti-bot arms race. Here's how the costs compare:

Cost Component | DIY Rust Scraping | Mantis API
HTTP Client | Free (reqwest) | Included
JS Rendering | Chromium binary (~300MB RAM/instance) | Included
Proxies | $50-500/month | Included
CAPTCHA Solving | $1-3 per 1,000 | Included
Maintenance | Ongoing engineering time | Zero
Total (5K pages/mo) | $100-800/mo + dev time | $29/month

Need Data at Scale? Skip the Infrastructure.

100 free API calls/month. Paid plans from $29/month for 5,000 calls.

View Pricing →

Rust vs Python vs Node.js vs Go

Feature | Rust | Python | Node.js | Go
Performance | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐
Memory Usage | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐
Ecosystem Size | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐
Ease of Use | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐
Concurrency | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐
Deployment | Single binary | Virtual env | node_modules | Single binary
Best For | High-volume, embedded | Quick scripts, ML | Browser automation | Infrastructure tools

Explore our language-specific guides: Python · JavaScript/Node.js · Go & Colly

Production Tips

Use structured error handling. Replace unwrap() with proper error types. The anyhow and thiserror crates make this easy.

Serialize output with serde. Define your scraped data as structs with #[derive(Serialize)], then output JSON or CSV natively.

Use connection pooling. Reqwest's Client maintains a connection pool internally — create one Client and reuse it across all requests.

Compile in release mode. cargo build --release enables optimizations that make HTML parsing 5-10x faster than debug builds.

Handle retries with exponential backoff. Use the reqwest-retry or backon crate for automatic retry logic.
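The delay schedule itself is simple to sketch with the standard library (backoff_delay is an illustrative helper, not an API from either crate):

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Production code should also add random jitter so failing clients
/// don't retry in lockstep.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

fn main() {
    let base = Duration::from_millis(250);
    let max = Duration::from_secs(10);
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", backoff_delay(base, max, attempt));
    }
}
```

With a 250ms base this yields 250ms, 500ms, 1s, 2s, 4s, 8s, then stays capped at 10s.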

Monitor with tracing. The tracing crate gives you structured logging — essential for debugging scrapers that process thousands of pages.

⚠️ Legal note: Always respect robots.txt, don't overload servers, and avoid scraping personal data without consent. See our guide on legal web scraping for details.

Next Steps

Rust gives you unmatched performance and safety for web scraping. Whether you're building a small data collector or a high-volume crawling pipeline, the Rust ecosystem has the tools you need. And when the infrastructure burden gets too heavy, Mantis handles it all with a single API call.