Web scraping with Rust combines the language's legendary performance with its safety guarantees to build scrapers that are fast, memory-efficient, and virtually crash-free. If you're already writing Rust — or you need to process thousands of pages with minimal resources — Rust is an excellent choice for web scraping in 2026.
This guide covers every major Rust scraping tool: reqwest for HTTP, scraper for HTML parsing, headless_chrome for JavaScript-rendered pages, and the spider framework for large-scale crawling. You'll get working code examples you can copy and run today.
Rust isn't the most common choice for web scraping — Python and JavaScript dominate that space. But Rust has unique advantages: raw throughput, low and predictable memory usage, fearless concurrency, and deployment as a single static binary. Here are the crates you'll actually use:
| Crate | Purpose | Downloads/mo |
|---|---|---|
| reqwest | Async HTTP client (like Python's requests) | ~15M |
| scraper | HTML parsing with CSS selectors | ~2M |
| select.rs | Alternative HTML parser | ~500K |
| headless_chrome | Chrome DevTools Protocol (headless browser) | ~200K |
| chromiumoxide | Modern async Chrome automation | ~150K |
| spider | Full crawling framework | ~100K |
| tokio | Async runtime | ~40M |
| serde | Serialization (JSON/CSV output) | ~50M |
The core combo: reqwest + scraper + tokio + serde. This covers 90% of scraping use cases.
Let's build a simple scraper that fetches a page and extracts all links. First, set up your project:
cargo new web_scraper
cd web_scraper
Add dependencies to Cargo.toml:
[dependencies]
reqwest = { version = "0.12", features = ["json"] }
scraper = "0.20"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
Now write the scraper in src/main.rs:
use scraper::{Html, Selector};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Fetch the page
let url = "https://news.ycombinator.com";
let response = reqwest::get(url).await?.text().await?;
// Parse HTML
let document = Html::parse_document(&response);
let selector = Selector::parse(".titleline > a").unwrap();
// Extract titles and links
for element in document.select(&selector) {
let title = element.text().collect::<String>();
let href = element.value().attr("href").unwrap_or("#");
println!("{title} → {href}");
}
Ok(())
}
cargo run
That's it. Three crates, ~15 lines of code, and you have a working scraper. The compiled binary is a single file under 10MB that runs anywhere.
The scraper crate provides CSS selector-based HTML parsing — similar to BeautifulSoup or Cheerio but faster:
use scraper::{Html, Selector};
use serde::Serialize;
#[derive(Debug, Serialize)]
struct Product {
name: String,
price: String,
url: String,
}
fn parse_products(html: &str) -> Vec<Product> {
let document = Html::parse_document(html);
let card_sel = Selector::parse(".product-card").unwrap();
let name_sel = Selector::parse("h3.product-name").unwrap();
let price_sel = Selector::parse(".price").unwrap();
let link_sel = Selector::parse("a").unwrap();
document.select(&card_sel).filter_map(|card| {
let name = card.select(&name_sel).next()?.text().collect::<String>();
let price = card.select(&price_sel).next()?.text().collect::<String>();
let url = card.select(&link_sel).next()?
.value().attr("href")?.to_string();
Some(Product { name, price, url })
}).collect()
}
Key scraper features:

- CSS selector matching powered by html5ever and the selectors crate — the same engines behind the Servo browser
- `Html::parse_document` for full pages and `Html::parse_fragment` for snippets
- `.text()` returns an iterator over an element's text nodes (collect it into a `String`)
- `.value().attr("name")` reads attributes, returning `Option<&str>`
Rust's async ecosystem makes concurrent scraping elegant and efficient. Here's how to scrape multiple pages simultaneously:
use reqwest::Client;
use scraper::{Html, Selector};
use std::time::Duration;
use tokio::time::sleep;
async fn scrape_page(client: &Client, url: &str) -> Result<Vec<String>, reqwest::Error> {
let response = client.get(url)
.header("User-Agent", "Mozilla/5.0 (compatible; MantisBot/1.0)")
.send()
.await?
.text()
.await?;
let document = Html::parse_document(&response);
let selector = Selector::parse("h2").unwrap();
Ok(document.select(&selector)
.map(|el| el.text().collect::<String>())
.collect())
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::builder()
.timeout(Duration::from_secs(30))
.build()?;
let urls = vec![
"https://example.com/page/1",
"https://example.com/page/2",
"https://example.com/page/3",
"https://example.com/page/4",
"https://example.com/page/5",
];
// Scrape all pages concurrently
let mut handles = vec![];
for (i, url) in urls.into_iter().enumerate() {
    let client = client.clone();
    let url = url.to_string();
    handles.push(tokio::spawn(async move {
        // Stagger start times — a single fixed delay would elapse
        // concurrently in every task and all requests would still fire at once
        sleep(Duration::from_millis(500 * i as u64)).await;
        scrape_page(&client, &url).await
    }));
}
for handle in handles {
match handle.await? {
Ok(titles) => {
for title in titles {
println!(" {title}");
}
}
Err(e) => eprintln!("Error: {e}"),
}
}
Ok(())
}
To cap concurrency at a fixed number of in-flight requests, acquire a semaphore permit before each spawn:

use tokio::sync::Semaphore;
use std::sync::Arc;

let semaphore = Arc::new(Semaphore::new(10)); // Max 10 concurrent requests
for url in urls {
let permit = semaphore.clone().acquire_owned().await.unwrap();
let client = client.clone();
tokio::spawn(async move {
let result = scrape_page(&client, &url).await;
drop(permit); // Release when done
result
});
}
Many modern websites render content with JavaScript. reqwest only fetches raw HTML — it won't execute JS. You have three options: the synchronous headless_chrome crate, the async chromiumoxide crate, or a scraping API that renders pages server-side.
The headless_chrome crate offers a simple synchronous API:

use headless_chrome::Browser;
fn scrape_spa(url: &str) -> Result<String, Box<dyn std::error::Error>> {
let browser = Browser::default()?;
let tab = browser.new_tab()?;
tab.navigate_to(url)?;
tab.wait_for_element(".dynamic-content")?;
let content = tab.get_content()?;
Ok(content)
}
chromiumoxide is the async alternative. Its calls need a surrounding async context, so here the original snippet is wrapped in a function:

use chromiumoxide::Browser;
use futures::StreamExt;

async fn scrape_spa_async(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Launch the browser and receive a stream of CDP events
    let (mut browser, mut handler) = Browser::launch(
        chromiumoxide::BrowserConfig::builder()
            .build()
            .map_err(|e| format!("Config error: {e}"))?
    ).await?;
    // Drive the browser's event loop in the background
    let handle = tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            let _ = event;
        }
    });
    let page = browser.new_page(url).await?;
    page.wait_for_navigation().await?;
    let html = page.content().await?;
    browser.close().await?;
    handle.await?;
    Ok(html)
}
Headless browsers in Rust add complexity — binary dependencies, memory overhead, and flaky waits. For production, a web scraping API handles rendering server-side:
use reqwest::Client;
use serde::Deserialize;
#[derive(Deserialize)]
struct ScrapeResponse {
content: String,
metadata: serde_json::Value,
}
async fn scrape_with_api(client: &Client, url: &str) -> Result<ScrapeResponse, reqwest::Error> {
client.post("https://api.mantisapi.com/v1/scrape")
.header("Authorization", "Bearer YOUR_API_KEY")
.json(&serde_json::json!({
"url": url,
"format": "markdown",
"wait_for": "networkidle"
}))
.send()
.await?
.json::<ScrapeResponse>()
.await
}
Mantis handles JavaScript rendering, anti-blocking, and proxies. Your Rust code stays clean.
Try Mantis Free →

The spider crate is Rust's answer to Scrapy — a full crawling framework with built-in concurrency, link following, and robots.txt compliance:
use spider::website::Website;
use spider::tokio;
#[tokio::main]
async fn main() {
let mut website = Website::new("https://example.com")
.with_limit(100) // Max 100 pages
.with_delay(500) // 500ms between requests
.with_respect_robots_txt(true)
.build()
.unwrap();
website.crawl().await;
for page in website.get_pages().unwrap() {
println!("URL: {}", page.get_url());
println!("Status: {}", page.get_status_code());
// Parse page.get_html() with scraper crate
}
}
Spider handles the crawling infrastructure — request scheduling, deduplication, concurrent fetching — while you focus on parsing.
Getting blocked is the #1 scraping challenge. Here's a production-ready Rust client with anti-blocking measures:
use reqwest::{Client, Proxy};
use rand::seq::SliceRandom;
use std::time::Duration;
fn build_stealth_client(proxy_url: Option<&str>) -> Result<Client, reqwest::Error> {
let mut builder = Client::builder()
.timeout(Duration::from_secs(30))
.cookie_store(true)
.gzip(true);
if let Some(proxy) = proxy_url {
builder = builder.proxy(Proxy::all(proxy)?);
}
builder.build()
}
fn random_user_agent() -> &'static str {
let agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
];
agents.choose(&mut rand::thread_rng()).unwrap()
}
async fn stealth_get(client: &Client, url: &str) -> Result<String, reqwest::Error> {
client.get(url)
.header("User-Agent", random_user_agent())
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.header("Accept-Language", "en-US,en;q=0.9")
// Don't set Accept-Encoding by hand: reqwest advertises the compression
// features you enabled (gzip here) and decodes the response for you
.header("Connection", "keep-alive")
.send()
.await?
.text()
.await
}
For more anti-blocking techniques, see our complete anti-blocking guide.
Building scrapers in Rust is powerful, but it comes with ongoing maintenance: proxies to manage, browser binaries to update, and markup changes to chase. Here's how DIY costs compare to a managed scraping API:
| Cost Component | DIY Rust Scraping | Mantis API |
|---|---|---|
| HTTP Client | Free (reqwest) | Included |
| JS Rendering | Chromium binary (~300MB RAM/instance) | Included |
| Proxies | $50-500/month | Included |
| CAPTCHA Solving | $1-3 per 1,000 | Included |
| Maintenance | Ongoing engineering time | Zero |
| Total (5K pages/mo) | $100-800/mo + dev time | $29/month |
100 free API calls/month. Paid plans from $29/month for 5,000 calls.
View Pricing →

| Feature | Rust | Python | Node.js | Go |
|---|---|---|---|---|
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Memory Usage | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Ecosystem Size | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Ease of Use | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Concurrency | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Deployment | Single binary | Virtual env | node_modules | Single binary |
| Best For | High-volume, embedded | Quick scripts, ML | Browser automation | Infrastructure tools |
Explore our language-specific guides: Python · JavaScript/Node.js · Go & Colly
- **Use structured error handling.** Replace `unwrap()` with proper error types. The anyhow and thiserror crates make this easy.
- **Serialize output with serde.** Define your scraped data as structs with `#[derive(Serialize)]`, then output JSON or CSV natively.
- **Use connection pooling.** reqwest's `Client` maintains a connection pool internally — create one `Client` and reuse it across all requests.
- **Compile in release mode.** `cargo build --release` enables optimizations that make HTML parsing 5-10x faster than debug builds.
- **Handle retries with exponential backoff.** Use the reqwest-retry or backon crate for automatic retry logic.
- **Monitor with tracing.** The tracing crate gives you structured logging — essential for debugging scrapers that process thousands of pages.
Rust gives you unmatched performance and safety for web scraping. Whether you're building a small data collector or a high-volume crawling pipeline, the Rust ecosystem has the tools you need. And when the infrastructure burden gets too heavy, Mantis handles it all with a single API call.