Web Scraping with Python Requests in 2026: The Complete Guide
The Python Requests library is where most web scraping journeys begin — and for good reason. It's one of the most downloaded packages on PyPI, with an API so intuitive it practically reads like English. If you need to fetch web pages and extract data, Requests is your foundation.
This guide covers everything from basic GET requests to production-ready scraping patterns — including sessions, authentication, headers, proxies, concurrency, and when to graduate to something more powerful.
Table of Contents
- Installation & Setup
- Your First Scraping Request
- HTTP Methods: GET, POST, PUT, DELETE
- Request Headers & User-Agent Rotation
- Session Objects & Cookie Persistence
- Authentication: Basic, Token, OAuth
- Parsing HTML with BeautifulSoup
- Working with JSON APIs
- Scraping Paginated Pages
- Proxy Rotation for Anti-Detection
- Rate Limiting & Retry Logic
- Concurrent Scraping with ThreadPoolExecutor
- Production-Ready Scraper Class
- Requests vs httpx vs aiohttp vs API
- The API Shortcut: Skip HTTP Complexity
- FAQ
1. Installation & Setup
Install Requests and BeautifulSoup (for HTML parsing):
pip install requests beautifulsoup4 lxml
Verify your setup:
import requests
from bs4 import BeautifulSoup
print(requests.__version__) # 2.32.x
print("Ready to scrape!")
Tip: use a virtual environment (python -m venv venv) to keep your scraping dependencies isolated.
2. Your First Scraping Request
Fetching a web page is one line:
import requests
response = requests.get("https://example.com")
print(response.status_code) # 200
print(response.headers["content-type"]) # text/html; charset=UTF-8
print(response.text[:500]) # First 500 chars of HTML
The response object gives you everything:
- response.status_code — HTTP status (200, 404, 403, etc.)
- response.text — Response body as a string (decoded)
- response.content — Response body as bytes (for images, PDFs)
- response.headers — Response headers (dict-like)
- response.url — Final URL after redirects
- response.cookies — Cookies set by the server
- response.elapsed — Time the request took
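Since response.content holds the raw bytes, downloading an image or PDF is just a matter of writing those bytes to disk. A minimal sketch, with a placeholder URL:

```python
import requests

def download_file(url, path, timeout=10):
    """Download a binary resource (image, PDF, etc.) and write it to disk."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)  # raw bytes, not decoded text
    return len(response.content)

# download_file("https://example.com/logo.png", "logo.png")
```

For large files, pass stream=True to requests.get and write chunks from response.iter_content(chunk_size=8192) instead of holding the whole body in memory.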
Handling Errors Properly
Never assume a request succeeds. Always check:
import requests
response = requests.get("https://example.com")
# Option 1: Check status code
if response.status_code == 200:
html = response.text
else:
print(f"Failed: {response.status_code}")
# Option 2: Raise exception on HTTP errors (4xx, 5xx)
response.raise_for_status() # Raises requests.exceptions.HTTPError
Setting Timeouts (Critical!)
Without a timeout, your scraper can hang forever on unresponsive servers:
# Always set a timeout (seconds)
response = requests.get("https://example.com", timeout=10)
# Separate connect and read timeouts
response = requests.get("https://example.com", timeout=(5, 30))
# 5s to connect, 30s to read
Use timeout=10 as a sensible default.
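Note that a timeout surfaces as an exception (requests.exceptions.Timeout), not as a response object, so production code usually wraps the call. A minimal sketch of a helper that turns the common failure modes into a None return:

```python
import requests

def fetch(url, timeout=10):
    """GET a URL; return the response, or None on any request failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # 4xx/5xx -> HTTPError
        return response
    except requests.exceptions.Timeout:
        print(f"Timed out: {url}")
    except requests.exceptions.ConnectionError:
        print(f"Connection failed: {url}")
    except requests.exceptions.RequestException as e:
        # Base class for every exception Requests raises
        print(f"Request failed: {url} ({e})")
    return None
```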
3. HTTP Methods: GET, POST, PUT, DELETE
Most scraping uses GET, but sometimes you need POST (for forms, search queries, or API endpoints):
import requests
# GET — fetch a page
response = requests.get("https://api.example.com/products")
# POST — submit form data
response = requests.post("https://example.com/search", data={
"query": "web scraping",
"page": 1
})
# POST — submit JSON data (API endpoints)
response = requests.post("https://api.example.com/search", json={
"query": "web scraping",
"filters": {"category": "tools"}
})
# PUT — update a resource
response = requests.put("https://api.example.com/items/42", json={
"name": "Updated Item"
})
# DELETE — remove a resource
response = requests.delete("https://api.example.com/items/42")
4. Request Headers & User-Agent Rotation
Servers use headers to identify your client. The default Requests user-agent (python-requests/2.32.x) screams "bot" — and many sites block it immediately.
Setting Custom Headers
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Referer": "https://www.google.com/",
"DNT": "1",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1"
}
response = requests.get("https://example.com", headers=headers, timeout=10)
Rotating User-Agents
import random
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 Safari/605.1.15",
"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
]
def get_random_headers():
return {
"User-Agent": random.choice(USER_AGENTS),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.google.com/",
}
response = requests.get("https://example.com", headers=get_random_headers(), timeout=10)
5. Session Objects & Cookie Persistence
A Session object persists cookies, headers, and connection pools across requests — essential for scraping sites that require login or track sessions:
import requests
session = requests.Session()
# Set default headers for ALL requests in this session
session.headers.update({
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
})
# First request — server sets cookies
response = session.get("https://example.com", timeout=10)
# Subsequent requests automatically include cookies
response = session.get("https://example.com/dashboard", timeout=10)
# Check stored cookies
print(session.cookies.get_dict())
# {'session_id': 'abc123', 'csrf_token': 'xyz789'}
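You can also seed the jar yourself, for example to reuse a login cookie copied from your browser's DevTools. The cookie names and values below are placeholders:

```python
import requests

session = requests.Session()

# Set cookies manually before making any requests
session.cookies.set("session_id", "abc123", domain="example.com")
session.cookies.set("csrf_token", "xyz789", domain="example.com")

print(session.cookies.get_dict())
# {'session_id': 'abc123', 'csrf_token': 'xyz789'}
```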
Login with Session
session = requests.Session()
# Step 1: GET the login page (grab CSRF token)
login_page = session.get("https://example.com/login", timeout=10)
# Parse the CSRF token from the login form (the input name varies by site)
from bs4 import BeautifulSoup
soup = BeautifulSoup(login_page.text, "lxml")
csrf_token = soup.select_one("input[name='csrf_token']")["value"]
# Step 2: POST credentials
login_response = session.post("https://example.com/login", data={
"username": "your_user",
"password": "your_pass",
"csrf_token": csrf_token, # From step 1
}, timeout=10)
# Step 3: Access protected pages (cookies are automatic)
dashboard = session.get("https://example.com/dashboard", timeout=10)
print(dashboard.status_code) # 200 if login succeeded
Use a Session for any multi-step flow like this: cookies set by the server are not persisted across standalone requests.get() calls.
6. Authentication: Basic, Token, OAuth
Basic Authentication
from requests.auth import HTTPBasicAuth
response = requests.get(
"https://api.example.com/data",
auth=HTTPBasicAuth("username", "password"),
timeout=10
)
# Shorthand (tuple)
response = requests.get(
"https://api.example.com/data",
auth=("username", "password"),
timeout=10
)
Bearer Token (API Keys)
headers = {
"Authorization": "Bearer your_api_key_here",
"Content-Type": "application/json"
}
response = requests.get("https://api.example.com/data", headers=headers, timeout=10)
Custom Auth (AuthBase)
For custom schemes (for example, refreshing an OAuth 2.0 token before each request), subclass AuthBase; its __call__ hook runs on every request:
from requests.auth import AuthBase
class TokenAuth(AuthBase):
def __init__(self, token):
self.token = token
def __call__(self, r):
r.headers["Authorization"] = f"Bearer {self.token}"
return r
response = requests.get(
"https://api.example.com/data",
auth=TokenAuth("your_token"),
timeout=10
)
7. Parsing HTML with BeautifulSoup
Requests fetches the HTML. BeautifulSoup parses it. Together, they're the classic scraping duo:
import requests
from bs4 import BeautifulSoup
# Fetch the page
response = requests.get("https://news.ycombinator.com", timeout=10)
response.raise_for_status()
# Parse with BeautifulSoup
soup = BeautifulSoup(response.text, "lxml")
# Extract all story titles
stories = soup.select(".titleline > a")
for story in stories[:10]:
print(f"{story.text} → {story['href']}")
Extracting Structured Data
import requests
from bs4 import BeautifulSoup
import json
response = requests.get("https://example.com/products", timeout=10)
soup = BeautifulSoup(response.text, "lxml")
products = []
for card in soup.select(".product-card"):
product = {
"name": card.select_one(".product-name").text.strip(),
"price": card.select_one(".price").text.strip(),
"url": card.select_one("a")["href"],
"rating": card.select_one(".rating").text.strip() if card.select_one(".rating") else None,
}
products.append(product)
# Save as JSON
with open("products.json", "w") as f:
json.dump(products, f, indent=2)
print(f"Scraped {len(products)} products")
For a deep dive into BeautifulSoup, see our Complete BeautifulSoup Guide.
8. Working with JSON APIs
Many modern websites load data via JSON APIs behind the scenes. Scraping these directly is faster and more reliable than parsing HTML:
import requests
# Hit the API directly
response = requests.get("https://api.example.com/products", params={
"category": "electronics",
"page": 1,
"per_page": 50
}, timeout=10)
data = response.json() # Parse JSON response
for product in data["results"]:
print(f"{product['name']} — ${product['price']}")
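Requests URL-encodes params into the query string for you. If you want to see the final URL without sending anything (handy when debugging a reproduced API call), PreparedRequest can build it:

```python
from requests.models import PreparedRequest

p = PreparedRequest()
p.prepare_url("https://api.example.com/products", {
    "category": "electronics",
    "page": 1,
    "per_page": 50,
})
print(p.url)
# https://api.example.com/products?category=electronics&page=1&per_page=50
```

After a real request, response.request.url holds the URL that was actually sent.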
Finding Hidden APIs
Here's how to discover the API endpoints a website uses:
- Open Chrome DevTools (F12)
- Go to Network tab → filter by XHR/Fetch
- Interact with the page (search, paginate, filter)
- Look at the requests — copy the URL, headers, and payload
- Reproduce the request with Python Requests
# Reproduce a discovered API call
response = requests.get(
"https://example.com/api/v2/search",
params={"q": "laptop", "sort": "price_asc", "page": 1},
headers={
"User-Agent": "Mozilla/5.0 ...",
"X-Requested-With": "XMLHttpRequest", # Often required
"Referer": "https://example.com/search?q=laptop",
},
timeout=10
)
results = response.json()
9. Scraping Paginated Pages
URL-Based Pagination
import requests
from bs4 import BeautifulSoup
import time
all_products = []
for page in range(1, 51): # Pages 1-50
response = requests.get(
f"https://example.com/products?page={page}",
headers=get_random_headers(),
timeout=10
)
if response.status_code != 200:
print(f"Page {page} failed: {response.status_code}")
break
soup = BeautifulSoup(response.text, "lxml")
products = soup.select(".product-card")
if not products: # No more products — we've reached the end
break
for p in products:
all_products.append({
"name": p.select_one(".name").text.strip(),
"price": p.select_one(".price").text.strip(),
})
print(f"Page {page}: {len(products)} products")
time.sleep(1.5) # Be polite — don't hammer the server
print(f"Total: {len(all_products)} products")
API Pagination with Cursors
import requests
all_items = []
cursor = None
while True:
params = {"limit": 100}
if cursor:
params["cursor"] = cursor
response = requests.get(
"https://api.example.com/items",
params=params,
timeout=10
)
data = response.json()
all_items.extend(data["items"])
cursor = data.get("next_cursor")
if not cursor: # No more pages
break
print(f"Total items: {len(all_items)}")
10. Proxy Rotation for Anti-Detection
Using the same IP for thousands of requests will get you blocked. Proxy rotation distributes requests across multiple IPs:
import requests
import random
PROXIES = [
"http://user:pass@proxy1.example.com:8080",
"http://user:pass@proxy2.example.com:8080",
"http://user:pass@proxy3.example.com:8080",
]
def get_with_proxy(url, **kwargs):
proxy = random.choice(PROXIES)
return requests.get(
url,
proxies={"http": proxy, "https": proxy},
timeout=10,
**kwargs
)
response = get_with_proxy("https://example.com")
Proxy with Session
session = requests.Session()
session.proxies = {
"http": "http://user:pass@proxy.example.com:8080",
"https": "http://user:pass@proxy.example.com:8080",
}
# All requests through this session use the proxy
response = session.get("https://example.com", timeout=10)
SOCKS5 Proxy (Tor)
# pip install requests[socks]
import requests
session = requests.Session()
session.proxies = {
"http": "socks5h://127.0.0.1:9050",
"https": "socks5h://127.0.0.1:9050",
}
response = session.get("https://check.torproject.org", timeout=30)
11. Rate Limiting & Retry Logic
Responsible scraping means not overloading servers. Here's a robust retry pattern:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import time
def create_scraping_session(retries=3, backoff=0.5):
session = requests.Session()
retry_strategy = Retry(
total=retries,
backoff_factor=backoff, # 0.5s, 1s, 2s between retries
status_forcelist=[429, 500, 502, 503, 504],
allowed_methods=["GET", "POST"],
)
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10,
pool_maxsize=10,
)
session.mount("http://", adapter)
session.mount("https://", adapter)
return session
# Usage
session = create_scraping_session()
response = session.get("https://example.com", timeout=10)
Respecting Rate Limits
import time
def rate_limited_get(session, url, delay=1.5, **kwargs):
"""GET with rate limiting."""
response = session.get(url, timeout=10, **kwargs)
# Check for rate limit response
if response.status_code == 429:
retry_after = int(response.headers.get("Retry-After", 60))
print(f"Rate limited. Waiting {retry_after}s...")
time.sleep(retry_after)
response = session.get(url, timeout=10, **kwargs)
time.sleep(delay) # Polite delay between requests
return response
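If you'd rather enforce the delay in one place instead of threading it through every call, a small throttle decorator works. This is a sketch, not part of Requests:

```python
import time
import functools

def throttle(min_interval=1.5):
    """Decorator: enforce a minimum delay between calls to the wrapped function."""
    def decorator(fn):
        last_call = [0.0]  # mutable cell the wrapper can update
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@throttle(min_interval=1.5)
def fetch_page(session, url):
    return session.get(url, timeout=10)
```

In a multithreaded scraper, guard last_call with a threading.Lock so workers don't race past the interval.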
12. Concurrent Scraping with ThreadPoolExecutor
Sequential scraping is slow. Use threads to scrape multiple pages simultaneously:
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from bs4 import BeautifulSoup
import time
session = requests.Session()
session.headers.update({
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
})
def scrape_page(url):
"""Scrape a single page and return extracted data."""
try:
response = session.get(url, timeout=15)
response.raise_for_status()
soup = BeautifulSoup(response.text, "lxml")
title = soup.select_one("title").text.strip()
return {"url": url, "title": title, "status": "success"}
except Exception as e:
return {"url": url, "title": None, "status": f"error: {e}"}
# Generate URLs
urls = [f"https://example.com/page/{i}" for i in range(1, 101)]
# Scrape concurrently (10 threads)
results = []
with ThreadPoolExecutor(max_workers=10) as executor:
futures = {executor.submit(scrape_page, url): url for url in urls}
for future in as_completed(futures):
result = future.result()
results.append(result)
if result["status"] == "success":
print(f"✓ {result['url']} — {result['title']}")
else:
print(f"✗ {result['url']} — {result['status']}")
print(f"\nScraped {len(results)} pages")
13. Production-Ready Scraper Class
Here's a complete, reusable scraper class with all best practices built in:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time
import json
import csv
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)
class WebScraper:
"""Production-ready web scraper with retry, rotation, and export."""
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 Safari/605.1.15",
"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
]
def __init__(self, delay=1.5, max_retries=3, timeout=15, max_workers=5):
self.delay = delay
self.timeout = timeout
self.max_workers = max_workers
self.session = self._create_session(max_retries)
self.results = []
def _create_session(self, max_retries):
session = requests.Session()
retry = Retry(
total=max_retries,
backoff_factor=1,
status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry, pool_maxsize=20)
session.mount("http://", adapter)
session.mount("https://", adapter)
return session
def _get_headers(self):
return {
"User-Agent": random.choice(self.USER_AGENTS),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Referer": "https://www.google.com/",
}
def get(self, url):
"""Fetch a URL with rotation and delay."""
response = self.session.get(
url, headers=self._get_headers(), timeout=self.timeout
)
response.raise_for_status()
time.sleep(self.delay + random.uniform(0, 0.5))
return response
def get_soup(self, url):
"""Fetch and parse a URL."""
response = self.get(url)
return BeautifulSoup(response.text, "lxml")
def scrape_urls(self, urls, parse_fn):
"""Scrape multiple URLs concurrently."""
with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
futures = {executor.submit(self._safe_scrape, url, parse_fn): url for url in urls}
for future in as_completed(futures):
result = future.result()
if result:
self.results.append(result)
logger.info(f"Scraped {len(self.results)} items from {len(urls)} URLs")
return self.results
def _safe_scrape(self, url, parse_fn):
try:
soup = self.get_soup(url)
return parse_fn(soup, url)
except Exception as e:
logger.error(f"Failed: {url} — {e}")
return None
def to_json(self, filepath):
with open(filepath, "w") as f:
json.dump(self.results, f, indent=2)
logger.info(f"Saved {len(self.results)} items to {filepath}")
def to_csv(self, filepath):
if not self.results:
return
with open(filepath, "w", newline="") as f:
writer = csv.DictWriter(f, fieldnames=self.results[0].keys())
writer.writeheader()
writer.writerows(self.results)
logger.info(f"Saved {len(self.results)} items to {filepath}")
# --- Usage Example ---
def parse_product(soup, url):
"""Custom parsing function for a product page."""
return {
"url": url,
"title": soup.select_one("h1").text.strip() if soup.select_one("h1") else "N/A",
"price": soup.select_one(".price").text.strip() if soup.select_one(".price") else "N/A",
"description": soup.select_one(".description").text.strip()[:200] if soup.select_one(".description") else "N/A",
}
scraper = WebScraper(delay=1.0, max_workers=5)
urls = [f"https://example.com/product/{i}" for i in range(1, 101)]
results = scraper.scrape_urls(urls, parse_product)
scraper.to_json("products.json")
scraper.to_csv("products.csv")
14. Requests vs httpx vs aiohttp vs API
| Feature | Requests | httpx | aiohttp | Mantis API |
|---|---|---|---|---|
| Async support | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| HTTP/2 | ❌ No | ✅ Yes | ❌ No | ✅ Yes |
| Connection pooling | ✅ Session | ✅ Client | ✅ Connector | ✅ Managed |
| JS rendering | ❌ No | ❌ No | ❌ No | ✅ Yes |
| Anti-bot bypass | ❌ Manual | ❌ Manual | ❌ Manual | ✅ Automatic |
| Proxy management | ❌ Manual | ❌ Manual | ❌ Manual | ✅ Built-in |
| Learning curve | ⭐ Easy | ⭐ Easy | ⭐⭐ Medium | ⭐ Easy |
| Best for | Simple projects | Async + HTTP/2 | High concurrency | Production at scale |
| Monthly cost | $0 + proxies ($100-500) | $0 + proxies ($100-500) | $0 + proxies ($100-500) | $29-299 all-inclusive |
- Need async? → httpx (drop-in replacement with async support)
- Need maximum concurrency? → aiohttp (purpose-built for async HTTP)
- Need JS rendering + anti-detection? → Mantis API (handles everything)
- Simple, synchronous scraping? → Stick with Requests
15. The API Shortcut: Skip HTTP Complexity
Building a production scraper with Requests means managing headers, proxies, retries, rate limits, CAPTCHAs, and anti-bot detection — all yourself. Or you can make one API call:
import requests
# DIY with Requests: 50+ lines of proxy rotation, header management, retry logic
# ...or...
# Mantis API: one call, done
response = requests.post(
"https://api.mantisapi.com/v1/scrape",
headers={"Authorization": "Bearer YOUR_API_KEY"},
json={
"url": "https://example.com/products",
"extract": {
"products": {
"selector": ".product-card",
"fields": {
"name": ".product-name",
"price": ".price",
"rating": ".rating"
}
}
}
}
)
data = response.json()
for product in data["products"]:
print(f"{product['name']} — {product['price']}")
Stop Managing HTTP Infrastructure
Mantis handles proxies, headers, retries, JS rendering, and anti-detection. You write one API call.
Start Free → 100 requests/month
When to Use Requests vs an API
- Use Requests when: Scraping simple, static sites with no anti-bot protection; learning web scraping; budget is $0; less than 1,000 pages/month
- Use an API when: Scraping at scale (10K+ pages/month); sites have anti-bot protection; you need JS rendering; you value your engineering time; production reliability matters
FAQ
Can Requests scrape JavaScript-rendered pages? No. Requests only fetches the raw HTML the server returns. For JavaScript-heavy pages, use Playwright or Selenium, or find the underlying JSON API (Section 8).
How do I avoid getting blocked? Set realistic headers, rotate user-agents and proxies, add delays between requests, and back off when you see 429 responses (Sections 4, 10, and 11).
Is Requests asynchronous? No. For async scraping, use httpx or aiohttp; for simple parallelism with Requests, use ThreadPoolExecutor (Section 12).
Next Steps
- Web Scraping with BeautifulSoup — Deep dive into HTML parsing
- Web Scraping with Scrapy — Full framework for large-scale crawling
- Web Scraping with Playwright — Handle JavaScript-rendered pages
- Web Scraping with Selenium — Browser automation for dynamic sites
- How to Scrape Without Getting Blocked — Anti-detection techniques
- Best Web Scraping APIs Comparison — Find the right tool for your needs