Table of Contents
- Why Scrape Google Search Results?
- What Data Can You Extract?
- Method 1: Python + BeautifulSoup
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Cheerio
- Method 4: Web Scraping API (Easiest)
- Avoiding Google's Anti-Bot Detection
- Google Custom Search API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape Google Search Results?
Google processes over 8.5 billion searches per day. That search data is a goldmine for:
- SEO monitoring – Track your rankings and competitors' positions for target keywords
- Market research – Understand what people search for in your industry
- Lead generation – Find businesses ranking for specific terms
- Content strategy – Discover topics, questions, and gaps to fill
- Ad intelligence – Monitor competitors' paid search campaigns
- Price monitoring – Track shopping results and product prices
- AI agent data – Give AI agents real-time web search capabilities
Whether you're building an SEO tool, a research platform, or an AI agent that needs web search, scraping Google SERPs is a foundational capability.
What Data Can You Extract?
A Google search results page contains multiple data types:
| SERP Feature | Data Available | CSS Selector Hint |
|---|---|---|
| Organic Results | Title, URL, snippet, position | #search .g |
| Featured Snippet | Answer text, source URL | .xpdopen |
| People Also Ask | Questions, expandable answers | .related-question-pair |
| Knowledge Graph | Entity info, images, facts | .kp-wholepage |
| Local Pack | Business name, rating, address | .VkpGBb |
| Image Results | Image URLs, source pages | #imagebox_bigimages |
| Shopping Results | Product, price, store | .commercial-unit-desktop-top |
| Related Searches | Suggested queries | #brs |
Google updates its HTML structure regularly. The selectors above are approximate – always verify against the current page. Using an API that returns structured data (like Mantis) avoids selector maintenance entirely.
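Because these selectors drift, it helps to sanity-check them against a freshly saved SERP page before running a full scrape. The sketch below (the selector strings come from the table above and are approximate; `check_selectors` is a hypothetical helper name) counts how many elements each selector currently matches:

```python
from bs4 import BeautifulSoup

# Approximate selectors from the table above -- verify against a live page
SERP_SELECTORS = {
    "organic": "#search .g",
    "featured_snippet": ".xpdopen",
    "people_also_ask": ".related-question-pair",
    "related_searches": "#brs",
}

def check_selectors(html: str) -> dict:
    """Count how many elements each known selector matches in saved SERP HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.select(sel)) for name, sel in SERP_SELECTORS.items()}

# Example: save a SERP as serp.html in your browser, then:
# with open("serp.html") as f:
#     print(check_selectors(f.read()))
```

A selector whose count drops to zero after a Google layout change is your signal to update it before trusting the scraper's output.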
Method 1: Python + BeautifulSoup
The simplest approach for small-scale SERP scraping. Sends HTTP requests and parses the HTML response.
Installation
```bash
pip install requests beautifulsoup4 lxml
```
Basic Google SERP Scraper
```python
import random  # used by the later snippets for delays and UA rotation
import time
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup


def scrape_google(query, num_results=10):
    """Scrape Google search results for a given query."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    params = {
        "q": query,
        "num": num_results,
        "hl": "en",
        "gl": "us",
    }
    response = requests.get(
        "https://www.google.com/search",
        params=params,
        headers=headers,
        timeout=10,
    )
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "lxml")
    results = []
    for g in soup.select("#search .g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")
        if title_el and link_el:
            href = link_el["href"]
            # Unwrap Google's redirect links (/url?q=<target>&...)
            if href.startswith("/url?q="):
                href = unquote(href.split("/url?q=")[1].split("&")[0])
            results.append({
                "position": len(results) + 1,
                "title": title_el.get_text(),
                "url": href,
                "snippet": snippet_el.get_text() if snippet_el else "",
            })
    return results


# Usage
results = scrape_google("web scraping API 2026")
for r in results:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")
    print(f"   {r['snippet'][:100]}...")
    print()
```
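Google paginates results with the `start` offset parameter (`start=0`, `10`, `20`, …). A small helper (`serp_page_params` is a hypothetical name, not part of any library) builds the parameter dicts for successive pages, which you can feed into the same request loop as above:

```python
def serp_page_params(query, pages=3, per_page=10):
    """Build Google search params for successive result pages via the `start` offset."""
    return [
        {"q": query, "num": per_page, "start": page * per_page, "hl": "en", "gl": "us"}
        for page in range(pages)
    ]

# Feed each dict into requests.get("https://www.google.com/search", params=...),
# sleeping a random few seconds between pages to avoid rate limiting.
```

Deep pagination is where blocks tend to start, so keep `pages` small per IP.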
Adding Proxy Rotation
```python
import itertools
import random

# USER_AGENTS is defined in the anti-detection section below
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
proxy_pool = itertools.cycle(PROXIES)


def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        headers={"User-Agent": random.choice(USER_AGENTS)},
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    # ... parse as before
```
Residential proxies work best for Google. Datacenter IPs get flagged quickly. Budget $50-200/month for a reliable residential proxy pool – or skip proxies entirely by using a scraping API.
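When a proxy does get rate-limited, retrying immediately just burns the next IP in the pool. A simple exponential backoff schedule spaces retries out; this is a generic sketch (tune `base` and `cap` to your volume), not a Google-specific recipe:

```python
import random

def backoff_delays(retries=4, base=2.0, cap=60.0):
    """Exponential backoff with jitter: base * 2^i seconds, capped, plus 0-1s of noise."""
    return [min(base * (2 ** i), cap) + random.random() for i in range(retries)]

# Roughly [2, 4, 8, 16] seconds (plus jitter) before giving up on a query.
```

Sleep through each delay between attempts, rotating to a fresh proxy on every retry.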
Method 2: Playwright (Headless Browser)
When Google requires JavaScript rendering or you need to interact with the page (clicking "People Also Ask", loading more results):
```python
import asyncio
from urllib.parse import quote_plus

from playwright.async_api import async_playwright


async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Navigate to Google (URL-encode the query)
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )

        # Handle consent page (EU/GDPR)
        try:
            await page.click("button:has-text('Accept all')", timeout=3000)
            await page.wait_for_load_state("networkidle")
        except Exception:
            pass  # no consent dialog shown

        # Extract organic results
        results = await page.evaluate("""
            () => {
                const items = document.querySelectorAll('#search .g');
                return Array.from(items).map((el, i) => ({
                    position: i + 1,
                    title: el.querySelector('h3')?.textContent || '',
                    url: el.querySelector('a')?.href || '',
                    snippet: el.querySelector('.VwiC3b, .IsZvec')?.textContent || '',
                })).filter(r => r.title && r.url);
            }
        """)

        # Extract People Also Ask
        paa = await page.evaluate("""
            () => {
                const questions = document.querySelectorAll('.related-question-pair');
                return Array.from(questions).map(q => ({
                    question: q.querySelector('[data-q]')?.getAttribute('data-q')
                        || q.textContent?.trim() || '',
                }));
            }
        """)

        await browser.close()
        return {"organic": results, "people_also_ask": paa}


# Run
data = asyncio.run(scrape_google_playwright("best web scraping tools 2026"))
for r in data["organic"]:
    print(f"{r['position']}. {r['title']}")
```
Method 3: Node.js + Cheerio
Fast, lightweight scraping for Node.js applications:
```javascript
import axios from 'axios';
import * as cheerio from 'cheerio';

async function scrapeGoogle(query, numResults = 10) {
  const { data } = await axios.get('https://www.google.com/search', {
    params: { q: query, num: numResults, hl: 'en', gl: 'us' },
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        + 'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
      'Accept-Language': 'en-US,en;q=0.9',
    },
    timeout: 10000,
  });

  const $ = cheerio.load(data);
  const results = [];

  $('#search .g').each((i, el) => {
    const title = $(el).find('h3').text();
    const url = $(el).find('a').attr('href') || '';
    const snippet = $(el).find('.VwiC3b, .IsZvec').text();
    if (title && url.startsWith('http')) {
      results.push({
        position: results.length + 1,
        title,
        url,
        snippet,
      });
    }
  });

  return results;
}

// Usage (top-level await requires an ES module)
const results = await scrapeGoogle('web scraping API for AI agents');
results.forEach(r => {
  console.log(`${r.position}. ${r.title}`);
  console.log(`   ${r.url}\n`);
});
```
Method 4: Web Scraping API (Easiest)
The simplest production approach – no proxies, no selectors, no maintenance. Send the URL, get structured data back.
With Mantis API
```python
import requests

# Scrape Google and extract structured data in one call
response = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=web+scraping+API+2026&num=10&hl=en",
        "schema": {
            "results": [{
                "position": "number",
                "title": "string",
                "url": "string",
                "snippet": "string",
            }],
            "people_also_ask": ["string"],
            "related_searches": ["string"],
        },
    },
)

data = response.json()
for r in data["results"]:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")
```
Or use the scrape endpoint for raw HTML
```python
# Get the raw HTML - Mantis handles proxies, rendering, anti-blocking
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+API&hl=en",
        "render_js": True,
        "wait_for": "#search",
    },
)

html = response.json()["html"]
# Parse with BeautifulSoup as in Method 1
```
Skip the Proxy Headaches
Mantis handles anti-blocking, proxy rotation, and JavaScript rendering. Extract structured data from any Google SERP in one API call.
View Pricing Try Mantis Free →
Avoiding Google's Anti-Bot Detection
Google is one of the hardest sites to scrape at scale. Here's what works:
| Technique | Effectiveness | Cost |
|---|---|---|
| Rotate User-Agent strings | ★★ Low | Free |
| Random delays (2-10s) | ★★★ Medium | Free (slower) |
| Datacenter proxies | ★★ Low | $10-50/mo |
| Residential proxies | ★★★★ High | $50-300/mo |
| Headless browser + stealth | ★★★ Medium | Server costs |
| Web scraping API | ★★★★★ Highest | $29-299/mo |
Essential Anti-Detection Checklist
- Rotate IPs – Never send more than 10 requests from the same IP in a minute
- Randomize delays – time.sleep(random.uniform(2, 8)) between requests
- Vary User-Agents – Maintain a pool of 20+ current browser UA strings
- Accept cookies – Handle Google's consent page, keep session cookies
- Geo-target correctly – Use &gl=us&hl=en params consistently
- Handle CAPTCHAs gracefully – Detect and rotate IP when you hit one
- Respect robots.txt – Google's robots.txt disallows /search; consider the implications
```python
import random
import time

import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def polite_scrape(queries):
    """Scrape multiple queries with anti-detection measures."""
    results = {}
    for query in queries:
        time.sleep(random.uniform(3, 10))  # random delay between queries
        # To rotate UAs, extend scrape_google() (Method 1) to accept a
        # headers override, e.g. {"User-Agent": random.choice(USER_AGENTS)}
        try:
            data = scrape_google(query)  # from Method 1
            results[query] = data
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                print(f"Rate limited on '{query}' - waiting 60s")
                time.sleep(60)
            else:
                raise
    return results
```
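The checklist's "handle CAPTCHAs gracefully" step needs a way to recognize a block page first. Google's interstitial typically redirects through a /sorry/ path and mentions unusual traffic; the markers below are commonly observed patterns, not a guaranteed contract, so treat this as a heuristic sketch:

```python
# Observed markers of Google's block/CAPTCHA interstitial (may change)
CAPTCHA_MARKERS = (
    "unusual traffic from your computer network",
    "g-recaptcha",
    "/sorry/index",
)

def looks_like_captcha(html: str) -> bool:
    """Heuristic check for Google's block/CAPTCHA interstitial."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

# In a scrape loop:
# if looks_like_captcha(response.text):
#     rotate to the next proxy and retry after a long delay
```

Running this on every response lets you back off and rotate IPs before Google escalates the block.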
Google Custom Search API vs Scraping
Google offers an official Custom Search JSON API. Here's how it compares to scraping:
| Feature | Custom Search API | Scraping | Scraping API (Mantis) |
|---|---|---|---|
| Results source | Custom search engine only | Full Google index | Full Google index |
| Free tier | 100 queries/day | Unlimited (with blocks) | 100 requests/month |
| Cost at 10K/month | $50 | $100-300 (proxies) | $99 (Pro plan) |
| SERP features | Limited (organic only) | All features | All features |
| Maintenance | Low | High (selectors break) | None |
| Rate limits | Strict | Depends on proxies | Plan-based |
| Legal risk | None (authorized) | Medium (ToS violation) | Low (API handles it) |
Google Custom Search API: Best for low-volume, basic search integration (100/day free). Limited to your configured sites.
DIY scraping: Best for learning, prototyping, or when you need full control and have proxy infrastructure.
Web scraping API: Best for production – handles anti-blocking, scales easily, structured data extraction.
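For comparison, here is what the official route looks like in code. The Custom Search JSON API takes an API key (`key`), a search engine ID (`cx`), and the query (`q`), and caps `num` at 10 results per request; the helper names below are our own:

```python
import requests

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_cse_params(query, api_key, cx, num=10):
    """Query params for the Custom Search JSON API (num is capped at 10 per request)."""
    return {"key": api_key, "cx": cx, "q": query, "num": min(num, 10)}

def custom_search(query, api_key, cx):
    """Run one authorized search and normalize the items into title/url/snippet dicts."""
    resp = requests.get(CSE_ENDPOINT, params=build_cse_params(query, api_key, cx), timeout=10)
    resp.raise_for_status()
    return [
        {"title": item.get("title"), "url": item.get("link"), "snippet": item.get("snippet")}
        for item in resp.json().get("items", [])
    ]
```

Paging beyond 10 results means issuing additional requests with a `start` offset, each counting against the daily quota.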
Method Comparison
| Method | Difficulty | Scale | Maintenance | Best For |
|---|---|---|---|---|
| Python + requests | Easy | Low (50-100/day) | High | Learning, prototyping |
| Playwright/Selenium | Medium | Medium (with proxies) | High | JS-rendered content, PAA expansion |
| Node.js + Cheerio | Easy | Low (50-100/day) | High | Node.js codebases |
| Scraping API (Mantis) | Easiest | High (100K/month) | None | Production, AI agents |
| Google Custom Search | Easy | Low (100/day free) | Low | Basic search, limited sites |
Real-World Use Cases
1. SEO Rank Tracking
```python
def track_rankings(domain, keywords):
    """Track where a domain ranks for given keywords."""
    rankings = {}
    for kw in keywords:
        results = scrape_google(kw, num_results=50)  # from Method 1
        for r in results:
            if domain in r["url"]:
                rankings[kw] = r["position"]
                break
        else:
            rankings[kw] = None  # Not in top 50
        time.sleep(random.uniform(5, 15))
    return rankings


# Track your rankings
my_rankings = track_rankings("mantisapi.com", [
    "web scraping API",
    "AI agent web scraping",
    "headless browser API",
])
print(my_rankings)
```
2. Competitor Ad Monitoring
```python
def find_competitor_ads(keywords):
    """Find which competitors are running Google Ads for given keywords."""
    # Use Playwright to capture ad blocks
    # Ads appear in .uEierd or .commercial-unit-desktop-top
    pass  # Full implementation in our GitHub repo
```
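A minimal version of the parsing half of that stub, pulling ad blocks out of saved SERP HTML with BeautifulSoup. The selectors come from the comment above and are approximate; `extract_ads` is a sketch, not the full implementation:

```python
from bs4 import BeautifulSoup

# Approximate ad-block selectors -- verify against a live page
AD_SELECTORS = ".uEierd, .commercial-unit-desktop-top"

def extract_ads(html: str) -> list:
    """Pull ad headlines and destination URLs from saved SERP HTML."""
    soup = BeautifulSoup(html, "html.parser")
    ads = []
    for block in soup.select(AD_SELECTORS):
        link = block.select_one("a[href]")
        ads.append({
            "headline": block.get_text(" ", strip=True)[:120],
            "url": link["href"] if link else "",
        })
    return ads
```

Pair it with the Playwright scraper from Method 2 to capture the rendered page, since ad blocks are often injected client-side.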
3. AI Agent with Search Capability
```python
from urllib.parse import quote_plus

import requests


def agent_search_tool(query: str) -> str:
    """Give an AI agent the ability to search Google."""
    response = requests.post(
        "https://api.mantisapi.com/v1/extract",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "url": f"https://www.google.com/search?q={quote_plus(query)}&hl=en&num=5",
            "schema": {
                "results": [{
                    "title": "string",
                    "url": "string",
                    "snippet": "string",
                }]
            },
        },
    )
    results = response.json()["results"]
    return "\n".join(
        f"- {r['title']}: {r['snippet']} ({r['url']})"
        for r in results[:5]
    )


# Use with LangChain, CrewAI, or any agent framework
# tool = Tool(name="google_search", func=agent_search_tool, ...)
```
Legal Considerations
Important points about scraping Google:
- Terms of Service: Google's ToS prohibits automated access to search results. Violating ToS isn't necessarily illegal but can result in IP bans.
- hiQ v. LinkedIn (2022): The US Ninth Circuit ruled that scraping publicly available data doesn't violate the CFAA. This sets a favorable precedent for public web scraping.
- GDPR/CCPA: If you're collecting personal data from search results, data protection laws apply.
- robots.txt: Google's robots.txt disallows /search. While not legally binding, respecting it shows good faith.
- Safe approach: Use Google's official API for authorized access, or a web scraping API that handles compliance.
This article is for educational purposes. Consult a lawyer for your specific use case, especially for commercial scraping at scale.
Frequently Asked Questions
Is it legal to scrape Google search results?
Scraping publicly available data is generally legal under the hiQ v. LinkedIn precedent, but Google's ToS prohibit automated access. Using an API-based approach reduces legal risk. Consult legal counsel for commercial use.
How do I scrape Google without getting blocked?
Rotate IPs (residential proxies), randomize delays (2-10s), rotate User-Agents, and limit volume. Or use a web scraping API like Mantis that handles anti-blocking automatically.
What's the best Python library for scraping Google?
For simple scraping: requests + BeautifulSoup. For JS rendering: Playwright. For production: a web scraping API.
How many results can I scrape per day?
Without proxies: 50-100 before blocks. With residential proxies: thousands. With Mantis: up to 100,000/month on the Scale plan.
Ready to Scrape Google at Scale?
Start with 100 free API calls/month. No proxies, no CAPTCHAs, no broken selectors.
View Pricing Get Started Free →