🔱 Mantis

How to Scrape Google Search Results in 2026

Extract titles, URLs, snippets, and SERP features using Python, Node.js, and API-based approaches, with production-ready code.

Why Scrape Google Search Results?

Google processes over 8.5 billion searches per day, and that search data is a goldmine. Whether you're building an SEO tool, a research platform, or an AI agent that needs web search, scraping Google SERPs is a foundational capability.

What Data Can You Extract?

A Google search results page contains multiple data types:

| SERP Feature | Data Available | CSS Selector Hint |
|---|---|---|
| Organic Results | Title, URL, snippet, position | #search .g |
| Featured Snippet | Answer text, source URL | .xpdopen |
| People Also Ask | Questions, expandable answers | .related-question-pair |
| Knowledge Graph | Entity info, images, facts | .kp-wholepage |
| Local Pack | Business name, rating, address | .VkpGBb |
| Image Results | Image URLs, source pages | #imagebox_bigimages |
| Shopping Results | Product, price, store | .commercial-unit-desktop-top |
| Related Searches | Suggested queries | #brs |

⚠️ Selectors Change Frequently

Google updates its HTML structure regularly. The selectors above are approximate, so always verify against the current page. Using an API that returns structured data (like Mantis) avoids selector maintenance entirely.

Method 1: Python + BeautifulSoup

The simplest approach for small-scale SERP scraping. Sends HTTP requests and parses the HTML response.

Installation

pip install requests beautifulsoup4 lxml

Basic Google SERP Scraper

import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_google(query, num_results=10):
    """Scrape Google search results for a given query."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }

    params = {
        "q": query,
        "num": num_results,
        "hl": "en",
        "gl": "us",
    }

    response = requests.get(
        "https://www.google.com/search",
        params=params,
        headers=headers,
        timeout=10,
    )
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "lxml")
    results = []

    for g in soup.select("#search .g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")

        if title_el and link_el:
            href = link_el["href"]
            if href.startswith("/url?q="):
                href = href.split("/url?q=")[1].split("&")[0]

            results.append({
                "position": len(results) + 1,
                "title": title_el.get_text(),
                "url": href,
                "snippet": snippet_el.get_text() if snippet_el else "",
            })

    return results

# Usage
results = scrape_google("web scraping API 2026")
for r in results:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")
    print(f"   {r['snippet'][:100]}...")
    print()
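Google paginates results with the start offset parameter, so deeper pages can be fetched by incrementing it. A minimal sketch of building paginated SERP URLs (the helper name is my own):

```python
from urllib.parse import urlencode

def build_serp_urls(query, pages=3, per_page=10):
    """Build paginated Google SERP URLs using the start offset parameter."""
    urls = []
    for page in range(pages):
        params = {
            "q": query,
            "num": per_page,
            "start": page * per_page,  # index of the first result on this page
            "hl": "en",
            "gl": "us",
        }
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls
```

Feed each URL through scrape_google-style parsing, keeping the random delays from the anti-detection section between page fetches.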

Adding Proxy Rotation

import itertools

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
proxy_pool = itertools.cycle(PROXIES)

def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        # USER_AGENTS is the pool defined in the anti-detection section below
        headers={"User-Agent": random.choice(USER_AGENTS)},
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    # ... parse as before

🔑 Pro Tip

Residential proxies work best for Google. Datacenter IPs get flagged quickly. Budget $50-200/month for a reliable residential proxy pool, or skip proxies entirely by using a scraping API.
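Blind round-robin keeps sending traffic through proxies Google has already burned. A sketch of a pool that retires a proxy after repeated failures (the ProxyPool class and thresholds are illustrative, not from any library):

```python
class ProxyPool:
    """Round-robin proxy pool that retires proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.failures = {p: 0 for p in self.proxies}
        self.max_failures = max_failures
        self._i = 0

    def get(self):
        """Return the next live proxy, cycling through the pool."""
        if not self.proxies:
            raise RuntimeError("all proxies retired")
        proxy = self.proxies[self._i % len(self.proxies)]
        self._i += 1
        return proxy

    def report_failure(self, proxy):
        """Count a failure; drop the proxy once it hits the threshold."""
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.proxies:
            self.proxies.remove(proxy)

    def report_success(self, proxy):
        """Reset the failure count on a clean response."""
        self.failures[proxy] = 0
```

Call report_failure when a request times out or returns a block page, and report_success on a clean response.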

Method 2: Playwright (Headless Browser)

When Google requires JavaScript rendering or you need to interact with the page (clicking "People Also Ask", loading more results):

import asyncio
from urllib.parse import quote_plus

from playwright.async_api import async_playwright

async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Navigate to Google (URL-encode the query)
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )

        # Handle consent page (EU/GDPR)
        try:
            await page.click("button:has-text('Accept all')", timeout=3000)
            await page.wait_for_load_state("networkidle")
        except Exception:
            pass

        # Extract organic results
        results = await page.evaluate("""
            () => {
                const items = document.querySelectorAll('#search .g');
                return Array.from(items).map((el, i) => ({
                    position: i + 1,
                    title: el.querySelector('h3')?.textContent || '',
                    url: el.querySelector('a')?.href || '',
                    snippet: el.querySelector('.VwiC3b, .IsZvec')?.textContent || '',
                })).filter(r => r.title && r.url);
            }
        """)

        # Extract People Also Ask
        paa = await page.evaluate("""
            () => {
                const questions = document.querySelectorAll('.related-question-pair');
                return Array.from(questions).map(q => ({
                    question: q.querySelector('[data-q]')?.getAttribute('data-q')
                              || q.textContent?.trim() || '',
                }));
            }
        """)

        await browser.close()
        return {"organic": results, "people_also_ask": paa}

# Run
data = asyncio.run(scrape_google_playwright("best web scraping tools 2026"))
for r in data["organic"]:
    print(f"{r['position']}. {r['title']}")
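Whichever method you use, raw result URLs often carry fragments and tracking parameters that break deduplication and rank comparisons. A small normalization sketch (filtering utm_ parameters is a common convention; adjust to taste):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Normalize a result URL for dedup: lowercase host, drop fragment and utm_ params."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(query), ""))
```

Run every scraped URL through this before comparing result sets across pages or methods.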

Method 3: Node.js + Cheerio

Fast, lightweight scraping for Node.js applications:

import axios from 'axios';
import * as cheerio from 'cheerio';

async function scrapeGoogle(query, numResults = 10) {
  const { data } = await axios.get('https://www.google.com/search', {
    params: { q: query, num: numResults, hl: 'en', gl: 'us' },
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        + 'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
      'Accept-Language': 'en-US,en;q=0.9',
    },
    timeout: 10000,
  });

  const $ = cheerio.load(data);
  const results = [];

  $('#search .g').each((i, el) => {
    const title = $(el).find('h3').text();
    const url = $(el).find('a').attr('href') || '';
    const snippet = $(el).find('.VwiC3b, .IsZvec').text();

    if (title && url.startsWith('http')) {
      results.push({
        position: results.length + 1,
        title,
        url,
        snippet,
      });
    }
  });

  return results;
}

// Usage
const results = await scrapeGoogle('web scraping API for AI agents');
results.forEach(r => {
  console.log(`${r.position}. ${r.title}`);
  console.log(`   ${r.url}\n`);
});

Method 4: Web Scraping API (Easiest)

The simplest production approach: no proxies, no selectors, no maintenance. Send the URL, get structured data back.

With Mantis API

import requests

# Scrape Google and extract structured data in one call
response = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=web+scraping+API+2026&num=10&hl=en",
        "schema": {
            "results": [{
                "position": "number",
                "title": "string",
                "url": "string",
                "snippet": "string",
            }],
            "people_also_ask": ["string"],
            "related_searches": ["string"],
        },
    },
)

data = response.json()
for r in data["results"]:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")

Or use the scrape endpoint for raw HTML

# Get the raw HTML; Mantis handles proxies, rendering, anti-blocking
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+API&hl=en",
        "render_js": True,
        "wait_for": "#search",
    },
)

html = response.json()["html"]
# Parse with BeautifulSoup as in Method 1

Skip the Proxy Headaches

Mantis handles anti-blocking, proxy rotation, and JavaScript rendering. Extract structured data from any Google SERP in one API call.

View Pricing | Try Mantis Free →

Avoiding Google's Anti-Bot Detection

Google is one of the hardest sites to scrape at scale. Here's what works:

| Technique | Effectiveness | Cost |
|---|---|---|
| Rotate User-Agent strings | ⭐⭐ Low | Free |
| Random delays (2-10s) | ⭐⭐⭐ Medium | Free (slower) |
| Datacenter proxies | ⭐⭐ Low | $10-50/mo |
| Residential proxies | ⭐⭐⭐⭐ High | $50-300/mo |
| Headless browser + stealth | ⭐⭐⭐ Medium | Server costs |
| Web scraping API | ⭐⭐⭐⭐⭐ Highest | $29-299/mo |
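Fixed delays can be improved with exponential backoff plus jitter once you start seeing 429s or CAPTCHAs. A sketch (the constants are illustrative):

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Sleep for backoff_delay(attempt) after each consecutive failure and reset attempt to zero on success.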

Essential Anti-Detection Checklist

  1. Rotate IPs: Never send more than 10 requests from the same IP in a minute
  2. Randomize delays: time.sleep(random.uniform(2, 8)) between requests
  3. Vary User-Agents: Maintain a pool of 20+ current browser UA strings
  4. Accept cookies: Handle Google's consent page, keep session cookies
  5. Geo-target correctly: Use &gl=us&hl=en params consistently
  6. Handle CAPTCHAs gracefully: Detect them and rotate the IP when you hit one
  7. Respect robots.txt: Google's robots.txt disallows /search; consider the implications

import random
import time

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def polite_scrape(queries):
    """Scrape multiple queries with anti-detection measures."""
    results = {}
    for query in queries:
        time.sleep(random.uniform(3, 10))  # Random delay
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            data = scrape_google(query)  # from Method 1
            results[query] = data
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                print(f"Rate limited on '{query}'; waiting 60s")
                time.sleep(60)
            else:
                raise
    return results
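The 429 handler above only catches explicit HTTP errors; Google frequently serves its block or CAPTCHA interstitial with a 200 status. A heuristic check (the marker strings reflect commonly reported block pages and may change):

```python
BLOCK_MARKERS = (
    "unusual traffic from your computer network",  # rate-limit interstitial text
    "/sorry/index",                                # CAPTCHA redirect path
    'id="captcha-form"',
)

def looks_blocked(html: str) -> bool:
    """Heuristically detect Google's block/CAPTCHA page in a 200 response body."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

If looks_blocked(response.text) is true, rotate the proxy and back off before retrying instead of parsing an empty result set.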

Google Custom Search API vs Scraping

Google offers an official Custom Search JSON API. Here's how it compares to scraping:

| Feature | Custom Search API | Scraping | Scraping API (Mantis) |
|---|---|---|---|
| Results source | Custom search engine only | Full Google index | Full Google index |
| Free tier | 100 queries/day | Unlimited (with blocks) | 100 requests/month |
| Cost at 10K/month | $50 | $100-300 (proxies) | $99 (Pro plan) |
| SERP features | Limited (organic only) | All features | All features |
| Maintenance | Low | High (selectors break) | None |
| Rate limits | Strict | Depends on proxies | Plan-based |
| Legal risk | None (authorized) | Medium (ToS violation) | Low (API handles it) |

💡 When to Use Each

Google Custom Search API: Best for low-volume, basic search integration (100/day free). Limited to your configured sites.
DIY scraping: Best for learning, prototyping, or when you need full control and have proxy infrastructure.
Web scraping API: Best for production; handles anti-blocking, scales easily, and extracts structured data.

Method Comparison

| Method | Difficulty | Scale | Maintenance | Best For |
|---|---|---|---|---|
| Python + requests | Easy | Low (50-100/day) | High | Learning, prototyping |
| Playwright/Selenium | Medium | Medium (with proxies) | High | JS-rendered content, PAA expansion |
| Node.js + Cheerio | Easy | Low (50-100/day) | High | Node.js codebases |
| Scraping API (Mantis) | Easiest | High (100K/month) | None | Production, AI agents |
| Google Custom Search | Easy | Low (100/day free) | Low | Basic search, limited sites |

Real-World Use Cases

1. SEO Rank Tracking

from urllib.parse import urlparse

def track_rankings(domain, keywords):
    """Track where a domain ranks for given keywords."""
    rankings = {}
    for kw in keywords:
        results = scrape_google(kw, num_results=50)
        for r in results:
            # Match the hostname, not a substring of the full URL,
            # so "example.com" doesn't match "notexample.com"
            host = urlparse(r["url"]).netloc.lower()
            if host == domain or host.endswith("." + domain):
                rankings[kw] = r["position"]
                break
        else:
            rankings[kw] = None  # Not in top 50
        time.sleep(random.uniform(5, 15))
    return rankings

# Track your rankings
my_rankings = track_rankings("mantisapi.com", [
    "web scraping API",
    "AI agent web scraping",
    "headless browser API",
])
print(my_rankings)
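Rank tracking pays off over time: comparing two snapshots shows movement per keyword. A sketch (the function name is my own; a negative delta means the ranking improved):

```python
def ranking_deltas(previous, current):
    """Compare two {keyword: position} snapshots from track_rankings."""
    deltas = {}
    for kw, pos in current.items():
        prev = previous.get(kw)
        if prev is None or pos is None:
            deltas[kw] = None  # new keyword, or unranked in one snapshot
        else:
            deltas[kw] = pos - prev  # negative = moved up
    return deltas
```

Persist each run's rankings dict (e.g. as JSON with a timestamp) and diff consecutive runs to build a movement report.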

2. Competitor Ad Monitoring

def find_competitor_ads(keywords):
    """Find which competitors are running Google Ads for given keywords."""
    # Use Playwright to capture ad blocks
    # Ads appear in .uEierd or .commercial-unit-desktop-top
    pass  # Full implementation in our GitHub repo

3. AI Agent with Search Capability

import requests
from urllib.parse import quote_plus

def agent_search_tool(query: str) -> str:
    """Give an AI agent the ability to search Google."""
    response = requests.post(
        "https://api.mantisapi.com/v1/extract",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            # URL-encode the query so spaces and symbols survive
            "url": f"https://www.google.com/search?q={quote_plus(query)}&hl=en&num=5",
            "schema": {
                "results": [{
                    "title": "string",
                    "url": "string",
                    "snippet": "string",
                }]
            },
        },
    )
    results = response.json()["results"]
    return "\n".join(
        f"- {r['title']}: {r['snippet']} ({r['url']})"
        for r in results[:5]
    )

# Use with LangChain, CrewAI, or any agent framework
# tool = Tool(name="google_search", func=agent_search_tool, ...)

A final note on the legal side of scraping Google:

⚖️ Not Legal Advice

This article is for educational purposes. Consult a lawyer for your specific use case, especially for commercial scraping at scale.

Frequently Asked Questions

Is it legal to scrape Google search results?

Scraping publicly available data is generally legal under the hiQ v. LinkedIn precedent, but Google's ToS prohibit automated access. Using an API-based approach reduces legal risk. Consult legal counsel for commercial use.

How do I scrape Google without getting blocked?

Rotate IPs (residential proxies), randomize delays (2-10s), rotate User-Agents, and limit volume. Or use a web scraping API like Mantis that handles anti-blocking automatically.

What's the best Python library for scraping Google?

For simple scraping: requests + BeautifulSoup. For JS rendering: Playwright. For production: a web scraping API.

How many results can I scrape per day?

Without proxies: 50-100 before blocks. With residential proxies: thousands. With Mantis: up to 100,000/month on the Scale plan.

Ready to Scrape Google at Scale?

Start with 100 free API calls/month. No proxies, no CAPTCHAs, no broken selectors.

View Pricing | Get Started Free →
