Table of Contents
- Why Scrape Google Search Results?
- What Data Can You Extract?
- Method 1: Python + BeautifulSoup
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Cheerio
- Method 4: Web Scraping API (Easiest)
- Avoiding Google's Anti-Bot Detection
- Google Custom Search API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape Google Search Results?
Google processes over 8.5 billion searches per day. That search data is a goldmine for:
- SEO monitoring – Track your rankings and competitors' positions for target keywords
- Market research – Understand what people search for in your industry
- Lead generation – Find businesses ranking for specific terms
- Content strategy – Discover topics, questions, and gaps to fill
- Ad intelligence – Monitor competitors' paid search campaigns
- Price monitoring – Track shopping results and product prices
- AI agent data – Give AI agents real-time web search capabilities
Whether you're building an SEO tool, a research platform, or an AI agent that needs web search, scraping Google SERPs is a foundational capability.
What Data Can You Extract?
A Google search results page contains multiple data types:
| SERP Feature | Data Available | CSS Selector Hint |
|---|---|---|
| Organic Results | Title, URL, snippet, position | #search .g |
| Featured Snippet | Answer text, source URL | .xpdopen |
| People Also Ask | Questions, expandable answers | .related-question-pair |
| Knowledge Graph | Entity info, images, facts | .kp-wholepage |
| Local Pack | Business name, rating, address | .VkpGBb |
| Image Results | Image URLs, source pages | #imagebox_bigimages |
| Shopping Results | Product, price, store | .commercial-unit-desktop-top |
| Related Searches | Suggested queries | #brs |
Google updates its HTML structure regularly. The selectors above are approximate – always verify against the current page. Using an API that returns structured data (like Mantis) avoids selector maintenance entirely.
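Because these selectors drift, it helps to sanity-check them against a freshly saved SERP page before running a full scrape. The sketch below (the selector strings come from the table above and are approximate; `check_selectors` is a hypothetical helper name) counts how many elements each selector currently matches:

```python
from bs4 import BeautifulSoup

# Approximate selectors from the table above -- verify against a live page
SERP_SELECTORS = {
    "organic": "#search .g",
    "featured_snippet": ".xpdopen",
    "people_also_ask": ".related-question-pair",
    "related_searches": "#brs",
}

def check_selectors(html: str) -> dict:
    """Count how many elements each known selector matches in saved SERP HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.select(sel)) for name, sel in SERP_SELECTORS.items()}

# Example: save a SERP as serp.html in your browser, then:
# with open("serp.html") as f:
#     print(check_selectors(f.read()))
```

A selector whose count drops to zero after a Google layout change is your signal to update it before trusting the scraper's output.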
Method 1: Python + BeautifulSoup
The simplest approach for small-scale SERP scraping. Sends HTTP requests and parses the HTML response.
Installation
```bash
pip install requests beautifulsoup4 lxml
```
Basic Google SERP Scraper
```python
import random  # used by the later snippets for delays and UA rotation
import time
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup


def scrape_google(query, num_results=10):
    """Scrape Google search results for a given query."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    params = {
        "q": query,
        "num": num_results,
        "hl": "en",
        "gl": "us",
    }
    response = requests.get(
        "https://www.google.com/search",
        params=params,
        headers=headers,
        timeout=10,
    )
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "lxml")
    results = []
    for g in soup.select("#search .g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")
        if title_el and link_el:
            href = link_el["href"]
            # Unwrap Google's redirect links (/url?q=<target>&...)
            if href.startswith("/url?q="):
                href = unquote(href.split("/url?q=")[1].split("&")[0])
            results.append({
                "position": len(results) + 1,
                "title": title_el.get_text(),
                "url": href,
                "snippet": snippet_el.get_text() if snippet_el else "",
            })
    return results


# Usage
results = scrape_google("web scraping API 2026")
for r in results:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")
    print(f"   {r['snippet'][:100]}...")
    print()
```
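Google paginates results with the `start` offset parameter (`start=0`, `10`, `20`, …). A small helper (`serp_page_params` is a hypothetical name, not part of any library) builds the parameter dicts for successive pages, which you can feed into the same request loop as above:

```python
def serp_page_params(query, pages=3, per_page=10):
    """Build Google search params for successive result pages via the `start` offset."""
    return [
        {"q": query, "num": per_page, "start": page * per_page, "hl": "en", "gl": "us"}
        for page in range(pages)
    ]

# Feed each dict into requests.get("https://www.google.com/search", params=...),
# sleeping a random few seconds between pages to avoid rate limiting.
```

Deep pagination is where blocks tend to start, so keep `pages` small per IP.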
Adding Proxy Rotation
```python
import itertools
import random

# USER_AGENTS is defined in the anti-detection section below
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
proxy_pool = itertools.cycle(PROXIES)


def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        headers={"User-Agent": random.choice(USER_AGENTS)},
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    # ... parse as before
```
Residential proxies work best for Google. Datacenter IPs get flagged quickly. Budget $50-200/month for a reliable residential proxy pool – or skip proxies entirely by using a scraping API.
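When a proxy does get rate-limited, retrying immediately just burns the next IP in the pool. A simple exponential backoff schedule spaces retries out; this is a generic sketch (tune `base` and `cap` to your volume), not a Google-specific recipe:

```python
import random

def backoff_delays(retries=4, base=2.0, cap=60.0):
    """Exponential backoff with jitter: base * 2^i seconds, capped, plus 0-1s of noise."""
    return [min(base * (2 ** i), cap) + random.random() for i in range(retries)]

# Roughly [2, 4, 8, 16] seconds (plus jitter) before giving up on a query.
```

Sleep through each delay between attempts, rotating to a fresh proxy on every retry.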
Method 2: Playwright (Headless Browser)
When Google requires JavaScript rendering or you need to interact with the page (clicking "People Also Ask", loading more results):
```python
import asyncio
from urllib.parse import quote_plus

from playwright.async_api import async_playwright


async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Navigate to Google (URL-encode the query)
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )

        # Handle consent page (EU/GDPR)
        try:
            await page.click("button:has-text('Accept all')", timeout=3000)
            await page.wait_for_load_state("networkidle")
        except Exception:
            pass  # no consent dialog shown

        # Extract organic results
        results = await page.evaluate("""
            () => {
                const items = document.querySelectorAll('#search .g');
                return Array.from(items).map((el, i) => ({
                    position: i + 1,
                    title: el.querySelector('h3')?.textContent || '',
                    url: el.querySelector('a')?.href || '',
                    snippet: el.querySelector('.VwiC3b, .IsZvec')?.textContent || '',
                })).filter(r => r.title && r.url);
            }
        """)

        # Extract People Also Ask
        paa = await page.evaluate("""
            () => {
                const questions = document.querySelectorAll('.related-question-pair');
                return Array.from(questions).map(q => ({
                    question: q.querySelector('[data-q]')?.getAttribute('data-q')
                        || q.textContent?.trim() || '',
                }));
            }
        """)

        await browser.close()
        return {"organic": results, "people_also_ask": paa}


# Run
data = asyncio.run(scrape_google_playwright("best web scraping tools 2026"))
for r in data["organic"]:
    print(f"{r['position']}. {r['title']}")
```
Method 3: Node.js + Cheerio
Fast, lightweight scraping for Node.js applications:
```javascript
import axios from 'axios';
import * as cheerio from 'cheerio';

async function scrapeGoogle(query, numResults = 10) {
  const { data } = await axios.get('https://www.google.com/search', {
    params: { q: query, num: numResults, hl: 'en', gl: 'us' },
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        + 'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
      'Accept-Language': 'en-US,en;q=0.9',
    },
    timeout: 10000,
  });

  const $ = cheerio.load(data);
  const results = [];

  $('#search .g').each((i, el) => {
    const title = $(el).find('h3').text();
    const url = $(el).find('a').attr('href') || '';
    const snippet = $(el).find('.VwiC3b, .IsZvec').text();
    if (title && url.startsWith('http')) {
      results.push({
        position: results.length + 1,
        title,
        url,
        snippet,
      });
    }
  });

  return results;
}

// Usage (top-level await requires an ES module)
const results = await scrapeGoogle('web scraping API for AI agents');
results.forEach(r => {
  console.log(`${r.position}. ${r.title}`);
  console.log(`   ${r.url}\n`);
});
```
Method 4: Web Scraping API (Easiest)
The simplest production approach – no proxies, no selectors, no maintenance. Send the URL, get structured data back.
With Mantis API
```python
import requests

# Scrape Google and extract structured data in one call
response = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=web+scraping+API+2026&num=10&hl=en",
        "schema": {
            "results": [{
                "position": "number",
                "title": "string",
                "url": "string",
                "snippet": "string",
            }],
            "people_also_ask": ["string"],
            "related_searches": ["string"],
        },
    },
)

data = response.json()
for r in data["results"]:
    print(f"{r['position']}. {r['title']}")
    print(f"   {r['url']}")
```
Or use the scrape endpoint for raw HTML
```python
# Get the raw HTML - Mantis handles proxies, rendering, anti-blocking
response = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://www.google.com/search?q=web+scraping+API&hl=en",
        "render_js": True,
        "wait_for": "#search",
    },
)

html = response.json()["html"]
# Parse with BeautifulSoup as in Method 1
```
Skip the Proxy Headaches
Mantis handles anti-blocking, proxy rotation, and JavaScript rendering. Extract structured data from any Google SERP in one API call.
View Pricing Try Mantis Free →
Avoiding Google's Anti-Bot Detection
Google is one of the hardest sites to scrape at scale. Here's what works:
| Technique | Effectiveness | Cost |
|---|---|---|
| Rotate User-Agent strings | ★★ Low | Free |
| Random delays (2-10s) | ★★★ Medium | Free (slower) |
| Datacenter proxies | ★★ Low | $10-50/mo |
| Residential proxies | ★★★★ High | $50-300/mo |
| Headless browser + stealth | ★★★ Medium | Server costs |
| Web scraping API | ★★★★★ Highest | $29-299/mo |
Essential Anti-Detection Checklist
- Rotate IPs – Never send more than 10 requests from the same IP in a minute
- Randomize delays – time.sleep(random.uniform(2, 8)) between requests
- Vary User-Agents – Maintain a pool of 20+ current browser UA strings
- Accept cookies – Handle Google's consent page, keep session cookies
- Geo-target correctly – Use &gl=us&hl=en params consistently
- Handle CAPTCHAs gracefully – Detect and rotate IP when you hit one
- Respect robots.txt – Google's robots.txt disallows /search; consider the implications
```python
import random
import time

import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def polite_scrape(queries):
    """Scrape multiple queries with anti-detection measures."""
    results = {}
    for query in queries:
        time.sleep(random.uniform(3, 10))  # random delay between queries
        # To rotate UAs, extend scrape_google() (Method 1) to accept a
        # headers override, e.g. {"User-Agent": random.choice(USER_AGENTS)}
        try:
            data = scrape_google(query)  # from Method 1
            results[query] = data
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                print(f"Rate limited on '{query}' - waiting 60s")
                time.sleep(60)
            else:
                raise
    return results
```
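The checklist's "handle CAPTCHAs gracefully" step needs a way to recognize a block page first. Google's interstitial typically redirects through a /sorry/ path and mentions unusual traffic; the markers below are commonly observed patterns, not a guaranteed contract, so treat this as a heuristic sketch:

```python
# Observed markers of Google's block/CAPTCHA interstitial (may change)
CAPTCHA_MARKERS = (
    "unusual traffic from your computer network",
    "g-recaptcha",
    "/sorry/index",
)

def looks_like_captcha(html: str) -> bool:
    """Heuristic check for Google's block/CAPTCHA interstitial."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

# In a scrape loop:
# if looks_like_captcha(response.text):
#     rotate to the next proxy and retry after a long delay
```

Running this on every response lets you back off and rotate IPs before Google escalates the block.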
Google Custom Search API vs Scraping
Google offers an official Custom Search JSON API. Here's how it compares to scraping:
| Feature | Custom Search API | Scraping | Scraping API (Mantis) |
|---|---|---|---|
| Results source | Custom search engine only | Full Google index | Full Google index |
| Free tier | 100 queries/day | Unlimited (with blocks) | 100 requests/month |
| Cost at 10K/month | $50 | $100-300 (proxies) | $99 (Pro plan) |
| SERP features | Limited (organic only) | All features | All features |
| Maintenance | Low | High (selectors break) | None |
| Rate limits | Strict | Depends on proxies | Plan-based |
| Legal risk | None (authorized) | Medium (ToS violation) | Low (API handles it) |
Google Custom Search API: Best for low-volume, basic search integration (100/day free). Limited to your configured sites.
DIY scraping: Best for learning, prototyping, or when you need full control and have proxy infrastructure.
Web scraping API: Best for production – handles anti-blocking, scales easily, structured data extraction.
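For comparison, here is what the official route looks like in code. The Custom Search JSON API takes an API key (`key`), a search engine ID (`cx`), and the query (`q`), and caps `num` at 10 results per request; the helper names below are our own:

```python
import requests

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_cse_params(query, api_key, cx, num=10):
    """Query params for the Custom Search JSON API (num is capped at 10 per request)."""
    return {"key": api_key, "cx": cx, "q": query, "num": min(num, 10)}

def custom_search(query, api_key, cx):
    """Run one authorized search and normalize the items into title/url/snippet dicts."""
    resp = requests.get(CSE_ENDPOINT, params=build_cse_params(query, api_key, cx), timeout=10)
    resp.raise_for_status()
    return [
        {"title": item.get("title"), "url": item.get("link"), "snippet": item.get("snippet")}
        for item in resp.json().get("items", [])
    ]
```

Paging beyond 10 results means issuing additional requests with a `start` offset, each counting against the daily quota.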
Method Comparison
| Method | Difficulty | Scale | Maintenance | Best For |
|---|---|---|---|---|
| Python + requests | Easy | Low (50-100/day) | High | Learning, prototyping |
| Playwright/Selenium | Medium | Medium (with proxies) | High | JS-rendered content, PAA expansion |
| Node.js + Cheerio | Easy | Low (50-100/day) | High | Node.js codebases |
| Scraping API (Mantis) | Easiest | High (100K/month) | None | Production, AI agents |
| Google Custom Search | Easy | Low (100/day free) | Low | Basic search, limited sites |
Real-World Use Cases
1. SEO Rank Tracking
```python
def track_rankings(domain, keywords):
    """Track where a domain ranks for given keywords."""
    rankings = {}
    for kw in keywords:
        results = scrape_google(kw, num_results=50)  # from Method 1
        for r in results:
            if domain in r["url"]:
                rankings[kw] = r["position"]
                break
        else:
            rankings[kw] = None  # Not in top 50
        time.sleep(random.uniform(5, 15))
    return rankings


# Track your rankings
my_rankings = track_rankings("mantisapi.com", [
    "web scraping API",
    "AI agent web scraping",
    "headless browser API",
])
print(my_rankings)
```
2. Competitor Ad Monitoring
```python
def find_competitor_ads(keywords):
    """Find which competitors are running Google Ads for given keywords."""
    # Use Playwright to capture ad blocks
    # Ads appear in .uEierd or .commercial-unit-desktop-top
    pass  # Full implementation in our GitHub repo
```
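A minimal version of the parsing half of that stub, pulling ad blocks out of saved SERP HTML with BeautifulSoup. The selectors come from the comment above and are approximate; `extract_ads` is a sketch, not the full implementation:

```python
from bs4 import BeautifulSoup

# Approximate ad-block selectors -- verify against a live page
AD_SELECTORS = ".uEierd, .commercial-unit-desktop-top"

def extract_ads(html: str) -> list:
    """Pull ad headlines and destination URLs from saved SERP HTML."""
    soup = BeautifulSoup(html, "html.parser")
    ads = []
    for block in soup.select(AD_SELECTORS):
        link = block.select_one("a[href]")
        ads.append({
            "headline": block.get_text(" ", strip=True)[:120],
            "url": link["href"] if link else "",
        })
    return ads
```

Pair it with the Playwright scraper from Method 2 to capture the rendered page, since ad blocks are often injected client-side.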
3. AI Agent with Search Capability
```python
from urllib.parse import quote_plus

import requests


def agent_search_tool(query: str) -> str:
    """Give an AI agent the ability to search Google."""
    response = requests.post(
        "https://api.mantisapi.com/v1/extract",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "url": f"https://www.google.com/search?q={quote_plus(query)}&hl=en&num=5",
            "schema": {
                "results": [{
                    "title": "string",
                    "url": "string",
                    "snippet": "string",
                }]
            },
        },
    )
    results = response.json()["results"]
    return "\n".join(
        f"- {r['title']}: {r['snippet']} ({r['url']})"
        for r in results[:5]
    )


# Use with LangChain, CrewAI, or any agent framework
# tool = Tool(name="google_search", func=agent_search_tool, ...)
```
Legal Considerations
Important points about scraping Google:
- Terms of Service: Google's ToS prohibits automated access to search results. Violating ToS isn't necessarily illegal but can result in IP bans.
- hiQ v. LinkedIn (2022): The US Ninth Circuit ruled that scraping publicly available data doesn't violate the CFAA. This sets a favorable precedent for public web scraping.
- GDPR/CCPA: If you're collecting personal data from search results, data protection laws apply.
- robots.txt: Google's robots.txt disallows /search. While not legally binding, respecting it shows good faith.
- Safe approach: Use Google's official API for authorized access, or a web scraping API that handles compliance.
This article is for educational purposes. Consult a lawyer for your specific use case, especially for commercial scraping at scale.
Frequently Asked Questions
Is it legal to scrape Google search results?
Scraping publicly available data is generally legal under the hiQ v. LinkedIn precedent, but Google's ToS prohibit automated access. Using an API-based approach reduces legal risk. Consult legal counsel for commercial use.
How do I scrape Google without getting blocked?
Rotate IPs (residential proxies), randomize delays (2-10s), rotate User-Agents, and limit volume. Or use a web scraping API like Mantis that handles anti-blocking automatically.
What's the best Python library for scraping Google?
For simple scraping: requests + BeautifulSoup. For JS rendering: Playwright. For production: a web scraping API.
How many results can I scrape per day?
Without proxies: 50-100 before blocks. With residential proxies: thousands. With Mantis: up to 100,000/month on the Scale plan.
Ready to Scrape Google at Scale?
Start with 100 free API calls/month. No proxies, no CAPTCHAs, no broken selectors.
View Pricing Get Started Free →