Build a Web Scraping Agent with Google ADK (Agent Development Kit)
Google's Agent Development Kit (ADK) is an open-source framework for building AI agents that can use tools, collaborate in teams, and run in production. If you're in the Google/Gemini ecosystem, ADK is the natural choice, and giving those agents web scraping capabilities makes them dramatically more useful.
In this guide, you'll build an ADK agent that can:
- Scrape any URL and get clean, structured content
- Screenshot webpages for visual analysis
- Extract structured data with AI (prices, contacts, product specs)
- Orchestrate multi-agent teams for complex research tasks
All powered by the WebPerception API.
What is Google ADK?
Google ADK is an open-source, code-first Python framework for building AI agents. Released in 2025, it's designed to work seamlessly with Gemini models but also supports other LLMs.
Why ADK stands out:
- Multi-agent orchestration: built-in support for agent teams with delegation
- Flexible tool system: register Python functions as tools with automatic schema generation
- Streaming support: real-time streaming of agent responses
- Session management: built-in conversation state and memory
- Google Cloud integration: deploy to Vertex AI Agent Engine with one command
Prerequisites
- Python 3.10+
- WebPerception API key: get one free (100 calls/month)
- Google ADK installed

```bash
pip install google-adk requests
```
Step 1: Define Your Web Scraping Tools
ADK uses Python functions with type hints as tools. The framework automatically generates the tool schema from your function signatures and docstrings.
```python
# tools.py
import requests
import json

MANTIS_API_KEY = "your_api_key_here"  # Use env vars in production
BASE_URL = "https://api.mantisapi.com/v1"


def scrape_url(url: str, format: str = "markdown") -> str:
    """Scrape a webpage and return its content.

    Args:
        url: The URL to scrape
        format: Output format - 'markdown', 'html', or 'text'

    Returns:
        The scraped content of the webpage
    """
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "format": format},
    )
    data = response.json()
    return data.get("content", f"Error: {data.get('error', 'Unknown error')}")


def screenshot_url(url: str, full_page: bool = False) -> str:
    """Take a screenshot of a webpage.

    Args:
        url: The URL to screenshot
        full_page: Whether to capture the full page or just the viewport

    Returns:
        URL of the screenshot image
    """
    response = requests.post(
        f"{BASE_URL}/screenshot",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "full_page": full_page},
    )
    data = response.json()
    return data.get("screenshot_url", f"Error: {data.get('error', 'Unknown error')}")


def extract_data(url: str, schema: str) -> str:
    """Extract structured data from a webpage using AI.

    Args:
        url: The URL to extract data from
        schema: JSON schema describing what data to extract.
            Example: '{"name": "string", "price": "number", "rating": "number"}'

    Returns:
        JSON string of extracted data matching the schema
    """
    response = requests.post(
        f"{BASE_URL}/extract",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "schema": json.loads(schema)},
    )
    data = response.json()
    return json.dumps(data.get("extracted", data), indent=2)
```
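ADK builds each tool's schema from the function signature and docstring, which is why the type hints above matter. The gist of that kind of introspection can be sketched with the standard library; this is an illustration of the idea, not ADK's actual implementation:

```python
import inspect
from typing import get_type_hints


def tool_schema(fn):
    """Sketch: derive a JSON-schema-like dict from a Python function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the schema
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip().split("\n")[0],
        "parameters": {
            name: {"type": type_map.get(tp, "object")} for name, tp in hints.items()
        },
    }


def scrape_url(url: str, format: str = "markdown") -> str:
    """Scrape a webpage and return its content."""
    ...


schema = tool_schema(scrape_url)
# schema["parameters"] == {"url": {"type": "string"}, "format": {"type": "string"}}
```

This is also why untyped parameters or missing docstrings degrade tool quality: the model only sees what the schema carries.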
Step 2: Create Your ADK Agent
```python
# agent.py
from google.adk.agents import Agent

from tools import scrape_url, screenshot_url, extract_data

# Create the web research agent
web_agent = Agent(
    name="web_researcher",
    model="gemini-2.0-flash",
    description="An AI agent that can scrape websites, take screenshots, "
                "and extract structured data.",
    instruction="""You are a web research agent. You can:
1. Scrape any URL to read its content
2. Take screenshots of webpages
3. Extract structured data from pages using AI

When asked to research something:
- Start by scraping the most relevant pages
- Extract specific data points when needed
- Take screenshots when visual context helps
- Summarize findings clearly with sources

Always cite your sources with URLs.""",
    tools=[scrape_url, screenshot_url, extract_data],
)
```
That's it. ADK handles tool schema generation, execution, and response parsing automatically.
Step 3: Run Your Agent
```python
# run.py
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

from agent import web_agent

# Set up session and runner
session_service = InMemorySessionService()
runner = Runner(
    agent=web_agent,
    app_name="web_research_app",
    session_service=session_service,
)

# Create a session
session = session_service.create_session(
    app_name="web_research_app",
    user_id="user_1",
)

# Run the agent
message = types.Content(
    role="user",
    parts=[types.Part(text="Research the top 3 AI agent frameworks "
                           "in 2026. Compare features and pricing.")],
)

for event in runner.run(
    user_id="user_1",
    session_id=session.id,
    new_message=message,
):
    if event.is_final_response():
        print(event.content.parts[0].text)
```
Step 4: Multi-Agent Research Team
ADK's killer feature is multi-agent orchestration. Let's build a research team:
```python
# team.py
from google.adk.agents import Agent

from tools import scrape_url, screenshot_url, extract_data

# Specialist: Web scraper
scraper_agent = Agent(
    name="scraper",
    model="gemini-2.0-flash",
    description="Scrapes websites and extracts raw content.",
    instruction="You scrape URLs and return clean content. "
                "Focus on accuracy and completeness.",
    tools=[scrape_url, screenshot_url],
)

# Specialist: Data extractor
extractor_agent = Agent(
    name="extractor",
    model="gemini-2.0-flash",
    description="Extracts structured data from web pages.",
    instruction="You extract specific data points from web pages "
                "into structured formats. Always validate the data.",
    tools=[extract_data],
)

# Orchestrator: Research lead
research_lead = Agent(
    name="research_lead",
    model="gemini-2.0-flash",
    description="Coordinates web research by delegating to specialists.",
    instruction="""You are a research team lead. You coordinate by:
1. Breaking down research questions into sub-tasks
2. Delegating scraping to the scraper agent
3. Delegating data extraction to the extractor agent
4. Synthesizing findings into a comprehensive report

Always provide a final summary with key findings.""",
    sub_agents=[scraper_agent, extractor_agent],
)
```
The orchestrator agent automatically decides when to delegate to sub-agents based on the task.
Step 5: Real-World Use Cases
Competitor Price Monitoring
```python
price_monitor = Agent(
    name="price_monitor",
    model="gemini-2.0-flash",
    description="Monitors competitor pricing across the web.",
    instruction="""You monitor competitor pricing. When given a product:
1. Search for the product on competitor websites
2. Extract current prices using the extract_data tool
3. Compare prices and highlight the best deals
4. Flag any price changes from previous checks""",
    tools=[scrape_url, extract_data],
)
```
Lead Generation
```python
lead_generator = Agent(
    name="lead_gen",
    model="gemini-2.0-flash",
    description="Finds and qualifies business leads from the web.",
    instruction="""You find and qualify leads. For each company:
1. Scrape their website for key information
2. Extract company details (size, industry, tech stack)
3. Check for hiring pages (indicates growth)
4. Rate lead quality based on ICP fit""",
    tools=[scrape_url, extract_data],
)
```
Content Research Pipeline
```python
content_researcher = Agent(
    name="content_researcher",
    model="gemini-2.0-flash",
    description="Researches topics for content creation.",
    instruction="""You research topics for blog posts. For each topic:
1. Scrape top-ranking articles
2. Identify common themes, gaps, and unique angles
3. Extract key statistics and data points
4. Create an outline that covers the topic better""",
    tools=[scrape_url, extract_data],
)
```
ADK + WebPerception: Why It Works
| Feature | ADK Provides | WebPerception Provides |
|---|---|---|
| Agent orchestration | ✅ Multi-agent teams | ❌ |
| Tool system | ✅ Auto-schema from Python | ❌ |
| Web scraping | ❌ | ✅ Any URL, JS-rendered |
| Screenshots | ❌ | ✅ Full-page captures |
| AI extraction | ❌ | ✅ Structured data |
| Session management | ✅ Built-in state | ❌ |
| Deployment | ✅ Vertex AI | ✅ Cloud API, no infra |
Together: You get production-ready AI agents with real-time web access, deployed on Google Cloud, with zero browser infrastructure to manage.
Deploying to Vertex AI
ADK agents deploy with a single `adk deploy` command, targeting Cloud Run or Vertex AI Agent Engine:

```bash
# Install the Vertex AI SDK
pip install google-cloud-aiplatform

# Deploy your agent
adk deploy cloud_run \
  --project=your-gcp-project \
  --region=us-central1 \
  --app_name=web_research_app \
  --agent_name=web_researcher
```
Your agent runs as a managed service with auto-scaling, monitoring, and API endpoints, with no servers to manage.
Error Handling Best Practices
```python
import requests
from requests.exceptions import RequestException

from tools import BASE_URL, MANTIS_API_KEY


def scrape_url_safe(url: str, format: str = "markdown") -> str:
    """Scrape a webpage with error handling."""
    try:
        response = requests.post(
            f"{BASE_URL}/scrape",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={"url": url, "format": format},
            timeout=30,
        )
        if response.status_code == 429:
            return "Rate limited. Please wait before retrying."
        response.raise_for_status()
        data = response.json()
        return data.get("content", "No content returned")
    except RequestException as e:
        return f"Failed to scrape {url}: {e}"
```
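For transient failures like timeouts and 429s, a retry wrapper with exponential backoff pairs well with a safe scraper like the one above. A generic sketch (the attempt count and delay values are arbitrary choices):

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error to the agent
            time.sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...


# Usage (hypothetical):
# content = with_retries(lambda: scrape_url_safe("https://example.com"))
```

Keeping the retry logic outside the tool function means the agent's tool schema stays simple while flaky network conditions are handled uniformly.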
Cost Optimization
WebPerception API uses simple per-call pricing:
| Plan | Calls/Month | Cost per Call |
|---|---|---|
| Free | 100 | $0.00 |
| Starter | 5,000 | $0.0058 |
| Pro | 25,000 | $0.0040 |
| Scale | 100,000 | $0.0030 |
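At full utilization, the listed per-call rates imply the following monthly totals; this is just arithmetic on the numbers in the table above:

```python
# (calls per month, cost per call) from the pricing table
plans = {
    "Starter": (5_000, 0.0058),
    "Pro": (25_000, 0.0040),
    "Scale": (100_000, 0.0030),
}

for name, (calls, rate) in plans.items():
    print(f"{name}: {calls} calls x ${rate} = ${calls * rate:.2f}/month")
# Starter: $29.00, Pro: $100.00, Scale: $300.00
```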
Tips for keeping costs low:
- Cache results for URLs that don't change frequently
- Use `format="text"` when you don't need HTML structure
- Batch related scraping tasks in a single agent run
- Set reasonable timeouts to avoid wasted calls on unresponsive sites
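The caching tip is straightforward to implement in front of any of the tool functions. A minimal time-based cache sketch, where the 1-hour TTL is an arbitrary choice and `fetch` stands in for a real call such as `scrape_url`:

```python
import time

_cache: dict = {}  # key -> (value, stored_at)


def cached(key: str, fetch, ttl: float = 3600.0) -> str:
    """Return a cached value for key, calling fetch() only when stale."""
    now = time.time()
    if key in _cache:
        value, stored_at = _cache[key]
        if now - stored_at < ttl:
            return value  # still fresh: no API call spent
    value = fetch()
    _cache[key] = (value, now)
    return value


# Usage (hypothetical): cached(url, lambda: scrape_url(url))
```

For production you'd likely swap the in-memory dict for Redis or similar so the cache survives restarts and is shared across agent instances.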
Give Your ADK Agents Web Superpowers
Start scraping, screenshotting, and extracting data in minutes. Free tier included.
Get Your Free API Key →

Next Steps
- WebPerception API Quickstart: get your API key in 30 seconds
- Google ADK Documentation: deep dive into ADK features
- AI Agent Tool Use Patterns: 7 architectures for agent tool use
- MCP Web Scraping Server: an alternative, build an MCP server instead