Build a Web Scraping Agent with Google ADK (Agent Development Kit)

March 9, 2026 · 10 min read · Google ADK · AI Agents · Python

Google's Agent Development Kit (ADK) is an open-source framework for building AI agents that can use tools, collaborate in teams, and run in production. If you're in the Google/Gemini ecosystem, ADK is the natural choice, and giving those agents web scraping capabilities makes them dramatically more useful.

In this guide, you'll build an ADK agent that can:

- Scrape any URL and read its content as markdown, HTML, or text
- Take screenshots of webpages
- Extract structured data from pages using AI

All powered by the WebPerception API.

What is Google ADK?

Google ADK is an open-source, code-first Python framework for building AI agents. Released in 2025, it's designed to work seamlessly with Gemini models but also supports other LLMs.

Why ADK stands out:

- Code-first: agents are plain Python, and tool schemas are generated automatically from type hints and docstrings
- Multi-agent orchestration: compose specialist agents into teams behind an orchestrator
- Built-in session and state management
- First-class deployment to Google Cloud (Vertex AI)

Prerequisites

pip install google-adk requests
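You'll also want your credentials available as environment variables rather than hardcoded. The exact variable names below are illustrative (the code in Step 1 reads `MANTIS_API_KEY`; ADK's Gemini integration commonly picks up `GOOGLE_API_KEY`):

```shell
# Illustrative setup: a Gemini key for the model, a WebPerception key for scraping
export GOOGLE_API_KEY="your-gemini-api-key"
export MANTIS_API_KEY="your-webperception-api-key"
```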

Step 1: Define Your Web Scraping Tools

ADK uses Python functions with type hints as tools. The framework automatically generates the tool schema from your function signatures and docstrings.

# tools.py
import requests
import json
from typing import Optional

MANTIS_API_KEY = "your_api_key_here"  # Use env vars in production
BASE_URL = "https://api.mantisapi.com/v1"

def scrape_url(url: str, format: str = "markdown") -> str:
    """Scrape a webpage and return its content.
    
    Args:
        url: The URL to scrape
        format: Output format - 'markdown', 'html', or 'text'
    
    Returns:
        The scraped content of the webpage
    """
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "format": format}
    )
    data = response.json()
    return data.get("content", f"Error: {data.get('error', 'Unknown error')}")


def screenshot_url(url: str, full_page: bool = False) -> str:
    """Take a screenshot of a webpage.
    
    Args:
        url: The URL to screenshot
        full_page: Whether to capture the full page or just viewport
    
    Returns:
        URL of the screenshot image
    """
    response = requests.post(
        f"{BASE_URL}/screenshot",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "full_page": full_page}
    )
    data = response.json()
    return data.get("screenshot_url", f"Error: {data.get('error', 'Unknown error')}")


def extract_data(url: str, schema: str) -> str:
    """Extract structured data from a webpage using AI.
    
    Args:
        url: The URL to extract data from
        schema: JSON schema describing what data to extract.
            Example: '{"name": "string", "price": "number", "rating": "number"}'
    
    Returns:
        JSON string of extracted data matching the schema
    """
    response = requests.post(
        f"{BASE_URL}/extract",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={"url": url, "schema": json.loads(schema)}
    )
    data = response.json()
    return json.dumps(data.get("extracted", data), indent=2)
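To see why the type hints and docstrings matter, here is a rough illustration of the kind of introspection a framework can do with them. This is a plain-Python sketch using the standard `inspect` module, not ADK's actual internals:

```python
import inspect
from typing import get_type_hints

def scrape_url(url: str, format: str = "markdown") -> str:
    """Scrape a webpage and return its content."""
    ...  # stub standing in for the real tool above

def describe_tool(fn):
    """Build a minimal tool schema from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # parameters only
    params = {name: t.__name__ for name, t in hints.items()}
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": params,
    }

schema = describe_tool(scrape_url)
print(schema["name"])        # scrape_url
print(schema["parameters"])  # {'url': 'str', 'format': 'str'}
```

Everything the framework needs, the function signature already carries, which is why no separate schema definition is required.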

Step 2: Create Your ADK Agent

# agent.py
from google.adk.agents import Agent
from tools import scrape_url, screenshot_url, extract_data

# Create the web research agent
web_agent = Agent(
    name="web_researcher",
    model="gemini-2.0-flash",
    description="An AI agent that can scrape websites, take screenshots, "
                "and extract structured data.",
    instruction="""You are a web research agent. You can:
    1. Scrape any URL to read its content
    2. Take screenshots of webpages
    3. Extract structured data from pages using AI
    
    When asked to research something:
    - Start by scraping the most relevant pages
    - Extract specific data points when needed
    - Take screenshots when visual context helps
    - Summarize findings clearly with sources
    
    Always cite your sources with URLs.""",
    tools=[scrape_url, screenshot_url, extract_data],
)

That's it. ADK handles tool schema generation, execution, and response parsing automatically.

Step 3: Run Your Agent

# run.py
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from agent import web_agent

# Set up session and runner
session_service = InMemorySessionService()
runner = Runner(
    agent=web_agent,
    app_name="web_research_app",
    session_service=session_service,
)

# Create a session
session = session_service.create_session(
    app_name="web_research_app",
    user_id="user_1",
)

# Run the agent
message = types.Content(
    role="user",
    parts=[types.Part(text="Research the top 3 AI agent frameworks "
                           "in 2026. Compare features and pricing.")]
)

for event in runner.run(
    user_id="user_1",
    session_id=session.id,
    new_message=message,
):
    if event.is_final_response():
        print(event.content.parts[0].text)

Step 4: Multi-Agent Research Team

ADK's killer feature is multi-agent orchestration. Let's build a research team:

# team.py
from google.adk.agents import Agent
from tools import scrape_url, screenshot_url, extract_data

# Specialist: Web scraper
scraper_agent = Agent(
    name="scraper",
    model="gemini-2.0-flash",
    description="Scrapes websites and extracts raw content.",
    instruction="You scrape URLs and return clean content. "
                "Focus on accuracy and completeness.",
    tools=[scrape_url, screenshot_url],
)

# Specialist: Data extractor
extractor_agent = Agent(
    name="extractor",
    model="gemini-2.0-flash",
    description="Extracts structured data from web pages.",
    instruction="You extract specific data points from web pages "
                "into structured formats. Always validate the data.",
    tools=[extract_data],
)

# Orchestrator: Research lead
research_lead = Agent(
    name="research_lead",
    model="gemini-2.0-flash",
    description="Coordinates web research by delegating to specialists.",
    instruction="""You are a research team lead. You coordinate by:
    1. Breaking down research questions into sub-tasks
    2. Delegating scraping to the scraper agent
    3. Delegating data extraction to the extractor agent
    4. Synthesizing findings into a comprehensive report
    
    Always provide a final summary with key findings.""",
    sub_agents=[scraper_agent, extractor_agent],
)

The orchestrator agent automatically decides when to delegate to sub-agents based on the task.
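In ADK, that routing decision is made by the model using each sub-agent's `description`. Conceptually, delegation looks something like the following keyword-based sketch; this is a plain-Python illustration, not ADK's actual routing logic:

```python
# Hypothetical sketch: route a sub-task to a specialist based on keywords.
# ADK's real delegation is LLM-driven, using each sub-agent's description.
def delegate(task: str, specialists: dict) -> str:
    """Return the matching specialist's result, or handle the task directly."""
    for name, (keywords, handler) in specialists.items():
        if any(kw in task.lower() for kw in keywords):
            return f"{name}: {handler(task)}"
    return f"lead handled directly: {task}"

specialists = {
    "scraper": (["scrape", "screenshot"], lambda t: "raw page content"),
    "extractor": (["extract", "structured"], lambda t: "structured JSON"),
}

print(delegate("Scrape the pricing page", specialists))
# scraper: raw page content
```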

Step 5: Real-World Use Cases

Competitor Price Monitoring

price_monitor = Agent(
    name="price_monitor",
    model="gemini-2.0-flash",
    description="Monitors competitor pricing across the web.",
    instruction="""You monitor competitor pricing. When given a product:
    1. Search for the product on competitor websites
    2. Extract current prices using the extract_data tool
    3. Compare prices and highlight the best deals
    4. Flag any price changes from previous checks""",
    tools=[scrape_url, extract_data],
)
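Downstream of the agent, comparing the extracted prices is ordinary Python. A small sketch, where the payload shape and field names are illustrative of what `extract_data` might return:

```python
import json

# Hypothetical extract_data outputs, one per competitor page
results = [
    '{"name": "Widget", "price": 19.99, "source": "shop-a.example"}',
    '{"name": "Widget", "price": 17.49, "source": "shop-b.example"}',
    '{"name": "Widget", "price": 21.00, "source": "shop-c.example"}',
]

offers = [json.loads(r) for r in results]
best = min(offers, key=lambda o: o["price"])
print(f"Best deal: ${best['price']:.2f} at {best['source']}")
# Best deal: $17.49 at shop-b.example
```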

Lead Generation

lead_generator = Agent(
    name="lead_gen",
    model="gemini-2.0-flash",
    description="Finds and qualifies business leads from the web.",
    instruction="""You find and qualify leads. For each company:
    1. Scrape their website for key information
    2. Extract company details (size, industry, tech stack)
    3. Check for hiring pages (indicates growth)
    4. Rate lead quality based on ICP fit""",
    tools=[scrape_url, extract_data],
)

Content Research Pipeline

content_researcher = Agent(
    name="content_researcher",
    model="gemini-2.0-flash",
    description="Researches topics for content creation.",
    instruction="""You research topics for blog posts. For each topic:
    1. Scrape top-ranking articles
    2. Identify common themes, gaps, and unique angles
    3. Extract key statistics and data points
    4. Create an outline that covers the topic better""",
    tools=[scrape_url, extract_data],
)

ADK + WebPerception: Why It Works

| Feature            | ADK Provides               | WebPerception Provides    |
|--------------------|----------------------------|---------------------------|
| Agent orchestration | ✅ Multi-agent teams       | –                         |
| Tool system        | ✅ Auto-schema from Python | –                         |
| Web scraping       | –                          | ✅ Any URL, JS-rendered   |
| Screenshots        | –                          | ✅ Full-page captures     |
| AI extraction      | –                          | ✅ Structured data        |
| Session management | ✅ Built-in state          | –                         |
| Deployment         | ✅ Vertex AI               | ✅ Cloud API, no infra    |

Together: You get production-ready AI agents with real-time web access, deployed on Google Cloud, with zero browser infrastructure to manage.

Deploying to Google Cloud

ADK agents can be deployed to Vertex AI Agent Engine or to Cloud Run; the commands below target Cloud Run:

# Install the Vertex AI SDK (required for deployment)
pip install google-cloud-aiplatform

# Deploy your agent
adk deploy cloud_run \
  --project=your-gcp-project \
  --region=us-central1 \
  --app_name=web_research_app \
  --agent_name=web_researcher

Your agent runs as a managed service with auto-scaling, monitoring, and API endpoints; there are no servers to manage.

Error Handling Best Practices

import requests
from requests.exceptions import RequestException

def scrape_url_safe(url: str, format: str = "markdown") -> str:
    """Scrape a webpage with error handling."""
    try:
        response = requests.post(
            f"{BASE_URL}/scrape",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={"url": url, "format": format},
            timeout=30,
        )
        
        if response.status_code == 429:
            return "Rate limited. Please wait before retrying."
        
        response.raise_for_status()
        data = response.json()
        return data.get("content", "No content returned")
        
    except RequestException as e:
        return f"Failed to scrape {url}: {str(e)}"
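For transient failures such as timeouts and 429s, returning an error string once is often not enough; retrying with exponential backoff usually recovers. A generic sketch, independent of any particular API:

```python
import time
from functools import wraps

def with_retries(attempts: int = 3, base_delay: float = 0.5):
    """Retry a function with exponential backoff on any exception."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

# Demo: a function that fails twice, then succeeds
calls = {"n": 0}

@with_retries(attempts=3, base_delay=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(flaky())  # ok
```

Wrapping `scrape_url_safe` with such a decorator keeps the retry policy in one place instead of inside every tool.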

Cost Optimization

WebPerception API uses simple per-call pricing:

| Plan    | Calls/Month | Cost per Call |
|---------|-------------|---------------|
| Free    | 100         | $0.00         |
| Starter | 5,000       | $0.0058       |
| Pro     | 25,000      | $0.0040       |
| Scale   | 100,000     | $0.0030       |
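To sanity-check a budget, the monthly spend at full usage of each tier is just calls × unit price:

```python
# (calls per month, cost per call) for each tier, from the pricing table
tiers = {
    "Free":    (100,     0.0),
    "Starter": (5_000,   0.0058),
    "Pro":     (25_000,  0.0040),
    "Scale":   (100_000, 0.0030),
}

for plan, (calls, unit) in tiers.items():
    print(f"{plan}: ${calls * unit:.2f}/month at full usage")
# e.g. Starter: $29.00/month at full usage
```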

Tips for keeping costs low:
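For example, repeated scrapes of the same URL within a short window can be served from a cache instead of re-billed. A minimal TTL-cache sketch; the fetch function here is a stand-in for the real `scrape_url` tool:

```python
import time

_cache: dict = {}   # url -> (timestamp, content)
TTL_SECONDS = 300   # refetch after 5 minutes

def scrape_cached(url: str, fetch) -> str:
    """Return cached content for url if still fresh, else call fetch(url)."""
    now = time.time()
    hit = _cache.get(url)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no API call billed
    content = fetch(url)
    _cache[url] = (now, content)
    return content

# Stand-in for the real scrape_url tool, counting API calls
calls = {"n": 0}
def fake_fetch(url):
    calls["n"] += 1
    return f"content of {url}"

scrape_cached("https://example.com", fake_fetch)
scrape_cached("https://example.com", fake_fetch)  # served from cache
print(calls["n"])  # 1
```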

Give Your ADK Agents Web Superpowers

Start scraping, screenshotting, and extracting data in minutes. Free tier included.

Get Your Free API Key →

Next Steps