🔮 Pydantic AI Integration

Type-Safe Web Scraping for Pydantic AI Agents

WebPerception integrates with Pydantic AI as a typed tool. Get structured, validated web data with full type safety, the Pythonic way.

Quick Setup

1. Install dependencies

pip install pydantic-ai requests

2. Define typed models and the scraping tool

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import requests

class ScrapedPage(BaseModel):
    """Structured result from a web scrape."""
    url: str
    title: str = ""
    content: str = ""
    status_code: int = 200

class ProductInfo(BaseModel):
    """Extracted product data."""
    name: str
    price: float
    currency: str = "USD"
    in_stock: bool = True
    description: str = ""

# Create the agent; no result type is set here, so it returns plain text
agent = Agent(
    "openai:gpt-4",
    system_prompt="You are a web research assistant. Use the scrape tool to gather information.",
)

MANTIS_API_KEY = "YOUR_API_KEY"

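@agent.tool
def scrape_webpage(ctx: RunContext, url: str) -> ScrapedPage:
    """Scrape a webpage and return its content."""
    # requests is blocking, so this is a plain (sync) tool; Pydantic AI runs
    # sync tools in a worker thread rather than on the event loop.
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={"x-api-key": MANTIS_API_KEY},
        json={"url": url, "render_js": True},
        timeout=30,  # do not hang forever on a slow page
    )
    data = resp.json()
    # Keep the HTTP status so the agent can see failed scrapes
    return ScrapedPage(
        url=url,
        title=data.get("title", ""),
        content=data.get("content", ""),
        status_code=resp.status_code,
    )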

@agent.tool
def extract_product(ctx: RunContext, url: str) -> ProductInfo:
    """Extract structured product data from a URL using AI."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/extract",
        headers={"x-api-key": MANTIS_API_KEY},
        json={
            "url": url,
            "schema": {
                "name": "string",
                "price": "number",
                "currency": "string",
                "in_stock": "boolean",
                "description": "string",
            },
        },
        timeout=30,  # do not hang forever on a slow page
    )
    resp.raise_for_status()  # surface HTTP errors instead of validating an empty payload
    data = resp.json().get("extracted", {})
    # Validation fails here if required fields (name, price) are missing
    return ProductInfo(**data)

3. Run the agent

import asyncio

async def main():
    result = await agent.run(
        "Find the price of the MacBook Pro M4 on Apple's website"
    )
    # Newer Pydantic AI releases expose this value as result.output
    print(result.data)

asyncio.run(main())

Why WebPerception + Pydantic AI?

🔒

Full Type Safety

Every scraped result is a validated Pydantic model, so malformed data fails at the tool boundary instead of surfacing later as a runtime surprise. Type hints let your IDE flag misspelled or mistyped fields as you write.

🧠

AI Extraction → Pydantic Models

WebPerception's AI extraction returns JSON that maps directly to your Pydantic models. Define the shape you want, get validated data back.
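
As a rough illustration of how an extraction payload becomes a validated model, the sketch below feeds two hand-written dictionaries through `ProductInfo.model_validate`. It assumes Pydantic v2. The payloads are invented stand-ins for the API's JSON, not real responses.

```python
from pydantic import BaseModel, ValidationError


class ProductInfo(BaseModel):
    """Same shape as the ProductInfo model in Quick Setup."""
    name: str
    price: float
    currency: str = "USD"
    in_stock: bool = True
    description: str = ""


# Hypothetical payloads standing in for the "extracted" object of a response.
good_payload = {"name": "Example Laptop", "price": "1299.00", "in_stock": True}
bad_payload = {"name": "Example Laptop", "price": "call for pricing"}

# Default (lax) validation coerces the numeric string and fills in defaults.
product = ProductInfo.model_validate(good_payload)
print(product.price, product.currency)

# A value that cannot be parsed as a float raises ValidationError.
try:
    ProductInfo.model_validate(bad_payload)
    rejected_fields = []
except ValidationError as exc:
    rejected_fields = [err["loc"][0] for err in exc.errors()]
print(rejected_fields)
```

Catching `ValidationError` at this boundary gives the tool a single place to decide whether to retry, skip the page, or report the bad field back to the agent.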

🐍

Pythonic API

Pydantic AI's @agent.tool decorator plus WebPerception makes for a clean web scraping integration in Python: the function's type hints and docstring describe the tool to the model, async/await is supported, and there is very little boilerplate.

🔄

Dependency Injection

Use Pydantic AI's RunContext to inject API keys, rate limiters, and configuration. Keep your tools clean and testable.
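
One way to picture this pattern: bundle the API key, endpoint, and HTTP function into a dataclass, and have the tool read them from `ctx.deps` instead of module globals. In Pydantic AI the dataclass would be declared with `deps_type=` on the agent and supplied with `deps=` at run time. The sketch below leaves the framework out so it runs on its own, using a `SimpleNamespace` as a stand-in for `RunContext`. `ScrapeDeps`, `fake_post`, and the stub response are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable


@dataclass
class ScrapeDeps:
    """Hypothetical dependency bundle; with Pydantic AI, pass it as deps=..."""
    api_key: str
    post: Callable[..., dict[str, Any]]  # injected HTTP function
    base_url: str = "https://api.mantisapi.com/v1"


def scrape_webpage(ctx: Any, url: str) -> dict[str, Any]:
    """Tool body that reads everything it needs from ctx.deps."""
    deps: ScrapeDeps = ctx.deps
    return deps.post(
        f"{deps.base_url}/scrape",
        headers={"x-api-key": deps.api_key},
        json={"url": url, "render_js": True},
    )


# In a test, inject a fake transport that records the call instead of
# touching the network.
calls: list[dict[str, Any]] = []


def fake_post(endpoint: str, *, headers: dict, json: dict) -> dict[str, Any]:
    calls.append({"endpoint": endpoint, "headers": headers, "json": json})
    return {"title": "Example Domain", "content": "stub content"}


# SimpleNamespace stands in for RunContext, which exposes the same .deps attribute.
ctx = SimpleNamespace(deps=ScrapeDeps(api_key="test-key", post=fake_post))
page = scrape_webpage(ctx, "https://example.com")
print(page["title"], calls[0]["endpoint"])
```

Because the tool never reads a global key or imports an HTTP client directly, swapping in a rate-limited client or a test double only changes how `ScrapeDeps` is constructed.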

📊

Structured Results

Agent results are typed too. Build agents that return ProductInfo, CompanyProfile, or any custom model, not just strings.

⚡

Async Native

Pydantic AI is async-first. Scrape multiple pages concurrently with WebPerception for blazing fast data collection.
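
A minimal sketch of concurrent scraping, assuming a blocking HTTP client such as `requests`: each call is moved to a worker thread with `asyncio.to_thread`, and `asyncio.gather` collects the results. `fetch_page` is a hypothetical stub that sleeps instead of calling the API, so the example runs offline.

```python
import asyncio
import time


def fetch_page(url: str) -> dict[str, str]:
    """Hypothetical stand-in for a blocking scrape call (sleeps, no network)."""
    time.sleep(0.2)
    return {"url": url, "title": f"Title of {url}"}


async def scrape_many(urls: list[str]) -> list[dict[str, str]]:
    # to_thread keeps the blocking calls off the event loop; gather runs
    # them concurrently and returns results in the same order as `urls`.
    tasks = [asyncio.to_thread(fetch_page, url) for url in urls]
    return await asyncio.gather(*tasks)


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
start = time.perf_counter()
pages = asyncio.run(scrape_many(urls))
elapsed = time.perf_counter() - start

print([page["url"] for page in pages])
print(f"{elapsed:.2f}s for {len(urls)} pages")
```

Calling a blocking client directly inside an `async def` would stall the event loop for the duration of each request, which is why the sketch hands the work to threads.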

Pydantic AI + WebPerception Use Cases

🛒 Type-Safe Price Comparison Agent

Build an agent that compares prices across stores with validated output. Every price, currency, and availability flag is type-checked.

from pydantic import BaseModel

class PriceComparison(BaseModel):
    product: str
    cheapest_store: str
    cheapest_price: float
    all_prices: list[ProductInfo]
    savings_percent: float

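# Agent returns a typed PriceComparison
price_agent = Agent(
    "openai:gpt-4",
    result_type=PriceComparison,  # named output_type in newer Pydantic AI releases
    tools=[extract_product],  # reuse the tool function from Quick Setup
    system_prompt="Compare prices across stores using the extract_product tool.",
)

# await only works inside a coroutine, so wrap the call
async def compare_prices() -> None:
    result = await price_agent.run("Compare MacBook Air prices on Amazon, Best Buy, and B&H")
    print(f"Best deal: {result.data.cheapest_store} at ${result.data.cheapest_price}")

asyncio.run(compare_prices())  # asyncio is imported in Quick Setup step 3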
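
The page does not define `savings_percent`. One plausible reading is the saving of the cheapest offer relative to the most expensive one. Under that assumption, and with made-up store names and prices, the arithmetic might look like this:

```python
def summarize_prices(prices: dict[str, float]) -> dict[str, object]:
    """Pick the cheapest store and compute savings versus the highest price.

    Assumption: savings_percent = (highest - cheapest) / highest * 100.
    """
    if not prices:
        raise ValueError("at least one price is required")
    cheapest_store = min(prices, key=prices.get)
    cheapest_price = prices[cheapest_store]
    highest_price = max(prices.values())
    if highest_price <= 0:
        raise ValueError("prices must be positive")
    savings_percent = round((highest_price - cheapest_price) / highest_price * 100, 2)
    return {
        "cheapest_store": cheapest_store,
        "cheapest_price": cheapest_price,
        "savings_percent": savings_percent,
    }


# Invented sample data, not real quotes.
summary = summarize_prices({"Store A": 1000.0, "Store B": 950.0, "Store C": 900.0})
print(summary)
```

Computing these fields in plain code rather than asking the model to do the arithmetic keeps the numbers reproducible; the agent can then be limited to gathering the per-store `ProductInfo` entries.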

🏢 Company Profile Builder

Scrape company websites and build validated profiles with team info, funding, and tech stack, all as typed Pydantic models ready for your database.

class CompanyProfile(BaseModel):
    name: str
    description: str
    founded: int | None = None
    team_size: str | None = None
    tech_stack: list[str] = []
    pricing_url: str | None = None

profile_agent = Agent(
    "openai:gpt-4",
    result_type=CompanyProfile,
    tools=[scrape_webpage],  # reuse the tool function from Quick Setup
    system_prompt="Research companies by scraping their website. Return structured profiles.",
)

📰 News Monitoring Pipeline

Build a typed news aggregator that scrapes sources, extracts articles into validated models, and deduplicates by topic. Validation rejects malformed articles before they enter the pipeline.

class NewsArticle(BaseModel):
    title: str
    source: str
    published_date: str
    summary: str
    topics: list[str]
    sentiment: float = Field(ge=-1.0, le=1.0)

class NewsBrief(BaseModel):
    articles: list[NewsArticle]
    top_topics: list[str]
    overall_sentiment: str

news_agent = Agent(
    "openai:gpt-4",
    result_type=NewsBrief,
    tools=[scrape_webpage],  # reuse the tool function from Quick Setup
)
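
The page does not say how deduplication by topic works. As one possible interpretation, the sketch below treats two articles as duplicates when their normalized topic sets are identical and keeps the first one seen. The `Article` dataclass is a trimmed, hypothetical stand-in for `NewsArticle`, and the sample headlines are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Trimmed stand-in for NewsArticle (illustration only)."""
    title: str
    source: str
    topics: tuple[str, ...]


def dedupe_by_topics(articles: list[Article]) -> list[Article]:
    """Keep the first article for each distinct, case-insensitive topic set."""
    seen: set[frozenset[str]] = set()
    unique: list[Article] = []
    for article in articles:
        key = frozenset(topic.strip().lower() for topic in article.topics)
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


articles = [
    Article("Chip maker posts record quarter", "Wire A", ("Semiconductors", "Earnings")),
    Article("Record earnings for chip maker", "Wire B", ("earnings", "semiconductors")),
    Article("New battery plant announced", "Wire C", ("Energy",)),
]
unique = dedupe_by_topics(articles)
print([article.source for article in unique])
```

Exact topic-set matching is deliberately strict; a real pipeline might instead merge articles whose topic sets overlap above some threshold.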

Type-Safe Web Scraping for Your Agents

100 free API calls/month. No credit card required.

Get Your API Key →