WebPerception integrates as a typed tool with Pydantic AI. Get structured, validated web data with full type safety, the Pythonic way.
pip install pydantic-ai requests
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
import requests
class ScrapedPage(BaseModel):
    """Structured result from a web scrape."""

    url: str
    title: str = ""
    content: str = ""
    status_code: int = 200


class ProductInfo(BaseModel):
    """Extracted product data."""

    name: str
    price: float
    currency: str = "USD"
    in_stock: bool = True
    description: str = ""


# Create the agent; the scraping tools defined below are registered on it
agent = Agent(
    "openai:gpt-4",
    system_prompt="You are a web research assistant. Use the scrape tool to gather information.",
)
MANTIS_API_KEY = "YOUR_API_KEY"


# Plain (non-async) tools: requests is a blocking client, and Pydantic AI
# runs synchronous tool functions in a worker thread.
@agent.tool
def scrape_webpage(ctx: RunContext, url: str) -> ScrapedPage:
    """Scrape a webpage and return its content."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={"x-api-key": MANTIS_API_KEY},
        json={"url": url, "render_js": True},
        timeout=30,
    )
    data = resp.json()
    return ScrapedPage(
        url=url,
        title=data.get("title", ""),
        content=data.get("content", ""),
        status_code=resp.status_code,
    )


@agent.tool
def extract_product(ctx: RunContext, url: str) -> ProductInfo:
    """Extract structured product data from a URL using AI."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/extract",
        headers={"x-api-key": MANTIS_API_KEY},
        json={
            "url": url,
            "schema": {
                "name": "string",
                "price": "number",
                "currency": "string",
                "in_stock": "boolean",
                "description": "string",
            },
        },
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors before validating the payload
    data = resp.json().get("extracted", {})
    return ProductInfo(**data)
import asyncio


async def main():
    result = await agent.run(
        "Find the price of the MacBook Pro M4 on Apple's website"
    )
    # Newer Pydantic AI releases rename this attribute to result.output
    print(result.data)


asyncio.run(main())
Every scraped result is a validated Pydantic model. No more untyped dictionaries: a malformed response fails at validation time instead of deep inside your code, and your IDE can autocomplete and type-check every field.
WebPerception's AI extraction returns JSON that maps directly to your Pydantic models. Define the shape you want, get validated data back.
Pydantic AI's @agent.tool decorator turns a WebPerception call into an agent tool in a few lines of Python: type hints, async/await, and no hand-written tool schema.
Use Pydantic AI's RunContext to inject API keys, rate limiters, and configuration. Keep your tools clean and testable.
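The dependency-injection idea can be sketched as follows. This is a minimal, standard-library-only illustration, not WebPerception's or Pydantic AI's own code: ScrapeDeps, build_scrape_request, and timeout_seconds are hypothetical names, while the endpoint and x-api-key header are copied from the example earlier on this page. With Pydantic AI you would pass deps_type=ScrapeDeps to Agent, annotate the tool's context as RunContext[ScrapeDeps], read the values from ctx.deps, and supply the instance with agent.run(..., deps=...).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeDeps:
    """Configuration a scrape tool needs, bundled so tests can swap it out (hypothetical)."""

    api_key: str
    base_url: str = "https://api.mantisapi.com/v1"
    timeout_seconds: float = 30.0


def build_scrape_request(deps: ScrapeDeps, url: str) -> dict:
    """Return keyword arguments for requests.post, built only from the injected deps."""
    return {
        "url": f"{deps.base_url}/scrape",
        "headers": {"x-api-key": deps.api_key},
        "json": {"url": url, "render_js": True},
        "timeout": deps.timeout_seconds,
    }


# Inside a tool you would call: requests.post(**build_scrape_request(ctx.deps, url))
# In a test, point the same code at a local stub server without touching globals.
test_deps = ScrapeDeps(api_key="test-key", base_url="http://localhost:8000")
request_kwargs = build_scrape_request(test_deps, "https://example.com")
print(request_kwargs["url"])
```

Because the tool reads nothing from module-level globals, swapping the API key or base URL per run or per test is a one-line change.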
Agent results are typed too. Build agents that return ProductInfo, CompanyProfile, or any custom model, not just strings.
Pydantic AI is async-first, so you can scrape multiple pages concurrently with WebPerception instead of waiting on each request in turn.
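One way to scrape several pages concurrently is asyncio.gather with a semaphore that caps the number of requests in flight. The sketch below uses only the standard library; scrape_many, fake_scrape, and max_concurrency are hypothetical names, and fake_scrape stands in for a real async WebPerception call.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def scrape_many(
    urls: list[str],
    scrape_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> list[T]:
    """Run scrape_one over urls concurrently, at most max_concurrency at a time."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> T:
        # Wait for a free slot before starting the request
        async with limiter:
            return await scrape_one(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_scrape(url: str) -> str:
    """Stand-in for a real scrape call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"scraped:{url}"


pages = asyncio.run(scrape_many(["https://a.example", "https://b.example"], fake_scrape))
print(pages)
```

The cap matters on a metered plan: it keeps a long URL list from firing every request at once.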
Build an agent that compares prices across stores with validated output. Every price, currency, and availability flag is type-checked.
import asyncio
from pydantic import BaseModel


class PriceComparison(BaseModel):
    product: str
    cheapest_store: str
    cheapest_price: float
    all_prices: list[ProductInfo]
    savings_percent: float


# Agent returns a typed PriceComparison.
# Newer Pydantic AI releases rename result_type to output_type and result.data to result.output.
price_agent = Agent(
    "openai:gpt-4",
    result_type=PriceComparison,
    tools=[extract_product],  # reuse the extract tool defined earlier
    system_prompt="Compare prices across stores using the extract tool.",
)


async def main():
    result = await price_agent.run("Compare MacBook Air prices on Amazon, Best Buy, and B&H")
    print(f"Best deal: {result.data.cheapest_store} at ${result.data.cheapest_price}")


asyncio.run(main())
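The summary fields of PriceComparison can also be computed in plain Python from the extracted prices, for example as a cross-check on the agent's arithmetic. The sketch below is an illustration under stated assumptions: summarize_prices is a hypothetical helper, savings_percent is assumed to mean the saving of the cheapest offer relative to the most expensive one, and the prices are made-up numbers.

```python
def summarize_prices(prices: dict[str, float]) -> dict:
    """Find the cheapest store and the saving relative to the most expensive one."""
    if not prices:
        raise ValueError("prices must not be empty")
    cheapest_store = min(prices, key=prices.get)
    cheapest_price = prices[cheapest_store]
    highest_price = max(prices.values())
    # Assumed definition: percentage saved versus the highest observed price
    if highest_price == 0:
        savings_percent = 0.0
    else:
        savings_percent = (highest_price - cheapest_price) / highest_price * 100
    return {
        "cheapest_store": cheapest_store,
        "cheapest_price": cheapest_price,
        "savings_percent": round(savings_percent, 2),
    }


# Made-up prices, for illustration only
summary = summarize_prices({"Amazon": 1049.0, "Best Buy": 1099.0, "B&H": 999.0})
print(summary)
```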
Scrape company websites and build validated profiles with team info, funding, and tech stack, all as typed Pydantic models ready for your database.
class CompanyProfile(BaseModel):
    name: str
    description: str
    founded: int | None = None
    team_size: str | None = None
    tech_stack: list[str] = []
    pricing_url: str | None = None


profile_agent = Agent(
    "openai:gpt-4",
    result_type=CompanyProfile,
    tools=[scrape_webpage],  # reuse the scrape tool defined earlier
    system_prompt="Research companies by scraping their website. Return structured profiles.",
)
Build a typed news aggregator that scrapes sources, extracts articles into validated models, and deduplicates by topic. Every article is validated before it enters your pipeline, with no hand-written parsing code.
class NewsArticle(BaseModel):
    title: str
    source: str
    published_date: str
    summary: str
    topics: list[str]
    sentiment: float = Field(ge=-1.0, le=1.0)


class NewsBrief(BaseModel):
    articles: list[NewsArticle]
    top_topics: list[str]
    overall_sentiment: str


news_agent = Agent("openai:gpt-4", result_type=NewsBrief)
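The deduplication step is not shown above, so here is one possible sketch of it. It assumes two articles count as duplicates when they carry the same set of topics (ignoring case and surrounding whitespace), and that the first one seen is kept. dedupe_by_topics and the Article dataclass are hypothetical; Article is a standard-library stand-in so the snippet runs on its own, and the function works on any object with a topics attribute, including NewsArticle.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Minimal stand-in for NewsArticle (hypothetical, for illustration)."""

    title: str
    topics: list[str]


def dedupe_by_topics(articles):
    """Keep the first article for each distinct, normalized set of topics."""
    seen: set[frozenset[str]] = set()
    unique = []
    for article in articles:
        key = frozenset(topic.strip().lower() for topic in article.topics)
        if key in seen:
            continue  # an earlier article already covers this topic set
        seen.add(key)
        unique.append(article)
    return unique


articles = [
    Article("Chip maker posts record quarter", ["Semiconductors", "Earnings"]),
    Article("Record quarter for chip maker", ["earnings", "semiconductors "]),
    Article("New battery plant announced", ["Energy"]),
]
unique_articles = dedupe_by_topics(articles)
print([a.title for a in unique_articles])
```

A stricter or looser rule (for example, matching on a single primary topic or on similar titles) would slot into the same loop by changing how key is built.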
100 free API calls/month. No credit card required.