| Feature | WebPerception | Alternative A | Alternative B | Alternative C | Alternative D |
|---------|---------------|---------------|---------------|---------------|---------------|
| AI Extraction | ✅ Built-in | ❌ | ❌ | ❌ | ❌ |
| Screenshots | ✅ | ❌ | ❌ | Via actors | ❌ |
| JS Rendering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anti-bot Bypass | ✅ | ✅ | ✅ | Varies | ✅ |
| Structured JSON | ✅ Auto | ❌ Manual | ❌ Manual | Varies | ❌ |
| Free Tier | ✅ 100/mo | ❌ | ❌ | ✅ Limited | ❌ |
| Starting Price | $29/mo | $49/mo | $500/mo | $49/mo | ~$30/mo |
| Agent-Ready | ✅ | ❌ | ❌ | Partial | ❌ |
## When to Use a Web Scraping API vs. DIY
**Use an API when:**
- You need to scrape many different websites (not just one)
- Anti-bot systems are blocking your scrapers
- You want structured data without writing selectors
- You're building an AI agent that needs web access
- Your team doesn't have dedicated scraping engineers
- You need to scale quickly
**Build DIY when:**
- You're scraping one specific website with simple, static HTML
- You need full control over the browser session (cookies, auth flows)
- Cost per request matters more than development time
- You have a dedicated scraping engineering team
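For the simple static-HTML case, DIY really can be a few lines of code. Here's a minimal standard-library sketch (real projects usually reach for a parser library like BeautifulSoup instead; the HTML string stands in for what `requests.get(url).text` would return):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <h1> tag."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1 and self.title is None:
            self.title = data.strip()

# In a real DIY scraper this HTML would come from requests.get(url).text
html = "<html><body><h1>Example Domain</h1><p>More text.</p></body></html>"
parser = TitleParser()
parser.feed(html)
print(parser.title)  # Example Domain
```

The catch: the moment the target site moves behind JavaScript rendering or anti-bot systems, this approach stops working, which is exactly the trade-off the list above describes.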
## Building an AI Agent with a Web Scraping API
One of the fastest-growing use cases for web scraping APIs is powering AI agents. Agents need to perceive the web — read pages, extract data, take screenshots — to complete tasks for users.
Here's how you'd give an AI agent web perception using the WebPerception API:
```python
import openai
import requests

MANTIS_KEY = "your_api_key"

def perceive_web(url: str, extract_fields: list = None):
    """Give an AI agent the ability to perceive any web page."""
    payload = {"url": url, "screenshot": True}
    if extract_fields:
        payload["extract"] = {"data": {"fields": extract_fields}}
    resp = requests.post(
        "https://api.mantisapi.com/v1/perceive",
        json=payload,
        headers={"Authorization": f"Bearer {MANTIS_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()

# Agent tool definition for OpenAI function calling
tools = [{
    "type": "function",
    "function": {
        "name": "perceive_web",
        "description": "View a web page: get screenshot, HTML, and extracted data",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to perceive"},
                "extract_fields": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Data fields to extract (e.g., ['price', 'title'])",
                },
            },
            "required": ["url"],
        },
    },
}]

# Now your agent can browse and understand ANY website
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the top story on Hacker News?"}],
    tools=tools,
)
```
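The snippet above registers the tool but doesn't run it: when the model decides to call `perceive_web`, the request comes back in `response.choices[0].message.tool_calls` and your code has to execute it. A minimal dispatch step, assuming the response shape of the OpenAI Python SDK, might look like this (shown with a stubbed response object so the sketch runs offline):

```python
import json
from types import SimpleNamespace

def dispatch_tool_calls(response, tool_registry):
    """Run every tool call the model requested and return the results."""
    results = []
    message = response.choices[0].message
    for call in message.tool_calls or []:
        fn = tool_registry[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append(fn(**args))
    return results

# Stubbed response mimicking the SDK's attribute shape, so this runs offline
stub = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(
    tool_calls=[SimpleNamespace(function=SimpleNamespace(
        name="perceive_web",
        arguments='{"url": "https://news.ycombinator.com"}',
    ))]
))])

# Stand-in for the real perceive_web defined earlier
registry = {"perceive_web": lambda url, extract_fields=None: {"url": url}}
print(dispatch_tool_calls(stub, registry))
```

In a full agent loop you would append each result as a `tool` role message and call the model again so it can answer from the perceived page content.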
This pattern — giving agents web perception through an API — is replacing traditional web scraping for many use cases.
## Getting Started
1. Sign up at mantisapi.com — free tier includes 100 API calls/month
2. Get your API key from the dashboard
3. Make your first call:

   ```bash
   curl -X POST https://api.mantisapi.com/v1/scrape \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com"}'
   ```

4. Add AI extraction by including an `extract` field in your request
5. Scale up when you need more than 100 calls/month
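For the AI extraction step, the `extract` field follows the same `{"data": {"fields": [...]}}` shape used in the `perceive_web` helper earlier in this article (a sketch based on that payload, not an exhaustive schema):

```shell
# Hypothetical request adding AI extraction to the first-call example above
curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "extract": {"data": {"fields": ["title", "price"]}}}'
```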
## Conclusion
Web scraping APIs have transformed how developers collect data from the web. Instead of maintaining fragile scraper infrastructure, you make an API call and get structured data back.
For AI agents and modern applications, WebPerception API offers the most complete solution — combining scraping, screenshots, and AI-powered extraction in a single call. No selectors to maintain, no browsers to manage, no anti-bot systems to fight.
The future of web scraping isn't scraping at all — it's perception.