Build an MCP Web Scraping Server: Give Any AI Agent Real-Time Web Access
The Model Context Protocol (MCP) is changing how AI agents interact with the world. Instead of hard-coding tool integrations into every agent, MCP provides a universal standard: build one server, and every MCP-compatible client (Claude Desktop, Cursor, Windsurf, custom agents) can use it.
In this guide, you'll build an MCP server that gives any AI agent three powerful capabilities:
- Scrape any webpage and get clean, structured content
- Screenshot any URL for visual analysis
- Extract structured data using AI (prices, contacts, product info)
All powered by the WebPerception API.
What is MCP?
The Model Context Protocol is an open standard created by Anthropic that defines how AI applications connect to external tools and data sources. Think of it as USB-C for AI agents: one universal connector instead of a different cable for every device.
Why MCP matters for web scraping:
- Write once, use everywhere: your scraping server works with Claude Desktop, Cursor, Windsurf, and any MCP-compatible client
- Standardized interface: agents discover your tools automatically via the MCP protocol
- Type-safe: input schemas are defined upfront, reducing errors
- Composable: agents can chain your scraping tools with other MCP servers (databases, file systems, APIs)
Prerequisites
- Node.js 18+ (we'll use TypeScript)
- WebPerception API key: get one free (100 calls/month on the free tier)
- Basic familiarity with TypeScript and npm
Project Setup
mkdir mcp-web-scraper
cd mcp-web-scraper
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node
Node.js 18+ ships a global fetch, so no HTTP client dependency is needed. Add "type": "module" to your package.json (the MCP TypeScript SDK is published as ES modules), then create tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true
},
"include": ["src/**/*"]
}
Building the MCP Server
Create src/index.ts:
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
const API_KEY = process.env.WEBPERCEPTION_API_KEY;
const API_BASE = "https://api.mantisapi.com/v1";
if (!API_KEY) {
console.error("Set WEBPERCEPTION_API_KEY environment variable");
process.exit(1);
}
const server = new McpServer({
name: "web-scraper",
version: "1.0.0",
});
Tool 1: Scrape a Webpage
server.tool(
"scrape_url",
"Scrape a webpage and return its content as clean text or markdown. " +
"Use this to read articles, documentation, product pages, or any web content.",
{
url: z.string().url().describe("The URL to scrape"),
format: z.enum(["text", "markdown", "html"]).default("markdown")
.describe("Output format for the scraped content"),
wait_for: z.string().optional()
.describe("CSS selector to wait for before scraping"),
},
async ({ url, format, wait_for }) => {
const response = await fetch(`${API_BASE}/scrape`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url, format, wait_for, js_rendering: true }),
});
if (!response.ok) {
const error = await response.text();
return {
content: [{ type: "text", text: `Error scraping ${url}: ${response.status} - ${error}` }],
isError: true,
};
}
const data = await response.json();
return {
content: [{ type: "text", text: `# Scraped: ${url}\n\n${data.content}` }],
};
}
);
Tool 2: Screenshot a Page
server.tool(
"screenshot_url",
"Take a screenshot of any webpage. Returns the image as base64.",
{
url: z.string().url().describe("The URL to screenshot"),
full_page: z.boolean().default(false)
.describe("Capture the full scrollable page"),
width: z.number().default(1280).describe("Viewport width in pixels"),
height: z.number().default(720).describe("Viewport height in pixels"),
},
async ({ url, full_page, width, height }) => {
const response = await fetch(`${API_BASE}/screenshot`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url, full_page, viewport: { width, height } }),
});
if (!response.ok) {
const error = await response.text();
return {
content: [{ type: "text", text: `Error: ${response.status} - ${error}` }],
isError: true,
};
}
const data = await response.json();
return {
content: [
{ type: "image", data: data.screenshot, mimeType: "image/png" },
{ type: "text", text: `Screenshot of ${url} (${width}x${height})` },
],
};
}
);
Tool 3: AI-Powered Data Extraction
server.tool(
"extract_data",
"Extract structured data from a webpage using AI. " +
"Describe what data you want and get it back as JSON.",
{
url: z.string().url().describe("The URL to extract data from"),
prompt: z.string().describe("Describe what data to extract"),
schema: z.record(z.string()).optional()
.describe("Optional JSON schema hint for output structure"),
},
async ({ url, prompt, schema }) => {
const response = await fetch(`${API_BASE}/extract`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ url, prompt, schema, js_rendering: true }),
});
if (!response.ok) {
const error = await response.text();
return {
content: [{ type: "text", text: `Error: ${response.status} - ${error}` }],
isError: true,
};
}
const data = await response.json();
return {
content: [{
type: "text",
text: `# Extracted Data\n\n\`\`\`json\n${JSON.stringify(data.extracted, null, 2)}\n\`\`\``
}],
};
}
);
// Start the server
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
// Log to stderr: stdout is reserved for MCP protocol messages
console.error("WebPerception MCP Server running on stdio");
}
main().catch(console.error);
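All three tool handlers repeat the same fetch-and-error-handling boilerplate. An optional cleanup is to factor it into a shared helper; the sketch below is not part of the code above, and `callApi` and `ApiResult` are names introduced here for illustration:

```typescript
// Hypothetical refactor: one shared POST helper for the WebPerception endpoints.
// `callApi` and `ApiResult` are illustrative names, not SDK or API exports.
const API_BASE = "https://api.mantisapi.com/v1";

type ApiResult =
  | { ok: true; data: any }
  | { ok: false; status: number; message: string };

async function callApi(
  endpoint: string,
  body: Record<string, unknown>,
  apiKey: string
): Promise<ApiResult> {
  // Same headers and JSON body shape used by all three tools above.
  const response = await fetch(`${API_BASE}/${endpoint}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    // Surface the raw error text so the agent can see what went wrong.
    return { ok: false, status: response.status, message: await response.text() };
  }
  return { ok: true, data: await response.json() };
}
```

Each handler then shrinks to one `callApi` call plus result formatting, and error handling lives in a single place.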
Connect to Claude Desktop
Build the project first:
npx tsc
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"web-scraper": {
"command": "node",
"args": ["/path/to/mcp-web-scraper/dist/index.js"],
"env": {
"WEBPERCEPTION_API_KEY": "sk_live_your_key_here"
}
}
}
}
Restart Claude Desktop. You'll see three new tools available: scrape_url, screenshot_url, and extract_data.
Connect to Cursor / Windsurf
For Cursor, add to .cursor/mcp.json in your project:
{
"mcpServers": {
"web-scraper": {
"command": "node",
"args": ["./mcp-web-scraper/dist/index.js"],
"env": {
"WEBPERCEPTION_API_KEY": "sk_live_your_key_here"
}
}
}
}
Real-World Use Cases
1. Research Agent
Ask Claude: "Research the top 5 YC companies from the current batch and extract their name, description, and funding amount"
Claude will use scrape_url to read the YC directory, then extract_data to pull structured company data.
2. Price Monitor
Ask Claude: "Check the price of the MacBook Pro M4 on Apple.com and Best Buy, compare them"
Claude scrapes both pages, extracts prices, and gives you a comparison.
3. Visual QA Testing
Ask Claude: "Screenshot our landing page at 1280px and 375px widths, compare the layouts"
Claude takes two screenshots and analyzes responsive design issues.
4. Competitive Intelligence
Ask Claude: "Extract the pricing tiers from [competitor].com/pricing and compare to our pricing"
Claude scrapes competitor pricing, extracts the data, and generates a comparison table.
Cost Optimization
| Action | API Calls | Cost (Pro Plan) |
|---|---|---|
| Scrape one page | 1 | $0.004 |
| Screenshot | 1 | $0.004 |
| AI extraction | 1 | $0.004 |
| Research 10 companies | ~12 | $0.048 |
| Daily price monitoring (5 products) | 5 | $0.02 |
At $99/month (Pro plan, 25K calls), you can run roughly 2,000 research tasks per month at ~12 calls each.
Performance Tips
- Use `wait_for` on JS-heavy sites: pass a CSS selector to ensure dynamic content has loaded
- Prefer `markdown` format: it's the most token-efficient format for LLMs
- Cache results: add a simple in-memory cache for repeated scrapes
- Batch extractions: use one detailed prompt instead of multiple calls
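The caching tip can be sketched with a tiny TTL map. `TtlCache` is a name invented here, and this is a minimal sketch; in production you might prefer an established package such as lru-cache:

```typescript
// Minimal in-memory TTL cache sketch; `TtlCache` is an illustrative name.
type CacheEntry<T> = { value: T; expires: number };

class TtlCache<T> {
  private store = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      // Entry aged out; drop it so the map doesn't grow unbounded.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

// In the scrape_url handler you might key on url + format, e.g.:
//   const cached = scrapeCache.get(`${url}:${format}`);
const scrapeCache = new TtlCache<string>(5 * 60 * 1000); // 5-minute TTL
```

A cache hit costs zero API calls, which matters most for the repeated-scrape patterns in the use cases above.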
What's Next?
You now have a production-ready MCP server that gives any AI agent real-time web access. Extend it by:
- Adding caching with Redis or SQLite
- Adding a `search_web` tool that combines search with scraping
- Adding rate limiting to prevent runaway agent loops
- Publishing to npm for easy `npx` installation
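The rate-limiting idea from the list above can be as simple as a counter per time window. The sketch below is an assumption-laden illustration (`WindowLimiter` and `allow` are names invented here) of capping tool calls so a looping agent can't burn through your API quota:

```typescript
// Fixed-window rate limiter sketch; `WindowLimiter` is an illustrative name.
class WindowLimiter {
  private count = 0;
  private windowStart = Date.now();

  constructor(private maxCalls: number, private windowMs: number) {}

  // Returns true if the call is allowed, false once the window's budget is spent.
  allow(): boolean {
    const now = Date.now();
    if (now - this.windowStart >= this.windowMs) {
      // A new window has begun: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count >= this.maxCalls) return false;
    this.count++;
    return true;
  }
}

// e.g. checked at the top of each tool handler:
//   if (!limiter.allow()) return { content: [{ type: "text",
//     text: "Rate limit reached; try again shortly." }], isError: true };
const limiter = new WindowLimiter(30, 60_000); // 30 calls per minute
```

Returning an `isError` result instead of throwing keeps the failure visible to the agent, which can back off or report it to the user.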
Get Started Free
100 API calls/month on the free tier. No credit card required.
Get Your API Key →