Build an MCP Web Scraping Server: Give Any AI Agent Real-Time Web Access

March 9, 2026 · 12 min read · MCP · AI Agents

The Model Context Protocol (MCP) is changing how AI agents interact with the world. Instead of hard-coding tool integrations into every agent, MCP provides a universal standard: build one server, and every MCP-compatible client (Claude Desktop, Cursor, Windsurf, custom agents) can use it.

In this guide, you'll build an MCP server that gives any AI agent three powerful capabilities:

  1. Scrape any webpage and get back clean text, markdown, or HTML
  2. Screenshot any page at a chosen viewport size
  3. Extract structured data from a page using a natural-language prompt

All powered by the WebPerception API.

What is MCP?

The Model Context Protocol is an open standard created by Anthropic that defines how AI applications connect to external tools and data sources. Think of it as USB-C for AI agents: one universal connector instead of a different cable for every device.

Why MCP matters for web scraping:

  1. Build once, run anywhere: the same server works in Claude Desktop, Cursor, Windsurf, and custom agents
  2. No per-agent glue code: the protocol standardizes how tools are discovered and invoked
  3. The model decides: tool descriptions let the agent choose when to scrape, screenshot, or extract

Prerequisites

  1. Node.js 18 or later (the server relies on the built-in fetch)
  2. A WebPerception API key (the free tier includes 100 calls/month)
  3. Basic familiarity with TypeScript

Project Setup

mkdir mcp-web-scraper
cd mcp-web-scraper
npm init -y
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true
  },
  "include": ["src/**/*"]
}
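
One easy-to-miss step: with "module": "Node16", both tsc and Node decide between ESM and CommonJS based on your package.json, and the MCP SDK is consumed as an ES module. Mark the package as a module by adding this field to package.json:

```json
{
  "type": "module"
}
```

Without it, the compiled dist/index.js would be treated as CommonJS and the ESM imports below would fail at runtime.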

Building the MCP Server

Create src/index.ts:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const API_KEY = process.env.WEBPERCEPTION_API_KEY;
const API_BASE = "https://api.mantisapi.com/v1";

if (!API_KEY) {
  console.error("Set WEBPERCEPTION_API_KEY environment variable");
  process.exit(1);
}

const server = new McpServer({
  name: "web-scraper",
  version: "1.0.0",
});

Tool 1: Scrape a Webpage

server.tool(
  "scrape_url",
  "Scrape a webpage and return its content as clean text or markdown. " +
    "Use this to read articles, documentation, product pages, or any web content.",
  {
    url: z.string().url().describe("The URL to scrape"),
    format: z.enum(["text", "markdown", "html"]).default("markdown")
      .describe("Output format for the scraped content"),
    wait_for: z.string().optional()
      .describe("CSS selector to wait for before scraping"),
  },
  async ({ url, format, wait_for }) => {
    const response = await fetch(`${API_BASE}/scrape`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, format, wait_for, js_rendering: true }),
    });

    if (!response.ok) {
      const error = await response.text();
      return {
        content: [{ type: "text", text: `Error scraping ${url}: ${response.status} - ${error}` }],
        isError: true,
      };
    }

    const data = await response.json();
    return {
      content: [{ type: "text", text: `# Scraped: ${url}\n\n${data.content}` }],
    };
  }
);
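
Under the hood, an MCP client invokes this tool by sending a JSON-RPC tools/call request over the stdio transport. The message shape follows the MCP specification; the id and arguments below are illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_url",
    "arguments": {
      "url": "https://example.com/article",
      "format": "markdown"
    }
  }
}
```

The SDK handles the JSON-RPC plumbing for you; your handler only sees the validated arguments.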

Tool 2: Screenshot a Page

server.tool(
  "screenshot_url",
  "Take a screenshot of any webpage. Returns the image as base64.",
  {
    url: z.string().url().describe("The URL to screenshot"),
    full_page: z.boolean().default(false)
      .describe("Capture the full scrollable page"),
    width: z.number().default(1280).describe("Viewport width in pixels"),
    height: z.number().default(720).describe("Viewport height in pixels"),
  },
  async ({ url, full_page, width, height }) => {
    const response = await fetch(`${API_BASE}/screenshot`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, full_page, viewport: { width, height } }),
    });

    if (!response.ok) {
      const error = await response.text();
      return {
        content: [{ type: "text", text: `Error: ${response.status} - ${error}` }],
        isError: true,
      };
    }

    const data = await response.json();
    return {
      content: [
        { type: "image", data: data.screenshot, mimeType: "image/png" },
        { type: "text", text: `Screenshot of ${url} (${width}x${height})` },
      ],
    };
  }
);

Tool 3: AI-Powered Data Extraction

server.tool(
  "extract_data",
  "Extract structured data from a webpage using AI. " +
    "Describe what data you want and get it back as JSON.",
  {
    url: z.string().url().describe("The URL to extract data from"),
    prompt: z.string().describe("Describe what data to extract"),
    schema: z.record(z.string()).optional()
      .describe("Optional JSON schema hint for output structure"),
  },
  async ({ url, prompt, schema }) => {
    const response = await fetch(`${API_BASE}/extract`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, prompt, schema, js_rendering: true }),
    });

    if (!response.ok) {
      const error = await response.text();
      return {
        content: [{ type: "text", text: `Error: ${response.status} - ${error}` }],
        isError: true,
      };
    }

    const data = await response.json();
    return {
      content: [{
        type: "text",
        text: `# Extracted Data\n\n\`\`\`json\n${JSON.stringify(data.extracted, null, 2)}\n\`\`\``
      }],
    };
  }
);
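
The three tools repeat the same fetch boilerplate (endpoint, auth header, JSON body). A small helper can centralize it; this is a sketch with my own naming, not part of the MCP SDK:

```typescript
const API_BASE = "https://api.mantisapi.com/v1";

// The pieces each tool hands to fetch(): the full URL plus the request init.
interface ApiRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the fetch arguments for a WebPerception endpoint.
function buildRequest(endpoint: string, body: object, apiKey: string): ApiRequest {
  return {
    url: `${API_BASE}/${endpoint}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Each handler then reduces to `const req = buildRequest("scrape", { url, format }, API_KEY!); const response = await fetch(req.url, req.init);` with the error check unchanged.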

// Start the server
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("WebPerception MCP Server running on stdio");
}

main().catch(console.error);

Connect to Claude Desktop

Build the project first:

npx tsc
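
Before wiring the server into a client, you can exercise the tools manually with the official MCP Inspector (assuming npx is available; the package is @modelcontextprotocol/inspector):

```shell
WEBPERCEPTION_API_KEY=sk_live_your_key_here npx @modelcontextprotocol/inspector node dist/index.js
```

The Inspector opens a browser UI where you can list the tools and invoke them with test arguments. Remember the API key: the server exits immediately without it.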

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "web-scraper": {
      "command": "node",
      "args": ["/path/to/mcp-web-scraper/dist/index.js"],
      "env": {
        "WEBPERCEPTION_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}

Restart Claude Desktop. You'll see three new tools available: scrape_url, screenshot_url, and extract_data.

Connect to Cursor / Windsurf

For Cursor, add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "web-scraper": {
      "command": "node",
      "args": ["./mcp-web-scraper/dist/index.js"],
      "env": {
        "WEBPERCEPTION_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}

Real-World Use Cases

1. Research Agent

Ask Claude: "Research the top 5 YC companies from the current batch and extract their name, description, and funding amount"

Claude will use scrape_url to read the YC directory, then extract_data to pull structured company data.

2. Price Monitor

Ask Claude: "Check the price of the MacBook Pro M4 on Apple.com and Best Buy, compare them"

Claude scrapes both pages, extracts prices, and gives you a comparison.

3. Visual QA Testing

Ask Claude: "Screenshot our landing page at 1280px and 375px widths, compare the layouts"

Claude takes two screenshots and analyzes responsive design issues.

4. Competitive Intelligence

Ask Claude: "Extract the pricing tiers from [competitor].com/pricing and compare to our pricing"

Claude scrapes competitor pricing, extracts the data, and generates a comparison table.

Cost Optimization

Action | API Calls | Cost (Pro Plan)
Scrape one page | 1 | $0.004
Screenshot | 1 | $0.004
AI extraction | 1 | $0.004
Research 10 companies | ~12 | $0.048
Daily price monitoring (5 products) | 5 | $0.02

At $99/month (Pro plan, 25K calls), you can run roughly 2,000 research tasks per month (25,000 calls / ~12 calls per task).

Performance Tips

  1. Use wait_for on JS-heavy sites: pass a CSS selector to ensure dynamic content has loaded before scraping
  2. Prefer the markdown format: it's the most token-efficient format for LLMs
  3. Cache results: add a simple in-memory cache for repeated scrapes
  4. Batch extractions: use one detailed prompt instead of multiple calls
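
Tip 3's cache can be as small as a Map with a TTL. A minimal sketch (the class and key naming are my own):

```typescript
// One cached value plus its absolute expiry time.
type CacheEntry = { value: string; expires: number };

class ScrapeCache {
  private store = new Map<string, CacheEntry>();

  // Entries live for ttlMs milliseconds (default 5 minutes).
  constructor(private ttlMs: number = 5 * 60 * 1000) {}

  get(key: string): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

In the scrape_url handler, check `cache.get(url + ":" + format)` before calling the API and `cache.set(...)` after a successful response, so repeated reads of the same page within the TTL cost nothing.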

What's Next?

You now have a production-ready MCP server that gives any AI agent real-time web access. Extend it by adding more tools, caching repeated scrapes (see Performance Tips above), or publishing the server to npm so clients can launch it directly with npx.

Get Started Free

100 API calls/month on the free tier. No credit card required.

Get Your API Key →