Build Web Scraping Tools with the Vercel AI SDK

March 9, 2026 · 10 min read · Vercel AI SDK, Next.js, TypeScript

The Vercel AI SDK is a widely used TypeScript toolkit for building AI-powered applications. With its tool() primitive, you can give a tool-capable language model the ability to scrape websites, take screenshots, and extract structured data, all within a Next.js or Node.js app.

This guide shows you how to build web scraping tools with the Vercel AI SDK and the WebPerception API.

Why Vercel AI SDK + Web Scraping?

The AI SDK's tool system is elegant: define a schema with Zod, write an execute function, and the model decides when to call it. Combined with web scraping, your AI apps can read live pages, capture screenshots, and pull structured data from them on demand.

Prerequisites

npm install ai @ai-sdk/openai zod

Get your WebPerception API key at mantisapi.com.

Define Your Scraping Tools

import { tool } from 'ai';
import { z } from 'zod';

const scrapeTool = tool({
  description: 'Scrape a webpage and return its content as clean text or markdown',
  parameters: z.object({
    url: z.string().url().describe('The URL to scrape'),
    format: z.enum(['text', 'markdown']).default('markdown')
      .describe('Output format'),
  }),
  execute: async ({ url, format }) => {
    const response = await fetch('https://api.mantisapi.com/v1/scrape', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANTIS_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, format }),
    });
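    if (!response.ok) {
      // Return the failure as data so the model can read it and recover
      return { error: `Scrape failed with HTTP ${response.status}` };
    }
    const data = await response.json();
    return data.content;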
  },
});

const screenshotTool = tool({
  description: 'Take a screenshot of a webpage',
  parameters: z.object({
    url: z.string().url().describe('The URL to screenshot'),
    fullPage: z.boolean().default(false)
      .describe('Capture the full page or just the viewport'),
  }),
  execute: async ({ url, fullPage }) => {
    const response = await fetch('https://api.mantisapi.com/v1/screenshot', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANTIS_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, full_page: fullPage }),
    });
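    if (!response.ok) {
      // Return the failure as data so the model can read it and recover
      return { error: `Screenshot failed with HTTP ${response.status}` };
    }
    const data = await response.json();
    return { imageUrl: data.url, width: data.width, height: data.height };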
  },
});

const extractTool = tool({
  description: 'Extract structured data from a webpage using AI',
  parameters: z.object({
    url: z.string().url().describe('The URL to extract data from'),
    prompt: z.string().describe('What data to extract'),
  }),
  execute: async ({ url, prompt }) => {
    const response = await fetch('https://api.mantisapi.com/v1/extract', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.MANTIS_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, prompt }),
    });
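    if (!response.ok) {
      // Return the failure as data so the model can read it and recover
      return { error: `Extract failed with HTTP ${response.status}` };
    }
    return await response.json();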
  },
});
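Both fetch and response.json() can throw, for example on a network failure or a non-JSON response body. The following is a minimal sketch of a wrapper that turns any thrown error into a plain result object the model can read. The names safeExecute and ToolError are hypothetical helpers for illustration, not part of the AI SDK or the WebPerception API.

```typescript
// Hypothetical helper (not an AI SDK export): wraps an async function so that
// thrown errors are returned as data instead of propagating to the caller.
type ToolError = { error: string };

function safeExecute<Args, Result>(
  fn: (args: Args) => Promise<Result>,
): (args: Args) => Promise<Result | ToolError> {
  return async (args: Args) => {
    try {
      return await fn(args);
    } catch (err) {
      // `err` is `unknown` under strict settings, so narrow before reading it
      const message = err instanceof Error ? err.message : String(err);
      return { error: `Tool call failed: ${message}` };
    }
  };
}
```

One way to use it is to pass the existing function through the wrapper, as in execute: safeExecute(async ({ url, format }) => { ... }). Depending on how the types are inferred, the argument type may need an explicit annotation.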

Use Tools in a Next.js Chat Route (App Router)

// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    tools: {
      scrape: scrapeTool,
      screenshot: screenshotTool,
      extract: extractTool,
    },
    maxSteps: 5, // Allow multi-step tool use
  });

  return result.toDataStreamResponse();
}

Use Tools in a Node.js Script

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { text, toolResults } = await generateText({
  model: openai('gpt-4o'),
  tools: {
    scrape: scrapeTool,
    screenshot: screenshotTool,
    extract: extractTool,
  },
  maxSteps: 5,
  prompt: 'What are the top 3 trending repositories on GitHub right now?',
});

console.log(text);

Multi-Step Agent: Competitor Price Monitor

const result = await generateText({
  model: openai('gpt-4o'),
  tools: {
    scrape: scrapeTool,
    extract: extractTool,
  },
  maxSteps: 10,
  system: `You are a price monitoring agent. When asked to compare prices:
    1. Scrape each competitor's pricing page
    2. Extract pricing tiers and features
    3. Create a comparison summary`,
  prompt: 'Compare the pricing of Vercel, Netlify, and Cloudflare Pages',
});

Streaming Tool Results in the UI

The AI SDK's useChat hook handles tool calls automatically in your React components:

'use client';
import { useChat } from 'ai/react';

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          <strong>{m.role}:</strong>
          {m.toolInvocations?.map(tool => (
            <div key={tool.toolCallId}>
              🔧 Used {tool.toolName}
              {tool.state === 'result' && (
                <pre>{JSON.stringify(tool.result, null, 2)}</pre>
              )}
            </div>
          ))}
          {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}

Best Practices

  1. Set maxSteps: Without it, generation stops after the first step, so the model can call a tool but never sees the result. Set it to 5-10 for agent-like behavior.
  2. Cache scrape results: Use Vercel KV or Redis to avoid re-scraping the same URL within a time window.
  3. Add rate limiting: Protect your API credits with middleware on your chat route.
  4. Use streaming: streamText + useChat gives real-time feedback as tools execute.
  5. Handle errors: Wrap tool execute functions in try/catch and return error messages the model can interpret.
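Items 2 and 3 can be prototyped in memory before wiring up Vercel KV or Redis. The sketch below assumes a single server process. TtlCache and FixedWindowLimiter are hypothetical names for illustration, and the injected now function exists only to make the time logic easy to test.

```typescript
// Hypothetical in-memory helpers. A shared store such as Redis would replace
// the Maps when several server instances need to see the same state.

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the caller identified by `key` may make another request.
  allow(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // start a new window
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false; // limit reached for the current window
  }
}
```

Inside a tool's execute function, the cache would be checked with the URL as the key before calling the API. The limiter would be consulted at the top of the chat route with a per-user or per-IP key.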

Cost Optimization

| Operation  | WebPerception Cost | Typical Use         |
|------------|--------------------|---------------------|
| Scrape     | 1 credit           | Content extraction  |
| Screenshot | 1 credit           | Visual verification |
| AI Extract | 2 credits          | Structured data     |

With the Free tier (100 calls/month), you can build and test extensively before upgrading.
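To budget a workload against these costs, the arithmetic can be sketched as below. The per-operation costs come from the table above. The estimateCredits helper and the planned call counts are made up for illustration. The source states the free tier in calls rather than credits, so comparing the total against 100 assumes the allowance is counted in credits.

```typescript
// Credit costs copied from the cost table; everything else is illustrative.
const CREDIT_COST = { scrape: 1, screenshot: 1, extract: 2 } as const;

type Operation = keyof typeof CREDIT_COST;

// Total credits for a planned number of calls per operation.
function estimateCredits(calls: Partial<Record<Operation, number>>): number {
  let total = 0;
  for (const op of Object.keys(CREDIT_COST) as Operation[]) {
    total += (calls[op] ?? 0) * CREDIT_COST[op];
  }
  return total;
}

// Hypothetical monthly plan: 40 scrapes, 10 screenshots, 20 AI extracts
const planned = { scrape: 40, screenshot: 10, extract: 20 };
console.log(estimateCredits(planned)); // 40 + 10 + 2 * 20 = 90
```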

What You Can Build

Start Building with WebPerception API

Get 100 free API calls/month. No credit card required.

Next Steps