POST /extract

AI-powered structured data extraction. Our most powerful endpoint.

POST https://api.mantisapi.com/v1/extract

The /extract endpoint combines web scraping with AI to return structured JSON data from any web page. Define a JSON schema for the data you want, optionally add a natural language prompt, and the API extracts exactly what you need.

Why this matters: Traditional scrapers depend on selectors tied to a page's HTML, so they break when that HTML changes. AI extraction reads the page content rather than its markup, so the same schema can keep working after a site redesign.

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to extract data from. |
| schema | object | Required | JSON Schema defining the structure of data to extract. Supports object, array, string, number, integer, boolean. |
| prompt | string | Optional | Natural language instruction to guide extraction. Improves accuracy for ambiguous pages. |
| render_js | boolean | Optional | Render JavaScript before extraction. Default: true |
| wait_for | string | Optional | CSS selector to wait for before extracting. |
| wait_ms | integer | Optional | Additional time to wait, in milliseconds. Max: 10000. |
| max_tokens | integer | Optional | Maximum tokens for AI processing. Higher = more thorough extraction. Default: 4096 |
| proxy_country | string | Optional | Two-letter country code for geo-targeted access. |
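
As a rough illustration of how these parameters fit together, the sketch below assembles a request body in Python and checks the documented `wait_ms` limit before anything is sent. The helper name `build_extract_payload` and the `.product-card` selector are hypothetical and not part of any official SDK; only the parameter names and the 10000 ms cap come from the list above. Rejecting negative `wait_ms` values is an assumption.

```python
from typing import Optional

MAX_WAIT_MS = 10000  # documented upper bound for wait_ms


def build_extract_payload(
    url: str,
    schema: dict,
    prompt: Optional[str] = None,
    render_js: Optional[bool] = None,
    wait_for: Optional[str] = None,
    wait_ms: Optional[int] = None,
    max_tokens: Optional[int] = None,
    proxy_country: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a /extract request body, omitting unset options."""
    # Assumption: negative waits are invalid; the docs only state the maximum.
    if wait_ms is not None and not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
    if proxy_country is not None and len(proxy_country) != 2:
        raise ValueError("proxy_country must be a two-letter country code")

    payload = {"url": url, "schema": schema}
    optional = {
        "prompt": prompt,
        "render_js": render_js,
        "wait_for": wait_for,
        "wait_ms": wait_ms,
        "max_tokens": max_tokens,
        "proxy_country": proxy_country,
    }
    # Leave unset options out so the API applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_extract_payload(
    "https://store.example.com/laptops",
    {"type": "object", "properties": {"name": {"type": "string"}}},
    wait_for=".product-card",  # assumed selector, for illustration only
    wait_ms=2000,
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as in the job listing example further down.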

Schema Design

The schema parameter uses standard JSON Schema. Here are common patterns:

Extract a single object

{
  "schema": {
    "type": "object",
    "properties": {
      "company_name": { "type": "string" },
      "founded": { "type": "integer" },
      "headquarters": { "type": "string" },
      "ceo": { "type": "string" }
    }
  }
}

Extract an array of items

{
  "schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "product": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      }
    }
  }
}

Nested objects

{
  "schema": {
    "type": "object",
    "properties": {
      "article_title": { "type": "string" },
      "author": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "bio": { "type": "string" }
        }
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}
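
Because extraction is AI-driven, it may be worth checking that returned data actually matches the schema you sent. The following is a minimal sketch of such a check covering only the six supported types; `conforms` is a hypothetical helper, not part of the API, and it deliberately tolerates missing keys. How the API represents fields it cannot find is not stated here, so a `null` value is simply treated as a mismatch in this sketch.

```python
def conforms(value, schema):
    """Return True if value matches a simplified JSON Schema (type checks only)."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Only keys that are present are checked; missing keys are tolerated.
        return all(conforms(value[k], s) for k, s in props.items() if k in value)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema.get("items", {})) for item in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # no "type" or an unknown one: accept anything


article_schema = {
    "type": "object",
    "properties": {
        "article_title": {"type": "string"},
        "author": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "bio": {"type": "string"}},
        },
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

good = {"article_title": "Hello", "author": {"name": "A", "bio": "B"}, "tags": ["x", "y"]}
bad = {"article_title": "Hello", "tags": "x"}  # tags should be an array

print(conforms(good, article_schema))  # True
print(conforms(bad, article_schema))   # False
```

For anything beyond simple type checks (required fields, formats, enums), a full JSON Schema validator would be a better fit than this sketch.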

Example: E-commerce Product Extraction

curl -X POST https://api.mantisapi.com/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/laptops",
    "schema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "original_price": { "type": "number" },
          "discount_percent": { "type": "integer" },
          "rating": { "type": "number" },
          "review_count": { "type": "integer" },
          "specs": {
            "type": "object",
            "properties": {
              "cpu": { "type": "string" },
              "ram_gb": { "type": "integer" },
              "storage": { "type": "string" }
            }
          }
        }
      }
    },
    "prompt": "Extract all laptops with pricing, ratings, and key specifications"
  }'

Response

{
  "status": 200,
  "url": "https://store.example.com/laptops",
  "data": [
    {
      "name": "ThinkPad X1 Carbon Gen 11",
      "price": 1249.99,
      "original_price": 1549.99,
      "discount_percent": 19,
      "rating": 4.7,
      "review_count": 342,
      "specs": {
        "cpu": "Intel Core i7-1365U",
        "ram_gb": 16,
        "storage": "512GB SSD"
      }
    },
    {
      "name": "MacBook Air M3",
      "price": 1099.00,
      "original_price": 1099.00,
      "discount_percent": 0,
      "rating": 4.9,
      "review_count": 1205,
      "specs": {
        "cpu": "Apple M3",
        "ram_gb": 8,
        "storage": "256GB SSD"
      }
    }
  ],
  "tokens_used": 3241,
  "credits_used": 3,
  "credits_remaining": 4997
}
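
To show how a response of this shape might be consumed, here is a small Python sketch that parses a trimmed copy of the sample response above and pulls out discounted items and the cheapest laptop. The variable names are illustrative, and the code assumes every item carries `price` and `original_price`, which holds for the sample but is not guaranteed for every page.

```python
import json

# Trimmed copy of the sample response, used here instead of a live API call.
raw = """
{
  "status": 200,
  "url": "https://store.example.com/laptops",
  "data": [
    {"name": "ThinkPad X1 Carbon Gen 11", "price": 1249.99, "original_price": 1549.99},
    {"name": "MacBook Air M3", "price": 1099.00, "original_price": 1099.00}
  ],
  "tokens_used": 3241,
  "credits_used": 3,
  "credits_remaining": 4997
}
"""

body = json.loads(raw)
laptops = body["data"]

# Items whose current price is below the original price.
discounted = [item["name"] for item in laptops if item["price"] < item["original_price"]]
cheapest = min(laptops, key=lambda item: item["price"])

print(discounted)        # ['ThinkPad X1 Carbon Gen 11']
print(cheapest["name"])  # MacBook Air M3
print(f"Used {body['credits_used']} credits, {body['credits_remaining']} remaining")
```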

Example: Job Listing Extraction

import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://company.com/careers",
        "schema": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "department": {"type": "string"},
                    "location": {"type": "string"},
                    "remote": {"type": "boolean"},
                    "salary_range": {"type": "string"}
                }
            }
        },
        "prompt": "Extract all open positions. Mark remote as true if the listing mentions remote or hybrid."
    }
)

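resp.raise_for_status()  # surface HTTP errors before parsing the body

# "data" is null when extraction fails (see Error Handling), so fall back to an empty list
jobs = resp.json()["data"] or []
# .get() tolerates listings where the "remote" field is missing
remote_jobs = [j for j in jobs if j.get("remote")]
print(f"Found {len(remote_jobs)} remote positions out of {len(jobs)} total")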

Prompt Tips

Credit Usage

Each extraction call costs 3 credits, which covers both rendering and AI processing. A single call replaces separate scraping, parsing, and cleaning steps.
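
For budgeting, the arithmetic can be sketched as below. The function names are hypothetical, and the sketch assumes a flat 3 credits per call; whether failed extractions or retries are billed differently is not covered here.

```python
CREDITS_PER_EXTRACT = 3  # documented cost of one /extract call


def credits_needed(n_urls: int) -> int:
    """Credits required to extract n_urls pages, one call per page."""
    return n_urls * CREDITS_PER_EXTRACT


def extract_calls_remaining(credits_remaining: int) -> int:
    """Number of whole /extract calls the remaining balance can cover."""
    return credits_remaining // CREDITS_PER_EXTRACT


print(credits_needed(250))            # 750
print(extract_calls_remaining(4997))  # 1665
```

The 4997 figure is the `credits_remaining` value from the sample response above.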

Error Handling

⚠️ If the AI cannot extract the requested data, the response will include "data": null with an "extraction_error" field explaining why. Common causes: page requires authentication, content is behind a paywall, or the schema doesn't match page content.
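
One way to handle this case in client code is to turn a null `data` field into an exception, as in the sketch below. `ExtractionFailed` and `unwrap_extraction` are hypothetical names, and the sample failure body (including its status value and message text) is made up for illustration; only the `"data": null` and `"extraction_error"` fields come from the description above.

```python
class ExtractionFailed(Exception):
    """Raised when the API responds but could not extract the requested data."""


def unwrap_extraction(body: dict):
    """Return body["data"], or raise ExtractionFailed if it is null or absent."""
    if body.get("data") is None:
        raise ExtractionFailed(body.get("extraction_error") or "no reason given")
    return body["data"]


# Illustrative failure body; the exact wording of real error messages may differ.
failed = {
    "status": 200,
    "data": None,
    "extraction_error": "Page requires authentication",
}

try:
    unwrap_extraction(failed)
except ExtractionFailed as exc:
    print(f"Extraction failed: {exc}")
```

An empty list is returned as-is rather than treated as a failure, since a page with no matching items is not necessarily an error.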