⭐ POST /extract

AI-powered structured data extraction. Our most powerful endpoint.

POST https://api.mantisapi.com/v1/extract

The /extract endpoint combines web scraping with AI to return structured JSON from a web page. Define a JSON Schema for the data you want, optionally add a natural-language prompt, and the API returns data shaped to that schema.

Why this matters: Traditional selector-based scraping breaks when a website changes its HTML. AI extraction works from the page's content rather than its markup, so the same schema can keep working after a site redesign.

Request Body

Parameter	Type	Required	Description
url	string	Required	The URL to extract data from.
schema	object	Required	JSON Schema defining the structure of data to extract. Supports `object`, `array`, `string`, `number`, `integer`, `boolean`.
prompt	string	Optional	Natural language instruction to guide extraction. Improves accuracy for ambiguous pages.
render_js	boolean	Optional	Render JavaScript before extraction. Default: true
wait_for	string	Optional	CSS selector to wait for before extracting.
wait_ms	integer	Optional	Additional time to wait before extracting, in milliseconds. Max: 10000.
max_tokens	integer	Optional	Maximum tokens for AI processing. Higher values allow more thorough extraction. Default: 4096
proxy_country	string	Optional	Two-letter country code for geo-targeted access.
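
The parameters above can be assembled into a request body with a small helper. This is a minimal sketch, not part of any official client: `build_extract_payload` and the `.product-card` selector are hypothetical names used for illustration. It only enforces the documented `wait_ms` maximum and leaves out optional fields you did not set, so the API's defaults apply.

```python
from typing import Any, Dict, Optional

MAX_WAIT_MS = 10_000  # documented maximum for wait_ms


def build_extract_payload(
    url: str,
    schema: Dict[str, Any],
    prompt: Optional[str] = None,
    render_js: Optional[bool] = None,
    wait_for: Optional[str] = None,
    wait_ms: Optional[int] = None,
    max_tokens: Optional[int] = None,
    proxy_country: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body, omitting optional fields that were not set."""
    if wait_ms is not None and wait_ms > MAX_WAIT_MS:
        raise ValueError(f"wait_ms must not exceed {MAX_WAIT_MS}")

    payload: Dict[str, Any] = {"url": url, "schema": schema}
    optional = {
        "prompt": prompt,
        "render_js": render_js,
        "wait_for": wait_for,
        "wait_ms": wait_ms,
        "max_tokens": max_tokens,
        "proxy_country": proxy_country,
    }
    # Only send what the caller explicitly provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_extract_payload(
    "https://store.example.com/laptops",
    {"type": "object", "properties": {"name": {"type": "string"}}},
    wait_for=".product-card",  # hypothetical selector
    wait_ms=2000,
)
print(sorted(payload))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.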

Schema Design

The schema parameter uses standard JSON Schema. Here are common patterns:

Extract a single object

{
  "schema": {
    "type": "object",
    "properties": {
      "company_name": { "type": "string" },
      "founded": { "type": "integer" },
      "headquarters": { "type": "string" },
      "ceo": { "type": "string" }
    }
  }
}

Extract an array of items

{
  "schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "product": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      }
    }
  }
}

Nested objects

{
  "schema": {
    "type": "object",
    "properties": {
      "article_title": { "type": "string" },
      "author": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "bio": { "type": "string" }
        }
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}
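
Before sending a schema, it can help to check that it only uses the six types listed in the parameter table. The sketch below walks a schema recursively and reports any other `type` values. `unsupported_types` is a hypothetical helper, and how the API treats unlisted types is not stated here, so treat this as a local sanity check only.

```python
SUPPORTED_TYPES = {"object", "array", "string", "number", "integer", "boolean"}


def unsupported_types(schema, path="$"):
    """Return (path, type) pairs for nodes whose type is not in the documented list."""
    problems = []
    node_type = schema.get("type")
    if node_type not in SUPPORTED_TYPES:
        problems.append((path, node_type))
    # Recurse into object properties.
    for name, child in schema.get("properties", {}).items():
        problems.extend(unsupported_types(child, f"{path}.{name}"))
    # Recurse into the array item schema, if present.
    if "items" in schema:
        problems.extend(unsupported_types(schema["items"], f"{path}[]"))
    return problems


article_schema = {
    "type": "object",
    "properties": {
        "article_title": {"type": "string"},
        "author": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "bio": {"type": "string"}},
        },
        "tags": {"type": "array", "items": {"type": "string"}},
        "published": {"type": "date"},  # deliberately not a listed type
    },
}

print(unsupported_types(article_schema))
```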

Example: E-commerce Product Extraction

curl -X POST https://api.mantisapi.com/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/laptops",
    "schema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "original_price": { "type": "number" },
          "discount_percent": { "type": "integer" },
          "rating": { "type": "number" },
          "review_count": { "type": "integer" },
          "specs": {
            "type": "object",
            "properties": {
              "cpu": { "type": "string" },
              "ram_gb": { "type": "integer" },
              "storage": { "type": "string" }
            }
          }
        }
      }
    },
    "prompt": "Extract all laptops with pricing, ratings, and key specifications"
  }'

Response

{
  "status": 200,
  "url": "https://store.example.com/laptops",
  "data": [
    {
      "name": "ThinkPad X1 Carbon Gen 11",
      "price": 1249.99,
      "original_price": 1549.99,
      "discount_percent": 19,
      "rating": 4.7,
      "review_count": 342,
      "specs": {
        "cpu": "Intel Core i7-1365U",
        "ram_gb": 16,
        "storage": "512GB SSD"
      }
    },
    {
      "name": "MacBook Air M3",
      "price": 1099.00,
      "original_price": 1099.00,
      "discount_percent": 0,
      "rating": 4.9,
      "review_count": 1205,
      "specs": {
        "cpu": "Apple M3",
        "ram_gb": 8,
        "storage": "256GB SSD"
      }
    }
  ],
  "tokens_used": 3241,
  "credits_used": 3,
  "credits_remaining": 4997
}
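
Once you have the response, the `data` array is ordinary JSON and can be post-processed directly. As a minimal sketch, the snippet below uses a trimmed copy of the sample response to list discounted items and to recompute each discount from the two prices, which is one way to cross-check extracted values. `on_sale` and `computed_discount` are hypothetical helpers, not API features.

```python
# Trimmed copy of the sample response above.
response = {
    "data": [
        {"name": "ThinkPad X1 Carbon Gen 11", "price": 1249.99,
         "original_price": 1549.99, "discount_percent": 19},
        {"name": "MacBook Air M3", "price": 1099.00,
         "original_price": 1099.00, "discount_percent": 0},
    ],
    "credits_used": 3,
}


def on_sale(items):
    """Items whose current price is below the original price."""
    return [i for i in items if i["price"] < i["original_price"]]


def computed_discount(item):
    """Discount in whole percent, derived from the two extracted prices."""
    return round((1 - item["price"] / item["original_price"]) * 100)


for item in response["data"]:
    matches = computed_discount(item) == item["discount_percent"]
    print(item["name"], "discount consistent:", matches)

print([i["name"] for i in on_sale(response["data"])])
```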

Example: Job Listing Extraction

import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://company.com/careers",
        "schema": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "department": {"type": "string"},
                    "location": {"type": "string"},
                    "remote": {"type": "boolean"},
                    "salary_range": {"type": "string"}
                }
            }
        },
        "prompt": "Extract all open positions. Mark remote as true if the listing mentions remote or hybrid."
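    },
    timeout=60,  # requests has no default timeout; 60 s is an arbitrary choice
)
resp.raise_for_status()

# "data" is null when extraction fails, so fall back to an empty list
jobs = resp.json().get("data") or []
# "remote" may be missing or null for some listings
remote_jobs = [j for j in jobs if j.get("remote")]
print(f"Found {len(remote_jobs)} remote positions out of {len(jobs)} total")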

Prompt Tips

Be specific: "Extract the top 5 products" is better than "Get products"
Handle ambiguity: "If price is not shown, set to null" helps with missing data
Add context: "Prices are in EUR" prevents currency confusion
Set limits: "Extract at most 20 items" controls output size and token usage
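
If you generate prompts in code, the tips above can be combined mechanically. The following is a rough sketch under that assumption. `build_prompt` is a hypothetical helper, and the API accepts any plain string, so nothing here is required.

```python
from typing import Optional


def build_prompt(
    task: str,
    context: Optional[str] = None,
    missing_rule: Optional[str] = None,
    max_items: Optional[int] = None,
) -> str:
    """Join a specific task with optional context, a missing-data rule, and a limit."""
    parts = [task]
    if context:
        parts.append(context)
    if missing_rule:
        parts.append(missing_rule)
    if max_items is not None:
        parts.append(f"Extract at most {max_items} items")
    # End every part with exactly one period, then join with spaces.
    return " ".join(p.strip().rstrip(".") + "." for p in parts)


prompt = build_prompt(
    "Extract all laptops with pricing and ratings",
    context="Prices are in EUR",
    missing_rule="If price is not shown, set to null",
    max_items=20,
)
print(prompt)
```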

Credit Usage

Each extraction call costs 3 credits, which covers rendering and AI processing. One call replaces separate scraping, parsing, and cleaning steps.
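
For budgeting, the arithmetic is simple enough to sketch. The helpers below are hypothetical and assume the flat rate of 3 credits per call stated above. The balance of 4997 is the `credits_remaining` value from the sample response.

```python
CREDITS_PER_EXTRACT = 3  # flat rate per /extract call


def calls_remaining(credits_remaining: int) -> int:
    """How many more /extract calls the current balance covers."""
    return credits_remaining // CREDITS_PER_EXTRACT


def credits_needed(num_urls: int) -> int:
    """Credits required to extract each URL once."""
    return num_urls * CREDITS_PER_EXTRACT


print(calls_remaining(4997))  # 1665 calls, with 2 credits left over
print(credits_needed(250))    # 750 credits
```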

Error Handling

⚠️ If the AI cannot extract the requested data, the response will include "data": null with an "extraction_error" field explaining why. Common causes: page requires authentication, content is behind a paywall, or the schema doesn't match page content.
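
In client code, one way to handle this is to check for a null `data` field before using the result. The sketch below is illustrative: `ExtractionError` and `unwrap_extraction` are hypothetical names, and the error message in the example is made up. Only the `data` and `extraction_error` fields come from the behavior described above.

```python
class ExtractionError(Exception):
    """Raised when the response carries "data": null."""


def unwrap_extraction(body: dict):
    """Return body["data"], or raise with the API's explanation if it is null."""
    if body.get("data") is None:
        reason = body.get("extraction_error", "no reason given")
        raise ExtractionError(reason)
    return body["data"]


# A successful body returns its data unchanged.
ok = unwrap_extraction({"data": [{"title": "Engineer"}], "credits_used": 3})
print(ok)

# A failed body raises with the explanation (message text is illustrative).
try:
    unwrap_extraction({"data": None, "extraction_error": "Page requires authentication"})
except ExtractionError as err:
    print(f"Extraction failed: {err}")
```

Pass it the parsed body, for example `unwrap_extraction(resp.json())`.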