⭐ POST /extract
AI-powered structured data extraction. Our most powerful endpoint.
The /extract endpoint combines web scraping with AI to return structured JSON data from any web page. Define a JSON schema for the data you want, optionally add a natural language prompt, and the API extracts exactly what you need.
Why this matters: Traditional scraping breaks when websites change their HTML. AI extraction reads page content rather than relying on HTML structure, so your extractions are far more likely to keep working after a site redesign.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to extract data from. |
| schema | object | Required | JSON Schema defining the structure of data to extract. Supports object, array, string, number, integer, boolean. |
| prompt | string | Optional | Natural language instruction to guide extraction. Improves accuracy for ambiguous pages. |
| render_js | boolean | Optional | Render JavaScript before extraction. Default: true |
| wait_for | string | Optional | CSS selector to wait for before extracting. |
| wait_ms | integer | Optional | Additional time to wait before extracting, in milliseconds. Max: 10000. |
| max_tokens | integer | Optional | Maximum tokens for AI processing. Higher values allow more thorough extraction. Default: 4096 |
| proxy_country | string | Optional | Two-letter country code for geo-targeted access. |
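As a rough illustration of how these parameters fit together, here is a minimal Python sketch that assembles a request body. The helper name `build_extract_payload` and its client-side checks are assumptions made for this example and are not part of the API. Only the parameter names and the `wait_ms` maximum come from the table above.

```python
from typing import Optional

MAX_WAIT_MS = 10000  # documented maximum for wait_ms


def build_extract_payload(
    url: str,
    schema: dict,
    prompt: Optional[str] = None,
    render_js: Optional[bool] = None,
    wait_for: Optional[str] = None,
    wait_ms: Optional[int] = None,
    max_tokens: Optional[int] = None,
    proxy_country: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a /extract body, omitting unset optional fields."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("schema is required and must be a JSON Schema object")
    if wait_ms is not None and not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")

    payload = {"url": url, "schema": schema}
    optional = {
        "prompt": prompt,
        "render_js": render_js,
        "wait_for": wait_for,
        "wait_ms": wait_ms,
        "max_tokens": max_tokens,
        "proxy_country": proxy_country,
    }
    # Leave out anything the caller did not set so the documented defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


body = build_extract_payload(
    "https://store.example.com/laptops",
    {"type": "object", "properties": {"name": {"type": "string"}}},
    wait_for=".product-card",  # illustrative selector, not from the docs
    wait_ms=2000,
)
print(sorted(body))
```

Omitting unset fields, rather than sending `null`, keeps the request small and lets defaults such as `render_js` and `max_tokens` take effect.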
Schema Design
The schema parameter uses standard JSON Schema. Here are common patterns:
Extract a single object
{
  "schema": {
    "type": "object",
    "properties": {
      "company_name": { "type": "string" },
      "founded": { "type": "integer" },
      "headquarters": { "type": "string" },
      "ceo": { "type": "string" }
    }
  }
}
Extract an array of items
{
  "schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "product": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      }
    }
  }
}
Nested objects
{
  "schema": {
    "type": "object",
    "properties": {
      "article_title": { "type": "string" },
      "author": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "bio": { "type": "string" }
        }
      },
      "tags": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }
}
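Because these patterns are plain JSON Schema, they can also be composed in code. The sketch below uses three small helpers (`t`, `obj`, and `array_of`) that are hypothetical names invented for this example. It rebuilds the nested-object schema as a Python dictionary.

```python
import json


def t(type_name):
    """Return a fresh leaf schema such as {"type": "string"}."""
    return {"type": type_name}


def obj(**properties):
    """Return an object schema with the given named properties."""
    return {"type": "object", "properties": properties}


def array_of(items):
    """Return an array schema whose elements follow `items`."""
    return {"type": "array", "items": items}


# Same structure as the nested-objects pattern: an object holding
# a nested object ("author") and an array of strings ("tags").
article_schema = obj(
    article_title=t("string"),
    author=obj(name=t("string"), bio=t("string")),
    tags=array_of(t("string")),
)

# Wrap it under "schema", as the request body expects.
print(json.dumps({"schema": article_schema}, indent=2))
```

Wrapping any object schema in `array_of(...)` gives the "array of items" pattern, so one set of helpers covers all three shapes.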
Example: E-commerce Product Extraction
curl -X POST https://api.mantisapi.com/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.example.com/laptops",
    "schema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "original_price": { "type": "number" },
          "discount_percent": { "type": "integer" },
          "rating": { "type": "number" },
          "review_count": { "type": "integer" },
          "specs": {
            "type": "object",
            "properties": {
              "cpu": { "type": "string" },
              "ram_gb": { "type": "integer" },
              "storage": { "type": "string" }
            }
          }
        }
      }
    },
    "prompt": "Extract all laptops with pricing, ratings, and key specifications"
  }'
Response
{
  "status": 200,
  "url": "https://store.example.com/laptops",
  "data": [
    {
      "name": "ThinkPad X1 Carbon Gen 11",
      "price": 1249.99,
      "original_price": 1549.99,
      "discount_percent": 19,
      "rating": 4.7,
      "review_count": 342,
      "specs": {
        "cpu": "Intel Core i7-1365U",
        "ram_gb": 16,
        "storage": "512GB SSD"
      }
    },
    {
      "name": "MacBook Air M3",
      "price": 1099.00,
      "original_price": 1099.00,
      "discount_percent": 0,
      "rating": 4.9,
      "review_count": 1205,
      "specs": {
        "cpu": "Apple M3",
        "ram_gb": 8,
        "storage": "256GB SSD"
      }
    }
  ],
  "tokens_used": 3241,
  "credits_used": 3,
  "credits_remaining": 4997
}
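Since the values in `data` are produced by an AI model, a cheap consistency check on the client side can catch occasional misreads. The sketch below is one possible check and not something the API performs: it recomputes the discount from `price` and `original_price` and compares it with the extracted `discount_percent`. The function name `discount_mismatches` and the one-point tolerance are assumptions for this example.

```python
def discount_mismatches(items, tolerance=1):
    """Return names of items whose stated discount disagrees with their prices."""
    flagged = []
    for item in items:
        price = item.get("price")
        original = item.get("original_price")
        stated = item.get("discount_percent")
        # Skip items with missing values or a zero original price.
        if price is None or not original or stated is None:
            continue
        computed = round((1 - price / original) * 100)
        if abs(computed - stated) > tolerance:
            flagged.append(item.get("name"))
    return flagged


# Trimmed copy of the "data" array from the response above.
data = [
    {"name": "ThinkPad X1 Carbon Gen 11", "price": 1249.99,
     "original_price": 1549.99, "discount_percent": 19},
    {"name": "MacBook Air M3", "price": 1099.00,
     "original_price": 1099.00, "discount_percent": 0},
]

print(discount_mismatches(data))  # both items are consistent, so this prints []
```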
Example: Job Listing Extraction
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://company.com/careers",
        "schema": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "department": {"type": "string"},
                    "location": {"type": "string"},
                    "remote": {"type": "boolean"},
                    "salary_range": {"type": "string"}
                }
            }
        },
        "prompt": "Extract all open positions. Mark remote as true if the listing mentions remote or hybrid."
    },
    timeout=60,  # avoid hanging indefinitely on a slow page
)
resp.raise_for_status()  # surface HTTP-level failures early

# "data" is null when extraction fails, so fall back to an empty list
jobs = resp.json()["data"] or []
# "remote" may be missing or null on some listings
remote_jobs = [j for j in jobs if j.get("remote")]
print(f"Found {len(remote_jobs)} remote positions out of {len(jobs)} total")
Prompt Tips
- Be specific: "Extract the top 5 products" is better than "Get products"
- Handle ambiguity: "If price is not shown, set to null" helps with missing data
- Add context: "Prices are in EUR" prevents currency confusion
- Set limits: "Extract at most 20 items" controls output size and token usage
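These tips can be folded into a small prompt builder so every request applies them consistently. This is only a sketch: `build_prompt` and its arguments are hypothetical and not part of the API, which simply accepts the resulting string as `prompt`.

```python
def build_prompt(task, context=None, max_items=None, null_if_missing=True):
    """Compose an extraction prompt from a task plus optional guidance."""
    parts = [task.rstrip(". ")]  # be specific: the task comes first
    if context:
        parts.append(context.rstrip(". "))  # add context, e.g. the currency
    if null_if_missing:
        parts.append("If a value is not shown, set it to null")  # handle ambiguity
    if max_items is not None:
        parts.append(f"Extract at most {max_items} items")  # set limits
    return ". ".join(parts) + "."


prompt = build_prompt(
    "Extract the top 5 products",
    context="Prices are in EUR",
    max_items=20,
)
print(prompt)
# Extract the top 5 products. Prices are in EUR. If a value is not shown, set it to null. Extract at most 20 items.
```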
Credit Usage
Each extraction call costs 3 credits, which covers both rendering and AI processing. One call replaces separate scraping, parsing, and cleaning steps.
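For budgeting, the arithmetic is straightforward. The sketch below assumes a flat 3 credits for every call, including retries; whether failed extractions are billed is not stated here, so treat the result as an upper-bound estimate. The function names are illustrative.

```python
CREDITS_PER_EXTRACTION = 3  # documented cost of one /extract call


def extraction_cost(num_calls):
    """Credits needed for a given number of /extract calls."""
    return num_calls * CREDITS_PER_EXTRACTION


def calls_affordable(credits_remaining):
    """Whole number of /extract calls a credit balance can cover."""
    return credits_remaining // CREDITS_PER_EXTRACTION


print(extraction_cost(250))    # 750 credits for 250 pages
print(calls_affordable(4997))  # 1665 calls from a balance of 4997 credits
```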
Error Handling
⚠️ If the AI cannot extract the requested data, the response will include "data": null with an "extraction_error" field explaining why. Common causes: page requires authentication, content is behind a paywall, or the schema doesn't match page content.
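One way to handle this in client code is to convert a null `data` field into an exception, so that downstream code never iterates over `None`. The sketch below works on an already-decoded response body. `ExtractionError` and `unwrap_extraction` are hypothetical names for this example, and the error message in the sample body is made up for illustration.

```python
class ExtractionError(Exception):
    """Raised when the response carries no extracted data."""


def unwrap_extraction(body):
    """Return body["data"], or raise with the reported extraction_error."""
    if body.get("data") is None:
        reason = body.get("extraction_error") or "no data returned"
        raise ExtractionError(reason)
    return body["data"]


# Illustrative failed response; the message text is an assumption.
failed = {"status": 200, "data": None, "extraction_error": "page requires authentication"}

try:
    unwrap_extraction(failed)
except ExtractionError as err:
    print(f"Extraction failed: {err}")
```

Checking with `is None` rather than truthiness keeps an empty list (`[]`) as a valid result for pages that contain no matching items.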