POST /scrape
Fetch raw HTML, cleaned text, and metadata from any URL.
The /scrape endpoint fetches a web page, optionally rendering its JavaScript in a headless browser, and returns the raw HTML, cleaned plain text, and page metadata. Use it when you need the full page content for your own parsing logic.
💡 Tip: If you want structured data (JSON), use POST /extract instead — it combines scraping with AI extraction in a single call.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to scrape. Must include protocol (http:// or https://). |
| render_js | boolean | Optional | Enable headless browser rendering for JavaScript-heavy pages (SPAs, React, etc.). Default: false |
| wait_for | string | Optional | CSS selector to wait for before capturing content. Only works with render_js: true. Example: "#main-content" |
| wait_ms | integer | Optional | Additional milliseconds to wait after page load. Max: 10000. Default: 0 |
| headers | object | Optional | Custom HTTP headers to send with the request. Example: {"Accept-Language": "en-US"} |
| proxy_country | string | Optional | Two-letter country code for geo-targeted proxying. Example: "US", "DE", "JP" |
| include_html | boolean | Optional | Include raw HTML in response. Default: true |
| include_text | boolean | Optional | Include cleaned plain text in response. Default: true |
| include_links | boolean | Optional | Include extracted links array in response. Default: false |
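The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch: `build_scrape_payload` is a hypothetical helper (not part of any official SDK), and requiring an uppercase `proxy_country` is an assumption based on the documented examples. The server's own validation still applies.

```python
import re
from typing import Optional

MAX_WAIT_MS = 10_000  # documented maximum for wait_ms


def build_scrape_payload(
    url: str,
    render_js: bool = False,
    wait_for: Optional[str] = None,
    wait_ms: int = 0,
    headers: Optional[dict] = None,
    proxy_country: Optional[str] = None,
    include_html: bool = True,
    include_text: bool = True,
    include_links: bool = False,
) -> dict:
    """Build a /scrape request body, checking the documented constraints first."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must include the protocol (http:// or https://)")
    if wait_for is not None and not render_js:
        raise ValueError("wait_for only works with render_js=True")
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
    # Assumption: uppercase two-letter codes, as in the "US"/"DE"/"JP" examples.
    if proxy_country is not None and not re.fullmatch(r"[A-Z]{2}", proxy_country):
        raise ValueError("proxy_country must be a two-letter code such as 'US'")

    # Only non-default values are sent, mirroring the example request.
    payload: dict = {"url": url}
    if render_js:
        payload["render_js"] = True
    if wait_for is not None:
        payload["wait_for"] = wait_for
    if wait_ms:
        payload["wait_ms"] = wait_ms
    if headers:
        payload["headers"] = dict(headers)
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    if not include_html:
        payload["include_html"] = False
    if not include_text:
        payload["include_text"] = False
    if include_links:
        payload["include_links"] = True
    return payload
```

Sending only non-default values keeps request bodies small and easy to compare with the examples on this page.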
Example Request
```bash
curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "render_js": true,
    "wait_for": ".product-list",
    "wait_ms": 2000,
    "include_links": true,
    "proxy_country": "US"
  }'
```
Response
```json
{
  "status": 200,
  "url": "https://example.com/products",
  "final_url": "https://example.com/products?ref=organic",
  "html": "<!DOCTYPE html>...",
  "text": "Products\n\nWidget Pro — $49.99\nWidget Basic — $29.99...",
  "links": [
    { "text": "Widget Pro", "href": "https://example.com/products/widget-pro" },
    { "text": "Widget Basic", "href": "https://example.com/products/widget-basic" }
  ],
  "metadata": {
    "title": "Our Products — Example Store",
    "description": "Browse our full product catalog.",
    "og_image": "https://example.com/og-products.jpg",
    "response_time_ms": 1842,
    "rendered": true
  },
  "credits_used": 2,
  "credits_remaining": 4998
}
```
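When you request `include_links: true`, the `links` array can be post-processed locally. As one illustration, here is a sketch that keeps only links on the same host as `final_url`. `same_site_links` is a hypothetical helper, and resolving `href` values with `urljoin` is a precaution in case relative URLs ever appear (the example response shows only absolute ones).

```python
from urllib.parse import urljoin, urlparse


def same_site_links(response: dict) -> list[str]:
    """Return unique absolute hrefs that share final_url's host, in original order."""
    base_url = response.get("final_url") or response["url"]
    base_host = urlparse(base_url).netloc
    seen: set[str] = set()
    result: list[str] = []
    for link in response.get("links") or []:
        href = link.get("href")
        if not href:
            continue  # skip entries without a usable href
        # Resolve relative hrefs against the page URL; absolute ones pass through unchanged.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```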
Response Fields
| Field | Type | Description |
|---|---|---|
| status | integer | HTTP status code of the target page |
| url | string | Original requested URL |
| final_url | string | Final URL after redirects |
| html | string | Raw HTML content (if include_html is true) |
| text | string | Cleaned plain text (if include_text is true) |
| links | array | Extracted links with text and href (if include_links is true) |
| metadata | object | Page metadata: title, description, Open Graph tags (e.g. og_image), response_time_ms, and rendered (whether the page was rendered with JavaScript) |
| credits_used | integer | API credits consumed by this call |
| credits_remaining | integer | Remaining credits in your billing period |
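Because `html`, `text`, and `links` depend on the `include_*` flags, client code should not assume they are always present. Below is a minimal sketch of a typed wrapper. `ScrapeResult` is a hypothetical name, and the sketch assumes that omitted fields are either missing or null, which `.get()` handles in both cases.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ScrapeResult:
    """Typed view of a /scrape response; names follow the Response Fields table."""
    status: int
    url: str
    final_url: str
    html: Optional[str] = None   # None when include_html is false
    text: Optional[str] = None   # None when include_text is false
    links: list = field(default_factory=list)  # empty unless include_links is true
    metadata: dict = field(default_factory=dict)
    credits_used: int = 0
    credits_remaining: int = 0

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "ScrapeResult":
        # Assumption: omitted fields may be absent or null; .get() covers both.
        return cls(
            status=data["status"],
            url=data["url"],
            final_url=data.get("final_url") or data["url"],
            html=data.get("html"),
            text=data.get("text"),
            links=data.get("links") or [],
            metadata=data.get("metadata") or {},
            credits_used=data.get("credits_used", 0),
            credits_remaining=data.get("credits_remaining", 0),
        )

    @property
    def redirected(self) -> bool:
        """True when final_url differs from the requested url."""
        return self.final_url != self.url
```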
Status Codes
| Code | Meaning |
|---|---|
| 200 | Success: page content returned |
| 400 | Bad request: invalid URL or parameters |
| 401 | Unauthorized: invalid or missing API key |
| 422 | Target page returned an error (details in the response body) |
| 429 | Rate limit exceeded: slow down requests |
| 500 | Internal server error: retry with exponential backoff |
| 504 | Timeout: the target page took too long to respond |
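The table recommends exponential backoff for 500 errors. Here is a sketch of a retry loop with capped exponential backoff and full jitter. It also retries 429 and 504, which is our assumption rather than documented behavior, and returns 400, 401, and 422 immediately, since resending an unchanged request will not fix them. `call_with_backoff` is hypothetical; `send` is any zero-argument callable that returns an object with a `status_code` attribute.

```python
import random
import time
from typing import Any, Callable

# 500 is retryable per the status table; 429 and 504 are assumptions of this sketch.
RETRYABLE_STATUS = {429, 500, 504}


def call_with_backoff(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            return resp  # success, non-retryable error, or final attempt
        # Full jitter: wait a random time in [0, base_delay * 2**attempt], capped.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
```

With the `requests` library, `send` could be `lambda: requests.post(api_url, headers=headers, json=payload, timeout=60)`. Injecting `sleep` keeps the loop testable without real waiting.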
Credit Usage
Each request costs 1 credit, or 2 credits when render_js is true (headless browser rendering). Geo-targeted proxying with proxy_country does not cost additional credits.
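This pricing rule is simple enough to check before launching a batch. A sketch, assuming every request is billed (the docs do not say whether failed requests consume credits); `estimate_credits` and `can_afford` are hypothetical helpers.

```python
from typing import Iterable


def estimate_credits(bodies: Iterable[dict]) -> int:
    """Estimate credits for a batch of /scrape request bodies.

    2 credits when render_js is true, otherwise 1; proxy_country adds nothing.
    """
    return sum(2 if body.get("render_js") else 1 for body in bodies)


def can_afford(bodies: list[dict], credits_remaining: int) -> bool:
    """Compare a planned batch against credits_remaining from a previous response."""
    return estimate_credits(bodies) <= credits_remaining
```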
Python Example
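```python
import requests

# Scrape a JavaScript-heavy SPA (render_js costs 2 credits per request)
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://spa-example.com/dashboard",
        "render_js": True,
        "wait_for": "#data-table",  # CSS selector to wait for
        "wait_ms": 3000,            # extra 3 s after page load
    },
    timeout=60,  # client-side timeout in seconds; rendering can be slow
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

data = resp.json()
print(f"Title: {data['metadata']['title']}")
print(f"Text length: {len(data['text'])} chars")
print(f"Credits remaining: {data['credits_remaining']}")
```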