POST /scrape

Fetch HTML content, rendered text, and metadata from any URL.

POST https://api.mantisapi.com/v1/scrape

The /scrape endpoint fetches web page content, with optional JavaScript rendering, and returns raw HTML, cleaned plain text, and page metadata. Use it when you need the full page content for your own parsing logic.

💡 Tip: If you want structured data (JSON), use POST /extract instead — it combines scraping with AI extraction in a single call.

Request Body

Parameter Type Required Description
url string Required The URL to scrape. Must include protocol (http:// or https://).
render_js boolean Optional Enable headless browser rendering for JavaScript-heavy pages (SPAs, React, etc.). Default: false
wait_for string Optional CSS selector to wait for before capturing content. Only works with render_js: true. Example: "#main-content"
wait_ms integer Optional Additional milliseconds to wait after page load. Max: 10000. Default: 0
headers object Optional Custom HTTP headers to send with the request. Example: {"Accept-Language": "en-US"}
proxy_country string Optional Two-letter country code for geo-targeted proxying. Example: "US", "DE", "JP"
include_html boolean Optional Include raw HTML in response. Default: true
include_text boolean Optional Include cleaned plain text in response. Default: true
include_links boolean Optional Include extracted links array in response. Default: false
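
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch: build_scrape_payload is a hypothetical helper, not part of any official SDK, and rejecting wait_for without render_js is an assumption (the API may simply ignore the selector in that case).

```python
from urllib.parse import urlparse

MAX_WAIT_MS = 10000  # documented maximum for wait_ms


def build_scrape_payload(url, render_js=False, wait_for=None, wait_ms=0, **extra):
    """Build a /scrape request body, checking the documented constraints locally.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must include a protocol (http:// or https://)")
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
    # Assumption: treat wait_for without render_js as a caller mistake.
    if wait_for is not None and not render_js:
        raise ValueError("wait_for only works with render_js: true")

    payload = {"url": url, "render_js": render_js, "wait_ms": wait_ms}
    if wait_for is not None:
        payload["wait_for"] = wait_for
    # Pass through the other documented options (headers, proxy_country, include_*).
    payload.update(extra)
    return payload
```

The returned dictionary can be passed as the JSON body of the request; any keyword not named in the signature, such as include_links or proxy_country, is copied through unchanged.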

Example Request

curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "render_js": true,
    "wait_for": ".product-list",
    "wait_ms": 2000,
    "include_links": true,
    "proxy_country": "US"
  }'

Response

{
  "status": 200,
  "url": "https://example.com/products",
  "final_url": "https://example.com/products?ref=organic",
  "html": "<!DOCTYPE html>...",
  "text": "Products\n\nWidget Pro — $49.99\nWidget Basic — $29.99...",
  "links": [
    { "text": "Widget Pro", "href": "https://example.com/products/widget-pro" },
    { "text": "Widget Basic", "href": "https://example.com/products/widget-basic" }
  ],
  "metadata": {
    "title": "Our Products — Example Store",
    "description": "Browse our full product catalog.",
    "og_image": "https://example.com/og-products.jpg",
    "response_time_ms": 1842,
    "rendered": true
  },
  "credits_used": 2,
  "credits_remaining": 4999
}

Response Fields

Field Type Description
status integer HTTP status code of the target page
url string Original requested URL
final_url string Final URL after redirects
html string Raw HTML content (if include_html is true)
text string Cleaned plain text (if include_text is true)
links array Extracted links with text and href (if include_links is true)
metadata object Page metadata: title, description, OG tags, response time
credits_used integer API credits consumed by this call
credits_remaining integer Remaining credits in your billing period
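
Because html, text, and links are returned only when the matching include_* option is true, client code is safer reading them defensively. The sketch below assumes a disabled field is either absent or null in the response body, which the reference does not state; summarize_scrape is a hypothetical helper name.

```python
def summarize_scrape(data):
    """Return a small summary of a parsed /scrape response body.

    Assumes optional fields (html, text, links) may be missing or None
    when the corresponding include_* option was false.
    """
    metadata = data.get("metadata") or {}
    links = data.get("links") or []
    return {
        # True when the target redirected away from the requested URL.
        "redirected": data.get("final_url") != data.get("url"),
        "title": metadata.get("title"),
        "text_chars": len(data.get("text") or ""),
        "html_chars": len(data.get("html") or ""),
        "link_hrefs": [link["href"] for link in links],
    }
```

Called on the example response above, this would report a redirect (final_url carries an extra query string) and two link targets.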

Status Codes

200 Success — page content returned

400 Bad request — invalid URL or parameters

401 Unauthorized — invalid or missing API key

422 Target page returned an error (details in response body)

429 Rate limit exceeded — slow down requests

500 Internal server error — retry with exponential backoff

504 Timeout — target page took too long to respond
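
One way to act on these codes is to retry only the transient ones and return everything else to the caller. The sketch below is an illustration under assumptions: treating 429 and 504 as retryable alongside 500 is a judgment call rather than documented behavior, and call_with_retry and its send callable are hypothetical names. The send argument is any zero-argument function that performs the request and returns a (status_code, body) pair.

```python
import time

# 500 is documented as retryable with backoff; 429 and 504 are assumed to be.
RETRYABLE_STATUSES = {429, 500, 504}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Waits base_delay * 2**attempt seconds between attempts (exponential backoff).
    Returns the last (status_code, body) pair.
    """
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        sleep(base_delay * 2 ** attempt)
```

Client errors such as 400, 401, and 422 are returned immediately, since repeating the same request is unlikely to change the outcome. Passing sleep as a parameter keeps the function easy to test without real delays.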

Credit Usage

1 credit per standard scrape request. 2 credits when render_js: true (headless browser rendering). Proxy routing does not consume additional credits.
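
The pricing rule above can be applied to a batch of planned requests to estimate cost before sending anything. This is a minimal sketch; estimate_credits is a hypothetical helper, and it assumes no charges beyond the two rates stated here.

```python
def estimate_credits(payloads):
    """Estimate total credits for a list of /scrape request bodies.

    2 credits when render_js is true, 1 otherwise; proxy_country adds nothing.
    """
    return sum(2 if payload.get("render_js", False) else 1 for payload in payloads)


planned = [
    {"url": "https://example.com/a"},
    {"url": "https://example.com/b", "render_js": True, "proxy_country": "US"},
]
total = estimate_credits(planned)  # one standard request plus one rendered request
```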

Python Example

import requests

# Scrape a JavaScript-heavy SPA
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://spa-example.com/dashboard",
        "render_js": True,
        "wait_for": "#data-table",
        "wait_ms": 3000,
    },
    timeout=60,  # client-side timeout in seconds (an arbitrary choice, not an API requirement)
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

data = resp.json()
print(f"Title: {data['metadata']['title']}")
print(f"Text length: {len(data['text'])} chars")
print(f"Credits remaining: {data['credits_remaining']}")