POST /scrape

Fetch HTML content, rendered text, and metadata from any URL.

POST https://api.mantisapi.com/v1/scrape

The /scrape endpoint fetches a web page's content, with optional JavaScript rendering, and returns its raw HTML, cleaned plain text, and page metadata. Use it when you need the full page content for your own parsing logic.

💡 Tip: If you want structured data (JSON), use POST /extract instead — it combines scraping with AI extraction in a single call.

Request Body

Parameter	Type	Required	Description
url	string	Required	The URL to scrape. Must include protocol (http:// or https://).
render_js	boolean	Optional	Enable headless browser rendering for JavaScript-heavy pages (SPAs, React, etc.). Default: false
wait_for	string	Optional	CSS selector to wait for before capturing content. Only works with `render_js: true`. Example: `"#main-content"`
wait_ms	integer	Optional	Additional milliseconds to wait after page load. Max: 10000. Default: 0
headers	object	Optional	Custom HTTP headers to send with the request. Example: `{"Accept-Language": "en-US"}`
proxy_country	string	Optional	Two-letter country code for geo-targeted proxying. Example: `"US"`, `"DE"`, `"JP"`
include_html	boolean	Optional	Include raw HTML in response. Default: true
include_text	boolean	Optional	Include cleaned plain text in response. Default: true
include_links	boolean	Optional	Include extracted links array in response. Default: false
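The constraints in the table above can be checked on the client before a request spends any credits. The sketch below assumes the rules are exactly as documented: the URL must carry a protocol, `wait_for` requires `render_js`, `wait_ms` is capped at 10000, and `proxy_country` is a two-letter code. `build_scrape_payload` is a hypothetical helper, not part of an official Mantis SDK.

```python
from urllib.parse import urlparse

MAX_WAIT_MS = 10_000  # documented maximum for wait_ms


def build_scrape_payload(url, render_js=False, wait_for=None, wait_ms=0,
                         headers=None, proxy_country=None,
                         include_html=True, include_text=True,
                         include_links=False):
    """Build a /scrape request body, rejecting values the docs disallow.

    Hypothetical helper: the checks mirror the Request Body table only.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must include the protocol (http:// or https://)")
    if wait_for is not None and not render_js:
        raise ValueError("wait_for only works when render_js is True")
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
    if proxy_country is not None and not (
        len(proxy_country) == 2 and proxy_country.isascii() and proxy_country.isalpha()
    ):
        raise ValueError("proxy_country must be a two-letter country code")

    payload = {
        "url": url,
        "render_js": render_js,
        "wait_ms": wait_ms,
        "include_html": include_html,
        "include_text": include_text,
        "include_links": include_links,
    }
    # Parameters without a documented default are sent only when set.
    if wait_for is not None:
        payload["wait_for"] = wait_for
    if headers:
        payload["headers"] = dict(headers)
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country.upper()  # examples use "US", "DE"
    return payload
```

The returned dict can be passed as the `json=` argument to `requests.post`, as in the Python example for this endpoint.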

Example Request

curl -X POST https://api.mantisapi.com/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "render_js": true,
    "wait_for": ".product-list",
    "wait_ms": 2000,
    "include_links": true,
    "proxy_country": "US"
  }'

Response

{
  "status": 200,
  "url": "https://example.com/products",
  "final_url": "https://example.com/products?ref=organic",
  "html": "<!DOCTYPE html>...",
  "text": "Products\n\nWidget Pro — $49.99\nWidget Basic — $29.99...",
  "links": [
    { "text": "Widget Pro", "href": "https://example.com/products/widget-pro" },
    { "text": "Widget Basic", "href": "https://example.com/products/widget-basic" }
  ],
  "metadata": {
    "title": "Our Products — Example Store",
    "description": "Browse our full product catalog.",
    "og_image": "https://example.com/og-products.jpg",
    "response_time_ms": 1842,
    "rendered": true
  },
  "credits_used": 2,
  "credits_remaining": 4998
}

Response Fields

Field	Type	Description
status	integer	HTTP status code of the target page
url	string	Original requested URL
final_url	string	Final URL after redirects
html	string	Raw HTML content (if include_html is true)
text	string	Cleaned plain text (if include_text is true)
links	array	Extracted links with text and href (if include_links is true)
metadata	object	Page metadata: title, description, Open Graph tags (e.g. og_image), response_time_ms, and rendered (true when headless browser rendering was used)
credits_used	integer	API credits consumed by this call
credits_remaining	integer	Remaining credits in your billing period
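A typed wrapper makes the optional fields explicit and keeps the two status values apart. This is a minimal sketch that assumes fields switched off via `include_html`, `include_text`, or `include_links` are either omitted or `null`, and that a missing `final_url` means no redirect; `ScrapeResult` and `parse_scrape_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    status: int              # HTTP status of the target page, not of the API call
    url: str
    final_url: str
    html: Optional[str]      # None when include_html is false
    text: Optional[str]      # None when include_text is false
    links: list              # empty when include_links is false
    metadata: dict
    credits_used: int
    credits_remaining: int

    @property
    def redirected(self) -> bool:
        """True if the target redirected to a different URL."""
        return self.final_url != self.url


def parse_scrape_response(body: dict) -> ScrapeResult:
    """Map a /scrape JSON body onto ScrapeResult.

    Assumption: optional fields may be omitted or null; a missing
    final_url is treated as "no redirect".
    """
    return ScrapeResult(
        status=body["status"],
        url=body["url"],
        final_url=body.get("final_url") or body["url"],
        html=body.get("html"),
        text=body.get("text"),
        links=body.get("links") or [],
        metadata=body.get("metadata") or {},
        credits_used=body["credits_used"],
        credits_remaining=body["credits_remaining"],
    )
```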

Status Codes

200 Success — page content returned

400 Bad request — invalid URL or parameters

401 Unauthorized — invalid or missing API key

422 Target page returned an error (details in response body)

429 Rate limit exceeded — slow down requests

500 Internal server error — retry with exponential backoff

504 Timeout — target page took too long to respond
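The table advises slowing down on 429 and retrying 500 errors with exponential backoff. A minimal sketch of that policy follows. It assumes only 429 and 500 are worth retrying and uses "full jitter" (a random delay up to the exponential cap); `call_with_backoff` is a hypothetical helper, and `send` stands in for whatever function performs the HTTP call.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: per the status table, 429 (slow down) and 500 (retry) are retryable.
RETRYABLE = {429, 500}


def call_with_backoff(send: Callable[[], Tuple[int, dict]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a zero-argument callable returning (api_status_code, json_body).
    The last response is returned as-is, even if it is still retryable.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Full jitter: wait a random time in [0, min(max_delay, base * 2**attempt)].
        cap = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, cap))
```

With `requests`, `send` would post the request body and return the HTTP status code together with the parsed JSON. Randomizing the delay spreads out retries from many clients so they do not hit the rate limiter at the same moment.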

Credit Usage

1 credit per standard scrape request. 2 credits when render_js: true (headless browser rendering). Proxy routing does not consume additional credits.
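These rules reduce to a small function, which is handy for estimating the cost of a batch before running it. This sketch assumes the rules above are complete (no other parameter affects cost); `credits_for` and `credits_for_batch` are hypothetical helpers.

```python
def credits_for(payload: dict) -> int:
    """Credits for one /scrape call: 1 standard, 2 with render_js.

    proxy_country is ignored because proxy routing adds no credits.
    """
    return 2 if payload.get("render_js") else 1


def credits_for_batch(payloads) -> int:
    """Total credits for a list of /scrape request bodies."""
    return sum(credits_for(p) for p in payloads)
```

For example, 100 rendered pages plus 50 plain ones come to 100 * 2 + 50 * 1 = 250 credits.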

Python Example

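import requests

# Scrape a JavaScript-heavy SPA (render_js costs 2 credits per call)
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://spa-example.com/dashboard",
        "render_js": True,
        "wait_for": "#data-table",  # CSS selector; requires render_js
        "wait_ms": 3000,            # extra wait after load (max 10000)
    },
    timeout=60,  # client-side timeout in seconds; rendering can be slow
)
resp.raise_for_status()  # raise on API errors such as 401, 429, or 504

data = resp.json()
print(f"Title: {data['metadata']['title']}")
print(f"Text length: {len(data['text'])} chars")
print(f"Credits remaining: {data['credits_remaining']}")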