PHP powers over 75% of all websites whose server-side language is known. If you're working in a PHP codebase — Laravel, Symfony, WordPress, or any custom application — web scraping in PHP means no context switching, no polyglot complexity, and direct integration with your existing stack.
PHP's web scraping toolbox at a glance:
| Tool | Purpose | Best For |
|---|---|---|
| cURL (built-in) | HTTP requests | Simple scripts, no dependencies |
| Guzzle | HTTP client | Production scraping, async requests |
| DOMDocument (built-in) | HTML/XML parsing | Basic parsing without Composer |
| Symfony DomCrawler | CSS/XPath parsing | Complex extraction, jQuery-like API |
| Goutte | HTTP + parsing combined | Quick scraping projects |
| Symfony Panther | Headless browser | JavaScript-rendered pages |
| chrome-php/chrome | Chrome DevTools Protocol | Full browser control |
No Composer required. This works with any PHP installation:
```php
<?php
// Fetch the page
$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_URL => 'https://example.com/products',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTPHEADER => [
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
        'Accept: text/html,application/xhtml+xml',
    ],
]);
$html = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($html === false || $httpCode !== 200) {
    die("Request failed with status: $httpCode");
}

// Parse with DOMDocument (suppress warnings from real-world HTML)
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Extract all product titles
$titles = $xpath->query('//h2[@class="product-title"]');
foreach ($titles as $title) {
    echo $title->textContent . "\n";
}
```
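The same built-in tools handle multi-field extraction: pass a context node to `DOMXPath` and use relative queries scoped to each product card. A minimal sketch, with inline HTML standing in for a fetched page:

```php
<?php
// Sketch: multi-field extraction with DOMXPath relative queries.
// The inline HTML below stands in for a fetched page.
$html = <<<'HTML'
<div class="product"><h2 class="product-title">Widget</h2><span class="price">$9.99</span></div>
<div class="product"><h2 class="product-title">Gadget</h2><span class="price">$19.99</span></div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$products = [];
foreach ($xpath->query('//div[@class="product"]') as $card) {
    // Passing $card as the context node scopes the query to this card;
    // the leading "." makes the path relative to it
    $products[] = [
        'title' => $xpath->evaluate('string(.//h2[@class="product-title"])', $card),
        'price' => $xpath->evaluate('string(.//span[@class="price"])', $card),
    ];
}

print_r($products);
```

`DOMXPath::evaluate()` with a `string(...)` expression returns a plain string, which avoids a second loop over node lists.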
This is PHP scraping at its simplest — zero dependencies, runs anywhere. For anything beyond simple scripts, use Guzzle and DomCrawler.
Guzzle is the standard HTTP client for PHP. It handles cookies, redirects, retries, and concurrent requests out of the box:
```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

$client = new Client([
    'base_uri' => 'https://example.com',
    'timeout' => 30,
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
        'Accept' => 'text/html,application/xhtml+xml',
    ],
    'cookies' => true, // Enable the cookie jar
]);

// Simple GET request
$response = $client->get('/products');
$html = (string) $response->getBody();

// POST with form data (login, search)
$response = $client->post('/search', [
    RequestOptions::FORM_PARAMS => [
        'query' => 'web scraping',
        'page' => 1,
    ],
]);

// Handle JSON APIs
$response = $client->get('/api/products', [
    RequestOptions::QUERY => ['category' => 'electronics'],
]);
$data = json_decode((string) $response->getBody(), true);
```
For transient failures (connection errors, 5xx responses, 429 rate limits), add Guzzle's retry middleware with exponential backoff:

```php
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

$stack = HandlerStack::create();
$stack->push(Middleware::retry(
    // Decider: should this request be retried?
    function (int $retries, RequestInterface $request, ?ResponseInterface $response, ?\Throwable $e) {
        if ($retries >= 3) return false;
        if ($e instanceof ConnectException) return true;
        if ($response && $response->getStatusCode() >= 500) return true;
        if ($response && $response->getStatusCode() === 429) return true;
        return false;
    },
    // Delay in milliseconds; $retries is 1 on the first retry
    function (int $retries) {
        return 1000 * 2 ** ($retries - 1); // 1s, 2s, 4s
    }
));

$client = new Client(['handler' => $stack]);
```
DomCrawler gives you CSS selectors and XPath in a clean, chainable API:
```php
<?php
require 'vendor/autoload.php';
// composer require symfony/dom-crawler symfony/css-selector

use Symfony\Component\DomCrawler\Crawler;

$html = file_get_contents('https://example.com/products');
$crawler = new Crawler($html);

// CSS selectors (like jQuery)
$products = $crawler->filter('.product-card')->each(function (Crawler $node) {
    return [
        'title' => $node->filter('h2.title')->text(''),
        'price' => $node->filter('.price')->text(''),
        'link' => $node->filter('a')->attr('href'),
        'image' => $node->filter('img')->attr('src'),
    ];
});

// XPath for complex queries
$reviews = $crawler->filterXPath('//div[@data-rating > 4]')->each(function (Crawler $node) {
    return $node->text();
});

// Extract table data
$rows = $crawler->filter('table.data tbody tr')->each(function (Crawler $row) {
    return $row->filter('td')->each(fn (Crawler $cell) => trim($cell->text()));
});

print_r($products);
```
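One gotcha with `attr('href')`: scraped links are often relative to the page they came from. A minimal resolver covering the common cases (absolute, protocol-relative, root-relative, path-relative) can be sketched in pure PHP; a full RFC 3986 implementation would also handle `../` segments and fragments:

```php
<?php
// Sketch: resolving a scraped href against the page's base URL.
function resolveUrl(string $base, string $href): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }
    $parts = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (str_starts_with($href, '//')) {
        return $parts['scheme'] . ':' . $href; // protocol-relative
    }
    if (str_starts_with($href, '/')) {
        return $origin . $href; // root-relative
    }
    // Path-relative: replace the last segment of the base path
    $dir = preg_replace('#/[^/]*$#', '/', $parts['path'] ?? '/');
    return $origin . $dir . $href;
}

echo resolveUrl('https://example.com/shop/index.html', '/cart') . PHP_EOL;   // https://example.com/cart
echo resolveUrl('https://example.com/shop/index.html', 'item/42') . PHP_EOL; // https://example.com/shop/item/42
```

Resolving every link to an absolute URL before queueing it keeps a multi-page crawl from silently requesting the wrong host.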
Goutte combines Guzzle and DomCrawler into a single package with built-in link clicking and form submission. Note that Goutte is now archived; its maintainer recommends Symfony BrowserKit's `HttpBrowser`, which exposes the same API:
```php
<?php
// composer require fabpot/goutte
use Goutte\Client;

$client = new Client();

// Navigate and scrape
$crawler = $client->request('GET', 'https://example.com/products');

// Click links (follows the link, returns the new page)
$detailPage = $client->click($crawler->filter('a.product-link')->link());

// Submit forms
$crawler = $client->request('GET', 'https://example.com/login');
$form = $crawler->filter('form#login')->form([
    'username' => 'user@example.com',
    'password' => 'secret',
]);
$client->submit($form);

// Now scrape authenticated pages
$dashboard = $client->request('GET', 'https://example.com/dashboard');
$data = $dashboard->filter('.metric-value')->each(fn ($node) => $node->text());
```
For SPAs and dynamic content, use Symfony Panther or chrome-php:
```php
<?php
// composer require symfony/panther
use Symfony\Component\Panther\Client;

$client = Client::createChromeClient();
$crawler = $client->request('GET', 'https://example.com/spa');

// Wait for JavaScript to render
$client->waitFor('.dynamic-content', 10); // Wait up to 10 seconds

// Now extract the rendered content
$items = $crawler->filter('.dynamic-content .item')->each(function ($node) {
    return [
        'title' => $node->filter('.title')->text(''),
        'data' => $node->filter('.data')->text(''),
    ];
});

// Take a screenshot
$client->takeScreenshot('page.png');

$client->quit();
```
chrome-php drives Chrome directly over the DevTools Protocol for finer-grained control:

```php
<?php
// composer require chrome-php/chrome
use HeadlessChromium\BrowserFactory;

$browserFactory = new BrowserFactory();
$browser = $browserFactory->createBrowser([
    'headless' => true,
    'windowSize' => [1920, 1080],
    'noSandbox' => true,
]);

$page = $browser->createPage();
$page->navigate('https://example.com/app')->waitForNavigation();

// Execute JavaScript in the page context
$result = $page->evaluate('document.title')->getReturnValue();

// Get the full HTML after JS rendering
$html = $page->getHtml();

// Screenshot
$page->screenshot()->saveToFile('screenshot.png');

$browser->close();
```
Scrape multiple pages simultaneously using Guzzle's async capabilities:
```php
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Symfony\Component\DomCrawler\Crawler;

$client = new Client(['timeout' => 30]);
$results = [];

// Generate requests for 50 pages
$requests = function () {
    for ($page = 1; $page <= 50; $page++) {
        yield new Request('GET', "https://example.com/products?page=$page");
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 5, // 5 requests in flight at once
    'fulfilled' => function ($response, $index) use (&$results) {
        $crawler = new Crawler((string) $response->getBody());
        $products = $crawler->filter('.product')->each(function ($node) {
            return [
                'title' => $node->filter('.title')->text(''),
                'price' => $node->filter('.price')->text(''),
            ];
        });
        $results = array_merge($results, $products);
        echo "Page " . ($index + 1) . ": " . count($products) . " products\n";
    },
    'rejected' => function ($reason, $index) {
        echo "Page " . ($index + 1) . " failed: " . $reason->getMessage() . "\n";
    },
]);

$pool->promise()->wait();
echo "Total products: " . count($results) . "\n";

// Export to CSV
$fp = fopen('products.csv', 'w');
fputcsv($fp, ['Title', 'Price']);
foreach ($results as $product) {
    fputcsv($fp, $product);
}
fclose($fp);
```
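Prices scraped as text (`$1,299.99`) need normalizing before they're useful in a CSV or database. A minimal sketch, assuming US-style formatting with a comma as the thousands separator:

```php
<?php
// Sketch: normalizing scraped price strings before export.
// Strips currency symbols and thousands separators; returns null
// for unparseable values ("Free", "Call for price").
function parsePrice(string $raw): ?float
{
    // Keep only digits, dot, comma, and minus
    $clean = preg_replace('/[^\d.,-]/', '', trim($raw));
    if ($clean === '') {
        return null;
    }
    // Treat commas as thousands separators: "1,299.99" -> "1299.99"
    $clean = str_replace(',', '', $clean);
    return is_numeric($clean) ? (float) $clean : null;
}

var_dump(parsePrice('$1,299.99')); // float(1299.99)
var_dump(parsePrice('Free'));      // NULL
```

European formats ("1.299,99") would need the separators swapped first; detecting that per-site is usually easier than guessing per-value.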
Avoid getting blocked when scraping at scale. Start by rotating realistic browser headers:
```php
<?php
use GuzzleHttp\Client;

$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

$client = new Client([
    'headers' => [
        'User-Agent' => $userAgents[array_rand($userAgents)],
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en-US,en;q=0.9',
    ],
]);
```
Rotate proxies so requests come from different IPs:

```php
<?php
use GuzzleHttp\Client;

$proxies = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
];

// Picks one proxy per client; for per-request rotation,
// pass the 'proxy' option to individual requests instead
$client = new Client([
    'proxy' => $proxies[array_rand($proxies)],
]);
```
And throttle your request rate so you don't hammer the server:

```php
<?php
use GuzzleHttp\Client;

function scrapePage(Client $client, string $url): string
{
    static $lastRequest = 0;

    // Enforce a minimum of 2 seconds between requests
    $elapsed = microtime(true) - $lastRequest;
    if ($elapsed < 2.0) {
        usleep((int) ((2.0 - $elapsed) * 1_000_000));
    }

    $response = $client->get($url);
    $lastRequest = microtime(true);

    return (string) $response->getBody();
}
```
For production scraping where anti-blocking matters, a managed API handles all of this for you. See the complete anti-blocking guide.
Laravel makes scraping clean with its HTTP client (Guzzle wrapper) and job system:
```php
<?php
// app/Jobs/ScrapeProducts.php
namespace App\Jobs;

use App\Models\Product;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Http;
use Symfony\Component\DomCrawler\Crawler;

class ScrapeProducts implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private string $url) {}

    public function handle(): void
    {
        $response = Http::withHeaders([
            'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0',
        ])->retry(3, 1000)->get($this->url);

        $crawler = new Crawler($response->body());

        $crawler->filter('.product-card')->each(function (Crawler $node) {
            Product::updateOrCreate(
                ['sku' => $node->filter('.sku')->text('')],
                [
                    'title' => $node->filter('.title')->text(''),
                    'price' => (float) str_replace('$', '', $node->filter('.price')->text('0')),
                ]
            );
        });
    }
}

// Dispatch from a controller or command:
// ScrapeProducts::dispatch('https://example.com/products');
```
Building a production scraper in PHP means maintaining proxy rotation, user-agent management, retry logic, rate limiting, and headless browser infrastructure yourself. A web scraping API like Mantis handles all of this with a single HTTP call:
```php
<?php
use Illuminate\Support\Facades\Http; // or any HTTP client

// Using Mantis API — one line replaces hundreds of lines of scraping code
$response = Http::withHeaders([
    'Authorization' => 'Bearer YOUR_API_KEY',
])->post('https://api.mantisapi.com/v1/scrape', [
    'url' => 'https://example.com/products',
    'render_js' => true,
    'extract' => [
        'products' => [
            'selector' => '.product-card',
            'fields' => [
                'title' => '.title',
                'price' => '.price',
            ],
        ],
    ],
]);

$products = $response->json('data.products');
```
| Approach | Setup Time | Maintenance | JS Rendering | Anti-Blocking | Cost (10K pages/mo) |
|---|---|---|---|---|---|
| DIY PHP + Guzzle | Days | Ongoing | Need headless browser | Build yourself | $50-200 (proxies + servers) |
| Mantis API | Minutes | Zero | Built-in | Built-in | $29/mo |
Mantis handles proxies, JavaScript rendering, and anti-blocking — so you can focus on your data.
| Feature | PHP | Python | Node.js |
|---|---|---|---|
| HTTP Client | Guzzle / cURL | Requests / httpx | Axios / node-fetch |
| HTML Parser | DomCrawler / DOMDocument | BeautifulSoup / lxml | Cheerio |
| Headless Browser | Panther / chrome-php | Playwright / Selenium | Puppeteer / Playwright |
| Async Support | Guzzle Promises / Fibers | asyncio / httpx | Native async/await |
| Concurrency | Pool (5-10 concurrent) | asyncio.gather | Promise.all |
| Web Framework Integration | ⭐⭐⭐ Laravel/Symfony | ⭐⭐ Django/Flask | ⭐⭐ Express |
| Scraping Ecosystem | ⭐⭐ Good | ⭐⭐⭐ Best | ⭐⭐ Good |
| Learning Curve | Low (web devs) | Low | Low (JS devs) |
| Best For | PHP codebases, WordPress | Standalone scrapers | JS-heavy sites |
Bottom line: Use the language your project already uses. If you're in a PHP codebase, scrape with PHP. The tools are mature and battle-tested. For new standalone projects, Python has the largest scraping ecosystem. For any language, a web scraping API eliminates the complexity entirely.
**Is PHP good for web scraping?**

Yes. PHP has mature scraping tools — cURL and DOMDocument are built-in, and libraries like Guzzle and Symfony DomCrawler are production-grade. If you're already working in PHP, there's no need to switch languages for scraping.

**What's the best PHP library for web scraping?**

Guzzle (HTTP client) + Symfony DomCrawler (HTML parsing) is the most popular production stack. For quick projects, Goutte combines both into a single package. For JavaScript-rendered pages, use Symfony Panther.

**Can PHP scrape JavaScript-rendered pages?**

Yes — use Symfony Panther or chrome-php/chrome for headless browser control. Or use a web scraping API that handles rendering server-side.

**Is PHP fast enough for web scraping?**

With Guzzle's concurrent request pool, PHP can scrape hundreds of pages per minute. PHP 8.x with JIT compilation handles HTML parsing efficiently. The bottleneck is almost always network I/O, not PHP performance.
**Is web scraping legal?**

Web scraping publicly available data is generally legal, but always check the website's Terms of Service and robots.txt. Respect rate limits and don't scrape personal data without consent. See our legal compliance guide.
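A minimal robots.txt check can be sketched in pure PHP. This only honors `Disallow` rules in the `User-agent: *` group; a real crawler should use a full parser that also handles `Allow`, wildcards, and per-agent groups:

```php
<?php
// Sketch: is $path disallowed by the "User-agent: *" group of robots.txt?
function isDisallowed(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;
    foreach (preg_split('/\r?\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '') {
            continue;
        }
        if (stripos($line, 'User-agent:') === 0) {
            // Track whether we're inside the wildcard group
            $inStarGroup = trim(substr($line, 11)) === '*';
        } elseif ($inStarGroup && stripos($line, 'Disallow:') === 0) {
            $rule = trim(substr($line, 9));
            if ($rule !== '' && str_starts_with($path, $rule)) {
                return true;
            }
        }
    }
    return false;
}

$robots = "User-agent: *\nDisallow: /admin/\nDisallow: /private\n";
var_dump(isDisallowed($robots, '/admin/users')); // bool(true)
var_dump(isDisallowed($robots, '/products'));    // bool(false)
```

Fetch the file once per host (`https://example.com/robots.txt`), cache it, and check each path before queueing a request.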
**Do I need a web scraping API?**

If you need to scrape at scale, handle JavaScript rendering, or avoid blocks — yes. A web scraping API like Mantis handles the infrastructure so you can focus on data extraction. It's especially cost-effective compared to maintaining your own proxy infrastructure.