webscraping-ai / webscraping-ai-php
Official PHP client for the WebScraping.AI API — LLM-powered web scraping with rotating proxies and Chromium JavaScript rendering.
Package info
github.com/webscraping-ai/webscraping-ai-php
pkg:composer/webscraping-ai/webscraping-ai-php
Requires
- php: ^8.2
- ext-json: *
- php-http/discovery: ^1.19
- psr/http-client: ^1.0
- psr/http-factory: ^1.0
- psr/http-message: ^1.1 || ^2.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.60
- guzzlehttp/guzzle: ^7.8
- nyholm/psr7: ^1.8
- php-http/mock-client: ^1.6
- phpstan/phpstan: ^1.11
- phpunit/phpunit: ^11.0
Last update: 2026-05-12 08:25:45 UTC
README
Official PHP client for the WebScraping.AI API.
The API gives you LLM-powered scraping tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing — full HTML, visible text, selected page areas, AI-extracted fields, and free-form question answering over any URL.
Requirements
- PHP 8.2 or newer
- A PSR-18 HTTP client — Guzzle, Symfony HttpClient, or any other implementation
- A PSR-17 message factory
If you don't already have these installed, the simplest pair is:
composer require guzzlehttp/guzzle nyholm/psr7
php-http/discovery (a transitive dependency) will pick them up automatically.
Installation
composer require webscraping-ai/webscraping-ai-php
Quick start
    use WebScrapingAI\Client;

    $client = new Client(apiKey: getenv('WEBSCRAPING_AI_KEY'));

    // Full HTML
    $html = $client->html(url: 'https://example.com');

    // Visible text
    $text = $client->text(url: 'https://example.com');

    // HTML for one selector
    $h1 = $client->selected(url: 'https://example.com', selector: 'h1');

    // HTML for multiple selectors (returns array)
    $chunks = $client->selectedMultiple(
        url: 'https://example.com',
        selectors: ['h1', 'p', 'a'],
    );

    // LLM question over a page
    $answer = $client->question(
        url: 'https://example.com',
        question: 'What is the main topic?',
    );

    // LLM-extracted structured fields
    $fields = $client->fields(
        url: 'https://example.com',
        fields: [
            'title' => 'Main product title',
            'price' => 'Current price',
        ],
    );

    // Account quota
    $account = $client->account();
All optional parameters (headers, timeout, js, js_timeout, wait_for, proxy, country, custom_proxy, device, error_on_404, error_on_redirect, js_script, …) are passed as PHP named arguments. See the API docs for the full parameter reference.
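For readers new to named arguments, here is a minimal, self-contained sketch of the calling style. The function `fetchPage()` is a hypothetical stand-in, not part of the SDK, and its defaults are illustrative only. Its parameter names are borrowed from the list above, but the exact spelling and types in the real method signatures should be checked against the SDK source.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an endpoint method; defaults are illustrative only.
function fetchPage(
    string $url,
    bool $js = true,
    int $timeout = 10000,
    ?string $country = null,
): string {
    return sprintf(
        '%s js=%s timeout=%d country=%s',
        $url,
        $js ? 'on' : 'off',
        $timeout,
        $country ?? 'default',
    );
}

// Shared options kept in one place; string keys become named arguments.
$defaults = ['js' => false, 'country' => 'us'];

// Pass only the options that differ from the defaults, in any order.
echo fetchPage(url: 'https://example.com', timeout: 15000), PHP_EOL;

// Unpack a string-keyed array into named arguments (PHP 8.1+).
echo fetchPage('https://example.com', ...$defaults), PHP_EOL;
```

Keeping shared options in an array and unpacking them is one way to avoid repeating the same arguments across several endpoint calls.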
Bring your own HTTP client
By default, php-http/discovery resolves a PSR-18 client at runtime from whatever's installed. To pin a specific client, pass it explicitly:
    use GuzzleHttp\Client as Guzzle;
    use Nyholm\Psr7\Factory\Psr17Factory;
    use WebScrapingAI\Client;

    $factory = new Psr17Factory();

    $client = new Client(
        apiKey: getenv('WEBSCRAPING_AI_KEY'),
        httpClient: new Guzzle(['timeout' => 30.0]),
        requestFactory: $factory,
        uriFactory: $factory,
    );
Configure transport-level timeouts on your HTTP client. The timeout parameter accepted by each endpoint method controls the server-side page retrieval timeout, not the HTTP transport timeout.
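Because the two timeouts are independent, a transport timeout shorter than the server-side one makes the HTTP client give up before the API has finished retrieving the page. The sketch below derives a transport timeout from the server-side value. It assumes the endpoint timeout is expressed in milliseconds and that the HTTP client expects seconds. The helper `transportTimeoutFor()` and the 5-second margin are hypothetical choices, not part of the SDK.

```php
<?php
declare(strict_types=1);

/**
 * Derive a transport timeout (in seconds) that outlasts the server-side timeout.
 * Assumption: the endpoint `timeout` parameter is given in milliseconds.
 */
function transportTimeoutFor(int $serverTimeoutMs, float $marginSeconds = 5.0): float
{
    if ($serverTimeoutMs <= 0) {
        throw new InvalidArgumentException('Server-side timeout must be positive.');
    }

    // Convert to seconds, then leave headroom for network latency and API overhead.
    return $serverTimeoutMs / 1000 + $marginSeconds;
}

$serverTimeoutMs = 15000;
$transportTimeout = transportTimeoutFor($serverTimeoutMs);

printf("server-side: %d ms, transport: %.1f s\n", $serverTimeoutMs, $transportTimeout);
```

The resulting value could then be used as the Guzzle `timeout` option, with `$serverTimeoutMs` passed as the endpoint's timeout argument.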
Errors
The client raises typed exceptions for every documented status code:
| Status | Exception |
|---|---|
| 400 | WebScrapingAI\Exception\BadRequestException |
| 402 | WebScrapingAI\Exception\PaymentRequiredException |
| 403 | WebScrapingAI\Exception\AuthenticationException |
| 429 | WebScrapingAI\Exception\RateLimitException |
| 500 | WebScrapingAI\Exception\ServerException |
| 504 | WebScrapingAI\Exception\GatewayTimeoutException |
All inherit from WebScrapingAI\Exception\ApiException, which exposes $message, $status, $statusCode, $statusMessage, $body, and $responseBody. The last three are populated when the API surfaces target-page errors as 500s.
Transport-level failures raise WebScrapingAI\Exception\ApiTimeoutException (the PSR-18 client timed out) or WebScrapingAI\Exception\ApiConnectionException (DNS / connection refused / TLS).
All SDK-originated exceptions implement the marker interface WebScrapingAI\Exception\WebScrapingAIException, so a single catch (WebScrapingAIException $e) block catches everything.
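One common use of typed exceptions is to retry only the failures that are likely to be transient. The sketch below is a generic helper, not something the SDK provides: `retryOn()` and `$isTransient` are hypothetical names, and treating rate limits and gateway timeouts as retryable is an assumption about your workload. The exception class names come from the table above.

```php
<?php
declare(strict_types=1);

/**
 * Run $operation, retrying with exponential backoff while $isRetryable($e) is true.
 * Rethrows the last exception once $maxAttempts is reached.
 */
function retryOn(
    callable $operation,
    callable $isRetryable,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
): mixed {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Throwable $e) {
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }
            // Wait 1x, 2x, 4x ... the base delay before the next attempt.
            usleep($baseDelayMs * 1000 * 2 ** ($attempt - 1));
        }
    }
}

// Treat rate limiting and gateway timeouts as transient; everything else fails fast.
// instanceof never triggers autoloading, so this also runs without the SDK installed.
$isTransient = static fn (\Throwable $e): bool =>
    $e instanceof \WebScrapingAI\Exception\RateLimitException
    || $e instanceof \WebScrapingAI\Exception\GatewayTimeoutException;
```

With a configured client, a call might look like `retryOn(fn () => $client->html(url: 'https://example.com'), $isTransient)`. Errors such as a bad request or an authentication failure are rethrown on the first attempt, since retrying them would not change the outcome.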
Response shapes
The client returns whatever the API returns — it does not normalise or unwrap. A couple of current quirks worth knowing:
- fields() returns ['result' => [...fields...]] (the live API wraps the extracted fields under a result key).
- selectedMultiple() returns array<int, array<int, string>>: an outer wrapper containing all matched chunks concatenated.
These are upstream spec/server drifts; the official Ruby and Python clients return the same shapes.
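If you would rather work with flat structures, the shapes described above can be normalised on the caller's side. This is a minimal sketch under the assumption that the responses look exactly as described. `unwrapFields()` and `flattenChunks()` are hypothetical helpers, not part of the SDK, and the sample payloads are made up.

```php
<?php
declare(strict_types=1);

/** Return the extracted fields, whether or not they are wrapped under 'result'. */
function unwrapFields(array $response): array
{
    $inner = $response['result'] ?? null;

    return is_array($inner) ? $inner : $response;
}

/** Flatten the nested selectedMultiple() shape into a single list of HTML strings. */
function flattenChunks(array $response): array
{
    $flat = [];
    foreach ($response as $group) {
        // Tolerate both nested arrays and plain strings at the outer level.
        foreach ((array) $group as $chunk) {
            $flat[] = (string) $chunk;
        }
    }

    return $flat;
}

// Sample payloads shaped like the descriptions above (made-up values).
$fields = unwrapFields(['result' => ['title' => 'Example Domain', 'price' => 'n/a']]);
$chunks = flattenChunks([['<h1>Example Domain</h1>', '<p>Some text</p>']]);

print_r($fields);
print_r($chunks);
```

Both helpers accept the unwrapped shape too, so they should keep working if the upstream drift is later corrected. One caveat: if you ever extract a field that is itself named result and holds an array, `unwrapFields()` cannot tell the two cases apart.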
Migration from 3.x
3.x was generated from the OpenAPI spec under the namespace OpenAPI\Client\ and used per-tag classes (AIApi, HTMLApi, etc.). 4.0 is a hand-authored rewrite with a single WebScrapingAI\Client entry point. There are no deprecation shims — pin to ^3.2 if you need the old surface.
Development
    composer install
    composer test     # PHPUnit
    composer lint     # php-cs-fixer (dry-run)
    composer analyse  # PHPStan
License
MIT — see LICENSE.