rkdhatterwal / decodo-scraper
A Laravel package for the Decodo Web Scraping API — real-time, async, webhooks, DB tracking, caching, and testing helpers
Requires
- php: ^8.1
- illuminate/cache: ^10.0|^11.0|^12.0|^13.0
- illuminate/console: ^10.0|^11.0|^12.0|^13.0
- illuminate/database: ^10.0|^11.0|^12.0|^13.0
- illuminate/events: ^10.0|^11.0|^12.0|^13.0
- illuminate/http: ^10.0|^11.0|^12.0|^13.0
- illuminate/routing: ^10.0|^11.0|^12.0|^13.0
- illuminate/support: ^10.0|^11.0|^12.0|^13.0
Requires (Dev)
- orchestra/testbench: ^8.0|^9.0|^10.0
- pestphp/pest: ^2.0|^3.0|^4.0
- pestphp/pest-plugin-laravel: ^2.0|^3.0|^4.0
- phpunit/phpunit: ^10.0|^11.0|^12.0
README
A clean, well-tested Laravel wrapper for the Decodo Web Scraping API — supports both real-time (v2) and async/batch (v3) scraping.
Requirements
- PHP 8.1+
- Laravel 10, 11, 12, or 13
Installation
composer require rkdhatterwal/decodo-scraper
Publish the config and migrations:

    php artisan vendor:publish --tag=decodo-config
    php artisan vendor:publish --tag=decodo-migrations
    php artisan migrate
Add credentials to .env:

    DECODO_TOKEN=your_basic_auth_token_here
    DECODO_TIMEOUT=120
Your token is in the Decodo dashboard under Scraping APIs → Username / Token.
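DECODO_TOKEN is an HTTP Basic authentication credential. As a minimal sketch, assuming the dashboard token follows the standard Basic scheme (Base64 of username:password), this is how such a token is formed and sent. The helpers basicAuthToken() and authorizationHeader() are hypothetical; normally you only paste the dashboard token into .env.

```php
<?php

// Hypothetical helper: derive a Basic auth token from dashboard credentials,
// ASSUMING the token is base64("username:password") as in standard HTTP Basic auth.
function basicAuthToken(string $username, string $password): string
{
    return base64_encode($username . ':' . $password);
}

// Hypothetical helper: the Authorization header value built from that token.
function authorizationHeader(string $token): string
{
    return 'Basic ' . $token;
}

$token = basicAuthToken('user', 'pass');
echo authorizationHeader($token), PHP_EOL; // Basic dXNlcjpwYXNz
```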
Real-time Scraping (v2)
Use the Decodo facade or inject DecodoClient.

    use Rkdhatterwal\DecodoScraper\Facades\Decodo;

    // Simple scrape → ScrapeResult
    $result = Decodo::scrape('https://example.com');
    echo $result->content;    // raw HTML
    echo $result->statusCode; // upstream HTTP status

    // JavaScript rendering
    $result = Decodo::scrapeWithJs('https://example.com');

    // Screenshot (PNG)
    $result = Decodo::screenshot('https://example.com');

    // Geo-targeted
    $result = Decodo::scrapeFromGeo('https://example.com', 'United States');

    // Return Markdown (great for LLM pipelines)
    $result = Decodo::scrapeAsMarkdown('https://example.com');

    // Structured data via a target template's parser
    $result = Decodo::scrapeWithParser('amazon_pricing', 'https://amazon.com/dp/B0BS1QCF');

    // Scrape multiple URLs at once
    $results = Decodo::scrapeMany(['https://example.com', 'https://another.com']);
Full control with PayloadBuilder

    use Rkdhatterwal\DecodoScraper\PayloadBuilder;
    use Rkdhatterwal\DecodoScraper\Facades\Decodo;

    $results = Decodo::send(
        (new PayloadBuilder())
            ->url('https://example.com')
            ->headless('html')
            ->geo('Germany')
            ->locale('de-DE')
            ->deviceType('mobile')
            ->proxyPool('premium')
            ->markdown()
            ->successfulStatusCodes([200, 301])
    );
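Each builder method corresponds to one API parameter listed in the All Payload Parameters table. The sketch below uses a hypothetical SketchPayload class (not part of the package) to show, assuming a one-to-one mapping from method to parameter, roughly what request body the chain above describes:

```php
<?php

// Hypothetical stand-in for PayloadBuilder: each fluent method records one
// Decodo API parameter (names taken from the parameter table in this README).
final class SketchPayload
{
    /** @var array<string, mixed> */
    private array $params = [];

    public function url(string $url): static { return $this->set('url', $url); }
    public function headless(string $mode): static { return $this->set('headless', $mode); }
    public function geo(string $geo): static { return $this->set('geo', $geo); }
    public function locale(string $locale): static { return $this->set('locale', $locale); }
    public function deviceType(string $type): static { return $this->set('device_type', $type); }
    public function proxyPool(string $pool): static { return $this->set('proxy_pool', $pool); }
    public function markdown(bool $on = true): static { return $this->set('markdown', $on); }
    public function successfulStatusCodes(array $codes): static { return $this->set('successful_status_codes', $codes); }

    /** @return array<string, mixed> */
    public function build(): array
    {
        return $this->params;
    }

    private function set(string $key, mixed $value): static
    {
        $this->params[$key] = $value; // later calls overwrite earlier ones
        return $this;
    }
}

$payload = (new SketchPayload())
    ->url('https://example.com')
    ->headless('html')
    ->geo('Germany')
    ->locale('de-DE')
    ->deviceType('mobile')
    ->proxyPool('premium')
    ->markdown()
    ->successfulStatusCodes([200, 301])
    ->build();

echo json_encode($payload, JSON_UNESCAPED_SLASHES), PHP_EOL;
```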
Database Tracking
When database.enabled is true (default), the package automatically tracks every async task and batch in your database using the decodo_tasks and decodo_batches tables.
Models
- Rkdhatterwal\DecodoScraper\Models\DecodoTask
- Rkdhatterwal\DecodoScraper\Models\DecodoBatch
You can associate a task with one of your own models (e.g., a Product or Lead) by passing it to queueTask:

    $product = Product::find(1);
    DecodoAsync::queueTask('https://example.com', scrapeable: $product);

    // Later, retrieve it
    $task = $product->decodoTasks()->latest()->first();
Async Scraping (v3)
Use the DecodoAsync facade or inject AsyncDecodoClient.
Queue a single task

    use Rkdhatterwal\DecodoScraper\Facades\DecodoAsync;

    // Queue and get a task ID immediately
    $task = DecodoAsync::queueTask('https://example.com');
    echo $task->id;     // "7434928397127555073"
    echo $task->status; // "pending"

    // With a webhook callback
    $task = DecodoAsync::queueTask(
        url: 'https://example.com',
        options: ['headless' => 'html', 'geo' => 'United States'],
        callbackUrl: 'https://my.app/webhook/decodo',
        passthrough: 'my-verification-secret', // echoed back for auth
    );
Queue with PayloadBuilder
You can also use the PayloadBuilder with async tasks for a more fluent experience:

    use Rkdhatterwal\DecodoScraper\PayloadBuilder;
    use Rkdhatterwal\DecodoScraper\Facades\DecodoAsync;

    $task = DecodoAsync::queueTaskWithBuilder(
        (new PayloadBuilder())
            ->url('https://example.com')
            ->headless('html')
            ->geo('United States')
            ->markdown()
    );
Queue a batch
Decodo enforces a 1-request-per-second rate limit on batch submissions. The package handles this for you automatically.

    $batch = DecodoAsync::queueBatch(
        urls: ['https://site1.com', 'https://site2.com', 'https://site3.com'],
        options: ['geo' => 'United States'],
        callbackUrl: 'https://my.app/webhook/decodo-batch',
        batchName: 'Weekly SEO Audit', // Optional name for easier tracking
    );

    $batch->id;      // Internal batch ID (v3)
    $batch->tasks;   // Collection of TaskResponse DTOs
    $batch->ids();   // Collection of task IDs
    $batch->count(); // 3
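The package's own throttling code isn't documented here. As an illustration of the idea only, this hypothetical SubmissionThrottle enforces a minimum gap between submissions; the clock and sleep functions are injected so the timing logic can be tested without real delays:

```php
<?php

// Hypothetical throttle: guarantees at least $minIntervalUs microseconds between
// consecutive submissions. Clock and sleep are injectable so the logic is testable.
final class SubmissionThrottle
{
    private ?int $lastUs = null;

    /**
     * @param \Closure(): int     $now   returns the current time in microseconds
     * @param \Closure(int): void $sleep sleeps for the given number of microseconds
     */
    public function __construct(
        private readonly int $minIntervalUs,
        private readonly \Closure $now,
        private readonly \Closure $sleep,
    ) {}

    public function wait(): void
    {
        $current = ($this->now)();
        if ($this->lastUs !== null) {
            $elapsed = $current - $this->lastUs;
            if ($elapsed < $this->minIntervalUs) {
                // Too soon: sleep only for the remainder of the interval.
                ($this->sleep)($this->minIntervalUs - $elapsed);
                $current = ($this->now)();
            }
        }
        $this->lastUs = $current;
    }
}

// Real clock, 1 submission per second: call $throttle->wait() before each submit.
$throttle = new SubmissionThrottle(
    1_000_000,
    fn (): int => intdiv(hrtime(true), 1000),
    fn (int $us) => usleep($us),
);
```

The first call never sleeps; every later call sleeps only as long as needed to keep one second between submissions.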
Check status & retrieve results

    // Poll status manually
    $status = DecodoAsync::getTaskStatus($task->id);
    $status->isPending(); // true / false
    $status->isDone();    // true / false
    $status->isFaulted(); // true / false

    // Retrieve results once done (valid for 24 hours)
    $results = DecodoAsync::getTaskResults($task->id);     // Collection<ScrapeResult>
    $result  = DecodoAsync::getFirstTaskResult($task->id); // ScrapeResult

    // Convenience: poll and block until done (for scripts/queues)
    $results = DecodoAsync::pollUntilDone($task->id, intervalMs: 2000, maxAttempts: 30);
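Conceptually, pollUntilDone() repeats the status check until the task leaves the pending state. A minimal sketch of such a loop, using a hypothetical pollUntilFinished() helper and assuming that a faulted status also ends polling (the package's exact behaviour on timeout or failure may differ):

```php
<?php

// Hypothetical polling helper illustrating the idea behind pollUntilDone():
// ask for the status up to $maxAttempts times, sleeping $intervalMs between tries.
function pollUntilFinished(
    callable $fetchStatus,     // returns "pending", "done" or "faulted"
    int $intervalMs = 2000,
    int $maxAttempts = 30,
    ?callable $sleepMs = null, // injectable for tests; defaults to usleep()
): string {
    $sleepMs ??= static fn (int $ms) => usleep($ms * 1000);

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();
        if ($status !== 'pending') {
            return $status; // "done" or "faulted": stop polling
        }
        if ($attempt < $maxAttempts) {
            $sleepMs($intervalMs); // no pointless sleep after the last attempt
        }
    }

    throw new RuntimeException("Task still pending after {$maxAttempts} attempts");
}
```

With the defaults shown (2000 ms, 30 attempts), this sketch sleeps at most 29 × 2 s = 58 s before giving up, since it skips the sleep after the final attempt.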
v3 Batch Format
Decodo v3 supports passing an array of URLs in a single request. The package handles this via queueBatch or by passing URLs to buildBatch() in the PayloadBuilder:

    $payload = (new PayloadBuilder())
        ->geo('United States')
        ->buildBatch(['https://site1.com', 'https://site2.com']);
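As a rough illustration, assuming the batch body carries the shared options once plus a url key holding the list of URLs (the exact key Decodo expects is an assumption here), a hypothetical buildBatchBody() helper might produce:

```php
<?php

// Hypothetical sketch of a v3 batch body: shared options are applied once and
// the URLs travel together as a list. "url" holding an array is an ASSUMPTION.
function buildBatchBody(array $options, array $urls): array
{
    if ($urls === []) {
        throw new InvalidArgumentException('A batch needs at least one URL.');
    }

    // array_merge: the URL list always wins over any "url" key in $options.
    return array_merge($options, ['url' => array_values($urls)]);
}

$body = buildBatchBody(['geo' => 'United States'], ['https://site1.com', 'https://site2.com']);
echo json_encode($body, JSON_UNESCAPED_SLASHES), PHP_EOL;
// {"geo":"United States","url":["https://site1.com","https://site2.com"]}
```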
Webhooks
The package includes a built-in webhook handler that automatically updates your local database records when Decodo tasks complete.
Setup
- Ensure webhook.enabled is true in your config.
- Configure a DECODO_WEBHOOK_SECRET in your .env to enable passthrough verification.
- Exempt the webhook path 'decodo/webhook/*' from CSRF protection: add it to the $except array in app/Http/Middleware/VerifyCsrfToken.php (Laravel 10), or pass it to $middleware->validateCsrfTokens(except: ['decodo/webhook/*']) inside withMiddleware() in bootstrap/app.php (Laravel 11+).
Security
The built-in webhook handler uses the VerifyDecodoWebhook middleware to ensure callbacks are authentic. It compares the passthrough value from the request against your configured secret using hash_equals.
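hash_equals() matters here because an ordinary === comparison may stop at the first differing byte, which can leak timing information about the secret. A minimal sketch of the check, assuming the passthrough value has already been read from the callback; isAuthenticPassthrough() is hypothetical and not the middleware's actual code:

```php
<?php

// Hypothetical passthrough check in the spirit of VerifyDecodoWebhook.
// hash_equals() compares in constant time for equal-length strings, so response
// timing does not reveal how much of the secret a caller guessed correctly.
function isAuthenticPassthrough(?string $received, string $secret): bool
{
    if ($secret === '' || $received === null) {
        return false; // no secret configured or no passthrough sent: reject
    }

    return hash_equals($secret, $received); // known string first, user input second
}
```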
Automatic Injection
When webhook.auto_inject_callback is enabled, the package will automatically append the correct callback_url to every async request. You don't need to pass it manually unless you want to override it.
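A hypothetical sketch of the override rule described above: the configured default callback URL is added only when the request doesn't already carry a non-null callback_url.

```php
<?php

// Hypothetical sketch of callback auto-injection: a non-null callback_url
// supplied by the caller always wins over the configured default.
function withDefaultCallback(array $payload, ?string $defaultUrl): array
{
    if ($defaultUrl !== null && ($payload['callback_url'] ?? null) === null) {
        $payload['callback_url'] = $defaultUrl;
    }

    return $payload;
}

$payload = withDefaultCallback(['url' => 'https://example.com'], 'https://my.app/webhook/decodo');
echo $payload['callback_url'], PHP_EOL; // https://my.app/webhook/decodo
```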
Result Caching
To avoid redundant API calls and save credits, enable the DecodoResultCache. It caches results for "done" tasks (which are immutable) for up to 23 hours.

    'cache' => [
        'enabled' => true,
        'ttl' => 82800, // seconds (23 hours)
    ],
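The TTL of 82800 seconds is 23 × 3600, one hour short of the 24-hour window during which Decodo task results can be retrieved. The rule "cache only done results, expire them after the TTL" can be sketched with a hypothetical in-memory class (the real DecodoResultCache presumably goes through Laravel's cache store instead):

```php
<?php

// Hypothetical in-memory sketch of the caching rule described above:
// only "done" results are stored, and each entry expires after $ttl seconds.
final class DoneResultCache
{
    /** @var array<string, array{value: mixed, expiresAt: int}> */
    private array $entries = [];

    public function __construct(private readonly int $ttl = 23 * 60 * 60) {} // 82800 s

    public function put(string $taskId, string $status, mixed $value, int $now): bool
    {
        if ($status !== 'done') {
            return false; // pending/faulted results may still change: never cache them
        }
        $this->entries[$taskId] = ['value' => $value, 'expiresAt' => $now + $this->ttl];
        return true;
    }

    public function get(string $taskId, int $now): mixed
    {
        $entry = $this->entries[$taskId] ?? null;
        if ($entry === null || $now >= $entry['expiresAt']) {
            unset($this->entries[$taskId]); // drop expired entries lazily
            return null;
        }
        return $entry['value'];
    }
}
```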
Logging
The package includes a DecodoLogger that routes all package-specific activities to a Laravel log channel. By default, it uses your application's default channel.
To isolate Decodo logs, set the channel in config/decodo.php:

    'logging' => [
        'channel' => 'decodo',
    ],

Then add the channel to the 'channels' array in your config/logging.php:

    'decodo' => [
        'driver' => 'daily',
        'path' => storage_path('logs/decodo.log'),
        'level' => 'debug',
        'days' => 14,
    ],
Events
You can listen for the following events to trigger your own logic:
- Rkdhatterwal\DecodoScraper\Events\DecodoTaskCompleted
- Rkdhatterwal\DecodoScraper\Events\DecodoTaskFaulted
- Rkdhatterwal\DecodoScraper\Events\DecodoTaskExpired
- Rkdhatterwal\DecodoScraper\Events\DecodoBatchCompleted

For example, to log a message when a batch finishes:

    use Illuminate\Support\Facades\Event;
    use Illuminate\Support\Facades\Log;
    use Rkdhatterwal\DecodoScraper\Events\DecodoBatchCompleted;

    Event::listen(DecodoBatchCompleted::class, function (DecodoBatchCompleted $event) {
        Log::info("Batch {$event->batch->id} is done!");
    });
Artisan Commands
- php artisan decodo:status {taskId}: Check the status of a specific task.
- php artisan decodo:retry {taskId}: Retry a faulted task.
- php artisan decodo:prune: Clean up old database records.
Pruning Configuration
Configure retention periods in config/decodo.php:

    'pruning' => [
        'content_days' => 7,        // Keep raw HTML for 7 days
        'tasks_days' => 30,         // Keep task records for 30 days
        'pending_tasks_days' => 3,  // Delete abandoned tasks after 3 days
        'batches_days' => 60,       // Keep batch records for 60 days
        'schedule_enabled' => true,
        'schedule_frequency' => 'daily', // Laravel scheduler frequency
    ],
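Each *_days setting is a retention window measured back from the moment the prune runs. A hypothetical helper that turns the settings into cutoff dates (records older than their cutoff would be candidates for deletion; how decodo:prune actually queries them is not shown here):

```php
<?php

// Hypothetical sketch: turn the retention settings into cutoff timestamps.
// Anything older than its cutoff would be eligible for pruning.
function pruningCutoffs(array $pruning, DateTimeImmutable $now): array
{
    $cutoffs = [];
    foreach (['content_days', 'tasks_days', 'pending_tasks_days', 'batches_days'] as $key) {
        $cutoffs[$key] = $now->sub(new DateInterval('P' . (int) $pruning[$key] . 'D'));
    }
    return $cutoffs;
}

$now = new DateTimeImmutable('2025-01-31 00:00:00', new DateTimeZone('UTC'));
$cutoffs = pruningCutoffs(
    ['content_days' => 7, 'tasks_days' => 30, 'pending_tasks_days' => 3, 'batches_days' => 60],
    $now,
);
echo $cutoffs['content_days']->format('Y-m-d'), PHP_EOL; // 2025-01-24
```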
DTOs & Public Properties
All DTOs in this package use PHP 8.1 public readonly properties, so you can read values directly (for example, echo $result->content;) but cannot reassign them.
ScrapeResult DTO
| Property | Type | Description |
|---|---|---|
| content | string | Raw HTML, Markdown, or parsed JSON |
| statusCode | int | HTTP status of the upstream page |
| url | string | The URL that was scraped |
| taskId | string | Decodo task ID |
| createdAt | string | Task creation timestamp |
| updatedAt | string | Task completion timestamp |

    $result->isSuccessful(); // true when statusCode is 2xx
    $result->toArray();
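A trimmed-down, hypothetical DTO in the same style shows what public readonly properties give you: values are read directly, writes after construction throw an Error, and isSuccessful() is the 2xx check described above.

```php
<?php

// Hypothetical, trimmed-down DTO in the style described above: promoted
// public readonly properties (PHP 8.1) can be read directly but never reassigned.
final class ScrapeResultSketch
{
    public function __construct(
        public readonly string $content,
        public readonly int $statusCode,
        public readonly string $url,
    ) {}

    // Mirrors the documented rule: successful when statusCode is 2xx.
    public function isSuccessful(): bool
    {
        return $this->statusCode >= 200 && $this->statusCode < 300;
    }
}

$result = new ScrapeResultSketch('<html></html>', 200, 'https://example.com');
echo $result->content, PHP_EOL;

try {
    $result->statusCode = 404; // readonly: throws Error
} catch (Error $e) {
    echo 'Cannot modify readonly property', PHP_EOL;
}
```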
TaskResponse DTO
Returned by queueTask() and getTaskStatus().
| Property | Type | Description |
|---|---|---|
| id | string | Decodo task ID |
| status | string | pending, done, or faulted |
| url | string | Target URL |
| target | ?string | Scraper template target |
| geo | ?string | Geographical location |
| domain | string | TLD used |
| deviceType | string | desktop, mobile, or tablet |
| httpMethod | string | get or post |
| createdAt | string | Creation timestamp |
| updatedAt | string | Last update timestamp |

    $task->id;          // Task ID for later retrieval
    $task->status;      // "pending" | "done" | "faulted"
    $task->isPending(); // bool
    $task->isDone();    // bool
    $task->isFaulted(); // bool
    $task->toArray();
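The status is exposed as a plain string. If you prefer exhaustive matching in your own code, a hypothetical PHP 8.1 backed enum can wrap the three documented values; treating faulted as a final state is an assumption of this sketch.

```php
<?php

// Hypothetical backed enum for the three documented task states. The package
// exposes status as a plain string; this only shows one way to type it yourself.
enum TaskStatusSketch: string
{
    case Pending = 'pending';
    case Done = 'done';
    case Faulted = 'faulted';

    // ASSUMPTION: only "pending" still needs polling; done/faulted are final.
    public function isTerminal(): bool
    {
        return $this !== self::Pending;
    }
}

$status = TaskStatusSketch::from('done');
var_dump($status->isTerminal()); // bool(true)
```

TaskStatusSketch::tryFrom() returns null for unexpected strings, which is a convenient place to log an unknown status instead of throwing.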
BatchTaskResponse DTO
Returned by queueBatch().

    $batch->id;      // Batch ID (v3)
    $batch->tasks;   // Collection of TaskResponse DTOs
    $batch->ids();   // Collection of task IDs
    $batch->count(); // Total task count
    $batch->toArray();
All Payload Parameters
See the Decodo parameters docs.
| Method on PayloadBuilder | API Parameter | Default |
|---|---|---|
| ->url($url) | url | required |
| ->query($q) | query | — |
| ->target($t) | target | null |
| ->proxyPool('standard') | proxy_pool | premium |
| ->headless('html'/'png') | headless | null |
| ->geo('United States') | geo | auto |
| ->domain('co.uk') | domain | com |
| ->locale('en-GB') | locale | matched |
| ->headers([...]) | headers | null |
| ->forceHeaders() | force_headers | false |
| ->cookies([...]) | cookies | null |
| ->forceCookies() | force_cookies | false |
| ->deviceType('mobile') | device_type | desktop |
| ->parse() | parse | false |
| ->sessionId('1234') | session_id | null |
| ->httpMethod('post') | http_method | get |
| ->payload($body) | payload (base64) | null |
| ->successfulStatusCodes([]) | successful_status_codes | null |
| ->markdown() | markdown | false |
| ->xhr() | xhr | false |
| ->callbackUrl($url) | callback_url | null |
| ->passthrough($val) | passthrough | null |
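The payload parameter carries the request body for POST requests in base64 form. A minimal sketch of assembling POST parameters, assuming a JSON body; the postPayload() helper is hypothetical and your target may expect a different body format:

```php
<?php

// Hypothetical sketch of the POST-related parameters: http_method switches to
// "post" and the request body (ASSUMED to be JSON here) is base64-encoded.
function postPayload(string $url, array $body): array
{
    return [
        'url' => $url,
        'http_method' => 'post',
        'payload' => base64_encode(json_encode($body, JSON_THROW_ON_ERROR)),
    ];
}

$params = postPayload('https://example.com/api', ['q' => 'laptops']);
echo $params['payload'], PHP_EOL; // eyJxIjoibGFwdG9wcyJ9
```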
Testing
The package provides a DecodoFake helper to mock API responses and assert that requests were sent.

    use Rkdhatterwal\DecodoScraper\Facades\Decodo;
    use Rkdhatterwal\DecodoScraper\Testing\DecodoFake;

    $fake = DecodoFake::make()->swap();

    // Stub a response
    $fake->fakeScrape('<html>Hello World</html>');

    // Act
    $result = Decodo::scrape('https://example.com');

    // Assert
    $fake->assertScraped('https://example.com');
    $this->assertEquals('<html>Hello World</html>', $result->content);
For async tasks:

    $fake->fakeTask('task-123');
    DecodoAsync::queueTask('https://example.com');
    $fake->assertTaskQueued('https://example.com');
For batches:

    $fake->fakeBatch(['task-1', 'task-2']);
    DecodoAsync::queueBatch(['https://a.com', 'https://b.com']);
    $fake->assertBatchQueued(2); // asserts batch with 2 URLs was queued
Other Assertions & Helpers

    $fake->assertNotScraped('https://example.com');
    $fake->assertScrapeCount(5);
    $fake->assertTaskNotQueued('https://example.com');
    $fake->assertTaskQueuedCount(3);
    $fake->assertBatchQueuedCount(1);
    $fake->assertNothingSent();

    // Access recorded calls directly
    $scrapes = $fake->recordedScrapes();
    $tasks = $fake->recordedTasks();
Changelog
See CHANGELOG.md.
License
MIT