rkdhatterwal/decodo-scraper

A Laravel package for the Decodo Web Scraping API — real-time, async, webhooks, DB tracking, caching, and testing helpers

Package info

github.com/rkdhatterwal/decodo-scraper

pkg:composer/rkdhatterwal/decodo-scraper

v1.1.3 2026-05-30 17:03 UTC

Last update: 2026-06-02 07:50:23 UTC


A clean, well-tested Laravel wrapper for the Decodo Web Scraping API — supports both real-time (v2) and async/batch (v3) scraping.

Requirements

  • PHP 8.1+
  • Laravel 10, 11, 12, or 13

Installation

composer require rkdhatterwal/decodo-scraper

Publish the config and migrations:

php artisan vendor:publish --tag=decodo-config
php artisan vendor:publish --tag=decodo-migrations
php artisan migrate

Add credentials to .env:

DECODO_TOKEN=your_basic_auth_token_here
DECODO_TIMEOUT=120

Your token is in the Decodo dashboard under Scraping APIs → Username / Token.

Real-time Scraping (v2)

Use the Decodo facade or inject DecodoClient.

use Rkdhatterwal\DecodoScraper\Facades\Decodo;

// Simple scrape → ScrapeResult
$result = Decodo::scrape('https://example.com');
echo $result->content;     // raw HTML
echo $result->statusCode;  // upstream HTTP status

// JavaScript rendering
$result = Decodo::scrapeWithJs('https://example.com');

// Screenshot (PNG)
$result = Decodo::screenshot('https://example.com');

// Geo-targeted
$result = Decodo::scrapeFromGeo('https://example.com', 'United States');

// Return Markdown (great for LLM pipelines)
$result = Decodo::scrapeAsMarkdown('https://example.com');

// Structured data via a target template's parser
$result = Decodo::scrapeWithParser('amazon_pricing', 'https://amazon.com/dp/B0BS1QCF');

// Scrape multiple URLs at once
$results = Decodo::scrapeMany(['https://example.com', 'https://another.com']);

Full control with PayloadBuilder

use Rkdhatterwal\DecodoScraper\PayloadBuilder;
use Rkdhatterwal\DecodoScraper\Facades\Decodo;

$results = Decodo::send(
    (new PayloadBuilder())
        ->url('https://example.com')
        ->headless('html')
        ->geo('Germany')
        ->locale('de-DE')
        ->deviceType('mobile')
        ->proxyPool('premium')
        ->markdown()
        ->successfulStatusCodes([200, 301])
);

Database Tracking

When database.enabled is true (default), the package automatically tracks every async task and batch in your database using the decodo_tasks and decodo_batches tables.

Models

  • Rkdhatterwal\DecodoScraper\Models\DecodoTask
  • Rkdhatterwal\DecodoScraper\Models\DecodoBatch

You can associate a task with one of your own models (e.g., a Product or Lead) by passing it as the scrapeable argument of DecodoAsync::queueTask (the async client is covered in the next section):

$product = Product::find(1);
DecodoAsync::queueTask('https://example.com', scrapeable: $product);

// Later retrieve it
$task = $product->decodoTasks()->latest()->first();
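
The decodoTasks() call above needs a relation on your model. This README does not say whether the package ships a trait for it, so the following is only a sketch. It assumes the association is polymorphic and that the decodo_tasks table has scrapeable_type and scrapeable_id columns (inferred from the scrapeable argument name, not confirmed by the package). Check the published migration before relying on it.

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\MorphMany;
use Rkdhatterwal\DecodoScraper\Models\DecodoTask;

class Product extends Model
{
    // ASSUMPTION: the morph name is "scrapeable"; verify against the
    // columns in the published decodo_tasks migration.
    public function decodoTasks(): MorphMany
    {
        return $this->morphMany(DecodoTask::class, 'scrapeable');
    }
}
```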

Async Scraping (v3)

Use the DecodoAsync facade or inject AsyncDecodoClient.

Queue a single task

use Rkdhatterwal\DecodoScraper\Facades\DecodoAsync;

// Queue and get a task ID immediately
$task = DecodoAsync::queueTask('https://example.com');
echo $task->id;      // "7434928397127555073"
echo $task->status;  // "pending"

// With a webhook callback
$task = DecodoAsync::queueTask(
    url:         'https://example.com',
    options:     ['headless' => 'html', 'geo' => 'United States'],
    callbackUrl: 'https://my.app/webhook/decodo',
    passthrough: 'my-verification-secret',   // echoed back for auth
);

Queue with PayloadBuilder

You can also use the PayloadBuilder with async tasks for a more fluent experience:

use Rkdhatterwal\DecodoScraper\PayloadBuilder;
use Rkdhatterwal\DecodoScraper\Facades\DecodoAsync;

$task = DecodoAsync::queueTaskWithBuilder(
    (new PayloadBuilder())
        ->url('https://example.com')
        ->headless('html')
        ->geo('United States')
        ->markdown()
);

Queue a batch

Decodo enforces a 1-request-per-second rate limit on batch submissions. The package handles this for you automatically.

$batch = DecodoAsync::queueBatch(
    urls:        ['https://site1.com', 'https://site2.com', 'https://site3.com'],
    options:     ['geo' => 'United States'],
    callbackUrl: 'https://my.app/webhook/decodo-batch',
    batchName:   'Weekly SEO Audit', // Optional name for easier tracking
);

$batch->id;      // Internal batch ID (v3)
$batch->tasks;   // Collection of TaskResponse DTOs
$batch->ids();   // Collection of task IDs
$batch->count(); // 3
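
For intuition about the rate limit, here is a generic throttle that leaves a minimum gap between consecutive calls. It is not the package's implementation; sendThrottled and its parameters are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical throttle: runs $send for each item, leaving at least
 * $minGapSeconds between the start of consecutive calls.
 */
function sendThrottled(array $items, callable $send, float $minGapSeconds = 1.0): array
{
    $responses = [];
    $lastStart = null;

    foreach ($items as $key => $item) {
        if ($lastStart !== null) {
            $elapsed = microtime(true) - $lastStart;
            if ($elapsed < $minGapSeconds) {
                // Sleep for the remainder of the gap.
                usleep((int) round(($minGapSeconds - $elapsed) * 1_000_000));
            }
        }
        $lastStart = microtime(true);
        $responses[$key] = $send($item);
    }

    return $responses;
}

// Demo with a stub sender and a shortened gap so it finishes quickly.
$lengths = sendThrottled(
    ['https://site1.com', 'https://site2.com'],
    fn (string $url) => strlen($url),
    0.2
);
```

With the default gap of one second, N submissions take at least N - 1 seconds, which is worth keeping in mind when queueing from a web request.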

Check status & retrieve results

// Poll status manually
$status = DecodoAsync::getTaskStatus($task->id);
$status->isPending();  // true / false
$status->isDone();     // true / false
$status->isFaulted();  // true / false

// Retrieve results once done (valid for 24 hours)
$results = DecodoAsync::getTaskResults($task->id);  // Collection<ScrapeResult>
$result  = DecodoAsync::getFirstTaskResult($task->id);  // ScrapeResult

// Convenience: poll and block until done (for scripts/queues)
$results = DecodoAsync::pollUntilDone($task->id, intervalMs: 2000, maxAttempts: 30);
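
With the values shown, polling checks at most 30 times, 2 seconds apart, so it blocks for roughly a minute before giving up. The general pattern looks like the sketch below. pollUntil is a hypothetical helper, not the package's code, and what pollUntilDone itself does when attempts run out is not documented in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical bounded poller: calls $check until it returns a non-null
 * value or $maxAttempts is reached, sleeping $intervalMs between attempts.
 */
function pollUntil(callable $check, int $intervalMs, int $maxAttempts): mixed
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $value = $check($attempt);
        if ($value !== null) {
            return $value;
        }
        // Do not sleep after the final attempt.
        if ($attempt < $maxAttempts) {
            usleep($intervalMs * 1000);
        }
    }

    throw new RuntimeException("Gave up after {$maxAttempts} attempts");
}

// Succeeds on the third check.
$result = pollUntil(fn (int $n) => $n === 3 ? 'done' : null, 10, 5);
echo $result, PHP_EOL; // done
```

Blocking like this suits CLI scripts and queue workers; inside a web request, a webhook callback is usually the better fit.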

v3 Batch Format

Decodo v3 supports passing an array of URLs in a single request. The package handles this via queueBatch or by passing URLs to buildBatch() in the PayloadBuilder:

$payload = (new PayloadBuilder())
    ->geo('United States')
    ->buildBatch(['https://site1.com', 'https://site2.com']);

Webhooks

The package includes a built-in webhook handler that automatically updates your local database records when Decodo tasks complete.

Setup

  1. Ensure webhook.enabled is true in your config.
  2. Configure a DECODO_WEBHOOK_SECRET in your .env to enable passthrough verification.
  3. Exempt the webhook path 'decodo/webhook/*' from CSRF protection in app/Http/Middleware/VerifyCsrfToken.php (Laravel 10) or bootstrap/app.php (Laravel 11+).
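
As a sketch of step 3 on Laravel 11+, the exemption can be added in bootstrap/app.php. Only the CSRF-related part is shown here; merge it into your existing file rather than replacing it, since your routing and exception setup will differ.

```php
<?php
// bootstrap/app.php (Laravel 11+): only the CSRF-related part is shown.

use Illuminate\Foundation\Application;
use Illuminate\Foundation\Configuration\Exceptions;
use Illuminate\Foundation\Configuration\Middleware;

return Application::configure(basePath: dirname(__DIR__))
    ->withMiddleware(function (Middleware $middleware) {
        // Skip CSRF verification for Decodo callbacks.
        $middleware->validateCsrfTokens(except: [
            'decodo/webhook/*',
        ]);
    })
    ->withExceptions(function (Exceptions $exceptions) {
        //
    })
    ->create();
```

On Laravel 10, the equivalent is adding 'decodo/webhook/*' to the protected $except array in app/Http/Middleware/VerifyCsrfToken.php.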

Security

The built-in webhook handler uses the VerifyDecodoWebhook middleware to ensure callbacks are authentic. It compares the passthrough value from the request against your configured secret using hash_equals.
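
If you handle callbacks in your own controller instead of the built-in handler, the same kind of check can be written in a few lines. This is a minimal sketch: passthroughMatches is a hypothetical helper, not part of the package, and how you read the passthrough value from the request is up to your code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): constant-time check of the
 * passthrough value received in a callback against the configured secret.
 */
function passthroughMatches(string $configuredSecret, mixed $received): bool
{
    // Reject missing or non-string values and an empty secret up front;
    // hash_equals() requires both arguments to be strings.
    if (!is_string($received) || $configuredSecret === '') {
        return false;
    }

    // Known string first, user-supplied string second.
    return hash_equals($configuredSecret, $received);
}

var_dump(passthroughMatches('my-verification-secret', 'my-verification-secret')); // bool(true)
var_dump(passthroughMatches('my-verification-secret', 'guess'));                  // bool(false)
```

hash_equals is preferred over == or === because its running time does not depend on where the two strings first differ.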

Automatic Injection

When webhook.auto_inject_callback is enabled, the package will automatically append the correct callback_url to every async request. You don't need to pass it manually unless you want to override it.

Result Caching

To avoid redundant API calls and save credits, enable the DecodoResultCache. It caches results for "done" tasks (which are immutable) for up to 23 hours.

'cache' => [
    'enabled' => true,
    'ttl' => 82800,
],
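
The ttl value is 23 hours expressed in seconds. Since results stay retrievable for 24 hours, this keeps a cached entry from outliving the result it mirrors. The one-hour margin is an inference from those two numbers, not something the package states.

```php
<?php
// 23 hours expressed in seconds, matching the 'ttl' value above.
$ttlSeconds = 23 * 60 * 60;

// Decodo keeps results for 24 hours, so the cache entry expires first.
$retentionSeconds = 24 * 60 * 60;

echo $ttlSeconds, PHP_EOL;                     // 82800
echo $retentionSeconds - $ttlSeconds, PHP_EOL; // 3600
```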

Logging

The package includes a DecodoLogger that routes all package-specific activities to a Laravel log channel. By default, it uses your application's default channel.

To isolate Decodo logs, set the channel in config/decodo.php:

'logging' => [
    'channel' => 'decodo',
],

And add the channel to your config/logging.php:

'decodo' => [
    'driver' => 'daily',
    'path'   => storage_path('logs/decodo.log'),
    'level'  => 'debug',
    'days'   => 14,
],

Events

You can listen for the following events to trigger your own logic:

  • Rkdhatterwal\DecodoScraper\Events\DecodoTaskCompleted
  • Rkdhatterwal\DecodoScraper\Events\DecodoTaskFaulted
  • Rkdhatterwal\DecodoScraper\Events\DecodoTaskExpired
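  • Rkdhatterwal\DecodoScraper\Events\DecodoBatchCompleted

use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;
use Rkdhatterwal\DecodoScraper\Events\DecodoBatchCompleted;

// Example: log when a batch finishes
Event::listen(DecodoBatchCompleted::class, function (DecodoBatchCompleted $event) {
    Log::info("Batch {$event->batch->id} is done!");
});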

Artisan Commands

  • php artisan decodo:status {taskId} — Check the status of a specific task.
  • php artisan decodo:retry {taskId} — Retry a faulted task.
  • php artisan decodo:prune — Clean up old database records.

Pruning Configuration

Configure retention periods in config/decodo.php:

'pruning' => [
    'content_days'       => 7,  // Keep raw HTML for 7 days
    'tasks_days'         => 30, // Keep task records for 30 days
    'pending_tasks_days' => 3,  // Delete abandoned tasks after 3 days
    'batches_days'       => 60, // Keep batch records for 60 days
    'schedule_enabled'   => true,
    'schedule_frequency' => 'daily', // Laravel scheduler frequency
],
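
If you set schedule_enabled to false and would rather schedule pruning yourself, a sketch for routes/console.php on Laravel 11+ could look like this. Only the decodo:prune command name comes from this README; the 03:00 run time is an arbitrary choice.

```php
<?php
// routes/console.php (Laravel 11+)

use Illuminate\Support\Facades\Schedule;

// Run the package's prune command once a day at 03:00.
Schedule::command('decodo:prune')->dailyAt('03:00');
```

On Laravel 10, the same call goes inside the schedule() method of app/Console/Kernel.php, using $schedule->command(...) instead of the facade.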

DTOs & Public Properties

All DTOs in this package expose their data as public readonly properties (PHP 8.1+). You read values directly, for example $result->content, and they cannot be modified after construction.
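
To illustrate what readonly properties mean in practice, here is a small stand-in class. ExampleResult is hypothetical and is not the package's ScrapeResult; it only mirrors two of its documented properties and the 2xx success check.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration; not the package's ScrapeResult.
final class ExampleResult
{
    public function __construct(
        public readonly string $content,
        public readonly int $statusCode,
    ) {}

    // True for any 2xx status.
    public function isSuccessful(): bool
    {
        return $this->statusCode >= 200 && $this->statusCode < 300;
    }
}

$result = new ExampleResult('<html></html>', 200);
echo $result->statusCode, PHP_EOL; // 200

try {
    $result->statusCode = 500; // readonly: throws Error
} catch (Error $e) {
    echo get_class($e), PHP_EOL; // Error
}
```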

ScrapeResult DTO

Property    Type    Description
content     string  Raw HTML, Markdown, or parsed JSON
statusCode  int     HTTP status of the upstream page
url         string  The URL that was scraped
taskId      string  Decodo task ID
createdAt   string  Task creation timestamp
updatedAt   string  Task completion timestamp

$result->isSuccessful(); // true when statusCode is 2xx
$result->toArray();

TaskResponse DTO

Returned by queueTask() and getTaskStatus().

Property    Type     Description
id          string   Decodo task ID
status      string   pending, done, or faulted
url         string   Target URL
target      ?string  Scraper template target
geo         ?string  Geographical location
domain      string   TLD used
deviceType  string   desktop, mobile, or tablet
httpMethod  string   get or post
createdAt   string   Creation timestamp
updatedAt   string   Last update timestamp

$task->id;          // Task ID for later retrieval
$task->status;      // "pending" | "done" | "faulted"
$task->isPending(); // bool
$task->isDone();    // bool
$task->isFaulted(); // bool
$task->toArray();

BatchTaskResponse DTO

Returned by queueBatch().

$batch->id;         // Batch ID (v3)
$batch->tasks;      // Collection of TaskResponse DTOs
$batch->ids();       // Collection of task IDs
$batch->count();    // Total task count
$batch->toArray();

All Payload Parameters

See the Decodo parameters docs.

Method on PayloadBuilder     API Parameter            Default
->url($url)                  url                      required
->query($q)                  query
->target($t)                 target                   null
->proxyPool('standard')      proxy_pool               premium
->headless('html'/'png')     headless                 null
->geo('United States')       geo                      auto
->domain('co.uk')            domain                   com
->locale('en-GB')            locale                   matched
->headers([...])             headers                  null
->forceHeaders()             force_headers            false
->cookies([...])             cookies                  null
->forceCookies()             force_cookies            false
->deviceType('mobile')       device_type              desktop
->parse()                    parse                    false
->sessionId('1234')          session_id               null
->httpMethod('post')         http_method              get
->payload($body)             payload (base64)         null
->successfulStatusCodes([])  successful_status_codes  null
->markdown()                 markdown                 false
->xhr()                      xhr                      false
->callbackUrl($url)          callback_url             null
->passthrough($val)          passthrough              null

Testing

The package provides a powerful DecodoFake helper to mock API responses and assert that requests were sent.

use Rkdhatterwal\DecodoScraper\Facades\Decodo;
use Rkdhatterwal\DecodoScraper\Testing\DecodoFake;

$fake = DecodoFake::make()->swap();

// Stub a response
$fake->fakeScrape('<html>Hello World</html>');

// Act
$result = Decodo::scrape('https://example.com');

// Assert
$fake->assertScraped('https://example.com');
$this->assertEquals('<html>Hello World</html>', $result->content);

For async tasks:

$fake->fakeTask('task-123');
DecodoAsync::queueTask('https://example.com');

$fake->assertTaskQueued('https://example.com');

For batches:

$fake->fakeBatch(['task-1', 'task-2']);
DecodoAsync::queueBatch(['https://a.com', 'https://b.com']);

$fake->assertBatchQueued(2); // asserts batch with 2 URLs was queued

Other Assertions & Helpers

$fake->assertNotScraped('https://example.com');
$fake->assertScrapeCount(5);
$fake->assertTaskNotQueued('https://example.com');
$fake->assertTaskQueuedCount(3);
$fake->assertBatchQueuedCount(1);
$fake->assertNothingSent();

// Access recorded calls directly
$scrapes = $fake->recordedScrapes();
$tasks   = $fake->recordedTasks();

Changelog

See CHANGELOG.md.

License

MIT