tamirrental/laravel-text-extractor

github.com/TamirRental/laravel-text-extractor

1.0.0 2026-03-05 05:49 UTC

Last update: 2026-03-05 05:56:29 UTC


A Laravel package for extracting structured data from documents (images, PDFs) via OCR APIs. Ships with a Koncile AI provider out of the box.

Features

  • Extract structured data from documents using OCR providers
  • Fluent API — chainable metadata(), force(), and submit() methods
  • Async processing via Laravel queues
  • Pluggable provider architecture — bring your own OCR provider
  • Facade for clean, expressive syntax
  • Built-in model scopes for querying extractions

Requirements

  • PHP 8.4+
  • Laravel 11 or 12

Installation

composer require tamirrental/laravel-text-extractor

Run the install command to publish the config file and migration:

php artisan document-extraction:install

Then run the migration:

php artisan migrate

Configuration

The published config file, config/document-extraction.php, holds the default provider and each provider's connection settings:

return [
    'default' => env('EXTRACTION_PROVIDER', 'koncile_ai'),

    'providers' => [
        'koncile_ai' => [
            'url' => env('KONCILE_AI_API_URL', 'https://api.koncile.ai'),
            'key' => env('KONCILE_AI_API_KEY'),
            'webhook_secret' => env('KONCILE_AI_WEBHOOK_SECRET'),
        ],
    ],
];

Environment Variables

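Add these to your .env file. EXTRACTION_PROVIDER and KONCILE_AI_API_URL are optional; they default to koncile_ai and https://api.koncile.ai respectively: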

KONCILE_AI_API_KEY=your-api-key
KONCILE_AI_WEBHOOK_SECRET=your-webhook-secret

Usage

Basic Usage with Facade

use TamirRental\DocumentExtraction\Facades\DocumentExtraction;

// Store the uploaded file
$path = $file->store('documents/car-licenses', 's3');

// Extract — creates a record and dispatches async processing
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata([
        'template_id' => 'your-koncile-template-id',
        'folder_id' => 'optional-folder-id',         // optional
        'identifier_field' => 'license_number',       // optional — used to resolve identifier from extracted data
    ])
    ->submit();

The package automatically dispatches a queued job to download the file from storage, upload it to the OCR provider, and track the result.

Metadata

The metadata() method accepts a key-value array that gets stored on the extraction record and passed to the provider. This is how you supply provider-specific data without any config files.

Key | Required | Description
template_id | Yes (Koncile AI) | The OCR template ID on the provider side
folder_id | No | Optional folder/organization ID on the provider side
identifier_field | No | The field name from extracted data to use as a unique identifier (e.g. license_number)
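
To illustrate what identifier_field refers to, here is a minimal sketch of pulling an identifier out of a provider payload. It assumes each field is nested under General_fields.<field>.value, the shape that appears in the Handling Webhooks example; resolveIdentifier() is a hypothetical helper and not part of the package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): look up the value of
 * $identifierField in a decoded provider payload.
 *
 * Assumes the payload shape General_fields.<field>.value.
 */
function resolveIdentifier(array $payload, string $identifierField): string
{
    $value = $payload['General_fields'][$identifierField]['value'] ?? '';

    // Fall back to an empty string when the value is missing or not scalar.
    return is_scalar($value) ? trim((string) $value) : '';
}

$payload = [
    'task_id' => 'task-abc-123',
    'status' => 'DONE',
    'General_fields' => [
        'license_number' => ['value' => '12-345-67'],
    ],
];

echo resolveIdentifier($payload, 'license_number'), PHP_EOL; // 12-345-67
```

Returning an empty string for a missing field matches the default value of the $identifier argument on the service's complete() method.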

Force Re-extraction

If an extraction already exists for a file, chain force() to create a new one:

$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->force()
    ->submit();

Conditional Force

The pending extraction uses Laravel's Conditionable trait, so you can chain methods conditionally with when():

$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->when($shouldForce, fn ($pending) => $pending->force())
    ->submit();

Checking Extraction Status

use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

$extraction = DocumentExtraction::find($id);

if ($extraction->status === DocumentExtractionStatusEnum::Completed) {
    $data = $extraction->extracted_data;
    $identifier = $extraction->identifier; // e.g. "12-345-67"
}

Querying Extractions

The DocumentExtraction model includes query scopes for filtering by status, type, and file:

use TamirRental\DocumentExtraction\Models\DocumentExtraction;

// Filter by status
DocumentExtraction::pending()->get();
DocumentExtraction::completed()->get();
DocumentExtraction::failed()->get();

// Filter by type or file
DocumentExtraction::forType('car_license')->get();
DocumentExtraction::forFile('documents/license.png')->get();

// Combine scopes
DocumentExtraction::forType('car_license')->completed()->latest()->first();

How It Works

1. Your App                    2. Queue Worker               3. Provider (Koncile AI)
   │                              │                              │
   ├─ Store file to Storage       │                              │
   ├─ extract()->submit() ───────►│                              │
   │  (auto-dispatches event)     ├─ Download from Storage       │
   │                              ├─ Upload to provider ────────►│
   │                              ├─ Save external_task_id       │
   │                              │                              ├─ OCR Processing...
   │                              │                              │
   │◄─────────────── Provider webhook callback ◄────────────────┤
   ├─ Your controller handles it  │                              │
   ├─ complete() / fail()         │                              │
   │                              │                              │
   ├─ Check status / display      │                              │

Extraction Lifecycle

Stage | Status | external_task_id | extracted_data
Record created | pending | null | {}
Sent to provider | pending | task-abc-123 | {}
Provider succeeds | completed | task-abc-123 | {...provider data}
Provider fails | failed | task-abc-123 | {}
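
The table shows that pending covers two stages, distinguished only by whether external_task_id is set. A minimal sketch of that distinction follows. ExtractionStatus is a stand-in enum defined here so the snippet runs on its own (its backing values are assumptions), and lifecycleStage() is a hypothetical helper, not part of the package.

```php
<?php

declare(strict_types=1);

// Stand-in for the package's DocumentExtractionStatusEnum, redefined here
// only so the sketch is self-contained. The backing values are assumptions.
enum ExtractionStatus: string
{
    case Pending = 'pending';
    case Completed = 'completed';
    case Failed = 'failed';
}

/**
 * Hypothetical helper: name the lifecycle stage for a status and task ID.
 */
function lifecycleStage(ExtractionStatus $status, ?string $externalTaskId): string
{
    return match ($status) {
        ExtractionStatus::Completed => 'Provider succeeded',
        ExtractionStatus::Failed => 'Provider failed',
        // Pending covers two stages; the task ID tells them apart.
        ExtractionStatus::Pending => $externalTaskId === null
            ? 'Record created'
            : 'Sent to provider',
    };
}

echo lifecycleStage(ExtractionStatus::Pending, null), PHP_EOL;           // Record created
echo lifecycleStage(ExtractionStatus::Pending, 'task-abc-123'), PHP_EOL; // Sent to provider
```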

Handling Webhooks

The package does not register webhook routes — you own the entire webhook flow. Create your own controller to receive provider callbacks and use the service to update extractions:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use TamirRental\DocumentExtraction\Services\DocumentExtractionService;

class KoncileWebhookController extends Controller
{
    public function handle(Request $request, DocumentExtractionService $service): JsonResponse
    {
        // Validate the webhook (signature verification, etc.)

        $taskId = $request->input('task_id');
        $status = $request->input('status');

        match ($status) {
            'DONE' => $service->complete(
                $taskId,
                (object) $request->all(),
                $request->input('General_fields.license_number.value', ''),
            ),
            'FAILED' => $service->fail($taskId, $request->input('error_message', 'Provider error')),
            default => null,
        };

        return response()->json(['message' => 'Webhook processed']);
    }
}

Then register the route in your application:

// routes/api.php
Route::post('/webhooks/koncile', [KoncileWebhookController::class, 'handle']);
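
The controller above leaves webhook validation as a comment. One common approach is an HMAC signature over the raw request body, sketched below in plain PHP. The signing scheme (HMAC-SHA256, hex-encoded) is an assumption and not something this README documents for Koncile AI, so confirm the real scheme with your provider. isValidWebhookSignature() is a hypothetical helper.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: check a hex-encoded HMAC-SHA256 signature over the
 * raw request body. The signing scheme is an assumption; confirm it
 * against your provider's documentation.
 */
function isValidWebhookSignature(string $rawBody, string $signature, string $secret): bool
{
    // Reject outright when either value is missing.
    if ($secret === '' || $signature === '') {
        return false;
    }

    $expected = hash_hmac('sha256', $rawBody, $secret);

    // hash_equals() compares in constant time to avoid timing attacks.
    return hash_equals($expected, $signature);
}

$secret = 'your-webhook-secret';
$body = '{"task_id":"task-abc-123","status":"DONE"}';
$signature = hash_hmac('sha256', $body, $secret);

var_dump(isValidWebhookSignature($body, $signature, $secret)); // bool(true)
var_dump(isValidWebhookSignature($body, 'tampered', $secret)); // bool(false)
```

In the controller you could pass $request->getContent() as the body, the signature from a request header (the header name depends on the provider), and config('document-extraction.providers.koncile_ai.webhook_secret') as the secret, then call abort(401) when the check fails.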

Available Service Methods

Method | Description
$service->complete(string $taskId, object $data, string $identifier = '') | Mark extraction as completed with extracted data
$service->fail(string $taskId, string $message) | Mark extraction as failed with error message

Custom Providers

You can create your own extraction provider by implementing the DocumentExtractionProvider contract:

<?php

namespace App\Services;

use Illuminate\Support\Facades\Storage;
use TamirRental\DocumentExtraction\Contracts\DocumentExtractionProvider;
use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

class MyCustomProvider implements DocumentExtractionProvider
{
    /**
     * Process a document extraction request.
     *
     * The provider owns the full workflow: downloading the file,
     * calling the extraction API, and updating the model.
     */
    public function process(DocumentExtraction $extraction): void
    {
        // Your extraction logic here...
        // Download file: Storage::get($extraction->filename)
        // Call your API, then update the model:

        $extraction->update([
            'status' => DocumentExtractionStatusEnum::Completed,
            'extracted_data' => (object) ['field' => 'value'],
            'identifier' => 'parsed-id',
        ]);
    }
}

Then bind it to the contract in your application's service provider, which overrides the package's default binding:

// app/Providers/AppServiceProvider.php
use App\Services\MyCustomProvider;
use TamirRental\DocumentExtraction\Contracts\DocumentExtractionProvider;

public function register(): void
{
    $this->app->bind(DocumentExtractionProvider::class, MyCustomProvider::class);
}

Events

Event | Dispatched When
DocumentExtractionRequested | Automatically dispatched when extract()->submit() creates a new extraction

The event is dispatched internally — you don't need to dispatch it yourself. The queued listener downloads the file from storage and uploads it to the provider.

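The table above lists no completion event. To react when an extraction completes or fails, watch for updates to the DocumentExtraction model in your app, for example with an Eloquent observer or a listener on the model's updated event.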

Testing

The package ships with model factories for testing:

use TamirRental\DocumentExtraction\Models\DocumentExtraction;

// Default (pending, no task ID)
$extraction = DocumentExtraction::factory()->create();

// Pending, with an external task ID
$extraction = DocumentExtraction::factory()->pending()->create();

// Completed with data
$extraction = DocumentExtraction::factory()->completed()->create();

// Failed with error
$extraction = DocumentExtraction::factory()->failed()->create();

// With metadata
$extraction = DocumentExtraction::factory()->create([
    'metadata' => [
        'template_id' => 'your-template-id',
        'identifier_field' => 'license_number',
    ],
]);

License

MIT