mayaram / laravel-ocr
Laravel OCR & Document Data Extractor - A powerful OCR and document parsing engine for Laravel
pkg:composer/mayaram/laravel-ocr
Requires
- php: ^8.2
- aws/aws-sdk-php: ^3.369
- guzzlehttp/guzzle: ^7.0
- illuminate/support: ^9.0|^10.0|^11.0|^12.0
- intervention/image: ^3.0
- smalot/pdfparser: ^2.0
- thiagoalessio/tesseract_ocr: ^2.12
Requires (Dev)
- orchestra/testbench: ^8.0|^9.0
- pestphp/pest: ^3.8
- pestphp/pest-plugin-laravel: ^3.2
- phpunit/phpunit: ^10.0|^11.0
Suggests
- aws/aws-sdk-php: Required to use the AWS Textract driver.
- google/cloud-vision: Required to use the Google Vision driver.
- guzzlehttp/guzzle: Required to use the Azure OCR driver.
- laravel/ai: Required for AI-powered cleanup and structured extraction
README
Turn any image or PDF into structured, actionable data.
A powerful, developer-friendly Laravel package that reads text from images and PDFs, understands the content, fixes scanning errors with AI, and delivers clean, structured data directly to your application.
Why this package? Most OCR tools just give you a dump of raw text. This package gives you objects, arrays, and confidence scores. It knows the difference between an Invoice Number and a Phone Number.
Features
- Laravel OCR Engine: Seamlessly switch between Tesseract (offline/privacy-first), Google Vision, AWS Textract, or Azure AI drivers.
- AI-Powered Cleanup: Uses OpenAI or Anthropic to fix OCR typos (e.g., 1NV01CE -> INVOICE) and normalize data formats.
- Structured Data Objects: Returns typed OcrResult DTOs, not just extraction arrays.
- Advanced Table Extraction: Specialized algorithms to extract line items, quantities, and prices from complex invoice layouts.
- Auto-Classification: Automatically detects document types (Invoice, Receipt, Contract, Purchase Order, etc.).
- Workflows: Define custom processing pipelines in your config (e.g., "If Invoice -> Extract Tables -> Verify Totals").
- Blade Components: Built-in x-laravel-ocr::document-preview component to visualize results with bounding boxes.
- Enterprise Security: Encrypted storage options, malware scanning, and full offline support for sensitive data.
Installation
Requires PHP 8.2+ and Laravel 10.0+ (compatible with Laravel 11 & 12).
1. Install via Composer
composer require mayaram/laravel-ocr
2. Publish Configuration & Assets
php artisan vendor:publish --tag=laravel-ocr-config
php artisan migrate
Configuration
Set your preferred driver and credentials in your .env file.
Offline / Privacy-First (Default)
All OCR processing runs on your own server. No data leaves your infrastructure.
LARAVEL_OCR_DRIVER=tesseract
TESSERACT_BINARY=/usr/bin/tesseract
Cloud Drivers (Higher Accuracy)
# Google Cloud Vision
LARAVEL_OCR_DRIVER=google_vision
GOOGLE_VISION_KEY_FILE=/path/to/service-account.json

# AWS Textract
LARAVEL_OCR_DRIVER=aws_textract
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_DEFAULT_REGION=us-east-1
AI Cleanup (Optional but Recommended)
Enable AI to fix scanning errors and structure messy data.
LARAVEL_OCR_AI_CLEANUP=true
LARAVEL_OCR_AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
Usage
1. Simple Text Extraction
The LaravelOcr facade provides a simple entry point for basic extraction.
use Mayaram\LaravelOcr\Facades\LaravelOcr;

// Extract from a local file, URL, or UploadedFile
$result = LaravelOcr::extract(request()->file('document'));

echo $result['text']; // "INVOICE #1001..."
2. Smart Parsing (Structured Data)
For structured data extraction, use the DocumentParser, which returns an OcrResult DTO.
use Mayaram\LaravelOcr\Enums\DocumentType;

$parser = app('laravel-ocr.parser');

$result = $parser->parse('storage/invoices/inv-2024.pdf', [
    'document_type' => DocumentType::INVOICE,
    'use_ai_cleanup' => true,
]);

// 1. Access Clean Data
$invoiceNumber = $result->fields['invoice_number']['value'];
$totalAmount = $result->fields['totals']['total']['amount'];

// 2. Access Metadata
echo $result->confidence; // 0.98
echo $result->metadata['processing_time']; // 1.2s
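Because the result carries a confidence score alongside the fields, one common pattern is to route weak results to manual review. The following is a minimal sketch in plain PHP, not part of the package: `fieldValue()` and `needsReview()` are hypothetical helpers, the 0.90 threshold is an arbitrary assumption, and the sample array only mimics the `['value' => ...]` field shape shown above.

```php
<?php

/**
 * Hypothetical helper: read a field's value from an array shaped like
 * ['invoice_number' => ['value' => 'INV-1001']]; returns null if absent.
 */
function fieldValue(array $fields, string $key): mixed
{
    return $fields[$key]['value'] ?? null;
}

/**
 * Hypothetical helper: flag a result for manual review when confidence is
 * below the threshold or any required field is missing or empty.
 */
function needsReview(array $fields, float $confidence, array $requiredKeys, float $minConfidence = 0.90): bool
{
    if ($confidence < $minConfidence) {
        return true;
    }

    foreach ($requiredKeys as $key) {
        $value = fieldValue($fields, $key);
        if ($value === null || $value === '') {
            return true;
        }
    }

    return false;
}

// Sample data standing in for $result->fields and $result->confidence.
$fields = ['invoice_number' => ['value' => 'INV-1001']];

var_dump(needsReview($fields, 0.98, ['invoice_number'])); // bool(false)
var_dump(needsReview($fields, 0.72, ['invoice_number'])); // bool(true)
```

In an application, the two arguments would presumably come from `$result->fields` and `$result->confidence`; nested fields such as the totals use a different shape and would need their own accessor.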
3. Working with Line Items & Tables
The package includes an Advanced Invoice Extractor capable of parsing complex invoice tables into structured arrays.
$result = $parser->parse($invoicePath, ['extract_advanced_line_items' => true]);

foreach ($result->fields['line_items'] as $item) {
    echo "{$item['description']}: {$item['quantity']} x \${$item['unit_price']} = \${$item['total']}\n";
}

// Output:
// Web Hosting: 12 x $10.00 = $120.00
// Domain Name: 1 x $15.00 = $15.00
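Once line items are available as arrays, it can be worth checking that each row is internally consistent before trusting it. The sketch below is plain PHP and not the package's own verification logic: `mismatchedLineItems()` is a hypothetical helper, it assumes `quantity`, `unit_price`, and `total` are numeric, and the third sample row is made up to show a failure.

```php
<?php

/**
 * Hypothetical check: quantity x unit_price should equal total for each row.
 * Amounts are compared in integer cents to avoid floating-point drift.
 * Returns the indexes of rows that do not add up.
 */
function mismatchedLineItems(array $lineItems): array
{
    $bad = [];

    foreach ($lineItems as $index => $item) {
        $expectedCents = (int) round($item['quantity'] * $item['unit_price'] * 100);
        $actualCents = (int) round($item['total'] * 100);

        if ($expectedCents !== $actualCents) {
            $bad[] = $index;
        }
    }

    return $bad;
}

// Sample rows mimicking $result->fields['line_items'].
$lineItems = [
    ['description' => 'Web Hosting', 'quantity' => 12, 'unit_price' => 10.00, 'total' => 120.00],
    ['description' => 'Domain Name', 'quantity' => 1, 'unit_price' => 15.00, 'total' => 15.00],
    ['description' => 'Support', 'quantity' => 2, 'unit_price' => 49.99, 'total' => 99.89], // deliberately wrong
];

print_r(mismatchedLineItems($lineItems)); // only index 2 is reported
```

Rows flagged this way are good candidates for a second OCR pass or manual review, since a mismatch usually means a digit was misread.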
4. Templates
Define reusable templates that target specific fields with regex patterns, keeping extraction rules clean and maintainable.
use Mayaram\LaravelOcr\Facades\LaravelOcr;

// 1. Create a Template
$template = app('laravel-ocr.templates')->create([
    'name' => 'TechCorp Invoice',
    'type' => 'invoice',
    'fields' => [
        [
            'key' => 'order_id',
            'pattern' => '/Order\s*ID:\s*([A-F0-9]+)/i',
            'type' => 'string',
        ],
    ],
]);

// 2. Apply it
$result = LaravelOcr::extractWithTemplate($file, $template->id);
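To see what a template pattern captures before saving it, the regex can be tried directly with `preg_match`. This sketch uses only the pattern from the template above; the sample text and the `extractOrderId()` helper are made up for illustration, and how the package applies patterns internally is not shown here.

```php
<?php

/**
 * Hypothetical helper: apply the order_id pattern to raw OCR text and
 * return the first capture group, or null when nothing matches.
 */
function extractOrderId(string $text): ?string
{
    $pattern = '/Order\s*ID:\s*([A-F0-9]+)/i';

    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

// Made-up OCR output.
$text = "TechCorp Ltd.\nOrder ID: 3FA9C1\nTotal: \$120.00";

var_dump(extractOrderId($text));              // string(6) "3FA9C1"
var_dump(extractOrderId('Order ID: XYZ-77')); // NULL
```

The `i` flag makes the whole pattern case-insensitive, so `order id: 3fa9c1` also matches. The character class only accepts hexadecimal digits, which is why the second call finds nothing.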
5. Workflows
Configure processing pipelines in config/laravel-ocr.php to standardize how different document types are handled.
// config/laravel-ocr.php
'workflows' => [
    'receipt' => [
        'options' => ['use_ai_cleanup' => true, 'extract_line_items' => true],
        'validators' => [
            ['type' => 'required_fields', 'fields' => ['total', 'date']],
        ],
    ],
],

// Usage
$result = $parser->parseWithWorkflow($file, 'receipt');
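As a rough illustration of what a `required_fields` validator checks, the sketch below reports which required names are missing from a set of extracted fields. It is a plain-PHP stand-in under stated assumptions, not the package's validator: `missingRequiredFields()` is hypothetical, and the flat `$extracted` array is simplified sample data.

```php
<?php

/**
 * Hypothetical stand-in for a "required_fields" validator: returns the names
 * of required fields that are absent, null, or empty in the extracted data.
 */
function missingRequiredFields(array $fields, array $required): array
{
    return array_values(array_filter(
        $required,
        static fn (string $name): bool => !isset($fields[$name]) || $fields[$name] === ''
    ));
}

// Validator entry copied from the workflow config above.
$validator = ['type' => 'required_fields', 'fields' => ['total', 'date']];

// Made-up extraction result with no date.
$extracted = ['total' => '42.50', 'merchant' => 'Corner Cafe'];

print_r(missingRequiredFields($extracted, $validator['fields'])); // lists "date"
```

An empty return value means every required field was found, which is a natural condition for letting a document continue through the pipeline.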
Blade Component
Preview the extracted document and data directly in your UI.
<x-laravel-ocr::document-preview :document="$processedDocument" :show-overlay="true" />
Testing
The package uses Pest for its test suite.
composer test
License
The MIT License (MIT). Please see License File for more information.