ferdiunal / php-firepdf
PHP FFI wrapper for the Rust pdf-inspector library.
Requires
- php: ^8.4
- illuminate/contracts: ^11.0||^12.0||^13.0
- spatie/laravel-package-tools: ^1.16
Requires (Dev)
- larastan/larastan: ^3.0
- laravel/ai: ^0.6
- laravel/pint: ^1.14
- nunomaduro/collision: ^8.8
- orchestra/testbench: ^10.0.0||^9.0.0
- pestphp/pest: ^4.0
- pestphp/pest-plugin-arch: ^4.0
- pestphp/pest-plugin-laravel: ^4.0
- phpstan/extension-installer: ^1.4
- phpstan/phpstan-deprecation-rules: ^2.0
- phpstan/phpstan-phpunit: ^2.0
- spatie/laravel-ray: ^1.35
Suggests
- ext-ffi: Required for native Rust FFI bindings.
This package is auto-updated.
Last update: 2026-04-28 23:57:05 UTC
README
PHP FFI wrapper for pdf-inspector, a fast Rust library for PDF classification and text extraction.
Documentation: English | Türkçe
This package exposes the full pdf-inspector API surface (process, detect, classify, extract text, region extraction, per-page markdown) through PHP FFI. It includes a Laravel service provider and facade for seamless integration.
Requirements
- PHP 8.4+
- PHP FFI extension enabled (extension=ffi)
- Rust toolchain (only needed when you build native binaries yourself)
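The first two requirements can be checked from PHP before installing. The following is a minimal sketch that uses only PHP built-ins; the helper name `firepdfPreflight` is hypothetical and not part of this package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-firepdf): reports whether the
 * current runtime meets the PHP version and FFI requirements listed above.
 *
 * @return array{php: bool, ffi: bool}
 */
function firepdfPreflight(): array
{
    return [
        // The package requires PHP 8.4 or newer.
        'php' => version_compare(PHP_VERSION, '8.4.0', '>='),
        // True when the FFI extension is loaded (extension=ffi in php.ini).
        'ffi' => extension_loaded('ffi'),
    ];
}

foreach (firepdfPreflight() as $name => $ok) {
    echo $name, ': ', $ok ? 'ok' : 'missing', PHP_EOL;
}
```

A loaded extension does not guarantee FFI is usable in every SAPI, since the `ffi.enable` ini setting can restrict it outside the CLI, so it may be worth running the check in the same environment that will serve requests.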
Installation
composer require ferdiunal/php-firepdf
Native Library Resolution (Production)
The package resolves the shared library in this order:
1. FIREPDF_LIB_PATH (or Laravel php-firepdf.lib_path)
2. Bundled package path: native/lib/<os>-<arch>/
3. Dev fallback: native/pdf-inspector-ffi/target/release/
If your production deployment does not set FIREPDF_LIB_PATH, make sure the package contains the prebuilt file under native/lib/<os>-<arch>/.
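The lookup order above can be sketched as a function that lists candidate locations from highest to lowest priority. This is an illustration only: the function name `libraryCandidates`, its parameters, and the example package root are assumptions, and the package's actual resolver may behave differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the documented lookup order (not the package's
 * real resolver). Returns candidate locations, highest priority first.
 *
 * @return list<string>
 */
function libraryCandidates(?string $explicitPath, string $packageRoot, string $platformDir): array
{
    $candidates = [];

    // 1. Explicit override: FIREPDF_LIB_PATH or the Laravel lib_path config value.
    if ($explicitPath !== null && $explicitPath !== '') {
        $candidates[] = $explicitPath;
    }

    // 2. Prebuilt bundle shipped inside the package.
    $candidates[] = $packageRoot . '/native/lib/' . $platformDir . '/';

    // 3. Local cargo build output (dev fallback).
    $candidates[] = $packageRoot . '/native/pdf-inspector-ffi/target/release/';

    return $candidates;
}

// Example: read the override from the environment, if any.
$env = getenv('FIREPDF_LIB_PATH');
$paths = libraryCandidates(
    $env === false ? null : $env,
    '/app/vendor/ferdiunal/php-firepdf', // assumed install location
    'linux-x86_64'
);
print_r($paths);
```

Printing the candidates this way can help when diagnosing a "library not found" error on a new host.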
Build the Rust FFI bridge (local/dev)
cd vendor/ferdiunal/php-firepdf/native/pdf-inspector-ffi
cargo build --release --locked

# copy the built file into package bundle layout
cd ../..
./scripts/stage-native-bundle.sh
Build Bundles for Win/macOS/Linux
Use the GitHub Actions workflow native-bundles to produce bundle artifacts for Linux, macOS, and Windows.
The output folder name includes the runner's OS and architecture (for example: linux-x86_64, darwin-arm64, windows-x86_64).
Each artifact contains:
native/lib/<os>-<arch>/<library>
Include these files in the package release, or set FIREPDF_LIB_PATH explicitly at runtime.
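To see which `<os>-<arch>` folder a given host would likely need, the name can be derived from PHP's own platform constants. This is a sketch under assumptions: the helper `platformDir` is hypothetical, and the only normalization applied (Windows reporting `AMD64`) is chosen to match the example folder names above. How the package itself maps other architectures, such as `aarch64`, is not stated here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds an "<os>-<arch>" folder name from the
 * OS family and machine type reported by PHP.
 */
function platformDir(string $osFamily, string $machine): string
{
    // PHP_OS_FAMILY yields values such as 'Linux', 'Darwin', 'Windows'.
    $os = strtolower($osFamily);
    $arch = strtolower($machine);

    // Windows reports AMD64 for 64-bit x86; the bundle examples use x86_64.
    if ($arch === 'amd64') {
        $arch = 'x86_64';
    }

    return $os . '-' . $arch;
}

echo platformDir(PHP_OS_FAMILY, php_uname('m')), PHP_EOL;
```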
Laravel config (optional)
php artisan vendor:publish --tag="php-firepdf-config"
Usage
Standalone
use Ferdiunal\FirePdf\FirePdf;

$pdf = new FirePdf();

// Full processing: detect + extract + markdown
$result = $pdf->processPdf('document.pdf');
echo $result->pdfType;   // TextBased, Scanned, ImageBased, Mixed
echo $result->markdown;  // Markdown string or null

// Fast detection only
$info = $pdf->detectPdf('document.pdf');

// From bytes (no filesystem)
$bytes = file_get_contents('document.pdf');
$result = $pdf->processPdfBytes($bytes);

// Per-page markdown
$pages = $pdf->extractPagesMarkdown('document.pdf');
foreach ($pages->pages as $page) {
    echo "Page {$page->page}: {$page->markdown}";
}
Laravel
use Ferdiunal\FirePdf\Facades\FirePdf;

$result = FirePdf::processPdf('document.pdf');
Laravel AI SDK Tools (Laravel 13)
This package ships optional AI SDK-compatible tools under the
Ferdiunal\FirePdf\Ai\Tools namespace:
- DetectPdfTool
- ClassifyPdfTool
- ProcessPdfTool
- ExtractTextTool
- ExtractPagesMarkdownTool
These tools follow the Laravel AI SDK Tool contract and can be returned
explicitly from your agent's tools() method:
<?php

namespace App\Ai\Agents;

use Ferdiunal\FirePdf\Ai\Tools\ClassifyPdfTool;
use Ferdiunal\FirePdf\Ai\Tools\DetectPdfTool;
use Ferdiunal\FirePdf\Ai\Tools\ExtractPagesMarkdownTool;
use Ferdiunal\FirePdf\Ai\Tools\ExtractTextTool;
use Ferdiunal\FirePdf\Ai\Tools\ProcessPdfTool;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Promptable;

final class PdfAssistant implements Agent, HasTools
{
    use Promptable;

    /**
     * @return Tool[]
     */
    public function tools(): iterable
    {
        return [
            new DetectPdfTool(),
            new ClassifyPdfTool(),
            new ProcessPdfTool(),
            new ExtractTextTool(),
            new ExtractPagesMarkdownTool(),
        ];
    }
}
Tool input is storage-scoped and requires a relative path argument. Configure
the disk and base path:
// config/php-firepdf.php
'ai_tools' => [
    'disk' => env('FIREPDF_AI_TOOLS_DISK', 'local'),
    'base_path' => env('FIREPDF_AI_TOOLS_BASE_PATH', 'incoming/pdfs'),
],
Example call payload (from an AI tool invocation):
{
"path": "contracts/sample.pdf"
}
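To illustrate what "storage-scoped" with a relative path can mean in practice, here is one way a tool might join the configured base path with the incoming `path` while rejecting absolute paths and `..` segments. This is a hypothetical sketch, not the package's implementation; the function name `resolveScopedPath` and its exact rules are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: joins a relative tool path onto a base path and
 * refuses absolute paths or parent-directory traversal.
 */
function resolveScopedPath(string $basePath, string $relativePath): string
{
    // Treat backslashes as separators so Windows-style input is checked too.
    $normalized = str_replace('\\', '/', $relativePath);

    // Reject absolute paths such as "/etc/passwd" or "C:/secret.pdf".
    if (str_starts_with($normalized, '/') || preg_match('/^[A-Za-z]:/', $normalized) === 1) {
        throw new InvalidArgumentException('Path must be relative.');
    }

    $segments = [];
    foreach (explode('/', $normalized) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // skip empty and current-directory segments
        }
        if ($segment === '..') {
            throw new InvalidArgumentException('Path traversal is not allowed.');
        }
        $segments[] = $segment;
    }

    if ($segments === []) {
        throw new InvalidArgumentException('Path must not be empty.');
    }

    $base = trim($basePath, '/');
    $joined = implode('/', $segments);

    return $base === '' ? $joined : $base . '/' . $joined;
}

echo resolveScopedPath('incoming/pdfs', 'contracts/sample.pdf'), PHP_EOL;
// prints "incoming/pdfs/contracts/sample.pdf"
```

With the default configuration shown above, the example payload would then be read from `incoming/pdfs/contracts/sample.pdf` on the `local` disk, assuming the tools join the two values in this way.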
If you use these tools, install the Laravel AI SDK in your Laravel app:
composer require laravel/ai
Laravel Validation Rules (Real PDF Check)
Object rule:
use Ferdiunal\FirePdf\Rules\ValidPdf;

$rules = [
    'document' => ['required', 'file', new ValidPdf()],
];
String alias:
$rules = [
    'document' => ['required', 'file', 'firepdf_pdf'],
];
Recommended for early filtering + deep validation:
$rules = [
    'document' => ['required', 'file', 'mimetypes:application/pdf', 'firepdf_pdf'],
];
API Reference
| Method | Description |
|---|---|
| processPdf(path, pages?) | Full processing (detect + extract + markdown) |
| processPdfBytes(data, pages?) | Full processing from bytes |
| detectPdf(path) | Fast detection only |
| detectPdfBytes(data) | Fast detection from bytes |
| classifyPdf(path) | Lightweight classification |
| classifyPdfBytes(data) | Lightweight classification from bytes |
| extractText(path) | Plain text extraction |
| extractTextBytes(data) | Plain text from bytes |
| extractTextWithPositions(path, pages?) | Text with X/Y coords and font info |
| extractTextWithPositionsBytes(data, pages?) | Positions from bytes |
| extractTextInRegions(path, pageRegions) | Extract text in bounding-box regions |
| extractTextInRegionsBytes(data, pageRegions) | Region extraction from bytes |
| extractTablesInRegions(path, pageRegions) | Table markdown in regions |
| extractTablesInRegionsBytes(data, pageRegions) | Table regions from bytes |
| extractPagesMarkdown(path, pages?) | Per-page markdown + layout metadata |
| extractPagesMarkdownBytes(data, pages?) | Per-page markdown from bytes |
| getRuntimeSnapshot() | Returns aggregate runtime telemetry for worker memory/speed |
| resetRuntimeSnapshot() | Resets aggregate runtime telemetry counters |
| shouldRecycleWorker() | Returns true when configured soft/hard memory limit was exceeded |
| close() | Closes the FFI handle and runs a GC cycle |
Validation extensions:
- Ferdiunal\FirePdf\Rules\ValidPdf (object rule)
- firepdf_pdf (string alias)
Server Recipes
Parse Time & Memory Telemetry
$firePdf->resetRuntimeSnapshot();

$result = $firePdf->processPdf($path);
$snapshot = $firePdf->getRuntimeSnapshot();

echo $snapshot->lastDurationMs;      // last operation duration
echo $snapshot->averageDurationMs;   // average duration
echo $snapshot->currentMemoryBytes;  // current process memory
echo $snapshot->peakMemoryBytes;     // process peak memory
For quick markdown + telemetry reports on sample PDFs:
php scripts/test-user-pdfs.php
Swoole / OpenSwoole (request loop)
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Mark worker for graceful recycle at end of request.
}
FrankenPHP worker mode
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Trigger worker restart in your supervisor/worker control flow.
}
RoadRunner worker
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Stop current worker and let RR spawn a fresh one.
}
Recommended policy:
- Use worker max requests and shouldRecycleWorker() together.
- Set soft_limit_mb below your process hard limit.
- Set hard_limit_mb as a deterministic recycle threshold.
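The first recommendation, combining a request cap with the memory signal, can be sketched as a small counter object. The class `RecyclePolicy` below is hypothetical and independent of any specific worker runtime; in a real worker, the boolean argument would come from `$firePdf->shouldRecycleWorker()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical policy object: recycle the worker when either the request
 * cap is reached or the memory signal reports that a limit was exceeded.
 */
final class RecyclePolicy
{
    private int $handled = 0;

    public function __construct(private readonly int $maxRequests)
    {
    }

    /**
     * Call once after each finished request.
     * Returns true when the worker should stop accepting new work.
     */
    public function afterRequest(bool $memoryLimitExceeded): bool
    {
        $this->handled++;

        return $memoryLimitExceeded || $this->handled >= $this->maxRequests;
    }
}

$policy = new RecyclePolicy(maxRequests: 500);

// In a real worker, pass $firePdf->shouldRecycleWorker() instead of false.
if ($policy->afterRequest(memoryLimitExceeded: false)) {
    echo 'recycle worker', PHP_EOL;
}
```

Keeping the request cap as a backstop means a worker is still replaced periodically even if memory growth stays under the configured limits.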
Testing
# Native build
cd native/pdf-inspector-ffi
cargo build --release --locked

# Return to the package root before running Composer scripts
cd ../..

# PHP tests (requires the FFI library to be built)
composer test

# PHP static analysis
composer analyse
License
MIT. Please see License File for more information.