xp-forge / pdf-parser
PDF Parser
v0.1.0
2026-05-30 16:36 UTC
Requires
- php: >=7.4.0
- xp-forge/compression: ^2.0 | ^1.3
- xp-framework/core: ^12.5 | ^11.10
Requires (Dev)
- xp-framework/test: ^2.0
README
Parses PDF files to extract text and images.
Example
Low-level usage:
use com\adobe\pdf\PdfReader;
use util\cmd\Console;
use io\streams\FileInputStream;

$reader= new PdfReader(new FileInputStream($argv[1]));

// Create objects lookup table while streaming
$objects= $trailer= [];
foreach ($reader->objects() as $kind => $value) {
  if ('object' === $kind) {
    $objects[$value['id']->hashCode()]= $value['dict'];
  } else if ('trailer' === $kind) {
    $trailer+= $value;
  }
}

Console::writeLine('Trailer: ', $trailer);

// Optional meta information like author and creation date
if ($info= ($trailer['Info'] ?? null)) {
  Console::writeLine('Info: ', $objects[$info->hashCode()]);
}

// Root catalogue and pages enumeration
Console::writeLine('Root: ', $objects[$trailer['Root']->hashCode()]);
Console::writeLine('Pages: ', $objects[$trailer['Pages']->hashCode()]);

