adamroyle / php-xpdf
XPDF PHP, an Object Oriented library to manipulate XPDF
Requires
- php: >=7.0.0
- alchemy/binary-driver: ~1.5
Requires (Dev)
- phpunit/phpunit: ^11.4
This package is auto-updated.
Last update: 2024-11-01 09:06:25 UTC
README
PHP-XPDF is an object oriented wrapper for XPDF.
Currently available:
- PdfToText
- PdfImages
- PdfInfo
- PdfToPpm
Installation
It is recommended to install PHP-XPDF through Composer:
{
    "require": {
        "adamroyle/php-xpdf": "^1.0.0"
    }
}
Dependencies:
To use PHP-XPDF, you need to install XPDF. Depending on your configuration, follow the installation instructions on the XPDF website.
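Before creating a driver, it can help to confirm that the XPDF command-line tools are reachable. The following is a minimal sketch, not part of PHP-XPDF: `findBinary` is a hypothetical helper that scans the directories of `PATH` for an executable. The tool names in the usage loop are the standard XPDF binaries matching the wrappers listed above.

```php
<?php

/**
 * Hypothetical helper (not part of PHP-XPDF): look for an executable
 * named $name in the directories listed in $path.
 * Returns the full path, or null when nothing executable is found.
 */
function findBinary(string $name, ?string $path = null): ?string
{
    // Fall back to the PATH environment variable of the current process.
    $path = $path ?? (getenv('PATH') ?: '');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Usage: report which of the wrapped tools are reachable.
foreach (['pdftotext', 'pdfimages', 'pdfinfo', 'pdftoppm'] as $tool) {
    echo $tool, ': ', findBinary($tool) ?? 'not found', PHP_EOL;
}
```

If a tool is installed outside `PATH`, its location can instead be given explicitly through the driver configuration described under Driver Initialization.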
Documentation
Driver Initialization
The easiest way to instantiate the driver is to call the create method.
$pdfToText = XPDF\PdfToText::create();
You can optionally pass a configuration array and a logger (any Psr\Log\LoggerInterface).
$pdfToText = XPDF\PdfToText::create(array(
    'pdftotext.binaries' => '/opt/local/xpdf/bin/pdftotext',
    'pdftotext.timeout'  => 30, // timeout for the underlying process
), $logger);
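When the binary location differs between machines, the configuration array can be assembled at runtime. The sketch below assumes only the two keys shown above; `buildPdfToTextConfig` and the environment variable names `XPDF_PDFTOTEXT_BIN` and `XPDF_TIMEOUT` are hypothetical and not defined by PHP-XPDF.

```php
<?php

/**
 * Hypothetical helper: build a configuration array for PdfToText::create().
 * Keys that are not supplied are omitted, so the library keeps its defaults.
 */
function buildPdfToTextConfig(?string $binary = null, ?int $timeout = null): array
{
    $config = [];

    if ($binary !== null && $binary !== '') {
        $config['pdftotext.binaries'] = $binary;
    }

    if ($timeout !== null) {
        if ($timeout < 0) {
            throw new InvalidArgumentException('Timeout must not be negative.');
        }
        $config['pdftotext.timeout'] = $timeout;
    }

    return $config;
}

// Read optional overrides from (hypothetical) environment variables.
$binary  = getenv('XPDF_PDFTOTEXT_BIN') ?: null;
$timeout = getenv('XPDF_TIMEOUT');
$config  = buildPdfToTextConfig($binary, $timeout === false ? null : (int) $timeout);

// With the library installed, the result would be passed on as:
// $pdfToText = XPDF\PdfToText::create($config, $logger);
```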
Extract text
To extract text from a PDF, use the getText method.
$pdfToText = XPDF\PdfToText::create();
$text = $pdfToText->getText('document.pdf');
You can optionally restrict extraction to a range of pages.
$text = $pdfToText->getText('document.pdf', $from = 1, $to = 4);
You can also preset how many pages are extracted on every call.
$pdfToText->setPageQuantity(2);
$pdfToText->getText('document.pdf'); // extracts pages 1 and 2
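For long documents, the page range arguments make it possible to extract text in batches. The sketch below shows one way to compute the ranges. `pageRanges` is a hypothetical helper, and it assumes that both `$from` and `$to` are inclusive and that the total page count is already known.

```php
<?php

/**
 * Hypothetical helper: split pages 1..$totalPages into inclusive
 * [from, to] ranges of at most $chunkSize pages each.
 */
function pageRanges(int $totalPages, int $chunkSize): array
{
    if ($totalPages < 1 || $chunkSize < 1) {
        throw new InvalidArgumentException('Page count and chunk size must be positive.');
    }

    $ranges = [];
    for ($from = 1; $from <= $totalPages; $from += $chunkSize) {
        // The last range is clipped to the final page.
        $ranges[] = [$from, min($from + $chunkSize - 1, $totalPages)];
    }

    return $ranges;
}

// Usage sketch (assumes the library is installed and $totalPages is known):
// foreach (pageRanges($totalPages, 4) as [$from, $to]) {
//     $text = $pdfToText->getText('document.pdf', $from, $to);
// }

print_r(pageRanges(10, 4)); // three ranges: [1, 4], [5, 8], [9, 10]
```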
Extract embedded images
To extract embedded images from a PDF, use the PdfImages::getImages method.
$pdfImages = XPDF\PdfImages::create();
$pdfImages->setOutputFormat('jpeg');
$images = $pdfImages->getImages('document.pdf');
This will return an array of filenames in a temp directory.
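Because the files live in a temp directory, you may want to copy them somewhere permanent before they are cleaned up. The following is a minimal sketch under that assumption. `persistFiles` is a hypothetical helper, not part of PHP-XPDF, and works on any array of file paths.

```php
<?php

/**
 * Hypothetical helper: copy files (for example the paths returned by
 * getImages()) into $targetDir and return the new paths.
 */
function persistFiles(array $files, string $targetDir): array
{
    // Create the target directory if needed; the second is_dir() check
    // covers the case where another process created it in the meantime.
    if (!is_dir($targetDir) && !mkdir($targetDir, 0775, true) && !is_dir($targetDir)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }

    $saved = [];
    foreach ($files as $file) {
        $destination = rtrim($targetDir, '/\\') . DIRECTORY_SEPARATOR . basename($file);
        if (!copy($file, $destination)) {
            throw new RuntimeException("Cannot copy $file to $destination");
        }
        $saved[] = $destination;
    }

    return $saved;
}

// Usage sketch (assumes $images holds the result of getImages()):
// $stored = persistFiles($images, __DIR__ . '/extracted');
```

Files are copied rather than moved here so that the originals stay untouched. Note that files sharing a base name would overwrite each other in the target directory.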
Generate images
To convert entire pages to images, use the PdfToPpm::getImages method.
$pdfToPpm = XPDF\PdfToPpm::create();
$pdfToPpm->setOutputFormat('png');
// optional, set an output resolution
$pdfToPpm->setResolution(300); // default is 150
// alternatively, set the max width/height in pixels. this overrides the resolution setting.
// $pdfToPpm->setMaxDimension(2000);
$images = $pdfToPpm->getImages('document.pdf');
This will return an array of filenames in a temp directory.
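To choose between a resolution and a maximum dimension, it helps to estimate how large a rendered page will be. The sketch below relies only on the fact that PDF page sizes are measured in points, at 72 points per inch. `renderedSize` is a hypothetical helper, and the actual output may differ slightly depending on how the tool rounds, crops, or rotates the page.

```php
<?php

/**
 * Hypothetical helper: approximate pixel size of a page rendered at
 * $dpi dots per inch, given its size in PDF points (72 per inch).
 */
function renderedSize(float $widthPt, float $heightPt, int $dpi): array
{
    return [
        (int) round($widthPt / 72 * $dpi),
        (int) round($heightPt / 72 * $dpi),
    ];
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
[$w, $h] = renderedSize(612, 792, 150);
echo "{$w} x {$h}", PHP_EOL; // 1275 x 1650

[$w, $h] = renderedSize(612, 792, 300);
echo "{$w} x {$h}", PHP_EOL; // 2550 x 3300
```

At 300 dpi a Letter page already exceeds 2000 pixels on both sides, which is the kind of case where a maximum dimension may be the more convenient setting.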
License
This project is licensed under the MIT license.