adamroyle/php-xpdf

XPDF PHP, an Object Oriented library to manipulate XPDF

1.0.1 2024-11-01 07:34 UTC


Last update: 2024-12-31 09:34:20 UTC


README

PHP-XPDF is an object-oriented wrapper for XPDF.

Currently available:

  • PdfToText
  • PdfImages
  • PdfInfo
  • PdfToPpm

Installation

It is recommended to install PHP-XPDF through Composer:

{
  "require": {
    "adamroyle/php-xpdf": "^1.0.0"
  }
}

Dependencies:

PHP-XPDF requires XPDF to be installed. Depending on your configuration, please follow the installation instructions on the XPDF website.

Documentation

Driver Initialization

The easiest way to instantiate the driver is to call the create method.

$pdfToText = XPDF\PdfToText::create();

You can optionally pass a configuration array and a logger (any Psr\Log\LoggerInterface implementation).

$pdfToText = XPDF\PdfToText::create(array(
    'pdftotext.binaries' => '/opt/local/xpdf/bin/pdftotext',
    'pdftotext.timeout' => 30, // timeout for the underlying process
), $logger);
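The configuration array does not have to be hard-coded. The following is a minimal sketch, not part of PHP-XPDF, that builds it from environment variables. The names XPDF_PDFTOTEXT_BIN and XPDF_TIMEOUT and the function pdfToTextConfigFromEnv are hypothetical, and the sketch assumes that keys left out of the array fall back to the library defaults.

```php
<?php

/**
 * Build a configuration array for XPDF\PdfToText::create() from the
 * environment. The variable names are hypothetical, chosen for this
 * sketch only; the array keys are the ones shown in this README.
 */
function pdfToTextConfigFromEnv(): array
{
    $config = [];

    // Path to the pdftotext binary, if one was provided.
    $binary = getenv('XPDF_PDFTOTEXT_BIN');
    if ($binary !== false && $binary !== '') {
        $config['pdftotext.binaries'] = $binary;
    }

    // Timeout for the underlying process; accepted only if it is a
    // plain non-negative integer.
    $timeout = getenv('XPDF_TIMEOUT');
    if ($timeout !== false && preg_match('/^\d+$/', $timeout) === 1) {
        $config['pdftotext.timeout'] = (int) $timeout;
    }

    return $config;
}
```

The result can then be passed as the first argument, for example XPDF\PdfToText::create(pdfToTextConfigFromEnv(), $logger).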

Extract text

To extract text from a PDF, use the getText method.

$pdfToText = XPDF\PdfToText::create();
$text = $pdfToText->getText('document.pdf');

You can optionally restrict extraction to a range of pages.

$text = $pdfToText->getText('document.pdf', $from = 1, $to = 4);

You can also set in advance how many pages are extracted on each call.

$pdfToText->setPageQuantity(2);
$pdfToText->getText('document.pdf'); // extracts pages 1 and 2
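If the text of each page is needed separately, the page range arguments can be used with the same value for both ends. The helper below is a sketch built only on the getText call shown above. The function name extractTextByPage is hypothetical, and the sketch assumes that pages are numbered from 1 and that the range is inclusive, as the examples above suggest.

```php
<?php

/**
 * Extract each page of a range separately.
 *
 * $pdfToText is expected to be the object returned by
 * XPDF\PdfToText::create(). It is left untyped so the helper can be
 * tried with a stand-in object.
 *
 * Returns an array of page text keyed by page number.
 */
function extractTextByPage($pdfToText, string $file, int $firstPage, int $lastPage): array
{
    if ($firstPage < 1 || $lastPage < $firstPage) {
        throw new InvalidArgumentException('Expected 1 <= firstPage <= lastPage.');
    }

    $pages = [];
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        // One call per page: $from and $to are the same page number.
        $pages[$page] = $pdfToText->getText($file, $page, $page);
    }

    return $pages;
}
```

A call such as extractTextByPage($pdfToText, 'document.pdf', 1, 4) would then yield four entries keyed 1 to 4, at the cost of one underlying process per page.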

Extract embedded images

To extract embedded images from a PDF, use the PdfImages::getImages method.

$pdfImage = XPDF\PdfImages::create();
$pdfImage->setOutputFormat('jpeg');
$images = $pdfImage->getImages('document.pdf');

This will return an array of filenames in a temp directory.
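Because the returned files live in a temp directory, it may be worth copying them somewhere permanent before further processing. The following sketch uses only standard PHP functions. The name persistImages is hypothetical, and how long the library keeps its temp files is not stated here, so copying early is a precaution rather than a documented requirement.

```php
<?php

/**
 * Copy files (for example the array returned by getImages) into a
 * permanent directory and return the new paths.
 */
function persistImages(array $tempFiles, string $targetDir): array
{
    // Create the target directory if needed; the second is_dir() check
    // covers the case where another process created it in the meantime.
    if (!is_dir($targetDir) && !mkdir($targetDir, 0775, true) && !is_dir($targetDir)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }

    $kept = [];
    foreach ($tempFiles as $source) {
        $destination = rtrim($targetDir, '/\\') . DIRECTORY_SEPARATOR . basename($source);
        if (!copy($source, $destination)) {
            throw new RuntimeException("Cannot copy $source to $destination");
        }
        $kept[] = $destination;
    }

    return $kept;
}
```

The helper keeps the original base names, so files with identical names from different runs would overwrite each other in the same target directory.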

Generate images

To render entire pages as images, use the PdfToPpm::getImages method.

$pdfToPpm = XPDF\PdfToPpm::create();
$pdfToPpm->setOutputFormat('png');

// optional, set an output resolution
$pdfToPpm->setResolution(300); // default is 150

// alternatively, set the max width/height in pixels. this overrides the resolution setting.
// $pdfToPpm->setMaxDimension(2000);

$images = $pdfToPpm->getImages('document.pdf');

This will return an array of filenames in a temp directory.
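To get a feel for what a resolution setting means in pixels, the output size can be estimated from the page size. This sketch assumes the resolution is expressed in DPI, as with the -r option of XPDF's pdftoppm, and relies on PDF page sizes being measured in points (1 point = 1/72 inch). The function name estimatePixelSize is hypothetical, and the actual output may differ by a pixel due to rounding.

```php
<?php

/**
 * Estimate the rendered pixel dimensions of a page at a given
 * resolution. Page dimensions are given in PDF points (1/72 inch).
 */
function estimatePixelSize(float $widthPt, float $heightPt, int $dpi): array
{
    return [
        'width'  => (int) round($widthPt / 72 * $dpi),
        'height' => (int) round($heightPt / 72 * $dpi),
    ];
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
$size = estimatePixelSize(612, 792, 300);
echo $size['width'] . ' x ' . $size['height'] . PHP_EOL; // prints: 2550 x 3300
```

At the default of 150 the same page comes out at roughly 1275 x 1650 pixels, which can help when choosing between a resolution and a maximum dimension.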

License

This project is licensed under the MIT license.