sgh / pdfbox
PHP5 wrapper for the Apache PdfBox ExtractText utility.
Requires
- php: >=5.3.0
Requires (Dev)
- phpunit/phpunit-dom-assertions: 1.0.*@dev
This package is not auto-updated.
Last update: 2024-11-21 17:03:00 UTC
README
A PHP interface for the PdfBox ExtractText utility, useful to unit-test contents of generated PDFs.
Requirements
- PHP >=5.3 or HHVM
- Java Runtime Environment
- PdfBox JAR file
  - Download: http://pdfbox.apache.org/downloads.html
  - Minimum version: 1.2.0
  - Recommended version: >= 1.8.3
- PHP needs permissions for shell execution
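The requirements above can be checked before the first conversion. The following is a minimal sketch, not part of sgh/pdfbox: the function name pdfboxPrerequisiteProblems() is hypothetical, and it assumes that exec() is the shell function that matters, which the README does not state.

```php
<?php
// Hypothetical helper, not part of sgh/pdfbox.
// Returns a list of human-readable problems; an empty list means all checks passed.
function pdfboxPrerequisiteProblems($jarPath, $javaBinary = 'java')
{
    $problems = array();

    // Shell functions can be switched off via the disable_functions ini setting.
    // Assumption: exec() stands in for whatever shell call the library uses.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    $canExec = function_exists('exec') && !in_array('exec', $disabled, true);

    if (!$canExec) {
        $problems[] = 'exec() is disabled, so PHP cannot start a Java process';
    } else {
        // "java -version" exits with status 0 when a Java runtime is installed.
        $output = array();
        $exitCode = 1;
        exec(escapeshellarg($javaBinary) . ' -version 2>&1', $output, $exitCode);
        if ($exitCode !== 0) {
            $problems[] = 'Java runtime not found: ' . $javaBinary;
        }
    }

    // The JAR must exist and be readable by the PHP process.
    if (!is_file($jarPath) || !is_readable($jarPath)) {
        $problems[] = 'PdfBox JAR not readable: ' . $jarPath;
    }

    return $problems;
}
```

Calling pdfboxPrerequisiteProblems('/usr/bin/pdfbox-app-1.7.0.jar') and printing the returned list is one way to find out why a conversion fails on a new machine.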
Install
To install with composer:
composer require sgh/pdfbox
Basic Usage
use SGH\PdfBox\PdfBox;

//$pdf = GENERATED_PDF;
$converter = new PdfBox;
$converter->setPathToPdfBox('/usr/bin/pdfbox-app-1.7.0.jar');
$text = $converter->textFromPdfStream($pdf);
$html = $converter->htmlFromPdfStream($pdf);
$dom  = $converter->domFromPdfStream($pdf);
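Since the package is meant for unit-testing generated PDFs, the extracted text usually ends up in an assertion. Here is a minimal sketch of such a check. The helper pdfStreamContainsText() is hypothetical and not part of sgh/pdfbox. It only relies on textFromPdfStream() from the usage above, and the whitespace normalization is an assumption about how extracted text tends to be wrapped.

```php
<?php
// Hypothetical helper, not part of sgh/pdfbox.
// $converter is expected to be a configured SGH\PdfBox\PdfBox instance.
function pdfStreamContainsText($converter, $pdf, $expected)
{
    // Extract plain text with the method shown in the basic usage.
    $text = $converter->textFromPdfStream($pdf);

    // Collapse whitespace runs so line wrapping in the PDF does not break the match.
    $normalize = function ($value) {
        return trim(preg_replace('/\s+/', ' ', $value));
    };

    return strpos($normalize($text), $normalize($expected)) !== false;
}
```

In a PHPUnit test case this could be used as $this->assertTrue(pdfStreamContainsText($converter, $pdf, 'Invoice No. 42')).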
If the source PDF is a file, use xxxFromPdfFile() instead of xxxFromPdfStream(), with the source path as parameter.
If you want to save the converted output to a file, specify the destination path as the second parameter of the xxxFromPdfxxx() methods.
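A minimal sketch of the file-based variant, assuming the text method follows the xxxFromPdfFile() pattern and is therefore named textFromPdfFile(). The wrapper convertPdfFile() is hypothetical and not part of sgh/pdfbox. What the method returns when a destination path is given is not documented here, so the wrapper passes the result through unchanged.

```php
<?php
// Hypothetical wrapper, not part of sgh/pdfbox.
// $converter is expected to be a configured SGH\PdfBox\PdfBox instance.
function convertPdfFile($converter, $sourcePath, $destinationPath = null)
{
    // Fail early with a clear message instead of a Java error.
    if (!is_readable($sourcePath)) {
        throw new InvalidArgumentException("Cannot read PDF: $sourcePath");
    }

    if ($destinationPath === null) {
        // Source path only: the converted text is returned.
        return $converter->textFromPdfFile($sourcePath);
    }

    // Second parameter: the converted output is saved to that path.
    return $converter->textFromPdfFile($sourcePath, $destinationPath);
}
```

For example, convertPdfFile($converter, 'report.pdf', 'report.txt') would save the extracted text to report.txt.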
Advanced Usage
Convert a range of pages instead of the full document:
$converter->getOptions()
    ->setStartPage(2)
    ->setEndPage(5);
Ignore corrupt objects in the PDF:
$converter->getOptions()->setForce(true);
Sort text:
$converter->getOptions()->setSort(true);
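The options can be combined. The sketch below wraps the page range and sort settings in a hypothetical helper, limitToPageRange(), that is not part of sgh/pdfbox. It assumes page numbers start at 1 and that getOptions() returns the same options object on every call. Only setStartPage() is chained, because the README shows chaining for the page setters only.

```php
<?php
// Hypothetical helper, not part of sgh/pdfbox.
// $converter is expected to be a configured SGH\PdfBox\PdfBox instance.
function limitToPageRange($converter, $startPage, $endPage, $sort = false)
{
    // Assumption: pages are numbered from 1.
    if ($startPage < 1 || $endPage < $startPage) {
        throw new InvalidArgumentException("Invalid page range: $startPage-$endPage");
    }

    $options = $converter->getOptions();

    // Chaining these two setters is shown in the README.
    $options->setStartPage($startPage)->setEndPage($endPage);

    // Called separately, so no return value of setSort() is assumed.
    $options->setSort($sort);

    return $converter;
}
```

After limitToPageRange($converter, 2, 5, true), a later conversion call should only cover pages 2 to 5, with sorted text.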
PHPUnit tests
To run the unit tests, set the environment variable PDFBOX_JAR to the full path of your PdfBox JAR file. See phpunit.xml.dist.
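The same variable can be reused in your own test bootstrap. Below is a minimal sketch with a hypothetical helper, pdfboxJarFromEnvironment(), that is not part of sgh/pdfbox. How phpunit.xml.dist defines PDFBOX_JAR is not shown here, so the helper only assumes that the value is visible through getenv().

```php
<?php
// Hypothetical helper, not part of sgh/pdfbox.
// Reads the JAR path from PDFBOX_JAR and verifies it before any test runs.
function pdfboxJarFromEnvironment()
{
    $path = getenv('PDFBOX_JAR');

    // getenv() returns false when the variable is not defined.
    if ($path === false || $path === '') {
        throw new RuntimeException('PDFBOX_JAR is not set');
    }

    if (!is_readable($path)) {
        throw new RuntimeException('PDFBOX_JAR does not point to a readable file: ' . $path);
    }

    return $path;
}
```

The result can be passed straight to $converter->setPathToPdfBox().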