jbpapp/pdf-to-text

Extract text from a PDF file using the pdftotext binary.

2.0 2019-02-24 22:58 UTC


Last update: 2024-12-25 12:49:46 UTC


README

Read PDF files with PHP 5.6 (based on spatie/pdf-to-text package)

This package is a PHP 5.6+ fork of the Spatie PDF To Text package. If you use PHP 7, please use the original package.

This package provides a class to extract text from a pdf.

 \JBPapp\PdfToText\Pdf::getText('book.pdf'); //returns the text from the pdf

Requirements

Behind the scenes this package leverages pdftotext. You can verify whether the binary is installed on your system by issuing this command:

which pdftotext

If it is installed it will return the path to the binary.

To install the binary you can use this command on Ubuntu or Debian:

apt-get install poppler-utils

If you're on RedHat or CentOS use this:

yum install poppler-utils
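
To make the dependency concrete, here is a rough sketch of what a thin wrapper around pdftotext might do on a Unix-like system. It is not this package's source code: the function name extractWithPdftotext is hypothetical, and the only thing taken from pdftotext itself is that passing "-" as the output file writes the text to standard output. The code avoids PHP 7 syntax so it also runs on PHP 5.6.

```php
<?php

/**
 * Hypothetical helper (not part of this package): runs pdftotext on a file
 * and returns the extracted text, or throws if the command fails.
 */
function extractWithPdftotext($binary, $pdfPath)
{
    // "-" as the output file tells pdftotext to write to standard output.
    // "2>&1" folds error messages into the captured output.
    $command = escapeshellarg($binary) . ' ' . escapeshellarg($pdfPath) . ' - 2>&1';

    // exec() collects the output line by line and reports the exit code.
    exec($command, $outputLines, $exitCode);

    if ($exitCode !== 0) {
        // A non-zero exit code means the command failed, for example because
        // the binary is missing or the file could not be opened.
        throw new RuntimeException(
            'pdftotext failed (exit code ' . $exitCode . '): ' . implode("\n", $outputLines)
        );
    }

    // exec() strips trailing whitespace per line, which is fine for a sketch.
    return implode("\n", $outputLines);
}
```

Calling extractWithPdftotext('/usr/bin/pdftotext', 'book.pdf') would return the text of book.pdf if the binary exists at that path. Escaping both arguments with escapeshellarg keeps file names with spaces or shell metacharacters from being interpreted by the shell.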

Installation

You can install the package via composer:

$ composer require jbpapp/pdf-to-text

Usage

Extracting text from a pdf is easy.

use JBPapp\PdfToText\Pdf;

$text = (new Pdf())
    ->setPdf('book.pdf')
    ->text();

Or easier:

 \JBPapp\PdfToText\Pdf::getText('book.pdf');

By default the package assumes that pdftotext is located at /usr/bin/pdftotext. If the binary is in a different location, pass its path to the constructor:

$text = (new Pdf('/custom/path/to/pdftotext'))
    ->setPdf('book.pdf')
    ->text();

Or pass it as the second argument to the getText method:

 \JBPapp\PdfToText\Pdf::getText('book.pdf', '/custom/path/to/pdftotext');
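
If the binary location differs between machines, one option is to resolve it at runtime instead of hard-coding a single path. The sketch below is only an illustration: findPdftotextBinary is a hypothetical helper that is not part of this package, and the candidate paths are assumptions about common install locations. The getText call is left as a comment because it requires the package to be installed and autoloaded.

```php
<?php

/**
 * Hypothetical helper (not part of this package): returns the first
 * candidate path that points to an executable file, or null if none does.
 */
function findPdftotextBinary(array $candidates)
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

// Candidate locations are assumptions; adjust them for your system.
$binary = findPdftotextBinary(array(
    '/usr/bin/pdftotext',
    '/usr/local/bin/pdftotext',
    '/opt/homebrew/bin/pdftotext',
));

if ($binary === null) {
    echo "pdftotext not found; install poppler-utils first\n";
} else {
    echo "Using pdftotext at {$binary}\n";
    // With the package installed and autoloaded:
    // $text = \JBPapp\PdfToText\Pdf::getText('book.pdf', $binary);
}
```

Checking the path up front gives a clearer failure message than letting the extraction fail later on a missing binary.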

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security related issues, please email freek@spatie.be instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.