forest/pdf-to-text

Extract text from a pdf

2.0.4 2019-09-11 14:47 UTC


Last update: 2025-05-12 03:45:00 UTC


README

This package provides a class to extract text from a pdf.

\forest\PdfToText\Pdf::getText('book.pdf'); // returns the text from the pdf

Requirements

Behind the scenes, this package uses pdftotext. You can verify whether the binary is installed on your system by issuing this command:

which pdftotext

If it is installed, the command prints the path to the binary.
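If you want to run the same check from PHP, a rough equivalent of `which` is to search each directory listed in `PATH`. The sketch below is a hypothetical helper, not part of forest/pdf-to-text; the function name `findBinaryOnPath` and its optional `$path` parameter are assumptions made for illustration, and it uses `??=`, so it needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of forest/pdf-to-text): a rough PHP
// equivalent of `which`, searching the directories listed in PATH.
function findBinaryOnPath(string $name, ?string $path = null): ?string
{
    // Fall back to the PATH environment variable when no search path is given.
    $path ??= (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty entries such as the middle of "a::b"
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        // Like `which`, accept only regular files that are executable.
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null; // not found, comparable to `which` printing nothing
}

$binary = findBinaryOnPath('pdftotext');
echo $binary ?? 'pdftotext not found on PATH', PHP_EOL;
```

Unlike some `which` implementations, this sketch ignores empty `PATH` entries instead of treating them as the current directory.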

To install the binary you can use this command on Ubuntu or Debian:

apt-get install poppler-utils

If you're on Red Hat or CentOS, use this:

yum install poppler-utils

Installation

You can install the package via composer:

$ composer require forest/pdf-to-text

Usage

Extracting text from a pdf is easy.

use forest\PdfToText\Pdf;

$text = (new Pdf())
    ->setPdf('book.pdf')
    ->text();

Or, more simply, use the static helper:

$text = \forest\PdfToText\Pdf::getText('book.pdf');
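For reference, the kind of work a pdftotext wrapper does can be sketched in plain PHP: quote the arguments, run the binary with `-` as the output file so that pdftotext writes the text to stdout, and check the exit code. This is a minimal sketch under those assumptions, not the package's actual implementation; `buildPdftotextCommand` and `extractPdfText` are hypothetical names.

```php
<?php
// Hypothetical sketch (not the package's actual code) of the kind of call
// a pdftotext wrapper makes.

function buildPdftotextCommand(string $binary, string $pdfPath): string
{
    // escapeshellarg() quotes each argument so paths with spaces or shell
    // metacharacters are passed through safely. The trailing "-" asks
    // pdftotext to write the extracted text to stdout instead of a file.
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdfPath) . ' -';
}

function extractPdfText(string $pdfPath, string $binary = '/usr/bin/pdftotext'): string
{
    if (!is_readable($pdfPath)) {
        throw new RuntimeException("Cannot read PDF: {$pdfPath}");
    }

    $lines = [];
    $exitCode = 0;
    // exec() collects stdout line by line; trailing whitespace on each line is dropped.
    exec(buildPdftotextCommand($binary, $pdfPath), $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext exited with code {$exitCode}");
    }

    return implode("\n", $lines);
}

// Usage, assuming book.pdf exists in the current directory:
// echo extractPdfText('book.pdf');
```

Because `exec()` trims trailing whitespace from every line, this sketch does not reproduce the output byte for byte; a wrapper that needs exact output would read stdout through `proc_open()` instead.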

By default, the package assumes that pdftotext is located at /usr/bin/pdftotext. If the binary is in a different location, pass its path to the constructor:

$text = (new Pdf('/custom/path/to/pdftotext'))
    ->setPdf('book.pdf')
    ->text();

or as the second argument to the static getText() method:

$text = \forest\PdfToText\Pdf::getText('book.pdf', '/custom/path/to/pdftotext');
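When the same code runs on machines where pdftotext lives in different places, one option is to resolve the path yourself before passing it in. The helper below is a hypothetical sketch, not part of forest/pdf-to-text; the name `resolvePdftotext` is an assumption, and the Homebrew paths in the candidate list are example locations that you should adjust for your systems.

```php
<?php
// Hypothetical helper (not part of forest/pdf-to-text): return the first
// executable pdftotext from a list of candidate paths.
function resolvePdftotext(array $candidates): string
{
    foreach ($candidates as $candidate) {
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    throw new RuntimeException(
        'pdftotext not found; tried: ' . implode(', ', $candidates)
    );
}

// Example candidates: the package's default location first, then common
// Homebrew prefixes (assumed locations; adjust for your system).
$candidates = [
    '/usr/bin/pdftotext',
    '/usr/local/bin/pdftotext',
    '/opt/homebrew/bin/pdftotext',
];

// With the package installed, the resolved path can be passed on, e.g.:
// $text = \forest\PdfToText\Pdf::getText('book.pdf', resolvePdftotext($candidates));
```

Throwing with the list of tried paths makes a missing binary fail early with a readable message instead of surfacing later as a failed extraction.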

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING for details.

License

The MIT License (MIT). Please see License File for more information.