unknow-sk/laravel-pdf-to

Laravel package for converting PDF files to text, HTML, or images

1.0.0 2025-04-29 06:05 UTC

Laravel package for extracting text or HTML from a PDF, or converting it to images (PNG, JPEG).

Support us

We invest a lot of time and put our hearts into our open source work.

Installation

You can install the package via composer:

composer require unknow-sk/laravel-pdf-to

You can publish the config file with:

php artisan vendor:publish --tag="laravel-pdf-to-config"

This is the content of the published config file:

return [
    /**
     * Set the pdftotext binary path manually
     */
    'pdftotext_bin' => env('PDF_TO_TEXT_PATH'),

    /**
     * Set the pdftohtml binary path manually
     */
    'pdftohtml_bin' => env('PDF_TO_HTML_PATH'),

    /**
     * Set the pdftoppm binary path manually
     */
    'pdftoppm_bin' => env('PDF_TO_PPM_PATH'),

    /**
     * Set the pdftocairo binary path manually
     */
    'pdftocairo_bin' => env('PDF_TO_CAIRO_PATH'),

    /**
     * Set the default output directory
     */
    'output_dir' => env('PDF_TO_OUTPUT_DIR', storage_path('app/pdf-to')),
];

Required Packages

This package relies on the following external tools:

  • pdftotext: For extracting text from PDFs.
  • pdftohtml: For converting PDFs to HTML.
  • pdftoppm: For generating images from PDFs.

Make sure these tools are installed and available in your system's PATH. On macOS, you can install them via Homebrew:

brew install poppler

Binary Files Configuration

By default, the package attempts to locate the required binary files (pdftotext, pdftohtml, pdftoppm, or pdftocairo) automatically. If these binaries are not found in your system's PATH, you will need to set their paths manually in the configuration file.

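The configuration file config/pdf-to.php reads each path from an environment variable, so you can either set those variables (for example in your .env file) or replace the env() calls with absolute paths: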

return [
    'pdftotext_bin' => env('PDF_TO_TEXT_PATH'),
    'pdftohtml_bin' => env('PDF_TO_HTML_PATH'),
    'pdftoppm_bin' => env('PDF_TO_PPM_PATH'),
    'pdftocairo_bin' => env('PDF_TO_CAIRO_PATH'),
];
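
As a quick check before editing the configuration, the following sketch reports which of the four binaries can be found on the current PATH. It is plain PHP and does not use the package; the helper name findBinary is hypothetical, and the lookup assumes a Unix-like system (on Windows the executables would carry an .exe suffix, which this sketch does not handle).

```php
<?php

/**
 * Search the directories listed in PATH for an executable with the given name.
 * Returns the full path, or null when nothing executable is found.
 * (Hypothetical helper, not part of the package.)
 */
function findBinary(string $name, ?string $pathEnv = null): ?string
{
    // Fall back to the PATH of the current process when none is passed in.
    $pathEnv ??= (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }

        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;

        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Report the status of each binary named in the configuration file.
foreach (['pdftotext', 'pdftohtml', 'pdftoppm', 'pdftocairo'] as $binary) {
    $path = findBinary($binary);
    echo $binary . ': ' . ($path ?? 'not found on PATH') . PHP_EOL;
}
```

Any binary reported as not found needs its path set through the corresponding environment variable, for example PDF_TO_TEXT_PATH for pdftotext.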

Alternative Libraries

For text extraction, this package uses spatie/pdf-to-text. If you don't have pdftoppm or pdftocairo installed, you can use spatie/pdf-to-image for image generation instead. This provides a fallback so that image conversion still works without those binaries.

Usage

Extract Text from PDF

use UnknowSk\LaravelPdfTo\Facades\LaravelPdfTo;

$text = LaravelPdfTo::setFile('path/to/your/file.pdf')
    ->setTimeout(120) // optional: timeout in seconds
    ->result('txt');
echo $text;

Convert PDF to HTML

use UnknowSk\LaravelPdfTo\Facades\LaravelPdfTo;

$html = LaravelPdfTo::setFile('path/to/your/file.pdf')
    ->setConfig(['options' => [...]]) // optional
    ->saveAs('output-file') // optional: store the output as a file; result() then returns its path
    ->result('html');
echo $html;

Convert PDF to Images

use UnknowSk\LaravelPdfTo\Facades\LaravelPdfTo;

$image = LaravelPdfTo::setFile('path/to/your/file.pdf')
    ->setTimeout(180) // optional: timeout in seconds
    ->result('png');
echo $image; // Path to the generated image
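
The calls above depend on external binaries and on the input file being readable, so a conversion can fail at runtime. The sketch below shows one way to guard a call. It assumes the package signals failure by throwing an exception, which this README does not state; the helper name convertOrNull is hypothetical and not part of the package.

```php
<?php

/**
 * Run a conversion callback for a PDF and return its result,
 * or null when the file is unreadable or the callback throws.
 * (Hypothetical helper, not part of the package.)
 */
function convertOrNull(string $pdfPath, callable $convert): ?string
{
    // Skip the conversion entirely if the input file cannot be read.
    if (!is_file($pdfPath) || !is_readable($pdfPath)) {
        return null;
    }

    try {
        return (string) $convert($pdfPath);
    } catch (\Throwable $e) {
        // Assumption: a failed conversion surfaces as an exception.
        error_log('PDF conversion failed: ' . $e->getMessage());

        return null;
    }
}

// Inside a Laravel application with this package installed, the callback
// could wrap the facade call from the examples above:
//
// $text = convertOrNull('path/to/your/file.pdf', function (string $path) {
//     return \UnknowSk\LaravelPdfTo\Facades\LaravelPdfTo::setFile($path)
//         ->setTimeout(120)
//         ->result('txt');
// });
```

Passing the conversion as a callback keeps the helper independent of the output format, so the same guard works for 'txt', 'html', and 'png'.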

Testing

To run the tests, use the following command:

composer test

The tests include functionality for extracting text, converting to HTML, and generating images from PDFs. Example test files are located in the tests/ directory.

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

The MIT License (MIT). Please see License File for more information.