1tomany/pdf-to-image

This package is abandoned and no longer maintained. The author suggests using the 1tomany/pdf-ai package instead.

A simple PHP library that makes extracting data from PDFs for large language models easy

pkg:composer/1tomany/pdf-to-image

v0.5.2 2026-01-28 21:21 UTC

This package is auto-updated.

Last update: 2026-02-10 21:54:50 UTC


README

pdf-ai is a simple PHP library that makes it easy to extract data from PDFs for large language models. It has a single dependency, the Symfony Process Component, which it uses to call the Poppler command line tools (Poppler is derived from the Xpdf codebase).

Installation

Install the library using Composer:

composer require 1tomany/pdf-ai

Installing Poppler

Before you begin, ensure the pdfinfo, pdftoppm, and pdftotext binaries are installed and located in a directory listed in the $PATH environment variable.

macOS

brew install poppler

Debian and Ubuntu

apt-get install poppler-utils
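
To confirm the installation, you can check that all three binaries are reachable from PHP. The following is a minimal sketch that assumes a Unix-like system; `findOnPath()` and `missingBinaries()` are hypothetical helper names used for illustration and are not part of this library.

```php
<?php

declare(strict_types=1);

/**
 * Returns the absolute path of $binary if it is found in one of the
 * directories listed in $path, or null otherwise.
 */
function findOnPath(string $binary, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $directory) {
        if ('' === $directory) {
            continue;
        }

        $candidate = rtrim($directory, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $binary;

        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

/**
 * Returns the names from $binaries that could not be found on $path.
 *
 * @param list<string> $binaries
 * @return list<string>
 */
function missingBinaries(array $binaries, string $path): array
{
    return array_values(array_filter(
        $binaries,
        static fn (string $binary): bool => null === findOnPath($binary, $path),
    ));
}

// The three Poppler tools this README asks you to install
$missing = missingBinaries(['pdfinfo', 'pdftoppm', 'pdftotext'], (string) getenv('PATH'));

if ([] === $missing) {
    echo "All Poppler binaries were found.\n";
} else {
    printf("Missing Poppler binaries: %s\n", implode(', ', $missing));
}
```

If any binary is reported missing, revisit the installation step for your platform or adjust $PATH for the user that runs PHP.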

Usage

This library has three main features:

  • Read PDF metadata such as the number of pages
  • Rasterize one or more pages to JPEG or PNG images
  • Extract text from one or more pages

Extracted data is stored in memory and can be written to the filesystem or converted to a data: URI. Because extracted data is held in memory, the extraction methods return a \Generator that yields one result for each page that is extracted or rasterized.
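
The sketch below illustrates the two ideas in this paragraph using plain PHP only: a generator that produces one item per page on demand, and the base64 form of a data: URI. The function `toDataUris()` and the stand-in page contents are assumptions made for illustration; they do not reproduce how this library builds its results.

```php
<?php

declare(strict_types=1);

/**
 * Yields one data: URI per page. Each URI is built only when the
 * caller asks for the next page, so earlier pages can be discarded.
 *
 * @param iterable<int, string> $pages raw bytes keyed by page number
 * @return \Generator<int, string>
 */
function toDataUris(iterable $pages, string $mimeType): \Generator
{
    foreach ($pages as $pageNumber => $bytes) {
        yield $pageNumber => sprintf('data:%s;base64,%s', $mimeType, base64_encode($bytes));
    }
}

// Stand-in text; a rasterized page would hold JPEG or PNG bytes
// and use a MIME type such as image/jpeg or image/png instead.
$pages = [1 => 'first page', 2 => 'second page'];

foreach (toDataUris($pages, 'text/plain') as $pageNumber => $uri) {
    printf("Page %d: %s\n", $pageNumber, $uri);
}
```

Because the generator is consumed inside a foreach loop, each page can be processed and released before the next one is produced.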

Using the library is easy, and you have two ways to interact with it:

  1. Direct: Instantiate the OneToMany\PDFAI\Client\Poppler\PopplerExtractorClient class and call its methods directly. This approach is simpler, but it makes your application less flexible and harder to test.
  2. Actions: Create a container of OneToMany\PDFAI\Contract\Client\ExtractorClientInterface objects, and use the OneToMany\PDFAI\Factory\ExtractorClientFactory class to instantiate them.
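
To show why depending on an interface tends to be more testable than instantiating a concrete client, here is a small self-contained sketch. Every name in it (`PageCounterInterface`, `UploadValidator`, `FixedPageCounter`) is hypothetical and stands in for the library's real contracts, whose method signatures are not shown here.

```php
<?php

declare(strict_types=1);

// Hypothetical interface, standing in for the library's real contract.
interface PageCounterInterface
{
    public function countPages(string $filePath): int;
}

// Application code depends only on the interface...
final class UploadValidator
{
    public function __construct(
        private PageCounterInterface $counter,
        private int $maxPages,
    ) {
    }

    public function isAcceptable(string $filePath): bool
    {
        return $this->counter->countPages($filePath) <= $this->maxPages;
    }
}

// ...so a test can pass a stub instead of shelling out to Poppler.
final class FixedPageCounter implements PageCounterInterface
{
    public function __construct(private int $pages)
    {
    }

    public function countPages(string $filePath): int
    {
        return $this->pages;
    }
}

// A 12-page document exceeds the 10-page limit
$validator = new UploadValidator(new FixedPageCounter(12), 10);

var_dump($validator->isAcceptable('/path/to/file.pdf')); // bool(false)
```

With direct instantiation, the same validator would need the Poppler binaries and a real PDF file to be exercised in a test.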

Note: A Symfony bundle is available if you wish to integrate this library into your Symfony applications with autowiring and configuration support.

Direct usage

<?php

require_once __DIR__ . '/vendor/autoload.php';

use OneToMany\PDFAI\Client\Poppler\PopplerExtractorClient;
use OneToMany\PDFAI\Contract\Enum\OutputType;
use OneToMany\PDFAI\Request\ExtractDataRequest;
use OneToMany\PDFAI\Request\ExtractTextRequest;
use OneToMany\PDFAI\Request\ReadMetadataRequest;

$filePath = '/path/to/file.pdf';

// Construct the Poppler wrapper
$client = new PopplerExtractorClient();

// Construct and execute a request to read the PDF metadata
$metadata = $client->readMetadata(new ReadMetadataRequest($filePath));

vprintf("The PDF '%s' has %d page(s).\n", [
    $filePath, $metadata->getPages(),
]);

// Construct a request to rasterize all pages as 150 DPI JPEGs
$request = new ExtractDataRequest($filePath, 1, null, OutputType::Jpg, 150);

foreach ($client->extractData($request) as $image) {
    // $image->getData() or $image->toDataUri()
    printf("MD5: %s\n", md5($image->getData()));
}

// Extract text from pages 3 and 4
$request = new ExtractTextRequest($filePath, 3, 4);

foreach ($client->extractData($request) as $text) {
    // $text->getData()
    printf("Length: %d\n", strlen($text->getData()));
}

Test suite

Run the test suite with PHPUnit:

./vendor/bin/phpunit

Static analysis

Run static analysis with PHPStan:

./vendor/bin/phpstan

Credits

License

The MIT License