prinsfrank / pdf-samples
A collection of files to use for testing conforming readers
pkg:composer/prinsfrank/pdf-samples
Requires
- php: ^8.1
- symfony/yaml: ^7.3
Requires (Dev)
- prinsfrank/pdfparser: ^2.5
Versions
- dev-main
- v1.0.1
- v1.0.0
- v0.0.9
- v0.0.8
- v0.0.7
- v0.0.6
- v0.0.5
- v0.0.4
- v0.0.3
- v0.0.2
- v0.0.1
- dev-add-sample-for-incomplete-text-extraction
- dev-update-readme
- dev-add-pdf-with-image
- dev-add-command-to-generate-file-content
- dev-add-data-provider-for-files
- dev-set-up-new-directory-structure
- dev-optimize-text-parser
- dev-support-surrogate-pairs-as-destination-character
This package is auto-updated.
Last update: 2025-10-19 14:03:57 UTC
README
⚠️ **This repository is archived. Samples now live in the pdfparser library itself.** ⚠️
This repository is a collection of PDF files together with structured data about their contents in YAML form.
Objective of this repository
The objective of this repository is twofold:
- Provide a variety of PDF files produced by different generators for testing and benchmarking purposes.
- Provide structured data about the contents of those files so the output of conforming parsers can be tested against it.
File organization
The samples are organized in subdirectories by generator as follows:
generator-name
    sample-description
        file.pdf
        contents.yml
        images
            (optional) page_0_image_0.png (page number and index of image on page, file extension based on image type)
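The layout above can be walked mechanically. The following is a minimal Python sketch, not part of this package: the names `Sample`, `parse_image_name` and `discover_samples` are hypothetical, and it assumes that the directory passed in is the one directly containing the generator directories and that image names follow the `page_<page>_image_<index>.<extension>` pattern shown above.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Assumed naming pattern, taken from the example name page_0_image_0.png.
IMAGE_NAME = re.compile(r"^page_(\d+)_image_(\d+)\.(\w+)$")


@dataclass
class Sample:
    """One sample directory (hypothetical helper type)."""
    generator: str
    description: str
    pdf: Path
    contents: Path
    images: dict = field(default_factory=dict)  # (page, index) -> Path


def parse_image_name(name):
    """Return (page, index, extension), or None if the name does not match."""
    match = IMAGE_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def discover_samples(root):
    """Collect every generator-name/sample-description directory under root."""
    samples = []
    for pdf in sorted(Path(root).glob("*/*/file.pdf")):
        sample_dir = pdf.parent
        contents = sample_dir / "contents.yml"
        if not contents.is_file():
            continue  # skip directories without structured data
        images = {}
        images_dir = sample_dir / "images"
        if images_dir.is_dir():  # the images directory is optional
            for image in sorted(images_dir.iterdir()):
                parsed = parse_image_name(image.name)
                if parsed is not None:
                    page, index, _extension = parsed
                    images[(page, index)] = image
        samples.append(
            Sample(sample_dir.parent.name, sample_dir.name, pdf, contents, images)
        )
    return samples
```

A test suite could feed the result of `discover_samples` into a data provider, one case per sample, and compare the parser output for `pdf` against the data in `contents`.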
Contents.yml
The contents.yml file contains structured data about the contents of the PDF file. The schema is defined in schema.json in the root of this repository.
Properties can be added, but not removed or renamed, between minor versions to preserve backward compatibility (BC).
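The compatibility rule above can be checked mechanically between two versions of the schema. The following Python sketch is only an illustration: it assumes that schema.json uses the standard JSON Schema `properties` and `items` keywords, and the helper names and the property names in the usage example (`title`, `pages`, `text`, `name`) are hypothetical, not taken from the actual schema.

```python
def property_paths(schema, prefix=""):
    """Collect dotted paths of all properties declared in a JSON Schema dict."""
    paths = set()
    for name, sub in schema.get("properties", {}).items():
        path = prefix + name
        paths.add(path)
        if isinstance(sub, dict):
            # Nested object properties.
            paths |= property_paths(sub, path + ".")
            # Properties of array items, if the schema describes them.
            items = sub.get("items")
            if isinstance(items, dict):
                paths |= property_paths(items, path + "[].")
    return paths


def removed_properties(old_schema, new_schema):
    """Paths present in the old schema but missing from the new one.

    A renamed property shows up here under its old path, so an empty
    result means nothing was removed or renamed.
    """
    return sorted(property_paths(old_schema) - property_paths(new_schema))
```

For example, with two hypothetical schema versions loaded through `json.load`:

```python
old = {"properties": {"title": {"type": "string"},
                      "pages": {"type": "array",
                                "items": {"properties": {"text": {"type": "string"}}}}}}
new = {"properties": {"name": {"type": "string"},
                      "pages": {"type": "array",
                                "items": {"properties": {"text": {"type": "string"}}}}}}

print(removed_properties(old, new))  # ['title'], because title was renamed to name
```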