mostlyserious/craft-text-extractor

There is no license information available for the latest version (1.0.0) of this package.

A tool to extract text from documents.

1.0.0 2025-05-01 18:54 UTC


README

A tool to extract text from documents and insert it into Craft CMS Asset Elements.

Requirements

This plugin requires Craft CMS 5.0.0 or later, and PHP 8.2 or later.

Features

  • Supports PDF (.pdf) and MS Word (.docx) files
  • Extracts text on Asset creation and when Asset files are replaced
  • Includes an Action to extract text from the Assets index view
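As a rough illustration of the file-type restriction above, the following sketch checks a filename against the two supported extensions. The constant `SUPPORTED_EXTENSIONS` and the function `isSupportedDocument()` are hypothetical names chosen for this example. They are not part of the plugin's documented API, and the plugin may well detect file types differently.

```php
<?php

// Hypothetical list for illustration; mirrors the formats named in Features.
const SUPPORTED_EXTENSIONS = ['pdf', 'docx'];

/**
 * Returns true when the filename ends in a supported extension.
 * The comparison is case-insensitive, so "Report.PDF" is accepted.
 */
function isSupportedDocument(string $filename): bool
{
    // pathinfo() yields an empty string when the name has no extension.
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return in_array($extension, SUPPORTED_EXTENSIONS, true);
}
```

Under these assumptions, `isSupportedDocument('notes.docx')` returns `true`, while a legacy `.doc` file or an image returns `false`.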

Configuration

Extracted document text is inserted into a custom field on the Asset, identified by its field handle. The default field handle is body.

You can customize the handle by adding a plugin config file.

<?php

/* @note config/text-extractor.php */

return [
    'fieldHandle' => 'myCustomHandle'
];

The field with this handle must be a Text field or CKEditor field.
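The fallback to the default handle can be pictured with a small sketch. The function `resolveFieldHandle()` is a hypothetical helper written for this example, assuming the config file returns a plain array as shown above. It is not the plugin's actual settings code.

```php
<?php

/**
 * Hypothetical helper: picks the field handle from a config array such as
 * the one returned by config/text-extractor.php, falling back to "body"
 * when the key is missing, empty, or not a string.
 */
function resolveFieldHandle(array $config, string $default = 'body'): string
{
    $handle = $config['fieldHandle'] ?? $default;

    if (!is_string($handle) || trim($handle) === '') {
        return $default;
    }

    return trim($handle);
}
```

With this sketch, an empty config resolves to `body`, and `['fieldHandle' => 'myCustomHandle']` resolves to `myCustomHandle`.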

Usage

  • Upload a file with a supported extension and enjoy!

Thank you to the following packages:

Future Plans and Other Document Parsers

The PHPWord library and other PHPOffice tools look promising, but were more complex than needed for this project at this time.