mostlyserious / craft-text-extractor
A tool to extract text from documents.
Type: craft-plugin
Requires
- php: >=8.2
- craftcms/cms: ^5.0.0
- label305/docx-extractor: ^0.2.3
- smalot/pdfparser: ^2.12.0
Requires (Dev)
- craftcms/ecs: dev-main
- craftcms/phpstan: dev-main
README
A tool to extract text from documents and insert it into Craft CMS Asset Elements.
Requirements
This plugin requires Craft CMS 5.0.0 or later, and PHP 8.2 or later.
Features
- Supports PDF (.pdf) and MS Word (.docx) files
  - Password-protected PDF files are not supported
- Extracts text on Asset creation and when Asset files are replaced
- Includes an Action to extract text from the Assets index view
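To illustrate the format support described above, here is a minimal sketch of an extension check and a PDF extraction step built on smalot/pdfparser, one of the plugin's listed dependencies. The function names supportedExtension and extractPdfText are hypothetical and are not taken from the plugin's source. The sketch assumes Composer autoloading is already set up, and that the parser raises an exception for files it cannot read, such as password-protected PDFs.

```php
<?php

declare(strict_types=1);

use Smalot\PdfParser\Parser;

/**
 * Hypothetical helper: returns the lowercased extension when it is one of
 * the documented formats (.pdf, .docx), otherwise null.
 */
function supportedExtension(string $filename): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return in_array($ext, ['pdf', 'docx'], true) ? $ext : null;
}

/**
 * Hypothetical helper: extracts plain text from a PDF using smalot/pdfparser.
 * Returns null when the file cannot be parsed (assumed to include
 * password-protected PDFs, which the plugin does not support).
 */
function extractPdfText(string $path): ?string
{
    try {
        $parser = new Parser();

        // parseFile() returns a Document; getText() concatenates page text.
        return $parser->parseFile($path)->getText();
    } catch (\Exception $e) {
        return null;
    }
}
```

Returning null instead of rethrowing keeps an upload from failing when a document cannot be read. Whether the plugin itself behaves this way is not stated in this README.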
Configuration
Extracted document text is inserted into a custom field, identified by the field handle set in the plugin config. The default field handle is body.
You can customize the handle by adding a plugin config file.
<?php
/* @note config/text-extractor.php */

return [
    'fieldHandle' => 'myCustomHandle',
];

The field with this handle must be a Text field or a CKEditor field.
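If the handle needs to differ between environments, the config file could read it from an environment variable. The following is a sketch only: the variable name TEXT_EXTRACTOR_FIELD_HANDLE is an assumption made for illustration, not something the plugin defines. It relies on Craft's craft\helpers\App::env() helper and falls back to the default handle, body, when the variable is unset.

```php
<?php
/* @note config/text-extractor.php */

use craft\helpers\App;

return [
    // TEXT_EXTRACTOR_FIELD_HANDLE is a hypothetical variable name;
    // fall back to the plugin's default handle when it is not defined.
    'fieldHandle' => App::env('TEXT_EXTRACTOR_FIELD_HANDLE') ?: 'body',
];
```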
Usage
- Upload files with a supported extension and enjoy!
Thank you to the following packages:
- label305/docx-extractor
- smalot/pdfparser
Future Plans and Other Document Parsers
The PHPWord library and other PHPOffice tools look promising, but they were more complex than this project needs at this time.