dlindberg / blob-chunk
Utility for breaking up a content fragment of HTML for search indexing
Requires
- php: ^7.2
- ext-dom: *
- ext-libxml: *
- ext-mbstring: *
- dlindberg/dom-document-factory: ^1.0
Requires (Dev)
- phpunit/phpunit: ^8.0
- squizlabs/php_codesniffer: ^3.0
Last update: 2024-10-28 08:28:36 UTC
README
This is currently an early work in progress. The purpose of this project is to take a content block of HTML and break it apart into smaller chunks to improve indexing with search appliances such as Algolia, where the raw HTML content is frequently too large to fit within the index limits.
Install
Via Composer
$ composer require dlindberg/blob-chunk
Basic Usage
$blobChunk = new dlindberg\BlobChunk();
$result = $blobChunk->parse($html);
Returns an array of content chunks. By default, it attempts to break out lists, tables, heading tags, and paragraphs as separate elements, and it splits paragraphs into sentences. There is a reasonable amount of surface area for extension and configuration; however, that part of the project is still a work in progress.
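To make the chunking idea concrete, here is a minimal sketch of the general technique using only ext-dom, ext-libxml, and ext-mbstring. It is not the package's implementation and does not use its API: the function name sketchChunkHtml, the wrapper id sketch-root, and the naive sentence-splitting rule are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, NOT part of blob-chunk: illustrates splitting an HTML
// fragment into text chunks, one per top-level block, with paragraphs further
// split into sentences.
function sketchChunkHtml(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8 to libxml; the wrapper div (an
    // assumption of this sketch) gives the fragment a single known root.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="sketch-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="sketch-root"]')->item(0);
    if ($root === null) {
        return [];
    }

    $chunks = [];
    foreach ($root->childNodes as $node) {
        // Only top-level elements are considered; stray text nodes are skipped.
        if (!$node instanceof DOMElement) {
            continue;
        }

        // Collapse runs of whitespace so each chunk is a single clean line.
        $text = trim((string) preg_replace('/\s+/u', ' ', $node->textContent));
        if ($text === '') {
            continue;
        }

        if ($node->nodeName === 'p') {
            // Naive rule: a sentence ends at ., ! or ? followed by whitespace.
            $sentences = preg_split('/(?<=[.!?])\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($sentences as $sentence) {
                $chunks[] = $sentence;
            }
        } else {
            // Headings, lists, tables, and other blocks stay as one chunk each.
            $chunks[] = $text;
        }
    }

    return $chunks;
}

$chunks = sketchChunkHtml('<h2>Title</h2><p>First sentence. Second one!</p>');
print_r($chunks);
```

For the input above, the heading becomes one chunk and the paragraph becomes two. A real chunker would also need to handle abbreviations such as "e.g.", nested block elements, and per-record size limits, none of which this sketch attempts.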
Change log
Please see CHANGELOG for more information on what has changed recently.
Testing
$ composer test
The current tests for the manager are reasonably thorough. Tests on the parser and parent class need to be improved.
Contributing
Please see CONTRIBUTING and CODE_OF_CONDUCT for details.
Security
If you discover any security-related issues, please email dane@lindberg.xyz instead of using the issue tracker.
Credits
The boilerplate for this project is based on The League of Extraordinary Packages' Skeleton package repository.
License
The MIT License (MIT). Please see License File for more information.