dlindberg/blob-chunk

Utility for breaking up a content fragment of HTML for search indexing

0.1.0 2019-03-27 19:48 UTC

This package is auto-updated.

Last update: 2024-04-28 07:33:45 UTC


README


This project is currently an early work in progress. Its purpose is to take a content block of HTML and break it apart into smaller chunks to improve indexing with search appliances such as Algolia, where the raw HTML content is frequently too large to fit within the index limits.

Install

Via Composer

$ composer require dlindberg/blob-chunk

Basic Usage

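// Assumes Composer's autoloader (vendor/autoload.php) is already loaded
// and that $html holds the HTML content fragment as a string.
$blobChunk = new dlindberg\BlobChunk();
$result = $blobChunk->parse($html);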

parse() returns an array of content chunks. By default it attempts to break out lists, tables, heading tags, and paragraphs as separate elements, and it further splits paragraphs into sentences. There is a reasonable amount of surface area for extension and configuration; however, that part of the project is still a work in progress.
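To make the chunking idea concrete, here is a minimal, standalone sketch built on PHP's bundled DOM extension. It is not this package's implementation and does not use its API: the function names chunkHtml() and splitSentences() are hypothetical, and the sentence splitter is deliberately naive (it will split on abbreviations such as "e.g."). It only illustrates the general approach of emitting one chunk per top-level block element and splitting paragraphs into sentences.

```php
<?php
// Illustrative sketch only: NOT the blob-chunk implementation.
// chunkHtml() and splitSentences() are hypothetical names used for this example.

/**
 * Naively split text into sentences: break on whitespace that
 * follows ".", "!" or "?". Abbreviations are not handled.
 */
function splitSentences(string $text): array
{
    $parts = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

/**
 * Break an HTML fragment into plain-text chunks: one chunk per
 * top-level block (heading, list, table, ...), with paragraphs
 * further split into sentences.
 */
function chunkHtml(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8; the wrapper <div> gives a known root.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return [];
    }

    $chunks = [];
    foreach ($root->childNodes as $node) {
        // Collapse runs of whitespace and skip empty nodes.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent) ?? '');
        if ($text === '') {
            continue;
        }

        if (strtolower($node->nodeName) === 'p') {
            // Paragraphs are split into one chunk per sentence.
            foreach (splitSentences($text) as $sentence) {
                $chunks[] = $sentence;
            }
        } else {
            // Headings, lists, tables, etc. are kept as a single chunk each.
            $chunks[] = $text;
        }
    }

    return $chunks;
}
```

For example, chunkHtml('<h2>Title</h2><p>First sentence. Second one!</p><ul><li>Item</li></ul>') yields four chunks: "Title", "First sentence.", "Second one!", and "Item". Keeping each chunk small is what lets the pieces fit within a search index's size limits.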

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

The current tests for the manager are reasonably thorough. Tests on the parser and parent class need to be improved.

Contributing

Please see CONTRIBUTING and CODE_OF_CONDUCT for details.

Security

If you discover any security-related issues, please email dane@lindberg.xyz instead of using the issue tracker.

Credits

The boilerplate for this project is based on The League of Extraordinary Packages' Skeleton package repository.

License

The MIT License (MIT). Please see License File for more information.