dotpack/php-boiler-pipe

This package is abandoned and no longer maintained. No replacement package was suggested.

PhpBoilerPipe. Boilerplate Removal and Fulltext Extraction from HTML pages

dev-master 2024-06-02 09:19 UTC

This package is not auto-updated.

Last update: 2024-06-12 17:59:32 UTC


README

Project Archived

This project is no longer maintained. Please refer to pforret/pf-article-extractor for further updates and continued development.

Thank you for your support!

Boilerplate Removal and Fulltext Extraction from HTML pages.

A partial PHP port of boilerpipe (https://github.com/kohlschutter/boilerpipe). Requires PHP >= 5.4.

Example

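# Assumes the package was installed with Composer
require __DIR__ . '/vendor/autoload.php';

# Fetch the HTML page
$path = "http://example.com/some-article.html";
$data = file_get_contents($path);
if ($data === false) {
    exit("Could not fetch $path\n");
}

# Extract the article text
$ae = new DotPack\PhpBoilerPipe\ArticleExtractor();
echo $ae->getContent($data) . "\n";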