pforret / pf_pageparser
Simple Regex Page Parser in PHP
Requires
- php: ^8.0
- ext-curl: *
- ext-json: *
- guzzlehttp/guzzle: ^6.5|^7.0
- psr/log: ^1.1|^2.0|^3.0
Requires (Dev)
- laravel/pint: ^1.6
- phpunit/phpunit: ^9.5
README
This is an HTML parser I wrote because I scrape a lot of websites looking for structured, repetitive data. It lets me easily clean up HTML, split it into chunks, and find the right data in each chunk. It does not use a DOM parser, so it also works on partial or invalid HTML.
Installation
You can install the package via composer:
composer require pforret/pf_pageparser
Usage
$pp = new PfPageparser(["cacheTtl" => 300]);
$pp->load_from_url("http://www.example.com/products")
    ->trim("<table", "</table>")
    ->split_chunks('</tr>')
    ->filter_chunks('product_id')
    ->parse_from_chunks('|Price: [\d\.]*|', true);
$prices = $pp->results();
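To make the chain above more concrete, here is a rough sketch of what each step conceptually does, written with plain PHP string functions only. It is not the library's implementation. The helper names `trim_between` and `extract_from_chunks`, the sample HTML, and the assumption that trimming keeps the start and end markers are all hypothetical and used purely for illustration.

```php
<?php
// Illustrative only: plain-PHP equivalents of the pipeline steps.
// These helpers are hypothetical and are NOT part of pf_pageparser.

/** Keep the text from the first $start marker up to and including the next $end marker. */
function trim_between(string $html, string $start, string $end): string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return ''; // start marker not found
    }
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return substr($html, $from); // no end marker: keep the rest
    }
    return substr($html, $from, $to - $from + strlen($end));
}

/** Split on $delimiter, keep chunks containing $needle, return the first regex match per chunk. */
function extract_from_chunks(string $html, string $delimiter, string $needle, string $pattern): array
{
    $results = [];
    foreach (explode($delimiter, $html) as $chunk) {
        if (strpos($chunk, $needle) === false) {
            continue; // chunk is not a product row
        }
        if (preg_match($pattern, $chunk, $m)) {
            $results[] = $m[0];
        }
    }
    return $results;
}

// Made-up sample page standing in for a fetched URL.
$html = <<<HTML
<html><body><h1>Products</h1>
<table>
<tr><th>Name</th><th>Price</th></tr>
<tr><td class="product_id">A1</td><td>Price: 9.99</td></tr>
<tr><td class="product_id">B2</td><td>Price: 24.50</td></tr>
</table>
<p>Price: 0.00 (footer, outside the table)</p>
</body></html>
HTML;

$table  = trim_between($html, '<table', '</table>');
$prices = extract_from_chunks($table, '</tr>', 'product_id', '|Price: [\d\.]*|');
print_r($prices);
```

On this sample the result holds the two matched strings, `Price: 9.99` and `Price: 24.50`. The header row is skipped because it lacks `product_id`, and the footer is dropped by the trim step. Because every step is a substring or regex operation, the same approach works on fragments that a DOM parser would reject.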
Testing
composer test
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security
If you discover any security-related issues, please email peter@forret.com instead of using the issue tracker.
Credits
License
The MIT License (MIT). Please see License File for more information.
PHP Package Boilerplate
This package was generated using the PHP Package Boilerplate.