pforret/pf_pageparser

Simple Regex Page Parser in PHP

2.0.3 2023-04-17 11:58 UTC

Last update: 2024-11-18 15:24:04 UTC


README

This is an HTML parser I wrote because I scrape a lot of websites for structured, repetitive data. It lets me easily clean up HTML, split it into chunks, and find the right data in each chunk. It does not use a DOM parser, so it also works on partial or invalid HTML.
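The chunk-based approach described above can be sketched in plain PHP without the library. This is only an illustration of the general idea, not the package's implementation: the helper names `trim_between` and `chunks_containing` and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helpers illustrating the chunking idea; not part of pf_pageparser.

// Keep the text from the first $start marker up to and including
// the first $end marker that follows it.
function trim_between(string $html, string $start, string $end): string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return '';
    }
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return substr($html, $from);
    }
    return substr($html, $from, $to + strlen($end) - $from);
}

// Split on a delimiter and keep only the chunks that contain $needle.
function chunks_containing(string $html, string $delimiter, string $needle): array
{
    $chunks = explode($delimiter, $html);
    return array_values(array_filter(
        $chunks,
        fn (string $chunk): bool => strpos($chunk, $needle) !== false
    ));
}

// Deliberately sloppy HTML: unclosed <td> cells and no closing </html>.
$html = '<html><p>intro</p><table>'
    . '<tr><td>header</tr>'
    . '<tr><td>product_id 1<td>Price: 9.99</tr>'
    . '<tr><td>product_id 2<td>Price: 12.50</tr>'
    . '</table><p>footer';

$table  = trim_between($html, '<table', '</table>');
$chunks = chunks_containing($table, '</tr>', 'product_id');

print_r($chunks);
```

Because everything is done with string functions, the unclosed tags in the sample do not matter, which is the advantage over a DOM parser that the description points to.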

Installation

You can install the package via composer:

composer require pforret/pf_pageparser

Usage

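// Create a parser; cacheTtl sets the cache lifetime (300 here).
$pp = new PfPageparser(["cacheTtl" => 300]);

// Download the page, keep the <table>...</table> part, split it into rows,
// keep only the rows that mention product_id, then extract the prices.
$pp->load_from_url("http://www.example.com/products")
    ->trim("<table", "</table>")
    ->split_chunks('</tr>')
    ->filter_chunks('product_id')
    ->parse_from_chunks('|Price: [\d\.]*|', true);

// Collect the parsed results.
$prices = $pp->results();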
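
To see what the pattern in the last step matches, it can be tried against a single chunk with PHP's own `preg_match`. This sketch only exercises the regular expression. The sample chunk is made up, and how `parse_from_chunks` applies the pattern or interprets its second argument is not shown here.

```php
<?php
// Sample chunk, made up for illustration.
$chunk = '<tr><td>product_id 42</td><td>Price: 19.95 EUR</td>';

// Same pattern as in the usage example: "|" is the regex delimiter,
// and [\d\.]* matches a run of digits and dots after "Price: ".
$pattern = '|Price: [\d\.]*|';

$price = null;
if (preg_match($pattern, $chunk, $matches) === 1) {
    // $matches[0] holds the full text matched by the pattern.
    $price = $matches[0];
}

var_dump($price);
```

Here `$price` ends up as `Price: 19.95`, label included. Adding a capture group, as in `|Price: ([\d\.]*)|`, would leave just the number in `$matches[1]`.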

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security-related issues, please email peter@forret.com instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.

PHP Package Boilerplate

This package was generated using the PHP Package Boilerplate.