kosuha606/html-uni-parser

There is no license information available for the latest version (1.0.16) of this package.

Uni parser for sites

Installs: 49

Dependents: 0

Suggesters: 0

Security: 0

Stars: 0

Watchers: 0

Forks: 0

Open Issues: 2

Type:composer-pugin

1.0.16 2020-03-07 19:32 UTC

This package is auto-updated.

Last update: 2024-04-08 03:50:28 UTC


README


A universal HTML parser, configured with XPath queries, intended to parse almost any kind of HTML page.

Installation

To install this plugin, use Composer:

$ composer require kosuha606/html-uni-parser

Usage

There are four ways to parse HTML, one per parsing method: parseCatalog(), parseSearch(), parseCard() and parseGenerator().

Example:

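$results = HtmlUniParser::create([
    // The single page to parse with parseCard()
    'pageUrl' => 'http://example.com',
    // Every key of xpathOnCard becomes a key in the result array
    'xpathOnCard' => [
        'h1' => '//h1',
        'description' => 'HTML//p'
    ]
])->parseCard();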
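The README describes xpathOnCard as an array of XPath queries in which each key becomes a key in the result array. The standalone sketch below illustrates that idea with PHP's built-in DOMDocument and DOMXPath. It is not the package's implementation:

- The function name parseCardSketch is hypothetical.
- Returning the trimmed text of the first match is an assumption.
- It uses a plain '//p' query because the meaning of the 'HTML' prefix in the example above is not documented here.

```php
<?php
// Standalone sketch, NOT the package's implementation: evaluates a
// key => XPath map (shaped like the xpathOnCard option) against an HTML string.
// Requires PHP's bundled DOM extension (ext-dom).

/**
 * Hypothetical helper: returns the trimmed text of the first node matched
 * by each query, keyed like $xpathOnCard; null when a query matches nothing.
 */
function parseCardSketch(string $html, array $xpathOnCard): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpathOnCard as $key => $query) {
        $nodes = $xpath->query($query);
        // Assumption: keep only the text of the first match.
        $result[$key] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

$html = '<html><body><h1>Title</h1><p>First paragraph</p></body></html>';
print_r(parseCardSketch($html, [
    'h1' => '//h1',
    'description' => '//p',
]));
```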

Examples

For more examples, see the examples/ directory.

Description of configurable properties

| Property | Description |
|---|---|
| catalogUrl | URL to parse with the catalog strategy, parseCatalog() |
| searchUrl | URL used to search the target site, parseSearch() |
| pageUrl | URL of the single page to parse, parseCard() |
| urlGenerator | Callback function that generates the links to parse, parseGenerator() |
| encoding | Encoding of the target site |
| siteBaseUrl | Base URL used to process links after parsing |
| resultLimit | Maximum number of results |
| sleepAfterRequest | Number of seconds to sleep after each request |
| goIntoCard | Whether to go into each card (item page) when parsing catalog links |
| xpathItem | XPath query used to select the items in a list |
| xpathLink | XPath query used to select the link inside each parsed item |
| xpathOnCard | Array of XPath queries; every key becomes a key in the result array |
| typeMech | Type of parsing mechanism (how pages are fetched), for example: wget, curl, phantomjs, filegetcontents |
| forceOuterHtml | Force the parser to use outer HTML for XPath queries |
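The catalog-related properties (xpathItem, xpathLink, siteBaseUrl, resultLimit) work together. The sketch below shows one plausible way they combine; it is not the package's code and performs no network access. It rests on these assumptions:

- The function name parseCatalogLinksSketch is hypothetical.
- xpathLink is evaluated relative to each item node.
- Links without an http(s) scheme are completed with siteBaseUrl.

```php
<?php
// Standalone sketch, NOT the package's code: collects catalog links from an
// HTML string using options shaped like xpathItem, xpathLink, siteBaseUrl
// and resultLimit. No pages are fetched.

function parseCatalogLinksSketch(
    string $html,
    string $xpathItem,
    string $xpathLink,
    string $siteBaseUrl,
    int $resultLimit
): array {
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = $xpath->query($xpathItem);
    if ($items === false) {
        return []; // invalid item query
    }

    $links = [];
    foreach ($items as $item) {
        if (count($links) >= $resultLimit) {
            break; // stop once resultLimit links have been collected
        }
        // Assumption: the link query is relative to the current item node.
        $linkNodes = $xpath->query($xpathLink, $item);
        if ($linkNodes === false || $linkNodes->length === 0) {
            continue; // item without a link
        }
        $href = trim($linkNodes->item(0)->nodeValue);
        // Assumption: links without an http(s) scheme are relative to siteBaseUrl.
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($siteBaseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    return $links;
}

$html = <<<HTML
<ul>
  <li class="item"><a href="/p/1">One</a></li>
  <li class="item"><a href="https://other.example/p/2">Two</a></li>
  <li class="item"><a href="p/3">Three</a></li>
</ul>
HTML;

print_r(parseCatalogLinksSketch($html, '//li[@class="item"]', './/a/@href', 'http://example.com', 2));
```

The sketch stops after collecting links. With goIntoCard enabled, the package presumably goes on to parse each collected link with the xpathOnCard queries.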

Available methods

| Method | Description |
|---|---|
| parseCatalog | Parses the links on a catalog page, then parses every link; returns the results as an array of parsed links |
| parseSearch | Takes a search query string as its argument, builds the search link from it, and then behaves like parseCatalog |
| parseCard | Parses a single page of the site |
| parseGenerator | Parses the links generated by the urlGenerator callback |
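parseGenerator() parses whatever links the urlGenerator callback produces. The exact callback signature the package expects is not documented here. The sketch below therefore assumes a callable that takes no arguments and returns an array of URLs. The paginated URL pattern is hypothetical.

```php
<?php
// Hypothetical urlGenerator callback. Assumption: the package accepts a
// callable with no arguments that returns the list of URLs to parse.

$urlGenerator = function (): array {
    $urls = [];
    for ($page = 1; $page <= 3; $page++) {
        // Hypothetical paginated URL pattern on the target site.
        $urls[] = 'http://example.com/catalog?page=' . $page;
    }
    return $urls;
};

print_r($urlGenerator());
```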

Run tests

To run tests you can use this command:

./vendor/bin/phpunit