shamanhead/phpporser

Advanced parser for advanced tasks

v0.3 2021-06-09 17:40 UTC

This package is auto-updated.

Last update: 2024-11-10 01:25:58 UTC


README

  • Getting Started
    • Requirements
    • Installing via composer
    • Installing via archive
    • Installing chromium executable
  • Parsing your first page
  • Search methods
  • Working with text
  • Contribute
  • License

Getting Started

Requirements

Requires PHP 7.1+.

It also needs Headless Chromium PHP and a Chromium executable if you want to use this library with headless browsing support (included by default in the Packagist version).

Installing via composer

$ composer require shamanhead/phpporser

Installing via archive

You can also install this library from an archive by downloading it from GitHub. No dependencies are needed besides Headless Chromium PHP, which is required only if you want to use the headless browsing feature.

Installing chromium executable

If you want to use this library with a headless browser, you first need to download the browser executable.

This should work on Windows, macOS and Linux.

Choose the browser you want to use

Headless Chromium supports all Chromium-based browsers, such as Chrome, Opera and Chromium.

Installing chromium executable

I recommend using Chromium rather than Chrome because, in my experience, it works better.

So, go to the official Chromium download page and download it.

After this step, unpack the archive and move it to the desired location.

Then specify the path in your script:

<?php

require_once "vendor/autoload.php";

use HeadlessChromium\Page;

use ShamanHead\PhpPorser\App\Dom as Dom;

$dom = new Dom();
$dom->setHref('file:///home/shamanhead/dev/porser/phpporser-master/test.html');
$dom->setBrowserPath('PATH_TO_CHROME'); //path to the browser executable you unpacked

?>

If you did everything correctly, the parser will work. If any errors occur during this step, check here to see whether there is already a solution to your problem. Otherwise, please open a new issue here or on the Headless Chromium PHP page.
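
A wrong or non-executable browser path is a likely cause of errors at this step, so it can help to check the path before handing it to the parser. The following is a minimal sketch using only standard PHP functions; the helper name `firstExecutable` and the candidate paths are hypothetical and not part of this library.

```php
<?php

/**
 * Returns the first path in $candidates that points to an executable file,
 * or null if none does. Hypothetical helper, not part of phpporser.
 */
function firstExecutable(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Example candidate locations (assumptions); adjust them to your own system.
$browserPath = firstExecutable([
    '/usr/bin/chromium',
    '/usr/bin/chromium-browser',
    '/opt/chromium/chrome',
]);

if ($browserPath === null) {
    echo "No browser executable found, check your installation path.\n";
} else {
    echo "Using browser at: $browserPath\n";
    // Pass the verified path to the parser as in the script above:
    // $dom->setBrowserPath($browserPath);
}
```

Checking the path up front turns a vague browser start-up failure into a clear message about the installation path.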

Parsing your first page

Half of the work is done. Now let's try to parse a simple page, such as the Computer science article on Wikipedia. I will use it to show all the capabilities of the parser.

First, let's get the 'Computer science' string at the top of the page:

<?php

require_once "vendor/autoload.php";

use ShamanHead\PhpPorser\App\Dom as Dom;

$dom = new Dom();
$dom->setHref('https://en.wikipedia.org/wiki/Computer_science');

print_r($dom->tag('h1')->class('firstHeading')->text()->merge());

?>

It works! But how? Let me explain:

  1. The parser gets all tags named 'h1'.
  2. Then the parser gets all tags with class 'firstHeading' within the range of the h1 tags (including their descendants).
  3. It gets the text from them.
  4. It converts the resulting array to a string.
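
To make the four steps concrete, here is a rough equivalent written with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) on an inline HTML string. It is only an illustration of the idea, not how this library is implemented; the sample markup and variable names are made up for the example.

```php
<?php

// Sample markup standing in for the downloaded page (made up for this sketch).
$html = '<html><body>'
      . '<h1 class="firstHeading">Computer science</h1>'
      . '<h1 class="other">Not this one</h1>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Step 1: collect all tags named 'h1'.
$h1Tags = $xpath->query('//h1');

// Step 2: among the h1 tags and their descendants, keep the elements
// whose class list contains 'firstHeading'.
$matches = [];
foreach ($h1Tags as $h1) {
    $found = $xpath->query(
        "descendant-or-self::*[contains(concat(' ', normalize-space(@class), ' '), ' firstHeading ')]",
        $h1
    );
    foreach ($found as $node) {
        $matches[] = $node;
    }
}

// Step 3: get the text of every match as an array.
$texts = [];
foreach ($matches as $node) {
    $texts[] = trim($node->textContent);
}

// Step 4: convert the array to a single string.
$result = implode("\n", $texts);

print_r($result);
```

Each chained call in the parser narrows or transforms the result of the previous one, which is what the intermediate variables make visible here.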

Search methods

To find elements in the HTML DOM, this library provides four methods:

<?php

require_once "vendor/autoload.php";

use ShamanHead\PhpPorser\App\Dom as Dom;

$dom = new Dom();
$dom->setHref('href to file');

print_r($dom->tag('h1')->array()); //finds by tag name 'h1'
print_r($dom->id('firstHeading')->array()); //finds by id 'firstHeading'
print_r($dom->class('wrapper__main')->array()); //finds by class name 'wrapper__main'
print_r($dom->custom(['name', 'button'])->array()); //finds by 'name' attribute with value 'button'

?>

You can combine search methods with each other to find elements in a more specific way:

<?php

require_once "vendor/autoload.php";

use ShamanHead\PhpPorser\App\Dom as Dom;

$dom = new Dom();
$dom->setHref('href to file');

print_r($dom->class('main')->id('firstHeading')->tag('h1')->array());

?>

Working with text

<?php

require_once "vendor/autoload.php";

use ShamanHead\PhpPorser\App\Dom as Dom;

$dom = new Dom();
$dom->setHref('href to file');

$divText = $dom->tag('div')->id('someDiv')->text();

$divText->contents(); //Returns all text as an array.

$divText->merge('symbol'); //Returns all text as a string joined with the 'symbol' separator
                          //('\n' by default).

$divText->first(); //Returns the first text found.

$divText->last(); //Returns the last text found.

?>
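
The comments above describe what each text method returns. Assuming the collected text behaves like an ordinary list of strings, the same operations on a plain PHP array look roughly like this; the sample data is invented, and the library's real return values may differ in detail (for example in whitespace handling).

```php
<?php

// Invented sample: text fragments as they might be collected from a div.
$contents = ['Hello', 'world', 'again'];

// Like merge('symbol'): join all fragments with a chosen separator.
$mergedWithComma = implode(', ', $contents);

// Like merge() with the default separator, assuming it is a newline.
// Note the double quotes: in PHP, "\n" is a newline character,
// while '\n' in single quotes is a backslash followed by the letter n.
$mergedDefault = implode("\n", $contents);

// Like first() and last(): the first and the last fragment.
$first = $contents[0];
$last  = $contents[count($contents) - 1];

echo $mergedWithComma, PHP_EOL;
echo $first, PHP_EOL;
echo $last, PHP_EOL;
```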

Contribute

Hey, want to contribute? Just email me ( arsenii.romanovskii85@gmail.com ) and describe what you would like to help with.

License

See the LICENSE file.