jerodev / diggy
A fluent PHP web scraper
Requires
- php: ^8.1
- ext-dom: *
- guzzlehttp/guzzle: ^7.0
- symfony/css-selector: ^6.1
Requires (Dev)
- jerodev/code-styles: dev-master
- phpstan/phpstan: ^1.0
- phpunit/phpunit: ^9.3
README
Diggy is a simple wrapper around the PHP DOM extension that allows finding elements using simple query selectors and fail-proof chaining.
Requirements
- PHP 8.1 or later, with the DOM extension (ext-dom)
Getting started
Diggy includes a simple web client that uses Guzzle under the hood to download a page and return a NodeCollection object. However, you can use any web client you prefer and pass a DOMNode or DOMNodeList object to the NodeCollection constructor.
$client = new \Jerodev\Diggy\WebClient();
$page = $client->get('https://www.deviaene.eu/');
$socials = $page->first('#social')->querySelector('a span')->texts();
var_dump($socials);
// [
//     'GitHub',
//     'Twitter',
//     'Email',
//     'LinkedIn',
// ]
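If you download pages with your own client, the HTML string first has to be turned into a DOMNode. The following is a minimal sketch using only the native DOM extension. The helper name htmlToDomNode and the sample markup are made up for illustration and are not part of Diggy.

```php
<?php

// Hypothetical helper (not part of Diggy): parse an HTML string into a DOMNode.
function htmlToDomNode(string $html): DOMNode
{
    $document = new DOMDocument();

    // Collect parser warnings about imperfect real-world markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($document->documentElement === null) {
        throw new RuntimeException('No root element found in the given HTML.');
    }

    // The root <html> element is a DOMElement, which extends DOMNode.
    return $document->documentElement;
}

// Made-up sample markup; in practice this would be the body returned by your client.
$node = htmlToDomNode(
    '<html><body><div id="social"><a href="#"><span>GitHub</span></a></div></body></html>'
);

// $node can now be passed to the NodeCollection constructor described above.
echo $node->nodeName, PHP_EOL;
```

The sketch stops at the native DOMNode on purpose, because this README does not show the fully qualified class name of NodeCollection.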
Available functions
These are the functions available on a NodeCollection object. All functions that do not return a native value can be chained without having to worry about whether the collection contains any nodes.
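To illustrate why this kind of chaining is safe, here is a small sketch of the general idea. SafeNodes is a hypothetical stand-in written for this example and is not Diggy's NodeCollection. Returning null from text() on an empty collection is also an assumption of the sketch, not documented behavior.

```php
<?php

// Hypothetical stand-in, NOT Diggy's NodeCollection.
final class SafeNodes
{
    /** @param DOMNode[] $nodes */
    public function __construct(private array $nodes = [])
    {
    }

    // Always returns a SafeNodes object, so the next call in a chain never hits null.
    public function first(): self
    {
        return new self(array_slice($this->nodes, 0, 1));
    }

    // Native return value: null signals that nothing matched (assumption of this sketch).
    public function text(): ?string
    {
        return isset($this->nodes[0]) ? $this->nodes[0]->textContent : null;
    }
}

$document = new DOMDocument();
$document->loadHTML('<p>Hello</p>');

$found = new SafeNodes(iterator_to_array($document->getElementsByTagName('p')));
$missing = new SafeNodes(iterator_to_array($document->getElementsByTagName('table')));

var_dump($found->first()->text());
var_dump($missing->first()->first()->text());
```

With the native DOM API, by contrast, DOMNodeList::item() returns null when nothing matches, so every step of a chain needs its own null check.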
attribute(string $name)
Returns the value of the given attribute on the first element in the collection, if available.
$nodes->attribute('href');
count()
Returns the number of elements in the current node collection.
$nodes->count();
each(string $selector, closure $closure, ?int $max = null)
Loops over all DOM elements in the current collection that match the selector and executes a closure for each element. Returns an array of the values returned by the closure.
$nodes->each('a', static function (NodeFilter $node) { return $node->attribute('href'); });
exists(?string $selector = null)
Indicates if an element exists in the collection. If a selector is given, the current nodes will first be filtered.
$nodes->exists('a.active');
filter(closure $closure)
Filters the current node collection based on a given closure.
$nodes->filter(static function (NodeFilter $node) { return $node->text() === 'foo'; });
first(?string $selector = null)
Returns the first element of the node collection. If a selector is given, the current nodes will first be filtered.
$nodes->first('a.active');
is(string $nodeName)
Indicates if the first element in the current collection has a specified tag name.
$nodes->is('div');
last(?string $selector = null)
Returns the last element of the node collection. If a selector is given, the current nodes will first be filtered.
$nodes->last('a.active');
nodeName()
Returns the tag name of the first element in the current node collection.
$nodes->nodeName();
nth(int $index, ?string $selector = null)
Returns the nth element of the node collection, starting at 0. If a selector is given, the current nodes will first be filtered.
$nodes->nth(1, 'a.active');
querySelector(string $selector)
Finds all elements in the current node collection matching this CSS query selector.
$nodes->querySelector('a.active');
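For readers more familiar with the native DOM API, the same match can be written by hand as XPath. The package lists symfony/css-selector as a dependency, which presumably handles this translation, but that is an assumption. The expression below is hand-written for illustration, not the library's output, and the sample markup is made up.

```php
<?php

// Keep parser warnings out of the output.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML('<div><a class="active" href="/">Home</a><a href="/blog">Blog</a></div>');
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Hand-written XPath equivalent of the CSS selector "a.active":
// the space-padded class attribute must contain " active " as a whole word.
$expression = "//a[contains(concat(' ', normalize-space(@class), ' '), ' active ')]";

$texts = [];
foreach ($xpath->query($expression) as $element) {
    $texts[] = $element->textContent;
}

var_dump($texts);
```

Padding the class attribute with spaces keeps the expression from also matching class names such as "inactive".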
text(?string $selector = null)
Returns the inner text of the first element in the node collection. If a selector is given, the current nodes will first be filtered.
$nodes->text('p.description');
texts()
Returns an array containing the inner text of every root element in the collection.
$nodes->querySelector('nav > a')->texts();
whereHas(closure $closure)
Keeps only the nodes that contain child nodes satisfying the filter described by the closure.
$nodes->whereHas(static function (NodeFilter $node) { return $node->first('a[href]'); });
whereHasAttribute(string $key, ?string $value = null)
Filters the current node collection by the existence of a specific attribute. If a value is given the collection is also filtered by the value of this attribute.
$nodes->whereHasAttribute('href');
whereHasText(?string $value = null, bool $trim = true, bool $exact = false)
Filters the current node collection by the existence of inner text. Setting a value will also filter the nodes by the actual inner text based on $trim and $exact.
$nodes->whereHasText('foo');
xPath(string $selector)
Finds all elements in the current node collection matching this XPath query selector.
$nodes->xPath('//nav/a[@href]');
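One detail worth knowing when writing XPath expressions: in standard XPath, a leading // always starts at the document root, even when the query is evaluated relative to a context node. Whether Diggy adjusts such expressions is not stated here, so the sketch below shows the behavior with the native DOMXPath class only. The sample markup is made up.

```php
<?php

// HTML5 tags such as <nav> can trigger parser warnings; collect them silently.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML(
    '<nav><a href="/a">A</a></nav>'
    . '<div id="box"><nav><a href="/b">B</a><a>No link</a></nav></div>'
);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$box = $xpath->query('//div[@id="box"]')->item(0);

// A leading "//" searches from the document root, even though a context node is given.
$absolute = $xpath->query('//nav/a[@href]', $box);

// A leading ".//" restricts the search to descendants of the context node.
$relative = $xpath->query('.//nav/a[@href]', $box);

echo $absolute->length, PHP_EOL;
echo $relative->length, PHP_EOL;
```

Here the first query matches both links that have an href attribute, while the second matches only the one inside the box.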