README

Bower: bower install wildhoney/xpath-document

Getting Started

XPathDocument lets you chain query methods, so that each query delves deeper into the DOM hierarchy.

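// Select every posts container in the document.
$posts = $xpathDocument->query('//div[@class="posts"]');

foreach ($posts as $post) {
    // Relative query: the current post is used as the context node.
    $comments = $post->query('div[@class="comments"]');
}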
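To show what querying relative to a context node means, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath classes rather than XPathDocument itself. The sample markup and the $commentCounts variable are made up for illustration, and nothing here is meant to describe how XPathDocument is implemented internally.

```php
<?php
// Sketch with PHP's built-in DOM extension, not XPathDocument itself.
// The markup below is invented sample data.
$html = <<<HTML
<html><body>
  <div class="posts">
    <div class="comments">First</div>
  </div>
  <div class="posts">
    <div class="comments">Second</div>
    <div class="comments">Third</div>
  </div>
</body></html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$commentCounts = [];
foreach ($xpath->query('//div[@class="posts"]') as $post) {
    // Passing $post as the context node makes the relative expression
    // match only the comment blocks that are children of this post.
    $comments = $xpath->query('div[@class="comments"]', $post);
    $commentCounts[] = $comments->length;
}

print_r($commentCounts);
```

The relative expression has no leading slashes, which is what ties it to the context node. An expression starting with // would search the whole document again, whichever post is passed in.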

Each query returns an instance of XPathDocument_Dom_List. This class implements Iterator, ArrayAccess and Countable, so you can loop over the node collection with foreach, read nodes by index, and count them with count().
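The three interfaces named above can be illustrated with a small stand-in class. ExampleNodeList is hypothetical and holds plain strings instead of nodes; it is not the real XPathDocument_Dom_List, whose implementation may differ. The sketch assumes PHP 8.0 or later, and treating the list as read-only is a choice made here for brevity.

```php
<?php
// Hypothetical stand-in for XPathDocument_Dom_List (PHP 8.0+).
// It only demonstrates what the three interfaces make possible.
final class ExampleNodeList implements Iterator, ArrayAccess, Countable
{
    private int $position = 0;

    public function __construct(private array $nodes) {}

    // Iterator: enables foreach over the list.
    public function current(): mixed { return $this->nodes[$this->position]; }
    public function key(): mixed { return $this->position; }
    public function next(): void { $this->position++; }
    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return isset($this->nodes[$this->position]); }

    // ArrayAccess: enables $list[0] style reads.
    public function offsetExists(mixed $offset): bool { return isset($this->nodes[$offset]); }
    public function offsetGet(mixed $offset): mixed { return $this->nodes[$offset] ?? null; }
    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('This sketch treats the list as read-only.');
    }
    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('This sketch treats the list as read-only.');
    }

    // Countable: enables count($list).
    public function count(): int { return count($this->nodes); }
}

$list = new ExampleNodeList(['first post', 'second post']);

$total = count($list);   // Countable
$first = $list[0];       // ArrayAccess

$seen = [];
foreach ($list as $index => $node) {   // Iterator
    $seen[$index] = $node;
}
```

With a real query result, the same three operations should apply to the returned list in place of $list.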

Typically, XPathDocument_Dom_List holds a collection of XPathDocument_Dom_Element instances, but other node types are possible:

XPathDocument_Dom_Element – a generic element with a value and attributes;
XPathDocument_Dom_Attr – an attribute of a node;
XPathDocument_Dom_Text – the text value of a node.

The latter two have a simple getText method that returns their value. XPathDocument_Dom_Element is the most flexible of the three.

Element Instance

With an instance of XPathDocument_Dom_Element you have the following methods:

getText – retrieve the text value of the node;
getHtml – retrieve the HTML value of the node;
getName – retrieve the name of the node (span, div, etc.);
getAttribute – retrieve an attribute by its name;
query – use the node as the context for further querying.
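As a rough guide to what each of the methods listed above is likely to return, the sketch below performs the equivalent operations with PHP's built-in DOM classes. The mapping between XPathDocument's methods and these native calls is an assumption, not something the README confirms. In particular, the README does not say whether getHtml returns the node's inner or outer markup; saveHTML, used here, returns the outer markup. The sample HTML is invented.

```php
<?php
// Native DOM equivalents, shown for illustration only.
// The XPathDocument method named in each comment is an assumed counterpart.
$document = new DOMDocument();
$document->loadHTML(
    '<html><body><div id="post-1" class="posts">Hello <span>world</span></div></body></html>'
);
$xpath = new DOMXPath($document);

$element = $xpath->query('//div[@class="posts"]')->item(0);

$text = $element->textContent;          // compare getText
$html = $document->saveHTML($element);  // compare getHtml (outer markup here)
$name = $element->nodeName;             // compare getName
$id   = $element->getAttribute('id');   // compare getAttribute

// Compare query: a relative expression evaluated against $element.
$spanCount = $xpath->query('span', $element)->length;

echo $name, ' #', $id, ': ', $text, PHP_EOL;
```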

Reddit Example

See the Reddit.com example in example/index.php, which demonstrates how simple it is to crawl websites with XPathDocument.
