wildhoney/xpath-document

There is no license information available for the latest version (dev-master) of this package.

Friendlier XPath extension of DOMDocument for those fluent in beloved XPath!

dev-master 2013-11-13 11:55 UTC

This package is not auto-updated.

Last update: 2024-03-25 13:15:16 UTC


README

Bower: bower install wildhoney/xpath-document

Getting Started

XPathDocument lets you chain query methods: each node in a result can serve as the context for the next query, so you reach deeper into the DOM hierarchy with each iteration.

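// $xpathDocument is an XPathDocument instance that has already been created.
// Select every posts container in the document.
$posts = $xpathDocument->query('//div[@class="posts"]');

foreach ($posts as $post) {
    // Relative expression: evaluated with the current $post as its context.
    $comments = $post->query('div[@class="comments"]');
}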
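
The context-node idea behind chained queries can be sketched with PHP's built-in DOMDocument and DOMXPath classes. This is only an illustration of the underlying technique, not XPathDocument's own code; the sample markup and the $commentsPerPost variable are made up for the example.

```php
<?php
// Sketch using only PHP's bundled DOM extension, not XPathDocument itself.
// The markup below is invented sample data.
$html = '<html><body>'
      . '<div class="posts">'
      .   '<div class="comments">First comment</div>'
      .   '<div class="comments">Second comment</div>'
      . '</div>'
      . '<div class="posts">'
      .   '<div class="comments">Third comment</div>'
      . '</div>'
      . '</body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$commentsPerPost = [];

foreach ($xpath->query('//div[@class="posts"]') as $post) {
    // Passing $post as the second argument makes it the context node,
    // so the relative expression only matches children of this post.
    $comments = $xpath->query('div[@class="comments"]', $post);
    $commentsPerPost[] = $comments->length;
}

print_r($commentsPerPost); // two comments in the first post, one in the second
```

With the native API you have to pass the context node by hand on every call; chaining query on the node itself removes that step.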

Each query returns an instance of XPathDocument_Dom_List. This class implements Iterator, ArrayAccess and Countable, so you can loop over the result with foreach, read a node by its index, and pass the list to count().
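
To show what those three interfaces give you in practice, here is a minimal sketch of a collection class that implements them. ExampleList is a hypothetical stand-in written for this README, not the source of XPathDocument_Dom_List, and it assumes PHP 8.0 or later because of the mixed type.

```php
<?php
// Hypothetical collection class; NOT the library's implementation.
// Requires PHP 8.0+ for the `mixed` type declarations.
final class ExampleList implements Iterator, ArrayAccess, Countable
{
    private array $items;
    private int $position = 0;

    public function __construct(array $items)
    {
        // Re-index so positions always run 0..n-1.
        $this->items = array_values($items);
    }

    // Iterator: enables foreach.
    public function current(): mixed { return $this->items[$this->position]; }
    public function key(): mixed     { return $this->position; }
    public function next(): void     { $this->position++; }
    public function rewind(): void   { $this->position = 0; }
    public function valid(): bool    { return array_key_exists($this->position, $this->items); }

    // ArrayAccess: enables $list[0] style access.
    public function offsetExists(mixed $offset): bool { return isset($this->items[$offset]); }
    public function offsetGet(mixed $offset): mixed   { return $this->items[$offset] ?? null; }
    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;       // $list[] = $value
        } else {
            $this->items[$offset] = $value;
        }
    }
    public function offsetUnset(mixed $offset): void { unset($this->items[$offset]); }

    // Countable: enables count($list).
    public function count(): int { return count($this->items); }
}

$list = new ExampleList(['first', 'second', 'third']);

$seen = [];
foreach ($list as $index => $item) {
    $seen[$index] = $item;
}

echo count($list), PHP_EOL; // 3
echo $list[1], PHP_EOL;     // second
```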

Typically XPathDocument_Dom_List will hold a collection of XPathDocument_Dom_Element instances, but depending on the XPath expression it can hold any of the following:

  • XPathDocument_Dom_Element – generic elements with values and attributes;
  • XPathDocument_Dom_Attr – specific for node attributes;
  • XPathDocument_Dom_Text – specific for text values of nodes.

The latter two have a simple getText method for returning their values. However, XPathDocument_Dom_Element has the greatest flexibility.
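
The three result types mirror what XPath itself can select. As a rough illustration using PHP's native DOM classes (DOMElement, DOMAttr and DOMText), the kind of node you get back depends on the expression. That the XPathDocument_Dom_* classes correspond to these native classes is an assumption based on their names; the sample markup is invented.

```php
<?php
// Sketch with PHP's bundled DOM extension; sample markup is made up.
$document = new DOMDocument();
$document->loadHTML('<html><body><a href="/about">About us</a></body></html>');
$xpath = new DOMXPath($document);

// An element step selects an element node.
$element = $xpath->query('//a')->item(0);

// An @attribute step selects an attribute node.
$attribute = $xpath->query('//a/@href')->item(0);

// A text() step selects a text node.
$text = $xpath->query('//a/text()')->item(0);

echo get_class($element), PHP_EOL;
echo get_class($attribute), ': ', $attribute->nodeValue, PHP_EOL;
echo get_class($text), ': ', $text->nodeValue, PHP_EOL;
```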

Element Instance

With an instance of XPathDocument_Dom_Element you have the following methods:

  • getText – retrieve the text value of the node;
  • getHtml – retrieve the HTML value of the node;
  • getName – retrieve the name of the node (span, div, etc.);
  • getAttribute – retrieve an attribute by its name;
  • query – use the node as the context for further querying.
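
For comparison, here is roughly how the same pieces of information are read from a native DOMElement. The pairing with getText, getHtml, getName, getAttribute and query noted in the comments is an assumption made for illustration; in particular, getHtml may return the inner rather than the outer HTML. The sample markup is invented.

```php
<?php
// Sketch with PHP's bundled DOM extension, not XPathDocument_Dom_Element.
$document = new DOMDocument();
$document->loadHTML(
    '<html><body><div class="post" id="p1">Hello <span>world</span></div></body></html>'
);
$xpath = new DOMXPath($document);
$node = $xpath->query('//div[@class="post"]')->item(0);

$text  = $node->textContent;              // compare: getText
$html  = $document->saveHTML($node);      // compare: getHtml (outer HTML here)
$name  = $node->nodeName;                 // compare: getName
$id    = $node->getAttribute('id');       // compare: getAttribute
$spans = $xpath->query('span', $node);    // compare: query with $node as context

echo $text, PHP_EOL;
echo $html, PHP_EOL;
echo $name, PHP_EOL;
echo $id, PHP_EOL;
echo $spans->length, PHP_EOL;
```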

Reddit Example

Please see the Reddit.com example in example/index.php, which demonstrates how simple it is to crawl websites with XPathDocument!