XML SAX (Java-like) parser

4.0 2024-03-28 10:27 UTC

README

This library lets you parse XML documents with SAX in a Java-like style: instead of handling events with PHP's native XML parser functions and callbacks (see the example in the official PHP documentation), you extend the provided abstract class RunOpenCode\Sax\Handler\AbstractSaxHandler and implement all of its abstract methods.

The major benefit of using this library is clean, human-readable code.

Example:

class MySaxHandler extends RunOpenCode\Sax\Handler\AbstractSaxHandler {
    // ... your implementation 
}

$result = RunOpenCode\Sax\SaxParser::factory()->parse(new MySaxHandler(), $myXmlDocumentResource);

Methods you need to implement:

  • onDocumentStart: called when parsing of the XML document starts.
  • onElementStart: called when the parser encounters an opening XML tag.
  • onElementData: called when the parser encounters character data (CDATA) inside an XML element.
  • onElementEnd: called when the parser encounters the closing tag of a previously opened element.
  • onDocumentEnd: called when parsing of the XML document is finished.
  • onParseError: called when a parsing error is triggered.
  • getResult: called at the very end of the parsing process; it should return the parsing result to the caller.
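
To make the order of these events concrete, here is a minimal, self-contained sketch built directly on PHP's native ext-xml functions rather than on the library's classes. The class EventOrderSketch and the method signatures are assumptions made for illustration; only the method names come from the list above, and the library's real signatures may differ.

```php
<?php
// Illustrative only: wires PHP's native XML callbacks to methods named like
// the ones listed above. EventOrderSketch and all signatures are assumptions.
final class EventOrderSketch
{
    /** @var string[] events in the order they were fired */
    private array $events = [];

    public function parse(string $xml): array
    {
        $this->events = [];
        $parser = xml_parser_create();

        xml_set_element_handler(
            $parser,
            function ($parser, string $name, array $attributes): void {
                $this->onElementStart($name, $attributes);
            },
            function ($parser, string $name): void {
                $this->onElementEnd($name);
            }
        );
        xml_set_character_data_handler(
            $parser,
            function ($parser, string $data): void {
                $this->onElementData($data);
            }
        );

        $this->onDocumentStart();
        if (xml_parse($parser, $xml, true) !== 1) {
            // Reporting the error and then finishing is a choice of this sketch.
            $this->onParseError(
                xml_error_string(xml_get_error_code($parser)) ?? 'unknown error',
                xml_get_current_line_number($parser)
            );
        }
        $this->onDocumentEnd();

        return $this->getResult();
    }

    private function onDocumentStart(): void { $this->events[] = 'documentStart'; }
    private function onElementStart(string $name, array $attributes): void { $this->events[] = 'elementStart:' . $name; }
    private function onElementData(string $data): void { $this->events[] = 'elementData:' . $data; }
    private function onElementEnd(string $name): void { $this->events[] = 'elementEnd:' . $name; }
    private function onDocumentEnd(): void { $this->events[] = 'documentEnd'; }
    private function onParseError(string $message, int $line): void { $this->events[] = "parseError:$message@$line"; }
    private function getResult(): array { return $this->events; }
}

// Print the recorded events, one per line.
$events = (new EventOrderSketch())->parse('<a><b>hi</b></a>');
echo implode("\n", $events), "\n";
```

The document-level events bracket everything else, and each element's start event precedes its data and end events.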

Since a common way to use a SAX parser is to keep a stack of the elements currently being processed, a prototype implementation of that is provided as well, in the class RunOpenCode\Sax\Handler\AbstractStackedSaxHandler. It extends RunOpenCode\Sax\Handler\AbstractSaxHandler and lets you get the current working element via getCurrentElementName() and the stack size via getStackSize().
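
The stack idea can be sketched without the library, using PHP's native ext-xml parser. ElementStackSketch below is a hypothetical class written for this illustration; it borrows only the two accessor names from the paragraph above and makes no claim about how AbstractStackedSaxHandler is implemented.

```php
<?php
// Illustrative only: keeps a stack of open elements while parsing with PHP's
// native XML functions. ElementStackSketch is a hypothetical class.
final class ElementStackSketch
{
    /** @var string[] names of currently open elements, outermost first */
    private array $stack = [];

    /** @var array<string, string> text collected per element name */
    private array $textByElement = [];

    public function getCurrentElementName(): ?string
    {
        return $this->stack === [] ? null : $this->stack[count($this->stack) - 1];
    }

    public function getStackSize(): int
    {
        return count($this->stack);
    }

    public function parse(string $xml): array
    {
        $this->stack = [];
        $this->textByElement = [];
        $parser = xml_parser_create();

        xml_set_element_handler(
            $parser,
            function ($parser, string $name, array $attributes): void {
                $this->stack[] = $name;       // push on opening tag
            },
            function ($parser, string $name): void {
                array_pop($this->stack);      // pop on closing tag
            }
        );
        xml_set_character_data_handler(
            $parser,
            function ($parser, string $data): void {
                if (trim($data) === '') {
                    return;                   // skip whitespace-only chunks
                }
                // Attribute the text to whichever element is on top of the stack.
                $current = (string) $this->getCurrentElementName();
                $this->textByElement[$current] = ($this->textByElement[$current] ?? '') . $data;
            }
        );

        if (xml_parse($parser, $xml, true) !== 1) {
            throw new RuntimeException(
                xml_error_string(xml_get_error_code($parser)) ?? 'XML parse error'
            );
        }

        return $this->textByElement;
    }
}

$handler = new ElementStackSketch();
print_r($handler->parse('<library><book><title>SAX</title><year>2017</year></book></library>'));
```

Because the data callback does not receive the element name, the top of the stack is what tells the handler which element a piece of text belongs to.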

Important notes

  • Due to the underlying implementation of the PHP XML parser, all tag names passed to the relevant event methods are uppercase. For example, if you have the tag <tag></tag>, your check for the tag name in those event methods should be if ($name === 'TAG').
  • The onParseError event is raised on an unrecoverable parsing error; however, it is up to you and your use case whether you trigger an error or continue with execution.
  • The onElementData event fires even when there is only whitespace between tags in the XML document.
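
The first and last notes can be observed with PHP's native ext-xml parser alone. The snippet below is a small sketch that does not use the library: it records the tag names and character data chunks the native parser reports, then applies a whitespace guard of the kind you might put at the top of an onElementData implementation. The variable names are arbitrary.

```php
<?php
// Uses PHP's native XML parser directly; no library classes are involved.
$xml = "<root>\n  <tag>value</tag>\n</root>";

$names = [];
$chunks = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attributes) use (&$names): void {
        $names[] = $name;     // reported uppercase: case folding is on by default
    },
    function ($parser, string $name): void {
        // closing tags are not needed for this demonstration
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$chunks): void {
        $chunks[] = $data;    // also receives the indentation between tags
    }
);
xml_parse($parser, $xml, true);

// Typical guard: ignore chunks that consist of whitespace only.
$meaningful = array_values(
    array_filter($chunks, fn (string $chunk): bool => trim($chunk) !== '')
);

var_dump($names, $chunks, $meaningful);
```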

SaxParser and StreamAdapterInterface

RunOpenCode\Sax\SaxParser is provided as a utility class that eases the use of your SaxHandler implementation. A SaxHandler uses a Psr\Http\Message\StreamInterface implementation as the source of the XML document to parse; stream adapters let you work with various other XML document sources, such as:

  • Resources (file resources or PHP native streams)
  • DOMDocument
  • SimpleXMLElement

If you need any other type of XML document source, you can support it by implementing RunOpenCode\Sax\Contract\StreamAdapterInterface and registering it with a RunOpenCode\Sax\SaxParser instance via the SaxParser::addStreamAdapter() method.

When you invoke SaxParser::parse(), the source of the provided XML document is first checked against the available adapters and converted to a Psr\Http\Message\StreamInterface implementation; parsing starts only after that.
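
As a rough illustration of what such a conversion involves, the sketch below normalises the three source types listed above into a readable native PHP stream. The function toXmlStreamSketch is hypothetical and is not the library's adapter code: it returns a plain php://temp resource instead of a StreamInterface so that it runs without any dependencies.

```php
<?php
// Hypothetical helper, not part of the library: turns several kinds of XML
// source into a native PHP stream positioned at the start of the document.
function toXmlStreamSketch($source)
{
    if (is_resource($source)) {
        return $source;                   // already a stream, pass it through
    }

    if ($source instanceof DOMDocument) {
        $xml = $source->saveXML();
    } elseif ($source instanceof SimpleXMLElement) {
        $xml = $source->asXML();
    } else {
        throw new InvalidArgumentException('Unsupported XML document source.');
    }

    if (!is_string($xml)) {
        throw new RuntimeException('Could not serialize the XML document.');
    }

    // Write the serialized XML to a temporary stream and rewind it for reading.
    $stream = fopen('php://temp', 'r+b');
    fwrite($stream, $xml);
    rewind($stream);

    return $stream;
}

$dom = new DOMDocument();
$dom->loadXML('<root><item>1</item></root>');
$fromDom = stream_get_contents(toXmlStreamSketch($dom));

$simpleXml = new SimpleXMLElement('<root><item>1</item></root>');
$fromSimpleXml = stream_get_contents(toXmlStreamSketch($simpleXml));

echo $fromDom, $fromSimpleXml;
```

A custom adapter for another source type would follow the same pattern: recognise the source, serialize it to XML, and hand back something stream-like.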

This library recommends guzzlehttp/psr7 and uses it as the default StreamInterface implementation, but you can use any other implementation that suits your needs.

Changelog

February 21st, 2017.

  • BC break: changed API; there is no more callback, and invoking the parse() method returns the parsing result.

February 10th, 2017.

  • Dropped support for PHP 5.x
  • Added PHPUnit 6.x as requirement
  • Added library exceptions