Html5 stream tokenizer/reader (not using libxml)
HtmlReader is a very simple HTML parser that is NOT built on libxml. It is intended as a replacement for XMLReader, which does not parse HTML5 input properly. It is faster than DOM and leaves the input exactly as it is, down to every whitespace character.
It does not check whether elements are properly closed or nested, so you can (and have to) handle that yourself.
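Tracking nesting yourself usually comes down to keeping a stack of open elements. The following is a minimal sketch, assuming your handler receives an event for each opening tag and each closing tag along with the element name. The ElementStack class and its open(), close(), path() and depth() methods are hypothetical helpers, not part of html5/htmlreader. On a close tag it pops back to the most recent matching open element (implicitly closing anything left open inside it), ignores stray close tags, and never pushes HTML void elements such as br.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of html5/htmlreader: keeps track of open
 * elements from a stream of open/close tag events.
 */
final class ElementStack
{
    /** HTML void elements never have a closing tag, so they are not pushed. */
    private const VOID = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
        'input', 'link', 'meta', 'source', 'track', 'wbr'];

    /** @var string[] names of currently open elements, innermost last */
    private array $open = [];

    public function open(string $name): void
    {
        $name = strtolower($name);
        if (!in_array($name, self::VOID, true)) {
            $this->open[] = $name;
        }
    }

    /**
     * Closes the most recent open element called $name and returns every
     * element closed by it, innermost first ($name itself is last).
     * A stray close tag with no matching open element returns [].
     */
    public function close(string $name): array
    {
        $name = strtolower($name);
        for ($i = count($this->open) - 1; $i >= 0; $i--) {
            if ($this->open[$i] === $name) {
                // Remove $name and everything opened after it.
                return array_reverse(array_splice($this->open, $i));
            }
        }
        return [];
    }

    /** Slash-separated path of open elements, e.g. "html/body/p". */
    public function path(): string
    {
        return implode('/', $this->open);
    }

    public function depth(): int
    {
        return count($this->open);
    }
}

// Usage: feed it the tag events your handler sees.
$stack = new ElementStack();
foreach (['html', 'body', 'p', 'b'] as $tag) {
    $stack->open($tag);
}
$stack->open('br');                              // void element, not pushed
echo $stack->path(), PHP_EOL;                    // html/body/p/b
echo implode(',', $stack->close('P')), PHP_EOL;  // b,p (the unclosed <b> is closed too)
echo $stack->path(), PHP_EOL;                    // html/body
echo count($stack->close('div')), PHP_EOL;       // 0 (stray </div> is ignored)
```

Calling open() and close() from the tag events of your own HtmlCallback (whatever those methods are named in the library's interface) gives you the current element path at any point in the stream. The sketch deliberately does not implement the HTML5 tree-construction rules for implied end tags (for example, a new p closing an open one); add those only if your input needs them.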
Install the package from Packagist with Composer:
composer require html5/htmlreader
require __DIR__ . "/vendor/autoload.php";   // Composer's autoloader

$reader = new HtmlReader();
$reader->loadHtml("input.html");           // or: $reader->loadHtmlString("<html></html>");
$reader->setHandler(new HtmlCallback());   // <-- write your own HtmlCallback
$reader->parse();
The package ships with a ready-made DebugHtmlCallback handler, which you can pass to setHandler() instead of your own.
- Added support for namespaces
Written by Matthias Leuffen http://leuffen.de