hexydec / htmldoc
A token-based HTML document parser and minifier. Minify HTML documents, including inline CSS, JavaScript, and SVGs, on the fly. Extract document text, attributes, and fragments. Full test suite.
Requires
- php: >=8.0
- hexydec/cssdoc: 1.2.0
- hexydec/jslite: 1.0.2
- hexydec/tokenise: 1.0.1
Requires (Dev)
- phpstan/phpstan: ^1.10
- phpunit/phpunit: 10.1.2
README
A tokeniser-based HTML document parser and minifier, written in PHP.
Description
HTMLdoc is an HTML parser primarily designed for minifying HTML documents. It also lets you query the document structure to extract attribute and text node values.
The parser is built around a tokeniser, which makes document processing more reliable than regex-based minifiers. Those are blunt instruments and can break a document when a pattern matches in the wrong place, for example collapsing whitespace inside a <pre> block or an attribute value.
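To illustrate the difference, here is a minimal PHP sketch. It is not HTMLdoc's implementation: the functions regexMinify() and tokenMinify() are hypothetical, and the tokeniser is a deliberately naive split on tags that ignores comments, CDATA and `>` characters inside attribute values. The regex version collapses every whitespace run, while the token-aware version only collapses whitespace in text tokens outside whitespace-sensitive elements.

```php
<?php
// Hypothetical sketch, NOT HTMLdoc's code: contrasts a blunt regex
// minifier with a (naive) token-aware one.

// Collapses every whitespace run, including inside <pre> and tags.
function regexMinify(string $html): string {
	return preg_replace('/\s+/', ' ', $html);
}

// Splits the input into tag tokens and text tokens, then only collapses
// whitespace in text that is not inside pre/textarea/script/style.
function tokenMinify(string $html): string {
	$tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
	$preserve = 0; // nesting depth inside whitespace-sensitive elements
	$out = '';
	foreach ($tokens as $token) {
		if ($token[0] === '<') {
			// track entering/leaving whitespace-sensitive elements
			if (preg_match('#^<(/?)(pre|textarea|script|style)\b#i', $token, $m)) {
				$preserve = max(0, $preserve + ($m[1] === '/' ? -1 : 1));
			}
			$out .= $token; // tags are copied untouched
		} elseif ($preserve > 0) {
			$out .= $token; // keep text inside <pre> etc. verbatim
		} else {
			$out .= preg_replace('/\s+/', ' ', $token);
		}
	}
	return $out;
}
```

With the input `"<p>Hello   world</p>\n<pre>a\n  b</pre>"`, regexMinify() also flattens the <pre> content to `a b`, whereas tokenMinify() preserves it.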
The software is also capable of processing and minifying SVG documents.
Usage
To minify an HTML document:
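```php
// load Composer's autoloader
require 'vendor/autoload.php';

use hexydec\html\htmldoc;

$doc = new htmldoc();

// load from a variable ($html holds the HTML source)
if ($doc->load($html)) {

	// minify the document
	$doc->minify();

	// compile back to HTML
	echo $doc->save();
}
```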
You can test out the minifier online at https://hexydec.com/apps/minify-html/, or run the supplied index.php file after installation.
To extract data from an HTML document:
```php
use hexydec\html\htmldoc;

$doc = new htmldoc();

// load from a URL this time
if ($doc->open($url)) {

	// extract text
	$text = $doc->find('.article__body')->text();

	// extract attribute
	$attr = $doc->find('.article__author-image')->attr('src');

	// extract HTML
	$html = $doc->find('.article__body')->html();
}
```
Installation
The easiest way to get up and running is to use composer:
$ composer require hexydec/htmldoc
HTMLdoc requires \hexydec\token\tokenise to run, which you can install manually if you are not using Composer. You can optionally also install CSSdoc and JSlite to minify inline CSS and JavaScript respectively.
Composer installs all of these dependencies automatically.
Test Suite
You can run the test suite like this:
Linux
$ vendor/bin/phpunit
Windows
> vendor\bin\phpunit
Documentation
- How it works
- How to use and examples
- API Reference
- Mitigating Side Effects of Minification
- About Document Recycling
- Object Performance
Support
HTMLdoc supports PHP version 8.0+.
Contributing
If you find an issue with HTMLdoc, please create an issue in the tracker.
If you wish to fix an issue yourself, please fork the code, fix the issue, then create a pull request, and I will evaluate your submission.
Licence
The MIT License (MIT). Please see License File for more information.