akankov / html-ast
nikic/php-parser for HTML — an immutable HTML5 AST with positions, trivia, visitors, and a fidelity printer
Requires
- php: 8.3.* || 8.4.* || 8.5.*
- ext-dom: *
- ext-libxml: *
- ext-mbstring: *
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.65
- phan/phan: ^6.0
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^12.0
- rector/rector: ^2.0
Suggests
- masterminds/html5: ^2.9 — required by Akankov\HtmlAst\Parser\MastermindsParser as the PHP 8.3 fallback parser. Not needed when running on PHP 8.4+, where the native \Dom\HTMLDocument backend is used.
README
nikic/php-parser for HTML. A spec-compliant HTML5 abstract syntax tree for PHP, with byte-range positions, trivia preservation, an immutable visitor framework, and a fidelity printer.
⚠️ 0.x is unstable. The public API shape may change in any minor release until the package reaches
1.0. The 1.0 commitment lands only after the API has been frozen in production for at least two months.
Why this exists
PHP 8.4 ships native HTML5 parsing through \Dom\HTMLDocument (lexbor under
the hood). That collapses a decade of fragmented PHP HTML tooling — but
\Dom\HTMLDocument is a great parser, not a great AST for transformation
work. It has four gaps that html-ast fills:
- Byte-range positions on every node. Required for linters, formatters,
  and source maps. \Dom\HTMLDocument exposes none.
- Trivia preservation (whitespace between attributes, comment positions,
  attribute quoting style). Required for round-trip fidelity. The native
  serializer drops it.
- An immutable visitor framework with enterNode/leaveNode and sentinel
  return values for replace, remove, and stop. Familiar to anyone who has
  written a nikic/php-parser visitor.
- A fidelity printer. StandardPrinter produces normalized output;
  LosslessPrinter (v0.2) round-trips trivia exactly.
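To make the byte-range bullet concrete, here is a minimal PHP sketch of what a byte-range position means. The ByteSpan class is hypothetical and not part of html-ast's published API; the sketch only assumes that a position is a half-open [start, end) pair of byte offsets into the original source string, which is what lets a linter or formatter point back at exact source text.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical value object (not html-ast API): a half-open byte range
 * [start, end) into the original source string.
 */
final readonly class ByteSpan
{
    public function __construct(
        public int $start,
        public int $end,
    ) {
        if ($start < 0 || $end < $start) {
            throw new InvalidArgumentException('Invalid byte range');
        }
    }

    /** Returns exactly the source bytes this span covers. */
    public function slice(string $source): string
    {
        return substr($source, $this->start, $this->end - $this->start);
    }
}

$source = '<p title="café">hi</p>';

// Offsets are bytes, not characters: "é" is two bytes in UTF-8, so the
// closing quote and the '>' sit one byte later than a character count
// would suggest.
$start = strpos($source, 'title');   // byte offset 3
$end = strpos($source, '>');         // byte offset of the first '>'
$attr = new ByteSpan($start, $end);

echo $attr->slice($source), PHP_EOL; // prints: title="café"
```

Byte offsets (rather than character offsets) keep slicing O(1) with plain substr() and match what most editors and source-map formats expect when they address raw file contents.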
It also bridges PHP 8.3 (via a masterminds/html5 adapter) and PHP 8.4+
(via the native parser) behind a single Parser interface, so consumers do
not need to branch on the PHP version.
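The single-interface bridge can be sketched as a version and class-availability check. This is a hypothetical illustration, not html-ast's Parser::detect() implementation: the function name detectHtmlBackend() and its string return values are invented here, while \Dom\HTMLDocument and \Masterminds\HTML5 are the real entry-point classes of the two backends.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (not html-ast code): choose an HTML5 backend the way
 * a detect()-style factory might.
 */
function detectHtmlBackend(): string
{
    // PHP 8.4+ ships the lexbor-based \Dom\HTMLDocument in ext-dom.
    if (PHP_VERSION_ID >= 80400 && class_exists(\Dom\HTMLDocument::class)) {
        return 'native';
    }

    // PHP 8.3: fall back to masterminds/html5 when it is installed.
    if (class_exists(\Masterminds\HTML5::class)) {
        return 'masterminds';
    }

    throw new RuntimeException(
        'No HTML5 parser available: install masterminds/html5 on PHP 8.3.'
    );
}
```

Returning a backend name keeps the sketch free of dependencies; a real factory would instead return a parser object implementing the shared interface, so calling code never sees which backend was chosen.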
Install
composer require akankov/html-ast
PHP 8.3 users additionally need:
composer require masterminds/html5:^2.9
(On PHP 8.4+ the native \Dom\HTMLDocument backend is used and there are
zero runtime dependencies.)
30-second example
use Akankov\HtmlAst\Node\Element;
use Akankov\HtmlAst\Node\Node;
use Akankov\HtmlAst\Parser\Parser;
use Akankov\HtmlAst\Printer\StandardPrinter;
use Akankov\HtmlAst\Visitor\NodeTraverser;
use Akankov\HtmlAst\Visitor\Visitor;
use Akankov\HtmlAst\Visitor\VisitorAction;

$html = '<button data-testid="submit">Send</button>';

// Parser::detect() selects the backend for the running PHP version.
$result = Parser::detect()->parse($html);

// Strip every data-testid attribute: returning a node from enterNode()
// replaces the visited node; returning null leaves it as-is.
$stripTestIds = new class implements Visitor {
    public function enterNode(Node $n): VisitorAction|Node|null
    {
        if ($n instanceof Element && $n->hasAttribute('data-testid')) {
            return $n->withoutAttribute('data-testid');
        }

        return null;
    }

    public function leaveNode(Node $n): VisitorAction|Node|null
    {
        return null;
    }
};

$tree = (new NodeTraverser())->traverse($result->tree, [$stripTestIds]);
$output = (new StandardPrinter())->print($tree);
Status
akankov/html-ast is in the M0 design phase — the public API shape is
being resolved at docs/design/api-v0.1.md. All
implementation classes currently throw \LogicException so type checkers
pass while the algorithms are being written. Track progress on the
milestones page.
Documentation
- API design: docs/design/api-v0.1.md
- Algorithm lineage: CREDITS.md
- Contributing: CONTRIBUTING.md
- Security: SECURITY.md
- Changelog: CHANGELOG.md
License
MIT, see LICENSE.