akankov/html-ast

nikic/php-parser for HTML — an immutable HTML5 AST with positions, trivia, visitors, and a fidelity printer

Package info

github.com/akankov/html-ast

pkg:composer/akankov/html-ast

v0.0.1 2026-05-06 17:38 UTC

nikic/php-parser for HTML. A spec-compliant HTML5 abstract syntax tree for PHP, with byte-range positions, trivia preservation, an immutable visitor framework, and a fidelity printer.

⚠️ 0.x is unstable. The public API shape may change in any minor release until the package reaches 1.0. The 1.0 commitment lands only after the API has been frozen in production for at least two months.

Why this exists

PHP 8.4 ships native HTML5 parsing through \Dom\HTMLDocument (lexbor under the hood). That collapses a decade of fragmented PHP HTML tooling — but \Dom\HTMLDocument is a great parser, not a great AST for transformation work. It has four gaps that html-ast fills:

  1. Byte-range positions on every node. Required for linters, formatters, and source maps. \Dom\HTMLDocument exposes none.
  2. Trivia preservation (whitespace between attributes, comment positions, attribute quoting style). Required for round-trip fidelity. The native serializer drops it.
  3. An immutable visitor framework with enterNode / leaveNode and sentinel return values for replace, remove, and stop. Familiar to anyone who has written a nikic/php-parser visitor.
  4. A fidelity printer. StandardPrinter produces normalized output; LosslessPrinter (planned for v0.2) round-trips trivia exactly.
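
To make the first gap concrete, here is a minimal sketch of what a byte-range position gives a linter or formatter. The Span class below is hypothetical and is not taken from html-ast; it only illustrates the idea that a node carries start and end byte offsets into the original source, so the exact bytes (including quoting style) can be recovered.

```php
<?php

// Hypothetical illustration only: Span is NOT a class from akankov/html-ast.
final readonly class Span
{
    public function __construct(
        public int $start, // byte offset of the first byte (inclusive)
        public int $end,   // byte offset one past the last byte (exclusive)
    ) {}

    /** Recover the exact source bytes this span covers. */
    public function slice(string $source): string
    {
        return substr($source, $this->start, $this->end - $this->start);
    }
}

$source = '<p class="é">hi</p>';

// Locate the attribute by byte offset; strpos() and strlen() both count bytes.
$needle = 'class="é"';
$start  = strpos($source, $needle);
$span   = new Span($start, $start + strlen($needle));

echo $span->slice($source), PHP_EOL;     // class="é"
echo $span->end - $span->start, PHP_EOL; // 10, because "é" is two bytes in UTF-8
```

Offsets are counted in bytes rather than characters, which is why the span is 10 long even though the attribute has 9 visible characters.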

It also bridges PHP 8.3 (via a masterminds/html5 adapter) and PHP 8.4+ (via the native parser) behind a single Parser interface, so consumers do not need to branch.
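
As a rough sketch of how such a bridge could choose a backend, the helper below picks a backend name from the PHP version and the classes available. The function name detectBackend and the returned strings are assumptions made for illustration; the package's real selection logic may differ.

```php
<?php

// Hypothetical helper, not part of akankov/html-ast: picks a backend name
// the way a Parser::detect()-style factory might.
function detectBackend(int $phpVersionId, bool $hasNativeDom, bool $hasMasterminds): string
{
    // \Dom\HTMLDocument is available from PHP 8.4 (version id 80400).
    if ($phpVersionId >= 80400 && $hasNativeDom) {
        return 'native';
    }
    // Fallback for PHP 8.3: the masterminds/html5 adapter.
    if ($hasMasterminds) {
        return 'masterminds';
    }
    throw new RuntimeException('No HTML5 parser backend available; install masterminds/html5.');
}

echo detectBackend(80400, true, false), PHP_EOL; // native
echo detectBackend(80300, false, true), PHP_EOL; // masterminds
```

In real use the three arguments would come from PHP_VERSION_ID, class_exists('Dom\HTMLDocument'), and class_exists('Masterminds\HTML5').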

Install

composer require akankov/html-ast

PHP 8.3 users additionally need:

composer require masterminds/html5:^2.9

(On PHP 8.4+ the native \Dom\HTMLDocument backend is used and there are zero runtime dependencies.)

30-second example

use Akankov\HtmlAst\Node\Element;
use Akankov\HtmlAst\Node\Node;
use Akankov\HtmlAst\Parser\Parser;
use Akankov\HtmlAst\Printer\StandardPrinter;
use Akankov\HtmlAst\Visitor\NodeTraverser;
use Akankov\HtmlAst\Visitor\Visitor;
use Akankov\HtmlAst\Visitor\VisitorAction;

// Any HTML source string; this sample value is only for illustration.
$html = '<button data-testid="save">Save</button>';

// Parser::detect() selects the backend for the running PHP version.
$result = Parser::detect()->parse($html);

// Visitor that strips data-testid attributes from every element.
$stripTestIds = new class implements Visitor {
    public function enterNode(Node $n): VisitorAction|Node|null
    {
        if ($n instanceof Element && $n->hasAttribute('data-testid')) {
            // Nodes are immutable, so return a modified copy to replace the node.
            return $n->withoutAttribute('data-testid');
        }
        return null; // leave the node unchanged
    }

    public function leaveNode(Node $n): VisitorAction|Node|null
    {
        return null;
    }
};

$tree   = (new NodeTraverser())->traverse($result->tree, [$stripTestIds]);
$output = (new StandardPrinter())->print($tree);
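
For readers who have not used a nikic/php-parser style visitor, the following self-contained miniature shows how sentinel return values can drive an immutable traversal. Every name in it (Action, El, walk, render) is hypothetical and is not html-ast's API; it models only an enter callback and leaves out leaveNode for brevity.

```php
<?php

// Everything below is a hypothetical miniature, not akankov/html-ast code.
enum Action
{
    case Remove; // drop the current node
    case Stop;   // halt the traversal, keeping the rest of the tree as is
}

final readonly class El
{
    /** @param list<El> $children */
    public function __construct(public string $tag, public array $children = []) {}

    /** Immutable update: returns a copy with new children. */
    public function withChildren(array $children): self
    {
        return new self($this->tag, $children);
    }
}

/**
 * Depth-first walk. $enter may return null (keep), an El (replace),
 * Action::Remove or Action::Stop.
 */
function walk(El $node, callable $enter, bool &$stopped = false): ?El
{
    $result = $enter($node);
    if ($result === Action::Stop) { $stopped = true; return $node; }
    if ($result === Action::Remove) { return null; }
    if ($result instanceof El) { $node = $result; }

    $children = [];
    foreach ($node->children as $child) {
        if ($stopped) { $children[] = $child; continue; } // copy the rest untouched
        $new = walk($child, $enter, $stopped);
        if ($new !== null) { $children[] = $new; }
    }
    return $node->withChildren($children);
}

function render(El $n): string
{
    return "<{$n->tag}>" . implode('', array_map(render(...), $n->children)) . "</{$n->tag}>";
}

$tree = new El('div', [new El('script'), new El('p', [new El('b')])]);

$clean = walk($tree, fn (El $n) => match ($n->tag) {
    'script' => Action::Remove,   // remove
    'b'      => new El('strong'), // replace
    default  => null,             // keep
});

echo render($clean), PHP_EOL; // <div><p><strong></strong></p></div>
echo render($tree), PHP_EOL;  // <div><script></script><p><b></b></p></div>
```

The second line of output shows that the input tree is untouched: the walk builds new nodes instead of mutating existing ones.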

Status

akankov/html-ast is in the M0 design phase: the public API shape is being worked out in docs/design/api-v0.1.md. All implementation classes currently throw \LogicException, so type checkers pass while the algorithms are being written, and the example above will not run yet. Track progress on the milestones page.
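
As an illustration of that stub pattern, a design-phase class might look like the sketch below. The Printer interface and StubPrinter class here are hypothetical and simplified (the tree is a plain string); they are not copied from the package.

```php
<?php

// Hypothetical stub, named for illustration; not copied from the package.
interface Printer
{
    public function print(string $tree): string;
}

final class StubPrinter implements Printer
{
    public function print(string $tree): string
    {
        // The signature is complete, so static analysis passes; the body is a placeholder.
        throw new LogicException(__METHOD__ . ' is not implemented yet.');
    }
}

try {
    (new StubPrinter())->print('<p>hi</p>');
} catch (LogicException $e) {
    echo $e->getMessage(), PHP_EOL; // StubPrinter::print is not implemented yet.
}
```

Callers can already type-check against the interface, and any accidental runtime use fails loudly instead of returning wrong output.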

License

MIT, see LICENSE.