sinnbeck / html-ast
Create an AST from an HTML string
Requires
- php: ^8.2
Requires (Dev)
- laravel/pint: ^v1.22.0
- pestphp/pest: ^3.0.0
- symfony/var-dumper: ^7.2.0
Last update: 2025-04-14 11:14:39 UTC
README
An HTML AST (Abstract Syntax Tree) parser written in PHP.
Inspired by the AST parser in TempestPHP (by Brent Roose), this library provides a built-in lexer to tokenize HTML strings, an AST parser to convert tokens into a tree structure, and a printer to output well-formatted (indented) HTML.
Note: This package requires PHP 8.2 or higher.
Features
- Built-in Lexer: Tokenizes raw HTML input.
- AST Parser: Converts tokenized HTML into an Abstract Syntax Tree for easier analysis and manipulation.
- HTML Printer: Renders the AST back into properly indented HTML code.
Requirements
- PHP version 8.2 or later.
- Composer (for installation via Packagist).
Installation
You can install html-ast via Composer. From your project root, run:
composer require sinnbeck/html-ast
Alternatively, if you prefer to clone the repository directly:
git clone https://github.com/sinnbeck/html-ast.git
cd html-ast
composer install
Usage
The package is organized into three main components: the Lexer, the AST Parser, and the Printer. Below are basic examples of how to use each.
Lexing
The lexer tokenizes an HTML string. Tokens represent the smallest meaningful elements of the HTML (such as tags, attributes, and text).
use Sinnbeck\HtmlAst\Lexer\Lexer;
// Provide your HTML string
$html = '<div class="container"><p>Hello, world!</p></div>';
// Create a Lexer instance from the string
$lexer = Lexer::fromString($html);
// Lex the HTML string into tokens
$tokens = $lexer->lex();
// Optionally, inspect the tokens:
print_r($tokens);
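To make the idea of tokenizing concrete, here is a minimal standalone sketch of splitting an HTML string into tag and text tokens. It is not the package's lexer: the function name `sketchTokenize`, the token shape, and the type names `open_tag`, `close_tag`, and `text` are all assumptions made for illustration, and the sketch ignores attributes, comments, and malformed markup.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of html-ast.
 * Splits an HTML string into a flat list of tag and text tokens.
 *
 * @return list<array{type: string, value: string}>
 */
function sketchTokenize(string $html): array
{
    // Split on anything that looks like a tag, keeping the tags themselves
    // and dropping the empty strings between adjacent tags.
    $parts = preg_split(
        '/(<[^>]+>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    ) ?: [];

    $tokens = [];
    foreach ($parts as $part) {
        if ($part[0] !== '<') {
            $type = 'text';
        } elseif (str_starts_with($part, '</')) {
            $type = 'close_tag';
        } else {
            $type = 'open_tag';
        }
        $tokens[] = ['type' => $type, 'value' => $part];
    }

    return $tokens;
}

// Print one token per line for the example string used above.
foreach (sketchTokenize('<div class="container"><p>Hello, world!</p></div>') as $token) {
    echo $token['type'], ': ', $token['value'], "\n";
}
```

The tokens html-ast actually produces may carry more detail (for example separate attribute tokens); use `print_r($tokens)` on the real lexer output to see their shape.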
Parsing
The AST parser converts the token list into a tree structure, where each node represents an HTML element, text node, or comment.
use Sinnbeck\HtmlAst\Ast\Parser;
// Create an AST parser instance with the tokens from the lexer
$ast = Parser::make($tokens);
// Parse tokens into an AST (node tree)
$nodes = $ast->parse();
// Optionally, inspect the node tree:
print_r($nodes);
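As an illustration of how a flat token list can become a tree, the sketch below nests tokens using a stack of currently open elements. It is a standalone example and not the package's parser: the `SketchNode` class, the `sketchBuildTree` function, and the `[type, value]` token shape are hypothetical, and void elements, attributes, and mismatched closing tags are not handled.

```php
<?php
declare(strict_types=1);

/** Hypothetical node type, NOT part of html-ast. */
final class SketchNode
{
    /** @var SketchNode[] */
    public array $children = [];

    public function __construct(
        public readonly string $name,          // tag name, or '#text' for text nodes
        public readonly ?string $text = null,  // text content for '#text' nodes
    ) {
    }
}

/**
 * Hypothetical helper, NOT part of html-ast.
 * Each token is assumed to be [type, value] with type 'open', 'close', or 'text'.
 */
function sketchBuildTree(array $tokens): SketchNode
{
    $root = new SketchNode('#root');
    $open = [$root]; // stack of elements that are opened but not yet closed

    foreach ($tokens as [$type, $value]) {
        $parent = end($open);

        if ($type === 'open') {
            // A new element becomes a child of the innermost open element
            // and is itself pushed as the new innermost element.
            $node = new SketchNode($value);
            $parent->children[] = $node;
            $open[] = $node;
        } elseif ($type === 'close') {
            // Closing a tag returns to the parent; the root is never popped.
            if (count($open) > 1) {
                array_pop($open);
            }
        } else {
            $parent->children[] = new SketchNode('#text', $value);
        }
    }

    return $root;
}

$tree = sketchBuildTree([
    ['open', 'div'],
    ['open', 'p'],
    ['text', 'Hello, world!'],
    ['close', 'p'],
    ['close', 'div'],
]);

echo $tree->children[0]->name, ' > ', $tree->children[0]->children[0]->name, "\n"; // prints "div > p"
```

The node classes html-ast returns from `parse()` are likely richer than this; inspect `$nodes` to see the real structure.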
Printing
The printer takes an HTML input or the resulting AST and renders it as neatly formatted HTML. This is useful for ensuring consistent formatting after transformations.
use Sinnbeck\HtmlAst\Printer;
// Create a Printer instance and render the HTML string
echo Printer::make($nodes)->render();
To indent every line by a number of extra levels, pass that number to render():
use Sinnbeck\HtmlAst\Printer;
// Indents everything by 1 extra indentation level
echo Printer::make($nodes)->render(1);
By default, the output is indented with 4 spaces.
This can be changed by calling ->indentWith():
use Sinnbeck\HtmlAst\Printer;
// Indents with tab instead of 4 spaces
echo Printer::make($nodes)->indentWith("\t")->render();
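The interaction between the indent string and the starting level can be sketched with a small recursive renderer. This is a standalone illustration, not the package's Printer: `sketchRender`, the array-based tree shape, and the choice to put text on its own line are assumptions, and the real output layout may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of html-ast.
 * A node is either a string (text) or ['tag' => string, 'children' => array].
 * $indent is the unit repeated once per nesting level; $level is the depth
 * at which the outermost node starts.
 */
function sketchRender(array|string $node, string $indent = '    ', int $level = 0): string
{
    $pad = str_repeat($indent, $level);

    if (is_string($node)) {
        return $pad . $node . "\n";
    }

    $out = $pad . '<' . $node['tag'] . '>' . "\n";
    foreach ($node['children'] as $child) {
        // Children sit one level deeper than their parent.
        $out .= sketchRender($child, $indent, $level + 1);
    }

    return $out . $pad . '</' . $node['tag'] . '>' . "\n";
}

$tree = ['tag' => 'div', 'children' => [
    ['tag' => 'p', 'children' => ['Hello, world!']],
]];

echo sketchRender($tree);          // 4-space indent, starting at level 0
echo sketchRender($tree, "\t", 1); // tab indent, everything shifted one level
```

In this sketch the starting level only adds a constant prefix to every line, while the indent string controls how much each nesting step adds.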
Testing
The repository includes tests under the tests directory, using Pest PHP as the testing framework and Symfony's VarDumper for debugging. To run tests, execute:
composer test
This command runs all tests to ensure the lexing, parsing, and printing functionalities work as expected.
Todo
- Add line numbers to tokens (Lexer)
- Introduce an HTML validator to ensure that the HTML structure conforms to expected standards
- Implement a node visitor pattern to allow modification or transformation of the AST
Contributing
Contributions to html-ast are welcome. If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes and add tests.
- Format all files:
./vendor/bin/pint
- Commit your changes:
git commit -am 'Add new feature'
- Push the branch:
git push origin feature/your-feature-name
- Open a pull request explaining your changes.
Please adhere to the coding standards and test all changes before submitting a pull request.
License
This project is licensed under the MIT License.