webfactory / html5-tagrewriter
A small library that uses a handler pattern to transform HTML documents, based on the PHP 8.4+ HTML5 parser and DOM extension
Requires
- php: >= 8.4
- ext-dom: *
Requires (Dev)
- phpunit/phpunit: ^12.5.6
Last update: 2026-01-26 10:27:45 UTC
A small library that uses a handler pattern to transform HTML documents. Based on the PHP 8.4+ HTML5 parser and DOM extension.
Useful for manipulating HTML5 documents in ways that are awkward at the point where the HTML output is generated (e.g. in a template engine like Twig), but rather trivial when looking at the final DOM.
Examples:
- Add `target="_blank"` and `rel="noopener"` to all external links
- Find all `<img>` elements in a page that have a `data-credits` attribute, and place all credits information in a section in the page footer
- Find all headings within the `<main>` section of the page, generate a table of contents with anchor links and place it at the beginning of the page
Usage
Basic Usage
```php
use Webfactory\Html5TagRewriter\Implementation\Html5TagRewriter;

$rewriter = new Html5TagRewriter();

// Process a complete HTML5 document
$html = '<!DOCTYPE html><html><body><p>Hello</p></body></html>';
$result = $rewriter->process($html);

// Process an HTML fragment
$fragment = '<p>Hello <strong>World</strong></p>';
$result = $rewriter->processBodyFragment($fragment);
```
Note
The processBodyFragment() method is currently limited: it can only process
HTML strings taken from within the <body> section. The reason is that the
HTML 5 parsing rules define different parsing states,
and the PHP DOM API for the HTML 5 parser does not currently expose
a (documented) way to create fragments and pass the required context information.
For correct results, limit its usage to fragments that are to be parsed
starting in the "in body" parsing state with the "data state" tokenization mode
active.
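To illustrate why the parsing context matters, here is a minimal sketch that uses only the PHP 8.4 DOM extension, not this library. The helper `parseAsBody()` is a hypothetical name introduced for the example; it wraps a fragment in a `<body>` and lets the HTML5 parser process it. A table cell is only meaningful in a table context, so in the "in body" state the parser drops the `<td>` tags and keeps just the text.

```php
<?php

// Hypothetical helper for illustration only; it is not part of the library.
// It parses $fragment as if it appeared directly inside <body>.
function parseAsBody(string $fragment): string
{
    // LIBXML_NOERROR suppresses the warnings PHP would otherwise raise
    // for HTML5 parse errors (such as a stray <td>).
    $document = Dom\HTMLDocument::createFromString(
        '<!DOCTYPE html><html><body>' . $fragment . '</body></html>',
        LIBXML_NOERROR
    );

    $body = $document->getElementsByTagName('body')->item(0);

    // Serialize only the <body> element.
    return $document->saveHtml($body);
}

// A fragment that is valid in body context keeps its markup.
echo parseAsBody('<p>Hello <strong>World</strong></p>'), "\n";

// A fragment that needs table context loses its <td> tags.
echo parseAsBody('<td>cell</td>'), "\n";
```

Fragments such as table rows, `<head>` content or the contents of `<script>` and `<textarea>` elements fall into the same category and are better processed as part of a complete document.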
Creating a Handler
Implement the RewriteHandler interface or extend BaseRewriteHandler to create custom tag transformations.
The BaseRewriteHandler provides empty default implementations, so you only need to override the methods you need:
```php
use Dom\Element;
use Webfactory\Html5TagRewriter\Handler\BaseRewriteHandler;

class ExternalLinkHandler extends BaseRewriteHandler
{
    public function appliesTo(): string
    {
        // XPath expression to match elements
        // Use 'html:' prefix for HTML5 elements, 'svg:' for SVG and 'mathml:' for MathML
        return '//html:a[@href]';
    }

    public function match(Element $element): void
    {
        $href = $element->getAttribute('href');
        if (str_starts_with($href, 'http')) {
            $element->setAttribute('target', '_blank');
            $element->setAttribute('rel', 'noopener');
        }
    }
}
```
Registering Handlers
```php
$rewriter = new Html5TagRewriter();
$rewriter->register(new ExternalLinkHandler());
$rewriter->register(new AnotherHandler());

$result = $rewriter->process($html);
```
XPath Namespaces
The following namespaces are pre-registered for XPath queries:
| Prefix | Namespace URI |
|---|---|
| html | http://www.w3.org/1999/xhtml |
| svg | http://www.w3.org/2000/svg |
| mathml | http://www.w3.org/1998/Math/MathML |
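The prefixes are needed because the PHP 8.4 HTML5 parser places elements in their proper namespaces, and an unprefixed name in XPath 1.0 only matches elements without a namespace. The following sketch shows the effect with the plain DOM extension. It registers the prefixes by hand, which is an assumption about what the library does internally rather than a description of its code.

```php
<?php

$document = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><body>'
    . '<a href="/x">link</a>'
    . '<svg><circle r="1"></circle></svg>'
    . '</body></html>',
    LIBXML_NOERROR
);

$xpath = new Dom\XPath($document);

// Same prefix-to-URI pairs as in the table above, registered manually here.
$xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');
$xpath->registerNamespace('svg', 'http://www.w3.org/2000/svg');

// Unprefixed: looks for <a> in no namespace, which the HTML5 parser never creates.
$unprefixed = $xpath->query('//a[@href]')->length;

// Prefixed: matches the element in the XHTML namespace.
$prefixed = $xpath->query('//html:a[@href]')->length;

// Embedded SVG content lives in the SVG namespace.
$circles = $xpath->query('//svg:circle')->length;

printf(
    "//a[@href]: %d, //html:a[@href]: %d, //svg:circle: %d\n",
    $unprefixed,
    $prefixed,
    $circles
);
```

If a handler's `appliesTo()` expression appears to match nothing, a missing `html:` prefix is a likely first thing to check.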
ESI Tag Support
The library handles Edge Side Includes (ESI) tags, converting empty ESI tags to self-closing format:
```php
// Input
'<esi:include src="url"></esi:include>'

// Output
'<esi:include src="url" />'
```
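As a rough illustration of this conversion, the sketch below collapses empty ESI elements in a serialized HTML string with a regular expression. The function `collapseEmptyEsiTags()` is hypothetical and only mimics the documented input and output; how the library actually performs the rewrite is not stated here, and the pattern does not handle attribute values that contain `<` or `>`.

```php
<?php

// Hypothetical helper for illustration; not the library's implementation.
function collapseEmptyEsiTags(string $html): string
{
    // Group 1: tag name (esi:something). Group 2: optional attribute string.
    // Only matches when the closing tag follows the opening tag immediately.
    return preg_replace(
        '~<(esi:[a-z]+)(\s[^<>]*?)?\s*></\1>~i',
        '<$1$2 />',
        $html
    );
}

echo collapseEmptyEsiTags('<esi:include src="url"></esi:include>'), "\n";

// ESI elements with content are left untouched.
echo collapseEmptyEsiTags('<esi:remove>fallback</esi:remove>'), "\n";
```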
Credits, Copyright and License
This library is based on internal work that we have been using at webfactory GmbH, Bonn, at least since 2012. However, that (old) code was written with the legacy PHP DOM extension, leading to several quirks in HTML processing and requiring the use of Polyglot HTML 5 which is processable as XML.
Copyright 2026 webfactory GmbH, Bonn. Code released under the MIT license.