edent / pretty-print-html
Pretty Print HTML using PHP.
Requires
- php: >=8.4
Requires (Dev)
- phpunit/phpunit: ^10.0
This package is auto-updated.
Last update: 2025-05-18 22:44:26 UTC
README
This takes hard-to-read HTML like:
<!doctype html><html><head><meta charset="UTF-8"></head><body><div class="news main"><h1 id="top">Title</h1><p>How <em>exciting</em>!</p></div>
And pretty-prints it:
<!doctype html>
<html>
    <head>
        <meta charset=UTF-8>
    </head>
    <body>
        <div class="main news">
            <h1 id=top>Title</h1>
            <p>How <em>exciting</em>!</p>
        </div>
    </body>
</html>
Functionality
- All elements correctly indented
  - This will add extra whitespace. You may need to alter your CSS to collapse it.
  - Conversely, if you have layout-significant whitespace outside of preserved elements, it may be lost.
- Whitespace preserved in <p>, <pre>, <q> and <textarea> blocks.
  - Also preserved in deprecated elements <listing>, <plaintext>, <xmp>.
  - No indenting takes place inside these elements and their children.
  - Attribute manipulations still take place on them and their children.
- No internal modifications made to <style> and <script> contents
  - Excess whitespace from the start and end will be trimmed.
  - This does not sanity check or prettify CSS or JavaScript.
- Attribute values are unquoted unless quoting is required
  - Single attribute: class=big.
  - Multiple attributes: class="big title".
  - Empty attribute: alt.
  - Whitespace in values is preserved.
- Attribute values will have newlines replaced with spaces
  - Except for id, title, and <textarea>'s placeholder.
- Attribute order will be alphabetically sorted
  - e.g. class=x height=1 id=x src=x width=2.
- All Unicode characters are preserved
  - Certain spacing and newline characters may be transformed into HTML entities.
  - Attribute values will have internal " transformed to &quot;.
  - In text, < and > will become &lt; and &gt;.
- Comments are preserved
  - No modification of comment contents.
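The attribute rules above can be sketched in a few lines of plain PHP. This is an illustration only, not the package's own code: the function name sketchFormatAttributes() is made up, and the set of characters that force quoting is an assumption based on the HTML rules for unquoted attribute values. It covers sorting, quoting, empty attributes and the " escape, and ignores the newline handling.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative sketch only: NOT the package's implementation.
 *
 * @param array<string, string> $attributes Attribute name => value.
 */
function sketchFormatAttributes(array $attributes): string
{
    // Attribute order is alphabetical by name.
    ksort($attributes, SORT_STRING);

    $parts = [];
    foreach ($attributes as $name => $value) {
        $name = (string) $name;

        if ($value === '') {
            // Empty attribute: emit the bare name, e.g. alt
            $parts[] = $name;
            continue;
        }

        // Internal double quotes become &quot;
        $escaped = str_replace('"', '&quot;', $value);

        // Assumption: quoting is required when the value contains
        // whitespace or any of " ' ` = < >
        $needsQuotes = preg_match('/[\s"\'`=<>]/', $value) === 1;

        $parts[] = $needsQuotes
            ? "{$name}=\"{$escaped}\""
            : "{$name}={$escaped}";
    }

    return implode(' ', $parts);
}

echo sketchFormatAttributes([
    'width' => '2', 'class' => 'x', 'src' => 'x', 'id' => 'x', 'height' => '1',
]), "\n";
// class=x height=1 id=x src=x width=2
```

The real formatter works on DOM nodes rather than arrays, so treat this as a way to reason about the expected output, not as a replacement.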
Installation
This package requires PHP 8.4 or higher due to its dependency on the new Dom\HTMLDocument class.
You can install this package via Composer:
composer require edent/pretty-print-html
Or you can import it manually with:
require_once "path/to/src/PrettyPrintHtml.php";
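If your code may run on older PHP versions, you might want to check for the requirement before loading the formatter. A minimal sketch, assuming a helper name (sketchCanPrettyPrint) that is not part of this package:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the package.
 * Returns true when the runtime is PHP 8.4+ and the Dom\HTMLDocument
 * class (provided by the DOM extension) is available.
 */
function sketchCanPrettyPrint(): bool
{
    return PHP_VERSION_ID >= 80400 && class_exists('Dom\\HTMLDocument');
}
```

You could then fall back to emitting the unformatted HTML when sketchCanPrettyPrint() returns false.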
Use
// HTML as a string:
$html = "<div>This is <span> an <em>example</em>";
// Or as a file:
$html = file_get_contents( "example.html" );
// Turn the HTML into a Dom\HTMLDocument
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );
// Create the pretty printer
$formatter = new Edent\PrettyPrintHtml\PrettyPrintHtml();
// Output the result
echo $formatter->serializeHtml( $dom );
Advanced Use
In order to be usefully customisable, serializeHtml() can take several named parameters. These are:
- keepWhitespace - boolean. Whether to keep the original whitespace. Default false.
  - Enabling this disables the indenting pretty-printer but keeps the attribute formatting.
- alphabetise - boolean. Whether to alphabetise attributes. Default true.
  - Disabling this preserves the order of the attributes when they were created.
- preserveElements - array. A list of elements where whitespace should be preserved.
  - By default, only the elements listed in $whitespace_sensitive_elements are preserved.
  - To preserve all elements, pass [""].
- rawAttributes - boolean. Whether to strip " from attributes. Default true.
  - Disabling this means all attribute values will be quoted, including the null string.
For example, to disable attribute sorting and only preserve whitespace in the <p> and <li> elements, call:
$formatter->serializeHtml(
    node: $dom,
    alphabetise: false,
    preserveElements: ["p", "li"]
);
To keep the original whitespace and ensure all attributes are quoted:
$formatter->serializeHtml(
    node: $dom,
    rawAttributes: false,
    keepWhitespace: true
);
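If you reuse the same combinations of parameters in several places, one option is to keep them in arrays and unpack them as named arguments. The sketch below is an assumption-laden illustration: the profile names and the sketchSerializeWithProfile() helper are made up, and it assumes that node is the first parameter of serializeHtml() and that the method returns a string, as the examples above suggest.

```php
<?php
declare(strict_types=1);

// Hypothetical option sets; the profile names are invented for this sketch.
// The keys are the named parameters documented above.
const SKETCH_PROFILES = [
    'default' => [],
    'minimal' => ['keepWhitespace' => true, 'rawAttributes' => false],
    'prose'   => ['alphabetise' => false, 'preserveElements' => ['p', 'li']],
];

/**
 * Serialize $dom using one of the option sets above.
 *
 * @param object $formatter Anything with a serializeHtml() method, e.g. PrettyPrintHtml.
 * @param object $dom       The parsed document, e.g. a Dom\HTMLDocument.
 */
function sketchSerializeWithProfile(object $formatter, object $dom, string $profile): string
{
    if (!array_key_exists($profile, SKETCH_PROFILES)) {
        throw new InvalidArgumentException("Unknown profile: {$profile}");
    }

    // String keys are unpacked as named arguments (PHP 8.1+).
    return $formatter->serializeHtml($dom, ...SKETCH_PROFILES[$profile]);
}
```

With the $formatter and $dom from the Use section, echo sketchSerializeWithProfile($formatter, $dom, 'minimal'); would be equivalent to the second call above.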
Development and Testing
Clone the repository
git clone https://gitlab.com/edent/pretty-print-html-using-php.git
cd pretty-print-html-using-php
Install dependencies
composer install
Run the tests
./vendor/bin/phpunit
Philosophy
A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
PHP's new Dom\HTMLDocument class produces syntactically valid HTML code. The code is very easy for a computer to parse. But because there is no indenting, the code is difficult for a human to parse.
Adding newlines and indents before every new element can introduce spacing errors when the HTML is rendered to screen. Some of these can be fixed with extra CSS, some cannot.
This pretty-printer attempts to strike a balance: it aims to produce code which is readable for humans whether it is rendered on screen or viewed as source code.
Why is human readability so important?
Today's heavily optimized websites have largely killed the "view source" learning experience. The code is minified, bundled, and often incomprehensible to beginners trying to understand how things work. […] I want anyone, regardless of skill level, to inspect elements, understand the structure, and learn from readable code. And I am fully aware my code isn’t perfect. It’s old and there’s a lot of room for improvement.
Using this pretty printer should give you and your users an excellent "view source" experience, without sacrificing the browser's ability to render the code.