edent/pretty-print-html

Pretty Print HTML using PHP.

2025-04-18 11:03 UTC

This package is auto-updated.

Last update: 2025-05-18 22:44:26 UTC


README

This takes hard-to-read HTML like:

<!doctype html><html><head><meta charset="UTF-8"></head><body><div class="news main"><h1 id="top">Title</h1><p>How <em>exciting</em>!</p></div>

And pretty-prints it:

<!doctype html>
<html>
	<head>
		<meta charset=UTF-8>
	</head>
	<body>
		<div class="main news">
			<h1 id=top>Title</h1>
			<p>How <em>exciting</em>!</p>
		</div>
	</body>
</html>

Functionality

  • All elements correctly indented
    • This will add extra whitespace. You may need to alter your CSS to collapse it.
    • Conversely, if you have layout significant whitespace outside of preserved elements, it may be lost.
  • Whitespace preserved in <p>, <pre>, <q> and <textarea> blocks.
    • Also preserved in deprecated elements <listing>, <plaintext>, <xmp>.
    • No indenting takes place inside these elements and their children.
    • Attribute manipulations still take place on them and their children.
  • No internal modifications made to <style> and <script> contents
    • Excess whitespace from the start and end will be trimmed.
    • This does not sanity check or prettify CSS or JavaScript.
  • Attribute values are unquoted unless quoting is required
    • Single value: class=big.
    • Multiple values: class="big title".
    • Empty attribute: alt.
    • Whitespace in values is preserved.
  • Attribute values will have newlines replaced with spaces
    • Except for id, title, and <textarea>'s placeholder.
  • Attribute order will be alphabetically sorted
    • e.g. class=x height=1 id=x src=x width=2.
  • All Unicode characters are preserved
    • Certain spacing and newline characters may be transformed into HTML entities.
    • Attribute values will have internal " transformed to &quot;.
    • In text, < and > will become &lt; and &gt;.
  • Comments are preserved
    • No modification of comment contents.

Installation

This package requires PHP 8.4 or higher due to its dependency on the new Dom\HTMLDocument class.

You can install this package via Composer:

composer require edent/pretty-print-html

Or you can import it manually with:

require_once "path/to/src/PrettyPrintHtml.php";
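
If the code may run on older PHP versions, a small guard can check the requirement before the printer is used. This is a minimal sketch; canUsePrettyPrinter() is a hypothetical name and not something the package provides.

```php
<?php
// Illustrative pre-flight check; canUsePrettyPrinter() is a hypothetical helper.
function canUsePrettyPrinter(): bool {
	//  PHP 8.4 introduced the Dom\HTMLDocument class in the DOM extension
	return version_compare( PHP_VERSION, "8.4.0", ">=" )
		&& class_exists( "Dom\\HTMLDocument" );
}

if ( !canUsePrettyPrinter() ) {
	echo "PHP 8.4+ with the DOM extension is required. Found PHP " . PHP_VERSION . PHP_EOL;
}
```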

Use

//  HTML as a string:
$html = "<div>This is <span> an <em>example</em>";
//  Or read it from a file instead:
//  $html = file_get_contents( "example.html" );

//  Turn the HTML into a Dom\HTMLDocument
$dom = \Dom\HTMLDocument::createFromString( $html, LIBXML_NOERROR, "UTF-8" );

//  Create the pretty printer
$formatter = new Edent\PrettyPrintHtml\PrettyPrintHtml();

//  Output the result
echo $formatter->serializeHtml( $dom );

Advanced Use

To customise the output, serializeHtml() accepts several named parameters.

These are:

  • keepWhitespace - boolean. Whether to keep the original whitespace. Default false.
    • Enabling this disables the indenting pretty-printer but keeps the attribute formatting.
  • alphabetise - boolean. Whether to alphabetise attributes. Default true.
    • Disabling this preserves the order in which the attributes were created.
  • preserveElements - array. A list of elements where whitespace should be preserved.
    • By default, only the elements listed in $whitespace_sensitive_elements are preserved.
    • To preserve all elements, pass [""].
  • rawAttributes - boolean. Whether to strip unnecessary " from attribute values. Default true.
    • Disabling this means all attribute values will be quoted, including the empty string.

For example, to disable attribute sorting and only preserve whitespace in the <p> and <li> elements, call:

$formatter->serializeHtml(
	node: $dom, 
	alphabetise: false,
	preserveElements: ["p", "li"]
);

To keep the original whitespace and ensure all attributes are quoted:

$formatter->serializeHtml(
	node: $dom, 
	rawAttributes: false,
	keepWhitespace: true
);

Development and Testing

  1. Clone the repository

    git clone https://gitlab.com/edent/pretty-print-html-using-php.git
    cd pretty-print-html-using-php
    
  2. Install dependencies

    composer install
    
  3. Run the tests

    ./vendor/bin/phpunit
    

Philosophy

As was written long ago:

A computer language is not just a way of getting a computer to perform operations but rather … it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.

PHP's new Dom\HTMLDocument class produces syntactically valid HTML code. The code is very easy for a computer to parse. But because there is no indenting, the code is difficult for a human to parse.

Adding newlines and indents before every new element can introduce spacing errors when the HTML is rendered to screen. Some of these can be fixed with extra CSS, some cannot.
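
To see why, consider how whitespace between inline elements is rendered. The sketch below only approximates a browser: renderedText() is a hypothetical helper which drops the tags and collapses each run of whitespace into a single space. It shows the same markup producing different text once newlines and tabs are added.

```php
<?php
// Rough approximation of inline text rendering, for illustration only.
// renderedText() is a hypothetical helper, not part of this package.
function renderedText( string $html ): string {
	//  Remove the tags, collapse whitespace runs, trim the ends
	return trim( preg_replace( '/\s+/', ' ', strip_tags( $html ) ) );
}

$compact  = "<p><span>pretty</span><span>print</span></p>";
$indented = "<p>\n\t<span>pretty</span>\n\t<span>print</span>\n</p>";

echo renderedText( $compact ), PHP_EOL;   //  prettyprint
echo renderedText( $indented ), PHP_EOL;  //  pretty print
```

The indented version gains a visible space between the two words, which is the kind of spacing change that CSS may or may not be able to undo.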

This pretty-printer attempts to strike a balance: its output should be readable for humans whether it is rendered on screen or viewed as source code.

Why is human readability so important?

As Ana Rodrigues said:

Today's heavily optimized websites have largely killed the "view source" learning experience. The code is minified, bundled, and often incomprehensible to beginners trying to understand how things work. […] I want anyone, regardless of skill level, to inspect elements, understand the structure, and learn from readable code. And I am fully aware my code isn’t perfect. It’s old and there’s a lot of room for improvement.

Using this pretty printer should give you and your users an excellent "view source" experience, without sacrificing the browser's ability to render the code.