ctw/ctw-middleware-tidy

This PSR-15 middleware formats, fixes and beautifies the HTML in the Response body using HTML Tidy.

4.0.4 2025-11-25 09:11 UTC

PSR-15 middleware that cleans, repairs, and formats HTML responses using PHP's Tidy extension for standards-compliant output.

Introduction

Why This Library Exists

HTML generated by template engines and dynamic content systems can contain markup errors, inconsistent formatting, and deprecated constructs. PHP's Tidy extension provides a powerful way to clean and repair HTML, ensuring browser compatibility and standards compliance.

This middleware applies Tidy processing to HTML responses with:

  • Error correction: Fixes malformed HTML, missing tags, and improper nesting
  • HTML5 compliance: Ensures valid HTML5 doctype and structure
  • Compact output: Removes unnecessary whitespace while maintaining readability
  • UTF-8 handling: Proper encoding for international character support
  • Configurable behavior: Full control over Tidy's extensive options

Problems This Library Solves

  1. Malformed HTML: Template engines can produce invalid markup that breaks in some browsers
  2. Missing doctypes: Tidy's HTML5 mode sometimes strips the doctype; this middleware re-adds it
  3. Inconsistent whitespace: Source indentation creates bloated output
  4. Encoding issues: Mixed or incorrect character encoding causes display problems
  5. Legacy markup: Old HTML constructs need modernization for current standards

Where to Use This Library

  • Production applications: Ensure all HTML output is valid and standards-compliant
  • Legacy code modernization: Clean up HTML from older template systems
  • CMS and blog platforms: Fix user-submitted HTML content
  • API responses: Guarantee well-formed HTML fragments in API output
  • Development debugging: Catch HTML errors before they reach production

Design Goals

  1. Standards compliance: Output valid HTML5 documents
  2. Safe processing: Returns original HTML if Tidy processing fails
  3. Doctype preservation: Re-adds HTML5 doctype when needed
  4. Configurable options: Full access to Tidy's configuration parameters
  5. Statistics tracking: Appends compression/change statistics as HTML comment

Requirements

  • PHP 8.3 or higher
  • ext-tidy (PHP Tidy extension)
  • ctw/ctw-middleware ^4.0

Installation

Install by adding the package as a Composer requirement:

composer require ctw/ctw-middleware-tidy

Ensure the Tidy extension is enabled in your PHP configuration:

extension=tidy
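
To confirm that the extension is actually loaded in the runtime that serves your application (the CLI and web server can use different php.ini files), a check along these lines can be dropped into a bootstrap or health-check script. This is a minimal sketch; the helper name `tidyIsAvailable` is hypothetical and not part of this package.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of ctw/ctw-middleware-tidy):
// reports whether ext-tidy is loaded in the current PHP runtime.
function tidyIsAvailable(): bool
{
    return extension_loaded('tidy');
}

if (!tidyIsAvailable()) {
    // Write to STDERR so the warning does not end up in normal output.
    fwrite(STDERR, "ext-tidy is not loaded; check extension=tidy in php.ini\n");
}
```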

Usage Examples

Basic Pipeline Registration (Mezzio)

use Ctw\Middleware\TidyMiddleware\TidyMiddleware;

// In config/pipeline.php
$app->pipe(TidyMiddleware::class);

ConfigProvider Registration

// config/config.php
return [
    // ...
    \Ctw\Middleware\TidyMiddleware\ConfigProvider::class,
];

Default Configuration

The middleware uses sensible defaults optimized for HTML5:

[
    'char-encoding'    => 'utf8',
    'doctype'          => 'html5',
    'bare'             => true,
    'break-before-br'  => true,
    'indent'           => false,
    'indent-spaces'    => 0,
    'logical-emphasis' => true,
    'numeric-entities' => true,
    'quiet'            => true,
    'quote-ampersand'  => false,
    'tidy-mark'        => false,
    'uppercase-tags'   => false,
    'vertical-space'   => false,
    'wrap'             => 10000,
    'wrap-attributes'  => false,
    'write-back'       => true,
]
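
To illustrate how an options array like this is consumed by PHP's Tidy extension, the sketch below passes a subset of the defaults to `tidy::parseString()` and returns the repaired markup. It is not the middleware's actual implementation: the function name `repairHtml` is hypothetical, and the fallback to the original input is an assumption modelled on the "safe processing" design goal. Whether `'doctype' => 'html5'` is accepted depends on the libtidy version PHP was built against.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: repairs $html with ext-tidy using the given options.
// Returns the input unchanged when Tidy is unavailable or parsing fails.
function repairHtml(string $html, array $config): string
{
    if (!extension_loaded('tidy')) {
        return $html;
    }

    $tidy = new \tidy();
    if (!$tidy->parseString($html, $config, 'utf8')) {
        return $html;
    }
    $tidy->cleanRepair();

    $output = tidy_get_output($tidy);

    // Treat an empty result as a failure and keep the original markup.
    return $output !== '' ? $output : $html;
}

// A subset of the defaults listed above.
$config = [
    'char-encoding' => 'utf8',
    'doctype'       => 'html5',
    'indent'        => false,
    'tidy-mark'     => false,
    'quiet'         => true,
    'wrap'          => 10000,
];

// The unclosed <p> and missing <html>/<body> wrappers are repaired by Tidy.
echo repairHtml('<title>Demo</title><p>Hello', $config), PHP_EOL;
```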

Configuration Options

Option          Default   Description
char-encoding   utf8      Character encoding for input/output
doctype         html5     Document type declaration
bare            true      Strip Microsoft Office markup
indent          false     Indent block elements
indent-spaces   0         Spaces per indent level
wrap            10000     Line wrap column (high value = minimal wrapping)
tidy-mark       false     Don't add Tidy meta generator tag
quiet           true      Suppress non-essential output

Custom Configuration

Override defaults via factory configuration:

// config/autoload/tidy.global.php
return [
    'tidy_middleware' => [
        'char-encoding' => 'utf8',
        'doctype'       => 'html5',
        'indent'        => true,
        'indent-spaces' => 2,
        'wrap'          => 120,
    ],
];
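
How the factory combines these overrides with the defaults is not shown here. Assuming a simple key-by-key override (an assumption, not confirmed behavior of this package), the effective options would look like the result of `array_replace()`:

```php
<?php

declare(strict_types=1);

// A subset of the defaults listed under "Default Configuration".
$defaults = [
    'char-encoding' => 'utf8',
    'doctype'       => 'html5',
    'indent'        => false,
    'indent-spaces' => 0,
    'wrap'          => 10000,
];

// Values from the 'tidy_middleware' key in the example above.
$overrides = [
    'indent'        => true,
    'indent-spaces' => 2,
    'wrap'          => 120,
];

// Assumed merge strategy: overrides win, untouched defaults are kept.
$effective = array_replace($defaults, $overrides);

var_export($effective);
```

Under this assumption, `doctype` and `char-encoding` keep their default values while `indent`, `indent-spaces`, and `wrap` take the overridden ones.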

Output Statistics

The middleware appends an HTML comment showing processing statistics:

<!-- html: in 15420 b | out 12336 b | diff 20.0000 % -->

Field   Description
in      Original HTML size in bytes
out     Processed HTML size in bytes
diff    Size reduction percentage
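
The numbers in the sample comment are consistent with `diff = (in - out) / in * 100`. The sketch below reproduces the comment under that assumption; the function name `buildStatsComment` is hypothetical, and the formula is inferred from the example rather than taken from the middleware's source.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: formats a statistics comment in the style shown above.
// The diff formula is inferred from the sample values, not from the package.
function buildStatsComment(int $bytesIn, int $bytesOut): string
{
    // Guard against division by zero for empty input.
    $diff = $bytesIn > 0 ? ($bytesIn - $bytesOut) / $bytesIn * 100 : 0.0;

    return sprintf(
        '<!-- html: in %d b | out %d b | diff %.4f %% -->',
        $bytesIn,
        $bytesOut,
        $diff
    );
}

echo buildStatsComment(15420, 12336), PHP_EOL;
// prints: <!-- html: in 15420 b | out 12336 b | diff 20.0000 % -->
```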

HTML5 Doctype Handling

Tidy may strip the HTML5 doctype during processing. This middleware automatically re-adds it when the doctype option is set to html5:

<!DOCTYPE html>
<html>
...
</html>
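
One way such a doctype fix-up could work is to check whether the processed markup already starts with a doctype and prepend one if it does not. This is a sketch of the idea only; `ensureHtml5Doctype` is a hypothetical name, and the middleware's actual logic may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: prepends the HTML5 doctype when none is present.
function ensureHtml5Doctype(string $html): string
{
    // Any existing doctype (case-insensitive) is left untouched.
    if (preg_match('/^\s*<!DOCTYPE\b/i', $html) === 1) {
        return $html;
    }

    return "<!DOCTYPE html>\n" . ltrim($html);
}

echo ensureHtml5Doctype('<html><body><p>Hi</p></body></html>'), PHP_EOL;
```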

Selective Processing

The middleware decides per response whether to run Tidy:

  • Processes only responses whose Content-Type is text/html or application/xhtml
  • Passes empty responses through unchanged
  • Returns the original HTML if Tidy processing fails
  • Skips non-HTML responses (JSON, images, etc.)
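
The content-type and empty-body rules above can be sketched as a small predicate. The function names are hypothetical, and matching `application/xhtml` as a prefix (so that `application/xhtml+xml` qualifies) is an assumption about how the check is implemented.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: true when the Content-Type header denotes HTML/XHTML.
function isHtmlContentType(string $contentType): bool
{
    // Drop parameters such as "; charset=UTF-8" and normalise case.
    $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));

    return $mediaType === 'text/html'
        || str_starts_with($mediaType, 'application/xhtml');
}

// Hypothetical helper: true when a response body should be sent through Tidy.
function shouldProcess(string $contentType, string $body): bool
{
    return $body !== '' && isHtmlContentType($contentType);
}

var_dump(shouldProcess('text/html; charset=UTF-8', '<p>Hi</p>')); // bool(true)
var_dump(shouldProcess('application/json', '{"ok":true}'));       // bool(false)
var_dump(shouldProcess('text/html', ''));                         // bool(false)
```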