ctw / ctw-middleware-tidy
This PSR-15 middleware formats, fixes and beautifies the HTML in the Response body using HTML Tidy.
pkg:composer/ctw/ctw-middleware-tidy
Requires
- php: ^8.3
- ext-tidy: *
- ctw/ctw-middleware: ^4.0
- psr/container: ^1.0 || ^2.0
Requires (Dev)
- ctw/ctw-qa: ^5.0
- phpunit/phpunit: ^12.0
- symfony/var-dumper: ^7.0
README
PSR-15 middleware that cleans, repairs, and formats HTML responses using PHP's Tidy extension for standards-compliant output.
Introduction
Why This Library Exists
HTML generated by template engines and dynamic content systems can contain markup errors, inconsistent formatting, and deprecated constructs. PHP's Tidy extension provides a powerful way to clean and repair HTML, ensuring browser compatibility and standards compliance.
This middleware applies Tidy processing to HTML responses with:
- Error correction: Fixes malformed HTML, missing tags, and improper nesting
- HTML5 compliance: Ensures valid HTML5 doctype and structure
- Compact output: Removes unnecessary whitespace while maintaining readability
- UTF-8 handling: Proper encoding for international character support
- Configurable behavior: Full control over Tidy's extensive options
Problems This Library Solves
- Malformed HTML: Template engines can produce invalid markup that breaks in some browsers
- Missing doctypes: Tidy's HTML5 mode sometimes strips the doctype; this middleware re-adds it
- Inconsistent whitespace: Source indentation creates bloated output
- Encoding issues: Mixed or incorrect character encoding causes display problems
- Legacy markup: Old HTML constructs need modernization for current standards
Where to Use This Library
- Production applications: Ensure all HTML output is valid and standards-compliant
- Legacy code modernization: Clean up HTML from older template systems
- CMS and blog platforms: Fix user-submitted HTML content
- API responses: Guarantee well-formed HTML fragments in API output
- Development debugging: Catch HTML errors before they reach production
Design Goals
- Standards compliance: Output valid HTML5 documents
- Safe processing: Returns original HTML if Tidy processing fails
- Doctype preservation: Re-adds HTML5 doctype when needed
- Configurable options: Full access to Tidy's configuration parameters
- Statistics tracking: Appends size and change statistics as an HTML comment
Requirements
- PHP 8.3 or higher
- ext-tidy (PHP Tidy extension)
- ctw/ctw-middleware ^4.0
Installation
Install by adding the package as a Composer requirement:
composer require ctw/ctw-middleware-tidy
Ensure the Tidy extension is enabled in your PHP configuration:
extension=tidy
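A quick way to confirm the extension is active is to ask PHP at runtime. The sketch below uses the built-in `extension_loaded()` function; the helper name `tidyExtensionStatus()` is hypothetical and not part of this package.

```php
<?php
// Hypothetical helper (not provided by ctw/ctw-middleware-tidy):
// reports whether the Tidy extension is available to this PHP runtime.
function tidyExtensionStatus(): string
{
    return extension_loaded('tidy')
        ? 'ext-tidy is loaded'
        : 'ext-tidy is missing; enable it with extension=tidy in php.ini';
}

echo tidyExtensionStatus(), PHP_EOL;
```

Running this with the same PHP binary and configuration that serves the application gives the most reliable answer, since CLI and web SAPIs can load different `php.ini` files.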
Usage Examples
Basic Pipeline Registration (Mezzio)
use Ctw\Middleware\TidyMiddleware\TidyMiddleware;

// In config/pipeline.php
$app->pipe(TidyMiddleware::class);
ConfigProvider Registration
// config/config.php
return [
    // ...
    \Ctw\Middleware\TidyMiddleware\ConfigProvider::class,
];
Default Configuration
The middleware uses sensible defaults optimized for HTML5:
[
'char-encoding' => 'utf8',
'doctype' => 'html5',
'bare' => true,
'break-before-br' => true,
'indent' => false,
'indent-spaces' => 0,
'logical-emphasis' => true,
'numeric-entities' => true,
'quiet' => true,
'quote-ampersand' => false,
'tidy-mark' => false,
'uppercase-tags' => false,
'vertical-space' => false,
'wrap' => 10000,
'wrap-attributes' => false,
'write-back' => true,
]
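To see what options like these do outside the middleware, they can be passed straight to PHP's Tidy extension. The following is a minimal sketch, not the middleware's actual implementation: `tidyHtml()` is a hypothetical helper, it uses only a subset of the defaults listed above, and falling back to the input on any failure is an assumption modelled on the "safe processing" design goal.

```php
<?php
// Hypothetical helper: runs ext-tidy over an HTML string with the given options.
// Returns the input unchanged if ext-tidy is unavailable or processing fails.
function tidyHtml(string $html, array $config): string
{
    if (!extension_loaded('tidy')) {
        return $html; // assumption: degrade gracefully without ext-tidy
    }

    try {
        $tidy = new tidy();
        // parseString() returns false when the input cannot be parsed.
        if ($tidy->parseString($html, $config, 'utf8') === false) {
            return $html;
        }
        $tidy->cleanRepair();
        $output = tidy_get_output($tidy);

        return $output !== '' ? $output : $html;
    } catch (\Throwable $e) {
        // e.g. an option not supported by the installed libtidy version
        return $html;
    }
}

// Subset of the documented defaults.
$config = [
    'char-encoding' => 'utf8',
    'doctype'       => 'html5',
    'indent'        => false,
    'tidy-mark'     => false,
    'wrap'          => 10000,
    'quiet'         => true,
];

echo tidyHtml('<title>Demo</title><p>Hello <b>world', $config), PHP_EOL;
```

The exact output depends on the installed libtidy version, so it is worth running the snippet locally before changing options in production.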
Configuration Options
| Option | Default | Description |
|---|---|---|
| `char-encoding` | `utf8` | Character encoding for input/output |
| `doctype` | `html5` | Document type declaration |
| `bare` | `true` | Strip Microsoft Office markup |
| `indent` | `false` | Indent block elements |
| `indent-spaces` | `0` | Spaces per indent level |
| `wrap` | `10000` | Line wrap column (high value = minimal wrapping) |
| `tidy-mark` | `false` | Don't add Tidy meta generator tag |
| `quiet` | `true` | Suppress non-essential output |
Custom Configuration
Override defaults via factory configuration:
// config/autoload/tidy.global.php
return [
    'tidy_middleware' => [
        'char-encoding' => 'utf8',
        'doctype'       => 'html5',
        'indent'        => true,
        'indent-spaces' => 2,
        'wrap'          => 120,
    ],
];
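How the factory combines these overrides with the defaults is not spelled out here. One plausible reading, shown below purely as an assumption, is a key-wise merge in which user-supplied values win and untouched defaults remain in effect. The variable names are illustrative only.

```php
<?php
// Subset of the documented defaults.
$defaults = [
    'char-encoding' => 'utf8',
    'doctype'       => 'html5',
    'indent'        => false,
    'indent-spaces' => 0,
    'wrap'          => 10000,
];

// Values a user might place under the 'tidy_middleware' key.
$overrides = [
    'indent'        => true,
    'indent-spaces' => 2,
    'wrap'          => 120,
];

// Assumption: string keys from $overrides replace the matching defaults.
$effective = array_merge($defaults, $overrides);

print_r($effective);
```

Under this assumption only the options that differ from the defaults need to be listed in the configuration file.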
Output Statistics
The middleware appends an HTML comment showing processing statistics:
<!-- html: in 15420 b | out 12336 b | diff 20.0000 % -->
| Field | Description |
|---|---|
| `in` | Original HTML size in bytes |
| `out` | Processed HTML size in bytes |
| `diff` | Size reduction percentage |
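The arithmetic behind the comment can be reproduced with a few lines of PHP. This is a sketch that matches the format shown above, not the middleware's own code: `tidyStatsComment()` is a hypothetical helper, and treating an empty input as a 0 % difference is an assumption.

```php
<?php
// Hypothetical helper: builds a statistics comment in the documented format.
function tidyStatsComment(string $original, string $processed): string
{
    $in  = strlen($original);   // size in bytes, not characters
    $out = strlen($processed);

    // Size reduction relative to the input; guard against division by zero.
    $diff = $in > 0 ? (1 - $out / $in) * 100 : 0.0;

    return sprintf('<!-- html: in %d b | out %d b | diff %.4f %% -->', $in, $out, $diff);
}

// Reproduces the example above: 15420 bytes in, 12336 bytes out.
echo tidyStatsComment(str_repeat('a', 15420), str_repeat('a', 12336)), PHP_EOL;
// prints: <!-- html: in 15420 b | out 12336 b | diff 20.0000 % -->
```

A negative `diff` would indicate that the processed HTML is larger than the original, which can happen when Tidy adds missing tags.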
HTML5 Doctype Handling
Tidy may strip the HTML5 doctype during processing. This middleware automatically re-adds it when the doctype option is set to html5:
<!DOCTYPE html>
<html>
...
</html>
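The re-adding step can be pictured as a simple prefix check on the processed output. The sketch below is an illustration under assumptions, not the package's code: `ensureHtml5Doctype()` is a hypothetical helper, and the check is case-insensitive and ignores leading whitespace.

```php
<?php
// Hypothetical helper: prepends the HTML5 doctype when the document lacks one.
function ensureHtml5Doctype(string $html): string
{
    // Any existing doctype (in any letter case) is left untouched.
    if (stripos(ltrim($html), '<!DOCTYPE') === 0) {
        return $html;
    }

    return "<!DOCTYPE html>\n" . $html;
}

echo ensureHtml5Doctype('<html><body></body></html>'), PHP_EOL;
```

Documents that already start with a doctype are returned byte-for-byte unchanged, so applying the helper more than once has no further effect.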
Selective Processing
The middleware automatically:
- Only processes responses with Content-Type: text/html or application/xhtml
- Passes through empty responses unchanged
- Returns original HTML if Tidy processing fails
- Skips non-HTML responses (JSON, images, etc.)
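The rules above amount to a small predicate on the response's Content-Type header and body. The following sketch shows one way to express it. `shouldTidy()` is a hypothetical helper, and matching `application/xhtml` as a prefix (so that `application/xhtml+xml` qualifies) is an assumption about how the check works.

```php
<?php
// Hypothetical helper: decides whether a response body should be passed to Tidy.
function shouldTidy(string $contentType, string $body): bool
{
    // Empty responses pass through unchanged.
    if ($body === '') {
        return false;
    }

    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mediaType = strtolower(trim(explode(';', $contentType)[0]));

    return $mediaType === 'text/html'
        || str_starts_with($mediaType, 'application/xhtml');
}

var_dump(shouldTidy('text/html; charset=utf-8', '<p>Hi</p>')); // bool(true)
var_dump(shouldTidy('application/json', '{"ok":true}'));       // bool(false)
var_dump(shouldTidy('text/html', ''));                         // bool(false)
```

In a PSR-15 middleware the two arguments would typically come from `$response->getHeaderLine('Content-Type')` and `(string) $response->getBody()`.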