kit-jotform / php-ftfy
Fixes text for you — PHP port of the Python ftfy library
Requires
- php: >=8.1
- ext-intl: *
- ext-mbstring: *
Requires (Dev)
- phpunit/phpunit: ^11.0
- squizlabs/php_codesniffer: ^4.0
README
A PHP 8.1+ text-fixing library based on the Python ftfy library (version 6.3.1) by Robyn Speer.
```php
use Ftfy\Ftfy;

echo Ftfy::fixText("(à¸‡'âŒ£')à¸‡"); // (ง'⌣')ง
```
What it does
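ftfy fixes mojibake: text that was encoded as UTF-8 but decoded as something else (Windows-1252, Latin-1, etc.), producing garbled characters. It also repairs related problems such as stray HTML entities, curly quotes, and terminal escape sequences, as listed under Configuration below.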
```php
use Ftfy\Ftfy;

// Fix common mojibake
Ftfy::fixText('âœ” No problems'); // ✔ No problems

// Fix multiple layers of mojibake
Ftfy::fixText('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.');
// "The Mona Lisa doesn't have eyebrows."

// Fix HTML entities outside of HTML
Ftfy::fixText('P&EACUTE;REZ'); // PÉREZ

// Correctly-decoded text is left unchanged
Ftfy::fixText('IL Y MARQUÉ…'); // IL Y MARQUÉ…
```
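To make the mechanism concrete, here is a minimal, self-contained sketch (not the library's code) of how Windows-1252 mojibake is produced and undone. The helpers `sloppyWindows1252Table()`, `misdecode()`, `undoOneLayer()` and `undoMojibake()` are hypothetical. Following Python ftfy's "sloppy" codec, the five bytes Windows-1252 leaves undefined fall back to their Latin-1 code points. Undoing one layer corresponds to the `encode: sloppy-windows-1252` / `decode: utf-8` steps that the CLI's `--explain` output reports; repeating it peels off multiple layers, as in the Mona Lisa example above.

```php
<?php
declare(strict_types=1);

/**
 * Byte => UTF-8 character table for a "sloppy" Windows-1252 (bytes >= 0x80).
 * The undefined bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D fall back to the
 * Latin-1 code point with the same number.
 *
 * @return array<int, string>
 */
function sloppyWindows1252Table(): array
{
    $cp1252 = [
        0x80 => "\u{20AC}", 0x82 => "\u{201A}", 0x83 => "\u{0192}", 0x84 => "\u{201E}",
        0x85 => "\u{2026}", 0x86 => "\u{2020}", 0x87 => "\u{2021}", 0x88 => "\u{02C6}",
        0x89 => "\u{2030}", 0x8A => "\u{0160}", 0x8B => "\u{2039}", 0x8C => "\u{0152}",
        0x8E => "\u{017D}", 0x91 => "\u{2018}", 0x92 => "\u{2019}", 0x93 => "\u{201C}",
        0x94 => "\u{201D}", 0x95 => "\u{2022}", 0x96 => "\u{2013}", 0x97 => "\u{2014}",
        0x98 => "\u{02DC}", 0x99 => "\u{2122}", 0x9A => "\u{0161}", 0x9B => "\u{203A}",
        0x9C => "\u{0153}", 0x9E => "\u{017E}", 0x9F => "\u{0178}",
    ];
    $table = [];
    for ($byte = 0x80; $byte <= 0xFF; $byte++) {
        // Latin-1 fallback: byte N becomes U+00NN, encoded as two UTF-8 bytes.
        $table[$byte] = $cp1252[$byte] ?? chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
    }
    return $table;
}

/** Simulate the bug: read UTF-8 bytes as if they were sloppy Windows-1252. */
function misdecode(string $bytes): string
{
    $table = sloppyWindows1252Table();
    $out = '';
    foreach (str_split($bytes) as $char) {
        $byte = ord($char);
        $out .= $byte < 0x80 ? $char : $table[$byte]; // ASCII maps to itself
    }
    return $out;
}

/**
 * Undo one layer: encode each character back to its Windows-1252 byte and keep
 * the result only if those bytes are valid UTF-8. Returns null otherwise.
 */
function undoOneLayer(string $text): ?string
{
    $toByte = array_flip(sloppyWindows1252Table()); // character => byte
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return null; // the input itself is not valid UTF-8
    }
    $bytes = '';
    foreach ($chars as $char) {
        if (strlen($char) === 1) {
            $bytes .= $char; // ASCII
        } elseif (isset($toByte[$char])) {
            $bytes .= chr($toByte[$char]);
        } else {
            return null; // character has no Windows-1252 byte
        }
    }
    return preg_match('//u', $bytes) === 1 ? $bytes : null;
}

/** Peel off layers until decoding fails or nothing changes. No heuristic check! */
function undoMojibake(string $text): string
{
    while (($fixed = undoOneLayer($text)) !== null && $fixed !== $text) {
        $text = $fixed;
    }
    return $text;
}
```

Unlike ftfy, this sketch never asks whether the text actually looks garbled; it "fixes" any string that happens to round-trip to valid UTF-8. For example, `undoMojibake('IL Y MARQUÉ…')` returns `'IL Y MARQUɅ'` because the bytes C9 85 form a valid UTF-8 character, whereas the README shows the library leaving that text unchanged. Python ftfy avoids such mistakes by scoring how "bad" text looks before accepting a decoding.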
Installing
```bash
composer require kit-jotform/php-ftfy
```
Requirements: PHP >= 8.1, ext-mbstring, ext-intl
Usage
Ftfy::fixText(string $text, ?TextFixerConfig $config = null): string
Apply all enabled fixes (mojibake, HTML entities, quotes, line breaks, and so on) to a string.
```php
use Ftfy\Ftfy;

// "Ã" followed by a no-break space (U+00A0)
$fixed = Ftfy::fixText("Ã\u{A0} perturber la réflexion"); // à perturber la réflexion
```
Ftfy::fixEncoding(string $text): string
Fix only encoding/mojibake issues, without applying other text fixes.
```php
use Ftfy\Ftfy;

$fixed = Ftfy::fixEncoding("l'humanitÃ©"); // l'humanité
```
Ftfy::needsFix(string $text, ?TextFixerConfig $config = null): bool
A fast dry run that checks whether text needs fixing without performing any corrections. Use it as a gate before fixText() on hot paths: it is 10 to 26 times faster than fixText(), depending on the input.
```php
use Ftfy\Ftfy;

if (Ftfy::needsFix($text)) {
    $text = Ftfy::fixText($text);
}

// Clean text exits almost instantly
Ftfy::needsFix('Hello world');   // false
Ftfy::needsFix('Héllo wörld');   // false

// Detects all fixable issues
Ftfy::needsFix('schÃ¶n');        // true (mojibake)
Ftfy::needsFix('&amp; test');    // true (HTML entity)
Ftfy::needsFix("\u{201C}test");  // true (curly quotes)
```
needsFix() respects TextFixerConfig, so disabled fixers are skipped:
```php
use Ftfy\Ftfy;
use Ftfy\TextFixerConfig;

$config = new TextFixerConfig(uncurlQuotes: false);
Ftfy::needsFix("\u{201C}test", $config); // false
```
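Why can clean text be rejected so quickly? One cheap idea is a conservative byte-level pre-filter. The sketch below is hypothetical (`isTriviallyClean()` is not part of the library) and assumes the port's fixers mirror Python ftfy's: mojibake, curly quotes, ligatures and fullwidth forms all require non-ASCII bytes, HTML entities require `&`, and terminal escapes, `\r` line breaks and other control characters fall outside printable ASCII. Text that is printable ASCII plus tab and newline, with no `&`, therefore cannot be changed by any of them.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-filter: true only for printable ASCII (plus tab and
 * newline) that contains no '&'. A false result means "run the full
 * check", not "needs fixing": 'Héllo wörld' is false here yet clean.
 */
function isTriviallyClean(string $text): bool
{
    // \x09 tab, \x0A newline, \x20-\x25 and \x27-\x7E: printable ASCII minus '&' (\x26)
    return preg_match('/\A[\x09\x0A\x20-\x25\x27-\x7E]*\z/', $text) === 1;
}
```

In a hot loop you might call it first and fall through to `Ftfy::needsFix()` only when it returns false, though the README's note that clean text already exits almost instantly suggests the gain may be small.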
Ftfy::fixAndExplain(string $text, ?TextFixerConfig $config = null): array
Returns ['text' => string, 'explanation' => array] with the fixed text and a list of changes made.
```php
['text' => $fixed, 'explanation' => $explanation] = Ftfy::fixAndExplain('âœ” No problems');
// $fixed       => '✔ No problems'
// $explanation => [['name' => 'fix_encoding', 'cost' => 1, ...]]
```
Configuration
```php
use Ftfy\Ftfy;
use Ftfy\TextFixerConfig;

$config = new TextFixerConfig(
    unescapeHtml: 'auto',          // 'auto', true, or false: decode HTML entities
    removeTerminalEscapes: true,   // strip ANSI terminal escape sequences
    fixEncoding: true,             // fix mojibake
    restoreByteA0: true,           // treat a plain space as byte 0xA0 when fixing mojibake
    replaceLossySequences: true,   // replace partially lost mojibake sequences with U+FFFD
    decodeInconsistentUtf8: true,  // fix mojibake mixed into otherwise correct text
    fixC1Controls: true,           // replace C1 control characters with their Windows-1252 characters
    fixLatinLigatures: true,       // expand Latin ligatures (ﬁ → fi)
    fixCharacterWidth: true,       // normalize fullwidth characters
    uncurlQuotes: true,            // straighten curly quotes (‘ ’ “ ” → ' ")
    fixLineBreaks: true,           // normalize line breaks to \n
    fixSurrogates: true,           // fix surrogate characters
    removeControlChars: true,      // remove control characters
    normalization: 'NFC',          // Unicode normalization form (NFC, NFD, NFKC, NFKD, or null)
);

$fixed = Ftfy::fixText($garbled, $config);
```
Use $config->with(uncurlQuotes: false) to produce a modified copy.
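Returning a modified copy from `with()` is the familiar "wither" pattern for immutable objects. The sketch below shows one way to write it in PHP 8.1 using readonly promoted properties and named-argument spreading. `DemoConfig` and its three options are hypothetical; this is not TextFixerConfig's actual source.

```php
<?php
declare(strict_types=1);

final class DemoConfig
{
    public function __construct(
        public readonly bool $uncurlQuotes = true,
        public readonly bool $fixLineBreaks = true,
        public readonly ?string $normalization = 'NFC',
    ) {}

    /** Return a copy with the given named options replaced; $this is untouched. */
    public function with(mixed ...$changes): self
    {
        $args = get_object_vars($this); // property name => current value
        foreach ($changes as $name => $value) {
            if (!array_key_exists($name, $args)) {
                throw new InvalidArgumentException("Unknown option: {$name}");
            }
            $args[$name] = $value;
        }
        // PHP 8.1 spreads string keys as named constructor arguments.
        return new self(...$args);
    }
}
```

Because the properties are readonly, the only way to "change" an option is to build a new instance, so a config shared between callers cannot be altered behind their backs.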
Note on large inputs: Internally, regex matching uses chunked processing for inputs larger than 8 KB to avoid hitting PCRE backtracking/recursion limits. No configuration is needed — this is handled automatically.
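The README does not say how the chunk boundaries are chosen. As a hypothetical illustration only, `splitUtf8Chunks()` below cuts a string into pieces of at most 8192 bytes, preferring to cut just after a space or newline and never cutting inside a multi-byte UTF-8 character, so every piece stays valid UTF-8 and can be matched with PCRE's `u` modifier.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's code): split $text into chunks of at
 * most $maxBytes bytes, cutting after whitespace when possible and otherwise
 * backing up to a UTF-8 character boundary.
 *
 * @return list<string>
 */
function splitUtf8Chunks(string $text, int $maxBytes = 8192): array
{
    $chunks = [];
    $offset = 0;
    $length = strlen($text);

    while ($length - $offset > $maxBytes) {
        $window = substr($text, $offset, $maxBytes);

        // Prefer the last space or newline in the window, so words stay whole.
        $cut = max((int) strrpos($window, ' '), (int) strrpos($window, "\n"));
        if ($cut > 0) {
            $cut++; // keep the whitespace at the end of the left-hand chunk
        } else {
            // No usable whitespace: step back while the byte at the cut is a
            // UTF-8 continuation byte (0b10xxxxxx), i.e. mid-character.
            $cut = $maxBytes;
            while ($cut > 0 && (ord($text[$offset + $cut]) & 0xC0) === 0x80) {
                $cut--;
            }
            if ($cut === 0) {
                $cut = $maxBytes; // malformed input: fall back to a hard cut
            }
        }

        $chunks[] = substr($text, $offset, $cut);
        $offset += $cut;
    }

    $chunks[] = substr($text, $offset);
    return $chunks;
}
```

Cutting at whitespace matters for a text fixer because a mojibake sequence split across two chunks could no longer be recognized in either one; a real implementation may use a different rule.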
Command-line usage
A CLI script is included at bin/ftfy.
Fix a string directly:
```bash
php bin/ftfy "schÃ¶n"   # schön
```
Pipe from stdin:
```bash
echo "Hello &amp; world" | php bin/ftfy   # Hello & world
```
Fix a file:
```bash
php bin/ftfy --file input.txt
```
Show what was fixed (explanation goes to stderr):
```bash
php bin/ftfy --explain "schÃ¶n"
# schön
#
# explanation:
#   - encode: sloppy-windows-1252
#   - decode: utf-8
```
Check if text needs fixing (exit code 1 = needs fix):
```bash
php bin/ftfy --needs-fix "schÃ¶n"   # true  (exit code 1)
php bin/ftfy --needs-fix "schön"     # false (exit code 0)
```
Override config options with -c key=value (repeatable):
```bash
php bin/ftfy -c uncurlQuotes=false "It’s great"
php bin/ftfy -c normalization=NFKC -c fixLineBreaks=false --file input.txt
```
Install globally (optional):
```bash
ln -s "$(pwd)/bin/ftfy" /usr/local/bin/ftfy
ftfy "schÃ¶n"
```
Options:
| Option | Short | Description |
|---|---|---|
| `--explain` | `-e` | Print what was fixed (to stderr) |
| `--needs-fix` | `-n` | Print true/false; exit 0 if no fix is needed, 1 if a fix is needed |
| `--file` | `-f` | Read input from a file |
| `--config key=val` | `-c` | Set a TextFixerConfig option (repeatable) |
| `--help` | `-h` | Show help |
Boolean config keys accept true/false/1/0: uncurlQuotes, fixEncoding, fixLineBreaks, fixSurrogates, removeControlChars, removeTerminalEscapes, restoreByteA0, replaceLossySequences, decodeInconsistentUtf8, fixC1Controls, fixLatinLigatures, fixCharacterWidth. Other keys: unescapeHtml (auto/true/false), normalization (NFC/NFKC/null), and maxDecodeLength (an integer).
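The typing rules above are easy to get subtly wrong (for example, `unescapeHtml` is tri-state). Here is a sketch of parsing a single `-c key=value` argument under exactly those rules; `parseConfigOverride()` is hypothetical and is not bin/ftfy's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for one "-c key=value" argument.
 *
 * @return array{0: string, 1: bool|int|string|null}
 */
function parseConfigOverride(string $pair): array
{
    $boolKeys = [
        'uncurlQuotes', 'fixEncoding', 'fixLineBreaks', 'fixSurrogates',
        'removeControlChars', 'removeTerminalEscapes', 'restoreByteA0',
        'replaceLossySequences', 'decodeInconsistentUtf8', 'fixC1Controls',
        'fixLatinLigatures', 'fixCharacterWidth',
    ];

    $parts = explode('=', $pair, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException("Expected key=value, got '{$pair}'");
    }
    [$key, $raw] = $parts;
    $lower = strtolower($raw);

    // true/false/1/0, compared case-insensitively via $lower
    $toBool = static fn (string $v): bool => match ($v) {
        'true', '1' => true,
        'false', '0' => false,
        default => throw new InvalidArgumentException("{$key} expects true/false/1/0"),
    };

    return match (true) {
        in_array($key, $boolKeys, true) => [$key, $toBool($lower)],
        $key === 'unescapeHtml' => [$key, $lower === 'auto' ? 'auto' : $toBool($lower)],
        $key === 'normalization' && $lower === 'null' => [$key, null],
        $key === 'normalization' && in_array(strtoupper($raw), ['NFC', 'NFKC'], true)
            => [$key, strtoupper($raw)],
        $key === 'maxDecodeLength' && preg_match('/\A\d+\z/', $raw) === 1 => [$key, (int) $raw],
        default => throw new InvalidArgumentException("Unknown key or invalid value: '{$pair}'"),
    };
}
```

Collecting the parsed pairs into a string-keyed array would let you pass them as named arguments, e.g. `new TextFixerConfig(...$overrides)`, assuming the constructor's parameter names match the CLI keys as they do in the Configuration example.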
Running tests
```bash
composer install
vendor/bin/phpunit tests/
```
Credits
- Original Python library: ftfy by Robyn Speer, licensed under Apache 2.0
- PHP port licensed under MIT