yarri / utf8-cleaner
Removes invalid UTF-8 characters from the given text
Requires
- php: >=5.3.0
Requires (Dev)
- atk14/tester: *
- atk14/translate: ^1.2
This package is auto-updated.
Last update: 2024-12-12 20:21:08 UTC
README
Removes invalid UTF-8 byte sequences from the given text.
Utf8Cleaner is inspired by https://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
Usage
Suppose you have a string that contains an invalid UTF-8 byte sequence.
$invalid_char = chr(200).chr(200); // invalid byte sequence for UTF-8
$malformed_text = "Příliš žluťoučk$invalid_char kůň";
$text = \Yarri\Utf8Cleaner::Clean($malformed_text);
By default, each invalid byte sequence is replaced with � (U+FFFD REPLACEMENT CHARACTER, usually rendered as a black diamond with a white question mark), the character used to stand in for an unknown, unrecognized or unrepresentable character.
echo $text; // "Příliš žluťoučk� kůň"
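To check whether a string needs cleaning at all, or to confirm that a cleaned result is well formed, PHP's built-in functions are enough. The following is a minimal sketch; `is_valid_utf8()` is a hypothetical helper introduced here for illustration and is not part of Utf8Cleaner. It relies on `preg_match()` returning `false` when the `u` modifier is used on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper, not part of Utf8Cleaner.
// With the /u modifier, preg_match() returns false for a subject that is not
// valid UTF-8, and 1 when the (empty) pattern matches a valid subject.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

$invalid_char = chr(200) . chr(200); // same invalid bytes as in the example above
$malformed_text = "Příliš žluťoučk$invalid_char kůň";

var_dump(is_valid_utf8($malformed_text));         // bool(false)
var_dump(is_valid_utf8("Příliš žluťoučký kůň"));  // bool(true)
```

A check like this can be used to skip cleaning for input that is already valid.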
The default replacement can be overridden, either with the replacement option or by passing the replacement string directly as the second argument.
$text = \Yarri\Utf8Cleaner::Clean($malformed_text,["replacement" => "?"]);
// or
$text = \Yarri\Utf8Cleaner::Clean($malformed_text,"?");
echo $text; // "Příliš žluťoučk? kůň"
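The README does not describe how the cleaning works internally. As a rough illustration of the general technique (byte-wise regular expression matching, in the spirit of the Stack Overflow question linked above), here is a stand-alone sketch. The function name `replace_invalid_utf8()` and its details are assumptions made for this example and should not be read as the Utf8Cleaner implementation. In particular, the sketch collapses a run of consecutive invalid bytes into a single replacement, which reproduces the output shown above for this input; the library's exact rules may differ.

```php
<?php
// Hypothetical stand-alone sketch; NOT the Utf8Cleaner implementation.
// The default replacement is U+FFFD encoded as UTF-8 (bytes EF BF BD).
function replace_invalid_utf8($text, $replacement = "\xEF\xBF\xBD")
{
    // One well-formed multi-byte UTF-8 character
    // (no overlong forms, no surrogates, nothing above U+10FFFF).
    $valid = '(?:[\xC2-\xDF][\x80-\xBF]'
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]'
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
           . '|[\xF1-\xF3][\x80-\xBF]{3}'
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2})';

    // Either a run of valid multi-byte characters (kept unchanged), or a run of
    // non-ASCII bytes that do not start a valid character (captured in group 1).
    // ASCII bytes are never matched, so they pass through untouched.
    // No /u modifier: the pattern must work on raw bytes.
    $pattern = '/' . $valid . '+|((?:(?!' . $valid . ')[\x80-\xFF])+)/';

    return preg_replace_callback($pattern, function ($m) use ($replacement) {
        // Group 1 is only present when the invalid-bytes alternative matched.
        return isset($m[1]) ? $replacement : $m[0];
    }, $text);
}

$malformed_text = "Příliš žluťoučk" . chr(200) . chr(200) . " kůň";
echo replace_invalid_utf8($malformed_text), "\n";      // Příliš žluťoučk� kůň
echo replace_invalid_utf8($malformed_text, "?"), "\n"; // Příliš žluťoučk? kůň
```

The pattern is deliberately applied without the `u` modifier, because in UTF-8 mode PCRE rejects an invalid subject outright instead of letting the pattern inspect its bytes. This sketch is meant for short strings; very long inputs may need chunking or a different approach.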
Installation
The best way to install Utf8Cleaner is with Composer:
composer require yarri/utf8-cleaner
Testing
composer update --dev
./vendor/bin/run_unit_tests test/
License
Utf8Cleaner is free software distributed under the terms of the MIT license.