tecnickcom / tc-lib-unicode
PHP library containing Unicode methods
Requires
- php: >=8.1
- ext-mbstring: *
- ext-pcre: *
- tecnickcom/tc-lib-unicode-data: ^2.0
Requires (Dev)
- pdepend/pdepend: ^2.16
- phpcompatibility/php-compatibility: ^10.0.0@dev
- phpmd/phpmd: ^2.15
- phpunit/phpunit: ^13.1 || ^12.5 || ^11.5 || ^10.5
- squizlabs/php_codesniffer: ^4.0
This package is auto-updated.
Last update: 2026-05-01 19:06:30 UTC
README
UTF-8 and Unicode processing utilities, including bidirectional text handling.
If this project is useful to you, please consider supporting development via GitHub Sponsors.
Overview
tc-lib-unicode provides Unicode conversion helpers and bidirectional algorithm support for robust multilingual text processing.
It targets text pipelines where normalization, code-point handling, and bidirectional ordering directly affect rendering quality. Isolating the Unicode-heavy operations in one library keeps text processing in dependent libraries accurate and easier to audit.
| Item | Value |
|---|---|
| Namespace | \Com\Tecnick\Unicode |
| Author | Nicola Asuni <info@tecnick.com> |
| License | GNU LGPL v3 - see LICENSE |
| API docs | https://tcpdf.org/docs/srcdoc/tc-lib-unicode |
| Packagist | https://packagist.org/packages/tecnickcom/tc-lib-unicode |
Features
Unicode Utilities
- UTF-8 character and ordinal conversion helpers
- String/character array transformations
- Integration-ready conversion methods for document engines
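The conversions listed above can be illustrated with PHP's own mbstring functions. This is a minimal sketch, not the library's API: `utf8ToCodepoints()` and `codepointsToUtf8()` are hypothetical helper names chosen for this example, and valid UTF-8 input is assumed.

```php
<?php
declare(strict_types=1);

/**
 * Split a UTF-8 string into an array of code points.
 * Hypothetical helper, not part of tc-lib-unicode.
 *
 * @return int[]
 */
function utf8ToCodepoints(string $text): array
{
    // mb_str_split() splits on characters rather than bytes.
    return array_map(
        static fn (string $chr): int => mb_ord($chr, 'UTF-8'),
        mb_str_split($text, 1, 'UTF-8')
    );
}

/**
 * Join an array of code points back into a UTF-8 string.
 * Hypothetical helper, not part of tc-lib-unicode.
 *
 * @param int[] $ords
 */
function codepointsToUtf8(array $ords): string
{
    return implode('', array_map(
        static fn (int $ord): string => mb_chr($ord, 'UTF-8'),
        $ords
    ));
}

// "a", "é" (U+00E9) and "€" (U+20AC) occupy 1, 2 and 3 bytes in UTF-8.
$ords = utf8ToCodepoints("a\u{E9}\u{20AC}");
echo implode(' ', array_map('dechex', $ords)), PHP_EOL;
echo codepointsToUtf8($ords), PHP_EOL;
```

Working on integer arrays rather than byte strings is what makes codepoint-level transforms such as reordering or composition straightforward.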
Bidirectional Support
- Unicode Bidirectional Algorithm implementation
- Right-to-left and mixed-direction text processing
- Supporting shaping/step logic for complex scripts
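One early step of the Unicode Bidirectional Algorithm is choosing a base direction from the first strong character of a paragraph. The sketch below approximates that step only, and it is not how the library implements it: `guessBaseDirection()` is a hypothetical helper, and it assumes that letters of the Hebrew and Arabic scripts are the only right-to-left strong characters, which ignores other right-to-left scripts and explicit directional controls.

```php
<?php
declare(strict_types=1);

/**
 * Guess the base direction ('L' or 'R') from the first letter in the text.
 * Hypothetical helper: a rough approximation, not the full algorithm.
 */
function guessBaseDirection(string $text): string
{
    foreach (mb_str_split($text, 1, 'UTF-8') as $chr) {
        // Skip digits, spaces, punctuation and combining marks.
        if (preg_match('/^\p{L}$/u', $chr) !== 1) {
            continue;
        }
        // Assumption: only Hebrew and Arabic letters are treated as right-to-left.
        return preg_match('/^[\p{Hebrew}\p{Arabic}]$/u', $chr) === 1 ? 'R' : 'L';
    }

    return 'L'; // no letter found: default to left-to-right
}

echo guessBaseDirection('hello'), PHP_EOL;
echo guessBaseDirection("\u{05E9}\u{05DC}\u{05D5}\u{05DD}"), PHP_EOL;
```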
Character Substitution
- Context-sensitive codepoint-level substitution via `Substitution::replaceChars()`
- Thai: repositions leading vowels (Sara E/AE/O/AI, U+0E40–U+0E44, U+0E4D) to follow their base consonant, matching PDF visual-order glyph streams
- Devanagari: moves left-positional matras (U+093F) to precede their base consonant cluster, including conjuncts joined by Virama (U+094D)
- Hangul: composes Hangul Jamo sequences (U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF) into precomposed syllables (U+AC00–U+D7A3) per Unicode Standard §3.12
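The Hangul composition mentioned above follows the arithmetic given in the Unicode Standard for conjoining Jamo. The sketch below shows that arithmetic for modern Jamo only (leading consonants U+1100–U+1112, vowels U+1161–U+1175, trailing consonants U+11A8–U+11C2). `composeHangul()` and the `HANGUL_*` constants are names invented for this example, and the library's own handling of the wider ranges may differ.

```php
<?php
declare(strict_types=1);

const HANGUL_S_BASE = 0xAC00; // first precomposed syllable
const HANGUL_L_BASE = 0x1100; // first leading consonant
const HANGUL_V_BASE = 0x1161; // first vowel
const HANGUL_T_BASE = 0x11A7; // one before the first trailing consonant
const HANGUL_L_COUNT = 19;
const HANGUL_V_COUNT = 21;
const HANGUL_T_COUNT = 28;

/**
 * Compose modern L+V and L+V+T Jamo sequences into precomposed syllables.
 * Hypothetical helper; all other code points pass through unchanged.
 *
 * @param int[] $ords
 * @return int[]
 */
function composeHangul(array $ords): array
{
    $ords = array_values($ords);
    $count = count($ords);
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $l = $ords[$i] - HANGUL_L_BASE;
        $v = ($i + 1 < $count) ? $ords[$i + 1] - HANGUL_V_BASE : -1;
        if ($l < 0 || $l >= HANGUL_L_COUNT || $v < 0 || $v >= HANGUL_V_COUNT) {
            $out[] = $ords[$i]; // not the start of an L+V pair
            continue;
        }
        $syllable = HANGUL_S_BASE + ($l * HANGUL_V_COUNT + $v) * HANGUL_T_COUNT;
        $i++; // consume the vowel
        $t = ($i + 1 < $count) ? $ords[$i + 1] - HANGUL_T_BASE : 0;
        if ($t > 0 && $t < HANGUL_T_COUNT) {
            $syllable += $t;
            $i++; // consume the trailing consonant
        }
        $out[] = $syllable;
    }

    return $out;
}

// U+1100, U+1161, U+11A8 compose to U+AC01.
echo dechex(composeHangul([0x1100, 0x1161, 0x11A8])[0]), PHP_EOL;
```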
Requirements
- PHP 8.1 or later
- Extensions: `mbstring`, `pcre`
- Composer
Installation
```bash
composer require tecnickcom/tc-lib-unicode
```
Quick Start
```php
<?php

require_once __DIR__ . '/vendor/autoload.php';

$bidi = new \Com\Tecnick\Unicode\Bidi('hello ', null, null, 'R', false);

echo $bidi->getString();
```
Character substitution
Substitution::replaceChars() takes an array of Unicode codepoints and returns a transformed array with script-specific substitutions applied. It is a pure codepoint-level transform with no font or PDF dependency.
```php
<?php

require_once __DIR__ . '/vendor/autoload.php';

$sub = new \Com\Tecnick\Unicode\Substitution();

// Thai: leading vowel repositioned after its base consonant
// Logical order: [U+0E40 SARA E, U+0E01 KO KAI]
// Visual order:  [U+0E01 KO KAI, U+0E40 SARA E]
$result = $sub->replaceChars([0x0E40, 0x0E01]);
// $result === [0x0E01, 0x0E40]

// Devanagari: left matra repositioned before its base consonant cluster
// Logical order: [U+0915 KA, U+093F VOWEL SIGN I]
// Visual order:  [U+093F VOWEL SIGN I, U+0915 KA]
$result = $sub->replaceChars([0x0915, 0x093F]);
// $result === [0x093F, 0x0915]

// Hangul: Jamo composed into a precomposed syllable
// [U+1100 KIYEOK, U+1161 JUNGSEONG A, U+11A8 JONGSEONG KIYEOK] → [U+AC01 각]
$result = $sub->replaceChars([0x1100, 0x1161, 0x11A8]);
// $result === [0xAC01]
```
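To make the Thai case concrete without the library, the following standalone sketch performs the same kind of adjacent swap on a codepoint array. It is an illustration under stated assumptions, not the library's implementation: `reorderThaiLeadingVowels()` is a hypothetical helper, it treats only U+0E40–U+0E44 as leading vowels and U+0E01–U+0E2E as consonants, and it does not handle U+0E4D.

```php
<?php
declare(strict_types=1);

/**
 * Move each Thai leading vowel to the position after the consonant that follows it.
 * Hypothetical helper; all other code points keep their position.
 *
 * @param int[] $ords
 * @return int[]
 */
function reorderThaiLeadingVowels(array $ords): array
{
    $out = array_values($ords);
    $last = count($out) - 1;
    for ($i = 0; $i < $last; $i++) {
        $isLeadingVowel = $out[$i] >= 0x0E40 && $out[$i] <= 0x0E44;
        $isConsonant = $out[$i + 1] >= 0x0E01 && $out[$i + 1] <= 0x0E2E;
        if ($isLeadingVowel && $isConsonant) {
            // Swap the pair, then step over the vowel so it is not moved again.
            [$out[$i], $out[$i + 1]] = [$out[$i + 1], $out[$i]];
            $i++;
        }
    }

    return $out;
}

// SARA E followed by KO KAI becomes KO KAI followed by SARA E.
echo implode(' ', array_map('dechex', reorderThaiLeadingVowels([0x0E40, 0x0E01]))), PHP_EOL;
```

Stepping over the moved vowel matters: without it, a vowel followed by two consonants would be swapped past both of them.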
Supported scripts and Unicode ranges
| Script | Unicode range(s) | Transformation |
|---|---|---|
| Thai | U+0E00–U+0E7F | Leading vowels repositioned after base consonant |
| Devanagari | U+0900–U+097F | Left matras repositioned before consonant cluster |
| Hangul Jamo | U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF | Jamo composed to precomposed syllables (U+AC00–U+D7A3) |
Codepoints belonging to unsupported scripts are passed through unchanged.
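The range table above can be read as a simple lookup: a code point either falls into one of the listed blocks or is left alone. The sketch below encodes exactly those ranges. `SCRIPT_RANGES` and `detectScript()` are hypothetical names for this example and say nothing about how the library dispatches internally.

```php
<?php
declare(strict_types=1);

// Ranges copied from the table above, as inclusive [start, end] pairs.
const SCRIPT_RANGES = [
    'Thai' => [[0x0E00, 0x0E7F]],
    'Devanagari' => [[0x0900, 0x097F]],
    'Hangul Jamo' => [[0x1100, 0x11FF], [0xA960, 0xA97F], [0xD7B0, 0xD7FF]],
];

/**
 * Return the script name for a code point, or null when it is outside
 * the listed ranges. Hypothetical helper.
 */
function detectScript(int $ord): ?string
{
    foreach (SCRIPT_RANGES as $script => $ranges) {
        foreach ($ranges as [$start, $end]) {
            if ($ord >= $start && $ord <= $end) {
                return $script;
            }
        }
    }

    return null;
}

// U+0041 is outside every listed range, so a transform would leave it untouched.
var_dump(detectScript(0x0E40), detectScript(0x0041));
```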
Development
```bash
make deps
make help
make qa
```
Packaging
```bash
make rpm
make deb
```
For system packages, bootstrap with:
```php
require_once '/usr/share/php/Com/Tecnick/Unicode/autoload.php';
```
Contributing
Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.
Contact
Nicola Asuni - info@tecnick.com