cybercog / php-unicode
PHP Unicode library
Requires
- php: ^8.1
- ext-mbstring: *
Requires (Dev)
- phpstan/phpstan: ^2.1
- phpstan/phpstan-strict-rules: ^2.0
- phpunit/phpunit: ^10.5 || ^11.0 || ^12.0
Suggests
- ext-intl: Required for Grapheme and GraphemeString classes (grapheme cluster support)
This package is auto-updated.
Last update: 2026-03-01 14:11:27 UTC
Introduction
Streamline the manipulation of Unicode strings, code points, and grapheme clusters with an object-oriented implementation.
The library provides two levels of abstraction:
- Code point level (CodePoint, UnicodeString): works with individual Unicode code points. Requires ext-mbstring.
- Grapheme level (Grapheme, GraphemeString): works with user-perceived characters (grapheme clusters). Requires ext-intl.
Requirements
| Class | Required Extensions |
|---|---|
| CodePoint | ext-mbstring |
| UnicodeString | ext-mbstring |
| Grapheme | ext-mbstring, ext-intl |
| GraphemeString | ext-mbstring, ext-intl |
PHP 8.1 or higher is required.
Installation
Pull in the package through Composer.
composer require cybercog/php-unicode
For grapheme cluster support, install the intl PHP extension.
Usage
Code Point
$codePoint = \Cog\Unicode\CodePoint::of('ÿ');
$codePoint = \Cog\Unicode\CodePoint::ofDecimal(255);
$codePoint = \Cog\Unicode\CodePoint::ofHexadecimal('U+00FF');
$codePoint = \Cog\Unicode\CodePoint::ofHtmlEntity('&yuml;');
$codePoint = \Cog\Unicode\CodePoint::ofXmlEntity('&#xFF;');
Represent Code Point in any format
$codePoint = \Cog\Unicode\CodePoint::of('ÿ');
echo strval($codePoint); // (string) "ÿ"
echo $codePoint->toDecimal(); // (int) 255
echo $codePoint->toHexadecimal(); // (string) "U+00FF"
echo $codePoint->toHtmlEntity(); // (string) "&yuml;"
echo $codePoint->toXmlEntity(); // (string) "&#xFF;"
Unicode String (code point level)
$string = \Cog\Unicode\UnicodeString::of('Hello');
A UnicodeString object contains a list of code points.
For example, the Unicode string "Hello" is represented by the code points:
- U+0048 (H)
- U+0065 (e)
- U+006C (l)
- U+006C (l)
- U+006F (o)
echo strval($string); // (string) "Hello"
$codePointList = $string->codePointList; // list<CodePoint>
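To make the code point list concrete, here is a minimal sketch that produces the same U+XXXX listing with ext-mbstring alone. The helper name `listCodePoints` is hypothetical and not part of the library; how the library builds its list internally may differ.

```php
/**
 * Hypothetical helper, not part of the library: lists the code points
 * of a UTF-8 string in U+XXXX notation using only ext-mbstring.
 *
 * @return list<string>
 */
function listCodePoints(string $text): array
{
    $result = [];

    // mb_str_split() splits by code point, not by byte.
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // mb_ord() returns the code point as an integer.
        $result[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }

    return $result;
}

echo implode(' ', listCodePoints('Hello')), PHP_EOL;
// U+0048 U+0065 U+006C U+006C U+006F
```

The library's value objects carry the same information, with each entry typed as a CodePoint rather than a plain string.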
Grapheme (grapheme cluster level)
Requires ext-intl.
$grapheme = \Cog\Unicode\Grapheme::of('👨‍👩‍👧‍👦');
echo strval($grapheme); // (string) "👨‍👩‍👧‍👦"
$codePointList = $grapheme->codePointList; // list<CodePoint>
Grapheme String (grapheme cluster level)
Requires ext-intl.
$string = \Cog\Unicode\GraphemeString::of('Ае👨‍👩‍👧‍👦');
$graphemeList = $string->graphemeList; // list<Grapheme>
// 'А', 'е', '👨‍👩‍👧‍👦': 3 graphemes (not 9 code points)
echo strval($string); // (string) "Ае👨‍👩‍👧‍👦"
Real-world examples
Convert a character to all supported formats
$codePoint = \Cog\Unicode\CodePoint::of('©');
echo $codePoint->toDecimal(); // 169
echo $codePoint->toHexadecimal(); // "U+00A9"
echo $codePoint->toHtmlEntity(); // "&copy;"
echo $codePoint->toXmlEntity(); // "&#xA9;"
Round-trip between entity formats
$cp = \Cog\Unicode\CodePoint::ofHtmlEntity('&hearts;');
echo $cp->toXmlEntity(); // "&#x2665;"
echo $cp->toDecimal(); // 9829
$cp2 = \Cog\Unicode\CodePoint::ofDecimal($cp->toDecimal());
echo strval($cp2); // "♥"
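For comparison, the same round trip can be sketched with PHP built-ins only. The function name `htmlEntityToXmlEntity` is hypothetical, and the hexadecimal numeric reference format (`&#x...;`) for the XML entity is an assumption made here; the sketch performs no validation of its input.

```php
/**
 * Hypothetical helper, not part of the library: converts a named HTML
 * entity into a hexadecimal numeric character reference.
 */
function htmlEntityToXmlEntity(string $htmlEntity): string
{
    // Decode the named entity into a single UTF-8 character.
    $char = html_entity_decode($htmlEntity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Format the character's code point as a hexadecimal numeric reference.
    return sprintf('&#x%X;', mb_ord($char, 'UTF-8'));
}

echo htmlEntityToXmlEntity('&hearts;'), PHP_EOL; // &#x2665;
echo mb_ord(html_entity_decode('&hearts;', ENT_QUOTES | ENT_HTML5, 'UTF-8'), 'UTF-8'), PHP_EOL; // 9829
```

Each step here passes a raw string or integer to the next call, which is the chaining that a single CodePoint object replaces.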
Inspect code points in a string
$string = \Cog\Unicode\UnicodeString::of('café');
foreach ($string->codePointList as $cp) {
    echo $cp->toHexadecimal() . ' ';
}
// U+0063 U+0061 U+0066 U+00E9
Code points vs. graphemes — why it matters
// Flag emoji: 2 code points, but 1 visible character
$flag = \Cog\Unicode\UnicodeString::of('🇦🇶');
echo count($flag->codePointList); // 2
$flag = \Cog\Unicode\GraphemeString::of('🇦🇶');
echo count($flag->graphemeList); // 1

// Family emoji: 7 code points (persons + ZWJ), 1 visible character
$family = \Cog\Unicode\GraphemeString::of('👨‍👩‍👧‍👦');
echo count($family->graphemeList); // 1
$familyGrapheme = $family->graphemeList[0];
echo count($familyGrapheme->codePointList); // 7
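The same distinction can be observed without the library. The sketch below counts code points with `mb_strlen` and counts grapheme clusters with PCRE's `\X`, which is used here as a stand-in so the example runs without ext-intl. Both function names are hypothetical, and the library itself relies on ext-intl for grapheme support.

```php
/** Hypothetical helper: number of Unicode code points in a UTF-8 string. */
function countCodePoints(string $text): int
{
    return mb_strlen($text, 'UTF-8');
}

/** Hypothetical helper: number of extended grapheme clusters in a UTF-8 string. */
function countGraphemes(string $text): int
{
    // \X matches exactly one extended grapheme cluster in UTF-8 mode.
    return (int) preg_match_all('/\X/u', $text);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
$text = "e\u{0301}";

echo countCodePoints($text), PHP_EOL; // 2
echo countGraphemes($text), PHP_EOL;  // 1
```

Results for emoji sequences such as flags or ZWJ families can depend on the Unicode version supported by the underlying PCRE or ICU build, so the combining-accent case is used here as the stable example.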
Detect combining marks
$acute = \Cog\Unicode\CodePoint::of("\u{0301}"); // combining acute accent
var_dump($acute->isCombining()); // bool(true)
$a = \Cog\Unicode\CodePoint::of('A');
var_dump($a->isCombining()); // bool(false)
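One way to approximate this check with built-ins is to test the character against the Unicode "Mark" general category. This is only a sketch: `isCombiningMark` is a hypothetical name, and the library's `isCombining()` may use a different definition of "combining".

```php
/**
 * Hypothetical helper, not part of the library: reports whether a single
 * UTF-8 character belongs to the Unicode "Mark" category (Mn, Mc, Me).
 */
function isCombiningMark(string $char): bool
{
    // \A and \z anchor the match so the input must be exactly one mark.
    return preg_match('/\A\p{M}\z/u', $char) === 1;
}

var_dump(isCombiningMark("\u{0301}")); // bool(true)
var_dump(isCombiningMark('A'));        // bool(false)
```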
Why this library?
PHP provides mb_* and grapheme_* functions, but they are procedural and operate on raw strings. This library wraps them in immutable, type-safe value objects with two key benefits:
- Two levels of abstraction. CodePoint/UnicodeString work with individual Unicode code points. Grapheme/GraphemeString work with user-perceived characters (grapheme clusters). Choose the right level for your use case instead of mixing mb_strlen and grapheme_strlen calls.
- Format conversion. CodePoint converts between character, decimal, hexadecimal (U+XXXX), HTML entity, and XML entity formats in a single object. No need to chain mb_ord, dechex, htmlentities manually.
// Procedural
$char = '©';
$dec = mb_ord($char);
$hex = 'U+' . strtoupper(sprintf('%04X', $dec));
$html = htmlentities($char, ENT_HTML5 | ENT_QUOTES);

// With this library
$cp = \Cog\Unicode\CodePoint::of('©');
$dec = $cp->toDecimal();
$hex = $cp->toHexadecimal();
$html = $cp->toHtmlEntity();
License
The PHP Unicode package is open-source software licensed under the MIT license by Anton Komarev.
About CyberCog
CyberCog is a Social Unity of enthusiasts. Researching the best solutions in product and software development is our passion.
