soloterm / grapheme
A PHP package to measure the width of unicode strings rendered to a terminal.
Requires
- php: ^8.1
- symfony/polyfill-intl-grapheme: ^1.27.0
- symfony/polyfill-intl-normalizer: ^1.27.0
- symfony/polyfill-mbstring: ^1.27.0
Requires (Dev)
- phpunit/phpunit: ^10.5|^11
Suggests
- ext-intl: For best performance
README
A highly optimized PHP library for calculating the display width of Unicode graphemes in terminal environments. Accurately determine how many columns a character will occupy in the terminal, including complex emoji, combining marks, and more. It also provides full-string and chunked grapheme segmentation so downstream renderers can share the same Unicode boundary logic.
This library was built to support Solo, your all-in-one Laravel command to tame local development.
Why Use This Library?
Building CLI applications can be challenging when it comes to handling modern Unicode text:
- Emoji and CJK characters take up 2 cells in most terminals
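- Zero-width characters (joiners, combining marks, etc.) occupy no columns, yet naive counting treats them as visible characters and gets the width wrong
- Complex text, such as emoji with skin tones or flags, requires special handling
- PHP's built-in functions don't fully address these edge cases: strlen() counts bytes and mb_strlen() counts code points, and neither reports terminal columns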
This library solves these problems by providing an accurate, performant, and thoroughly tested way to determine the display width of any character or grapheme cluster.
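To make the gap concrete, the sketch below compares three counts for the same string using only core PHP: bytes (`strlen`), code points (`.` under the `/u` modifier), and extended grapheme clusters (PCRE's `\X`). None of them is a column count, which is what this library adds. The helper name `countUnits` is hypothetical and not part of the package.

```php
<?php

// Hypothetical helper: compare byte, code point, and grapheme counts using
// only core PHP (strlen and PCRE in UTF-8 mode).
function countUnits(string $text): array
{
    return [
        'bytes' => strlen($text),
        // "." with /u matches one code point; /s lets it match newlines too
        'codePoints' => preg_match_all('/./su', $text),
        // \X matches one extended grapheme cluster
        'graphemes' => preg_match_all('/\X/u', $text),
    ];
}

// countUnits("e\u{0301}") gives ['bytes' => 3, 'codePoints' => 2, 'graphemes' => 1]
// countUnits("\u{6587}")  gives ['bytes' => 3, 'codePoints' => 1, 'graphemes' => 1],
// even though that CJK character fills 2 terminal columns
```

Grapheme counting gets the boundaries right, but it still says nothing about how many cells each cluster occupies.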
Installation
composer require soloterm/grapheme
Usage
use SoloTerm\Grapheme\Grapheme;

// Basic characters (width: 1)
Grapheme::wcwidth('a'); // Returns: 1
Grapheme::wcwidth('Я'); // Returns: 1

// East Asian characters (width: 2)
Grapheme::wcwidth('文'); // Returns: 2
Grapheme::wcwidth('あ'); // Returns: 2

// Emoji (width: 2)
Grapheme::wcwidth('😀'); // Returns: 2
Grapheme::wcwidth('🚀'); // Returns: 2

// Complex emoji with modifiers (width: 2)
Grapheme::wcwidth('👍🏻'); // Returns: 2
Grapheme::wcwidth('👨‍👩‍👧‍👦'); // Returns: 2

// Zero-width characters (width: 0)
Grapheme::wcwidth("\u{200B}"); // Returns: 0 (Zero-width space)

// Characters with combining marks (width: 1)
Grapheme::wcwidth('é'); // Returns: 1
Grapheme::wcwidth("e\u{0301}"); // Returns: 1 (e + combining acute)

// Special cases
Grapheme::wcwidth("⚠\u{FE0E}"); // Returns: 1 (Warning sign in text presentation)
Grapheme::wcwidth("⚠\u{FE0F}"); // Returns: 2 (Warning sign in emoji presentation)

// Empty string (width: 0)
Grapheme::wcwidth(''); // Returns: 0
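The two warning-sign cases differ only in the variation selector after U+26A0: U+FE0E (VS15) requests text presentation and U+FE0F (VS16) requests emoji presentation. A caller that wants to know which one a grapheme asks for can check for the selector directly. The helper `requestedPresentation` below is hypothetical and is not part of the package; it does not compute a width.

```php
<?php

// Hypothetical helper: report which presentation a grapheme requests via a
// variation selector. VS16 (U+FE0F) is checked first and means "emoji";
// VS15 (U+FE0E) means "text"; null means the grapheme carries neither.
function requestedPresentation(string $grapheme): ?string
{
    if (str_contains($grapheme, "\u{FE0F}")) {
        return 'emoji';
    }
    if (str_contains($grapheme, "\u{FE0E}")) {
        return 'text';
    }
    return null;
}

// requestedPresentation("\u{26A0}\u{FE0E}") returns 'text'
// requestedPresentation("\u{26A0}\u{FE0F}") returns 'emoji'
```

The selector only expresses a request; how wide the result is in a given terminal is the question `Grapheme::wcwidth()` answers.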
Segmentation
// Split a full string into grapheme clusters
Grapheme::split("e\u{0301}"); // Returns: ["é"]
Grapheme::split("\u{2764}\u{FE0F}"); // Returns: ["❤️"]
Grapheme::split('👨‍👩‍👧‍👦'); // Returns: ["👨‍👩‍👧‍👦"]
Grapheme::split('文A'); // Returns: ['文', 'A']
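Segmentation and width measurement combine naturally when aligning columns. The sketch below assumes a string's display width is the sum of its graphemes' widths (tabs and control characters would need extra handling). `displayWidth` and `padToWidth` are hypothetical helpers; the split and width functions are injected so the sketch runs on its own.

```php
<?php

// Hypothetical helper: total display width of a string, assuming it is the sum
// of the widths of its grapheme clusters.
function displayWidth(string $text, callable $split, callable $width): int
{
    $total = 0;
    foreach ($split($text) as $grapheme) {
        $total += $width($grapheme);
    }
    return $total;
}

// Hypothetical helper: right-pad $text with spaces so it fills $columns cells.
// Text that is already at least $columns wide is returned unchanged.
function padToWidth(string $text, int $columns, callable $split, callable $width): string
{
    $missing = $columns - displayWidth($text, $split, $width);
    return $missing > 0 ? $text . str_repeat(' ', $missing) : $text;
}

// With the package installed and autoloaded:
// use SoloTerm\Grapheme\Grapheme;
// padToWidth('文A', 6, Grapheme::split(...), Grapheme::wcwidth(...));
// If the widths are 2 and 1 as in the examples, this yields '文A' plus 3 spaces.
```

Padding by grapheme width rather than by `mb_strlen()` is what keeps a CJK or emoji cell from pushing the next column out of line.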
Streaming / Chunked Segmentation
splitChunk() keeps the trailing grapheme in carry, because the next chunk may still extend it (for example, a combining mark that arrives after its base letter). This keeps boundaries correct when text arrives in arbitrary byte chunks. Pass an empty chunk to flush the final grapheme at end of input. Invalid UTF-8 bytes are preserved as single-byte segments instead of throwing an exception.
$carry = '';
$graphemes = [];

// The combining accent arrives in a separate chunk from its base letter
foreach (["e", "\u{0301}"] as $chunk) {
    $result = Grapheme::splitChunk($carry, $chunk);
    $graphemes = [...$graphemes, ...$result['graphemes']];
    $carry = $result['carry'];
}

// Flush the carried grapheme at end of input
$result = Grapheme::splitChunk($carry, '');
$graphemes = [...$graphemes, ...$result['graphemes']]; // ["é"]
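Chunk boundaries matter even below the grapheme level: a byte chunk can end partway through a multi-byte UTF-8 sequence. The sketch below shows one general way to split off such an incomplete suffix by inspecting lead bytes. It is an illustration of the technique, not the package's implementation, and `splitIncompleteUtf8` is a hypothetical name.

```php
<?php

// Hypothetical helper: split $bytes into [complete, incompleteSuffix], where the
// suffix is a trailing multi-byte UTF-8 sequence that has started but not finished.
// Only the last 3 bytes need checking, since a UTF-8 sequence is at most 4 bytes.
function splitIncompleteUtf8(string $bytes): array
{
    $length = strlen($bytes);
    for ($back = 1; $back <= min(3, $length); $back++) {
        $byte = ord($bytes[$length - $back]);
        if (($byte & 0xC0) === 0x80) {
            continue; // continuation byte (10xxxxxx): keep looking for the lead byte
        }
        // Expected sequence length from the lead byte (simplified: some invalid
        // leads such as 0xC0 or 0xF5-0xF7 are not rejected here)
        $expected = match (true) {
            ($byte & 0xE0) === 0xC0 => 2, // 110xxxxx
            ($byte & 0xF0) === 0xE0 => 3, // 1110xxxx
            ($byte & 0xF8) === 0xF0 => 4, // 11110xxx
            default => 1,                 // ASCII or a byte that cannot start a sequence
        };
        if ($expected > $back) {
            // The sequence starting here still needs more bytes: hold it back
            return [substr($bytes, 0, $length - $back), substr($bytes, $length - $back)];
        }
        break;
    }
    return [$bytes, ''];
}

// splitIncompleteUtf8("a\xE6\x96") returns ['a', "\xE6\x96"] (two of the three bytes of U+6587)
```

A streaming reader could prepend the held-back suffix to the next chunk before segmenting, the byte-level counterpart of what carry does for graphemes.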
Cache Management
Results are cached automatically for performance. For long-running processes, you can manage the cache:
// Clear the cache to free memory
Grapheme::clearCache();

// Set maximum cache size (default: 10,000)
// Cache auto-clears when this limit is exceeded
Grapheme::setMaxCacheSize(5000);
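The cache policy described here, a size cap with a full clear once it is exceeded, can be sketched as a small memo class. `BoundedMemo` and its methods are hypothetical, not the package's internal cache, and the exact trigger is an assumption: the sketch clears just before an insert that would push it past the cap.

```php
<?php

// Hypothetical class: a memo cache that empties itself when it would grow past
// $maxSize entries. Illustration only, not the package's internal cache.
final class BoundedMemo
{
    /** @var array<string, int> */
    private array $entries = [];

    public function __construct(private int $maxSize = 10000)
    {
    }

    public function remember(string $key, callable $compute): int
    {
        if (array_key_exists($key, $this->entries)) {
            return $this->entries[$key]; // cache hit
        }
        if (count($this->entries) >= $this->maxSize) {
            $this->entries = []; // wholesale clear keeps memory bounded
        }
        return $this->entries[$key] = $compute($key);
    }

    public function size(): int
    {
        return count($this->entries);
    }
}
```

Clearing everything at once avoids per-entry bookkeeping such as LRU ordering, at the cost of a burst of recomputation right after each reset.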
Features
- Highly optimized for performance with byte-level fast paths and smart caching
- Memory safe for long-running processes with configurable cache limits
- Full-string and streaming segmentation with a single source of truth for grapheme boundaries
- Comprehensive Unicode support including:
  - CJK (Chinese, Japanese, Korean) characters
  - Emoji (including skin tone modifiers, gender modifiers, flags)
  - Zero-width characters and control codes
  - Combining marks and accents
  - Regional indicators and flags
  - Variation selectors
- Carefully tested against a wide range of Unicode characters and streaming boundary cases (200+ assertions)
- Minimal dependencies - only requires PHP 8.1+ and the Symfony polyfills; the intl extension is optional
- Compatible with most terminal environments
Terminal Compatibility
This library aims to match the behavior of wcwidth() in modern terminal emulators.
Requirements
- PHP 8.1 or higher
- The symfony/polyfill-intl-grapheme, symfony/polyfill-mbstring, and symfony/polyfill-intl-normalizer packages are included as dependencies
- The ext-intl extension is recommended for best performance
Under the Hood
The library uses a series of optimized patterns and checks to accurately determine character width:
- Byte-level fast paths - Single-byte ASCII, CJK (UTF-8 0xE4-0xE9), and emoji (UTF-8 0xF0 0x9F) are detected by examining raw bytes, avoiding expensive regex operations
- Smart caching - Results are cached with automatic size limiting to prevent memory growth in long-running processes
- Best-available Unicode segmentation - Valid UTF-8 text is segmented with native grapheme functions first, with regex fallback only if that backend is unavailable
- Chunk-safe UTF-8 handling - Streaming segmentation preserves incomplete UTF-8 suffixes and the trailing grapheme in carry
- Special handling for complex scripts like Devanagari, emoji variation selectors, and invisible joiners
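The byte-level fast paths can be pictured as a cheap routing step that runs before any regex. The sketch below uses the byte ranges listed above; `fastPathHint` is a hypothetical name, and it only picks a candidate path, because not every code point in these ranges is wide (lead bytes 0xE4-0xE9 cover U+4000-U+9FFF, and 0xF0 0x9F covers U+1F000-U+1FFFF), so a real implementation would still need a confirming check.

```php
<?php

// Hypothetical helper: route a single grapheme to a candidate path by its
// leading bytes, without decoding it or running a regex.
function fastPathHint(string $grapheme): string
{
    if ($grapheme === '') {
        return 'empty';
    }
    $first = ord($grapheme[0]);
    $length = strlen($grapheme);
    if ($length === 1 && $first < 0x80) {
        return 'ascii';           // single-byte ASCII
    }
    if ($length === 3 && $first >= 0xE4 && $first <= 0xE9) {
        return 'cjk-candidate';   // lone 3-byte code point in U+4000-U+9FFF
    }
    if ($length >= 4 && $first === 0xF0 && ord($grapheme[1]) === 0x9F) {
        return 'emoji-candidate'; // starts with a code point in U+1F000-U+1FFFF
    }
    return 'slow-path';           // combining marks, other scripts, and so on
}
```

This works because a UTF-8 lead byte alone fixes the high bits of the code point, so the range check needs no decoding.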
Performance benchmarks show ~1.6M uncached calls/sec and ~12M cached calls/sec on modern hardware.
Testing
composer test
composer benchmark
The test suite includes 200+ assertions covering extensive Unicode scenarios including ASCII, CJK, emoji, zero-width characters, variation selectors, complex ZWJ sequences, and chunked segmentation boundaries. Please feel free to add more.
Contributing
Contributions are welcome! Please feel free to submit a pull request.
License
The MIT License (MIT).
Support
This is free! If you want to support me:
- Check out my courses:
- Help spread the word about things I make
Related Projects
- Solo - All-in-one Laravel command for local development
- Screen - Pure PHP terminal renderer
- Dumps - Laravel command to intercept dumps
- Notify - PHP package for desktop notifications via OSC escape sequences
- Notify Laravel - Laravel integration for soloterm/notify
- TNotify - Standalone, cross-platform CLI for desktop notifications
- VTail - Vendor-aware tail for Laravel logs
Credits
Solo was developed by Aaron Francis. If you like it, please let me know!
- Twitter: https://twitter.com/aarondfrancis
- Website: https://aaronfrancis.com
- YouTube: https://youtube.com/@aarondfrancis
- GitHub: https://github.com/aarondfrancis/solo