jordanbrauer / runes
A low-level string character analysis library for PHP.
Requires
- php: >=7.1 || >=8.0
- ext-intl: *
Requires (Dev)
- ergebnis/composer-normalize: ^2.13
- friendsofphp/php-cs-fixer: ^2.18
- pestphp/pest: ^1.0
- phpunit/phpunit: ^9.3
- symfony/var-dumper: ^5.2
This package is auto-updated.
Last update: 2025-01-06 07:14:11 UTC
README
Runes
A low-level string character analysis library for PHP.
See the Compart Unicode documentation for useful information about Unicode characters that PHP does not handle yet.
Features:
- Per-character (multi-byte aware) analysis;
- Script detection, differentiating lookalike characters (see IDN homograph attack);
- Character encoding detection & conversion;
- Various serialization formats;
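To make the script-detection feature more concrete, here is a minimal sketch of the underlying idea. It uses only PHP's intl extension (`IntlChar`), not this library's API, and the helper name `scriptOf` is hypothetical. It looks up the Unicode script of a single character, which is one way to tell a Latin "a" from a Cyrillic "а" even though they render almost identically.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Runes): returns the long Unicode script
 * name of a single UTF-8 encoded character, using the intl extension.
 */
function scriptOf(string $char): string
{
    // IntlChar::ord() converts one UTF-8 character to its code point.
    $codepoint = IntlChar::ord($char);

    // Look up the numeric script code of that code point.
    $script = IntlChar::getIntPropertyValue($codepoint, IntlChar::PROPERTY_SCRIPT);

    // Translate the numeric script code into a readable name.
    $name = IntlChar::getPropertyValueName(
        IntlChar::PROPERTY_SCRIPT,
        $script,
        IntlChar::LONG_PROPERTY_NAME
    );

    return $name === false ? 'Unknown' : $name;
}

$latin = scriptOf('a');            // U+0061
$cyrillic = scriptOf("\u{0430}");  // U+0430, looks like a Latin "a"

printf("%s vs %s\n", $latin, $cyrillic);
```

Comparing the two returned names is enough to flag a mixed-script string; how Runes itself implements this is not shown here.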
Setup
Instructions on how to set this repository up for use in your own project, or as a developer contributing to this one.
Requirements
There are not many requirements for this library. All of them are host machine related.
- PHP >= 7.1;
- PHP Intl extension;
- libicu;
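If you are unsure whether a host meets these requirements, a short script along these lines can check them. This is only a sketch: the minimum version is taken from the list above, and the variable names are made up for illustration.

```php
<?php

// Minimum PHP version taken from the requirements list.
$phpOk = version_compare(PHP_VERSION, '7.1.0', '>=');

// The intl extension is PHP's binding to libicu.
$intlOk = extension_loaded('intl');

printf("PHP %s: %s\n", PHP_VERSION, $phpOk ? 'ok' : 'too old');
printf("ext-intl: %s\n", $intlOk ? 'loaded' : 'missing');

if ($intlOk) {
    // INTL_ICU_VERSION is defined by the intl extension and reports
    // the libicu version it was built against.
    printf("libicu: %s\n", INTL_ICU_VERSION);
}
```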
Installation
Install the package with Composer and load it through your autoloader.
composer require jordanbrauer/runes
For Contributors:
Clone the repository and install the development tools to begin running tests for your features & bug fixes.
git clone https://github.com/jordanbrauer/runes.git \
  && cd ./runes \
  && composer install
Usage
Using the library is super simple. For a quick example, let's analyze the ancient, yet strangely familiar, ᛒ from the Elder Futhark writing system!
use Rune\Rune;

$rune = new Rune('ᛒ');

dump($rune->toJson());
This would output the following data about the glyph:
{
    "bidirectionalClass": "L",
    "binary": "111000011001101110010010",
    "blockCode": 35,
    "bytes": 1,
    "category": "Lo",
    "codepoint": "U+16D2",
    "combiningClass": 0,
    "decimal": 14785426,
    "encoding": "UTF-8",
    "glyph": "ᛒ",
    "hex": "e19b92",
    "isMirrored": false,
    "name": "RUNIC LETTER BERKANAN BEORC BJARKAN B",
    "script": "Runic",
    "utf16": "0x16D2",
    "utf32": "0x000016D2",
    "utf8": "0xE1 0x9B 0x92",
    "version": "3.0.0.0"
}
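Several of the fields above are different views of the same three UTF-8 bytes. The sketch below reproduces the "hex", "decimal", "binary", "utf8" and "codepoint" values with plain PHP string functions. It is an illustration of how the numbers relate, not the library's implementation, and all variable names are assumptions.

```php
<?php

declare(strict_types=1);

$glyph = "\u{16D2}"; // ᛒ, encoded by PHP as the UTF-8 bytes E1 9B 92

// Raw UTF-8 bytes as lowercase hex: matches the "hex" field.
$hex = bin2hex($glyph);

// The same bytes read as one big-endian integer: matches "decimal".
$decimal = hexdec($hex);

// Each byte as 8 binary digits, concatenated: matches "binary".
$binary = '';
foreach (str_split($glyph) as $byte) {
    $binary .= sprintf('%08b', ord($byte));
}

// Each byte as "0xNN", space separated: matches "utf8".
$utf8 = implode(' ', array_map(function (string $byte): string {
    return sprintf('0x%02X', ord($byte));
}, str_split($glyph)));

// Decode the three-byte sequence by hand: 1110xxxx 10xxxxxx 10xxxxxx.
// Only the "x" payload bits contribute to the code point.
[$b0, $b1, $b2] = array_map('ord', str_split($glyph));
$codepoint = (($b0 & 0x0F) << 12) | (($b1 & 0x3F) << 6) | ($b2 & 0x3F);

printf("%s %d %s %s U+%04X\n", $hex, $decimal, $binary, $utf8, $codepoint);
// prints "e19b92 14785426 111000011001101110010010 0xE1 0x9B 0x92 U+16D2"
```

The hand-written decoding step only handles three-byte sequences; a general decoder would also need the one-, two- and four-byte cases.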
Motivation
Unicode is awesome. However, it can be the source of much pain for programmers. This tool aims to help alleviate said pain by providing a low-level, generic API that allows you to focus on the problem without getting bogged down with UTF-8 and its cousins.
Project Name
Originally, the name of this project was UTFH8: a tongue-in-cheek play on the UTF-8 encoding and the (very strong) English word "hate", using the number eight in place of the letters "a-t-e". It insinuated that at some point in every developer's career, they will say, "I hate Unicode".
Now, the project has been renamed to Runes, inspired by Go's rune type and the Elder Futhark: something less harsh and sinister sounding.