A low-level string character analysis library for PHP.

v0.1.1 2021-02-16 06:26 UTC

This package is auto-updated.

Last update: 2024-12-06 06:58:04 UTC




PHP Runes


Runes


See the Compart Unicode documentation for useful information about Unicode characters that PHP does not handle yet.

Features:

  • Per-character (multi-byte aware) analysis;
  • Script detection, differentiating lookalike characters (see IDN homograph attack);
  • Character encoding detection and conversion;
  • Various serialization formats.
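To make the script-detection feature concrete, here is a minimal sketch of the underlying idea using only PHP's built-in PCRE Unicode script properties. It is not the Runes API: the `scriptsOf` helper and its short script list are hypothetical, chosen purely for illustration.

```php
<?php
// Hypothetical helper, NOT part of the Runes API: it labels each character
// of a string with a script name using PCRE's Unicode script properties.
function scriptsOf(string $text): array
{
    // A small illustrative subset of scripts (an assumption for this sketch);
    // PCRE recognises many more.
    $scripts = ['Latin', 'Cyrillic', 'Greek', 'Runic'];
    $found = [];

    // The /u flag makes PCRE treat the subject as UTF-8, so the split
    // happens on character boundaries rather than on bytes.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $label = 'Other';
        foreach ($scripts as $script) {
            if (preg_match('/^\p{' . $script . '}$/u', $char) === 1) {
                $label = $script;
                break;
            }
        }
        $found[] = [$char, $label];
    }

    return $found;
}

// The first two characters are Cyrillic letters that resemble Latin "p" and "a".
$mixed = scriptsOf("\u{0440}\u{0430}ypal");

foreach ($mixed as [$char, $label]) {
    // Print the raw UTF-8 bytes next to the detected script.
    echo bin2hex($char), ' ', $label, PHP_EOL;
}
```

A string whose characters span more than one script, as in this example, is a common signal of a homograph attempt. The sketch does not validate its input; `preg_split` returns false on malformed UTF-8.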

Setup

Instructions on how to set this repository up for use in your own project, or as a developer contributing to this one.

Requirements

There are not many requirements for this library. All of them are host machine related.

Installation

Use Composer and your autoloader.

composer require jordanbrauer/runes

For Contributors:

Clone the repository and install the development tools to begin running tests for your features & bug fixes.

git clone https://github.com/jordanbrauer/runes.git \
  && cd ./runes \
  && composer install;

Usage

Using the library is simple. For a quick example, let's analyze the ancient, yet strangely familiar, ᛒ from the Elder Futhark writing system!

use Rune\Rune;

$rune = new Rune('ᛒ');

dump($rune->toJson());

This outputs the following data about the glyph:

{
  "bidirectionalClass": "L",
  "binary": "111000011001101110010010",
  "blockCode": 35,
  "bytes": 1,
  "category": "Lo",
  "codepoint": "U+16D2",
  "combiningClass": 0,
  "decimal": 14785426,
  "encoding": "UTF-8",
  "glyph": "ᛒ",
  "hex": "e19b92",
  "isMirrored": false,
  "name": "RUNIC LETTER BERKANAN BEORC BJARKAN B",
  "script": "Runic",
  "utf16": "0x16D2",
  "utf32": "0x000016D2",
  "utf8": "0xE1 0x9B 0x92",
  "version": "3.0.0.0"
}
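Several of these fields are different views of the same UTF-8 bytes. As a rough illustration, the sketch below reproduces a few of them with plain PHP functions. It is not how Runes computes them: the `codepointOf` helper is hypothetical, and it assumes a single, well-formed UTF-8 character as input.

```php
<?php
// Illustrative only: plain PHP, NOT the Runes implementation.

// Hypothetical helper: decode one well-formed UTF-8 character to its
// code point. No validation is performed (an assumption of this sketch).
function codepointOf(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $lead = $bytes[0];

    if ($lead < 0x80) {
        return $lead; // single-byte (ASCII) character
    }

    // The lead byte's high bits give the length of the sequence.
    $length = $lead >= 0xF0 ? 4 : ($lead >= 0xE0 ? 3 : 2);

    // Keep the payload bits of the lead byte, then append six bits
    // from each continuation byte.
    $codepoint = $lead & (0xFF >> ($length + 1));
    for ($i = 1; $i < $length; $i++) {
        $codepoint = ($codepoint << 6) | ($bytes[$i] & 0x3F);
    }

    return $codepoint;
}

$glyph = "\u{16D2}"; // RUNIC LETTER BERKANAN BEORC BJARKAN B

$hex = bin2hex($glyph);      // the raw UTF-8 bytes as hex
$decimal = hexdec($hex);     // the same bytes read as one integer
$binary = decbin($decimal);  // and as a bit string

// Format each byte pair as 0xNN (arrow functions need PHP 7.4+).
$utf8 = implode(' ', array_map(
    fn (string $byte): string => '0x' . strtoupper($byte),
    str_split($hex, 2)
));

$codepoint = sprintf('U+%04X', codepointOf($glyph));
$utf32 = sprintf('0x%08X', codepointOf($glyph));

echo json_encode(compact('hex', 'decimal', 'binary', 'utf8', 'codepoint', 'utf32'), JSON_PRETTY_PRINT), PHP_EOL;
```

The "decimal" and "binary" fields are therefore the encoded byte sequence interpreted as a number, not the code point itself, which is why they differ from "codepoint".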

Motivation

Unicode is awesome. However, it can be the source of much pain for programmers. This tool aims to alleviate that pain by providing a low-level, generic API that lets you focus on the problem without getting bogged down in UTF-8 and its cousins.

Project Name

Originally, the name of this project was UTFH8: a tongue-in-cheek play on the UTF-8 encoding and the (very strong) English word "hate", using the number eight in place of the letters "a-t-e". It insinuated that at some point in every developer's career, they will say, "I hate Unicode".

Now, the project has been renamed to Runes, inspired by Go's rune type and the Elder Futhark: something less harsh and sinister sounding.