covaleski/data-encoding

Data encoding and decoding.

v1.0.0 2023-10-22 23:04 UTC


Last update: 2024-05-23 00:40:01 UTC



License: MIT

A complete PHP implementation of RFC 4648. Provides methods for encoding and decoding data in multiple bases, including custom encodings.

Uses bitwise operations, which process data faster than equivalent string functions would.

Note that PHP already supports base16 and base64 natively: base16 through the bin2hex/hex2bin functions and base64 through the base64_encode/base64_decode functions.
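As a quick illustration of the native functions mentioned in the note above, the following sketch round-trips a short string through both. It uses only PHP built-ins; the variable names are arbitrary.

```php
<?php
// base16 with PHP built-ins. bin2hex() emits lowercase digits, whereas the
// RFC 4648 base16 alphabet is uppercase, so strtoupper() is applied here.
$hex = strtoupper(bin2hex('foo'));      // "666F6F"
$fromHex = hex2bin($hex);               // "foo" (hex2bin accepts either case)

// base64 with PHP built-ins. Passing true enables strict mode, so
// base64_decode() returns false on characters outside the alphabet.
$b64 = base64_encode('foo');            // "Zm9v"
$fromB64 = base64_decode($b64, true);   // "foo"

echo $hex, PHP_EOL, $b64, PHP_EOL;
```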

1 Installation

composer require covaleski/data-encoding

2 Usage

The following encoding styles - defined in RFC 4648 - are natively provided by this library:

  • base16:
    • Encoding with a case-insensitive 16-character alphabet;
    • Available through the Base16 class;
  • base32:
    • Encoding with a case-insensitive 32-character alphabet;
    • Available through the Base32 class;
  • base32hex:
    • Base32 with the extended hex alphabet;
    • Available through the Base32Hex class;
  • base64:
    • Encoding with a case-sensitive 64-character alphabet;
    • Available through the Base64 class;
  • base64url:
    • Base64 with a URL- and filename-safe alphabet;
    • Available through the Base64Url class.

You can easily extend the Encoder class to create custom encoding styles. See 2.3 Creating custom encodings.

2.1 Encoding

Encoding is available through the encode() static method in all encoder classes listed above.

use Covaleski\DataEncoding\Base32;

$encoded = Base32::encode('Dunder Mifflin Paper Company, Inc.');
// Will produce: IR2W4ZDFOIQE22LGMZWGS3RAKBQXAZLSEBBW63LQMFXHSLBAJFXGGLQ=
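
For intuition about what base32 encoding does at the bit level, here is a minimal sketch of the RFC 4648 base32 scheme using a bit buffer. It is an illustration only and is not taken from the library's source; base32EncodeSketch() is a hypothetical name.

```php
<?php
// Hypothetical helper for illustration; not part of covaleski/data-encoding.
function base32EncodeSketch(string $data): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    $buffer = 0; // pending bits that have not been emitted yet
    $bits = 0;   // how many pending bits $buffer currently holds
    $out = '';

    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        // Append the next input byte to the pending bits.
        $buffer = ($buffer << 8) | ord($data[$i]);
        $bits += 8;
        // Emit one character for every complete 5-bit group.
        while ($bits >= 5) {
            $bits -= 5;
            $out .= $alphabet[($buffer >> $bits) & 0x1F];
        }
        // Keep only the bits that have not been consumed.
        $buffer &= (1 << $bits) - 1;
    }

    // Left-align any remaining bits into a final 5-bit group.
    if ($bits > 0) {
        $out .= $alphabet[($buffer << (5 - $bits)) & 0x1F];
    }

    // Pad with "=" so the output length is a multiple of 8 characters.
    return $out . str_repeat('=', (8 - strlen($out) % 8) % 8);
}

echo base32EncodeSketch('foo'), PHP_EOL; // MZXW6===
```

The value MZXW6=== for 'foo' is one of the test vectors listed in RFC 4648.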

2.2 Decoding

Decoding is available through the decode() static method in all encoder classes listed above.

use Covaleski\DataEncoding\Base32;

$decoded = Base32::decode('4WGZPZF2VTULPLY=');
// Will produce: 南京路
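
The reverse direction can be sketched the same way: map each character back to its 5-bit index and emit a byte whenever 8 bits are available. This is a hedged illustration, not the library's implementation; base32DecodeSketch() is a hypothetical name, and throwing InvalidArgumentException on bad input is an assumption made for this example.

```php
<?php
// Hypothetical helper for illustration; not part of covaleski/data-encoding.
function base32DecodeSketch(string $encoded): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    // The base32 alphabet is case-insensitive: normalize, then drop padding.
    $encoded = rtrim(strtoupper($encoded), '=');
    $buffer = 0; // pending bits
    $bits = 0;   // number of pending bits
    $out = '';

    for ($i = 0, $n = strlen($encoded); $i < $n; $i++) {
        $index = strpos($alphabet, $encoded[$i]);
        if ($index === false) {
            throw new InvalidArgumentException("Invalid character at offset $i.");
        }
        // Append the 5-bit index to the pending bits.
        $buffer = ($buffer << 5) | $index;
        $bits += 5;
        // Emit a byte as soon as 8 bits are available.
        if ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($buffer >> $bits) & 0xFF);
            $buffer &= (1 << $bits) - 1;
        }
    }

    // Any leftover bits (fewer than 8) are filler and are discarded.
    return $out;
}

echo base32DecodeSketch('mzxw6==='), PHP_EOL; // foo
```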

2.3 Creating custom encodings

To create a custom encoding, extend the Encoder class and set the following static properties:

  • int $base:
    • Required;
    • Is the number of characters the alphabet should have;
    • Must be a power of 2;
  • array $alphabet:
    • Required;
    • Is a list with all the alphabet's characters;
    • Must have unique characters only;
    • Is indexed by values built from the bits of the input data;
  • bool $isCaseSensitive:
    • Required;
    • Defines whether the alphabet is case-sensitive;
    • Example: if not case-sensitive, decoding MZXW6=== and mzxw6=== would generate identical results;
  • string $alphabetEncoding:
    • Optional - defaults to "ASCII";
    • Is the character encoding used by the alphabet characters;
    • Non-ASCII alphabets MUST have their encoding explicitly defined;
    • Is used to build and split the encoded string;
  • string $paddingCharacter:
    • Optional - defaults to "=";
    • Is used to pad encoded strings;
    • Must not exist in the alphabet.
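
The constraints listed above can be checked before defining a subclass. The following is a minimal sketch of such a check; checkAlphabetSketch() is a hypothetical helper that is not part of the library, and whether the library performs similar validation itself is not stated here.

```php
<?php
// Hypothetical checker for illustration; not part of covaleski/data-encoding.
// Returns a list of violated constraints (empty when the settings look valid).
function checkAlphabetSketch(int $base, array $alphabet, string $padding = '='): array
{
    $errors = [];

    // A power of 2 has exactly one bit set, so $base & ($base - 1) is 0.
    if ($base < 2 || ($base & ($base - 1)) !== 0) {
        $errors[] = 'base must be a power of 2';
    }
    // The alphabet must contain exactly $base characters.
    if (count($alphabet) !== $base) {
        $errors[] = 'alphabet size must equal base';
    }
    // Every alphabet character must be unique.
    if (count(array_unique($alphabet)) !== count($alphabet)) {
        $errors[] = 'alphabet characters must be unique';
    }
    // The padding character must not appear in the alphabet.
    if (in_array($padding, $alphabet, true)) {
        $errors[] = 'padding character must not exist in the alphabet';
    }

    return $errors;
}

var_dump(checkAlphabetSketch(4, ['A', 'B', 'C', 'D'])); // empty array
```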

Example with ASCII alphabet:

use Covaleski\DataEncoding\Encoder;

class Base4 extends Encoder
{
    protected static array $alphabet = [
        'A', 'B', 'C', 'D',
    ];
    protected static int $base = 4;
    protected static bool $isCaseSensitive = true;
}

$encoded = Base4::encode('foo');
// Will produce: BCBCBCDDBCDD

Example with Unicode alphabet:

use Covaleski\DataEncoding\Encoder;

class Base8Emoji extends Encoder
{
    protected static array $alphabet = [
        '😎', '😔', '😳', '💀', '🤡', '👀', '✊', '💅',
    ];
    protected static int $base = 8;
    protected static string $alphabetEncoding = 'UTF-8';
    protected static bool $isCaseSensitive = false;
}

$encoded = Base8Emoji::encode('foo');
// Will produce: 💀😔🤡✊💅👀👀💅
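
To see why a non-ASCII alphabet needs its encoding declared, consider how the encoded string above looks to byte-oriented versus encoding-aware functions. This sketch assumes the mbstring extension is available and writes the emoji as Unicode escapes so the byte counts are unambiguous; it says nothing about how the library splits strings internally.

```php
<?php
// The eight characters from the example output, written as code points:
// skull, pensive face, clown, raised fist, nail polish, eyes, eyes, nail polish.
$encoded = "\u{1F480}\u{1F614}\u{1F921}\u{270A}\u{1F485}\u{1F440}\u{1F440}\u{1F485}";

// Byte-oriented view: each emoji occupies 3 or 4 bytes in UTF-8.
$byteLength = strlen($encoded);               // 31

// Encoding-aware view: one unit per alphabet character.
$charLength = mb_strlen($encoded, 'UTF-8');   // 8
$chars = mb_str_split($encoded, 1, 'UTF-8');  // 8 single-character strings

echo $byteLength, ' bytes, ', $charLength, ' characters', PHP_EOL;
```

Splitting such a string byte by byte would cut characters apart, so an alphabet lookup only works once the string is split according to its declared encoding.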

3 Testing

Tests are written with PHPUnit. Use the following command to run all of them:

./vendor/bin/phpunit