ideil / binary-to-text-php
Collection of binary-to-text encoding utilities for PHP. Includes Base32 support and much more.
Requires
- php: >=5.2.14
This package is auto-updated.
Last update: 2024-10-23 05:39:43 UTC
README
For now, the only class in this repository is Base2n.
Base2n is for binary-to-text conversion with arbitrary encoding schemes that represent binary data in a base-2^n notation. It can handle non-standard variants of many standard encoding schemes such as Base64 and Base32. Many binary-to-text encoding schemes use a fixed number of bits of binary data to generate each encoded character. Such schemes generalize to a single algorithm, implemented here.
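To make the "single algorithm" idea concrete, here is a minimal sketch in plain PHP. It is not Base2n's actual source: the function name `sketchEncode` and its parameters are hypothetical, and it only covers encoding without group padding.

```php
<?php
// Hypothetical helper, not part of Base2n: encodes $raw using $bits bits per
// character. $chars is assumed to hold at least 2^$bits characters.
function sketchEncode($raw, $bits, $chars, $rightPadFinalBits = false)
{
    $encoded = '';
    $buffer = 0;    // bits read from the input but not yet emitted
    $buffered = 0;  // how many bits $buffer currently holds
    $mask = (1 << $bits) - 1;

    $length = strlen($raw);
    for ($i = 0; $i < $length; $i++) {
        // Append the next byte to the low end of the buffer.
        $buffer = ($buffer << 8) | ord($raw[$i]);
        $buffered += 8;

        // Emit one character for every complete group of $bits bits.
        while ($buffered >= $bits) {
            $buffered -= $bits;
            $encoded .= $chars[($buffer >> $buffered) & $mask];
        }

        // Keep only the bits that have not been emitted yet.
        $buffer &= (1 << $buffered) - 1;
    }

    // Fewer than $bits bits remain: place them high or low in the last group.
    if ($buffered > 0) {
        if ($rightPadFinalBits) {
            $buffer <<= ($bits - $buffered);
        }
        $encoded .= $chars[$buffer];
    }

    return $encoded;
}

$octal = sketchEncode('encode this', 3, '01234567');
// 312671433366214510072150322711
$base32 = sketchEncode('encode this', 5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', true);
// MVXGG33EMUQHI2DJOM
```

The same loop yields octal, hexadecimal, Base32, or Base64-style output depending only on the bit width and alphabet passed in.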
Binary-to-text encoding is usually used to represent data in a notation that is safe for transport over text-based protocols, and there are several other practical uses. See the examples below.
Basic Base2n Usage
With Base2n, you define your encoding scheme parametrically. Let's instantiate a Base32 encoder:
    // RFC 4648 base32 alphabet; case-insensitive
    $base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
    $encoded = $base32->encode('encode this');
    // MVXGG33EMUQHI2DJOM======
Constructor Parameters
integer $bitsPerCharacter
Required. The number of bits to use for each encoded character; 1–8. The most practical range is 1–6. The encoding's radix is a power of 2: 2^$bitsPerCharacter.
- 1: base-2, binary
- 2: base-4, quaternary
- 3: base-8, octal
- 4: base-16, hexadecimal
- 5: base-32
- 6: base-64
- 7: base-128
- 8: base-256
string $chars
This string specifies the base alphabet. Must be 2^$bitsPerCharacter characters long. Default: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_
boolean $caseSensitive
Whether to decode in a case-sensitive manner. Default: FALSE
boolean $rightPadFinalBits
How to encode the last character when the bits remaining are fewer than $bitsPerCharacter. When TRUE, the bits to encode are placed in the most significant position of the final group of bits, with the lower bits set to 0. When FALSE, the final bits are placed in the least significant position. For RFC 4648 encodings, $rightPadFinalBits should be TRUE. Default: FALSE
boolean $padFinalGroup
It's common to encode characters in groups. For example, Base64 (which is based on 6 bits per character) converts 3 raw bytes into 4 encoded characters. If insufficient bytes remain at the end, the final group will be padded with = to complete a group of 4 characters, and the encoded length is always a multiple of 4. Although the information provided by the padding is redundant, some programs rely on it for decoding; Base2n does not. Default: FALSE
string $padCharacter
When $padFinalGroup is TRUE, this is the pad character used. Default: =
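The final-character handling described for $rightPadFinalBits and $padFinalGroup can be illustrated with a little arithmetic. The sketch below uses hypothetical helper names and assumes a group is the smallest run of characters that encodes a whole number of bytes, as in RFC 4648; Base2n's internals may compute this differently.

```php
<?php
// Hypothetical helpers for illustration; none of these are part of Base2n.

// Greatest common divisor (Euclid's algorithm).
function sketchGcd($a, $b)
{
    while ($b !== 0) {
        $t = $a % $b;
        $a = $b;
        $b = $t;
    }
    return $a;
}

// Value of the final character when only the bits in $bitString remain.
// TRUE pads with zeros on the right (leftover bits are most significant);
// FALSE pads on the left (leftover bits are least significant).
function sketchFinalValue($bitString, $bitsPerCharacter, $rightPadFinalBits)
{
    $padType = $rightPadFinalBits ? STR_PAD_RIGHT : STR_PAD_LEFT;
    return bindec(str_pad($bitString, $bitsPerCharacter, '0', $padType));
}

// Number of pad characters needed to complete the final group.
function sketchPadCount($encodedLength, $bitsPerCharacter)
{
    // Assumption: characters per group = 8 / gcd(8, bits),
    // e.g. 8 for Base32 and 4 for Base64.
    $groupSize = (int) (8 / sketchGcd(8, $bitsPerCharacter));
    $remainder = $encodedLength % $groupSize;
    return $remainder === 0 ? 0 : $groupSize - $remainder;
}

// 'encode this' is 88 bits: 17 full 5-bit groups plus the leftover bits "011".
$high = sketchFinalValue('011', 5, true);   // 12, i.e. 'M' in the RFC 4648 alphabet
$low  = sketchFinalValue('011', 5, false);  // 3
$pads = sketchPadCount(18, 5);              // 6, as in MVXGG33EMUQHI2DJOM======
```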
encode()
Parameters
string $rawString
Required. The data to be encoded.
decode()
Parameters
string $encodedString
Required. The string to be decoded.
boolean $strict
When TRUE, NULL will be returned if $encodedString contains an undecodable character. When FALSE, unknown characters are simply ignored. Default: FALSE
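The difference between strict and lenient decoding can be sketched as a character lookup. This is an illustration only, with a hypothetical function name; it does not reproduce how Base2n treats pad characters or case folding.

```php
<?php
// Hypothetical helper, not Base2n's code: maps each character of $encoded to
// its index in the alphabet $chars.
function sketchLookup($encoded, $chars, $strict = false)
{
    $values = array();
    $length = strlen($encoded);
    for ($i = 0; $i < $length; $i++) {
        $index = strpos($chars, $encoded[$i]);
        if ($index === false) {
            if ($strict) {
                return null; // strict: one unknown character rejects the input
            }
            continue;        // lenient: skip the unknown character
        }
        $values[] = $index;
    }
    return $values;
}

$lenient = sketchLookup('65 6e', '0123456789abcdef');       // array(6, 5, 6, 14)
$strict  = sketchLookup('65 6e', '0123456789abcdef', true); // NULL (space is unknown)
```

Lenient mode is convenient for input that has been wrapped or spaced for display; strict mode suits input that must be rejected if it was altered.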
Examples
PHP does not provide any Base32 encoding functions. By setting $bitsPerCharacter to 5 and specifying your desired alphabet in $chars, you can handle any variant of Base32:
    // RFC 4648 base32 alphabet; case-insensitive
    $base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
    $encoded = $base32->encode('encode this');
    // MVXGG33EMUQHI2DJOM======

    // RFC 4648 base32hex alphabet
    $base32hex = new Base2n(5, '0123456789ABCDEFGHIJKLMNOPQRSTUV', FALSE, TRUE, TRUE);
    $encoded = $base32hex->encode('encode this');
    // CLN66RR4CKG78Q39EC======
Octal notation:
    $octal = new Base2n(3);
    $encoded = $octal->encode('encode this');
    // 312671433366214510072150322711
A convenient way to go back and forth between binary notation and its real binary representation:
    $binary = new Base2n(1);
    $encoded = $binary->encode('encode this');
    // 0110010101101110011000110110111101100100011001010010000001110100011010000110100101110011
    $decoded = $binary->decode($encoded);
    // encode this
PHP uses a proprietary binary-to-text encoding scheme to generate session identifiers from random hash digests. The most efficient way to store these session IDs in a database is to decode them back to their raw hash digests. PHP's encoding scheme is configured with the session.hash_bits_per_character php.ini setting. The decoded size depends on the hash function, set with session.hash_function in php.ini. (Both settings were removed in PHP 7.1, so this applies to earlier versions.)
    // session.hash_function = 0
    // session.hash_bits_per_character = 5
    // 128-bit session ID
    $sessionId = 'q3c8n4vqpq11i0vr6ucmafg1h3';
    // Decodes to 16 bytes
    $phpBase32 = new Base2n(5, '0123456789abcdefghijklmnopqrstuv');
    $rawSessionId = $phpBase32->decode($sessionId);

    // session.hash_function = 1
    // session.hash_bits_per_character = 6
    // 160-bit session ID
    $sessionId = '7Hf91mVc,q-9W1VndNNh3evVN83';
    // Decodes to 20 bytes
    $phpBase64 = new Base2n(6, '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-,');
    $rawSessionId = $phpBase64->decode($sessionId);
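The sizes quoted in the session examples follow from simple arithmetic: session.hash_function = 0 selects MD5 (128 bits) and 1 selects SHA-1 (160 bits). The helper names below are hypothetical and exist only to show the calculation.

```php
<?php
// Hypothetical helpers for the size arithmetic; not part of Base2n or PHP.

// Characters needed to encode a digest of $digestBits bits.
function sketchEncodedLength($digestBits, $bitsPerCharacter)
{
    return (int) ceil($digestBits / $bitsPerCharacter);
}

// Whole bytes recoverable from $encodedLength characters.
function sketchDecodedBytes($encodedLength, $bitsPerCharacter)
{
    return (int) floor($encodedLength * $bitsPerCharacter / 8);
}

// MD5 digest at 5 bits per character
$md5Chars = sketchEncodedLength(128, 5);          // 26
$md5Bytes = sketchDecodedBytes($md5Chars, 5);     // 16

// SHA-1 digest at 6 bits per character
$sha1Chars = sketchEncodedLength(160, 6);         // 27
$sha1Bytes = sketchDecodedBytes($sha1Chars, 6);   // 20
```

A fixed-width binary column of 16 or 20 bytes is therefore enough for the decoded IDs in the two configurations shown.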
Generate random security tokens:
    $tokenEncoder = new Base2n(6, '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-,');
    $binaryToken = openssl_random_pseudo_bytes(32); // PHP >= 5.3
    $token = $tokenEncoder->encode($binaryToken);
    // Example: U6M132v9FG-AHhBVaQWOg1gjyUi1IogNxuen0i3u3ep
The rest of these examples are probably more fun than they are practical.
We can encode arbitrary data with a 7-bit encoding. (Note that this is not the same as the 7bit MIME content-transfer-encoding.)
    // This uses all 7-bit ASCII characters
    $base128chars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
                  . "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F"
                  . "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2A\x2B\x2C\x2D\x2E\x2F"
                  . "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3A\x3B\x3C\x3D\x3E\x3F"
                  . "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4A\x4B\x4C\x4D\x4E\x4F"
                  . "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5A\x5B\x5C\x5D\x5E\x5F"
                  . "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6A\x6B\x6C\x6D\x6E\x6F"
                  . "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7A\x7B\x7C\x7D\x7E\x7F";
    $base128 = new Base2n(7, $base128chars);
    $encoded = $base128->encode('encode this');
The following encoding guarantees that the most significant bit is set for every byte:
    // "High" base-128 encoding
    $high128chars = "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
                  . "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F"
                  . "\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
                  . "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF"
                  . "\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF"
                  . "\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF"
                  . "\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF"
                  . "\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF";
    $high128 = new Base2n(7, $high128chars);
    $encoded = $high128->encode('encode this');
Let's create an encoding using exclusively non-printable control characters!
    // Base-32 non-printable character encoding
    $noPrintChars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
                  . "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F";
    $nonPrintable32 = new Base2n(5, $noPrintChars);
    $encoded = $nonPrintable32->encode('encode this');
Why not encode data using only whitespace? Here's a base-4 encoding using space, tab, new line, and carriage return:
    // Base-4 whitespace encoding
    $whitespaceChars = " \t\n\r";
    $whitespace = new Base2n(2, $whitespaceChars);
    $encoded = $whitespace->encode('encode this');
    // "\t\n\t\t\t\n\r\n\t\n \r\t\n\r\r\t\n\t \t\n\t\t \n \t\r\t \t\n\n \t\n\n\t\t\r \r"
    $decoded = $whitespace->decode(
        "\t\n\t\t\t\n\r\n\t\n \r\t\n\r\r\t\n\t \t\n\t\t \n \t\r\t \t\n\n \t\n\n\t\t\r \r"
    );
    // encode this
Counterexamples
Base2n is not slow, but it will never outperform an encoding function implemented in C. When one exists, use it instead.
PHP provides the base64_encode() and base64_decode() functions, and you should always use them for standard Base64. When you need to use a modified alphabet, you can translate the encoded output with strtr() or str_replace().
A common variant of Base64 is modified for URLs and filenames, where + and / are replaced with - and _, and the = padding is omitted. It's better to handle this variant with native PHP functions:
    // RFC 4648 base64url with Base2n...
    $base64url = new Base2n(6, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', TRUE, TRUE, FALSE);
    $encoded = $base64url->encode("encode this \xBF\xC2\xBF");
    // ZW5jb2RlIHRoaXMgv8K_

    // RFC 4648 base64url with native functions...
    $encoded = str_replace(array('+', '/', '='), array('-', '_', ''), base64_encode("encode this \xBF\xC2\xBF"));
    // ZW5jb2RlIHRoaXMgv8K_
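Decoding base64url with native functions works the same way in reverse. The sketch below uses a hypothetical helper name and restores the omitted padding before calling base64_decode() in strict mode; treat it as one possible approach rather than a canonical one.

```php
<?php
// Hypothetical helper name; relies only on native PHP functions.
function sketchBase64UrlDecode($encoded)
{
    // Map the URL-safe characters back to the standard Base64 alphabet.
    $standard = strtr($encoded, '-_', '+/');

    // Restore the omitted padding so the length is a multiple of 4.
    $missing = (4 - strlen($standard) % 4) % 4;
    $padded = str_pad($standard, strlen($standard) + $missing, '=');

    // Strict mode returns FALSE if any character is outside the alphabet.
    return base64_decode($padded, true);
}

$decoded = sketchBase64UrlDecode('ZW5jb2RlIHRoaXMgv8K_');
// "encode this \xBF\xC2\xBF"
```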
Native functions get slightly more cumbersome when every position in the alphabet has changed, as seen in this example of decoding a Bcrypt hash:
    // Decode the salt and digest from a Bcrypt hash
    $hash = '$2y$14$i5btSOiulHhaPHPbgNUGdObga/GC.AVG/y5HHY1ra7L0C9dpCaw8u';
    $encodedSalt = substr($hash, 7, 22);
    $encodedDigest = substr($hash, 29, 31);

    // Using Base2n...
    $bcrypt64 = new Base2n(6, './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', TRUE, TRUE);
    $rawSalt = $bcrypt64->decode($encodedSalt); // 16 bytes
    $rawDigest = $bcrypt64->decode($encodedDigest); // 23 bytes

    // Using native functions...
    $bcrypt64alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $base64alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    $rawSalt = base64_decode(strtr($encodedSalt, $bcrypt64alphabet, $base64alphabet)); // 16 bytes
    $rawDigest = base64_decode(strtr($encodedDigest, $bcrypt64alphabet, $base64alphabet)); // 23 bytes
You can encode and decode hexadecimal with bin2hex() and pack():
    // Hexadecimal with Base2n...
    $hexadecimal = new Base2n(4);
    $encoded = $hexadecimal->encode('encode this'); // 656e636f64652074686973
    $decoded = $hexadecimal->decode($encoded); // encode this

    // It's better to use native functions...
    $encoded = bin2hex('encode this'); // 656e636f64652074686973
    $decoded = pack('H*', $encoded); // encode this
    // As of PHP 5.4 you can use hex2bin() instead of pack()