kynx / code-utils
Utilities for generating PHP code
Requires
- php: ~8.2.0 || ~8.3.0 || ~8.4.0
- ext-intl: *
Requires (Dev)
- laminas/laminas-coding-standard: ^3.0
- phpunit/phpunit: ^10.5.37
- psalm/plugin-phpunit: ^0.19.0
- vimeo/psalm: ^5.26
README
Utilities for generating PHP code.
Normalizers
The normalizers generate readable PHP labels (class names, namespaces, property names, etc) from valid UTF-8 strings, transliterating them to ASCII and spelling out any invalid characters.
Usage
The following code (forgive the Japanese - a certain translation tool tells me it means "Pet Store"):
<?php use Kynx\Code\Normalizer\ClassNameNormalizer; $normalizer = new ClassNameNormalizer('Controller'); $namespace = $normalizer->normalize('ペット \ ショップ'); echo $namespace;
outputs:
Petto\Shoppu
and:
<?php use Kynx\Code\Normalizer\PropertyNameNormalizer; $normalizer = new PropertyNameNormalizer(); $property = $normalizer->normalize('2 $ bill'); echo $property;
outputs:
twoDollarBill
See the tests for more examples.
Why?
You must never run code generated from untrusted user input. But there are a few cases where you do want to output code generated from (mostly) trusted input.
In my case, I need to generate classes and properties from an OpenAPI specification. There are no hard-and-fast rules on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever they are.
How?
Each normalizer uses ext-intl
's Transliterator to turn the UTF-8 string into Latin-ASCII. Where a character has no
equivalent in ASCII (the "€" symbol is a good example), it uses the Unicode name of the character to spell it out (to
Euro
, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own spell
outs. For instance, a backtick "`" becomes Backtick
.
Initial digits are also spelt out: "123foo" becomes OneTwoThreeFoo
. Finally reserved words are suffixed with a
user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would
become ClassController
.
The results may not be pretty. If for some mad reason your input contains ͖
- put your glasses on! - the label will
contain CombiningRightArrowheadAndUpArrowheadBelow
. But it is valid PHP, and stands a chance of being as unique as
the original. Which brings me to...
Unique labelers
The normalization process reduces around a million Unicode code points down to just 162 ASCII characters. Then it mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or whatever your programming preference. It's gonna be lossy - nothing we can do about that.
The unique labelers' job is to add back lost uniqueness, using a UniqueStrategyInterface
to decorate any non-unique
class names in the list it is given.
To guarantee uniqueness within a set of class name labels, use the UniqueClassLabeller
:
<?php use Kynx\Code\Normalizer\ClassNameNormalizer; use Kynx\Code\Normalizer\UniqueClassLabeler; use Kynx\Code\Normalizer\UniqueStrategy\NumberSuffix; $labeler = new UniqueClassLabeler(new ClassNameNormalizer('Handler'), new NumberSuffix()); $labels = ['Déjà vu', 'foo', 'deja vu']; $unique = $labeler->getUnique($labels); var_dump($unique);
outputs:
array(3) {
'Déjà vu' =>
string(7) "DejaVu1"
'foo' =>
string(3) "Foo"
'deja vu' =>
string(7) "DejaVu2"
}
There are labelers for each of the normalizers: UniqueClassLabeler
, UniqueConstantLabeler
, UniquePropertyLabeler
and UniqueVariableLabeler
. Along with the NumberSuffix
implementation of UniqueStrategyInterface
, we provide a
SpellOutOrdinalPrefix
strategy. Using that instead of NumberSuffix
above would output:
array(3) {
'Déjà vu' =>
string(11) "FirstDejaVu"
'foo' =>
string(3) "Foo"
'deja vu' =>
string(12) "SecondDejaVu"
}
Kinda cute, but a bit verbose for my taste.