lhcze / bcp47-tag
BCP47Tag parser and validator
Requires
- php: >=8.3
- ext-curl: *
- ext-intl: *
Requires (Dev)
- friendsofphp/php-cs-fixer: 3.*
- php-parallel-lint/php-parallel-lint: 1.*
- phpmd/phpmd: 2.*
- phpstan/phpstan: 2.*
- phpstan/phpstan-strict-rules: ^2.0
- phpunit/phpunit: 12.*
- psy/psysh: 0.*
- slevomat/coding-standard: 8.*
- squizlabs/php_codesniffer: 3.*
This package is auto-updated.
Last update: 2025-07-13 22:24:02 UTC
README
Don't panic. Your tag is valid.
Validate, Normalize & Canonicalize BCP 47 Language Tags (en, en-US, zh-Hant-CN, etc.)
BCP47Tag is a robust PHP library for working with BCP 47 language tags:
- Validates against the real IANA Language Subtag Registry
- ABNF-compliant (RFC 5646)
- Supports language, script, region, variant, grandfathered tags
- Auto-normalizes casing & separators (en_us → en-US)
- Automatically expands collapsed ranges from the registry
- Resolves partial language tags (e.g., en → en-US) using custom canonical matching, with scoring
- Error handling via clear exception types
- Lightweight LanguageTag VO for validated tags
- Works perfectly with ext-intl: no surprises upon feeding ICU
- Easy fallback mechanism
- Supports grandfathered tags so old, they still remember when Unicode 2.0 was hot
- Accepts i-klingon and i-enochian for your occult projects
- ABNF so clean, linguists shed a single tear
Why not just use ext-intl?
Good question, and the answer is: you should keep using it!
ext-intl (ICU) is brilliant at formatting if your tag is clean.
However, it does not:
- Validate that your tag fully follows the BCP 47 ABNF rules.
- Reject or warn about grandfathered or deprecated subtags.
- Match your tags against the authoritative IANA Language Subtag Registry.
- Resolve partial input (en → en-US) to a known canonical list.
- Enforce known tags only with knownTags + requireCanonical.
If you're in Symfony, you might also use #[Assert\Locale] for basic input validation.
And that's fine for checking user input, but it stops at structure. It won't canonicalize, resolve, or check IANA.
So the best practice:
- Use BCP47Tag to validate & normalize.
- Hand the cleaned tag to ext-intl or whatever else you have for formatting & display.
- Trust you'll never feed ICU any garbage.
- Carry an immutable LanguageTag value object around your code base instead of a string.
BCP47Tag: RFC 5646 + IANA + real normalization + fallback + resolution.
No hassle with regex, str_replace() or guesswork.
Installation
composer require lhcze/bcp47-tag
Basic Usage

    use LHcze\BCP47\BCP47Tag;
    // BCP47InvalidLocaleException must be imported from the package as well.

    // Just normalize & validate
    $tag = new BCP47Tag('en_us');
    echo $tag->getNormalized(); // "en-US"
    echo $tag->getICUformat();  // "en_US"

    // With canonical matching
    $tag = new BCP47Tag('en', useCanonicalMatchTags: ['de-DE', 'en-US']);
    echo $tag->getNormalized(); // "en-US"

    // Use fallback if invalid
    $tag = new BCP47Tag('notreal', 'fr-FR');
    echo $tag->getNormalized(); // "fr-FR"

    // Invalid input → exception
    try {
        new BCP47Tag('invalid!!');
    } catch (BCP47InvalidLocaleException $e) {
        echo $e->getMessage();
    }

    // Feed to ext-intl (start again from a known-good tag)
    $tag = new BCP47Tag('en-US');
    $icu = $tag->getICUformat();                 // "en_US"
    echo Locale::getDisplayLanguage($icu, 'en'); // English

    // LanguageTag VO
    $langTag = $tag->getLanguageTag();
    echo $langTag->getLanguage(); // "en"
    echo $langTag->getRegion();   // "US"
    echo (string) $langTag;       // "en-US"
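To make the en_us → en-US step less magical, here is a minimal standalone sketch of the casing conventions that RFC 5646 (section 2.1.1) recommends. It is not the library's code: sketchNormalizeTag() is a hypothetical helper, and it only fixes casing and separators without validating anything against the registry.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (NOT part of BCP47Tag): applies RFC 5646 casing
 * conventions to a tag. No registry lookup, no structural validation.
 */
function sketchNormalizeTag(string $input): string
{
    // Treat underscores as hyphens, then split into subtags.
    $subtags = explode('-', str_replace('_', '-', trim($input)));
    $out = [];
    $afterSingleton = false; // set once a one-character subtag (e.g. "x") is seen

    foreach ($subtags as $i => $subtag) {
        $len = strlen($subtag);

        if ($i === 0 || $afterSingleton) {
            $out[] = strtolower($subtag);          // first subtag and anything after a singleton
        } elseif ($len === 4 && ctype_alpha($subtag)) {
            $out[] = ucfirst(strtolower($subtag)); // script: Title case
        } elseif ($len === 2 && ctype_alpha($subtag)) {
            $out[] = strtoupper($subtag);          // region: UPPERCASE
        } else {
            $out[] = strtolower($subtag);          // variants, digits, singletons
        }

        if ($len === 1) {
            $afterSingleton = true;
        }
    }

    return implode('-', $out);
}

echo sketchNormalizeTag('en_us'), PHP_EOL;
echo sketchNormalizeTag('ZH-hant-cn'), PHP_EOL;
```

A sketch like this happily "normalizes" nonsense such as xx_yy, which is why casing alone is not enough and the registry check matters.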
Features & Flow
- Normalize + parse
  Clean casing/formatting and parse into components.
- Validate against IANA
  Broken input or a broken fallback triggers explicit exceptions: BCP47InvalidLocaleException, BCP47InvalidFallbackLocaleException
- Canonical matching (optional)
  - Pass an array of useCanonicalMatchTags
  - Each is matched and scored: +100 language match, +10 region, +1 script
  - Highest score wins.
  - On a tie, the first tag to reach that score wins.
- LanguageTag VO
  Immutable, validated, Stringable & JsonSerializable.
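The scoring rule above (+100 language, +10 region, +1 script, first one wins on a tie) can be sketched in a few lines. This is an illustration, not the library's implementation: sketchParts() and sketchBestMatch() are hypothetical names, tags are assumed to be already normalized, and skipping candidates that do not share the language is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical splitter (NOT the library's parser): picks language,
 * script and two-letter region out of an already normalized tag.
 * Three-digit regions such as "419" are ignored to keep it short.
 */
function sketchParts(string $tag): array
{
    $parts = ['language' => null, 'script' => null, 'region' => null];
    foreach (explode('-', $tag) as $i => $subtag) {
        if ($i === 0) {
            $parts['language'] = $subtag;
        } elseif (strlen($subtag) === 4 && ctype_alpha($subtag)) {
            $parts['script'] ??= $subtag;
        } elseif (strlen($subtag) === 2 && ctype_alpha($subtag)) {
            $parts['region'] ??= $subtag;
        }
    }
    return $parts;
}

/** Hypothetical scorer: +100 language, +10 region, +1 script. */
function sketchBestMatch(string $input, array $candidates): ?string
{
    $wanted = sketchParts($input);
    $best = null;
    $bestScore = 0;

    foreach ($candidates as $candidate) {
        $have = sketchParts($candidate);
        if ($wanted['language'] !== $have['language']) {
            continue; // assumption of this sketch: no language match, no candidate
        }
        $score = 100;
        if ($wanted['region'] !== null && $wanted['region'] === $have['region']) {
            $score += 10;
        }
        if ($wanted['script'] !== null && $wanted['script'] === $have['script']) {
            $score += 1;
        }
        // Strict ">" keeps the earliest candidate when scores are equal.
        if ($score > $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }

    return $best;
}

echo sketchBestMatch('en', ['de-DE', 'en-US', 'en-GB']), PHP_EOL;
```

Because the weights are 100, 10 and 1, a region match always outranks a script match, and neither can outrank the language.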
Supported Tags
BCP47Tag uses a precompiled static PHP snapshot of the latest IANA Language Subtag Registry to validate languages, scripts, regions, variants, and grandfathered tags. The registry is loaded once per process, kept hot in OPcache for maximum speed.
- ISO language, script, region, variants
- Grandfathered/deprecated tags (e.g., i-klingon)
- Collapsed registry ranges are auto-expanded
- Extensions & private-use subtags (future)
Key API
Method | Description |
---|---|
__construct(string $input, ?string $fallback, ?array $useCanonicalMatchTags) | Main entry |
getInputLocale() | Original input string |
getNormalized() | RFC 5646 formatted tag |
getICUformat() | Underscore variant (xx_XX) |
getLanguageTag() | Returns LanguageTag VO |
__toString() / jsonSerialize() | Returns normalized string |
The Official BCP 47 ABNF
The syntax tags must follow is defined by RFC 5646 in ABNF:
langtag = language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]
Examples:
- en: valid
- en-US: valid
- zh-Hant-CN: valid
- i-klingon: valid (grandfathered)
- en-US-x-private: valid (extension/private use)
- en-US--US: invalid
BCP47Tag respects this ABNF, so your tags match the real spec, with no hidden assumptions.
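For a feel of what the grammar accepts, the langtag production can be approximated with a regular expression. The sketch below is a shape check only and not the library's validator: SKETCH_LANGTAG and sketchLooksLikeLangtag() are hypothetical names, extlang and the reserved 4-8 letter language forms are left out, only three grandfathered tags are listed, and nothing is looked up in the IANA registry.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern (NOT the library's): simplified RFC 5646 "langtag".
// Flags: x = free-spacing with comments, i = case-insensitive, D = "$" only at the very end.
const SKETCH_LANGTAG = '/^
    [a-z]{2,3}                                # language
    (?:-[a-z]{4})?                            # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?               # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*  # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*      # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                # private use
$/xiD';

/** Hypothetical helper: structural check only, no registry lookup. */
function sketchLooksLikeLangtag(string $tag): bool
{
    // A few irregular grandfathered tags from RFC 5646; the real list is longer.
    $grandfathered = ['i-klingon', 'i-enochian', 'i-default'];
    if (in_array(strtolower($tag), $grandfathered, true)) {
        return true;
    }

    return preg_match(SKETCH_LANGTAG, $tag) === 1;
}

$samples = ['en', 'en-US', 'zh-Hant-CN', 'i-klingon', 'en-US-x-private', 'en-US--US'];
foreach ($samples as $tag) {
    printf("%-16s %s\n", $tag, sketchLooksLikeLangtag($tag) ? 'valid' : 'invalid');
}
```

A pattern like this would also accept well-formed tags whose subtags do not exist, which is the gap the registry validation closes.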
Why is this useful?
Use cases include:
- Validating API Accept-Language headers
- Multi-regional CMS deployments
- Internationalization pipelines
- Locale-dependent services where mis-typed tags lead to silent failures
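For the Accept-Language case, the header first has to be split into individual tags before any of them can be validated. A minimal sketch of that pre-processing step follows; sketchParseAcceptLanguage() is a hypothetical helper that is not part of this package, and its q-value handling is simplified. Each returned tag could then be handed to new BCP47Tag(...).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (NOT part of BCP47Tag): splits an Accept-Language
 * header into tags, ordered by q-value (highest first, header order on ties).
 */
function sketchParseAcceptLanguage(string $header): array
{
    $ranges = [];

    foreach (explode(',', $header) as $position => $item) {
        $pieces = array_map('trim', explode(';', $item));
        $tag = $pieces[0];
        if ($tag === '' || $tag === '*') {
            continue; // empty entries and the wildcard carry no tag to validate
        }

        $q = 1.0; // default weight when no q parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^q=([01](?:\.[0-9]{0,3})?)$/i', $param, $m) === 1) {
                $q = min(1.0, (float) $m[1]);
            }
        }

        $ranges[] = ['tag' => $tag, 'q' => $q, 'position' => $position];
    }

    // Sort by q descending, then by original position ascending.
    usort(
        $ranges,
        static fn (array $a, array $b): int =>
            [$b['q'], $a['position']] <=> [$a['q'], $b['position']]
    );

    return array_column($ranges, 'tag');
}

print_r(sketchParseAcceptLanguage('fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5'));
```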
Requirements
- PHP 8.3+
- ext-intl
Tests
composer qa
Roadmap
- IANA Language Subtag Registry integration (done)
- Language, script, region, variant validation (done)
- Lazy singleton registry loader (done)
- Static PHP snapshot of the IANA registry for ultra-fast lookups (done)
- Canonical matching with scoring (done)
- Typed exceptions for flow control (done)
- Extension/subtag support (planned)
- Additional data use from IANA registry (suppress-script subtag, preferred, prefix) (planned)
- Auto-registry refresh script (planned)
𧬠Now go and boldly canonicalize strange new tags the BCP 47 way! πβ¨