glance-project / latex-codec
Bidirectional LaTeX to Unicode and Unicode to LaTeX conversion library
Requires
- php: ^8.2
Requires (Dev)
- phpunit/phpunit: ^10
- psalm/plugin-phpunit: ^0.18.4
- squizlabs/php_codesniffer: ^3.6
- vimeo/psalm: ^5
Last update: 2026-03-09 16:48:36 UTC
Bidirectional LaTeX ↔ Unicode conversion library for PHP 8.2+.
Overview
latex-codec converts LaTeX markup to readable Unicode text and back. It handles symbol commands (\alpha → α), combining-accent forms (\'{o} → ó), and math-alphabet style commands (\mathbb{R} → ℝ, \mathfrak{g} → 𝔤).
Both conversion directions are pure string transformations: no external process, no state, no I/O beyond the initial dataset load. The library is designed to be wired into a PSR-11 dependency-injection container using the supplied LatexCodecDependencies::definitions() factory.
Etymology
The name is a portmanteau of LaTeX and codec (coder/decoder), reflecting its role as a symmetric, lossless (for well-formed inputs) codec between the TeX world and Unicode.
Stack
| Layer | Tool | Version |
|---|---|---|
| Language | PHP | ^8.2 |
| Unit tests | PHPUnit | ^10 |
| Static analysis | Psalm | ^5 (errorLevel 1) |
| Coding standard | PHP_CodeSniffer | ^3.6 (PSR-12) |
| CI/CD | GitLab CI | — |
| Package registry | Packagist | — |
Architecture follows the same Domain / Application / Infrastructure layering used across the ALICE GLANCE suite:
src/
├── Domain/ # Interfaces, value objects, exceptions — no I/O
│ ├── Exception/
│ ├── ConversionDirection.php
│ ├── ConverterInterface.php
│ ├── Mapping.php
│ └── MappingProviderInterface.php
├── Application/ # Orchestration — wires domain + infrastructure
│ ├── LatexToUnicodeConverter.php
│ └── UnicodeToLatexConverter.php
└── Infrastructure/ # Concrete I/O and algorithms
    ├── Json/ # Dataset loader (JsonMappingProvider)
    ├── Regex/ # Style-command resolver (StyleResolver)
    ├── Trie/ # Longest-prefix substitution engine (TrieConverter)
    └── LatexCodecDependencies.php
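To illustrate what a longest-prefix substitution engine does, here is a minimal sketch. It is not the library's TrieConverter: the class name SketchTrie and its methods add() and replaceAll() are hypothetical, and the sketch assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative longest-prefix substitution over a trie.
 * Hypothetical sketch, not the library's TrieConverter.
 */
final class SketchTrie
{
    /** @var array{children: array, value: ?string} */
    private array $root = ['children' => [], 'value' => null];

    public function add(string $key, string $replacement): void
    {
        $node = &$this->root;
        foreach (mb_str_split($key) as $char) {
            // Create the child node on first use, then descend into it.
            $node['children'][$char] ??= ['children' => [], 'value' => null];
            $node = &$node['children'][$char];
        }
        $node['value'] = $replacement;
    }

    public function replaceAll(string $input): string
    {
        $chars = mb_str_split($input);
        $count = count($chars);
        $output = '';
        $i = 0;
        while ($i < $count) {
            $node = $this->root;
            $best = null;
            $bestLength = 0;
            // Walk as deep as the input allows, remembering the last complete key.
            for ($j = $i; $j < $count && isset($node['children'][$chars[$j]]); $j++) {
                $node = $node['children'][$chars[$j]];
                if ($node['value'] !== null) {
                    $best = $node['value'];
                    $bestLength = $j - $i + 1;
                }
            }
            if ($best === null) {
                // No key starts here: copy one character through unchanged.
                $output .= $chars[$i];
                $i++;
            } else {
                $output .= $best;
                $i += $bestLength;
            }
        }
        return $output;
    }
}

$trie = new SketchTrie();
$trie->add('\le', '≤');
$trie->add('\leftarrow', '←');
echo $trie->replaceAll('a \le b, x \leftarrow y'), PHP_EOL;
// prints: a ≤ b, x ← y
```

Because the walk keeps the deepest complete key, \leftarrow is not cut short at \le. The sketch ignores command boundaries, so an unknown command that merely begins with a known one would also be rewritten; a real engine would need a rule for that.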
Installation
composer require glance-project/latex-codec
Usage
Standalone
use Glance\LatexCodec\Application\LatexToUnicodeConverter;
use Glance\LatexCodec\Application\UnicodeToLatexConverter;
use Glance\LatexCodec\Infrastructure\Json\JsonMappingProvider;
$provider = new JsonMappingProvider();
$toUnicode = new LatexToUnicodeConverter($provider);
$toLaTeX = new UnicodeToLatexConverter($provider);
echo $toUnicode->convert('\alpha + \beta = \gamma');
// → α + β = γ
echo $toUnicode->convert('\mathbb{R}^n \supset \mathfrak{g}');
// → ℝⁿ ⊃ 𝔤 (ⁿ is a symbol, ℝ and 𝔤 are style-resolved)
echo $toUnicode->convert("G\\'{o}mez");
// → Gómez
echo $toLaTeX->convert('α + β = γ');
// → \alpha + \beta = \gamma
echo $toLaTeX->convert('ℝ');
// → \mathbb{R}
With a PSR-11 DI Container (php-di)
use DI\ContainerBuilder;
use Glance\LatexCodec\Infrastructure\LatexCodecDependencies;
$builder = new ContainerBuilder();
$builder->addDefinitions(LatexCodecDependencies::definitions());
$container = $builder->build();
$converter = $container->get(\Glance\LatexCodec\Application\LatexToUnicodeConverter::class);
echo $converter->convert('\Sigma');
// → Σ
LatexCodecDependencies::definitions() returns plain closure factories — compatible with any PSR-11 container, not just php-di.
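As a rough illustration of why closure factories are container-agnostic, the sketch below feeds an array of closures to a hand-rolled container. Everything in it is a stand-in: SketchContainer, SketchProvider and SketchConverter are hypothetical, and the array shape (service id mapped to a closure that receives the container) is an assumption about what definitions() returns, not something the README confirms.

```php
<?php

declare(strict_types=1);

/** Hypothetical minimal container: resolves ids through closure factories and caches results. */
final class SketchContainer
{
    /** @var array<string, object> */
    private array $instances = [];

    /** @param array<string, Closure> $factories */
    public function __construct(private array $factories)
    {
    }

    public function has(string $id): bool
    {
        return isset($this->factories[$id]);
    }

    public function get(string $id): object
    {
        if (!$this->has($id)) {
            throw new RuntimeException("No definition for {$id}");
        }
        // Build on first request, then reuse the same instance.
        return $this->instances[$id] ??= ($this->factories[$id])($this);
    }
}

// Stand-ins for a mapping provider and a converter that depends on it.
final class SketchProvider
{
}

final class SketchConverter
{
    public function __construct(public readonly SketchProvider $provider)
    {
    }
}

// Assumed shape: service id => closure that builds the service.
$definitions = [
    SketchProvider::class => fn () => new SketchProvider(),
    SketchConverter::class => fn (SketchContainer $c) => new SketchConverter(
        $c->get(SketchProvider::class)
    ),
];

$container = new SketchContainer($definitions);
$converter = $container->get(SketchConverter::class);
```

A real PSR-11 container would implement Psr\Container\ContainerInterface and throw the PSR exception types; the sketch leaves those out to stay dependency-free.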
Dataset
The bundled dataset lives at resources/latex-unicode.json and is structured in three sections.
symbols — 1,049 entries
Direct command → Unicode character mappings. Examples:
| LaTeX | Unicode | Char |
|---|---|---|
| \alpha | U+03B1 | α |
| \rightarrow | U+2192 | → |
| \infty | U+221E | ∞ |
| \'{o} | U+00F3 | ó |
| \ss | U+00DF | ß |
aliases — 10 entries
Short-form aliases that resolve to canonical symbol keys:
| Alias | Resolves to |
|---|---|
| \le | \leq |
| \ge | \geq |
| \ne | \neq |
| \to | \rightarrow |
| \gets | \leftarrow |
| \iff | \Leftrightarrow |
| \implies | \Rightarrow |
| \land | \wedge |
| \lor | \vee |
| \lnot | \neg |
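Alias resolution can be pictured as one extra lookup before the symbol table. The sketch below uses small hypothetical excerpts of the two sections; the array layout, the function name resolveCommand() and the reverse-direction behaviour are assumptions for illustration, not the library's API.

```php
<?php

declare(strict_types=1);

// Hypothetical excerpts of the symbols and aliases sections.
$symbols = ['\leq' => '≤', '\rightarrow' => '→', '\neg' => '¬'];
$aliases = ['\le' => '\leq', '\to' => '\rightarrow', '\lnot' => '\neg'];

/**
 * Resolve a command to its Unicode character, following an alias if there is one.
 * Returns null for unknown commands.
 */
function resolveCommand(string $command, array $symbols, array $aliases): ?string
{
    $canonical = $aliases[$command] ?? $command;

    return $symbols[$canonical] ?? null;
}

// Assumed reverse direction: only canonical keys exist in the flipped table,
// so an alias never comes back out.
$reverse = array_flip($symbols);

echo resolveCommand('\le', $symbols, $aliases), PHP_EOL;   // ≤
echo resolveCommand('\leq', $symbols, $aliases), PHP_EOL;  // ≤
echo $reverse['≤'], PHP_EOL;                               // \leq
```

Under these assumptions a round trip normalises aliases: \le becomes ≤, which converts back to \leq.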
styles — 15 commands, 928 characters
Math-alphabet style commands whose output characters are computed from the Unicode Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF), with all block exceptions handled (e.g. \mathbb{C} = ℂ, \mathfrak{C} = ℭ):
\mathbb, \mathbf, \mathbfit, \mathbit, \mathcal, \mathfrak, \mathit, \mathmit, \mathscr, \mathsf, \mathsfbf, \mathsfbfsl, \mathsfsl, \mathsl, \mathtt
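The computation described above can be sketched for two of the styles. The block offsets and exception code points below come from the Unicode charts; the constant STYLE_SKETCH and the function styleLetter() are hypothetical names, not the library's StyleResolver, and the sketch assumes the mbstring extension.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical table for two styles: code points of styled "A" and "a", plus
 * the letters Unicode encodes outside the block (in Letterlike Symbols).
 */
const STYLE_SKETCH = [
    'mathbb' => [
        'upper' => 0x1D538,
        'lower' => 0x1D552,
        'exceptions' => [
            'C' => 0x2102, 'H' => 0x210D, 'N' => 0x2115, 'P' => 0x2119,
            'Q' => 0x211A, 'R' => 0x211D, 'Z' => 0x2124,
        ],
    ],
    'mathfrak' => [
        'upper' => 0x1D504,
        'lower' => 0x1D51E,
        'exceptions' => [
            'C' => 0x212D, 'H' => 0x210C, 'I' => 0x2111, 'R' => 0x211C, 'Z' => 0x2128,
        ],
    ],
];

/** Map one ASCII letter to its styled character, or null if unsupported. */
function styleLetter(string $style, string $letter): ?string
{
    $table = STYLE_SKETCH[$style] ?? null;
    if ($table === null || preg_match('/^[A-Za-z]$/', $letter) !== 1) {
        return null;
    }
    // Exceptions first: these slots in the block are reserved holes.
    if (isset($table['exceptions'][$letter])) {
        return mb_chr($table['exceptions'][$letter], 'UTF-8');
    }
    $isUpper = $letter === strtoupper($letter);
    $base = $isUpper ? $table['upper'] : $table['lower'];
    $offset = ord($letter) - ord($isUpper ? 'A' : 'a');

    return mb_chr($base + $offset, 'UTF-8');
}

echo styleLetter('mathbb', 'R'), PHP_EOL;    // ℝ (U+211D, exception)
echo styleLetter('mathfrak', 'g'), PHP_EOL;  // 𝔤 (U+1D524, computed)
```

Skipping the exception table would land on unassigned code points such as U+1D53A for \mathbb{C}, which is why the exceptions have to be checked before the offset arithmetic.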
Source
The dataset was derived and cleaned from the legacy latexmap.json file used across the ALICE GLANCE suite, with the following transformations applied:
- Only backslash-prefixed commands retained
- HTML entities, $-mode keys, brace-artifact keys discarded
- Style commands (\math*{base}) moved to the styles section and recomputed from Unicode Math Alphabets
- \neq and \neg added as canonical symbols (present in source only as \not = and \lnot)
Running Locally
# Install dependencies
composer install
# Run tests
composer run test:unit
# Static analysis (Psalm level 1)
composer run test:types
# Coding standard (PSR-12)
composer run test:lint
# Auto-fix coding standard
composer run fix:lint
Coverage (requires Xdebug with coverage mode enabled):
XDEBUG_MODE=coverage composer run test:ci
Contributing
- Branch naming: feature/short-description or fix/short-description
- Commit style: lead with a gitmoji, e.g. ✨ add support for \boldsymbol
- Quality gates: all three must pass before merge: test:unit, test:types, test:lint
- Test coverage: new behaviour must be covered; the CI gate requires ≥95% line coverage
Project links
| Resource | URL |
|---|---|
| GitLab repository | https://gitlab.cern.ch/fence/common/latex-codec |
| Packagist package | https://packagist.org/packages/glance-project/latex-codec |
| CERN GitLab | https://gitlab.cern.ch |