glance-project/latex-codec

Bidirectional LaTeX to Unicode and Unicode to LaTeX conversion library

Maintainers

Package info

gitlab.cern.ch/fence/common/latex-codec

pkg:composer/glance-project/latex-codec

Statistics

Installs: 4

Dependents: 0

Suggesters: 0

v1.1.1 2026-03-09 16:44 UTC

This package is auto-updated.

Last update: 2026-03-09 16:48:36 UTC


README

PHP CI Coverage Packagist License

Bidirectional LaTeX ↔ Unicode conversion library for PHP 8.2+.

Overview

latex-codec converts LaTeX markup to readable Unicode text and back. It handles symbol commands (\alphaα), combining-accent forms (\'{o}ó), and math-alphabet style commands (\mathbb{R}, \mathfrak{g}𝔤).

Both conversion directions are pure string transformations: no external process, no state, no I/O beyond the initial dataset load. The library is designed to be wired into a PSR-11 dependency-injection container using the supplied LatexCodecDependencies::definitions() factory.

Etymology

The name is a portmanteau of LaTeX and codec (coder/decoder), reflecting its role as a symmetric, lossless (for well-formed inputs) codec between the TeX world and Unicode.

Stack

LayerToolVersion
LanguagePHP^8.2
Unit testsPHPUnit^10
Static analysisPsalm^5 (errorLevel 1)
Coding standardPHP_CodeSniffer^3.6 (PSR-12)
CI/CDGitLab CI
Package registryPackagist

Architecture follows the same Domain / Application / Infrastructure layering used across the ALICE GLANCE suite:

src/
├── Domain/             # Interfaces, value objects, exceptions — no I/O
│   ├── Exception/
│   ├── ConversionDirection.php
│   ├── ConverterInterface.php
│   ├── Mapping.php
│   └── MappingProviderInterface.php
├── Application/        # Orchestration — wires domain + infrastructure
│   ├── LatexToUnicodeConverter.php
│   └── UnicodeToLatexConverter.php
└── Infrastructure/     # Concrete I/O and algorithms
    ├── Json/           # Dataset loader (JsonMappingProvider)
    ├── Regex/          # Style-command resolver (StyleResolver)
    ├── Trie/           # Longest-prefix substitution engine (TrieConverter)
    └── LatexCodecDependencies.php

Installation

composer require glance-project/latex-codec

Usage

Standalone

use Glance\LatexCodec\Application\LatexToUnicodeConverter;
use Glance\LatexCodec\Application\UnicodeToLatexConverter;
use Glance\LatexCodec\Infrastructure\Json\JsonMappingProvider;

$provider = new JsonMappingProvider();

$toUnicode = new LatexToUnicodeConverter($provider);
$toLaTeX   = new UnicodeToLatexConverter($provider);

echo $toUnicode->convert('\alpha + \beta = \gamma');
// → α + β = γ

echo $toUnicode->convert('\mathbb{R}^n \supset \mathfrak{g}');
// → ℝⁿ ⊃ 𝔤   (ⁿ is a symbol, ℝ and 𝔤 are style-resolved)

echo $toUnicode->convert("G\\'{o}mez");
// → Gómez

echo $toLaTeX->convert('α + β = γ');
// → \alpha + \beta = \gamma

echo $toLaTeX->convert('ℝ');
// → \mathbb{R}

With a PSR-11 DI Container (php-di)

use DI\ContainerBuilder;
use Glance\LatexCodec\Infrastructure\LatexCodecDependencies;

$builder = new ContainerBuilder();
$builder->addDefinitions(LatexCodecDependencies::definitions());
$container = $builder->build();

$converter = $container->get(\Glance\LatexCodec\Application\LatexToUnicodeConverter::class);
echo $converter->convert('\Sigma');
// Σ

LatexCodecDependencies::definitions() returns plain closure factories — compatible with any PSR-11 container, not just php-di.

Dataset

The bundled dataset lives at resources/latex-unicode.json and is structured in three sections.

symbols — 1,049 entries

Direct command → Unicode character mappings. Examples:

LaTeXUnicodeChar
\alphaU+03B1α
\rightarrowU+2192
\inftyU+221E
\'{o}U+00F3ó
\ssU+00DFß

aliases — 10 entries

Short-form aliases that resolve to canonical symbol keys:

AliasResolves to
\le\leq
\ge\geq
\ne\neq
\to\rightarrow
\gets\leftarrow
\iff\Leftrightarrow
\implies\Rightarrow
\land\wedge
\lor\vee
\lnot\neg

styles — 15 commands, 928 characters

Math-alphabet style commands whose output characters are computed from the Unicode Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF), with all block exceptions handled (e.g. \mathbb{C} = ℂ, \mathfrak{C} = ℭ):

\mathbb, \mathbf, \mathbfit, \mathbit, \mathcal, \mathfrak, \mathit, \mathmit, \mathscr, \mathsf, \mathsfbf, \mathsfbfsl, \mathsfsl, \mathsl, \mathtt

Source

The dataset was derived and cleaned from the legacy latexmap.json file used across the ALICE GLANCE suite, with the following transformations applied:

  • Only backslash-prefixed commands retained
  • HTML entities, $-mode keys, brace-artifact keys discarded
  • Style commands (\math*{base}) moved to the styles section and recomputed from Unicode Math Alphabets
  • \neq and \neg added as canonical symbols (present in source only as \not = and \lnot)

Running Locally

# Install dependencies
composer install

# Run tests
composer run test:unit

# Static analysis (Psalm level 1)
composer run test:types

# Coding standard (PSR-12)
composer run test:lint

# Auto-fix coding standard
composer run fix:lint

Coverage (requires Xdebug with coverage mode enabled):

XDEBUG_MODE=coverage composer run test:ci

Contributing

  1. Branch namingfeature/short-description or fix/short-description
  2. Commit style — lead with a gitmoji, e.g. ✨ add support for \boldsymbol
  3. Quality gates — all three must pass before merge: test:unit, test:types, test:lint
  4. Test coverage — new behaviour must be covered; the CI gate requires ≥95% line coverage

Project links

ResourceURL
GitLab repositoryhttps://gitlab.cern.ch/fence/common/latex-codec
Packagist packagehttps://packagist.org/packages/glance-project/latex-codec
CERN GitLabhttps://gitlab.cern.ch