matecat / subfiltering
Matecat Subfiltering component
Requires
- php: >=7.4
- ext-dom: *
- ext-libxml: *
- ext-xml: *
- matecat/emoji-to-entity-converter: ^1
- matecat/xml-dom-parser: ^1
Requires (Dev)
- phpunit/phpunit: ^9
This package is auto-updated.
Last update: 2025-09-26 09:19:27 UTC
README
Subfiltering is a component used by Matecat and MyMemory for converting strings between the database, external services, and the UI layers. It provides a pipeline of filters to safely transform content across these layers while preserving XLIFF tags, HTML placeholders, and special entities.
Overview
Embedding XML in a REST JSON payload is notoriously hard to render safely and legibly in a web browser. Browsers, frameworks, and JSON serializers all have opinions about angle brackets, entities, and special characters. The result is typically a mix of double-encoding, broken markup, or inline codes that translators can accidentally damage.
This library solves that by introducing reversible “layers” and a transformation pipeline that makes XML- and XLIFF-rich content safe for transport and UI display, while guaranteeing you can restore the exact original.
What makes XML in JSON hard for the browser
- Angle brackets and entities: Raw < and > conflict with HTML, and HTML/JS frameworks may escape or re-escape entities differently than you expect.
- Inline codes in text: XLIFF inline tags (ph, pc, etc.), HTML/XML snippets, ICU, or sprintf tokens can be misinterpreted or edited improperly when shown as-is.
- Safety vs. readability: You need to prevent XSS and layout breakage, but you also need a UI where users can read and edit the text around inline codes.
- Use it when:
  - Your source text includes variables, placeholders, XML, or HTML tags.
  - You must accept user edits while preventing structural damage to tags.
- What it gives you:
  - Converts inline tags to robust placeholders with base64 “memory” of the original, then restores exactly after the round-trip.
  - Prevents double-encoding and protects structural elements.
In short, this library is a bridge between “XML-correct” and “browser-safe,” letting you serve and accept JSON payloads that are straightforward to display and edit in the web UI, while guaranteeing that your original XML/XLIFF structure is preserved perfectly end to end.
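To make the double-encoding problem concrete, here is a minimal sketch using only PHP built-ins. It does not call this library, and the sample string is made up for illustration. It shows how escaping an already-escaped segment a second time corrupts it, which is the failure mode the layered approach is meant to avoid.

```php
<?php
// A segment that already holds escaped markup (hypothetical sample).
$stored = 'Click &lt;b&gt;here&lt;/b&gt; &amp; wait';

// Escaping it again re-encodes every ampersand, so the UI would show
// literal "&lt;b&gt;" text instead of the intended "<b>".
$twice = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// With $double_encode = false, existing entities are left untouched.
$once = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);

echo $twice, PHP_EOL;
echo $once, PHP_EOL;
```

The fourth argument only helps when every layer agrees on what is already encoded; tracking that state explicitly is what the layers described in this README are for.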
How the library addresses it
- Normalizes and preserves XLIFF tags across transformations.
- Encodes/decodes special characters and placeholders for safe round-trips.
- Converts between three processing layers:
  - Layer 0 (Database): A database-safe XML form, suitable for persistence, export, and exact reconstruction.
  - Layer 1 (External services): A transport-safe form tailored for MT/TM systems that aren’t XML-aware.
  - Layer 2 (UI): A browser/UI-friendly form that replaces raw tags with safe placeholders and base64-backed metadata.
- UI-friendly placeholders
  - XML/XLIFF/HTML tags are converted to stable placeholders with an embedded, base64-encoded “memory” of the original tag.
  - The UI can display and move placeholders without exposing raw markup, reducing the risk of accidental tag damage.
- Reversible roundtrips
  - When the browser sends edited text back, the library restores Layer 2 content to Layer 0, reconstructing the exact original tags from the placeholders.
  - The same applies for Layer 0 ↔ Layer 1 when calling external services.
- Supports XLIFF 2.x dataRef replacement, aligning inline codes from <originalData> with inline tags in segments.
  - If your XLIFF uses originalData with dataRef/dataRefStart/dataRefEnd, the library will create meaningful placeholders for the UI and then restore real XLIFF tags afterward.
  - This keeps both the JSON payload and browser rendering safe without losing fidelity.
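The placeholder idea can be sketched in a few lines of plain PHP. This is an illustration under simplifying assumptions, not the library's implementation: the function names protectSprintf and restoreSprintf are hypothetical, only %s and %d tokens are handled, and the placeholder shape is copied from the sample output shown in the usage examples of this README.

```php
<?php
// Hypothetical helper: replace each sprintf token with a <ph> placeholder
// that carries the original token as base64 in equiv-text.
function protectSprintf(string $text): string
{
    $id = 0;

    return preg_replace_callback('/%[sd]/', function (array $m) use (&$id): string {
        $id++;

        return sprintf(
            '<ph id="mtc_%d" ctype="x-sprintf" equiv-text="base64:%s"/>',
            $id,
            base64_encode($m[0])
        );
    }, $text);
}

// Hypothetical helper: decode the base64 payload to rebuild the original token.
function restoreSprintf(string $text): string
{
    return preg_replace_callback(
        '/<ph id="mtc_\d+" ctype="x-sprintf" equiv-text="base64:([^"]+)"\/>/',
        fn (array $m): string => base64_decode($m[1]),
        $text
    );
}

$ui = protectSprintf('Hi %s .');
$back = restoreSprintf($ui);

echo $ui, PHP_EOL;
echo $back, PHP_EOL;
```

Because the original token travels inside the placeholder, the restore step needs no external state, which is what makes the round trip reversible.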
Installation
Install via Composer:
    composer require matecat/subfiltering
Requirements:
- PHP 7.4+
- PHPUnit 9.x for running tests (dev)
Filters
Two concrete filters are provided (both extend AbstractFilter):
- Matecat\SubFiltering\MateCatFilter
- Matecat\SubFiltering\MyMemoryFilter
Create instances using the static getInstance factory:
    <?php

    use Matecat\SubFiltering\MateCatFilter;
    use Matecat\SubFiltering\Contracts\FeatureSetInterface;
    use Matecat\SubFiltering\Mocks\FeatureSet;

    // Example implementation lives under tests/ (use your own in production)
    $featureSet = new FeatureSet(); // must implement FeatureSetInterface

    // Optional parameters:
    // - $source (string): source language (e.g., 'en-US')
    // - $target (string): target language (e.g., 'it-IT')
    // - $dataRefMap (array): map for XLIFF 2 dataRef replacement (see section below)
    $filter = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT', []);
The first argument MUST be a concrete implementation of Matecat\SubFiltering\Contracts\FeatureSetInterface.
DataRef replacement (XLIFF 2)
XLIFF 2.0/2.1 allows binding inline tags to <originalData> via:
- <ph>, <sc>, <ec> using dataRef
- <pc> using dataRefStart and dataRefEnd
This library can automatically introduce an equiv-text attribute (base64-encoded original value) based on a provided dataRef map, and convert <pc> pairs to Matecat-compatible <ph> placeholders for UI consumption. On the way back, it restores the original XLIFF structure.
- Full documentation and examples: docs/dataRef.md
How to provide the map:
- Build an associative array where keys are data ids from <originalData><data id="...">value</data></originalData> and values are the corresponding data contents.
- Pass that array as the fourth parameter when instantiating the filter.
Example:
    <?php

    use Matecat\SubFiltering\MateCatFilter;
    use Matecat\SubFiltering\Mocks\FeatureSet;

    $dataRefMap = [
        'source1' => '${AMOUNT}',
        'source2' => '${RIDER}',
    ];

    $filter = MateCatFilter::getInstance(new FeatureSet(), 'en-US', 'it-IT', $dataRefMap);

    // When converting to Layer 2 (UI), the filter will:
    // - add equiv-text to <ph>/<sc>/<ec> using the map
    // - convert <pc> ranges to UI placeholders with originalData captured
    // When converting back to Layer 1/0, it restores the original XLIFF tags.
Note:
- If a dataRef key exists but its value is null or empty, it is treated as the literal string NULL.
- If the dataRef map is empty, the component still preserves inline codes by encoding original tags as Matecat placeholders to keep them safe in the UI.
See docs/dataRef.md for concrete before/after string examples and behavior details.
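One possible way to build the map described above is to read the <data> elements of a unit with PHP's DOM extension. This is a sketch, not part of the library: buildDataRefMap is a hypothetical helper, and the sample unit is made up (a real XLIFF 2 file would also declare the XLIFF namespace and wrap units in <xliff> and <file> elements).

```php
<?php
// Hypothetical helper: collect id => content pairs from <originalData>.
function buildDataRefMap(string $unitXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($unitXml);

    $map = [];
    foreach ($dom->getElementsByTagName('data') as $data) {
        $map[$data->getAttribute('id')] = $data->textContent;
    }

    return $map;
}

// Made-up XLIFF 2 unit for illustration (nowdoc, so ${...} is not interpolated).
$unit = <<<'XML'
<unit id="1">
  <originalData>
    <data id="source1">${AMOUNT}</data>
    <data id="source2">${RIDER}</data>
  </originalData>
  <segment>
    <source>Pay <ph id="1" dataRef="source1"/> to <ph id="2" dataRef="source2"/>.</source>
  </segment>
</unit>
XML;

$dataRefMap = buildDataRefMap($unit);
print_r($dataRefMap);
```

The resulting array has the same shape as the $dataRefMap in the example above and could be passed as the fourth argument to getInstance.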
Basic usage
Once you have a filter instance, use the methods below to convert between layers.
MateCatFilter methods:
- fromLayer0ToLayer2
- fromLayer1ToLayer2
- fromLayer2ToLayer1
- fromLayer2ToLayer0
- fromLayer0ToLayer1
- fromLayer1ToLayer0
- fromRawXliffToLayer0
- fromLayer0ToRawXliff
MyMemoryFilter methods:
- fromLayer0ToLayer1
- fromLayer1ToLayer0
Where:
- Layer 0 = Database
- Layer 1 = External services (MT/TM)
- Layer 2 = Matecat UI
Example: DB to UI and back (with dataRef map)
    <?php

    use Matecat\SubFiltering\MateCatFilter;
    use Matecat\SubFiltering\Mocks\FeatureSet;

    $featureSet = new FeatureSet();
    $dataRefMap = [
        'd1' => '_',
        'd2' => '**',
    ];

    $filter = MateCatFilter::getInstance($featureSet, 'en-US', 'it-IT', $dataRefMap);

    // Example Layer 0 content holding an inline code (here, a sprintf token)
    $layer0 = "Hi %s .";

    // 1) Layer 0 -> Layer 2 (UI)
    $ui = $filter->fromLayer0ToLayer2($layer0);
    // 'Hi <ph id="mtc_1" ctype="x-sprintf" equiv-text="base64:JXM="/> .'

    // 2) User edits happen in UI ...

    // 3) Layer 2 -> Layer 0 (restore original XLIFF structure)
    $backToDb = $filter->fromLayer2ToLayer0($ui);
Example: External service roundtrip
    <?php

    use Matecat\SubFiltering\MateCatFilter;
    use Matecat\SubFiltering\Mocks\FeatureSet;

    $filter = MateCatFilter::getInstance(new FeatureSet(), 'en-US', 'de-DE', []);

    $layer0 = 'Text with <ph id="1" equiv-text="&lt;br/&gt;"/> and placeholders.';

    // Prepare for MT/TM
    $layer1 = $filter->fromLayer0ToLayer1($layer0);
    // 'Text with <ph id="mtc_1" ctype="x-original_ph" x-orig="PHBoIGlkPSIxIiBlcXVpdi10ZXh0PSImbHQ7YnIvJmd0OyIvPg==" equiv-text="base64:Jmx0O2JyLyZndDs="/> and placeholders.'

    // ... send $layer1 to MT/TM and get $translatedLayer1 back ...
    $translatedLayer1 = $layer1; // stand-in for the service response in this example

    // Restore for DB
    $layer0Restored = $filter->fromLayer1ToLayer0($translatedLayer1);
Injecting custom handlers into the pipeline
Goal
Show how to inject only a subset of supported injectable handlers into the transformation pipeline so they run alongside the built-in handlers.
Key points
- Handlers are classes that extend the base handler and implement a transform method.
- You do not manually construct handlers; the pipeline instantiates them and injects the Pipeline instance via setPipeline.
- You inject handlers by passing an array of class names to the filter factory method. Unknown classes are ignored. The sorter normalizes the final execution order.
Example:
    <?php

    use Matecat\SubFiltering\MateCatFilter;
    use Matecat\SubFiltering\Enum\InjectableFiltersTags;

    // Example 1: enable only a subset of supported injectable handlers.
    // Only handlers known to the sorter will be kept and ordered.
    $featureSet = new YourFeatureSetImplementation(); // implements FeatureSetInterface

    $filter = MateCatFilter::getInstance(
        $featureSet,
        'en-US',
        'it-IT',
        [], // dataRef map
        [
            InjectableFiltersTags::markup,       // supported
            InjectableFiltersTags::single_curly, // supported (disabled by default, but its injection is allowed and thus, enabled here)
            'foobar'                             // any unsupported class would be ignored
        ]
    );

    $input = 'You have {NUM_RESULTS, plural, =0 {no results} one {1 result} other {# results}} for "{SEARCH_TERM}".';
    // 'You have {NUM_RESULTS, plural, =0 {no results} one {1 result} other {# results}} for "<ph id="mtc_1" ctype="x-curly-brackets" equiv-text="base64:e1NFQVJDSF9URVJNfQ=="/>".'

    $l1 = $filter->fromLayer0ToLayer1($input);
    $l2 = $filter->fromLayer0ToLayer2($input);
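The key points above (handlers with a transform method, pipeline-side instantiation with setPipeline, unknown classes ignored, a sorter fixing the order) can be mimicked in a self-contained sketch. Every class name below is hypothetical and none of this is the library's code; it only illustrates the selection and ordering behavior described here, with two toy handlers.

```php
<?php
// Hypothetical base handler: receives the pipeline and exposes transform().
abstract class SketchHandler
{
    protected ?SketchPipeline $pipeline = null;

    public function setPipeline(SketchPipeline $pipeline): void
    {
        $this->pipeline = $pipeline;
    }

    abstract public function transform(string $segment): string;
}

final class SketchTrim extends SketchHandler
{
    public function transform(string $segment): string
    {
        return trim($segment);
    }
}

final class SketchUppercase extends SketchHandler
{
    public function transform(string $segment): string
    {
        return strtoupper($segment);
    }
}

final class SketchPipeline
{
    /** Fixed execution order, playing the role of the sorter. */
    private const ORDER = [SketchTrim::class, SketchUppercase::class];

    /** @var SketchHandler[] */
    private array $handlers = [];

    public function __construct(?array $requested)
    {
        // null disables every injectable handler; unknown names are dropped,
        // and array_intersect keeps the order defined in ORDER.
        foreach (array_intersect(self::ORDER, $requested ?? []) as $class) {
            $handler = new $class();
            $handler->setPipeline($this);
            $this->handlers[] = $handler;
        }
    }

    public function transform(string $segment): string
    {
        foreach ($this->handlers as $handler) {
            $segment = $handler->transform($segment);
        }

        return $segment;
    }
}

// Requested out of order and with an unknown name, as in the example above.
$pipeline = new SketchPipeline([SketchUppercase::class, 'foobar', SketchTrim::class]);
$result = $pipeline->transform('  hello ');
echo $result, PHP_EOL;
```

Callers pass class names rather than instances, so the pipeline stays in control of construction and ordering; that is the design choice the sketch tries to show.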
Disable all injectable handlers by passing null
Example:
    <?php

    use Matecat\SubFiltering\MateCatFilter;

    // Example 2: disable all injectable handlers by passing null.
    // Only the fixed, non-injectable pipeline steps will run.
    $featureSet = new YourFeatureSetImplementation(); // implements FeatureSetInterface

    $filterNoInjectables = MateCatFilter::getInstance(
        $featureSet,
        'en-US',
        'it-IT',
        [],
        null // no injectable handlers
    );

    $string = 'This is &lt;b&gt;bold&lt;/b&gt; text.';

    $l1_no = $filterNoInjectables->fromLayer0ToLayer1($string);
    // 'This is &lt;b&gt;bold&lt;/b&gt; text.'
    $l2_no = $filterNoInjectables->fromLayer0ToLayer2($string);
FeatureSet
You must provide a FeatureSetInterface implementation to adjust the pipeline per transformation. A simple, working example lives under the tests/ folder. In your application, implement only the features you need and register them via your FeatureSet.
Running tests
    composer install
    ./vendor/bin/phpunit
Support
Please open issues and feature requests on GitHub: https://github.com/matecat/subfiltering/issues
Authors
- Domenico Lupinetti - https://github.com/ostico
- Mauro Cassani - https://github.com/mauretto78
License
This project is licensed under the MIT License - see the LICENSE file for details