lsimul/tyop

Fixes most of the typos commonly found in Czech text.

github.com/lSimul/tyop

pkg:composer/lsimul/tyop

0.1.0 2024-08-06 20:04 UTC


README

A set of regular expressions that try to fix common typographic mistakes in Czech text, such as:

  • multiple spaces,
  • misplaced spaces (a space before a comma or full stop, a missing space after a comma or full stop, etc.),
  • date formatting,
  • currency formatting,
  • conversion from '-' to '–' where necessary,
  • and a few more small but neat fixes.
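
To give a feel for the kind of rule listed above, here is a minimal sketch. The patterns and the function name `sketchFix` are illustrative assumptions, not the rules shipped in the package, and they cover only a few of the listed cases.

```php
<?php
// Illustrative only: these patterns are NOT taken from lsimul/tyop.
// sketchFix() is a hypothetical helper used just for this example.
function sketchFix(string $text): string
{
    // Collapse runs of spaces into a single space.
    $text = preg_replace('/ {2,}/u', ' ', $text);
    // Remove spaces before a comma or full stop.
    $text = preg_replace('/ +([,.])/u', '$1', $text);
    // Add the missing space after a comma that is followed by a letter.
    $text = preg_replace('/,(\p{L})/u', ', $1', $text);
    // Put a space between day and month: "2.12." becomes "2. 12."
    $text = preg_replace('/\b(\d{1,2})\.(\d{1,2})\./u', '$1. $2.', $text);

    return trim($text);
}

echo sketchFix('Ale,ovšem ,tato věc  je od 2.12.  '), "\n";
// Ale, ovšem, tato věc je od 2. 12.
```

The real rule set is presumably more careful than this, for example about decimal numbers or abbreviations; the sketch only shows the general shape of a regex-based fix.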

Installation

composer require lsimul/tyop

Usage

use LSimul\Tyop;

$corrector = new Tyop\Corrector;

$text = 'Ale,ovšem ,tato věc , ta je od 2.12. za 300 ,- Kč .  ';

$text = $corrector->normalize($text);
// Ale, ovšem, tato věc, ta je od 2.12. za 300,- Kč.

$text = $corrector->fix($text);
// Ale, ovšem, tato věc, ta je od 2. 12. za 300 Kč.

Corrector has two main methods, meant to be called in the order normalize -> fix (-> fixDomain):

  • normalize(string $text): string resolves simple spacing issues, such as a missing space before a bracket.
  • fix(string $text): string digs a little deeper: it formats dates and currencies, adding or removing characters other than whitespace.

Extending

Corrector has two methods that allow you to extend its behaviour:

  • addRule(string $regex, string $replacement): self
    • Adds a new rule; it just validates the regex and stores it.
  • fixDomain(string $text): string
    • Applies all of the added rules to the text.

fixDomain exists to fix issues that are specific to a given text and therefore cannot go into the main set of rules (for example, aggressively turning lt into l because in that text it always means liter).
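
The "validate, store, then apply" idea can be sketched with a small stand-in class. `DomainRulesSketch` is hypothetical and is not the package's Corrector; only the two method signatures come from this README. The `lt` pattern and the assumption that rules are passed as delimited PCRE patterns are made up for the example.

```php
<?php
// Hypothetical stand-in that mimics the documented addRule()/fixDomain()
// behaviour; it is not the library's Corrector class.
final class DomainRulesSketch
{
    /** @var array<string, string> regex => replacement */
    private array $rules = [];

    public function addRule(string $regex, string $replacement): self
    {
        // Validate the pattern by running it once against an empty string.
        if (@preg_match($regex, '') === false) {
            throw new InvalidArgumentException("Invalid regex: $regex");
        }
        $this->rules[$regex] = $replacement;

        return $this;
    }

    public function fixDomain(string $text): string
    {
        // Apply every stored rule in the order it was added.
        foreach ($this->rules as $regex => $replacement) {
            $text = preg_replace($regex, $replacement, $text);
        }

        return $text;
    }
}

// Assumed domain rule: "lt" after a number always means liter here.
$rules = (new DomainRulesSketch())
    ->addRule('/(\d+) ?lt\b/u', '$1 l');

echo $rules->fixDomain('Spotřeba 5 lt na 100 km'), "\n";
// Spotřeba 5 l na 100 km
```

With the real package the same two calls would presumably be made on a Corrector instance, after normalize and fix have run.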

Fixing HTML

Alongside Corrector there is also Bridge\Html, which wraps Corrector and works on HTML. In addition to the text fixes, it trims whitespace at the end of a paragraph and removes newlines.

<p>
	<em><u>Interval 3.3. - 8.3. 2024   </u></em>
</p>
<p><br></p>

<!-- Will be turned into: -->
<p>
	<em><u>Interval 3. 3. – 8. 3. 2024</u></em>
</p>