oblak/syllabizer

Splits Serbian words into syllables (podela reči na slogove). Supports Latin and Cyrillic.

github.com/oblakstudio/syllabizer

pkg:composer/oblak/syllabizer

v1.0.0 2026-06-07 13:53 UTC

Last update: 2026-06-07 13:55:45 UTC


Syllabizer

Split Serbian words into syllables (podela reči na slogove).

Installation

You can install the package via Composer:

$ composer require oblak/syllabizer

Usage

<?php

require __DIR__ . '/vendor/autoload.php';

use Oblak\Syllabizer;

$syllabizer = new Syllabizer();

$syllabizer->syllabize('jednak');   // ['jed', 'nak']
$syllabizer->syllabize('tramvaj');  // ['tram', 'vaj']
$syllabizer->syllabize('pidžama');  // ['pi', 'dža', 'ma']
$syllabizer->syllabize('mačka');    // ['ma', 'čka']

// Cyrillic works just as well
$syllabizer->syllabize('сломљен');  // ['слом', 'љен']

// Syllabic R (slogotvorno r) is a nucleus of its own
$syllabizer->syllabize('brzo');     // ['br', 'zo']
$syllabizer->syllabize('rđa');      // ['r', 'đa']

// Count the syllables
count($syllabizer->syllabize('slogovnik')); // 3

syllabize() accepts a string or any Stringable, and returns an ordered array of syllables. Joining the result reproduces the original word exactly:

$word = 'doneti';

implode('', $syllabizer->syllabize($word)) === $word; // true

Joining syllables

tokenize() is a convenience wrapper that returns the syllables as a single string, joined by a separator (a hyphen by default):

$syllabizer->tokenize('doneti');        // 'do-ne-ti'
$syllabizer->tokenize('сломљен');       // 'слом-љен'

// Pass any separator you like
$syllabizer->tokenize('doneti', '·');   // 'do·ne·ti'
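
The examples above all pass a single word. To hyphenate running text, one option is to apply tokenize() to each run of letters and leave spaces and punctuation untouched. The helper below is a minimal sketch, not part of the package: hyphenateText() is a hypothetical name, and it takes the tokenizer as a callable so the snippet runs on its own.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of oblak/syllabizer): applies a per-word
 * tokenizer to every run of letters in $text and keeps everything else as-is.
 *
 * @param callable(string): string $tokenize Per-word tokenizer.
 */
function hyphenateText(string $text, callable $tokenize): string
{
    // \p{L}+ with the /u flag matches runs of Latin or Cyrillic letters.
    return (string) preg_replace_callback(
        '/\p{L}+/u',
        static fn (array $match): string => $tokenize($match[0]),
        $text
    );
}

// Stand-in tokenizer so the snippet is self-contained; it only brackets each word.
$stub = static fn (string $word): string => '[' . $word . ']';

echo hyphenateText('Doneti, сломљен!', $stub), PHP_EOL; // [Doneti], [сломљен]!
```

With the package installed, the stub would presumably be replaced by something like `fn (string $w): string => $syllabizer->tokenize($w)`.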

How it works

The library follows the standard pedagogical rules for Serbian syllabification:

  • Both scripts: Latin and Cyrillic input are both supported. The Latin digraphs lj, nj and dž (in any case) count as a single consonant and are never split, just like their Cyrillic counterparts љ, њ, џ.
  • Vowels carry syllables: the number of syllables equals the number of vowels (a, e, i, o, u) plus the number of syllabic R's.
  • Syllabic R: an r with no neighbouring vowel (between consonants, or word‑initial before a consonant) becomes a syllable nucleus: pr‑st, tr‑ka, r‑vač.
  • Consonant clusters: a single consonant opens the following syllable (li‑va‑da). Within a cluster, the boundary falls between two sonants (or‑la, tram‑vaj) or between a plosive and a following non‑approximant (lop‑ta, sred‑stvo). Otherwise the whole cluster opens the next syllable (la‑sta, je‑dva, sve‑tlost).
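
The rules above can be sketched in a few dozen lines of standalone PHP. This is an illustration only, not the library's implementation. It handles Latin script only, and it assumes the following:

- The sound classes (sonants, plosives, approximants) are assumptions and are not taken from the package source.
- In a cluster of three or more consonants, the first qualifying pair from the left decides the boundary. This is also an assumption.
- sketchSyllabize() and letters() are hypothetical names.
- The mbstring extension is available.

```php
<?php

declare(strict_types=1);

// Assumed sound classes (an illustration, not taken from the library's source).
const VOWELS       = ['a', 'e', 'i', 'o', 'u'];
const SONANTS      = ['j', 'l', 'lj', 'm', 'n', 'nj', 'r', 'v'];
const PLOSIVES     = ['p', 'b', 't', 'd', 'k', 'g'];
const APPROXIMANTS = ['j', 'l', 'lj', 'r', 'v'];

/** Split a Latin-script word into letters, keeping lj, nj and dž whole. */
function letters(string $word): array
{
    preg_match_all('/lj|nj|dž|./iu', $word, $matches);

    return $matches[0];
}

/** Hypothetical helper: syllabify a Latin-script word with the rules listed above. */
function sketchSyllabize(string $word): array
{
    $units = letters($word);
    $lower = array_map('mb_strtolower', $units);
    $count = count($units);
    if ($count === 0) {
        return [];
    }

    $isVowel = fn (int $i): bool => $i >= 0 && $i < $count && in_array($lower[$i], VOWELS, true);

    // Nuclei: vowels, plus any r that has no vowel before it and a consonant after it.
    $nuclei = [];
    for ($i = 0; $i < $count; $i++) {
        $syllabicR = $lower[$i] === 'r' && !$isVowel($i - 1) && $i + 1 < $count && !$isVowel($i + 1);
        if ($isVowel($i) || $syllabicR) {
            $nuclei[] = $i;
        }
    }

    // Find the index at which each syllable starts.
    $starts = [0];
    for ($n = 1; $n < count($nuclei); $n++) {
        $first = $nuclei[$n - 1] + 1; // first consonant between the two nuclei
        $last  = $nuclei[$n] - 1;     // last consonant between the two nuclei
        $start = $first;              // default: the whole cluster opens the next syllable
        for ($i = $first; $i < $last; $i++) {
            [$a, $b] = [$lower[$i], $lower[$i + 1]];
            $twoSonants       = in_array($a, SONANTS, true) && in_array($b, SONANTS, true);
            $plosiveThenOther = in_array($a, PLOSIVES, true) && !in_array($b, APPROXIMANTS, true);
            if ($twoSonants || $plosiveThenOther) {
                $start = $i + 1; // boundary falls between $a and $b
                break;
            }
        }
        $starts[] = $start;
    }

    // Cut the original (case-preserving) letters at the computed boundaries.
    $syllables = [];
    foreach ($starts as $k => $from) {
        $to = $starts[$k + 1] ?? $count;
        $syllables[] = implode('', array_slice($units, $from, $to - $from));
    }

    return $syllables;
}

echo implode('-', sketchSyllabize('sredstvo')), PHP_EOL; // sred-stvo
echo implode('-', sketchSyllabize('jedva')), PHP_EOL;    // je-dva
```

The sketch cuts the original letters rather than the lower-cased ones, so joining its output reproduces the input, as syllabize() does. It implements only the rules listed above, so the real library may split differently on words those rules do not cover.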

Testing

$ composer test

Coding standards

$ composer lint      # check
$ composer lint:fix  # auto-fix

License

The MIT License (MIT). Please see the License File for more information.