nmapx/bmdm-soundex

Beider-Morse plus Daitch-Mokotoff soundex

Maintainers

Details

github.com/nmapx/BMDMSoundex

Source

Installs: 14 028

Dependents: 0

Suggesters: 0

Security: 0

Stars: 0

Watchers: 1

Forks: 10

Type:project

2.1.0 2020-09-09 17:57 UTC

This package is auto-updated.

Last update: 2025-01-10 03:34:34 UTC


README

This is a fork of the algorithm developed by Alexander Beider and Stephen P. Morse for phonetic matching of names and words. This algorithm generates less quantity of false hits comparing to soundex() and methaphone(). Also it's possible to use this algorithm for some non-latin alphabets without a transliteration.

Credits

Authors: Alexander Beider, Paris and Stephen P. Morse, San Francisco
Website: http://stevemorse.org/phoneticinfo.htm (source download, information and contacts)

Information

Currently there are 16 languages supported: Czech, Dutch, English, French, German, Greek (and Greek Latin), Hebrew, Hungarian, Italian, Latvian, Polish, Portuguese, Romanian, Russian (latin and cyrillic), Spanish, Turkish. Also BMPM (Beider-Morse Phonetic Matching) and BMDM as it's derivative can parse Hebrew names by Ashkenazic and Sephardic rules.

Differences

This fork's goal is to get rid of deprecated and global functions, global variables and to represent algorithm in OOP-like style. Also there were implemented some fixes and modifications for unification purposes. While exceeding the limits of procedural code now it's possible to include algorithm in frameworks and third-parity applications without a headache. Latvian language experimental support added.

Requirements

PHP 5.4+; mbstring extenstion

Performance

I strongly encourage to use PHP 7.0 and newer due to major performance enhancement since 5.x versions especially in array processing which is crucial for BMDM. Also there's built-in caching support - make sure that ./runtime directory is writable and let BMDM precompile and cache it's runtime rules. Here're charts of performance with and without caching. Also caching lowers I/O load. Test results available here .

Usage

Include BMDM.php or better use composer to install: composer require dautkom/bmdm

<?php

// You want to run ./composer install before
require "../vendor/autoload.php";
$bmdm = new \dautkom\bmdm\BMDM();

// Process string with a Beider-Morse algorithm and retrieve BM phonetic keys
$p = $bmdm->set('Hello world')->soundex()

// Try to guess string's language
$l = $bmdm->set('Grzegorz')->guess()

// Retrieve all supported languages
$g = $bmdm->getLanguages()

// Process string with a Beider-Morse algorithm and retrieve phonetic keys
$b = $bmdm->set('ברצלונה')->bm->soundex()

// Try to guess string's language and retrieve only language names
$l = $bmdm->set('Grzegorz')->bm->getLanguageNames()

// Retrieve Daitch-Mokotoff soundex values
// Only latin symbols are supported
$d = $bmdm->set('Grzegorz')->dm->soundex()

Ashkenazic and Sephardic support

<?php

require "../vendor/autoload.php";

// Using 'ash' upon init will load Ashkenazi phonetic rules
// Use 'sep' instead of 'ash' to init Sephardic rules
$bmdm = new \dautkom\bmdm\BMDM('ash');

Multiple languages in one string

<?php

require "../vendor/autoload.php";
$bmdm = new \dautkom\bmdm\BMDM();

$p = $bmdm->set('This is Спарта!')->soundex()

Different languages matching

<?php

require "../vendor/autoload.php";
$bmdm = new \dautkom\bmdm\BMDM();

// Words in different languages with the same pronunciation
// in most cases give intersections in results.

print_r($bmdm->set('Zelinska')->soundex());
print_r($bmdm->set('Зелинска')->soundex());

// ## Latin string
// Array
// (
//     [input] => zelinska
//     [numeric] => Array
//         (
//             [0] => Array
//                 (
//                     [0] => 486450
//                 )
// 
//         )
// 
//     [phonetic] => Array
//         (
//             [0] => Array
//                 (
//                     [0] => zYlnzki
//                     [1] => zilnzki
//                 )
// 
//         )
// 
// )
//
// ## Cyrillic string 
// Array
// (
//     [input] => зелинска
//     [numeric] => Array
//         (
//             [0] => Array
//                 (
//                     [0] => 486450
//                 )
// 
//         )
// 
//     [phonetic] => Array
//         (
//             [0] => Array
//                 (
//                     [0] => zYlnzka
//                     [1] => zYlnzko
//                     [2] => zilnzka
//                     [3] => zilnzko
//                 )
// 
//         )
// 
// )

Modification

If you are going to modify rules - disable cache for development process and cleanup ./runtime directory afterwards. Otherwise expired cached data will be loaded.

License

Project is distributed under GNU GPL v3 in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Copyright (c) 2008-2016 Alexander Beider and Stephen P. Morse
Copyright (c) 2013-2016 Olegs Capligins