geocurly / name-splitter
Name split utility
0.1
2020-05-29 19:53 UTC
Requires
- php: ^7.4
Requires (Dev)
- phpunit/phpunit: ^9.1
- symfony/var-dumper: ^5.0
This package is not auto-updated.
Last update: 2024-05-06 01:43:21 UTC
README
Unfortunately, the utility supports only Cyrillic names.
This is a name-splitting utility. It takes an input string and parses it into an object.
Usage:
<?php declare(strict_types=1);

use NameSplitter\NameSplitter;

$splitter = new NameSplitter(['enc' => 'CP1251']);
$result = $splitter->split('Иванов Иван Иванович');

[$surname, $name, $middleName] = [
    $result->getSurname(),
    $result->getName(),
    $result->getMiddleName(),
];
Quality
NameSplitter's tests cover about 13,000 Russian names with an accuracy of 99.65%. Each name was tested against several templates, so the total number of test cases was 124,283.
You can run the tests against your own data set (use the --verbose option to see per-template errors):
[aleksandr@aleksandr name-splitter]$ ./bin/name-split-test --file=$(realpath fio.csv)
TESTED TEMPLATES:
%Surname %Name %Middle
%Name %Middle %Surname
%Name %Middle
%Name %Surname
%Surname %Name
%Surname %StrictInitials
%StrictInitials %Surname
%Surname %SplitInitials
%SplitInitials %Surname
ACCURACY: 99.65
COUNT CASE TOTAL: 124283
COUNT CASE PASS: 123848
COUNT CASE ERROR: 435
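The ACCURACY figure is consistent with the share of passing cases. A minimal sketch of that arithmetic, assuming accuracy is computed as passed / total expressed as a percentage (the test tool's actual formula is not shown here):

```php
<?php declare(strict_types=1);

// Counts copied from the test run output above.
$total = 124283;
$passed = 123848;

// Assumption: accuracy is the share of passing cases, as a percentage
// rounded to two decimal places.
$accuracy = round($passed / $total * 100, 2);
$errors = $total - $passed;

printf("ACCURACY: %.2f\n", $accuracy);     // ACCURACY: 99.65
printf("COUNT CASE ERROR: %d\n", $errors); // COUNT CASE ERROR: 435
```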
The format of the fio.csv file is:
SomeSurname;SomeName;SomeMiddleName
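To check a data set before running the tests, each line can be split on the semicolon. The sketch below is only an illustration: `parseFioLine` is a hypothetical helper, not part of the package, and it assumes fields are never quoted and never contain a semicolon.

```php
<?php declare(strict_types=1);

/**
 * Split one line of the fio.csv format into its three parts.
 * Hypothetical helper for illustration; not part of name-splitter.
 *
 * @return array{surname: string, name: string, middleName: string}
 */
function parseFioLine(string $line): array
{
    // Fields are separated by semicolons: Surname;Name;MiddleName
    $fields = explode(';', trim($line));

    if (count($fields) !== 3) {
        throw new InvalidArgumentException(
            'Expected 3 fields, got ' . count($fields)
        );
    }

    [$surname, $name, $middleName] = $fields;

    return [
        'surname' => $surname,
        'name' => $name,
        'middleName' => $middleName,
    ];
}

$row = parseFioLine('Иванов;Иван;Иванович');
echo $row['surname'], ' ', $row['name'], ' ', $row['middleName'], PHP_EOL;
// prints "Иванов Иван Иванович"
```

Lines with a missing or extra field raise an exception, so a malformed row is caught before it skews the accuracy count.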
Problems
- The utility can't recognize templates like %Name %Surname when the surname matches a middle name (for example, Иван Иванович).
- Some templates may not work correctly when the name being split is not present in the dictionaries.
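The first problem can be illustrated with a rough suffix check. This is a sketch only: `looksLikeMiddleName` is a hypothetical helper, and the package's own recognition logic may work differently. It relies on the common Russian patronymic endings, which some surnames share.

```php
<?php declare(strict_types=1);

/**
 * Rough check for whether a word looks like a Russian patronymic.
 * Hypothetical helper for illustration; not the package's algorithm.
 */
function looksLikeMiddleName(string $word): bool
{
    // Common patronymic endings: -ович/-евич/-ьич (male),
    // -овна/-евна/-ична (female). The "u" flag enables UTF-8 matching.
    return preg_match('/(ович|евич|ьич|овна|евна|ична)$/ui', $word) === 1;
}

var_dump(looksLikeMiddleName('Иванович')); // bool(true)
var_dump(looksLikeMiddleName('Иванов'));   // bool(false)
```

Because Иванович passes such a check, the input Иван Иванович fits %Name %Middle just as well as %Name %Surname, and the word alone does not say which reading is intended.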
Solution
You can use pre- and post-templates:
<?php declare(strict_types=1);

use NameSplitter\{
    NameSplitter,
    Template\SimpleMatch,
    Contract\TemplateInterface as TPL,
    Contract\StateInterface
};

$before = [
    // For this case we explicitly match the name parts with a template.
    new SimpleMatch([
        TPL::SURNAME => 'Difficult Surname',
        TPL::NAME => 'Difficult Name'
    ]),
    static function(StateInterface $state) {
        // TODO: your implementation goes here
        return [
            TPL::SURNAME => $surname ?? null,
            TPL::NAME => $name ?? null,
        ];
    },
];

// Templates may be any callable type, as long as it accepts a StateInterface.
$after = [];

$splitter = new NameSplitter([], $before, $after);
$result = $splitter->split('Difficult Surname Difficult Name');