zachleigh/pronounce-php

Create English pronunciation strings for over 130,000 words. Hyphenate English words.

0.8 2015-06-28 02:18 UTC

This package is auto-updated.

Last update: 2024-12-22 13:47:49 UTC


README

Build Status Latest Stable Version Latest Unstable Version License

-Converts words to pronunciation strings using the Carnegie Mellon University Pronouncing Dictionary (CMUdict) file. Currently converts to the International Phonetic Alphabet (IPA) and to an easier to read spelling approximation.
-Hyphenates English words. Hyphenation for IPA and spelling approximation hopefully coming soon. -Outputs to the console, to a file, or to a database

Contents

Installation
Usage

Installation

Requirements
PHP 5.3.9 or higher

Linux users can find PHP releases in their distribution repositories. For other operating systems, please visit the php installation guide for instructions.

composer

Check the composer documentation for installation instructions.

Install

If requirements are met, you can install the package in two ways.

Download

Recommended. Download here and run

composer install
Through composer
composer require zachleigh/pronounce-php

If you install through composer, the program will be in vendor/zachleigh/pronounce-php

Usage

#####General syntax overview

pronounce-php command [argument]  [options]

Commands

all

Output the entire CMUdict file with arpabet, hyphenation, IPA, and spelling approximation strings to either a file or a database. Default is to write to a file called 'output.txt'.

Syntax overview
pronounce-php all [options]
Options
--destination [-d]

Set the output destination. Default is to output to a file called 'output.txt'. If file is selected (default), fields will be seperated by a forward slash (/) surrounded by spaces.
Available desitinations: [file, database]

pronounce-php all --destination=database
--fields [-f]

Set the output fields to be displayed. Fields must be in a comma seperated list. All fields are enabled by default.
Available fields: [word, hyphenated_word, arpabet, ipa, spelling]

pronounce-php all --fields=word,arpabet,ipa
--file [-o]

If 'file' is selected for the output destination, the 'file' option can be used to set a file name to write to. The default file name is 'output.txt' and is written to the pronounce-php directory.

pronounce-php all --destination=file --file=my_file.txt
--multiple [-m]

For some words in the CMUdict file, there are multiple pronunciation entries. The file deals with these by appending a number in parentheses to each additional entry.

ACERO  AH0 S EH1 R OW0
ACERO(1)  AH0 S Y EH1 R OW0
ACERO(2)  AH0 TH EH1 R OW0

The 'multiple' command sets the behavior to deal with thses multiple entries for the output. By default, 'multiple' is set to 'none', which outputs entries exactly as they appear in the CMUdict file. 'repeat' outputs entries without the parentheses or numbers.

pronounce-php lookup words_to_lookup --multiple=repeat
--symbol [-s]

Set the character to be used for hyphenation. The default value is a hyphen (-). Note: if writing to a file, keep in mind that items in the file will be divided by forward slashes(/) so setting the hyphenation symbol to forward slash will complicate reading of the file.

pronounce-php all --symbol=.
Examples

Basic usage

./pronounce-php all


Lines in file will look like this:
accepting / ac-cept-ing / AE0 K S EH1 P T IH0 NG / æksɛ'ptɪŋ / akse'pting /

Set desired output fields with the 'fields' option.

./pronounce-php all --fields=word,ipa,spelling


Lines in file will look like this:
accepting / æksɛ'ptɪŋ / akse'pting /

Change the symbol used to divide the hyphenated word with the 'symbol' option.

./pronounce-php all --symbol=.


Lines in file will look like this:
accepting / ac.cept.ing / AE0 K S EH1 P T IH0 NG / æksɛ'ptɪŋ / akse'pting /

Set the ouput destination with the 'destination' option. Only one destination may be choosen. If 'destination' is set to 'file' (the default value), use the 'file' option to specify a file to write to.

./pronounce-php all --file=all.txt


Successfully wrote to all.txt

If 'destination' is set to 'database', database credentials will be read from .env and configuration will be read from config.php.

./pronouncephp all --destination=database


Successfully wrote to database

hyphenate

Hyphenate a word or words. Note that this function is mostly accurate, but there may be some errors. If you find an error, please report it so I can add the word to the exception list.

Syntax overview
pronounce-php hyphenate words_to_hyphenate [options]
Options
--destination [-d]

Set the output destination. Default is to output a table to the console. If file is selected, fields will be seperated by a forward slash (/) surrounded by spaces.
Available desitinations: [table, string, file, database]

pronounce-php hyphenate words_to_hyphenate --destination=file
--file [-o]

If 'file' is selected for the output destination, the 'file' option can be used to set a file name to write to. The default file name is 'output.txt' and is written to the pronounce-php directory.

pronounce-php hyphenate words_to_hyphenate --destination=file --file=my_file.txt
--symbol [-s]

Set the character to be used for hyphenation. The default value is a hyphen (-). Note: if writing to a file, keep in mind that items in the file will be divided by forward slashes(/) so setting the hyphenation symbol to forward slash will complicate reading of the file.

pronounce-php hyphenate words_to_hyphenate --symbol=/
Examples

Basic usage

./pronounce-php hyphenate hello


+-------+-----------------+
| word  | hyphenated_word |
+-------+-----------------+
| hello | hel-lo          |
+-------+-----------------+

A comma seperated list of words may also be given.

./pronounce-php hyphenate basket,curtain,hyphenate


+-----------+-----------------+
| word      | hyphenated_word |
+-----------+-----------------+
| basket    | bas-ket         |
| curtain   | cur-tain        |
| hyphenate | hy-phen-ate     |
+-----------+-----------------+

Change the symbol used to divide word with the 'symbol' option.

./pronounce-php hyphenate machine --symbol=.


+---------+-----------------+
| word    | hyphenated_word |
+---------+-----------------+
| machine | ma.chine        |
+---------+-----------------+

Set the ouput destination with the 'destination' option. Only one destination may be choosen. Setting 'destination' to 'string' produces a string instead of a table.

./pronounce-php hyphenate flower,mountain --destination=string


word: flower hyphenated word: flower 
word: mountain hyphenated_word: moun-tain 

Setting 'destination' to 'file' writes the output to a file. The default file is 'output.txt'.

./pronounce-php hyphenate cupcakes,headphones --destination=file


Successfully wrote to output.txt

ouput.txt

cupcakes / cup-cakes / 
headphones / head-phones / 

If 'destination' is set to 'file', use the 'file' option to specify a file to write to.

./pronounce-php hyphenate reading,eating,shopping --destination=file --file=hyphen.txt


Successfully wrote to hyphen.txt

hyphen.txt

reading / read-ing / 
eating / eat-ing / 
shopping / shop-ping / 

If 'destination' is set to 'database', database credentials will be read from .env and configuration will be read from config.php.

./pronouncephp hyphenate goodbye --destination=database


Successfully wrote to database

lookup

Look up a word and output the Arpabet, IPA and Spelling approximation pronunciation strings. The lookup command takes one argument: the word or words to be looked up.

Syntax overview
pronounce-php lookup words_to_lookup [options]
Options
--destination [-d]

Set the output destination. Default is to output a table to the console. If 'file' is selected, fields will be seperated by a forward slash (/) surrounded by spaces.
Available desitinations: [table, string, file, database]

pronounce-php lookup words_to_lookup --destination=string
--fields [-f]

Set the output fields to be displayed. Fields must be in a comma seperated list. All fields are enabled by default.
Available fields: [word, arpabet, ipa, spelling]

pronounce-php lookup words_to_lookup --fields=word,arpabet,ipa,spelling
--file [-o]

If 'file' is selected for the output destination, the 'file' option can be used to set a file name to write to. The default file name is 'output.txt' and is written to the pronounce-php directory.

pronounce-php lookup words_to_lookup --destination=file --file=my_file.txt
--hyphenate [-y]

If the 'hyphenate' flag is given, applicable fields will be hyphenated. Currently, only the 'word' field may be hyphenated.

pronounce-php lookup words_to_lookup --hyphenate
--multiple [-m]

For some words in the CMUdict file, there are multiple pronunciation entries. The file deals with these by appending a number in parentheses to each additional entry.

ACERO  AH0 S EH1 R OW0
ACERO(1)  AH0 S Y EH1 R OW0
ACERO(2)  AH0 TH EH1 R OW0

The 'multiple' command sets the behavior to deal with thses multiple entries for the output. By default, 'multiple' is set to 'none', which outputs entries exactly as they appear in the CMUdict file. 'repeat' outputs entries without the parentheses or numbers.

pronounce-php lookup words_to_lookup --multiple=repeat
--symbol [-s]

Set the character to be used for hyphenation. The default value is a hyphen (-). Note: if writing to a file, keep in mind that items in the file will be divided by forward slashes(/) so setting the hyphenation symbol to forward slash will complicate reading of the file.

pronounce-php lookup words_to_lookup --symbol=_
Examples

Basic usage

./pronounce-php lookup hello


+-------+--------------+--------+----------+
| word  | arpabet      | ipa    | spelling |
+-------+--------------+--------+----------+
| hello | HH AH0 L OW1 | hʌɫoʊ' | huhloh'  |
+-------+--------------+--------+----------+

A comma seperated list of words may also be given. Note that words will be returned in alphabetical order.

./pronounce-php lookup elephant,zebra,giraffe


+----------+---------------------+----------+------------+
| word     | arpabet             | ipa      | spelling   |
+----------+---------------------+----------+------------+
| elephant | EH1 L AH0 F AH0 N T | ɛ'ɫʌfʌnt | e'luhfuhnt |
| giraffe  | JH ER0 AE1 F        | dʒɝæ'f   | jura'f     |
| zebra    | Z IY1 B R AH0       | zi'brʌ   | zee'bruh   |
+----------+---------------------+----------+------------+

Using the 'hyphenate' flag hyphenates the 'word' field.

./pronounce-php lookup money,coffee,schedule --hyphenate


+-----------+------------------+----------+----------+
| word      | arpabet          | ipa      | spelling |
+-----------+------------------+----------+----------+
| cof-fee   | K AA1 F IY0      | kɑ'fi    | ko'fee   |
| mon-ey    | M AH1 N IY0      | mʌ'ni    | muh'nee  |
| sched-ule | S K EH1 JH UH0 L | skɛ'dʒʊɫ | ske'juul |
+-----------+------------------+----------+----------+

Use the 'symbol' option to set the character used for hyphenation.

./pronounce-php lookup monkey,furry --hyphenate --symbol=~


+---------+----------------+--------+-----------+
| word    | arpabet        | ipa    | spelling  |
+---------+----------------+--------+-----------+
| fur~ry  | F ER1 IY0      | fɝ'i   | fur'ee    |
| mon~key | M AH1 NG K IY0 | mʌ'ŋki | muh'ngkee |
+---------+----------------+--------+-----------+

Set desired output fields with the 'fields' option. Fields will be displayed in the order given.

./pronounce-php lookup blue,red,green --fields=word,ipa


+-------+-------+
| word  | ipa   |
+-------+-------+
| blue  | bɫu'  |
| green | gri'n |
| red   | rɛ'd  |
+-------+-------+

Set the ouput destination with the 'destination' option. Only one destination may be choosen.
Setting 'destination' to 'string' produces a string instead of a table.

./pronounce-php lookup desk,chair,pencil --destination=string


word: chair arpabet: CH EH1 R ipa: tʃɛ'r spelling: che'r 
word: desk arpabet: D EH1 S K ipa: dɛ'sk spelling: de'sk 
word: pencil arpabet: P EH1 N S AH0 L ipa: pɛ'nsʌɫ spelling: pe'nsuhl 

Setting 'destination' to 'file' writes the output to a file. The default file is 'output.txt'.

./pronounce-php lookup guitar --destination=file


Successfully wrote to output.txt

ouput.txt

guitar / G IH0 T AA1 R / gɪtɑ'r / gito'r /

If 'destination' is set to 'file', use the 'file' option to specify a file to write to.

./pronounce-php lookup night,day,noon --destination=file --file=words.txt


Successfully wrote to words.txt

words.txt

day / D EY1 / deɪ' / dey' / 
night / N AY1 T / naɪ't / nahy't / 
noon / N UW1 N / nu'n / noo'n /

If 'destination' is set to 'database', database credentials will be read from .env and configuration will be read from config.php.

./pronouncephp lookup goodbye --destination=database


Successfully wrote to database

Database Usage

Requirements

If you wish to fill a database with the information gained from using this program, you must be sure that your database meets the following requirements:

  • Tables must have an auto-incrementing 'id' column
  • Column names must exactly match the expected field names.
    • All command field names: 'word', 'hyphenated_word', 'arpabet', 'ipa', 'spelling'
    • Hyphenate command field names: 'word', 'hyphenated_word'
    • Lookup command field names: 'word', 'arpabet', 'ipa', 'spelling'

Use the 'field' option to set which fields you wish to insert into your database.

Setup

First, copy the .env.example file (found in the pronounce-php root folder) to a new file called .env. Open the .env file in a text editor and enter applicable database information.

Next, open config.php in a text-editor. In the 'database' field, enter in the database type you are using. Currently, only Mysql is supported (see below for information about other database types). If you wish, you can change the charset in the 'connections' field, but the default 'utf8' should satisfy most people. That should be all you have to do. The other information in the file is pulled in from the .env file you setup in the previous step.

Other database types

The database connection uses php PDO drivers that can be changed out fairly easily. Currently, PDO supports 12 database types. Check the driver list for more information. If you wish to make an adapter for one of these database types, adapter name rules must be followed.

  • Cuprid: CupridDatabase
  • FreeTDS / Microsoft SQL Server / Sybase: DblibDatabase
  • Firebird: FirebirdDatabase
  • IBM DB2: IbmDatabase
  • IBM Informix Dynamic Server: InformixDatabase
  • MySQL: MysqlDatabase
  • Oracle Call Interface: OciDatabase
  • ODBC v3 (IBM DB2, unixODBC and win32 ODBC): OdbcDatabase
  • PostgreSQL: PgsqlDatabase
  • SQLite 3 and SQLite 2: SqliteDatabase
  • Microsoft SQL Server / SQL Azure: SqlsrvDatabase
  • 4d: FourD (A class naming rule exception exists for this, but it is untested)

The adapter class should be in its own file in src/Database/Databases/ and must implement DatabaseInterface. If you make a new adapter, please let me know so I can include it in the main program. If you dont know how to write a new adapter, let me know and Ill do it if time permits.

Besides making an adapter, you will also have to make a new array for the database in 'connections' in config.php.