Speech:Train Archive First


First Edition
Read the following before starting:


 * 1) Replace all instances of <experiment #> with your experiment number!
 * 2) *Experiment numbers are 4 digits long (includes any preceding zeros), starting from 0001 to 9999.
 * 3) *Do not include the '<' or '>'.
 * 4) Similarly, replace all items encapsulated in < and > with the appropriate text.
 * 5) * Usually it's a filename/path.
 * 6) *Do not include the '<' or '>'.
 * 7) Pay attention as to what directory you execute scripts in!
 * 8) *Certain scripts need to be executed in specific directories.
 * 9) DO copy and paste commands from this page. Do NOT copy and paste multiple commands from this page at once.
 * 10) *Most commands/scripts on this page need information specific to your experiment filled in. If you paste multiple commands at once into the terminal without adding this information, bad things may result.
 * 11) Percent signs (%) indicate a command to be executed on the shell.
 * 12) *Leave them out  when copying a command from this page.
 * 13) Do NOT execute any of the following commands as root.
 * 14) *While it won't result in any of the consequences listed below, it does mess up the permissions for any directories and files created during the process.
 * 15) **This effectively blocks others from accessing the data derived from the experiment, which isn't a very nice thing to do.
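
The experiment-number rule above (4 digits, 0001 to 9999, no angle brackets) can be checked mechanically before a number is pasted into a command. The following is only a sketch: is_valid_exp is a hypothetical helper name, not one of the project's scripts.

```shell
# Hypothetical helper (not part of the project's scripts): succeeds only if
# the argument is a 4-digit experiment number between 0001 and 9999.
is_valid_exp() {
  case "$1" in
    0000) return 1 ;;                     # below the allowed range
    [0-9][0-9][0-9][0-9]) return 0 ;;     # exactly four digits
    *) return 1 ;;                        # wrong length, or stray '<' and '>'
  esac
}

# Example: check a number before substituting it into a command.
if is_valid_exp "0028"; then
  echo "0028 looks like a valid experiment number"
fi
```

A value such as 28 or <0028> is rejected, which catches the two most likely copy-and-paste slips described in the list.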


 * Please note:
 * The Base Experiment directory is specific to each experiment, and refers to
 * The Root Experiment directory is generic to all experiments, and refers to


 * Failure to pay heed to the above may result in:


 * 1) At best: Script failure.
 * 2) At worst: Data deletion.
 * 3) Very annoyingly: Will create a mess.
 * 4) But most annoyingly: Will create a mess in a publicly used directory such as /mnt/main/Exp.

Steps for running a Train
September 6th (Cedric Woodbury) - Changes have been made to the entire process during the Summer 2012 Semester. To see that process, click here.

Important information: March 22, 2013 (Eric Beikman) - This document has been updated for legibility and to reflect the changes incorporated by (Cedric Woodbury), along with additions and clarifications from the Spring 2013 Modelling group Speech:Spring_2013_Modeling_Group.

---

March 22: This page is currently undergoing updates/revisions by the spring 2013 modeling group. Please let the Modeling team know if there are any errors.

<font color='green'>Set up the task directory:

 * 1) Navigate to the Experiment root directory:
 * 2) * % cd /mnt/main/Exp
 * 3) Create the specific experiment folder using your specific experiment number. All experiment specific data, including any created models and test data, will reside within this directory.
 * 4) * The number should follow the last experiment number created.
 * 5) * % mkdir <experiment #>
 * 6) Move into your experiment directory:
 * 7) * % cd <experiment #>
 * 8) Prep the experiment directory. This process creates all sub-folders, copies over some essential scripts (though not all), and imports a generic train configuration file (sphinx_train.cfg).
 * 9) * % /mnt/main/root/tools/SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl -task <experiment #>
 * 10) Similarly, we set up the necessary Decode files.
 * 11) * % /mnt/main/root/sphinx3/scripts/setup_sphinx3.pl -task <experiment #>
 * 12) * Do not execute this in the root experiment folder!
 * It will make a mess.
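
Since running the setup scripts from /mnt/main/Exp itself makes a mess, a small guard can be run first. This is a sketch under assumptions: inside_an_experiment is a hypothetical function name, and it only tests the path string; it does not check that the directory exists or that it is the base directory rather than a sub-folder such as etc.

```shell
# Hypothetical guard: succeeds only when the given directory (default: the
# current one) lies below /mnt/main/Exp and is not /mnt/main/Exp itself.
inside_an_experiment() {
  exp_root="/mnt/main/Exp"
  dir="${1:-$(pwd)}"
  case "$dir" in
    "$exp_root" | "$exp_root/")
      echo "Refusing: $dir is the root experiment directory" >&2
      return 1 ;;
    "$exp_root"/*)
      return 0 ;;
    *)
      echo "Refusing: $dir is not under $exp_root" >&2
      return 1 ;;
  esac
}
```

One way to use it is to type inside_an_experiment && in front of a setup command, so the command only runs when the check passes.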

<font color='green'>Set up the Sphinx Train Configuration file:

 * 1) The next few steps require us to be in the ./etc directory. Go there using % cd etc
 * 2) * This directory holds many of the important files used throughout the experiment process, including but not limited to the Sphinx configuration file, the transcript, the experiment dictionary, and the experiment file-IDs.
 * 3) We now need to modify the Sphinx trainer configuration file (sphinx_train.cfg) to match our specific experiment environment.
 * 4) * This file is used by the Sphinx trainer and during the Feats creation, among other processes.
 * 5) *Use your favorite text editor and open it up for editing.
 * 6) ** Emacs may or may not be installed on the machine, so we recommend using VI/VIM. Open it up using % vi sphinx_train.cfg
 * 7) *** Use the following guide to get started if you aren't familiar: vi editor quick Reference
 * 8) * The lines we are interested in changing are lines 6 through 8 and lines 79 and 80.
 * 9) *Edit them so they look like the following, substituting your current experiment number for <experiment #> as always:
 * #These are filled in at configuration time
 * $CFG_DB_NAME = "<experiment #>";
 * $CFG_BASE_DIR = "/mnt/main/Exp/<experiment #>";
 * 1) *Comment out line 80 by inserting a hash/pound/number symbol (#) in front of it; likewise, uncomment line 79 by removing that symbol. It should look like the following when done:
 * $CFG_HMM_TYPE = '.cont.';# Sphinx III
 * #$CFG_HMM_TYPE = '.semi.'; # Sphinx II
 * 1) **This setting determines the format in which the Sphinx trainer outputs the acoustic model. We want the Sphinx3/continuous format; using the Sphinx II/semi-continuous format will cause the Sphinx-3 decoder to error out.
 * 2) *Save and exit the editor.
 * 3) **In Vi this can be done by pressing the escape key (ESC), then typing :wq
 * 4) ***(Remember this with "Command == colon, then Write the file, after that Quit" the editor)
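
After editing, the three settings can be double-checked with grep instead of by eye. The sketch below assumes the lines look like the examples shown above (same quoting, optional spaces around the equals sign); check_train_cfg is a hypothetical name, not an existing script.

```shell
# Hypothetical check: verify the edited sphinx_train.cfg against the
# settings described above. Usage: check_train_cfg <cfg file> <experiment #>
check_train_cfg() {
  cfg="$1"
  exp="$2"
  status=0
  sp='[[:space:]]*'   # optional whitespace around '='

  grep -q "^[$]CFG_DB_NAME${sp}=${sp}\"${exp}\";" "$cfg" ||
    { echo "CFG_DB_NAME is not set to ${exp}" >&2; status=1; }

  grep -q "^[$]CFG_BASE_DIR${sp}=${sp}\"/mnt/main/Exp/${exp}\";" "$cfg" ||
    { echo "CFG_BASE_DIR is not /mnt/main/Exp/${exp}" >&2; status=1; }

  grep -q "^[$]CFG_HMM_TYPE${sp}=${sp}'[.]cont[.]';" "$cfg" ||
    { echo "The '.cont.' CFG_HMM_TYPE line is missing or commented out" >&2; status=1; }

  # The semi-continuous line must stay commented out.
  if grep -q "^[$]CFG_HMM_TYPE${sp}=${sp}'[.]semi[.]';" "$cfg"; then
    echo "The '.semi.' CFG_HMM_TYPE line is still active" >&2
    status=1
  fi
  return "$status"
}
```

From the etc directory this would be called as check_train_cfg sphinx_train.cfg followed by your experiment number; it prints nothing when all checks pass.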

<font color='green'>Generate the transcript and its associated audio-file list.

 * 1) We now need to generate the transcripts to be used.
 * 2) *Transcripts consist of two portions:
 * 3) The text transcript files: <experiment #>_train.trans
 * 4) The audio file ID list which contains the list of audio files which make up the transcript: <experiment #>_train.fileids
 * 5) *To generate these, we need to do the following:
 * 6) Determine a corpus subset to use.
 * 7) Run the genTrans8.pl script
 * 8) *The main Corpus subsets are found in /mnt/main/corpus/switchboard/
 * 9) **Do NOT simply use one of the directories found above, as the script will error out. Rather, each of these directories contains further subsets; have genTrans8.pl use those.
 * 10) ***These directories represent different corpus subsets to use for various stages of making and testing models. Each of those directories contains both audio files and textual transcripts (though neither is in a format that we can use directly).
 * 11) ***For example, /mnt/main/corpus/switchboard/mini/ contains "./dev", "./eval", and "./train". "./train" would be used for training and "./eval" would be used for evaluating the resulting model in a subsequent experiment.
 * 12) ****Note: Do not use /mnt/main/corpus/switchboard/mini/dev, it references missing audio files, causing issues.
 * 13) * Once you have picked a corpus subset to use, execute the following script <font color='red'>from your base experiment directory!
 * 14) *I.e., go to your experiment folder <experiment #>. If you are currently in the etc directory, go there using % cd ..
 * 15) ** % /mnt/main/scripts/user/genTrans8.pl <corpus subset path> <experiment #>
 * 16) ***For example, to create a transcript for experiment 0028 with corpus subset mini/train execute: % /mnt/main/scripts/user/genTrans8.pl /mnt/main/corpus/switchboard/mini/train 0028
 * 17) **genTrans8.pl may take a little bit to process, especially if the transcript is long.
 * 18) ***In the event of an error message: it may be wise to restart the experiment with a different corpus subset or ensure that the audio/transcript files referenced in the messages are removed from both the transcript and fileID list.
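
A quick sanity check after genTrans8.pl finishes is to compare the two generated files. The sketch below assumes the usual SphinxTrain layout of one utterance per line in the transcript and one file ID per line in the file-ID list, so the line counts should match; that layout is an assumption and is not stated on this page. same_line_count is a hypothetical helper name.

```shell
# Hypothetical check: the transcript and the file-ID list should have the
# same number of lines (assumed: one utterance per line in each file).
# Usage: same_line_count <transcript file> <fileids file>
same_line_count() {
  n_trans=$(wc -l < "$1" | tr -d ' ')
  n_ids=$(wc -l < "$2" | tr -d ' ')
  if [ "$n_trans" -ne "$n_ids" ]; then
    echo "Mismatch: $n_trans transcript lines vs $n_ids file IDs" >&2
    return 1
  fi
}
```

For experiment 0028 it would be run from the etc directory on 0028_train.trans and 0028_train.fileids.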

<font color='green'>Create the experiment dictionary (and copy over the filler dictionary):

 * Once the transcript is successfully created, we now need to create a custom dictionary for this experiment.
 * This dictionary file resides in the etc directory and has the file-name <experiment #>.dic. The dictionary is a file which contains a list of words, each with its corresponding pronunciation in Arpabet format.
 * The reason for creating a custom dictionary for the train is simple: train and decode speed.
 * If we use a dictionary containing all the words in the English language, the machine will have to parse through a great many words that are never used in the transcript just to find the ones we need, introducing a delay.


 * 1) Go to your experiment's etc directory.
 * 2) * cd etc
 * 3) The script that prunes the master dictionary, creating a new dictionary with only the words we are interested in is pruneDictionary2.pl.
 * 4) *It needs to be executed in the etc directory of your experiment.
 * 5) **It has three arguments (in order), the name of the transcript to generate a word list from, a "Master" dictionary to reference from, and the file name of the new dictionary to be created.
 * 6) **Normal usage is as follows: % /mnt/main/scripts/train/scripts_pl/pruneDictionary2.pl <experiment #>_train.trans /mnt/main/corpus/dist/custom/switchboard.dic <experiment #>.dic
 * 7) ***The switchboard.dic has pronunciations for every unique word in our corpus. This includes all partially pronounced words.
 * 8) **This dictionary creation process can be very time consuming; how long it takes depends on the size of the master dictionary and the number of unique words in the transcript.
 * 9) Next, copy over the filler dictionary into the same directory, <experiment #>/etc
 * 10) * % cp -i /mnt/main/root/tools/SphinxTrain-1.0/train1/etc/train1.filler <experiment #>.filler
 * 11) *The Filler dictionary is composed of non-speech events, mapping them to user-defined phones. See [here]

<font color='green'>Generate the phone list.

 * 1) We now need to generate the phone list.
 * 2) *Phones are the smallest component of a phonetic transcription code (such as Arpabet); they represent how each part of a word sounds.
 * 3) *First things first: Copy the genPhones.csh script to your etc folder:
 * 4) ** % cp -i /mnt/main/scripts/user/genPhones.csh .
 * 5) *Execute it with: % ./genPhones.csh <experiment #>
 * 6) We need to insert a new phone into the <experiment #>.phone list created in the last step.
 * 7) *Use your favorite text editor and edit <experiment #>.phone
 * 8) **Insert SIL in the appropriate alphabetic-ordered spot. Not doing this will cause the trainer to error out.
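
If you would rather not hunt for the right alphabetical spot by hand, the SIL insertion can be done with standard tools. This is a sketch that assumes the .phone file lists one phone per line and that plain byte-order sorting is the ordering the trainer expects; add_sil is a hypothetical helper name.

```shell
# Hypothetical helper: add SIL to a phone list (assumed: one phone per line)
# and keep the list sorted. Running it twice does not add SIL twice.
# Usage: add_sil <experiment #>.phone
add_sil() {
  phones="$1"
  # Append SIL only if no line consists of exactly "SIL".
  grep -qx 'SIL' "$phones" || echo 'SIL' >> "$phones"
  # Sort in place; LC_ALL=C gives plain byte ordering regardless of locale.
  LC_ALL=C sort -u -o "$phones" "$phones"
}
```

Because sort -u also drops duplicate lines, it is worth comparing the file before and after if you expect anything unusual in the list.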

<font color='green'>Generate Feats data.
NOW, we can do the second to last step: Creating the feats data. Feats data, short for Features, is used in training and is derived from the recordings. The data derived from this step is also used when decoding.
 * 1) To create the Feats for your train, simply execute the following in your base experiment folder:
 * 2) * % /mnt/main/scripts/train/scripts_pl/make_feats.pl -ctl /mnt/main/Exp/<experiment #>/etc/<experiment #>_train.fileids
 * 3) **For example, to create the feats data for experiment 0028, execute: % /mnt/main/scripts/train/scripts_pl/make_feats.pl -ctl /mnt/main/Exp/0028/etc/0028_train.fileids

<font color='green'>Start the Train!
NOW: We can finally start the train.
 * 1) * Run the following in your Base experiment folder:
 * 2) ** % nohup /mnt/main/scripts/train/scripts_pl/RunAll.pl &
 * 3) ***The nohup and & are used to run the train in the background. This allows the train to continue even if you close your SSH session.
 * 4) *The first thing the trainer does is verify that it has everything it needs to build a model. Checking to see if:
 * 5) Transcript list is valid, it can find audio files the transcript references and vice versa.
 * 6) The experiment dictionary contains all the words used in the transcript.
 * 7) All Phones used in the dictionary are defined in the <experiment #>.phone file.

<font color='red'>Please note: Trains will usually fail the first time executing RunAll.pl!

It will output what is wrong on the terminal, and also in an HTML file located in the Experiment base directory. The file is called <experiment #>.html and can be opened with % lynx <experiment #>.html

Resolve the issues and execute RunAll.pl again.

Usually these initial errors are related to the trainer finding a word used in the transcript but not defined in the experiment dictionary (<experiment #>.dic). To resolve this: Reference the instructions in the next section.

Trains will usually take less time than the length of the audio data provided, though this isn't an exact rule. The Npart option in the sphinx_train.cfg file allows us to use 2 CPUs at the same time, essentially cutting the time in half. Using anything greater than that will not give much of a gain in time.

Unlike the decoder, the trainer outputs a nice steady flow of status messages to the terminal, and it also writes them to the <experiment #>.html file in the base experiment directory for future reference. The trainer works through a series of stages it calls "Modules", numbered from 1 to 99: the error checking part is module 1, and the final module, 99, is the Sphinx-II model conversion (which you have disabled by editing the sphinx_train.cfg file, right?). The most time consuming portions of the train, the actual model building parts, are modules 40-49. Please note that some module numbers are skipped over, so there may not actually be 99 individual modules.

After completion, you have successfully created the Acoustic model!

Issue 1:

 * The Trainer cannot find words referenced in the transcript within the dictionary!


 * <font color='green'>Symptoms:

When looking at the trainer output (both terminal output and within the <experiment #>.html logfile), you will see these errors show up similar to: WARNING: This word: DUCTWORK was in the transcript file, but is not in  the dictionary ([DEL: WHICH IS TOTALLY LEGAL BUT THE COST OF DOING THIS   IS ASTRONOMICAL THEY ACTUALLY SHAVE UP DUCTWORK AND THINGS AND SO WE'RE   UH VERY VERY UH COGNIZITIVE AND AWARE OF ALL THESE TYPE OF UH :DEL] ). Do cases match?


 * <font color='green'>Summary:

This is perhaps the most common issue when starting a train. The transcripts will contain words which aren't found in the master dictionary (/mnt/main/corpus/dist/cmudict.0.6d), or even words which aren't spelled correctly!


 * <font color='green'>Solution:

There are three steps needed to resolve this issue:
 * 1) Getting a list of words to be added to the dictionary.
 * 2) Generating the dictionary entries for these words.
 * 3) Inserting these entries into the experiment dictionary.


 * 1. Getting the list of words to be added to the dictionary.

The Sphinx trainer is fairly vocal in regards to missing word errors. It will spit out this list onto the terminal before quitting. That being said, if there are many words that are missing, the list may be longer than the terminal client's buffer, effectively cutting it off partway.

Thankfully, the trainer will create an HTML logfile in your <font color='green'>base experiment directory with the name <experiment #>.html. This document contains everything that was output to the screen by the trainer. To take a look at it, use the terminal-based web-browser lynx: lynx <experiment #>.html

Use the up and down arrows to scroll up and down. Press q then y to exit lynx.

The list of words which caused the issue is usually at the bottom of the output.

<font color='red'>Please note:  Each time you run the Sphinx trainer, the output will be added to the end of this document. So in other words, to get the list of words preventing the last executed train from running: Scroll all the way down!
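
As an alternative to scrolling through the log, the missing words can be listed directly from the transcript and the experiment dictionary. The sketch below is not one of the project's scripts (missing_words is a hypothetical name) and it assumes that transcript lines are space-separated words, possibly wrapped in <s> and </s> markers and followed by an utterance ID in parentheses. Non-speech tokens that live in the filler dictionary will also be reported, since only the main dictionary is consulted.

```shell
# Hypothetical helper: print each transcript word that has no entry in the
# dictionary, once, in order of first appearance.
# Usage: missing_words <transcript file> <dictionary file>
missing_words() {
  trans="$1"
  dict="$2"
  # The dictionary is read first; it must not be empty for NR == FNR to work.
  awk '
    NR == FNR { known[$1] = 1; next }        # first file: dictionary words
    {
      for (i = 1; i <= NF; i++) {
        w = $i
        # Skip sentence markers and the "(utterance-id)" field.
        if (w == "<s>" || w == "</s>" || w ~ /^\(.*\)$/) continue
        if (!(w in known) && !(w in seen)) { seen[w] = 1; print w }
      }
    }
  ' "$dict" "$trans"
}
```

Redirecting the output to a file gives a ready-made word list to work through in the next step.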

<font color='red'>Before proceeding to the next step: 
 * 1) Open up a text editor on your local desktop.
 * 2) *You know, like notepad.
 * 3) Copy each word from the terminal and paste it over to this document.
 * 4) *Please note that some of these words may be slightly misspelled. Using copy-paste is recommended.


 * 2. To get the phonetic spelling for a word:
 * 1) You could search for the word at the CMU Pronouncing dictionary.
 * 2) *Be sure to <font color='red'>click on the "Show lexical stress" check-box before searching! The trainer expects these lexical stress indicators, which are the numbers 0 through 2 attached to the end of certain phones; they slightly modify how the phone is pronounced.
 * 3) *If you are trying to find a number, type the number out as a word instead of as a numeric character (i.e. "seven" instead of "7").
 * 4) *Also, <font color='red'>do not include the periods that the dictionary puts at the end of each word! Doing so will cause the trainer to error out.
 * 5) Generate the phonetic spelling based on similar words.
 * 6) *This method is especially useful when pronouncing compound words.
 * 7) **For example, to create the phonetic spelling for Sawmill, get the phonetic spellings of Saw (S AO1) and Mill (M IH1 L) from the CMU pronouncing dictionary, then concatenate them to form S AO1 M IH1 L
 * 8) Generate the phonetic spelling yourself. This way is a bit harder; I only recommend doing it if you can't find the word using the previous methods.
 * 9) Get the IPA spelling from a good dictionary.
 * 10) Using the IPA to Arpabet phoneme comparison list, translate each IPA symbol from the dictionary to the matching Arpabet symbol.
 * 11) *You will need to add the stress values at the end of each stressed syllabic vowel.

Put each word for which you have found a pronunciation into a new file, either on the remote machine (call it add.txt or something like that; it's needed for the second dictionary update method) or on your local desktop (best for the first dictionary update method). The dictionary file is in the following format:
 * SOUTHBEND S AW1 TH B EH1 N D
 * VOCALIZED V OW1 K AH0 L AY2 Z D
 * MOOSEWOOD M UW1 S W UH2 D
 * UNDERGRAD AH1 N D ER0 G R AE1 D
 * GTE JH IY1 T IY1 IY1
 * MARYLANDER M EH1 R IY0 L AE2 N D ER0
 * MARYLANDER'S M EH1 R IY0 L AE2 N D ER0 Z
 * PLANOITE P L EY1 N OW0 AY0 T
 * DADGUM D AE1 D G AH1 M
 * EXPERIENCEWISE  IH0 K S P IH1 R IY0 AH0 N S W AY1 Z
 * CANSEGO  K AE1 N S EY1 G OW1
 * HOPELY HH OW1 P L IY0
 * STORLY S T AO1 R L IY0
 * KID'LL K IH1 D L

Notice how the entries in the dictionary are:
 * 1) Entirely upper-case.
 * 2) There is one word entry per line.
 * 3) There is a space (or two) between the grammatical spelling of the word and the first phone of its phonetic spelling.
 * 4) Vowel phones have stress indicators at the end, which are numbers ranging from 0 to 2.

This format is crucial; deviating from it is not recommended!
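
The format rules above can be checked automatically before the new entries go anywhere near the dictionary. The following is a sketch only: check_dict_entries is a hypothetical helper, and the vowel and consonant lists are the standard CMU dictionary (Arpabet) phone set plus SIL, which is an assumption about what this project's phone list contains.

```shell
# Hypothetical checker for a file of new dictionary entries (e.g. add.txt).
# Prints one message per problem and returns non-zero if any were found.
check_dict_entries() {
  awk '
    BEGIN {
      n = split("AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW", v, " ")
      for (i = 1; i <= n; i++) vowel[v[i]] = 1
      n = split("B CH D DH F G HH JH K L M N NG P R S SH T TH V W Y Z ZH SIL", c, " ")
      for (i = 1; i <= n; i++) cons[c[i]] = 1
      bad = 0
    }
    {
      where = FILENAME ":" FNR ": "
      if (NF < 2) { print where "no pronunciation given"; bad = 1; next }
      if ($1 != toupper($1)) { print where "word is not upper-case: " $1; bad = 1 }
      for (i = 2; i <= NF; i++) {
        p = $i
        base = p
        sub(/[0-2]$/, "", base)              # strip a trailing stress digit
        if (base in vowel) {
          if (p == base) { print where "vowel without stress digit: " p; bad = 1 }
        } else if (!(p in cons)) {
          print where "unknown phone: " p; bad = 1
        }
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

An entry such as SAWMILL S AO M IH1 L is flagged because AO lacks its stress digit, while SAWMILL S AO1 M IH1 L passes.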

Important! Always keep a record of all additions you make to the dictionary! We can add them to the master dictionary, thus creating fewer problems for others when they try to run trains! Include this list along with the results of your experiment!


 * 3. To add the updated word list to the dictionary:

There are two ways we can proceed. The first way is easiest if you only have a few additions and aren't updating any existing pronunciations. The second method is less tedious and repetitive than the first and thus MUCH more practical for adding lots of new dictionary entries. It also includes some more error checking, as it looks for redundant dictionary entries in both the addition list and the dictionary; however, it requires more prep-work.

Method 1: Use built-in Unix commands.
 * 1) Go to your experiment's etc directory if you aren't already there.
 * 2) Make an initial backup of your dictionary.
 * 3) *This step is <font color='green'>optional, but is highly recommended in case you need to start over!
 * 4) * % cp ./<experiment #>.dic ./<experiment #>.dic.backup
 * 5) Rename the dictionary file by executing the following in your experiment's etc directory:
 * 6) * % mv ./<experiment #>.dic ./<experiment #>.dic.old
 * 7) For each line of pronunciations to be added, execute:
 * % echo "<dictionary entry>" >> <experiment #>.dic.old
 * 1) *This will append the line to the bottom of the dictionary. You really should only do this one entry at a time.
 * 2) * IMPORTANT: Ensure that each line you enter follows the format described above!
 * 3) **The trainer will NOT accept the newly added words otherwise.
 * 4) Now sort the updated dictionary alphabetically by executing:
 * 5) * % sort <experiment #>.dic.old > <experiment #>.dic

Now you have a nice updated dictionary! Start the train again (RunAll.pl) and repeat the process if necessary.

Method 2: Use updateDict.pl

See Speech:Spring_2013_updateDict.pl or execute  for more information on the script and its usage. This script essentially merges two separate dictionary files together.
 * 1) Make a new directory called "temp" (or whatever you want really, the name itself doesn't matter) in your <experiment #>/etc folder:
 * 2) * mkdir temp
 * 3) Move the dictionary file into the newly created "temp" directory: mv <experiment #>.dic temp
 * 4) Go into <experiment #>/etc/temp.
 * 5) * cd temp
 * 6) Create or move the addition text file into etc/temp.
 * 7) *Ensure that it is in the same format as the dictionary, see above.
 * 8) Copy over the updateDict.pl script to etc/temp.
 * 9) * % cp -i /mnt/main/scripts/user/updateDict.pl .
 * 10) Execute updateDict.pl
 * 11) * % ./updateDict.pl -m <experiment #>.dic <addition List>
 * 12) **The '-m' argument (short for 'merge') is required; not supplying it will result in script failure.
 * 13) ** updateDict.pl assumes that the dictionary will be given first, followed by the addition list. <font color='red'>Reversing this order is not a good idea.
 * 14) After updateDict.pl is done, move the dictionary back to the level above.
 * 15) * % mv <experiment #>.dic ..

You may notice that updateDict.pl by default will create a new file called <experiment #>.dic.old in the directory it is currently in. This isn't truly a "new" file, but rather a backup of the initial dictionary you started with. It's useful in case you did something wrong and need to start over. The addition file is not edited by the program and thus no backup file is needed.

This script is useful when updating existing dictionary entries as well. Just put the updated entry (with updated pronunciation) into the addition text file. When updateDict.pl sees a redundant entry in the addition file but with a different pronunciation, it will prompt you as to which one to keep. You can force the script to assume that the pronunciation in the addition file is correct by adding 'f' to the list of parameters.

Issue 2:

 * make_feats.pl and/or genTrans2.pl are Erroring out! Hey, there aren't any wavefiles in my <experiment #>/wav directory either!


 * <font color='green'>Symptoms:
 * make_feats.pl is giving you an error message like:
 * INFO: fe_sigproc.c(771): Will not use double bandwidth in mel filter
 * INFO: wave2feat.c(139): /mnt/main/Exp/0030/wav/sw2001B-ms98-a-0012.sph
 * ERROR: "wave2feat.c", line 655: Cannot read /mnt/main/Exp/0030/wav/sw2001B-ms98-a-0012.sph
 * FATAL_ERROR: "wave2feat.c", line 90: error converting files...exiting
 * genTrans2.pl is giving you:
 * Error executing: <long Sox command>
 * Is sox installed?
 * There aren't any wavefiles in  after running the old genTrans.pl script (not genTrans2.pl).

<font color='red'>The following servers are known to be affected by this:
 * 1) Miraculix

 * <font color='green'>Summary:

If you experience any of the above symptoms, it is likely that Sox isn't installed on your specific server. Sox is used to extract Wav-files from the corpus's .sph files. genTrans.pl had an annoying issue: it expected Sox to be already installed and ignored any errors resulting from attempting to execute it. This created a situation where the script would appear to run successfully, but didn't actually create any of the audio files for the experiment. As make_feats.pl needs these audio files, it would error out.


 * <font color='green'>Solution:

There are a few solutions to this issue:

To prevent issues later in the training process.
 * 1) Use genTrans2.pl
 * 2) *genTrans2.pl performs all the functions of genTrans.pl. Except it will stop itself and warn the user if Sox errors out. See [Speech:Spring_2013_genTrans2.pl] for more information.
 * 3) *If you experience an error in genTrans2.pl, continue to the following steps:
 * 4) If make_feats.pl (or genTrans2.pl) errors out:
 * 5) Verify sox is installed on the machine.
 * 6) *Simply execute the following command:
 * % sox
 * 1) *If you get a command-not-found error, then Sox isn't installed. If it prints out a usage sheet, then Sox is installed.
 * 2) If Sox isn't installed:
 * 3) *Re-run the "<font color='green'>Generate the transcript and its associated audio-file list " step while on a server that has Sox installed.
 * 4) **You could use Caesar for this.
 * 5) *** <font color='red'>Once you are done running genTrans2.pl, go back to your original server!
 * 6) If Sox is installed and you are still having issues:
 * 7) * <font color='red'>Contact the modelling team.
 * 8) Once genTrans2.pl has run successfully.
 * 9) *Verify that you have wave-files in
 * 10) *If you do, then re-run the <font color='green'>Generate Feats data. step. Then start the train (assuming you have completed all the previous steps)
 * 11) *If you are going through this process after you have run the original genTrans.pl script, please note that you do not need to restart the experiment. Even though genTrans.pl didn't make the wavefiles for the experiment, it did do everything else it needed to do, such as making the transcript.
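
Both checks in this section (is Sox available, and did the audio files actually get created) can be scripted. The sketch below uses hypothetical helper names, and it assumes that each line of the file-ID list names one audio file that should exist as <wav directory>/<file ID>.sph, which is the naming seen in the make_feats.pl error message quoted in the symptoms.

```shell
# Hypothetical helper: succeeds if the named program is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the expected audio files that are missing or
# empty. Assumes one file ID per line and a ".sph" extension.
# Usage: missing_audio <fileids file> <wav directory>
missing_audio() {
  ids="$1"
  wav_dir="$2"
  while IFS= read -r id; do
    [ -n "$id" ] || continue                      # skip blank lines
    [ -s "$wav_dir/$id.sph" ] || echo "$wav_dir/$id.sph"
  done < "$ids"
}

# Example: report whether sox can be found on this server.
if have_tool sox; then
  echo "sox is installed"
else
  echo "sox is NOT installed on this server"
fi
```

If missing_audio prints nothing, every file ID has a non-empty audio file and make_feats.pl should at least be able to read its input.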

Issue 3:

 * The Trainer can't find phones used in the dictionary!


 * <font color='green'>Symptoms:


 * The Sphinx trainer is giving you errors similar to:
 * WARNING: This phone (AA) occurs in the dictionary
 * (/mnt/main/Exp/0025/etc/0025.dic), but not in the phonelist
 * (/mnt/main/Exp/0025/etc/0025.phone)

 * <font color='green'>Summary:

Essentially the issue is what the error messages suggest: you use a phone in the dictionary that isn't in the phone list. This usually occurs after adding new entries to the experiment dictionary using invalid phones.

 * <font color='green'>Solution:

For each phone that is having the issue:
 * Step 1,
 * 1) If the phone is a Vowel: <font color='red'>Make sure it has a stress indicator at the end!
 * 2) Verify that the given phone is valid.
 * 3) *Reference the phone list on Wikipedia's Arpabet page.

Once you determine what is wrong with the phones listed by the trainer:
 * Step 2,
 * 1) Go into the experiment dictionary using a text editor.
 * 2) *Remember, the dictionary is in your experiment's etc directory and has the name
 * 3) Search for each instance of the phone given.
 * 4) *In Vi, you can do this by hitting the forward-slash (/) key, typing in the search term, and pressing "Enter".
 * 5) **Pressing the "n" key will progress forward, pressing the "Shift" and "n" key will search backwards.
 * 6) *Tip: Add a space at the end of the phone while searching. It will eliminate almost all results within the grammatical part of the dictionary entries.
 * 7) For each instance of the provided phone:
 * 8) *Fix the phone as determined in Step 1.
 * 9) Once you are finished fixing all the appropriate phones, restart the training script to retry the train. Repeat the above process if necessary.
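
The offending phones can also be found without waiting for the trainer to complain. This sketch assumes the phone list has one phone per line and that each dictionary line is a word followed by its phones; unknown_phones is a hypothetical helper name, not an existing script.

```shell
# Hypothetical helper: print each phone used in the dictionary that is not
# in the phone list, once, in order of first appearance.
# Usage: unknown_phones <dictionary file> <phone list file>
unknown_phones() {
  dict="$1"
  phones="$2"
  # The phone list is read first; it must not be empty for NR == FNR to work.
  awk '
    NR == FNR { ok[$1] = 1; next }           # first file: valid phones
    {
      for (i = 2; i <= NF; i++)              # field 1 is the word itself
        if (!($i in ok) && !($i in seen)) { seen[$i] = 1; print $i }
    }
  ' "$phones" "$dict"
}
```

An empty result means every phone in the dictionary is defined in the phone list, so this particular trainer check should pass.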

Top Problems we ran into

 * 1) Incorrect Dictionary entries.
 * 2) In setup_SphinxTrain.pl, the SPHINXTRAINDIR = line had to be changed from $0 to /mnt/main/scripts/train/scripts_pl
 * 3) After running make_feats.pl, make sure the feat directory has files.
 * 4) Running the RunAll.pl from the wrong directory.
 * 5) Putting SIL into the .phone
 * 6) Changing the catdir line to have dirname($0)