Run a Train Setup Script
This guide will go over the steps in creating your Experiment directory so you can start a new train.
Before running the train, you need to add a base experiment to the master experiment page and a sub experiment on the media wiki. You do both by executing the addExp.pl script.
Once executed, the script will prompt you to enter your login info for the domain. After you enter this information, the script will check to see what the last experiment created was by its experiment number. It will then increment this and assign you the next available number*.
The script will then ask you for your experiment name. After you enter it, the script will prompt you for the author's name, which in this case is your name. Finally, the script will ask for a brief description of what your experiment hopes to accomplish or test. The script fills in all other information automatically, and if the process completes correctly, it returns a message saying "Your experiment number is (your experiment number). Please go to Caesar and make a directory for this experiment."
*NOTE: Check that the base experiment number is correct. If someone creates an experiment without using this script and skips one or more numbers, the auto-increment will continue from the most recently created experiment, even if that one was numbered incorrectly.
1. Setup Experiment Directory
This first step creates your new Experiment directory. You should already have ssh'd into Caesar, and from there into one of your assigned machines.
So now we need to go inside the Exp directory which is located at:
Once inside, you need to find the next available Experiment number (NOTE: this may have been assigned to you by Professor Jonas, depending on how he's handling new Experiment directories after Spring 2014). If he didn't assign you one, use the ls command in your terminal to list all the current Experiment directories. Find the next number in sequence and run this command:
mkdir <new ExpId> i.e.
2. Change Directory
Once the new experiment directory is created, cd into it:
cd <new ExpId>
You will need to create a sub experiment by running another mkdir:
mkdir <SubExpId>, i.e. mkdir 001
Make sure to cd into this new sub experiment directory.
If you forget to do this step, your train will fail when you get to step 6.
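Steps 1 and 2 can be sketched as a small shell helper. This is only an illustration, not one of the lab's scripts: the function names next_exp_id and make_exp_dirs are hypothetical, and the sketch assumes experiment directories are named with four digits (as in 0283) and sub experiments with three (as in 001). If an experiment number was already assigned to you, use that number instead of computing one.

```shell
# Hypothetical helper: print the next free four-digit experiment ID under $1.
next_exp_id() {
    exp_root=$1
    # Keep only entries that look like experiment IDs and take the highest.
    last=$(ls "$exp_root" | grep -E '^[0-9]{4}$' | sort | tail -n 1)
    # Strip leading zeros so the shell does not treat the ID as octal.
    last=$(printf '%s' "$last" | sed 's/^0*//')
    printf '%04d\n' $(( ${last:-0} + 1 ))
}

# Hypothetical helper: create <exp_root>/<next ID>/<sub ID> and print its path.
make_exp_dirs() {
    exp_root=$1
    sub_id=${2:-001}
    exp_dir="$exp_root/$(next_exp_id "$exp_root")/$sub_id"
    mkdir -p "$exp_dir" && printf '%s\n' "$exp_dir"
}

# Usage, where EXP_ROOT stands for the Exp directory from step 1:
#   cd "$(make_exp_dirs "$EXP_ROOT")"
```

Printing the new path rather than changing directory inside the function keeps the helper free of side effects on your shell; the cd stays an explicit step, as in the instructions above.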
3. Create Directory Structure
Now we are all set up to run the setup scripts. The makeTrain.pl script handles the majority of the work: it builds the files and directory structure for the experiment, modifies the CFG file to set the db name and directory, and creates a symlink to the master audio list (wav) by calling genTrans.pl. It then runs pruneDictionary.pl, which generates and adds words to the filler and phones files, and handles a few other small tasks.
It takes two arguments: the first is the master data corpus to use, in this case switchboard; the second is the user-generated sub-corpus, in this case 30hr/train:
makeTrain.pl switchboard 30hr/train
Advanced (ignore this unless you need it for a special use case): If you need to create an experiment that uses the .trans files in /mnt/main/corpus/switchboard/30hr/test/trans, for unseen-data and testing purposes, you can do so with the built-in flags:
- -d: Points to the <corpus_dir>/trans/dev.trans
- -e: Points to the <corpus_dir>/trans/eval.trans
- -t: Points to the <corpus_dir>/trans/train.trans
Note: if you use a flag, then you should point the <corpus_dir> (argument 3) to the test/trans location within the corpus directory.
makeTrain.pl -t switchboard 30hr/test
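To make the flag-to-file mapping concrete, here is a rough shell sketch of which transcript file each flag would select. It only mirrors the description above and is not taken from makeTrain.pl itself: the function name trans_path and the CORPUS_ROOT variable are hypothetical, and the default root /mnt/main/corpus is inferred from the path mentioned above.

```shell
# Hypothetical helper: print the .trans file a given flag would point to.
# $1 = flag (-d, -e or -t), $2 = master corpus, $3 = sub-corpus directory.
trans_path() {
    case $1 in
        -d) name=dev ;;
        -e) name=eval ;;
        -t) name=train ;;
        *)  echo "unknown flag: $1" >&2; return 1 ;;
    esac
    printf '%s/%s/%s/trans/%s.trans\n' \
        "${CORPUS_ROOT:-/mnt/main/corpus}" "$2" "$3" "$name"
}

# Usage: check that the transcript exists before calling makeTrain.pl.
#   ls -l "$(trans_path -t switchboard 30hr/test)"
```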
4. Modify Train Configuration
This section is for advanced users who have done the research and read the guide on how to do this: Sphinx Train Configuration Guide
For your first Experiment you can use the default values. Once you are trying to improve performance, configure the values by editing sphinx_train.cfg which will be in your experiment's /etc directory after you run 'prepareExperiment.pl'.
How to Edit
Typing the following command allows you to edit the file:
- To Edit Density
- Once vi'd into sphinx_train.cfg, scroll through the text until you find the following variable: $CFG_FINAL_NUM_DENSITIES = 8;
- The default density value is 8. To change it, move the cursor to the value with the arrow keys, delete it, and type a density of your choosing.
- Note: Density values must be powers of 2 (8, 16, 32, etc.)
- To Edit Senones
- Once vi'd into sphinx_train.cfg, scroll through the text until you find the following variable: $CFG_N_TIED_STATES = 1000;
- The default senone value is 1000. To change it, move the cursor to the value with the arrow keys, delete it, and type a senone value of your choosing.
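If you would rather not edit the file by hand, the same two changes can be scripted. The following is a minimal sketch, not an existing lab script: the helper names is_power_of_two and set_cfg_value are hypothetical, and it assumes each variable is assigned on a single line in the form shown above (for example $CFG_N_TIED_STATES = 1000;). The density check accepts only values of the form 8, 16, 32, and so on, matching the examples in the note above.

```shell
# Hypothetical helper: succeed only if $1 is a positive power of two.
is_power_of_two() {
    n=$1
    # Reject non-numeric input and anything below 1.
    [ "$n" -ge 1 ] 2>/dev/null || return 1
    while [ $(( n % 2 )) -eq 0 ]; do
        n=$(( n / 2 ))
    done
    [ "$n" -eq 1 ]
}

# Hypothetical helper: set "$NAME = value;" in a Sphinx train config file.
# $1 = config file, $2 = variable name without the leading $, $3 = new value.
set_cfg_value() {
    cfg=$1 name=$2 value=$3
    # Rewrite only the value between "=" and ";" and keep the rest of the line.
    sed "s/^\( *[$]$name *= *\)[^;]*;/\1$value;/" "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}

# Usage (run from the top level of your sub-experiment):
#   is_power_of_two 16 && set_cfg_value etc/sphinx_train.cfg CFG_FINAL_NUM_DENSITIES 16
#   set_cfg_value etc/sphinx_train.cfg CFG_N_TIED_STATES 2000
```

Note that the substitution rewrites every line that assigns the named variable, so check the file afterwards if the variable appears more than once.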
5. Generate Feats Data
First 'cd' back into the top level of your sub-experiment. For example: "/mnt/main/Exp/0283/008"
Next, run this script exactly as it is:
6. Run the Train
Now that the entire Experiment directory is set up with all the proper data and configurations, we can run the train. Before starting, run the top command to see whether someone is already running a train on this machine; if so, please switch to another server. Then run this script exactly as is. Note: nohup and the trailing & let you essentially "disconnect" from the computer on your end while the script keeps running on the machine you're ssh'd into. VERY IMPORTANT YOU DO THIS.
nohup scripts_pl/RunAll.pl &
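A slightly more defensive variant of the command above sends the train's output to a log file, so you can check on it after reconnecting. This is a sketch under assumptions: the function name start_detached, the DETACHED_PID variable, and the log file name train.log are hypothetical, and it does not replace checking top first.

```shell
# Hypothetical helper: run a command detached from the terminal.
# $1 = log file, remaining arguments = the command to run.
start_detached() {
    log=$1
    shift
    # nohup keeps the job alive after logout; & puts it in the background.
    nohup "$@" > "$log" 2>&1 &
    DETACHED_PID=$!
    echo "Started PID $DETACHED_PID; follow progress with: tail -f $log"
}

# Usage (from the top level of your sub-experiment):
#   start_detached train.log scripts_pl/RunAll.pl
```

Because both standard output and standard error go to the named log, nohup does not create its default nohup.out file.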