Speech:Models AM Build


Project Notes

 * Unix Notes
 * Speech Corpus Setup - Switchboard, NOAA
 * Speech Recognition Related Readings
 * Experiment Setup
 * Scripts Page
 * Model Building - more info on data prep, language models, & [building models]
 * Step 1: Run a Train
 * Step 2: Create the Language Model
 * Step 3: Run a Decode

Model Building: Building (train) & Verifying (decode) Acoustic Models
This page describes the initial setup and data preparation needed to build statistical language models, generate a robust set of acoustic models (training), and verify them by testing (decoding) on the trained corpus. For detailed steps on how to train and decode, see the sub-steps under Model Building above.

Some Helpful Online Resources by CMU

 * Building acoustic model
 * Building language model

The above links have more details about training and decoding acoustic models.

The task at hand is to create an acoustic model and a language model by setting up a "trainer" and a "decoder".

Building an Acoustic Model
A mini train and decode was completed several times with different data following these steps. The purpose of this task is to take conversations saved in the .wav format, together with their transcripts, and use them to create a speech recognition tool. The trainer takes the .wav files, the phoneme dictionary, the word dictionary, and the transcripts of the conversations, then matches up the audio with the transcripts. To do this, the trainer needs a dictionary containing every word that appears in the transcripts, and an accurate phoneme dictionary covering every word in that dictionary.
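The coverage requirement above can be checked before a train is started. The following is a minimal sketch, not part of the project's scripts. It assumes the usual CMU Sphinx file layouts: transcript lines of the form `<s> WORDS </s> (utterance_id)`, dictionary lines of the form `WORD PHONE PHONE ...`, and alternate pronunciations written as `WORD(2)`. The function names and the sample data are hypothetical.

```python
import re


def dictionary_words(dict_lines):
    """Collect the words defined in a pronunciation dictionary."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue
        # Strip alternate-pronunciation suffixes such as WORD(2).
        words.add(re.sub(r"\(\d+\)$", "", parts[0]).upper())
    return words


def transcript_words(transcript_lines):
    """Collect the words used in the transcripts."""
    words = set()
    for line in transcript_lines:
        for token in line.split():
            # Skip sentence markers and the trailing (utterance_id).
            if token in ("<s>", "</s>"):
                continue
            if token.startswith("(") and token.endswith(")"):
                continue
            words.add(token.upper())
    return words


def missing_words(transcript_lines, dict_lines):
    """Words that appear in the transcripts but not in the dictionary."""
    return sorted(transcript_words(transcript_lines) - dictionary_words(dict_lines))


def undefined_phones(dict_lines, phone_list):
    """Phones used in the dictionary that are absent from the phone list."""
    used = set()
    for line in dict_lines:
        used.update(line.split()[1:])
    return sorted(used - set(phone_list))


# Hypothetical sample data for illustration only.
sample_transcript = [
    "<s> HELLO WORLD </s> (utt_001)",
    "<s> GOOD MORNING </s> (utt_002)",
]
sample_dict = [
    "HELLO HH AH L OW",
    "HELLO(2) HH EH L OW",
    "WORLD W ER L D",
    "GOOD G UH D",
]
sample_phones = ["HH", "AH", "L", "OW", "W", "ER", "D", "G", "UH"]

print(missing_words(sample_transcript, sample_dict))  # ['MORNING']
print(undefined_phones(sample_dict, sample_phones))   # ['EH']
```

An empty result from both functions would suggest the dictionary and phoneme files cover the transcripts; any word or phone that is reported needs to be added before training.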

Verifying an Acoustic Model
The decoder is needed to check whether the trainer actually worked and how accurate the resulting models are. This is done by running a single script, run_decode.pl.
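These notes do not say how accuracy is reported. One common measure for decoder output is word error rate (WER): the word-level edit distance between the reference transcript and the decoded hypothesis, divided by the number of reference words. The sketch below illustrates that calculation only; it is an assumption about how accuracy could be checked, not a description of what run_decode.pl does internally, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("hello world", "hello world"))  # 0.0
print(word_error_rate("a b c d", "a x c d"))          # 0.25
```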


 * The table described above lists all of the scripts used to run the trainer. Some of the scripts, as shown above, call other scripts. The locations of these scripts are listed in the table
 * The main directory of the scripts is /speechtools/SphinxTrain-1.0/taskName (e.g. train1)
 * The scripts also call executable files
 * The executable files are located in /mnt/main/local/bin/
 * make_feats.pl calls the sphinx_fe executable
 * Runall.pl calls a script, slave_align.pl, which is located in the /04.vtln_align/ directory
 * slave_align.pl calls the sphinx3_align executable
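The call relationships in the list above can be written down as a small lookup, which makes it easy to see which executables a given script ends up invoking. This is only an illustrative sketch: the mapping holds just the calls named on this page (the full table is likely longer), anything that is not itself a key in the mapping is assumed to be an executable, and the names `CALLS`, `BIN_DIR`, and `executables_for` are hypothetical.

```python
# Calls named on this page: each script maps to what it calls.
CALLS = {
    "make_feats.pl": ["sphinx_fe"],
    "Runall.pl": ["slave_align.pl"],
    "slave_align.pl": ["sphinx3_align"],
}

# Location of the executable files, as given on this page.
BIN_DIR = "/mnt/main/local/bin/"


def executables_for(script, calls=CALLS):
    """Return the executables a script invokes, following nested script calls."""
    found = []
    seen = set()
    stack = [script]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for callee in calls.get(name, []):
            if callee in calls:
                # The callee is another script, so follow it.
                stack.append(callee)
            elif callee not in found:
                # Assumption: anything not listed as a script is an executable.
                found.append(callee)
    return found


print(executables_for("Runall.pl"))  # ['sphinx3_align']
print([BIN_DIR + exe for exe in executables_for("make_feats.pl")])
# ['/mnt/main/local/bin/sphinx_fe']
```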


 * Decode
 * A couple of scripts are used to build the language model needed for decoding
 * lm_create.pl is used; this script calls the sphinx_lm_convert executable
 * The webpage below has detailed information on how a basic trainer works. Our setup is configured for our own environment, so it does not match exactly how we run the trainer or decoder. It is still a useful reference for understanding how the process works.
 * http://cmusphinx.sourceforge.net/wiki/tutorialam