Speech:Exp



What is an Experiment
An experiment is a multi-step process that takes a set of data, trains a model on it, and then decodes data with that model. The goal of an experiment is to achieve a low error rate in the decode.
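
To make "error rate" concrete, here is a minimal sketch that assumes the figure of interest is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the decoder output into the reference transcript, divided by the number of reference words. The function name `word_error_rate` is made up for this illustration and is not part of the project's scripts.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means the decode matched the transcript more closely. The value can exceed 1.0 when the decoder outputs many extra words.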

What is a Train
We run a train in order to build an acoustic model for use in a decode. An acoustic model is created by taking audio recordings of speech and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech.

The first thing the trainer does is verify that it has everything it needs to build a model. It checks that:

 * The transcript list is valid: the trainer can find the audio files the transcript references, and vice versa.
 * The experiment dictionary contains all the words used in the transcript.
 * All phones used in the dictionary are defined in the .phone file.
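
The checks above can be sketched in a few lines of Python. This is only an illustration, not the trainer's actual code. It assumes transcript lines of the form `<s> WORD WORD </s> (utterance_id)`, dictionary lines of the form `WORD PH1 PH2 ...`, and a phone list with one phone per entry. All function names here are hypothetical.

```python
import re

# Assumed transcript line layout: "<s> WORD WORD ... </s> (utterance_id)"
TRANSCRIPT_LINE = re.compile(r"^(?P<text>.*)\((?P<utt_id>[^()\s]+)\)\s*$")
MARKERS = {"<s>", "</s>"}


def parse_transcript(lines):
    """Map utterance id -> list of words, skipping sentence markers."""
    utterances = {}
    for line in lines:
        match = TRANSCRIPT_LINE.match(line.strip())
        if match is None:
            raise ValueError(f"malformed transcript line: {line!r}")
        words = [w for w in match.group("text").split() if w not in MARKERS]
        utterances[match.group("utt_id")] = words
    return utterances


def parse_dictionary(lines):
    """Map word -> list of phones (assumed layout: 'WORD PH1 PH2 ...')."""
    dictionary = {}
    for line in lines:
        parts = line.split()
        if parts:
            dictionary[parts[0]] = parts[1:]
    return dictionary


def check_training_inputs(transcript_lines, audio_ids, dict_lines, phones):
    """Return a dict of problems; every list is empty when all checks pass."""
    utterances = parse_transcript(transcript_lines)
    dictionary = parse_dictionary(dict_lines)
    audio_ids = set(audio_ids)
    phones = set(phones)
    used_words = {w for words in utterances.values() for w in words}
    used_phones = {p for word_phones in dictionary.values() for p in word_phones}
    return {
        "utterances_without_audio": sorted(set(utterances) - audio_ids),
        "audio_without_transcript": sorted(audio_ids - set(utterances)),
        "words_missing_from_dictionary": sorted(used_words - set(dictionary)),
        "undefined_phones": sorted(used_phones - phones),
    }
```

In practice `audio_ids` would come from listing the audio directory and stripping file extensions, and the other arguments from reading the transcript, dictionary, and .phone files line by line.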

Some quick links regarding Training and Acoustic Models:
 * Detailed Guide on Data Prep
 * CMU Acoustic Model Training

What is a Language Model
A language model is used to restrict the word search. It defines which words can follow previously recognized words (remember that matching is a sequential process) and helps to significantly restrict the matching process by stripping out words that are not probable. To reach a good accuracy rate, the language model must be very successful at restricting the search space, which means it should be very good at predicting the next word. A language model usually also restricts the vocabulary considered to the words it contains.
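
As a rough illustration of how a language model narrows the search, the sketch below builds a tiny bigram model: for each word it records which words followed it in the training text and how often. It is an unsmoothed toy example with made-up function names, not the toolkit used in these experiments. Real language model tools add smoothing and back-off so that unseen word pairs are not ruled out entirely.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count which words follow each word in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, current in zip(words, words[1:]):
            followers[previous][current] += 1
    return followers


def next_word_probabilities(model, previous):
    """Return P(word | previous) for every word seen after `previous`."""
    counts = model.get(previous, Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    model = train_bigram_model(["call the office", "call the doctor", "call home"])
    # Only words that followed "call" in training remain as candidates.
    print(next_word_probabilities(model, "call"))
```

Words that never followed the previous word get no probability at all here, which is the search-space restriction described above in its bluntest form. It also shows why the vocabulary is limited to the words the model contains.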

Some quick links regarding Language Models:
 * Language Models and Building Them
 * CMU Language Models

What is a Decode
Decoding is essentially the process of converting audio into text. We want to make our acoustic model as strong as possible in order to get the best decode results. A decode is done by:

As suggested in the instructions, you can use an acoustic model different from the one within the current experiment directory. This is mainly for simple decode experiments, where the objective is to evaluate an existing model and thus doesn't require training. It still needs a dictionary, feats, a transcript, and a language model created within the experiment.

After the script finishes, you should have a file called decode.log that contains the results of the decode. There should be little to no output from the decoder itself. All data about the decoding process is in that log file.

The decoder may take a while to run. As with the trainer, its run time does not scale predictably with the length of the audio/transcript data sent to it. It isn't uncommon for a decode to take more than a few hours.

Archived Experiment Setup Pages
Archive 1: This page was archived and replaced during Spring 2014.