Speech:Models AM Build

From Openitware

Project Notes

Model Building: Building (train) & Verifying (decode) Acoustic Models

Description of the initial setup and data preparation needed to build statistical language models, generate a robust set of acoustic models (training), and verify them by testing (decoding) on the training corpus. For detailed steps on how to train and decode, see the sub-steps under Model Building above.

Some Helpful Online Resources by CMU

The above links have more details about training and decoding acoustic models.

The task at hand is to create an acoustic model and a language model by setting up a "trainer" and a "decoder".

Building an Acoustic Model

A mini train and decode was completed several times, with different data, following these steps. The purpose of this task is to take conversations saved in .wav format, together with their transcripts, and use them to create a speech recognition tool. The trainer takes the .wav files, the phoneme list, the dictionary, and the transcript of the conversations, and then matches up the audio with the transcript. To do this, it needs a dictionary containing every word that appears in the transcript, and an accurate phoneme list (the .phone file) containing every phoneme used in the dictionary's pronunciations.
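Both dictionary requirements can be checked before starting a training run. The sketch below is a minimal illustration, assuming the usual SphinxTrain text formats: transcript lines such as "<s> HELLO WORLD </s> (utt1)", dictionary lines such as "HELLO HH AH L OW", and a phone list with one phone per line. The function names are hypothetical and this is not one of the project's scripts.

```python
import re

def parse_dictionary(lines):
    """Map each word to its list of phones, from lines like 'HELLO HH AH L OW'."""
    entries = {}
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            entries[parts[0]] = parts[1:]
    return entries

def transcript_words(lines):
    """Collect the words in transcript lines like '<s> HELLO WORLD </s> (utt1)'."""
    words = set()
    for line in lines:
        line = re.sub(r"\([^)]*\)\s*$", "", line)  # drop the trailing (utterance_id)
        words.update(t for t in line.split() if t not in ("<s>", "</s>"))
    return words

def check_coverage(transcript, dictionary, phones):
    """Return (transcript words missing from the dictionary,
    dictionary phones missing from the phone list)."""
    entries = parse_dictionary(dictionary)
    phone_set = {p.strip() for p in phones if p.strip()}
    missing_words = sorted(transcript_words(transcript) - set(entries))
    used_phones = {ph for pron in entries.values() for ph in pron}
    missing_phones = sorted(used_phones - phone_set)
    return missing_words, missing_phones
```

For example, with the transcript lines "<s> HELLO WORLD </s> (utt1)" and "<s> HELLO THERE </s> (utt2)", a dictionary containing only HELLO and WORLD, and a phone list lacking ER, the check reports THERE as a missing word and ER as a missing phone. Both lists should be empty before training.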

Verifying an Acoustic Model

The decoder checks whether the trainer actually worked and how accurate the resulting models are. This is done by running a single script, run_decode.pl.
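A common way to state "how accurate it is" for speech recognition is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the decoder's output into the reference transcript, divided by the number of words in the reference. The sketch below computes WER with a standard edit-distance table. It illustrates the metric only; it is not a description of what run_decode.pl itself reports.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, decoding "the cat sat on the mat" as "the cat sit on mat" costs one substitution and one deletion, so the WER is 2/6, about 33%.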

Script Details
Script | Details | Location
  • $TRAIN is used in the parameters below (/root/speechtools/SphinxTrain-1.0/$TASK)
  • $HMM = $TRAIN . "model_parameters/$TASK.cd_cont_1000"
  • $LM = $TRAIN . "LM/tmp2.arpa"
  • $DICT = $TRAIN . "etc/$TASK.dic"
  • $FDICT = $TRAIN . "etc/$TASK.filler"
  • $CTL = $TRAIN . "etc/$TASK" . "_train.fileids"
  • $CEPDIR = $TRAIN . "feat"
  • $CEPEXT = ".mfc"
  • The "." is Perl's string concatenation operator
genTrans.pl: Takes the unedited transcript and generates the transcript used for training. Removes all characters before the speaker identification and utterance ID, removes all unnecessary characters, and substitutes a blank space for everything after white space, among other steps.
genPhones.csh: Creates the train#.phone file (the phone list) in the etc directory, among other steps.
  • sphinx_fe
  • The make_feats.pl script computes, for each training utterance, a sequence of 13-dimensional feature vectors consisting of Mel-frequency cepstral coefficients (MFCCs). Note that the list of wave files contains the paths to the audio files. Since the data are all located in the same directory you are working in, the paths are relative, not absolute. You may have to change this, as well as the an4_test.fileids file, if the data are located elsewhere. The MFCCs are placed automatically in a directory called 'feat'. Outside of this tutorial, the feature vectors you compute from the speech signals for training and recognition are not restricted to MFCCs: you could use any reasonable parameterization technique instead. CMUSphinx can use features of any type or dimensionality. The format of the features is described on the page MFC Format.
  • 00.verify/verify_all.pl
  • 01.vector_quantize/slave.VQ.pl
  • 02.falign_ci_hmm/slave_convg.pl
  • 03.force_align/slave_align.pl
  • 04.vtln_align/slave_align.pl
  • 05.lda_train/slave_lda.pl (that's an L not one)
  • 06.mllt_train/slave_mllt.pl
  • 20.ci_hmm/slave_convg.pl
  • 30.cd_hmm_untied/slave_convg.pl
  • 40.buildtrees/slave.treebuilder.pl
  • 45.prunetree/slave.state-tying.pl
  • 50.cd_hmm_tied/slave_convg.pl
  • 90.deleted_interpolation/deleted_interpolation.pl
  • 99.make_s2_models/make_s2_models.pl
  • The table above lists all of the scripts used to run the trainer. Some of these scripts call other scripts, and the locations of those scripts are listed in the table.
  • The main directory of the scripts is located in /speechtools/SphinxTrain-1.0/taskName (e.g. train1)
  • Executable files are also called by the scripts
    • Location of the executable files is in /mnt/main/local/bin/
    • make_feats.pl calls the sphinx_fe executable
    • RunAll.pl calls the script slave_align.pl, which is located in the /04.vtln_align/ directory
      • slave_align.pl calls the sphinx3_align executable file
  • Decode
    • There are a couple of scripts used to build the language model in order to decode
      • lm_create.pl, which calls the sphinx_lm_convert executable file
  • This webpage has detailed information on how a basic trainer works. Ours is configured specifically for our setup, so it is not exactly how we run the trainer or decoder, but the page has a lot of detail that can help people understand how training works.
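The cleanup that genTrans.pl performs (see its row in the script table) can be sketched in Python. The raw line format used here, "<anything> <utterance_id>: <text>", is purely hypothetical, and genTrans.pl's exact rules are not reproduced. The output follows the usual SphinxTrain transcript layout, "<s> WORDS </s> (utterance_id)".

```python
import re

# Hypothetical raw format (an assumption, not the project's real one):
#   "<anything> <utt_id>: <text>", e.g. "1.23 4.56 spk1_0001: Hello, [noise] world!"
RAW_LINE = re.compile(r"(?P<utt>\S+):\s*(?P<text>.*)$")

def clean_line(raw):
    """Convert one raw transcript line to SphinxTrain format, or None if it doesn't match."""
    match = RAW_LINE.search(raw)  # everything before the utterance ID is ignored
    if match is None:
        return None
    text = re.sub(r"\[[^\]]*\]", " ", match.group("text"))  # drop bracketed notes like [noise]
    text = re.sub(r"[^A-Za-z' ]", " ", text)               # drop punctuation and digits
    words = text.upper().split()                            # split() collapses runs of spaces
    return "<s> " + " ".join(words) + " </s> (" + match.group("utt") + ")"
```

With this assumed format, "1.23 4.56 spk1_0001: Hello, [noise] world!" becomes "<s> HELLO WORLD </s> (spk1_0001)". Keeping apostrophes means words such as "don't" survive as DON'T, which then has to appear in the dictionary in that form.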
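genPhones.csh builds the train#.phone file. In SphinxTrain, the phone list should contain every phone used in the dictionary's pronunciations, plus the silence phone SIL. The sketch below shows that idea in Python; it is not the csh script itself. It assumes dictionary lines of the form "WORD PH1 PH2 ...". Any other phones used by the filler dictionary can be passed in through the extra argument.

```python
def phone_list(dictionary_lines, extra=("SIL",)):
    """Return the sorted phones used in 'WORD PH1 PH2 ...' lines, plus any extra phones."""
    phones = set(extra)
    for line in dictionary_lines:
        parts = line.split()
        phones.update(parts[1:])  # everything after the word is a phone
    return sorted(phones)
```

Writing the result one phone per line gives a file in the same shape as the .phone file. Deriving it from the dictionary, rather than maintaining it by hand, keeps the two files consistent.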
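make_feats.pl writes one feature file per utterance into the feat directory, using the .mfc extension ($CEPEXT in the variable list). As we understand the MFC Format page, each file starts with a 4-byte integer giving the number of 32-bit floats that follow, and then the floats themselves, one frame after another. The sketch below reads such a file and splits it into 13-dimensional frames. Byte order can differ between machines, so the sketch accepts whichever order makes the header agree with the file size. Treat it as a sketch under these assumptions and check it against the MFC Format page.

```python
import struct

def parse_mfc(data, dim=13):
    """Split the bytes of a Sphinx-style feature file into dim-dimensional frames."""
    if len(data) < 4:
        raise ValueError("file too short to hold the header")
    n_floats = (len(data) - 4) // 4
    for order in (">", "<"):  # try big-endian first, then little-endian
        (count,) = struct.unpack(order + "i", data[:4])
        if count == n_floats:
            break
    else:
        raise ValueError("header does not match the file size in either byte order")
    if count % dim:
        raise ValueError("float count is not a multiple of the frame dimension")
    values = struct.unpack(order + "%df" % count, data[4:4 + 4 * count])
    return [list(values[i:i + dim]) for i in range(0, count, dim)]

def read_mfc(path, dim=13):
    """Read a .mfc file from disk, for example one from the feat directory."""
    with open(path, "rb") as f:
        return parse_mfc(f.read(), dim)
```

The number of frames returned for an utterance is a quick way to confirm that feature extraction ran on the whole audio file and not just part of it.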
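The numeric prefixes in the script table give the order in which the trainer's steps run. As a rough illustration, and not a replacement for RunAll.pl, the sketch below builds one perl command per step script in that order. If dry_run is turned off, it runs the commands one by one and stops at the first failure. The scripts_dir argument is a hypothetical placeholder. In a real setup some steps may be skipped depending on the training configuration, and scripts may expect arguments or a particular working directory that this sketch does not supply.

```python
import subprocess

# Trainer step scripts from the script table, identified by their numeric prefixes.
STEPS = [
    "00.verify/verify_all.pl",
    "01.vector_quantize/slave.VQ.pl",
    "02.falign_ci_hmm/slave_convg.pl",
    "03.force_align/slave_align.pl",
    "04.vtln_align/slave_align.pl",
    "05.lda_train/slave_lda.pl",
    "06.mllt_train/slave_mllt.pl",
    "20.ci_hmm/slave_convg.pl",
    "30.cd_hmm_untied/slave_convg.pl",
    "40.buildtrees/slave.treebuilder.pl",
    "45.prunetree/slave.state-tying.pl",
    "50.cd_hmm_tied/slave_convg.pl",
    "90.deleted_interpolation/deleted_interpolation.pl",
    "99.make_s2_models/make_s2_models.pl",
]

def run_steps(scripts_dir, steps=STEPS, dry_run=True):
    """Build (and optionally run) one perl command per step, in numeric-prefix order."""
    commands = []
    for step in sorted(steps, key=lambda s: int(s.split(".", 1)[0])):
        cmd = ["perl", scripts_dir.rstrip("/") + "/" + step]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failure
    return commands
```

Leaving dry_run on prints nothing and runs nothing; it only returns the command list, which is a safe way to check the order before launching a long training job.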
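The language model passed to the decoder ($LM, tmp2.arpa in the variable list) is in the ARPA text format. An ARPA file begins with a \data\ section that gives the number of n-grams of each order. Reading those counts is a quick sanity check on a newly built model. The sketch below does this, assuming a well-formed ARPA file. It does not use lm_create.pl or sphinx_lm_convert.

```python
import re

def arpa_ngram_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA-format language model."""
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                break  # the first n-gram section, e.g. "\1-grams:", ends the header
    return counts
```

A model whose unigram count is much smaller than the number of distinct words in the transcript is a hint that the language model was built from the wrong text.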