Alternate Transcript Train using Last_5hr corpus


Author: Group B&C

Date: 4/9/13

Purpose: To see the effects of utilizing a transcript which was created with specific utterances mapped to custom phones.


The goal of this experiment was to determine the best way to remove non-word entries from the transcript, and how best to have the trainer anticipate these entries without actually training on them. We eventually determined that by enclosing each non-word or unintelligible word in the transcript in double plus signs (“++”), we could force the trainer not to take them into account when training. We then added each word enclosed this way to the phone list, mapping each one to one of the fake phones “+NOISE+”, “+UNINTELLIGIBLE+” and “+LAUGHTER+”.
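The marking step described above can be sketched roughly as follows. This is only an illustration, not the script used in the experiment: the function name mark_non_words and the example non-word "uh" are hypothetical, and it assumes the transcript is whitespace-tokenized with the non-words already known in advance.

```python
def mark_non_words(line, non_words):
    """Wrap every token listed in non_words with '++' on both sides.

    non_words is a hypothetical, caller-supplied set of lowercase tokens;
    the experiment write-up does not say how the non-words were identified.
    """
    marked = []
    for token in line.split():
        # Skip tokens that are already marked so the function can be re-run safely.
        if token.lower() in non_words and not token.startswith("++"):
            marked.append("++" + token + "++")
        else:
            marked.append(token)
    return " ".join(marked)


if __name__ == "__main__":
    print(mark_non_words("yeah uh i know", {"uh"}))
```

Running the function twice on the same line leaves it unchanged, which is convenient when a transcript is cleaned in several passes.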

Results: The experiment dictionary was derived from the one created in Experiment 0074.

The experiment filler dictionary was populated with every item that was marked with “++”, each mapping to one of the custom phones defined in the phone list. The fake phones created were “+NOISE+”, “+UNINTELLIGIBLE+” and “+LAUGHTER+”. The trainer accounted for them, but ignored them when building the models.
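As a rough sketch of how such a filler dictionary could be populated, the snippet below collects every “++”-marked token from a transcript and pairs it with a fake phone. The helper names, the one-entry-per-line "token phone" layout, and the choice of “+NOISE+” as the fallback phone are all assumptions made for illustration; the write-up does not record which item was mapped to which phone.

```python
import re

# Matches a token wrapped in double plus signs, e.g. ++uh++
FILLER_TOKEN = re.compile(r"\+\+\S+?\+\+")


def collect_fillers(transcript_lines):
    """Return the sorted set of ++marked++ tokens found in the transcript."""
    found = set()
    for line in transcript_lines:
        found.update(FILLER_TOKEN.findall(line))
    return sorted(found)


def filler_dictionary(tokens, phone_for, default="+NOISE+"):
    """Build one 'token phone' entry per marked token.

    phone_for is a hypothetical hand-made mapping; tokens missing from it
    fall back to the assumed default phone.
    """
    return [f"{tok} {phone_for.get(tok, default)}" for tok in tokens]


if __name__ == "__main__":
    lines = ["yeah ++uh++ i ++laughter++ know", "++uh++ okay"]
    tokens = collect_fillers(lines)
    for entry in filler_dictionary(tokens, {"++laughter++": "+LAUGHTER+"}):
        print(entry)
```

Keeping the token-to-phone mapping in a separate table makes it easy to review by hand which marked items ended up under each of the three fake phones.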

The experiment was continued in Experiment 0083 and Experiment 0087.