Alternate Transcript Train using Last_5hr corpus
Author: Group B&C
Purpose: To see the effects of training with a transcript in which specific utterances are mapped to custom phones.
The goal for this experiment was to determine the best way to handle non-word entries in the transcript, so that the trainer anticipates these words but does not actually train on them. We eventually determined that by wrapping each non-word or unintelligible word in the transcript in double pluses (“++”), we could keep the trainer from taking them into account when training. We then added each word wrapped this way to the filler dictionary, mapping each one to one of the fake phones “+NOISE+”, “+UNINTELLIGIBLE+” and “+LAUGHTER+” defined in the phone list.
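As a rough illustration of the marking convention described above, the following sketch collects the “++”-wrapped tokens from transcript lines. The function names, the regular expression, and the sample lines (including their utterance IDs) are assumptions made for this example and are not taken from the experiment's scripts.

```python
import re

# Assumption: a marked token is any run of non-space, non-plus characters
# wrapped in double pluses, e.g. ++UM++ or ++LAUGHTER++.
MARKED = re.compile(r"\+\+[^+\s]+\+\+")

def marked_tokens(line):
    """Return the ++-wrapped tokens found in one transcript line, in order."""
    return MARKED.findall(line)

def unique_marked(lines):
    """Return the sorted set of ++-wrapped tokens across all lines."""
    seen = set()
    for line in lines:
        seen.update(marked_tokens(line))
    return sorted(seen)

# Hypothetical transcript lines, for illustration only.
sample = [
    "<s> so i said ++UM++ that was fine </s> (utt_0001)",
    "<s> ++LAUGHTER++ yeah ++UM++ right </s> (utt_0002)",
]

if __name__ == "__main__":
    for token in unique_marked(sample):
        print(token)
```

The sorted list of unique tokens is the set of entries that would then need a mapping to one of the fake phones.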
Results: The experiment dictionary was derived from the one created in experiment 0074.
The experiment filler dictionary was populated with every item that was wrapped in “++”, each mapping to a custom phone defined in the phone list. The fake phones created were “+NOISE+”, “+UNINTELLIGIBLE+” and “+LAUGHTER+”. The trainer accounted for them but ignored them when building the models.
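A minimal sketch of how such filler dictionary entries might be generated is shown below. It assumes a plain one-entry-per-line layout of token followed by phone, which may differ from the file format the trainer actually expects. The token names and their phone assignments in `example` are hypothetical; the experiment does not state which token was mapped to which fake phone.

```python
# The three fake phones named in this experiment.
FAKE_PHONES = ("+NOISE+", "+UNINTELLIGIBLE+", "+LAUGHTER+")

def filler_entries(token_to_phone):
    """Build 'token phone' lines, one per ++-wrapped token, sorted by token."""
    lines = []
    for token in sorted(token_to_phone):
        phone = token_to_phone[token]
        # Reject any phone that is not one of the fake phones in the phone list.
        if phone not in FAKE_PHONES:
            raise ValueError(f"unknown fake phone for {token}: {phone}")
        lines.append(f"{token} {phone}")
    return lines

# Hypothetical assignments, for illustration only.
example = {
    "++LAUGHTER++": "+LAUGHTER+",
    "++UM++": "+NOISE+",
    "++MUMBLE++": "+UNINTELLIGIBLE+",
}

if __name__ == "__main__":
    print("\n".join(filler_entries(example)))
```

Validating each phone against the fixed list catches a mistyped phone name before the file reaches the trainer.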