Last_5hr Train using genTrans5-derived transcript.
Purpose: This experiment was designed to test the newly created genTrans5.pl script on the Last_5hr train corpus.
Details: Previous versions of the genTrans script removed only the markers that indicated transcribers' notes and left the text inside those markers intact. This produced inaccurate models, because the trainer attempted to align and train phones that were listed in the transcript but did not exist in the audio file.
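The difference between the two behaviours can be illustrated with a minimal Python sketch. This is not the genTrans5.pl code. It assumes that transcribers' notes are enclosed in square brackets, and the function names are hypothetical.

```python
import re

# Assumption: a transcriber's note is a span enclosed in square brackets,
# for example "[laughter]". The real marker set may differ.
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def strip_markers_only(line: str) -> str:
    """Older behaviour as described: drop the bracket characters, keep their contents."""
    return " ".join(line.replace("[", "").replace("]", "").split())


def strip_markers_and_contents(line: str) -> str:
    """Drop each bracketed span together with everything inside it."""
    return " ".join(BRACKETED.sub(" ", line).split())
```

With the first function, a note such as "[laughter]" stays in the transcript as the word "laughter", which the trainer would then try to align against audio that does not contain it. The second function removes the note entirely.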
genTrans5 performed well, but the transcript it generated still contained remnants of the transcribers' markers: specifically, lone "]" characters with a space on either side. These were removed with sed before the train was executed to avoid complications.
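The exact sed command is not recorded here. As a rough equivalent, the cleanup can be sketched in Python. The function name is hypothetical, and unlike a plain substitution this version also collapses repeated whitespace.

```python
def drop_lone_brackets(line: str) -> str:
    """Remove tokens that consist solely of a closing bracket.

    Splitting on whitespace means a "]" that is attached to a word is
    left untouched; only a free-standing "]" is dropped.
    """
    return " ".join(token for token in line.split() if token != "]")
```

Working on whole tokens also handles two stray brackets in a row, which a single-pass replacement of " ] " with " " could miss.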
The Experiment dictionary required the following words to be added:
ACC AE1 K S
APAR AH0 P EH1 R
AV AH0 V IY1
BECAU B IH0 K AO1
BEFO B IH0 F AO1
BU B AH1
CH CH
CHROMATO K R OW1 M EY1 T OW2
COMPA K AH1 M P AH0
ENCLO EH0 N K L OW1
ENER EH1 N ER0
ESPEC AH0 S P EH1 SH
EVERYB EH1 V R IY0 B
EXAC IH0 G Z AE1 K
EXCHAN IH0 K S CH EY1
EY EY1
FAM F AE1 M
FASCINAT F AE1 S AH0 N
FO F AO1
FR F R
GE JH IY1 IY1
GETT G EH1 T
GI JH IY1 AY1
GR G R
GUATE G W AA2 T AH0
IRRESPON IH0 R AH0 S P AA1 N
ISRA IH1 Z R AA1
JUS JH AH1 S
KN K EY1 EH1 N
MYS M IH1 S
NINET N AY1 N T
PEOP P IY1 P
PERIENCE P IH1 R IY0 AH0 N S
PL P IY1 EH1 L
RECYCL R IY0 S AY1 K AH0 L
RETI R EH1 T AH0
RI AA1 R AY1
RUNN R AH1 N
SHEE SH IY1
SITUA S IH2 CH UW0 EY1
SPEE S P IY1
SYST S IH1 S T
THA DH AE1
WH W EH1
WHA W AH1
WHE W EH1
WHI W AY1
WI W AY1
WORDP W ER1 D P
WOU W UH1
Most, if not all, of the words above are fragments of longer words. genTrans5 most likely clipped them because the remaining portion of each word was inaudible and had therefore been enclosed in brackets.
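Fragments like these surface as words that appear in the transcript but not in the dictionary. The following Python sketch shows one way to list such words before a train. It assumes a dictionary with one entry per line, the word first and its phones after it, and a transcript of whitespace-separated words. The function names are hypothetical and not part of the experiment scripts.

```python
def load_dictionary_words(dict_lines):
    """Collect the first field of every non-empty dictionary line."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            words.add(fields[0].upper())
    return words


def missing_words(transcript_lines, dict_words):
    """Return, in sorted order, transcript tokens that have no dictionary entry."""
    seen = set()
    for line in transcript_lines:
        for token in line.split():
            seen.add(token.upper())
    return sorted(seen - dict_words)
```

Each word returned would then need either a dictionary entry, as was done for the list above, or a correction in the transcript.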