Speech:Exps 0089

From Openitware

Last_5hr train using a genTrans5-derived transcript.


Description

Author: Team B&C

Date: 4/21/2013

Purpose: This experiment was designed to test the newly created genTrans5.pl script on the Last_5hr train corpus.

Details: Previous versions of the genTrans script removed only the markers that indicated transcribers' notes, leaving the text inside those markers intact. This produced inaccurate models, because the trainer attempted to align and train phones that were listed in the transcript but did not exist in the audio file. genTrans5 removes the enclosed text along with the markers.
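To make the difference concrete, here is a minimal Python sketch of the two behaviors. It assumes transcriber notes are enclosed in square brackets and uses a made-up sample line. The function names are hypothetical, and this is not the Perl code of genTrans5.pl or its predecessors.

```python
import re

# ASSUMPTION: transcriber notes are enclosed in square brackets, e.g. "[laughter]".
# This pattern matches one "[...]" span that contains no "]" inside it.
BRACKETED = re.compile(r"\[[^\]]*\]")


def _squeeze(text: str) -> str:
    """Collapse runs of whitespace left behind by deletions."""
    return re.sub(r"\s+", " ", text).strip()


def strip_markers_only(line: str) -> str:
    """Old behavior (illustrative): drop the brackets, keep the enclosed text."""
    return _squeeze(line.replace("[", "").replace("]", ""))


def strip_markers_and_contents(line: str) -> str:
    """New behavior (illustrative): drop each bracketed span and its contents."""
    return _squeeze(BRACKETED.sub("", line))


if __name__ == "__main__":
    # Hypothetical transcript line, not taken from the Last_5hr corpus.
    sample = "I WENT THERE [laughter] BECAU[SE] IT WAS FUN"
    print(strip_markers_only(sample))          # I WENT THERE laughter BECAUSE IT WAS FUN
    print(strip_markers_and_contents(sample))  # I WENT THERE BECAU IT WAS FUN
```

With the old behavior the note text ("laughter") stays in the transcript as if it had been spoken. With the new behavior it disappears, but a word whose ending was bracketed is left as a fragment (BECAU). The pattern only removes matched pairs, so an unmatched "]" passes through unchanged.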

This experiment consists of a train only. It was evaluated by two scoring experiments: Experiment 0090 and Experiment 0091.

Results

genTrans5 performed well, but the transcript it generated still contained remnants of the transcribers' markers, specifically lone "]" characters surrounded by spaces. These were removed with sed before the train was run, so that the trainer would not treat them as words.
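The exact sed command used is not recorded here. As a sketch of an equivalent cleanup, the Python below deletes a "]" only when it has a space on both sides, which is one reading of "lone". The function names and the sample line are hypothetical.

```python
import re

# A "]" with a space on its left; the lookahead requires a space on its right
# without consuming it, so back-to-back remnants such as " ] ] " are all removed.
LONE_BRACKET = re.compile(r" \](?= )")


def remove_lone_brackets(line: str) -> str:
    """Delete each lone ']' together with the space before it."""
    return LONE_BRACKET.sub("", line)


def clean_transcript(src_path: str, dst_path: str) -> None:
    """Copy a transcript file, removing lone ']' remnants from every line."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_lone_brackets(line))


if __name__ == "__main__":
    # Hypothetical transcript line in Sphinx style.
    print(remove_lone_brackets("<s> I THINK ] THAT ] ] IS RIGHT </s> (utt_0001)"))
    # <s> I THINK THAT IS RIGHT </s> (utt_0001)
```

A "]" attached to a word (for example "WORD]") is deliberately left alone, since it is not a lone remnant by this definition.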

The experiment dictionary required the following words to be added:

ACC AE1 K S
APAR AH0 P EH1 R
AV AH0 V IY1
BECAU B IH0 K AO1
BEFO B IH0 F AO1
BU B AH1
CH CH
CHROMATO K R OW1 M EY1 T OW2
COMPA K AH1 M P AH0
ENCLO EH0 N K L OW1
ENER EH1 N ER0
ESPEC AH0 S P EH1 SH
EVERYB EH1 V R IY0 B
EXAC IH0 G Z AE1 K
EXCHAN IH0 K S CH EY1
EY EY1
FAM F AE1 M
FASCINAT F AE1 S AH0 N
FO F AO1
FR F R
GE JH IY1 IY1
GETT G EH1 T
GI JH IY1 AY1
GR G R
GUATE G W AA2 T AH0
IRRESPON IH0 R AH0 S P AA1 N
ISRA IH1 Z R AA1
JUS JH AH1 S
KN K EY1 EH1 N
MYS M IH1 S
NINET N AY1 N T
PEOP P IY1 P
PERIENCE P IH1 R IY0 AH0 N S
PL P IY1 EH1 L
RECYCL R IY0 S AY1 K AH0 L
RETI R EH1 T AH0
RI AA1 R AY1
RUNN R AH1 N
SHEE SH IY1
SITUA S IH2 CH UW0 EY1
SPEE S P IY1
SYST S IH1 S T
THA DH AE1
WH W EH1
WHA W AH1
WHE W EH1
WHI W AY1
WI W AY1
WORDP W ER1 D P
WOU W UH1
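Missing words like these can be found before training by listing every transcript word that has no dictionary entry. The Python sketch below does this under stated assumptions: a CMU/Sphinx-style dictionary (headword, then phones, with alternates written as WORD(2)) and transcript lines of the form "<s> ... </s> (utterance_id)". The function names and the toy data are hypothetical, not taken from this experiment's setup.

```python
import re
from typing import Iterable, Iterator, List, Set


def load_dictionary_words(dict_lines: Iterable[str]) -> Set[str]:
    """Return the headwords of a CMU/Sphinx-style dictionary.

    ASSUMPTION: one entry per line, WORD followed by whitespace and phones.
    Alternate pronunciations written as WORD(2) are folded into WORD.
    """
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def transcript_words(transcript_lines: Iterable[str]) -> Iterator[str]:
    """Yield the words of a Sphinx-style transcript.

    ASSUMPTION: lines look like "<s> WORD WORD </s> (utterance_id)"; the
    sentence markers and the parenthesized utterance ID are skipped.
    """
    for line in transcript_lines:
        for token in line.split():
            if token in ("<s>", "</s>"):
                continue
            if token.startswith("(") and token.endswith(")"):
                continue
            yield token


def find_missing_words(dict_lines: Iterable[str], transcript_lines: Iterable[str]) -> List[str]:
    """Sorted list of transcript words that have no dictionary entry."""
    known = load_dictionary_words(dict_lines)
    return sorted({w for w in transcript_words(transcript_lines) if w not in known})


if __name__ == "__main__":
    # Toy dictionary and transcript, for illustration only.
    dictionary = ["BECAUSE  B IH0 K AO1 Z", "IT  IH1 T", "WAS  W AA1 Z", "WAS(2)  W AH1 Z"]
    transcript = ["<s> BECAU IT WAS </s> (utt_0001)"]
    print(find_missing_words(dictionary, transcript))  # ['BECAU']
```

Each word the check reports still needs a pronunciation written by hand, as was done for the list above.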

Most, if not all, of the words above are fragments of longer words (for example, BECAU from "because"). They were most likely truncated by genTrans5: the rest of each word was inaudible and was therefore enclosed in brackets by the transcriber, so genTrans5 removed it along with the markers.
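The observation that these entries are pieces of longer words can be checked against a full dictionary word list. This hypothetical Python helper reports, for each added word, the longer entries that begin or end with it. Checking both ends matters because some entries, such as PERIENCE, look like word endings rather than beginnings. The word list in the example is a toy stand-in, not the experiment's dictionary.

```python
from typing import Dict, Iterable, List


def fragment_sources(fragments: Iterable[str], dictionary_words: Iterable[str]) -> Dict[str, List[str]]:
    """For each fragment, list the longer dictionary words that begin or end with it."""
    words = set(dictionary_words)
    sources = {}
    for frag in fragments:
        sources[frag] = sorted(
            w for w in words
            if w != frag and (w.startswith(frag) or w.endswith(frag))
        )
    return sources


if __name__ == "__main__":
    added = ["BECAU", "SYST", "PERIENCE"]
    # Toy word list standing in for the full pronunciation dictionary.
    words = {"BECAUSE", "SYSTEM", "SYSTEMS", "EXPERIENCE", "SPEECH"}
    for frag, matches in fragment_sources(added, words).items():
        print(f"{frag}: {', '.join(matches) or '(no match)'}")
```

Very short entries such as CH or FO will match many words, so the output is a starting point for manual review rather than proof of where each fragment came from.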