Speech:Exps 0305 013

Description
Author: Prof Jonas (UserID: mcy59)

Date: 3-14-2018

Purpose: Remove instances of [] and - marked words from the LM, transcripts, and dictionary for a 5-hour train, then use the same set to test.

Details:

Here I removed both [] and - marked words from the language models, transcripts, and dictionary. See 0305/011 for details, and 0305/012 for how we presently do it, which leaves in both [] and - (note that this currently isn't being done correctly, as /mnt/main/scripts/user/parseLMtrans.pl is not working properly). I added two more regular expressions (look in etc/scripts):


 * abc- -> abc
 * -abc -> abc
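
As a rough illustration of the two mappings above, here is a minimal Python sketch. The actual expressions are the ones in etc/scripts; the patterns and the function name below are assumptions made for illustration, and they treat a marker as a hyphen at the very start or end of a whitespace-delimited word, so hyphens inside a word are left alone.

```python
import re

# Hypothetical patterns, not the ones from etc/scripts.
# A hyphen right after a word character and right before whitespace/end: abc- -> abc
TRAILING_HYPHEN = re.compile(r"(?<=\w)-(?=\s|$)")
# A hyphen at the start of a token and right before a word character: -abc -> abc
LEADING_HYPHEN = re.compile(r"(?<!\S)-(?=\w)")


def strip_hyphen_markers(line: str) -> str:
    """Drop leading/trailing hyphen markers from each word in a line."""
    line = TRAILING_HYPHEN.sub("", line)
    return LEADING_HYPHEN.sub("", line)


if __name__ == "__main__":
    print(strip_hyphen_markers("abc- def -ghi well-known"))
    # prints: abc def ghi well-known
```

The same substitution would need to be applied consistently to the LM text, the transcripts, and the dictionary entries so that the vocabularies stay in agreement.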

Again, I trained a set of models on the 5-hour set and then decoded, testing on that same training set.

Results:

Surprisingly, this did even better than removing only [], with a WER of 32.8%. I still assume that 5 hours of training is not a big enough sample to draw clear conclusions. This really must be run as a 300-hour train.

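,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| Sum/Avg | 4172  60569 | 73.7   18.4    7.9    6.5   32.8   88.3 |
|=================================================================|
|  Mean   |  1.3   19.2 | 76.3   17.9    5.9   15.2   38.9   88.6 |
|  S.D.   |  0.5   16.5 | 18.0   15.1    7.8   28.8   32.1   29.2 |
| Median  |  1.0   15.0 | 76.9   16.3    2.6    3.5   33.3  100.0 |
`-----------------------------------------------------------------'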
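
For reference, the Err column in the Sum/Avg row is the sum of the Sub, Del, and Ins percentages, and Corr is what remains after Sub and Del. The short Python check below reproduces the reported figures from that row. The function names are hypothetical and the inputs are copied from the table above.

```python
def wer_percent(sub: float, dele: float, ins: float) -> float:
    """Word error rate as a percentage of reference words: Sub + Del + Ins."""
    return round(sub + dele + ins, 1)


def corr_percent(sub: float, dele: float) -> float:
    """Percentage of reference words recognized correctly: 100 - Sub - Del."""
    return round(100.0 - sub - dele, 1)


if __name__ == "__main__":
    # Sum/Avg row: Sub 18.4, Del 7.9, Ins 6.5
    print(wer_percent(18.4, 7.9, 6.5))   # prints: 32.8
    print(corr_percent(18.4, 7.9))       # prints: 73.7
```

Because insertions count as errors but not against Corr, Corr and Err do not have to add up to 100.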