Speech:Exps 0305 014

Description
Author: Tri Nguyen (UserID: tmn1001)

Date: 3-22-2018

Purpose: Remove instances of [] and - marked words from the LM, transcripts, and dictionary for the 5 hour train, and use the same set to test. The same experiment as 0305/013.

Details:

Here I removed both [] and - marked words from the language models, transcripts, and dictionary. See 0305/011 for details, and 0305/012 for how we presently do it, which leaves in both [] and - (note that this currently isn't being done correctly, because /mnt/main/scripts/user/parseLMtrans.pl is not working properly). I added two more regular expressions (look in etc/scripts):


 * abc- -> abc
 * -abc -> abc
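The two rewrite rules above can be sketched in Python as follows. This is only an illustration of the idea, not the expressions actually stored in etc/scripts: the pattern names and the function `strip_hyphen_marks` are hypothetical, and the sketch assumes that a hyphen is a marker only when it sits at the very start or very end of a whitespace-delimited word, so word-internal hyphens are left alone.

```python
import re

# Hypothetical patterns; the experiment's real expressions are in etc/scripts.
# A hyphen directly after a word character and before whitespace/end of line.
TRAILING_HYPHEN = re.compile(r"(?<=\w)-(?=\s|$)")   # abc- -> abc
# A hyphen at the start of a token (no non-space before it), before a word character.
LEADING_HYPHEN = re.compile(r"(?<!\S)-(?=\w)")      # -abc -> abc


def strip_hyphen_marks(line: str) -> str:
    """Drop leading/trailing hyphen markers, keeping the word itself."""
    line = TRAILING_HYPHEN.sub("", line)
    return LEADING_HYPHEN.sub("", line)


if __name__ == "__main__":
    print(strip_hyphen_marks("i th- think -ing so uh-huh"))
```

Applying the same function to the transcripts, the LM training text, and the dictionary entries would keep the three in agreement, which is presumably why all three are cleaned together.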

Again, I trained a set of models (i.e. on the 5 hour set) and then decoded, testing with the same training set.

Results:

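Columns, left to right (standard sclite summary layout): # Snt, # Wrd, then Corr, Sub, Del, Ins, Err, S.Err as percentages.

| Sum/Avg | 4172  60569 | 73.7   18.4    7.9    6.5   32.8   88.3 |
|=================================================================|
|  Mean   |  1.3   19.2 | 76.3   17.9    5.9   15.2   38.9   88.6 |
|  S.D.   |  0.5   16.5 | 18.0   15.1    7.8   28.8   32.1   29.2 |
| Median  |  1.0   15.0 | 76.9   16.3    2.6    3.5   33.3  100.0 |
`-----------------------------------------------------------------'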
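As a quick sanity check on the Sum/Avg row, the short sketch below recomputes the error figure from its parts. It assumes the six percentage columns are in the usual sclite order (Corr, Sub, Del, Ins, Err, S.Err); the function name `word_error_rate` is made up for this illustration.

```python
def word_error_rate(sub: float, dele: float, ins: float) -> float:
    """Word error rate in percent: substitutions + deletions + insertions."""
    return round(sub + dele + ins, 1)


# Sum/Avg row from the results above (percentages).
corr, sub, dele, ins, err = 73.7, 18.4, 7.9, 6.5, 32.8

# Correct, substituted, and deleted words together cover every reference word.
coverage = round(corr + sub + dele, 1)

if __name__ == "__main__":
    print("recomputed Err:", word_error_rate(sub, dele, ins), "reported Err:", err)
    print("Corr + Sub + Del:", coverage)
```

Under that column assumption the recomputed value agrees with the reported 32.8, and Corr + Sub + Del sums to 100.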