Speech:Exps 0283 005

Description
Authors:

Ben Leith, James Schumacher, Ryan O'Neal, Jon Shallow

Date created: 2/24/16

Purpose:
 * Increase dictionary size limit and see effect on WER

Details:
 * generateFeats.pl started at 4:17 PM on 2-24-16
 * Train started at 5:24 PM on 2-24-16
 * Train finished at 11:15 PM on 2-26-16
 * Followed all steps to create the language model except running the lm_create.pl script, because that script must first be modified to raise the cap on the dictionary size to 30000

Useful Link for Determining the Proposed Changes

CODE that needs to be modified:

print "Generating Vocab file . . .\n";
system( $folder."wfreq2vocab  tmp.vocab" );

Proposed CODE:

print "Generating Vocab file . . .\n";
system( $folder."wfreq2vocab -top 30000  tmp.vocab" );


 * Ran the lm_create.pl script after making the proposed change
 * OUTPUT from lm_create.pl, specifically the wfreq2vocab output:

Generating Vocab file. . .
wfreq2vocab : Will generate a vocabulary containing the most frequent 30000 words. Reading wfreq stream from stdin...
wfreq2vocab : Done.
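The effect of a "most frequent N words" cap can be illustrated with a short Python sketch. This is not the wfreq2vocab implementation; the function name `top_n_vocab` and the toy word counts are hypothetical, and the alphabetical ordering of the result is an assumption made only to keep the output deterministic.

```python
from collections import Counter

def top_n_vocab(word_counts, top=30000):
    """Return the `top` most frequent words, sorted alphabetically.

    Hypothetical helper: it illustrates the idea of a frequency cap,
    not the actual behaviour of wfreq2vocab.
    """
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(word for word, _ in ranked[:top])

# Toy corpus: "the" occurs 3 times, "cat" twice, the rest once.
counts = Counter("the cat sat on the mat the cat".split())
print(top_n_vocab(counts, top=2))           # ['cat', 'the']
print(len(top_n_vocab(counts, top=30000)))  # 5
```

When the cap exceeds the number of distinct words, every word is kept, so a vocabulary smaller than the cap may simply mean the training text contains fewer distinct words than the cap allows.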


 * Started the decode (1000 files, 8000 senones) at 12:00 PM (2-27-16)
 * Decode ended at 5:25 PM (2-27-16)

Results:

 * Number of words in tmp.vocab in LM directory of Experiment 0283/004: 20,004
 * Number of words in tmp.vocab in LM directory of Experiment 0283/005: 23,573
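A word count like the ones above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the vocabulary file holds one word per line and that any header lines start with "##". Whether the reported figures include such header lines is not stated here. The function name `count_vocab_words` is hypothetical.

```python
def count_vocab_words(lines):
    """Count vocabulary entries, skipping blank lines and '##' comment lines.

    Assumption: one word per line, header/comment lines start with '##'.
    """
    return sum(
        1 for line in lines
        if line.strip() and not line.startswith("##")
    )

# Small in-memory sample instead of a real tmp.vocab file.
sample = ["## header comment (assumed format)\n", "a\n", "the\n", "\n"]
print(count_vocab_words(sample))  # 2

# With a real file, the same function can be applied to the open handle:
#   with open("tmp.vocab") as handle:
#       print(count_vocab_words(handle))
```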

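SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001a |   32    541 | 79.7   17.0    3.3   23.3   43.6  100.0 |
|---------+-------------+-----------------------------------------|
| sw2001b |   34    488 | 82.4   14.3    3.3   26.8   44.5  100.0 |
|---------+-------------+-----------------------------------------|
| sw2005a |   53   1172 | 87.9    8.6    3.5   10.0   22.1   92.5 |
|---------+-------------+-----------------------------------------|
| sw2005b |   77    817 | 72.6   19.2    8.2   25.1   52.5   98.7 |
|---------+-------------+-----------------------------------------|
| sw2006a |   40    608 | 86.0   11.3    2.6   15.0   28.9   97.5 |
|---------+-------------+-----------------------------------------|
| sw2006b |   43   1012 | 80.0   12.7    7.2    6.4   26.4   95.3 |
|---------+-------------+-----------------------------------------|
| sw2007a |   86   1064 | 84.7   10.1    5.3   11.4   26.7   83.7 |
|---------+-------------+-----------------------------------------|
| sw2007b |   80   1183 | 83.6   12.4    4.0    9.2   25.6   92.5 |
|---------+-------------+-----------------------------------------|
| sw2008a |   28    369 | 89.2    8.7    2.2   17.1   27.9   96.4 |
|---------+-------------+-----------------------------------------|
| sw2008b |   32    436 | 84.6   13.1    2.3   18.3   33.7   93.8 |
|---------+-------------+-----------------------------------------|
| sw2009a |   37    605 | 76.2   16.7    7.1   10.9   34.7   97.3 |
|---------+-------------+-----------------------------------------|
| sw2009b |   44    649 | 80.9   14.5    4.6   16.5   35.6   93.2 |
|---------+-------------+-----------------------------------------|
| sw2010a |   38    528 | 87.7    8.9    3.4   18.0   30.3  100.0 |
|---------+-------------+-----------------------------------------|
| sw2010b |   33    659 | 73.9   16.5    9.6   13.1   39.2  100.0 |
|---------+-------------+-----------------------------------------|
| sw2012a |   67   1420 | 84.4    8.9    6.6   11.3   26.8   92.5 |
|---------+-------------+-----------------------------------------|
| sw2012b |   43    846 | 83.3   12.4    4.3   11.0   27.7  100.0 |
|---------+-------------+-----------------------------------------|
| sw2013a |   52    766 | 74.4   18.0    7.6   13.8   39.4   96.2 |
|---------+-------------+-----------------------------------------|
| sw2013b |   88   1526 | 68.3   20.4   11.3    8.2   39.9   95.5 |
|---------+-------------+-----------------------------------------|
| sw2014a |   23    311 | 74.6   20.6    4.8   41.8   67.2  100.0 |
|---------+-------------+-----------------------------------------|
| sw2014b |   27    543 | 78.6   15.8    5.5   12.9   34.3   92.6 |
|---------+-------------+-----------------------------------------|
| sw2015a |   29    611 | 81.2   10.8    8.0    2.9   21.8   89.7 |
|---------+-------------+-----------------------------------------|
| sw2015b |   14    181 | 80.1   12.7    7.2   11.0   30.9   85.7 |
|=================================================================|
| Sum/Avg | 1000  16335 | 80.4   13.7    6.0   13.4   33.0   94.7 |
|=================================================================|
|  Mean   | 45.5  742.5 | 80.7   13.8    5.5   15.2   34.5   95.1 |
|  S.D.   | 21.2  354.6 |  5.5    3.8    2.5    8.3   10.7    4.6 |
| Median  | 39.0  630.0 | 80.9   12.9    5.0   13.0   32.3   95.8 |
`-----------------------------------------------------------------'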
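The Err column is the word error rate: the sum of the substitution, deletion, and insertion percentages. The sketch below checks that relationship against two rows of the table. The function name `word_error_rate` is hypothetical. Because the table rounds each column to one decimal place, a recomputed value can differ from the printed Err by 0.1.

```python
def word_error_rate(sub, dele, ins):
    """WER in percent: substitutions + deletions + insertions."""
    return sub + dele + ins

# Row sw2001a: Sub 17.0, Del 3.3, Ins 23.3, reported Err 43.6.
print(round(word_error_rate(17.0, 3.3, 23.3), 1))  # 43.6

# Sum/Avg row: Sub 13.7, Del 6.0, Ins 13.4, reported Err 33.0.
# The recomputed value is 33.1; the gap comes from per-column rounding.
print(round(word_error_rate(13.7, 6.0, 13.4), 1))  # 33.1
```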

This is our best result yet. Our last experiment (0283/004) produced a WER of 33.9% and this experiment produced a WER of 33.0%. Passing the -top 30000 option to wfreq2vocab appears to have lowered WER by 0.9 percentage points (about 2.7% relative).
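The difference between an absolute and a relative WER change can be made explicit with a small calculation. This is a sketch using only the two WER figures quoted above; the function name `wer_change` is hypothetical.

```python
def wer_change(baseline, new):
    """Return (absolute change in points, relative change in percent)."""
    absolute = baseline - new
    relative = 100.0 * absolute / baseline
    return absolute, relative

# Experiment 0283/004 (baseline) versus 0283/005.
absolute, relative = wer_change(33.9, 33.0)
print(f"{absolute:.1f} points absolute, {relative:.1f}% relative")
# 0.9 points absolute, 2.7% relative
```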