Speech:Exps 0283 007

Description
 * Authors: Ben Leith, James Schumacher, Ryan O'Neal, Jon Shallow

Date Created: 3/2/16

 * Purpose: We found that the 256 hr corpus contains approximately 11,000 corrupt utterances, starting after the first 32,000 files. We have also been told that the configurations from our 125 hr trains will have little effect on the larger data set of a 256 hr train, so we need to start running 256 hr trains and cannot reuse those configurations. This changes our approach to the experiments: we now have to come up with new configurations to reduce the word error rate (WER), starting with a new baseline for the new corpus. We also need a way to copy the good corpus data while keeping the corrupted files out of the experiment. Prof. Jonas gave us permission to perform this task.
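
One possible way to copy the good corpus data while leaving the corrupted files behind is sketched below. It is only an illustration under stated assumptions: the function name copy_clean_corpus, the flat directory layout, the "*.sph" file pattern, and the idea that the corrupt utterances are already available as a list of IDs are all hypothetical, since this page does not say how the corrupt files are identified or stored.

```python
import shutil
from pathlib import Path


def copy_clean_corpus(src_dir, dst_dir, corrupt_ids, pattern="*.sph"):
    """Copy every file matching `pattern` whose stem is not in `corrupt_ids`.

    Assumptions (not from the experiment page): audio files sit directly
    in `src_dir`, and an utterance ID equals the file name without its
    extension. Returns a (copied, skipped) tuple of counts.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    corrupt = set(corrupt_ids)  # set gives fast membership checks

    copied = skipped = 0
    for audio in sorted(src.glob(pattern)):
        if audio.stem in corrupt:
            skipped += 1  # leave the corrupt utterance out of the new corpus
            continue
        shutil.copy2(audio, dst / audio.name)  # copy2 also keeps timestamps
        copied += 1
    return copied, skipped
```

The source directory is never modified, so the original 256 hr corpus stays intact and the copy can be repeated if the list of corrupt IDs changes.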

 * Details: Ran the train with no changes to sphinx_train.cfg, so this train shows the results of the default values.

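 * Results:

SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001b |   34    488 | 65.6   29.1    5.3   27.0   61.5  100.0 |
|---------+-------------+-----------------------------------------|
| sw2001a |   32    541 | 56.0   38.3    5.7   22.4   66.4  100.0 |
|---------+-------------+-----------------------------------------|
| sw2005a |   32    764 | 69.8   22.9    7.3    8.8   39.0   96.9 |
|---------+-------------+-----------------------------------------|
| sw2005b |   60    661 | 49.6   35.2   15.1   18.9   69.3  100.0 |
|=================================================================|
| Sum/Avg |  158   2454 | 60.5   30.8    8.7   18.1   57.7   99.4 |
|=================================================================|
|  Mean   | 39.5  613.5 | 60.2   31.4    8.4   19.3   59.0   99.2 |
|  S.D.   | 13.7  123.7 |  9.1    6.8    4.6    7.8   13.7    1.6 |
| Median  | 33.0  601.0 | 60.8   32.2    6.5   20.6   63.9  100.0 |
`-----------------------------------------------------------------'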
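
As a reading aid for the results table, the sketch below shows how the Err column relates to the other columns: the error rate is the sum of the substitution, deletion, and insertion percentages, each measured against the number of reference words. The Sum/Avg row is weighted by each speaker's word count, while the Mean row is a plain average over speakers. The function and variable names are illustrative only, and the inputs are the rounded percentages printed in the table, so recomputed values can differ from the table by about 0.1.

```python
# (speaker, reference words, Sub %, Del %, Ins %) copied from the results table
ROWS = [
    ("sw2001b", 488, 29.1, 5.3, 27.0),
    ("sw2001a", 541, 38.3, 5.7, 22.4),
    ("sw2005a", 764, 22.9, 7.3, 8.8),
    ("sw2005b", 661, 35.2, 15.1, 18.9),
]


def error_rate(sub, dele, ins):
    """Err % = Sub % + Del % + Ins %, all relative to reference words."""
    return sub + dele + ins


def word_weighted_error(rows):
    """Overall Err %, weighting each speaker by reference word count."""
    total_words = sum(words for _, words, *_ in rows)
    weighted = sum(words * error_rate(s, d, i) for _, words, s, d, i in rows)
    return weighted / total_words


def mean_error(rows):
    """Unweighted average of the per-speaker Err % values."""
    return sum(error_rate(s, d, i) for _, _, s, d, i in rows) / len(rows)


if __name__ == "__main__":
    for spkr, _, s, d, i in ROWS:
        print(f"{spkr}: {error_rate(s, d, i):.1f}")
    print(f"word-weighted: {word_weighted_error(ROWS):.1f}")
    print(f"per-speaker mean: {mean_error(ROWS):.1f}")
```

Because insertions count as errors but are not part of Corr + Sub + Del (which sum to 100), Err can be much larger than 100 minus Corr, as it is for sw2001b.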


 * Concerns: