Speech:Exps 0025

Description
Author: Eric Beikman
Date (Started): 3/16/2013

Purpose: To gain familiarity with the Sphinx trainer and the steps necessary to complete a model. This experiment tests whether removing redundant transcript entries before training makes a difference in the resulting model's accuracy. (Part 2)

Details: Like Experiment 0024, we utilized the existing Mini/Train corpus for both training and decoding. In Experiment 0020, we had issues scoring due to redundant transcript entries. We wish to determine whether leaving these redundancies in has a negative or positive effect on the resulting model. For this experiment, we removed all redundant transcript entries for both the training and decoding processes.

Results

To ensure that this experiment was exactly the same as Experiment 0024 apart from these adjustments, the dictionary created for 0024 was copied for use in this experiment. The uniq-generated transcript produced when scoring Experiment 0024 was also reused.

To run the training successfully, the file list for this experiment (0025_train.fileids) had to have its duplicate entries removed so that it matched the deduplicated transcript.
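The deduplication described above can be sketched as follows. This is only an illustration: the file names, sample utterances, and utterance-id format below are assumptions, not the actual contents of the experiment's transcript or fileids files.

```shell
# Sketch only: sample data stands in for the real 0025 training files.
# Sphinx transcripts pair each line of text with an utterance id "(id)".
cat > train.transcription <<'EOF'
<s> hello world </s> (utt_001)
<s> good morning </s> (utt_002)
<s> hello world </s> (utt_001)
EOF

# The fileids list must stay aligned with the transcript, one path per line.
cat > train.fileids <<'EOF'
wav/utt_001
wav/utt_002
wav/utt_001
EOF

# Remove duplicate transcript lines, keeping the first occurrence of each.
awk '!seen[$0]++' train.transcription > train.uniq.transcription

# Deduplicate the fileids list the same way so the two lists still match.
awk '!seen[$0]++' train.fileids > train.uniq.fileids

wc -l < train.uniq.transcription
wc -l < train.uniq.fileids
```

Unlike `sort | uniq`, the `awk '!seen[$0]++'` idiom removes duplicates without reordering the lines, which keeps the transcript and fileids lists in the same relative order.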

The model creation and testing process proceeded without incident. SCLite generated the following score:

,-----------------------------------------------------------------.
|                       hyp.trans (Exp 0025)                      |
|-----------------------------------------------------------------|
|         | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg |  549  10919 | 83.6    9.9    6.5    6.3   22.7   91.6 |
|=================================================================|
|  Mean   |  2.9   57.8 | 82.2   12.0    5.7   11.6   29.4   93.2 |
|  S.D.   |  1.9   45.0 | 13.1   11.5    5.2   20.1   26.5   17.2 |
| Median  |  3.0   47.0 | 85.0    9.3    4.9    6.7   23.1  100.0 |
`-----------------------------------------------------------------'

We expected no statistical difference between the models created in Experiments 0025 and 0024; however, there were minor differences in the error rates between the two scores. Experiment 0025, with the redundant transcript entries removed before training, had a slightly higher average error rate (by 0.7) than Experiment 0024, along with higher substitution and deletion rates. Oddly enough, both experiments had the same insertion error rate (6.3).