Speech:Exps 0024


Mini Test Train with Decode


Description

Author: Eric Beikman

Date: (Started) 3/12/2013

Purpose: To gain familiarity with the Sphinx trainer and the steps necessary to build a model. This experiment tests whether removing redundant transcript entries before training makes a difference in the resulting model's accuracy.

Details: For this experiment, we used the existing Mini/Train corpus for both training and decoding. In Experiment 0020, we had trouble scoring because of redundant transcript entries. We want to determine whether leaving these redundancies in has a negative or positive effect on the resulting model. The results of this experiment will be compared with those of the next experiment, 0025. For this experiment, the redundant transcript entries are left in for both the training and decoding processes.

Results: The first few training runs failed due to missing dictionary entries. The following entries were added to the dictionary:

IBM  AY1 B IY1 EH1 M
FEDERALES  F EH1 D ER AH0 L IY1 S
DUCTWORK  D AH1 K T W ER1 K
COGNIZITIVE  K AA1 G N AH0 Z IH0 T IH0 V
CHOWPERD  CH AW1 P ER0 D
ALBRIDGE  AO1 L B R IH1 JH
SOUTHBEND  S AW1 TH B EH1 N D
VOCALIZED  V OW1 K AH0 L AY2 Z D
MOOSEWOOD  M UW1 S W UH2 D
UNDERGRAD  AH1 N D ER0 G R AE1 D
GTE  JH IY1 T IY1 IY1
MARYLANDER  M EH1 R IY0 L AE2 N D ER0
MARYLANDER'S  M EH1 R IY0 L AE2 N D ER0 Z
PLANOITE  P L EY1 N OW0 AY0 T
DADGUM  D AE1 D G AH1 M
EXPERIENCEWISE  IH0 K S P IH1 R IY0 AH0 N S W AY1 Z
CANSEGO  K AE1 N S EY1 G OW1
HOPELY  HH OW1 P L IY0
STORLY  S T AO1 R L IY0
KID'LL  K IH1 D L
REINJURING  R IY2 IH1 N JH ER0 IH0 NG
NFL  EH1 N EH1 F EH1 L
PE  P IY1 IY1
UNDERGRADS  AH1 N D ER0 G R AE1 D Z
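
Missing dictionary words like these can be found before a training run rather than through failed runs. The following is a minimal sketch, not the procedure used in this experiment. It assumes that transcript lines look like `<s> WORD WORD </s> (utterance_id)` and that each dictionary line starts with the word followed by its phones. The function names and the sample utterance ID are hypothetical.

```python
def load_dictionary_words(dict_lines):
    """Collect the head word of every non-empty dictionary line."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(parts[0].upper())
    return words


def missing_words(transcript_lines, dictionary_words):
    """Return the sorted transcript words that have no dictionary entry."""
    missing = set()
    for line in transcript_lines:
        for token in line.split():
            # Assumed format: skip sentence markers (<s>, </s>) and the
            # utterance ID in parentheses at the end of the line.
            if token.startswith("<") or token.startswith("("):
                continue
            if token.upper() not in dictionary_words:
                missing.add(token.upper())
    return sorted(missing)


if __name__ == "__main__":
    dictionary = load_dictionary_words([
        "IBM  AY1 B IY1 EH1 M",
        "NFL  EH1 N EH1 F EH1 L",
    ])
    transcript = ["<s> IBM AND NFL </s> (utt_0001)"]
    print(missing_words(transcript, dictionary))
```

Every word the check reports would need a pronunciation added to the dictionary by hand, as was done for the entries above.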

After the dictionary was fixed, the training process ran successfully and created the proper continuous model. The creation of the language model and the decode process proceeded with no issues, taking substantially less time than in Experiment 0020.

Due to the existence of redundant entries in the reference and hypothesis transcripts, both transcripts had to be run through uniq. After this step, SCLite was able to generate the following score:
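
For reference, uniq with no options keeps the first line of each run of identical adjacent lines, so it only removes redundant entries that sit next to each other in the file. A minimal Python sketch of that behavior follows. The function name and sample lines are hypothetical, and this is an illustration rather than the command used here.

```python
def drop_adjacent_duplicates(lines):
    """Keep the first line of each run of identical adjacent lines."""
    result = []
    previous = object()  # sentinel that never equals a real line
    for line in lines:
        if line != previous:
            result.append(line)
        previous = line
    return result


if __name__ == "__main__":
    transcript = [
        "HELLO WORLD (utt_0001)",
        "HELLO WORLD (utt_0001)",
        "GOOD MORNING (utt_0002)",
    ]
    for line in drop_adjacent_duplicates(transcript):
        print(line)
```

Duplicates separated by other lines would survive this filter, so the transcripts would need to be sorted first if the redundant entries were not adjacent.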

      ,-----------------------------------------------------------------.
      |                              hyp.trans                          |
      |-----------------------------------------------------------------|
      |         | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
      | Sum/Avg |  549  10919 | 84.0    9.2    6.8    6.3   22.3   89.8 |
      |=================================================================|
      |  Mean   |  2.9   57.8 | 83.3   10.4    6.3   12.2   28.9   92.2 |
      |  S.D.   |  1.9   45.0 | 11.8    9.7    5.8   20.4   24.8   18.2 |
      | Median  |  3.0   47.0 | 85.0    8.3    5.6    6.1   22.4  100.0 |
      `-----------------------------------------------------------------'
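
The Sum/Avg row can be sanity-checked with the usual scoring identities: the error rate is the sum of substitutions, deletions, and insertions, and the percentage correct is 100 minus substitutions and deletions. The sketch below applies them to the percentages in the table. The function names are hypothetical, and the rounding to one decimal place is an assumption made to match the report's precision.

```python
def word_error_rate(sub_pct, del_pct, ins_pct):
    """Err = Sub + Del + Ins, all as percentages of reference words."""
    return round(sub_pct + del_pct + ins_pct, 1)


def percent_correct(sub_pct, del_pct):
    """Corr = 100 - Sub - Del; insertions do not reduce Corr."""
    return round(100.0 - sub_pct - del_pct, 1)


if __name__ == "__main__":
    # Values taken from the Sum/Avg row above.
    print(word_error_rate(9.2, 6.8, 6.3))  # 22.3
    print(percent_correct(9.2, 6.8))       # 84.0
```

Both values agree with the Err and Corr columns of the Sum/Avg row.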