Speech:Exps 0096


Decode and Score on last_5hr/test using 3rd-Party models


Description

Author: Team B&C

Date: 4/29/2013

Purpose: To decode and score the last_5hr/test corpus with a set of acoustic and language models obtained from CMU.

Details:

To improve our models, we need a baseline for the accuracy the Sphinx decoder is capable of. We discovered that CMU provides a set of acoustic and language models for Sphinx that were trained on a large amount of data.

For this experiment, we decoded the one-hour last_5hr/test corpus with the VoxForge English acoustic model and the US English HUB4 language model.

To minimize potential out-of-vocabulary issues, we used the entire CMU 0.7a dictionary for this experiment.
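
The team's actual decode scripts are not reproduced here; the sketch below is only a minimal illustration, using the pocketsphinx Python bindings, of how the three third-party resources (acoustic model, language model, and dictionary) plug into a decoder. All file paths are hypothetical placeholders.

 from pocketsphinx import Decoder
 
 # Point the decoder at the three third-party resources.
 # All paths below are hypothetical placeholders.
 config = Decoder.default_config()
 config.set_string('-hmm', '/path/to/voxforge_en_acoustic_model')  # VoxForge English AM
 config.set_string('-lm', '/path/to/hub4_us_english.lm')           # US English HUB4 LM
 config.set_string('-dict', '/path/to/cmudict.0.7a.dict')          # full CMU 0.7a dictionary
 decoder = Decoder(config)
 
 # Decode one utterance of raw 16-bit mono PCM audio.
 audio = open('utterance.raw', 'rb')
 decoder.start_utt()
 decoder.process_raw(audio.read(), False, True)
 decoder.end_utt()
 audio.close()
 
 if decoder.hyp() is not None:
     print(decoder.hyp().hypstr)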


Results

The results of this test were much worse than those of previous experiments, with an average word error rate of 72.0%.

                     SYSTEM SUMMARY PERCENTAGES by SPEAKER

      ,-----------------------------------------------------------------.
      |                            hyp.trans                            |
      |-----------------------------------------------------------------|
      | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
      |---------+-------------+-----------------------------------------|
      |=================================================================|
      | Sum/Avg |  437   6474 | 37.1   48.3   14.6    9.2   72.0   99.8 |
      |=================================================================|
      |  Mean   | 36.4  539.5 | 37.7   48.2   14.0    9.6   71.9   99.7 |
      |  S.D.   |  8.3  143.2 |  7.1    4.0    5.5    3.4    6.7    1.1 |
      | Median  | 32.5  546.5 | 37.4   47.8   13.3    9.4   71.9  100.0 |
      `-----------------------------------------------------------------'
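
For reference, the error rate in the table above is the standard (substitutions + deletions + insertions) / reference-length figure reported by the scoring tool. The following is a rough, self-contained sketch of that computation using a simple Levenshtein alignment; it is not the actual NIST scoring tool that produced the table.

 def wer(ref_words, hyp_words):
     """Word error rate: (substitutions + deletions + insertions) / reference length."""
     # Standard dynamic-programming edit distance between the two word sequences.
     d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
     for i in range(len(ref_words) + 1):
         d[i][0] = i                                # all deletions
     for j in range(len(hyp_words) + 1):
         d[0][j] = j                                # all insertions
     for i in range(1, len(ref_words) + 1):
         for j in range(1, len(hyp_words) + 1):
             cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
             d[i][j] = min(d[i - 1][j] + 1,         # deletion
                           d[i][j - 1] + 1,         # insertion
                           d[i - 1][j - 1] + cost)  # substitution or match
     return d[-1][-1] / float(len(ref_words))
 
 # Toy example: one substitution and one insertion against a 3-word reference.
 print(wer("the cat sat".split(), "the bat sat down".split()))  # ~0.667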

While debugging the reason for the poor score, we discovered that the acoustic model was trained on high-quality (16 kHz) audio, whereas we are using telephone-quality (8 kHz) audio. The Sphinx decoder performs poorly when there is a sample-rate mismatch between the audio the models were trained on and the audio being fed into it.
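
A quick way to catch this kind of mismatch before decoding is to compare the corpus audio's sample rate against the rate the acoustic model expects. Below is a minimal sketch assuming the audio has been converted to WAV; the filename and the 16 kHz model rate are assumptions based on the discussion above, not values taken from the experiment scripts.

 import wave
 
 MODEL_RATE = 16000   # the VoxForge English model expects 16 kHz audio
 
 # Hypothetical utterance file from the last_5hr/test corpus, converted to WAV.
 audio = wave.open('utterance.wav', 'rb')
 corpus_rate = audio.getframerate()
 audio.close()
 
 if corpus_rate != MODEL_RATE:
     print("Sample-rate mismatch: audio is %d Hz, model expects %d Hz"
           % (corpus_rate, MODEL_RATE))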