Mini Test Train with Decode


Author: Eric Beikman

Date: (Started) 2/20/2013

Purpose: To gain familiarity with the Sphinx trainer and the steps necessary to complete a model. This experiment was based on the suggested improvements we identified while running Experiments 0018 and 0019.

Details: We created a new corpus subset for this experiment. This corpus, called "Mini2", contains an hour's worth of test data. We also ensured that the Train configuration file was set to produce a continuous model in the Sphinx 3 format.

Results: The first few trains failed due to missing dictionary entries. Because of the large number of missing words, we took the added-word list created in 0017 and added it to the dictionary for 0020, cutting out the unneeded entries with parseDict.pl. The resulting dictionary still required the following words to be added:


After the dictionary was fixed, the train process ran successfully, creating the proper Continuous model. The creation of the language model proceeded with no issues.
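The missing-entry check described above can be sketched in a few lines of Python. This is only an illustration, not what parseDict.pl does: the function names are hypothetical, the dictionary is assumed to use one "WORD PH1 PH2 ..." entry per line (with alternates written as "WORD(2)"), the transcript is assumed to use the "<s> TEXT </s> (utt_id)" layout, and words are compared case-insensitively.

```python
def load_dictionary_words(dict_lines):
    """Collect head words from dictionary lines of the assumed form 'WORD PH1 PH2 ...'."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue
        head = fields[0]
        # Alternate pronunciations such as 'WORD(2)' map back to 'WORD'.
        if head.endswith(")") and "(" in head:
            head = head[: head.rindex("(")]
        words.add(head.upper())
    return words


def transcript_words(trans_lines):
    """Collect words from transcript lines of the assumed form '<s> TEXT </s> (utt_id)'."""
    words = set()
    for line in trans_lines:
        tokens = line.split()
        # Drop the trailing utterance ID, if present.
        if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
            tokens = tokens[:-1]
        for token in tokens:
            if token not in ("<s>", "</s>"):
                words.add(token.upper())
    return words


def missing_words(dict_lines, trans_lines):
    """Return transcript words that have no dictionary entry, sorted alphabetically."""
    return sorted(transcript_words(trans_lines) - load_dictionary_words(dict_lines))
```

Running a check like this against the training transcript before starting a train would surface the missing words in one pass instead of one failed train at a time.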

The Decode process took a disproportionate amount of time, running for more than 8 hours.

Scoring was troublesome for this experiment. The first attempts at scoring failed, with SCLite reporting errors relating to missing transcripts. According to the instructions, this is not uncommon and can be resolved by removing all redundant entries from the hypothesis and reference transcripts; however, running this process did not resolve the problem.
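As a rough sketch of the redundant-entry cleanup mentioned above, the Python below keeps the first line seen for each utterance ID and separately reports any ID that appears with more than one wording, which exact-duplicate removal cannot resolve on its own. It assumes each transcript line ends with the utterance ID in parentheses; the function name and the sample IDs are hypothetical, and this is not the script referenced in the instructions.

```python
import re

# Assumed layout: 'some transcript text (utterance_id)'
ID_PATTERN = re.compile(r"\(([^()\s]+)\)\s*$")


def dedupe_transcript(lines):
    """Return (unique_lines, conflicts).

    unique_lines keeps the first line seen for each utterance ID.
    conflicts maps an utterance ID to every distinct wording seen for it,
    for IDs that occur with more than one wording.
    """
    first_text = {}
    conflicts = {}
    unique_lines = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ID_PATTERN.search(line)
        if match is None:
            raise ValueError(f"no utterance ID found in line: {line!r}")
        utt_id = match.group(1)
        # Normalize whitespace so spacing differences do not count as conflicts.
        text = " ".join(line[: match.start()].split())
        if utt_id not in first_text:
            first_text[utt_id] = text
            unique_lines.append(f"{text} ({utt_id})")
        elif first_text[utt_id] != text:
            variants = conflicts.setdefault(utt_id, [first_text[utt_id]])
            if text not in variants:
                variants.append(text)
    return unique_lines, conflicts
```

Any ID listed in conflicts needs a manual decision about which wording to keep before the hypothesis and reference files are scored.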

Through experimentation, we eventually discovered that our hour-long corpus actually consisted of six 10-minute samples concatenated together. Compounding the problem, one of the transcripts had a duplicate entry whose wording differed from the original. This caused major issues, so the duplicate had to be removed. After doing this, scoring produced the following result:

      |                     0020_hyp_uniq_nd.trans                      |
      |         | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
      | Sum/Avg | 1044  22265 | 44.5   45.9    9.6   11.7   67.2   99.6 |
      |  Mean   |  2.7   58.1 | 45.0   47.0    8.0   17.6   72.5   99.7 |
      |  S.D.   |  1.7   44.0 | 14.2   14.0    6.6   21.7   23.5    3.0 |
      | Median  |  2.0   48.0 | 45.6   45.8    7.4   11.5   69.5  100.0 |
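As a quick sanity check on the Sum/Avg row above, the columns are related by simple arithmetic: Err is the sum of the Sub, Del, and Ins percentages, and Corr is what remains after subtracting Sub and Del from 100. The short Python sketch below (function names are our own) reproduces both values from the table.

```python
def word_error_rate(sub, dele, ins):
    """Word error rate (%): substitutions + deletions + insertions."""
    return sub + dele + ins


def percent_correct(sub, dele):
    """Percent correct (%): reference words neither substituted nor deleted."""
    return 100.0 - sub - dele


# Sub, Del, Ins percentages from the Sum/Avg row of the table above.
sub, dele, ins = 45.9, 9.6, 11.7
print(f"Err  = {word_error_rate(sub, dele, ins):.1f}")  # Err  = 67.2
print(f"Corr = {percent_correct(sub, dele):.1f}")       # Corr = 44.5
```

Because insertions count toward Err but not against Corr, the two figures do not add up to 100. The S.Err column is the sentence error rate, so 99.6 means nearly every sentence contained at least one error.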