Speech:Spring 2013 April 10th Group BC

Sub Groups

Group Members

Group Log

Week 1 (ending 4/9/13)

We successfully ran a test on train for the 5-hour corpus experiment; however, issues (presumably with the transcript/genTrans.pl script) resulted in a higher-than-expected word error rate.

Week 2 (ending 4/17/13)


Mike Mailloux - Started work on our newest experiment (0080). Completed the first few team assignments with no errors or problems. Will check back periodically through the day to see the progress the team is making.

Harry Dodson - Generated transcript with no errors. Moved the dictionary from our group folder to the experiment /etc folder and copied over the filler dictionary. The first few lines at the top of our group folder dictionary do not appear to serve any purpose and may need to be looked at; not sure whether that will affect the outcome of this experiment.

Mike Brown - Generated phone list and feats data. There was an error generating the feats data because the sphinx_train.cfg file was not in 0080/etc. Eric copied that file over and the issue was solved.


Brian Drouin - Created the Language Model for Exp 0083. All that's left is to run the decode as well as scoring.


Marc Southard - [Exp 0083] Generating the feat data initially failed because sphinx_train.cfg contained corrupt data. After fixing the file, the step ran and completed without errors.

Over the past week, we ran five experiments: 0080 through 0083, and 0087.

  • Experiment 0080 was a test decode on experiment 0082, designed to determine the effects of having no stress indicators in the dictionary. It was originally intended to test a language model and decode against experiment 0074's acoustic model using a dictionary without stress indicators; however, we discovered that the decoder requires both models to have the same phones defined. We therefore used this experiment as the scoring run for experiment 0082.
  • Experiment 0082 was a training experiment using the 5-hour last_5hr/train corpus. It was designed to test whether an acoustic model could be built using a dictionary without stress indicators.
  • Experiment 0081 had a heavily modified transcript, filler dictionary, and phone list. All non-verbal noises and vocalizations were encapsulated in double plus signs (++). Each unique word encapsulated this way was defined in the filler dictionary and mapped to one of three fake phones: '+LAUGHTER+', '+NOISE+', or '+UNINTELLIGIBLE+'. The trainer accounts for these noises in the transcript but does not factor them into the final acoustic model.
  • Experiment 0083 was originally designed to be a decode using the last_5hr/test corpus; however, the wrong corpus was utilized. This experiment was completed regardless.
  • Experiment 0087 was a test decode using the last_5hr/test corpus on the acoustic model created in 0081. It can thus be compared directly to the results of the previous week's experiment 0075.
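As an illustration of what "no stress indicators" means for experiments 0080 and 0082, the sketch below strips stress digits from dictionary entries. It assumes a CMUdict-style dictionary (one "WORD PH1 PH2 ..." entry per line, with stress marked by a trailing 0, 1, or 2 on vowel phones). The function names are hypothetical, and this is not the script the group actually used.

```python
import re

# Assumption: stress is a trailing digit 0-2 on an upper-case phone, e.g. "AH0".
_STRESS = re.compile(r"^([A-Z]+)[0-2]$")


def strip_stress(phone):
    """Return the phone without its stress digit; other phones are unchanged."""
    match = _STRESS.match(phone)
    return match.group(1) if match else phone


def strip_stress_from_entry(line):
    """Strip stress digits from one 'WORD PH1 PH2 ...' dictionary line."""
    parts = line.split()
    if not parts:
        return line
    word, phones = parts[0], parts[1:]
    return " ".join([word] + [strip_stress(p) for p in phones])


def phone_set(lines):
    """Collect the set of phones used by a list of dictionary lines."""
    phones = set()
    for line in lines:
        phones.update(line.split()[1:])
    return phones
```

Comparing `phone_set` before and after stripping shows why a model built with one dictionary cannot be mixed with the other: the stressed dictionary uses phones such as "AH0" and "OW1", while the stripped one uses "AH" and "OW".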

Based on the results of the above experiments, we've determined the following:

  1. The Language Model and the Acoustic Model must share the same phone list. (Exp. 0080)
  2. The effects of removing stress indicators from the dictionary for training, LM creation, and decoding are inconclusive, as the decoder errored out on all but two audio files. (Exp. 0080 & 0082)
  3. Non-word vocalizations and other noises can be encapsulated in the transcript with "++". Each unique encapsulated word must be defined in the filler dictionary and mapped to a non-existent phone encapsulated in "+"s. The phone itself must also be defined in the phone list.
  4. Doing the above produced no noticeable increase in word accuracy; in fact, it may make things marginally worse. However, these results may be due to the fact that the process needs to be refined.
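The bookkeeping described in item 3 can be sketched as follows. This is a minimal illustration, not the group's actual tooling: the function names, the sample transcript lines, and the choice of "+NOISE+" as a fallback phone are all assumptions made for the example.

```python
import re

# A filler word is any token wrapped in double plus signs, e.g. "++LAUGHTER++".
_FILLER_TOKEN = re.compile(r"\+\+[^+\s]+\+\+")


def find_filler_words(transcript_lines):
    """Return the sorted unique ++WORD++ tokens found in the transcript."""
    found = set()
    for line in transcript_lines:
        found.update(_FILLER_TOKEN.findall(line))
    return sorted(found)


def build_filler_entries(words, phone_for, default_phone="+NOISE+"):
    """Map each filler word to a fake phone.

    Returns the filler dictionary lines and the sorted list of fake phones
    that must also be added to the phone list.
    """
    lines = []
    phones = set()
    for word in words:
        phone = phone_for.get(word, default_phone)  # fallback is an assumption
        lines.append(f"{word} {phone}")
        phones.add(phone)
    return lines, sorted(phones)


# Invented transcript lines, for illustration only.
transcript = [
    "yeah ++LAUGHTER++ i know",
    "++NOISE++ so anyway ++LAUGHTER++",
]
words = find_filler_words(transcript)
entries, phones = build_filler_entries(words, {"++LAUGHTER++": "+LAUGHTER+"})
```

Here `entries` holds one "WORD PHONE" line per unique filler word, and `phones` lists the fake phones that the phone list must define.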

Week 3 (ending 4/24/13)

The genTrans.pl script continues to be updated in our efforts to improve the quality of our transcripts. The AM of the "best" scoring 1-hour experiment has been run against the 5-hour corpus in an effort to narrow down the possible sources of errors. The most common training and decode errors have been identified. There seem to be issues occurring with the Baum-Welch "forward/backward" algorithm, which is used to determine hidden variables in the HMM.
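One general way a forward/backward pass can fail to reach the final state is when an utterance has too few frames for the state sequence implied by its transcript. The sketch below illustrates only that general property of left-to-right HMMs. It is not a diagnosis of the errors seen in these experiments, and the function name and the simplified topology (stay, advance, optional skips) are assumptions.

```python
def final_state_reachable(n_states, n_frames, max_skip=0):
    """Check whether a left-to-right HMM can end in its last state.

    The model starts in state 0 at the first frame. At each following
    frame it may stay in place or advance by 1..(1 + max_skip) states.
    """
    reachable = {0}
    for _ in range(n_frames - 1):
        next_reachable = set()
        for state in reachable:
            for step in range(0, 2 + max_skip):
                if state + step < n_states:
                    next_reachable.add(state + step)
        reachable = next_reachable
    return (n_states - 1) in reachable
```

For example, with 9 states and no skip transitions, `final_state_reachable(9, 5)` is false because at least 9 frames are needed, while `final_state_reachable(9, 9)` is true.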

Kevin Annis

  • Set up task directory for experiments 0094 and 0095
  • Generated transcripts for exp 0094 and 0095
  • Set up config files for 0094 and 0095

Harry Dodson

  • Went through log directory for 0089
  • Found all errors and warnings

1608 times. Most of the numbers vary between occurrences, except the 0 and the line number.

utt>  3734       sw4925A-ms98-a-0013  205    0    76 23 ERROR: "backward.c", line 431: final state not reached
ERROR: "baum_welch.c", line 331: sw4925A-ms98-a-0013 ignored

1228 times. This error only showed up in Module 50. Only the mgau, density and component numbers would change

ERROR: "gauden.c", line 1700: var (mgau= 1099, feat= 0, density=7, component=17) < 0

3 times. Module 30

WARNING: "mod_inv.c", line 257: n_top 8 > n_density 1.  n_top <- 1
WARNING: "mod_inv.c", line 257: n_top 8 > n_density 2.  n_top <- 2
WARNING: "mod_inv.c", line 257: n_top 8 > n_density 4.  n_top <- 4

4 times. Module 50. Exactly the same each time

WARNING: "accum.c", line 626: The following seno never occur in the input data
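Counts like the ones above can be gathered by tallying each message by level, source file, and line number. The following is a minimal sketch that assumes the message format shown in the log excerpts; the function name is hypothetical, and the sample lines are copied from the excerpts above.

```python
import re
from collections import Counter

# Matches messages such as: ERROR: "backward.c", line 431: ...
_MESSAGE = re.compile(r'(ERROR|WARNING): "([^"]+)", line (\d+):')


def tally_messages(lines):
    """Count log messages keyed by (level, source file, line number)."""
    counts = Counter()
    for line in lines:
        for level, source, lineno in _MESSAGE.findall(line):
            counts[(level, source, int(lineno))] += 1
    return counts


sample = [
    'utt>  3734       sw4925A-ms98-a-0013  205    0    76 23 ERROR: "backward.c", line 431: final state not reached',
    'ERROR: "baum_welch.c", line 331: sw4925A-ms98-a-0013 ignored',
    'ERROR: "gauden.c", line 1700: var (mgau= 1099, feat= 0, density=7, component=17) < 0',
    'WARNING: "mod_inv.c", line 257: n_top 8 > n_density 1.  n_top <- 1',
    'WARNING: "mod_inv.c", line 257: n_top 8 > n_density 2.  n_top <- 2',
    'WARNING: "mod_inv.c", line 257: n_top 8 > n_density 4.  n_top <- 4',
    'WARNING: "accum.c", line 626: The following seno never occur in the input data',
]
counts = tally_messages(sample)
```

Keying on the source file and line number groups messages whose numeric details differ, such as the three `mod_inv.c` warnings, under a single entry.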


Kevin Annis

  • Created the Language Model
  • Ran the Decoder
  • Scored the Decoder
  • Posted Results
    • Slight issue with Permissions, but Eric helped me clear it up.