Speech:Exps 0304 019

Description
Author: Stephen Thibault (UserID: sdt1001)

Date: 3-6-2018

Purpose: Replication of 0301/011 Unseen Decode and Scoring.

Details: Uses the copied Train Data from 0301/006 which resides in 0304/017 and the default Language Model created in this sub-experiment directory as well as the sphinx_train.cfg parameters from the 0301/011 Decode and Scoring including LDA. Run on drone server Asterix.

Results: INCOMPLETE. The 300hr Decode and Scoring that I started yesterday completed during the night. I concluded this because the sphinx3_decode process was no longer running on Asterix and because "grep FWDVIT decode.log | wc" in the 0304/019/etc folder returns 4034, which is what I expect for a 300hr experiment. However, the final step, "sclite -r...", fails with an error I have not seen before:

"Error: Not enough Reference Files loaded

Missing:

(sw2001a-ms-98-a-0048)" and so on for many more entries.
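One way to see which utterances SClite is complaining about is to compare the utterance IDs in the hypothesis file against those in the reference transcript before running the scoring step. The following is only a sketch: it assumes every transcript line ends with the utterance ID in parentheses (optionally followed by a score), and the function names are made up for illustration and are not part of any script in this project.

```python
import re

# Assumption: each transcript line ends with "(utterance-id)", optionally
# followed by a score inside the same parentheses.
_UTT_ID = re.compile(r"\(([^\s()]+)[^()]*\)\s*$")


def utterance_ids(lines):
    """Return the set of utterance IDs found at the end of each line."""
    ids = set()
    for line in lines:
        match = _UTT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(hyp_lines, ref_lines):
    """Return (IDs only in the hypothesis, IDs only in the reference)."""
    hyp_ids = utterance_ids(hyp_lines)
    ref_ids = utterance_ids(ref_lines)
    return sorted(hyp_ids - ref_ids), sorted(ref_ids - hyp_ids)


if __name__ == "__main__":
    # Tiny in-memory sample; "utt-b" is a made-up ID for illustration.
    hyp = ["yeah okay (sw2001a-ms-98-a-0048)\n", "right (utt-b -512)\n"]
    ref = ["<s> right </s> (utt-b)\n"]
    only_hyp, only_ref = compare_ids(hyp, ref)
    print("in hypothesis but not in reference:", only_hyp)
    print("in reference but not in hypothesis:", only_ref)
```

In practice the two lists of lines would come from opening hyp.trans and the reference transcript in the etc directory. A non-empty first list would be consistent with the "Not enough Reference Files loaded" message above, although that link is an assumption and not something confirmed here.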

According to the wiki page https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Unseen_Data, near the bottom, the fix is to run "% uniq hyp.trans >> hyp.trans.uniq" to remove all the redundant lines in the hyp.trans file in the etc directory of the decode sub-experiment. This yielded the following: "hyp.trans.uniq: Too many arguments." The wiki does not address this message; it only says to rerun SClite using the newly created hyp.trans.uniq file. Doing so resulted in a CORE DUMP (of course). The wiki then states that if the same error appears again, the same process should be repeated on the _train.trans file, and that file should be specified when running SClite again. Once again, this yielded the following: "017_train.trans.uniq: Too many arguments." Rerunning SClite as the wiki directs, with hyp.trans.uniq still on the command line, also resulted in a CORE DUMP.
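Since the uniq command did not run cleanly here, the same de-duplication can be sketched in Python as a cross-check. This is a minimal sketch and not the procedure from the wiki: like uniq, it drops only lines that repeat the line immediately before them, and the function names are hypothetical.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Keep the first line of every run of identical consecutive lines."""
    return [line for line, _ in groupby(lines)]


def write_uniq(src_path, dst_path):
    """Write src_path to dst_path with adjacent duplicate lines removed.

    Returns the number of lines written.
    """
    with open(src_path, encoding="utf-8") as src:
        kept = drop_adjacent_duplicates(src)
    # Mode "w" overwrites dst_path, so rerunning does not pile up old output.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
    return len(kept)
```

A call such as write_uniq("hyp.trans", "hyp.trans.uniq") from the etc directory would produce the file the wiki asks for. One design difference is deliberate: the wiki command uses ">>", which appends to an existing hyp.trans.uniq, whereas this sketch overwrites the output file on every run.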

The cause appears to be that the "wav" directory has not been populated for ANY of the Unseen Decodes we have run this semester. According to the Experiment Group, the "makeTest.pl" script does create the "wav" directory, but it does not populate it the way it populates "etc" and "model_parameters" from the train experiment. This will need to be addressed, either by altering the "makeTest.pl" script or by adding a manual copying step to the process prior to decoding in the test sub-experiment.
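Until the script is changed, a quick check before starting a decode could flag an unpopulated directory. The sketch below assumes only the three directory names mentioned above; the function name and the idea of running it before decoding are illustrative and not part of makeTest.pl.

```python
from pathlib import Path


def empty_subdirs(experiment_dir, names=("wav", "etc", "model_parameters")):
    """Return the listed subdirectories that are missing or contain no entries."""
    problems = []
    for name in names:
        sub = Path(experiment_dir) / name
        # any() stops at the first entry, so large directories are cheap to check.
        if not sub.is_dir() or not any(sub.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Check the current directory, e.g. when run from the test sub-experiment.
    for name in empty_subdirs("."):
        print(f"WARNING: '{name}' is missing or empty")
```

Run from the decode sub-experiment directory, this would list "wav" if the directory exists but was never filled, which is the situation described above.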

ABANDONED. Determined on 17 April that merely activating LDA in the sphinx_train.cfg file is NOT sufficient to train with LDA correctly, so the 28.4% WER obtained by the 2017 Spring Rebels Team in 0301/011 is irrelevant.