Speech:Exps 0310 019

Description
Author: Stephen Thibault (UserID: sdt1001)

Date: 4-17-2018

Purpose: 300hr baseline non-LDA with seen decode and new genTrans.pl script.

Details: Performed on new drone server Automatix as its first experiment.

Results: COMPLETED.

Made the Data Train on Automatix:


 * makeTrain.pl switchboard 300hr/train

makeTrain.pl with the new genTrans.pl now prints more detailed progress output for the experiment runner:


 * Creating directory structure...
 * Done!
 * Modifying sphinx_train.cfg...
 * Done!
 * rm: cannot remove 'wav/*.sph': No such file or directory
 * Done!
 * Preparing data input files...
 * Generating transcript file and linking utterance files...
 * Processing
 * genTrans.pl 100% completed
 * Done!
 * Generating dictionary file...
 * /mnt/main/corpus/switchboard/dist/dict/custom/master.dic
 * Processing 35667 words against dictionary...
 * Added 3 files to add.txt
 * Created 019.dic
 * Done!
 * Generating filler dictionary...
 * Done!
 * Generating phones list...
 * etc/019
 * Done!
 * Preparation complete!
 * Preparation complete!

In the sphinx_train.cfg file, I changed the senone count to 8000 from 1000 before running "genFeats.pl -t".


 * genFeats.pl -t
 * nohup scripts_pl/RunAll.pl &

I let "scripts_pl/RunAll.pl" run overnight, but the 019.cd_cont_8000_8 subdirectory in model_parameters was NOT created. I believe it should have been, because it exists in 0304/001, which is also a 300hr data train with a final density count of 8 (as reflected by the 001.cd_cont_8000_8 found in the /001 model_parameters). I do not see an error in the 019.html file, so I will have to run "scripts_pl/RunAll.pl" again.

I can either rm -rf the contents of model_architecture and model_parameters or let them be overwritten. I will let them be overwritten, because I want to see whether overwriting causes any errors. If this errors out, I will run a completely new experiment instead.

Data Train appears to have completed. Viewing the contents of model_parameters indicates that two subdirectories were redone/created based on their respective time stamps:

 * 019.ci_cont_flatinitial...23:10 17 Apr 18
 * 019.cd_cont_untied........03:02 18 Apr 18
 * 019.cd_cont_initial.......04:05 18 Apr 18
 * 019.cd_cont_8000..........04:23 18 Apr 18
 * 019.cd_cont_8000_1........05:22 18 Apr 18
 * 019.cd_cont_8000_2........07:13 18 Apr 18
 * 019.cd_cont_8000_4........09:53 18 Apr 18

(after rerunning "RunAll.pl", the following two were added...)

 * 019.ci_cont...............12:57 18 Apr 18
 * 019.cd_cont_8000_8........13:35 18 Apr 18
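
The manual check of model_parameters described above can be sketched as a small script. This is only an illustration and not part of the experiment tooling: the function names are hypothetical, and the assumption that the density suffixes double from 1 up to the final density (1, 2, 4, 8) is inferred from the directory listing in this log.

```python
from pathlib import Path


def expected_model_dirs(exp_id, senones, final_density):
    """Subdirectory names seen in this log for a completed train.

    Assumption: density suffixes double from 1 up to final_density,
    as in the 019 listing (8000_1, 8000_2, 8000_4, 8000_8).
    """
    names = [
        f"{exp_id}.ci_cont_flatinitial",
        f"{exp_id}.ci_cont",
        f"{exp_id}.cd_cont_untied",
        f"{exp_id}.cd_cont_initial",
        f"{exp_id}.cd_cont_{senones}",
    ]
    density = 1
    while density <= final_density:
        names.append(f"{exp_id}.cd_cont_{senones}_{density}")
        density *= 2
    return names


def missing_model_dirs(model_parameters, exp_id, senones, final_density):
    """Return the expected subdirectories that do not exist yet."""
    root = Path(model_parameters)
    return [
        name
        for name in expected_model_dirs(exp_id, senones, final_density)
        if not (root / name).is_dir()
    ]


if __name__ == "__main__":
    # Path taken from this experiment; adjust for other experiments.
    for name in missing_model_dirs(
        "/mnt/main/Exp/0310/019/model_parameters", "019", 8000, 8
    ):
        print("missing:", name)
```

An empty result would suggest the train reached the final density. It does not confirm that the models inside each directory are complete.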

Created LM on Automatix:

Inside /mnt/main/Exp/0310/019:
 * mkdir LM
 * cd LM
 * cp -i /mnt/main/corpus/switchboard/300hr/train/trans/train.trans trans_unedited
 * parseLMTrans.pl trans_unedited trans_parsed
 * IF YOU LS THE DIRECTORY, SHOULD SEE: trans_parsed, trans_unedited
 * lm_create.pl trans_parsed

Inside /etc, awk '{print $1}' /mnt/main/corpus/switchboard/300hr/test/trans/train.trans >> /mnt/main/Exp/0310/019/etc/019_decode.fileids

Ran Seen Decode on Automatix:

Inside /etc:

 /usr/local/bin/sphinx3_decode \
   -hmm /mnt/main/Exp/0309/043/model_parameters/043.mllt_cd_cont_1000 \
   -lm /mnt/main/Exp/0309/043/LM/tmp.arpa \
   -dict /mnt/main/Exp/0309/043/etc/043.dic \
   -fdict /mnt/main/Exp/0309/043/etc/043.filler \
   -ctl /mnt/main/Exp/0309/043/etc/043_decode.fileids \
   -cepdir /mnt/main/Exp/0309/043/feat \
   -cepext .mfc >& decode.log &

Inside /etc, performed a "grep FWDVIT decode.log | wc" which returned "4034  58830   373178". The 4034 is what I expected to see for 300hr.
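
The same sanity check can be sketched in Python. The first number reported by wc is the line count, so the sketch counts log lines containing FWDVIT. Comparing that count against the number of entries in the decode fileids file is an assumption added here for illustration, and the function names are hypothetical.

```python
def count_lines_containing(path, marker="FWDVIT"):
    """Count lines containing marker, like `grep marker file | wc -l`."""
    count = 0
    with open(path, errors="replace") as handle:
        for line in handle:
            if marker in line:
                count += 1
    return count


def count_nonblank_lines(path):
    """Count non-empty lines, e.g. utterance IDs in a fileids file."""
    with open(path, errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def decode_is_complete(log_path, fileids_path):
    """True when every listed utterance has a FWDVIT line in the log.

    Assumption: the decoder writes one FWDVIT line per decoded utterance.
    """
    return count_lines_containing(log_path) == count_nonblank_lines(fileids_path)


if __name__ == "__main__":
    # File names as used in this experiment's etc directory.
    print(count_lines_containing("decode.log"))
    print(decode_is_complete("decode.log", "019_decode.fileids"))
```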

Inside /etc, parseDecode.pl decode.log hyp.trans

Inside /etc, sclite -r _train.trans -h hyp.trans -i swb >> scoring.log

SYSTEM SUMMARY PERCENTAGES by SPEAKER

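,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg | 4034  57411 | 68.6   24.6    6.8    9.2   40.6   90.7 |
|=================================================================|
|  Mean   |  1.3   18.5 | 72.2   22.8    4.9   17.9   45.7   90.8 |
|  S.D.   |  0.5   16.1 | 19.4   17.0    7.2   32.2   35.4   27.1 |
| Median  |  1.0   13.0 | 72.0   22.2    0.0    5.9   38.1  100.0 |
`-----------------------------------------------------------------'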
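
As a quick cross-check of the Sum/Avg row, the six percentages can be read in the standard sclite column order (Corr, Sub, Del, Ins, Err, S.Err). Corr, Sub and Del should then sum to about 100, and Err should equal Sub + Del + Ins. The sketch below is illustrative only: the function name and the rounding tolerance are assumptions.

```python
def summary_is_consistent(corr, sub, dele, ins, err, tol=0.2):
    """Check the usual identities between sclite summary percentages.

    tol absorbs the rounding of each value to one decimal place.
    """
    reference_words_ok = abs((corr + sub + dele) - 100.0) <= tol
    error_rate_ok = abs((sub + dele + ins) - err) <= tol
    return reference_words_ok and error_rate_ok


def word_error_rate(sub, dele, ins):
    """Word error rate in percent: substitutions + deletions + insertions."""
    return round(sub + dele + ins, 1)


if __name__ == "__main__":
    # Sum/Avg row from the scoring output above.
    print(summary_is_consistent(68.6, 24.6, 6.8, 9.2, 40.6))
    print(word_error_rate(24.6, 6.8, 9.2))
```

For this run the overall word error rate is 40.6%, with at least one error in 90.7% of the 4034 sentences.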

Successful Completion