Speech:Exps 0303 001

Description
Author: Steve Thibault

Date: 2-12-2018

Purpose: First Successful Train, Decode and Scoring

Details: Here are the following procedures and commands I used to successfully accomplish this:


Run the Training Experiment
(From: https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script )

- ssh into caesar, then ssh into a drone (e.g. asterix, idefix); do not run the experiment on Caesar itself. cd into your Exp/0303/your_subexperiment_folder#

-	makeTrain.pl switchboard 5hr/train (or whatever hour train you are running)

-	genFeats.pl -t

-	nohup scripts_pl/RunAll.pl &

- You will probably see the following: "MODULE: 99 Convert to Sphinx2 format models Can not create models used by Sphinx-II. If you intend to create models to use with Sphinx-II models, please rerun with: $ST::CFG_HMM_TYPE = '.semi.' or $ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'"

That's OK. I am told the training still ran fine, and this "error" is expected.

(From: https://foss.unh.edu/projects/index.php/Speech:Create_LM )

Create the Language Model (LM)
-	Inside 0303/your_subexperiment_folder# create LM with mkdir LM.

- cd LM, then copy the training transcript: cp -i corpus_path/train/trans/train.trans trans_unedited, where corpus_path is the path used when creating the transcript (with genTrans.pl). For example, if you used the 30hr/train corpus: cp -i /mnt/main/corpus/switchboard/30hr/train/trans/train.trans trans_unedited

- Next, prepare the transcript for the language-model build: parseLMTrans.pl trans_unedited trans_parsed. It prints a lot of output, then returns to the command prompt. ("parseLMTrans.pl trans_unedited trans_parsed" replaced "/mnt/main/corpus/switchboard/dist/transcripts/ICSI_Transcriptions/trans/icsi/ParseTranscript.perl trans_unedited trans_parsed" after I ran this.)

- Copy the script that creates the language model: cp -i /mnt/main/scripts/user/lm_create.pl . — don't forget the trailing period (the current directory), separated from .pl by a space.

-	If you ls the folder, the following should appear: lm_create.pl, trans_parsed, trans_unedited.

- Execute the script: ./lm_create.pl trans_parsed. It runs quickly and returns to the command prompt.

-	At this point the Language Model has been created. Move on to Run Decode which can be used on Trained Data or Unseen Data. What follows is for Trained Data.

- Before starting the Decode/Scoring, make sure the training and the language-model build both completed successfully.

(From: https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data )

Decode and Score Trained Data
-	If still in /LM, move over to the etc folder found in /mnt/main/Exp/0303/your_subexperiment_folder#

-	The following sets up the Decode Directory and Runs the Decode:

-	In /etc, awk '{print $1}' /mnt/main/corpus/switchboard/30hr/test/trans/train.trans >> /mnt/main/Exp/0303/your_subexperiment_folder#/etc/your_subexperiment_folder#_decode.fileids

- Still in /etc, run nohup run_decode.pl source_experiment destination_experiment senone_count &. For example: nohup run_decode.pl 0303/your_subexperiment_folder# 0303/your_subexperiment_folder# 1000 &. In this example the source and destination are the same because there are no sub-experiments. 1000 is the default senone count, so if the default is desired leave it at 1000. If the senone count of your current experiment is unknown, run ls ../model_parameters before running run_decode; it lists folders named taskName.cd_cont_senone_count (example: 001.cd_cont_1000) along with similarly named folders. Running the script creates a file called decode.log in the etc directory, which is used in the next step to score the decode. After running nohup I received "[1] 17568"; you should see something similar.
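To pull the senone count out of a model_parameters entry such as 001.cd_cont_1000, you can strip everything up to the last underscore. A minimal sketch using shell parameter expansion; the directory name below is just the example from the step above, not a real listing:

```shell
# model_parameters entries are named taskName.cd_cont_<senone count>.
dir="001.cd_cont_1000"
senones="${dir##*_}"   # drop the longest prefix ending in "_"
echo "$senones"
# 1000
```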

- Scoring: prepare the hypothesis transcript that will be compared against the reference transcript. Still in the /etc directory, transform decode.log into hyp.trans. The decoder writes its output and its status/error text into the single decode.log file; parseDecode.pl decode.log hyp.trans strips the status/error text so only the decoded sentences remain. On the first run it reports: "rm: cannot remove `../etc/hyp.trans': No such file or directory". This is expected, and the script still runs successfully: it tries to remove an existing hypothesis file before writing a new one, but on the first run no such file exists yet, hence the error. This is a bug that should probably be fixed.

-	This will place the newly created hypothesis transcript (hyp.trans) into the etc directory that you are currently in.

-	Which, conveniently enough, is where our reference transcript (_train.trans) is.

-	The parseDecode.pl script shouldn't take very long compared to other steps. But of course, it depends on the transcript size.
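The "rm: cannot remove" complaint comes from deleting a hyp.trans that does not exist yet. I have not seen parseDecode.pl's source, but the usual fix for this class of bug is rm -f, which succeeds silently whether or not the target exists (the sketch below uses a local file name, not the script's actual path):

```shell
# Plain rm errors out when the target is missing; rm -f does not.
rm -f hyp.trans
echo "exit status: $?"
# exit status: 0
```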

- Now run the sclite scorer. Still in /etc, the following writes the output to a file instead of the command line: sclite -r your_subexperiment_folder#_train.trans -h hyp.trans -i swb >> scoring.log

- Look at your scoring log via FileZilla; you should see something that resembles what I have in 'Results' below.
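As a sanity check on the scoring log: sclite's Err column is the sum of the Sub, Del, and Ins percentages (substitutions, deletions, and insertions as a share of reference words). For the Sum/Avg row in 'Results' below, 20.0 + 6.2 + 6.5 does give the reported 32.7:

```shell
# Word error rate = substitutions + deletions + insertions (% of ref words).
awk 'BEGIN { printf "%.1f\n", 20.0 + 6.2 + 6.5 }'
# 32.7
```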

Results: From scoring.log:

Alignment# 1 for speaker sw2020b

Alignment# 2 for speaker sw2020a

Alignment# 1 for speaker sw2022b

SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001b |    1      3 |100.0    0.0    0.0   66.7   66.7  100.0 |
| sw2005a |    2     42 | 76.2   11.9   11.9    7.1   31.0  100.0 |
| sw2006b |    1     29 | 69.0   20.7   10.3    3.4   34.5  100.0 |
| sw2007b |    2     39 | 89.7   10.3    0.0    2.6   12.8  100.0 |
| sw2007a |    1      7 | 57.1   28.6   14.3    0.0   42.9  100.0 |
| sw2008b |    1      3 |100.0    0.0    0.0    0.0    0.0    0.0 |
| sw2009a |    1     32 | 65.6   28.1    6.3    3.1   37.5  100.0 |
| sw2009b |    1      3 |100.0    0.0    0.0    0.0    0.0    0.0 |
| sw2010a |    1      5 | 40.0   60.0    0.0   20.0   80.0  100.0 |
| sw2012b |    2     90 | 86.7   11.1    2.2    0.0   13.3  100.0 |
| sw2013b |    1      5 | 40.0   60.0    0.0    0.0   60.0  100.0 |
| sw2013a |    1     17 | 47.1   52.9    0.0   17.6   70.6  100.0 |
| sw2014a |    1     21 | 76.2   19.0    4.8    9.5   33.3  100.0 |
| sw2015b |    1     43 | 74.4   16.3    9.3    2.3   27.9  100.0 |
| sw2017a |    1      3 |100.0    0.0    0.0   66.7   66.7  100.0 |
| sw2018b |    1      3 | 66.7   33.3    0.0    0.0   33.3  100.0 |
| sw2018a |    1     12 | 58.3   41.7    0.0   25.0   66.7  100.0 |
| sw2019b |    1     38 | 78.9   13.2    7.9    0.0   21.1  100.0 |
| sw2019a |    1      8 |100.0    0.0    0.0   12.5   12.5  100.0 |
| sw2020a |    2     53 | 66.0   20.8   13.2   11.3   45.3  100.0 |
| sw2020b |    1     45 | 55.6   37.8    6.7    6.7   51.1  100.0 |
| sw2022b |    1      3 |100.0    0.0    0.0  100.0  100.0  100.0 |
|=================================================================|
| Sum/Avg |   26    504 | 73.8   20.0    6.2    6.5   32.7   92.3 |
|=================================================================|
|  Mean   |  1.2   22.9 | 74.9   21.2    3.9   16.1   41.2   90.9 |
|  S.D.   |  0.4   22.8 | 20.2   19.6    5.1   26.7   26.6   29.4 |
| Median  |  1.0   14.5 | 75.3   17.7    0.0    5.1   36.0  100.0 |
`-----------------------------------------------------------------'

Successful Completion

Description
Author: Jaden Henry (UserID: jah2009)

Date: 3-2-2018

Purpose: desc

Details:

Results: