Speech:Spring 2014 Colby Chenard Log

From Openitware
Revision as of 18:03, 7 April 2014 by Cdr45 (Talk | contribs)



Week Ending February 4th, 2014

Task

Jan 30: I am going to get as familiar with the system as possible. I will log in as root, navigate around Caesar a bit, and possibly try to run a train.
Feb 1: Logged in and read logs. Will attempt to run a train later tonight or tomorrow.

Feb 3: Created a full dictionary for a first 5hr train using genTrans5.pl and Eric's updateDict.pl script. Will possibly attempt the first_5hr train.

Feb 4: Logged in. Read Logs.

Results

Jan 30: Logged into Caesar as root, but it appears our student accounts haven't been created yet, so I will have to wait on that before running a train. The notes specifically say not to run any commands as root. I was able to check what accounts were available by cd'ing to /etc and then using the command 'more /etc/passwd'.
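The account check doesn't actually require cd'ing into /etc; a small sketch, run here against a sample copy of passwd (since the real file is server-specific, and the sample usernames are invented):

```shell
# Build a sample passwd file so the sketch is self-contained; on the real
# server you would read /etc/passwd directly.
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
cdr45:x:1001:1001:Colby Chenard:/home/cdr45:/bin/bash
EOF

# Field 1 is the username, field 7 the login shell; keep interactive accounts.
awk -F: '$7 ~ /sh$/ {print $1}' passwd.sample
```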

Feb 3:

 1. Colby J and I ran updateDict.pl and it seemed to run without error. This was to our surprise, so we took it a step further.
 2. Then we tried to initiate a train, and it seems to be running correctly! 

We made it as far as phase three; however, it is a 6hr train, so we aren't out of the woods yet. This is a big milestone because we seem to finally have a functional understanding of the system.
Now we can start to experiment with the acoustic modeling.

Plan

Feb 3: Work together with Colby Johnson

 1. Create a dictionary against our transcription file, which was generated using genTrans5.pl.
 2. Using Experiment 0166's add2.txt file and Eric's updateDict.pl, obtain a list of words missing from the dictionary and add them to the created dictionary.
 3. Once we have a full dictionary for the first_5hr train, attempt to run the train.
Concerns

Feb 3: Our main worry is that we don't know how to correctly use the updateDict.pl script, because the documentation is a bit vague. Eric also raised some concerns about the functionality of some optional params in the script.

Week Ending February 11, 2014

Task

Feb 8: This week my goal is to run a few trains on my own, as well as decode them, to get a better grasp of how to start tuning parameters to improve our baseline.
Feb 9: Logged in.
Feb 10: Run my first train (first_5hr train).
Feb 11: Try to run a train again. Retrace my steps from yesterday to see what went wrong.

Results

Feb 10: I was able to run the train, but it errored out at Phase 7...

  Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
 ) occurs in the phonelist (/mnt/main/Exp/0157/etc/0157.phone), but not in any word in the transcription (/mnt/main/Exp/0157/etc/0157_train.trans)
 ) occurs in the phonelist (/mnt/main/Exp/0157/etc/0157.phone), but not in any word in the transcription (/mnt/main/Exp/0157/etc/0157_train.trans)
 ) occurs in the phonelist (/mnt/main/Exp/0157/etc/0157.phone), but not in any word in the transcription (/mnt/main/Exp/0157/etc/0157_train.trans)
 ) occurs in the phonelist (/mnt/main/Exp/0157/etc/0157.phone), but not in any word in the transcription (/mnt/main/Exp/0157/etc/0157_train.trans)
 Something failed: (/mnt/main/Exp/0157/scripts_pl/00.verify/verify_all.pl)

I think there may be an issue with my phone file. I will redo that tomorrow and see if it will pass step 7.
Feb 11: So Colby J and I ended up troubleshooting my issue together. We tried redoing all the steps.

 1. Re-ran genTrans, but used v5 instead of v6 because there have been issues when using v6. That didn't fix it.
 2. We added words to the dictionary from previous experiments... again no luck.
 3. Re-created the phones and feat.params as well as the .filler and fileids files... once again, it still failed on step 7.
 4. But finally, after multiple attempts, we realized there was an error message saying it was only missing one word, 'SH':
     WARNING: This word: SH was in the transcript file, but is not in the
     dictionary ([DEL: SHE HAD A FALL AND UH FINALLY UH SHE HAD UH
     PARKINSON'S DISEASE AND IT GOT SO MUCH THAT SHE COULD NOT TAKE CARE OF
     HER HOUSE SH THEN SHE LIVED IN AN APARTMENT AND UM :DEL] ). Do cases
     match?
 5. We added that and re-ran the train, and it worked! So hopefully it will run overnight, and then I can run a decode tomorrow.
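A quick sanity check like the one below would have surfaced the missing 'SH' before Phase 7 did. This is only a sketch with made-up toy files standing in for 0157_train.trans and the dictionary, not one of the project's actual scripts:

```shell
export LC_ALL=C    # consistent sort order for comm

# Toy transcript and dictionary (headwords only, pronunciations omitted).
cat > train.trans <<'EOF'
SHE HAD A FALL AND UH SHE HAD UH PARKINSON'S DISEASE SH
EOF
cat > full.dic <<'EOF'
A
AND
DISEASE
FALL
HAD
PARKINSON'S
SHE
UH
EOF

# One word per line, deduped, then report words absent from the dictionary.
tr -s ' ' '\n' < train.trans | grep -v '^$' | sort -u > words.txt
comm -23 words.txt <(sort full.dic)
```

Running this prints just `SH`, the one word the train was choking on.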
Plan

Feb 8: I will attempt to run first_5hr train, since Colby and I have compiled a solid dictionary list for that. I also know that David was working on getting a better dictionary list, so maybe I can use that to run some other trains such as tiny or mini.

Feb 11: Carefully examine and execute each step of running a train, to cut down on errors. Run a train successfully, as well as a decode.

Concerns

Feb 10: I have noticed that the wiki page on running a train could use some updating. The first initial steps can be eliminated by running train_01.pl, and the next set of steps by running train_02.pl. However, I did these steps manually... For some reason I was getting this error:

 Directory: /mnt/main/Exp/0157
 miraculix Exp/0157> /mnt/main/scripts/train/scripts_pl/setup_SphinxTrain.pl -task 0157
 Making basic directory structure
 Couldn't find executables. Did you compile SphinxTrain?   ////Right here it's asking me if I compiled it. So not really sure what is happening
 miraculix Exp/0157> ls
 bin  bwaccumdir  etc  feat  logdir  model_architecture  model_parameters  wav

But when I just run train_01.pl, all the dirs are created with the necessary files, so this will have to be examined further. I also noticed that it would be helpful to add a few steps to help future users run their first train more easily:

 1. Use the most up-to-date dictionary, which at this point is .0.7a.
 2. After creating a dictionary list in your Exp/<experiment#>/etc/ dir, prune it, and after the prune compile the missing words using Eric's updateDict.pl script.
 3. Since I am doing first_5hr, I know it is missing certain words, since Colby J and I have run it before. With Eric's script you pass two params: the master dictionary, which is .0.7a,
 and /mnt/main/0116/etc/add2.txt.
 4. That should give you a solid dictionary so that the first_5hr train runs without erroring out; it may still produce warnings, but it will run successfully.
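I haven't verified updateDict.pl's internals, but conceptually step 3 appends any add-list entry whose headword the dictionary lacks. A stand-in loop (file names and contents invented for illustration) behaves like this:

```shell
# Toy dictionary and add-list; real ones live under Exp/<experiment#>/etc/.
cat > mydict.dic <<'EOF'
HELLO HH AH L OW
WORLD W ER L D
EOF
cat > add2.txt <<'EOF'
SH SH
HELLO HH AH L OW
EOF

# Append an entry only if its headword (the first field) is not present yet.
while read -r entry; do
  word=${entry%% *}
  grep -q "^$word " mydict.dic || echo "$entry" >> mydict.dic
done < add2.txt

cat mydict.dic
```

Here only `SH SH` is appended, since HELLO is already in the dictionary.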

Week Ending February 18, 2014

Task

Feb 15: logged on.
Feb 16: Logged on.

Feb 17: Because of last week's failure, I would like to troubleshoot my errors, hopefully get past them, and run a successful train. In addition, I would like to optimize parameters to achieve the lowest possible word error rate.

Results

Feb 18: I ran a total of 6 trains, all with slightly varied parameters. Experiments 0162, 0164, and 0166 are 10hr trains. Experiments 0168, 0170, and 0182 are 5hr trains.
Colby J and I wanted to see if increasing the senone value a bit higher than the recommended value, and using varying densities, would achieve a better WER.

  • Mixtures:
    • Experiments 0162, 0164, 0166 had densities of 8, 16, and 64.
    • Experiments 0168, 0170, 0182 had densities of 8, 16, and 64 as well.
    • All 6 experiments used a senone value of 5000.

All the trains ran successfully. Then I started the decodes on them, and they seemed fine, so I left for work. But when I came back they had errored out for some reason.
I tried to re-run them, but it tells me that I don't have permission to run them. This is very strange behavior; I will have to do some more investigation to find a workaround or a solution.

Plan

Feb 17: After talking with Colby, we decided the best way to tackle this is by collecting as much data as possible between the three of us, all running different combinations of parameters and comparing the results. So what I am going to do is run several 10 hour and 5 hour trains with varying densities, with the hope that I will find something worthwhile to investigate further.

Concerns

Feb 17: It seems there is much debate regarding the effects of senone values, that is, how much they really affect the word error rate. After some research, the general consensus in our group seems to be that there is a relationship between the senone value and the size of the vocabulary: the larger the vocabulary, the higher the senone value. I would definitely like to run some more trains of my own to investigate this theory further. My concern with the senones is that the values could be too high and we could be overtraining. As a caveat, our values aren't that much higher than the recommended values, so there should not be too much of a difference.

 Vocabulary   Hours in db   Senones   Densities   Example
 20           5             200       8           Tidigits Digits Recognition
 100          20            2000      8           RM1 Command and Control
 5000         30            4000      16          WSJ1 5k Small Dictation
 20000        80            4000      32          WSJ1 20k Big Dictation
 60000        200           6000      16          HUB4 Broadcast News
 60000        2000          12000     64          Fisher Rich Telephone Transcription

Week Ending February 25, 2014

Task

Feb 22: Logged in. Feb 23: Logged in.

24Feb2014 All work was done collaboratively with Colby C

  • Create a 100hr subset of the full data set
  • Learn about past sphinx training and decode parameters used
  • Attempt to run tests on the 10hr AMs using small data subset
  • Create graphs with Completed decode data

25Feb2014

  • We need to go back and re-run the decodes. We talked with David and found that we weren't doing it the correct way, so we are going to follow his suggestions and re-run our decodes accordingly.
Results

25Feb2014

  • So unfortunately our decodes failed... We tried to test on different data than what we trained against, but we don't think that is inherently why it failed; we think it failed because we set it up wrong.
Plan
  • Build 100hr data set from the full data set
    • Create 100hr Dir
      • 100hr
      • 100hr/train
      • 100hr/train/trans
      • 100hr/train/wav
    • Copy 1/3 of the text to a new txt file
    • Upload to server
    • Run copySph.pl to make symbolic links to the SPH files needed
      • /mnt/main/scripts/user/copySph.pl

(Now we have a 100hr data set to train off of)
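The layout above can be sketched as follows, using a scratch root so it runs anywhere (the real tree lives under /mnt/main, copySph.pl does the actual symlink step, and the .sph file name here is invented; ln -s only illustrates what copySph.pl produces):

```shell
root=./corpus_sketch
mkdir -p "$root/100hr/train/trans" "$root/100hr/train/wav"

# Stand-in for the full data set's audio.
mkdir -p "$root/full/train/wav"
touch "$root/full/train/wav/sw02001.sph"

# Link rather than copy: the 100hr subset shares the .sph audio on disk.
ln -s "../../../full/train/wav/sw02001.sph" "$root/100hr/train/wav/sw02001.sph"
```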
25Feb2014

  • So the correct way to do this is to go through the entire process of setting up a train, without actually running it.
    • Things we need:
      • Dictionary, feats, and language model.
      • Then we run a decode as we normally would, but change the second parameter to the experiment # of the acoustic model that we would like to test against.
      • So we will decode against the 5hr/test data, as a sort of subset, but our acoustic model was built off of the 10hr corpus.
Concerns

24Feb2014 Training:

  • Do we need OOV (out of vocabulary) words in transcript or can they be removed
  • Find where inefficiencies lie in the training process

Decode:

  • Interpreting parameter names
  • Time... (parallelization)
  • Creating a decoding with smaller data sets

25Feb2014

  • The Future
    • One of my main concerns looking forward is optimization. Right now we are averaging about 15% to 30% error rates, and even the 15% likely reflects overtraining.
    • After some research, Colby and I found that others before us have had much better results, some as low as 7%; with that in mind, I would really like to find out what we can do to make our results more optimal.
    • I think we need to look at our dictionary and try to compile it better; however, there are many variables to account for, so we need to make the process less cumbersome.

Week Ending March 4, 2014

Task

March 3rd:
This week was the perfect storm. Thursday and Friday I was out of town for work training; then, while working with Colby and David on Friday night, we managed to overload the server and shut it down for the weekend, which slowed our roll. However, looking forward, it seems that Colby J has made some pretty valuable progress with parallelization. I would like to run a train using it to see for myself whether it works the way he says it does.

Results


Plan

March 3rd:

    • I would like to run 4 trains total:
    • A 5hr train without parallel processing, and default parameters
    • A 5hr with parallel processing, and default parameters
    • A 10hr without parallel processing, and default parameters
    • A 10hr with parallel processing, and default parameters

This way I will have solid data to compare. The hope is to prove that Colby's theory is correct; if it is, that will be a great step forward.

March 4th:
Setting up a train for 100hr of data with a clean transcription file. Apparently, after conversation 3170 there is little to no overlap in the audio files, because they changed their collection technique. We have been noticing that the overlap pushes our WER a lot higher than it should be, so this is an attempt to see what 100hrs of clean training data can yield.

Concerns

Week Ending March 18, 2014

Task

March 16th:
Determine whether using genTrans5 vs. genTrans8 makes a significant difference in the resulting word error rate, while also using the new dictionary, switchboard.dic.
March 17th:

    • Create the LM
    • Run decode on 0212
Results

March 17th:
Train ran successfully, hoping to get a successful decode.
March 18th:
Mimicking Experiment 0209, we will generate a new acoustic model using the same dictionary and parameters, except using genTrans5.pl, to compare the results.
Training: 32 min. Decode: RT = 1.08. WER: 33.8%.

              SYSTEM SUMMARY PERCENTAGES by SPEAKER
 ,-----------------------------------------------------------------.
 |                            hyp.trans                            |
 |-----------------------------------------------------------------|
 | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
 |---------+-------------+-----------------------------------------|
 |=================================================================|
 | Sum/Avg | 4659  68616 | 82.2   12.7    5.1   16.0   33.8   94.1 |
 |=================================================================|
 |  Mean   | 58.2  857.7 | 82.0   13.1    4.9   17.4   35.4   94.8 |
 |  S.D.   | 22.1  330.0 |  5.9    4.6    2.1    8.3   11.3    6.2 |
 | Median  | 55.5  813.0 | 82.9   12.1    4.4   16.7   33.6   96.9 |
 `-----------------------------------------------------------------'

So this suggests that genTrans8 is better than genTrans5, and that the new dictionary is more robust as well.
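A note on reading the summary table above: the Err column is simply Sub + Del + Ins, each expressed as a percentage of the reference words, so the Sum/Avg row's 12.7 + 5.1 + 16.0 reproduces the reported 33.8 WER:

```shell
# WER = (substitutions + deletions + insertions) / reference words;
# the sclite summary already reports each term as a percentage.
awk 'BEGIN { s = 12.7; d = 5.1; i = 16.0; printf "WER = %.1f\n", s + d + i }'
```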

Plan

Yesterday Colby J ran an experiment using the new dictionary 'switchboard.dic' which differs from our current best dictionary because it includes:

    • All things in brackets:
      • Incomplete words
      • Laughter
      • Words that are difficult to make out, where the transcriber made a guess at the possibilities
      • It also does not include lexical stresses

March 17th:
Run the decode using run_decode2.pl and score the results.

With that in mind, we want to see if all those key points affect the WER vs. the old dictionary. In Colby's experiment yesterday (0209), he used genTrans8, so we want to use the new dictionary with genTrans5 to see if there is a difference.

Concerns

If the new dictionary does not provide a better WER, we have to go back to the drawing board: we will need to find a way to complete the old dictionary and improve it to get to where we need to be.
If we do find that the new dictionary is better, then it will be a big step in the right direction, because it will mean we need to determine what is better to have in our transcription file:

    • I.E.
      • having or not having laughter
      • incomplete words etc...

Week Ending March 25, 2014

Task

March 23rd:
Logged in. Read Logs.
March 24th:

  • Prep for a train on the full set of data (308hr)
    • Extrapolated a dev set
    • Then the eval set
Results

March 24th:
Ran copySph.pl. We had a few errors, but after a couple of sudo commands Colby and I were able to get the directories all set up.
Now we are prepared to run the train, which we will do tomorrow while monitoring it closely.

Plan

March 24th:
Take the full 308hr transcription file and chop off the last 2hrs, which gives us our eval set. This will be our 'unseen' data that we will decode against. The hope is that by training on the full set of data we get a better overall train, and in turn a good decode on unseen data.
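The split can be sketched with head/tail. The line counts below are invented stand-ins; the real cut is by hours of audio, not a fixed line count:

```shell
# Stand-in 100-utterance transcript; the real file is the 308hr .trans.
seq 1 100 | sed 's/^/utt_/' > full.trans

# Hold out the tail as the 'unseen' eval set, train on the rest.
head -n 90 full.trans > train.trans
tail -n 10 full.trans > eval.trans
wc -l train.trans eval.trans
```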

Concerns

Week Ending April 1, 2014

Task

March 30th:
Logged in.
March 31st:
As it stands right now, it seems that previous semesters didn't really know that Sphinx came with built-in scripts to run trains, create models, and decode quickly and easily. Scripts such as runTrain1.pl, runTrain2.pl, and run_decode2.pl were created to ease the process of training and decoding. However, with deeper analysis, Colby J found that Sphinx comes set up and ready to go.
April 1st:
Set up and run a mini train and decode, the way Sphinx intended.

Results


Plan

March 31st:
So instead of all the Perl scripts we have been using, we will use the one provided by Sphinx, which is at /mnt/main/root/sphinx3/scripts/setup_sphinx3.pl -task 0224. To get set up, we run train_01.pl and train_02.pl, followed by /mnt/main/root/sphinx3/scripts/setup_sphinx3.pl -task 0224. For decoding there is sphinx_decode.cfg, which is really great because it has many more parameters, one of which is the NPART option; this will drastically reduce the time it takes to run decodes.

Our plan for dealing with language models is to create a unique experiment directory for all our language models; that way we will never have to remake them. This is advantageous because it saves a lot of memory and frustration: instead of creating new ones every time, we can just reference what we already have. Overall this will make experiments more streamlined. The hope is to automate running a train and decoding it in as few steps as possible, allowing us to capture as much data as possible so that we can do more analysis.

 One side note: Experiment 0249 is going to be a duplicate of 0241 using the new decode method.
 Using mini train sub corpus with a senone of 3000 and a density of 8

April 1st:
Going to run a train using setup_sphinx3.pl and then decode using sphinx_decode.cfg... As this is the way it was meant to be done.

  • The first step is to get into your experiment directory; from there you simply run train_01.pl, train_02.pl, and the setup_sphinx3.pl script. The nice thing about setup_sphinx3.pl, as opposed to the old way, is that it will add everything you need, but if directories or scripts already exist in your current experiment directory it won't overwrite anything that is already there.

Here is what it looks like:

miraculix Exp/0224> /mnt/main/root/sphinx3/scripts/setup_sphinx3.pl -task 0224
Current directory not empty.
Will leave existing files as they are, and copy non-existing files.
Making basic directory structure.
Copying executables from /mnt/main/root/sphinx3/src/programs
Copying scripts from the scripts directory
Generating sphinx3 specific scripts and config file
Set up for decoding 0224 using Sphinx-3 complete

The next step is to edit the decode config: vi into sphinx_decode.cfg and make the following changes:

  • line 12 $DEC_CFG_SPHINXDECODER_DIR = '$DEC_CFG_BASE_DIR'; (when looking for the decode dir look at our exp number)
  • line 22 $DEC_CFG_SCRIPT_DIR = "$DEC_CFG_BASE_DIR/scripts_pl/decode"; (the script to run the decode)
  • line 43 $DEC_CFG_LISTOFFILES = "$DEC_CFG_BASE_DIR/etc/${DEC_CFG_DB_NAME}_train.fileids"; (changed the end of this line to train.fileids)
  • line 44 $DEC_CFG_TRANSCRIPTFILE = "$DEC_CFG_BASE_DIR/etc/${DEC_CFG_DB_NAME}_train.trans"; (changed the end of this line to train.trans)
  • line 50 $DEC_CFG_LANGUAGEMODEL_DIR = "$DEC_CFG_BASE_DIR/LM"; (changed the tail end to LM to use a separate directory)
  • line 51 $DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/tmp.arpa"; (tmp.arpa is used for smaller LMs, as opposed to the .DMP, which is binary and faster; .arpa is more accurate)
  • line 60 $DEC_CFG_NPART = 2; (distribute processing)
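These edits can be scripted with sed; matching on the variable name rather than the line number makes them robust if the config layout shifts between versions. A sketch against a two-line stand-in file (the real file is sphinx_decode.cfg in the experiment directory, and the original .DMP file name here is invented):

```shell
cat > sphinx_decode.cfg <<'EOF'
$DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/swb.lm.DMP";
$DEC_CFG_NPART = 1;
EOF

# Point the LM at the shared tmp.arpa and split the decode into 2 parts.
sed -i \
  -e 's|^\$DEC_CFG_LANGUAGEMODEL = .*|$DEC_CFG_LANGUAGEMODEL = "$DEC_CFG_LANGUAGEMODEL_DIR/tmp.arpa";|' \
  -e 's|^\$DEC_CFG_NPART = .*|$DEC_CFG_NPART = 2;|' \
  sphinx_decode.cfg
cat sphinx_decode.cfg
```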
Concerns

March 31st:
Right now we are going into this blind; all we have to go on are comments in the script files themselves and some vague references online. We know it is a large undertaking and it will be a lot of trial and error, but hopefully it will pay off in the end. Another major concern is that we may have conflicting versions; Colby J found out that we may be many versions out of date.

Week Ending April 8, 2014

Task

April 5th
Logged in.
April 6th
Logged in.
April 7th
Determine a plan of action for the remaining weeks of the semester.

Results


Plan

April 7th
So after some analysis of my team's skill set, it seems it may be a good idea to have them start doing research in the literature, while I work with Forrest to continue our work on modeling.
Possible research topics

  • Trainer configuration
    • recommended senones
    • mixture densities
    • language models
    • corpuses
    • dictionary data
    • estimating statistical parameters of training-data HMMs and their reliability
    • errors and warnings during trains
    • data analysis
  • Decode Parameters
    • Dev and Eval sets
    • Determine database complexity

I also may take it upon myself to clean up some of the information on the wiki, to bring it up to date with many of the modeling group's findings. This could also be another team task, but I plan on working with Colby J and David to do this as a side task to the competition. It is really critical that we document all of our findings thus far.

Concerns

It seems that although we have had some pretty low error rates, such as 15%, these may be grossly inaccurate because of overtraining. This will definitely require more investigation.

Week Ending April 15, 2014

Task


Results


Plan


Concerns


Week Ending April 22, 2014

Task


Results


Plan


Concerns


Week Ending April 29, 2014

Task


Results


Plan


Concerns


Week Ending May 6, 2014

Task


Results


Plan


Concerns