Speech:Spring 2015 Samuel Sweet Log



Week Ending February 3, 2015

 * Task:

1/30
 * Read logs from last semester of students who were in the modeling group.

2/2
 * I installed VirtualBox and Linux so I could begin writing Perl scripts to get more comfortable with the language.
 * After writing a few basic scripts I looked over the preexisting scripts that are on the Speech:Scripts page.
 * I read more on the project to have a better understanding of what the scripts were used for and why they were needed.

2/3
 * Read other students' logs from this semester and last semester.


 * Results:

2/2
 * After installing Linux in VirtualBox I did not like the speed at which it ran, so I decided to install a Linux partition on my hard drive to have a quicker operating system. Windows 8 does not allow some flavors of Linux to be installed with ease due to UEFI mode. However, I was able to tweak a GRUB bootloader made for Windows 7 to circumvent the issue.

 * Plan:

My goal this week is to familiarize myself with the objective of the modeling group. This will give me a better understanding of the tasks that need to be accomplished so I can write a proposal and come up with a schedule.
 * Concerns:

My concern is that I do not fully understand the objective for my group yet. Reading past logs and the project documentation has clued me in a bit, but I think I am still missing a big piece of the puzzle.

Week Ending February 10, 2015

 * Task:

2/7
 * Read students' logs from this semester.

2/8
 * Read students' logs from this semester.

2/9
 * Logged on to Caesar
 * Got familiar with navigating through the directories
 * I read on the wiki about some of the scripts while I read through them on Caesar. I noticed that many of the scripts were not documented, and I had to figure out why, and whether they need to be documented at a later point.
 * There are 9 GenTrans*.pl files. After reading Colby's logs from last semester I saw that they were modifying these scripts to tune them for better output of the transcript file. Based on his graphs I think he was on to something, and I am going to continue to build off his work.
 * I planned on running a train today, but as I read through the tutorials on the wiki on how to do so I found myself confused. There are multiple tutorials on how to run the train, and both were created last semester. One of them is completely automated and the other is not. The automated one does seem to be the easiest, but I believe we are no longer supposed to use this file. I will be running both methods tomorrow to familiarize myself with both, whichever way ends up being the correct method.

2/10

cd /mnt/main/Exp
 * There were problems when trying to run the trains; I encountered file paths that were wrong.
 * Mohammad was able to fix the file paths to point to the correct locations, so now whenever anyone needs to run a train the files should be in the correct locations.
 * Mohammad also informed me that he had to change some other configurations to get a train to run properly.
 * We have decided that the automated version of the train script will not be used this semester. The tutorial that David Meehan created will be the one that is used.
 * That being said, the wiki makes it very hard to find this tutorial, so I will be removing the other two tutorials and adding David's to the main Run a Train section.
 * It had been pointed out that we should be using a smaller file such as a 5 hour train for the first run instead of the 100 hour train that the tutorial suggests.
 * The results of the train will be posted here tomorrow as it is currently running.
 * Steps used to run train

mkdir 0261

cd 0261

/mnt/main/scripts/user/prepareExperiment3.pl first_5hr/train

/mnt/main/scripts/user/generateFeats2.pl

nohup scripts_pl/RunAll.pl &
 * After the first run of the train: (Failed) Something Failed: (/mnt/main/scripts/users/scripts_pl/20.ci_hmm/slave_convg.pl)
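When RunAll.pl dies partway through like this, the quickest way to find the failing module is to grep the captured output for "Failed". A scratch demo against an invented log line that mimics the failure format (the real output goes to nohup.out):

```shell
# Scratch demo: find which module failed by grepping the training output.
# The log content below is invented to mimic the failure message format.
printf 'MODULE: 20 Training CI models\nSomething Failed: (slave_convg.pl)\n' > /tmp/runall.log
grep -n 'Failed' /tmp/runall.log
# -> 2:Something Failed: (slave_convg.pl)
```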
 * Results:

Need to diagnose what this means and why it is causing the train to fail.

 * Plan:
 * Log into Caesar (Complete)
 * Navigate and explore the directories (Complete)
 * Research what last semester's modeling group was working on (Complete)
 * Run a train (in progress) - Some errors within the tutorial and file locations caused this to be delayed.
 * Concerns:

My concern is that there are far too many undocumented scripts. The modeling group from last year did a lot of work with the GenTrans files, but I am not sure what is different between each of the files. I am going to have to read through each script and figure out what each is trying to tune if I want to be able to build off their work from last year.

Week Ending February 17, 2015
 * Task:

2/14
 * Read logs

2/15
 * Read logs

2/16
 * I created a language model based on the steps provided on the wiki page
 * The next step was to run a decode on the train that I had done a few days ago
 * The wiki for running a decode was one Linux command using the script slave.pl, which took care of creating all the needed directories for you as well as some other things
 * Mohammad and I could not get this script to work and kept receiving errors
 * We talked to Justin Thibeault, who has had success running trains and decodes. He said that using the script slave.pl did not work for him either.
 * He pointed me to the correct script, which is located at

/usr/local/bin/sphinx3_decode.pl
 * Before jumping right into this script I decided to search through some previous experiments to see how they were running their decodes.
 * I found a script called

run_decode2.pl
 * This script takes three arguments: taskName, ExpID, Senone count
 * I used the script for my decode in the following way:

run_decode2.pl 0261 0261 1000
 * In this case my ExpID and taskName were the same, and I used the default senone count of 1000
 * The script creates a log file called decode.log
 * I looked through that file with

more decode.log
 * The log file presented much information, but much of it was not useful to just read through
 * The next step was to get the scoring from the decode
 * I used the following steps to achieve this:

/mnt/main/scripts/user/parseDecode.pl decode.log ../etc/hyp.trans

cd ../etc

sclite -r 0261_train.trans -h hyp.trans -i swb >> scoring.log
 * The parseDecode.pl step strips out all the error and status messages from decode.log and places the hypothesis transcripts into a new file called hyp.trans
 * In the sclite step, sclite appends scoring data to scoring.log. It provides a table with error rates, as seen in the results section below.
 * Now that I have successfully run this train and decode I will have to work on establishing a better baseline
 * I will be updating the Decode tutorial soon so others can successfully run decodes without the headaches I went through
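The parseDecode.pl step can be pictured as a filter that keeps hypothesis lines and drops status noise. This is only a sketch of the idea; the log lines and grep pattern here are invented, and the real filtering is whatever parseDecode.pl implements:

```shell
# Scratch demo of the decode.log -> hyp.trans step: keep hypothesis lines,
# drop status/error lines. The log format below is invented for the demo.
cat > /tmp/decode.log <<'EOF'
INFO: decoder initialized
hello world this is a hypothesis (sw2001b_0001)
ERROR: missing dictionary entry
EOF
grep -v -E '^(INFO|ERROR):' /tmp/decode.log > /tmp/hyp.trans
cat /tmp/hyp.trans
# -> hello world this is a hypothesis (sw2001b_0001)
```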

 * Wanted to get some work done on the tutorial, but could not SSH into Caesar. Keep getting the following message:

ssh: connect to host caesar.unh.edu port 22: Connection timed out
 * Time is 1:44 PM
 * Attempted to SSH in on my other computer to see if I was having a network issue, but the same message was displayed.
 * Also attempted a login through Serfish (a web-based ssh client, http://www.serfish.com/console/). Still no luck. Caesar seems to be offline at the moment. I will try again in an hour or so.


 * I was able to get back into Caesar at 6:00 PM
 * I wrote the first draft of the new decode tutorial which can be found here http://foss.unh.edu/projects/index.php/Speech:Run_Decode#Setup_the_Decode_Directory_and_Run_the_Decode
 * I will continue to update this tutorial as things change or people report problems with it

2/17

nohup /mnt/main/scripts/train/scripts_pl/RunAll.pl
 * Started a new sub experiment of 0261(002)
 * Ran the train and realized that I should have just copied the train data from 0261(001) into my new sub experiment since the data would end up being the same
 * I need to read up more on advanced Sphinx Configuration because setting different senone counts in the decode will not be sufficient to get a low enough baseline.
 * I read through Arol's logs from the summer, where he did extensive work with training and decoding. He presented some very meaningful notes on the configuration of Sphinx. I tried to follow what he had written to get another train and decode running. However, he did not provide step-by-step details on what he did, but rather vague categories of when he used certain commands. This was aggravating to say the least, as the configuration part seems to be the least documented part and the most important.
 * I went back and examined the wizard style train script. This script actually does provide input to set Senone and Density values so I decided to give it another try even though previous attempts with this script have failed.  I got through the whole script and now it was time for me to run the command:
 * This script ran for about 10 minutes and shot back an error at the end saying it could not verify specific files.
 * I repeated the last two steps many times and kept getting the same result
 * I continued on to creating a language model and running a decode to test whether this error message might just be normal. I got to the last command and it became clear to me that no, this was not normal.
 * I have decided that I will need to go step by step through the wizard style train script to do one of two things:
 * Search through the script and fix whatever is causing RunAll to not function properly
 * Reverse engineer how the configuration of Sphinx works by investigating how they feed the parameters to Sphinx


 * Results:
 * Results after first successful train and decode

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001b |   24    310 | 80.3   17.1    2.6   30.6   50.3  100.0 |
|---------+-------------+-----------------------------------------|
| sw2001a |   19    390 | 66.9   27.2    5.9   20.5   53.6  100.0 |
|=================================================================|
| Sum/Avg |   43    700 | 72.9   22.7    4.4   25.0   52.1  100.0 |
|=================================================================|
|  Mean   | 21.5  350.0 | 73.6   22.1    4.2   25.6   52.0  100.0 |
|  S.D.   |  3.5   56.6 |  9.5    7.1    2.3    7.2    2.3    0.0 |
| Median  | 21.5  350.0 | 73.6   22.1    4.2   25.6   52.0  100.0 |
`-----------------------------------------------------------------'

 * Plan:
 * Run more trains and decodes to get more comfortable with the whole process
 * Update the decode tutorial for other people to follow
 * Begin to figure out how I will try to establish a better baseline
 * Concerns:

My concern is that I will not be able to update the tutorial in time for other people to follow this week. If I do not have time I will direct them to my log, which hopefully provides enough detail for them to run a decode.
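As a sanity check on the sclite Sum/Avg row in the results above: the Err column should be the sum of the Sub, Del, and Ins percentages, since the word error rate counts substitutions, deletions, and insertions against the reference word count.

```shell
# Err = Sub + Del + Ins for the Sum/Avg row (22.7 + 4.4 + 25.0)
awk 'BEGIN { printf "%.1f\n", 22.7 + 4.4 + 25.0 }'
# -> 52.1
```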

Week Ending February 24, 2015
 * Task:

2/18
 * Found this helpful link for tuning Sphinx parameters: http://www.cs.cmu.edu/~archan/s_info/Sphinx3/doc/s3_description.html#sec_args_overview
 * Made minor breakthroughs with what is causing many of the scripts from last year to fail
 * Much of this had to do with changed directory paths

2/19
 * Exploring verifyAll.pl to try to find why phones were not in the transcripts but were in the phonelist. I got to the bottom of the script and came across what could end up being a very important comment:
 * 1) General idea for senone:
 * 2)   10 hours = 3000 cont. 4000 semi.
 * 3)  100 hours = 8000 (cont and semi)
 * 4) Rate of increase between the two is very small.
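That comment's rule of thumb can be sketched as a tiny helper. The comment only gives the two endpoints (3000 continuous at 10 hours, 8000 at 100 hours); the linear interpolation in between is my own assumption for illustration, not something the script prescribes:

```shell
# Rule-of-thumb senone count from hours of training audio (continuous models):
# ~3000 at <=10 hours, ~8000 at >=100 hours, linearly interpolated between
# (the interpolation is an assumption, not from the script comment).
suggest_senones() {
    hours=$1
    if [ "$hours" -le 10 ]; then
        echo 3000
    elif [ "$hours" -ge 100 ]; then
        echo 8000
    else
        echo $(( 3000 + (hours - 10) * 5000 / 90 ))
    fi
}
suggest_senones 5     # -> 3000
suggest_senones 125   # -> 8000
```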

 * It would seem that the senone count has to increase when the amount of data increases. I had not seen anyone venture that high into the senones, so it may be something important to look at.
 * Collaborated with Dakota to get the soft links fixed for the sph files.
 * Files at location

/mnt/main/corpus/switchboard/first_5hr/clean/wav
 * are now correct and will run properly on genTrans5.pl
 * Dakota fixed the 125hr soft links as well. However, I need to discuss some more with him about getting a consistent directory structure for all switchboard directories. 125hr is set up differently than first_5hr and requires me to manually input the path into the genTrans5.pl file. No big deal, but it consumes too much time when trying to run trains.
 * Need to clean up some more files and write up some clearer tutorials for decoding and running trains

2/23
 * Checking in

2/24
 * Tried to hold off on running any more trains this week until all scripts and directory structures get situated
 * Looked up some Sphinx configurations. Though senones have already been analyzed in great detail by past semesters, I have not seen anyone play around with starting and ending senones. This will slowly increase the senones throughout the train. This is important because as data grows the senones should also grow. It seems like 3000 senones is appropriate for data that is less than 10 hours in length, and 8000 senones should be used with data that extends over 100 hours. I am going to run a 125hr train with these configurations. However, the current structure of the 125hr switchboard files does not follow the first_5hr directory structure, which means I need to manually edit the sphinx_config file by hand to get this to run. This could end up being very time consuming. I may end up writing a quick addition to an existing script to account for this directory structure.
 * I will be posting results of this train/decode when it is finished
 * Created Experiment 0267
 * This experiment is going to alter start and end senone values
 * Initial experiment configurations:

Start Senone: 3000
End Senone: 8000
Increment: 1000
Data: 125hr_3170
Senone: 8000
Density: 2 - 16
 * Hoping that incrementing from 3000 to 8000 as the data increases will produce a lower error rate
 * Update: My internet keeps going down so I keep getting kicked out of my SSH session. Won't be able to run the train today, it seems.
 * Read up more on why we are currently using continuous models instead of semi-continuous or discrete models. I was curious as to why and thought that it might be a good idea to look into this. Explanation: http://www.speech.cs.cmu.edu/sphinxman/tech2.html
 * Sphinx 3 provides no advantage in using semi-continuous models over continuous models, so I will be changing experiment 0267 to focus on something else since I did not get to train on it yet. Source: http://www.inference.phy.cam.ac.uk/is/papers/baseline_wsj_recipes.pdf
 * From everything I read it will be best to keep these configurations:


 * The focus for this semester should be pruning the dictionary and generating transcripts, as they are most likely causing the WER to fall off with more and more data. I will continue to look over the whole training process to determine where I can make improvements.


 * Results:
 * Terminated initial Exp 0267 due to research on continuous and semi-continuous models. Found that there is no advantage to using one over the other, and some reports said the Sphinx 3 decoder does not support semi-continuous models.

 * Plan:
 * Diagnose the issue with the verify_all.pl file that caused verification to fail at phase 7
 * Clean up broken scripts from last semester
 * Find more Sphinx parameters that can be used to tune model building
 * Rewrite tutorials for running a train, configuring Sphinx, and decoding
 * Concerns:

My main concern is that all the work that was done last year is still scattered around, and it is not clear what they did. I'm working hard to fix the mess they made with the documentation, and hopefully everything will start running smoothly by the end of this week. Sphinx configuration is still not going all too well, and verify_all.pl still fails at phase 7.

Week Ending March 3, 2015
 * Task:

2/25
 * Fixed scripts that were wrong due to directories being changed.
 * Talked with my team and split up tasks for the upcoming week
 * We planned to have everyone run another train to verify the validity of the current tutorials.

Note to self: fix pruneDictionary4.pl. It is not creating/finding .dic file


 * Right now it is looking like it might just be easier to rewrite the pruneDictionary script
 * There are currently 4 different versions of this script, and none of them is anywhere close to the others
 * The problem is (as with most of the scripts) wrong file paths, non-existent files, and, in pruneDictionary4.pl's case, some code that is nonfunctional
 * The script is failing to create the .dic file, which is a must-have for running the train. Without it, decoding will ultimately fail and in most cases lead to a segmentation fault.

2/28
 * Working on pruneDictionary4.pl to fix it not creating the .dic file
 * Ran into an issue running the 256hr train:

Cannot open file /mnt/main/scripts/user/genTrans11.pl on line 29
 * Found that the path /mnt/main/corpus/switchboard/256hr/clean/train/train.trans does not exist; /mnt/main/corpus/switchboard/256hr/clean/256hr.trans existed instead
 * Again, this is just more lack of consistency created by previous semesters. I fixed this by copying the 256hr.trans file and renaming the new copy to train.trans:

sudo cp 256hr.trans ./train.trans
 * sudo was needed because the permissions granted on that file were not sufficient
 * Ran the train now and it runs through genTrans11.pl with no errors. Run time for generating the transcript on 256hr is around 8 hours or so.
 * 256hr will take days to train and decode
 * My network card keeps going down midway through this train. It could end up being a problem. I will try a wired connection if the network card keeps acting up.

 * Running a 5hr train and decode to try to troubleshoot the error that some other people have been having:

Segmentation Fault
 * I believe the error comes from training, but I will be checking the run_decode5.pl script to see if this may be causing the error (if the error does still exist)

3/2
 * Stephen contacted me and I gave him some new information I have discovered that people may not know about running a train and decode:

1. Using prepareExperiment3.pl will not work with corpus data */clean because of directory path problems. All corpus data for prepareExperiment3.pl must be */train
2. run_decode2.pl is deprecated. run_decode5.pl should be used instead, as it manages the argument list better than run_decode2.pl
3. If the decode only takes a few seconds to run, then the decode failed.


 * My 256hr train is still running. I let the genTrans11.pl script run overnight due to it taking hours to do so. I have yet to be able to continue on past this part to pruneDictionary4.pl and genPhones, but I hope to have that process started later today. I expect the actual training to take quite some time; I estimate a few days or so.
 * I have run another 5hr train using the stock configuration to check that the tutorials are up to date and can be followed. I can follow them, but I know some other people can't, and I need their feedback to fix them. There are small details that people figure out the more they run them, and those can lead to the problems people are running into.
 * I was hoping Nick would have the tutorial he said he was going to be writing emailed to me by this point, but still no word on that.
 * From the looks of it I think the bootcamp is going to be necessary, as not enough people have successfully run a train through decode yet.


 * Can't SSH into Caesar (6:35 PM)

3/3
 * Figured out what was wrong with pruneDictionary4.pl
 * Fixed it locally, but need to fix it on Caesar
 * Last steps will be to insert SIL and generate phones
 * It is unlikely master_run_train.pl will be able to be used, because the pipe to Caesar breaks when it runs for multiple hours on end. This makes it impossible to get back to the phase one left off at when the pipe does break. I am working on some ideas that may be useful to fix this problem.
 * I think I am going to write a tutorial for this whole process once I fix all the scripts.
 * However, running all the steps for this (future) tutorial will take hours and will not be very efficient.
 * Caesar is currently down (7:09PM)
 * Melissa unplugged Caesar from the network, but Kyle informed me it's back online.
 * He said not to do anything with any files because the transfer is taking place.
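On the broken-pipe problem with master_run_train.pl: a common workaround (my suggestion, not something settled in this log) is to launch the long run detached with nohup and a log file, the same way RunAll.pl is already launched. A scratch demo with sleep standing in for the real multi-hour script:

```shell
# Launch a long-running job detached so a dropped SSH session cannot kill it.
# 'sleep 1; echo ...' stands in for the multi-hour training script.
nohup sh -c 'sleep 1; echo train finished' > /tmp/nohup_demo.log 2>/dev/null &
wait $!
cat /tmp/nohup_demo.log
# -> train finished
```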


 * Results:
 * first_5hr train/decode ran successfully for the 5th time. I am just doing these to make sure they still run fine when I modify scripts.
 * first_5hr train/decode ran successfully again. This time I went through the whole process to help Stephen. It ran fine, so I suggested things he may have done wrong.

 * Plan:
 * Check that the tutorial is up to date
 * Fix the rest of the configuration scripts: GenTrans, GenPhones, pruneDictionary
 * Run 256hr train
 * Run 256hr train
 * Research more on how to get a lower WER
 * Concerns:
 * No major concerns at this time. I think people may still be confused on how to run a train. I can follow the tutorials (because I wrote them), but it seems like a few people can't, so I need to get that squared away.

Week Ending March 10, 2015
 * Task:

3/4
 * Clean has been removed from all switchboard directories
 * This has caused many problems in prepareExperiment3.pl
 * Need to look at master_run_train.pl since it has most likely broken now too.

3/8
 * I haven't had any time to work on fixing the broken scripts yet
 * Dakota emailed me the other day about a missing wav directory that seemed to be breaking genTrans11.pl
 * It seems as if wav has just been renamed and I will need to try the new directory name to see if that fixes the problem
 * Read logs today

3/9
 * Started my adventure into fixing the trains for a second time now
 * Started my focus on

/mnt/main/scripts/user/prepareExperiment4.pl
 * It seems that the script that is causing the problems is genTrans11.pl.
 * I emailed Dakota about fixing the soft links at

/mnt/main/corpus/switchboard/first_5hr/train/audio/utt
 * It seems that these are needed in genTrans11.pl and they are currently pointing to the wrong location
 * Hopefully Dakota can get those fixed by tomorrow
 * Until then I am just a sitting duck. I will continue exploring and see if I find anything else. Also going to test out master_run_train.pl as well to see how broken that is.

 * Found a path issue within:

/mnt/main/scripts/user/generateFeats2.pl
 * The path is supposed to point to all the utterances that exist in the switchboard, but it was pointing to a dead location
 * I made the following change from:

/mnt/main/corpus/switchboard/full/audio/utt
 * to:

/mnt/main/corpus/switchboard/full/train/audio/utt
 * However, this did not fix the current problem I am running into
 * This needed to be fixed regardless, so when Dakota fixes the soft links tonight or tomorrow I will not have to deal with that headache.

3/10
 * May have found something important with the utt sph files. It seems as though when the wav directory is made in the experiment you are working in, it does not have the proper permissions. This is causing genTrans11.pl to not be able to read the sph files.
 * I ran the following command on the wav directory to give everyone read/write/execute permissions, because it doesn't matter who does what with this directory:

sudo chmod -R 777 wav
 * I needed sudo as I did not have the required permissions to change permissions on wav. I used -R for a recursive change of permissions on all files in the wav directory.
 * Now to generate features:

/mnt/main/scripts/user/generateFeats2.pl
 * This file creates soft links for the sph files in the wav directory and then calls make_feats.pl, which is a CMU file (do not edit this file)
 * Ran the train now to build the language model, decode, and score:

nohup scripts_pl/RunAll.pl &
 * Unfortunately the train failed on RunAll.pl
 * None of the model_parameters are being created.
 * Need to look at which file is supposed to be creating these so I can fix them
 * The sph fix was a huge accomplishment, though, and will make things a lot simpler now that all steps through prepareExperiment4.pl are working successfully.
 * Reading other people's logs to see what everyone else has been up to this week.
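The soft-link and permission pattern above can be sanity checked in a scratch directory. All paths below are invented for the demo; the real layout lives under /mnt/main:

```shell
# Scratch demo of the sph soft-link pattern: link a corpus audio file into an
# experiment's wav directory, open up permissions, and confirm it resolves.
mkdir -p /tmp/sph_demo/corpus /tmp/sph_demo/exp/wav
echo 'fake sphere audio' > /tmp/sph_demo/corpus/utt001.sph
ln -sf /tmp/sph_demo/corpus/utt001.sph /tmp/sph_demo/exp/wav/utt001.sph
chmod -R 777 /tmp/sph_demo/exp/wav
cat /tmp/sph_demo/exp/wav/utt001.sph
# -> fake sphere audio
```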

 * Professor Jonas and I took a look at the whole training process and why it was failing
 * We discovered that genTrans11.pl was recreating all the sph files instead of making soft links to:

/mnt/main/corpus/switchboard/full/audio/utt

 * prepareExperiment4.pl is now deprecated and the new file is:

/mnt/main/scripts/user/prepareExperimentMJ.pl
 * prepareExperimentMJ.pl now takes in two arguments
 * i.e.: prepareExperimentMJ.pl switchboard first_5hr/train

 * We fixed this file and it now lives at this path:

/mnt/main/scripts/user/genTransMJ.pl
 * genTransMJ.pl now takes in a third argument
 * i.e.: genTransMJ.pl /mnt/main/corpus/switchboard /mnt/main/corpus/switchboard/first_5hr/train 008

 * pruneDictionary4.pl is also deprecated. The new file is now:

/mnt/main/scripts/user/pruneDictionaryMJ.pl
 * pruneDictionaryMJ.pl takes the arguments
 * i.e.: pruneDictionaryMJ.pl /mnt/main/corpus/switchboard etc/008_train.trans 008


 * Results:
 * Fixed the problem I was having with the sph files in genTrans11.pl
 * Got the prepareExperiment4.pl script to run with no errors after some adjustments
 * Trains now seem to be back up and running thanks to the help of Professor Jonas


 * Plan:
 * Fix current issues with running the train (primary task)
 * Finish configuration tutorial (secondary)
 * Concerns:
 * I am really concerned that I will not be able to get the train scripts back to a working state by Wednesday. The messed up directories we discovered last week have thrown major curve balls my way, and I need to figure it all out.
 * I am hoping that I will at least get the 5 hour train squared away so when we do bootcamp I will have something that will be able to run 100% of the way through.

Week Ending March 24, 2015

 * Task:

3/21
 * Caesar is down so I couldn't run my 256hr train I planned on
 * Talked to Stephen about doing a Hangout to start generating ideas to improve the baseline
 * We scheduled the Hangout for tomorrow. As of now it is just Stephen and I.

3/22
 * The Hangout did not happen today due to Caesar still being down. We decided to hold off on it until we hear a response back from the Systems group about when the server will be back up.

3/23
 * CAESAR IS STILL DOWN
 * Stephen and I are doing our Hangout at 5 tonight to discuss how to improve the baseline so we can come to class prepared to update our team on where we should go from here.
 * I have found some new parameters that could be tuned. They seem not to have been touched at all over past semesters and could be promising.
 * These will be added to the Bruins team log to keep them under wraps from the other team.
 * Checking in again - Caesar still down at 2:05 PM
 * Emailed Mohammad to see when this issue might be resolved

3/24
 * Caesar is still down
 * Professor Jonas got Caesar back up and running. No time today to do anything.


 * Results:
 * Caesar has been down --- No results


 * Plan:
 * Create a plan on what to test to improve the baseline
 * Run a 256hr train
 * Clean up some tutorials to make them consistent
 * Update the model group log (still empty)
 * Do some more research into tuning Sphinx


 * Concerns:
 * The major concern is that Caesar is down and no one in the group will be able to train. Some members have never run a train, and we have now lost a week for them to learn due to spring break and the server being down. I will do a train with everyone in class so they can see how straightforward it is. I will also be showing them how to tune Sphinx even more with advanced configurations, which include Density and Senone values.

Week Ending March 31, 2015
 * Task:

3/26
 * Added to the modeling group log things that were discovered this semester and fixes that were made based on those discoveries.
 * Yesterday the Bruins had a very productive meeting. We went over all the scripts needed to run a train as well as what they do. This should have gotten those who have yet to run a train up to speed on the process and why we do each step.
 * We picked some things we would like to try out to attempt to improve the baseline.
 * Strategies were formed on how to make the best use of our time and get the most results as possible.
 * I started preparation on my 256hr train yesterday as well.
 * Generating transcripts took about a half hour
 * Generating features took around 2-3 hours
 * All Sphinx configurations and other secret spices are in the Bruins team log
 * When that all finally finished I ran nohup scripts_pl/RunAll.pl &
 * The train is still running, and I expect it to run for possibly another day

3/27
 * Ran my 256hr train this morning since the one last night failed at Phase 3
 * Read the log file, made adjustments, and now the train is running fine
 * Zach and Russ emailed me about not being able to run perl scripts; the following error would pop up:

Command not found
 * I investigated, since I was able to run scripts this morning, and they were right: I was no longer able to run scripts
 * I poked around on the other drones and tried to run scripts, and it seemed like masterix was having problems and the directories were not there
 * I went into brutus and then tried to run the same scripts that would fail on caesar, and they worked.
 * I suggested that my group run on a drone machine for now. Hopefully perl scripts come back to life on caesar.
 * Another fix I found was to not use copied versions of scripts
 * For instance, /mnt/main/Exp/0261/007/DECODE/run_decode5.pl is a copied version of run_decode5.pl that would say "command not found" if run, while running the original at /mnt/main/scripts/user/run_decode5.pl worked fine
 * Not sure what is going on, but I think the systems group needs to take a look at this
 * Yet another fix is to type "perl" in front of the command:

nohup perl run_decode5.pl 007 0261/007 1000
 * The perl issue went away. Not sure why, but it's gone!
 * Informed my group of the good news and work can continue as normal
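My guess at why prefixing "perl" fixed things (not confirmed anywhere in this log): the copied scripts may have lost their execute bit or had a mangled shebang, so direct execution fails while invoking the interpreter explicitly still works. A demo of that pattern with sh standing in for perl; the script below is invented:

```shell
# A script with no execute bit fails when run directly ("command not found"
# or "Permission denied"), but running it via its interpreter still works.
printf '#!/bin/sh\necho ok\n' > /tmp/copied_script.sh
chmod -x /tmp/copied_script.sh
sh /tmp/copied_script.sh
# -> ok
```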

3/30
 * Not much done today because I could not get into Caesar
 * It was actually possible to get on using root
 * Diagnosed some issues for members of my team and came up with some solutions for the time being

3/31
 * My 256hr train failed due to something with Caesar after it had been running for a day
 * I started another 256hr train (annoying : | )
 * Hoping for the results to be in in a couple of days
 * Going to do some smaller trains so I can test some parameter changes I want to make before applying them to larger trains
 * Caesar was fixed today and accounts were reset
 * Still having a problem with the run_decode5.pl script shooting out "command not found"
 * Will fix tomorrow


 * Results:
 * Waiting on results from 256hr train (failed)


 * Plan:
 * LOWER THE BASELINE
 * Run trains with preplanned parameters and values
 * Do some cleanup where possible to hopefully remove errors causing the WER to be poor
 * Concerns:
 * I do not have any concerns as of now. All tasks for each member of my group are clear cut and very doable.

Week Ending April 7, 2015
4/3
 * Task:
 * The Bruins have their first baseline for a 256hr train (61.5%)
 * That is not good, but this was not tuned in any way
 * Professor Jonas pointed out that we should be using subsets of the 256hr train to decode on so that the decode does not take 256 hours.
 * I am currently working on the best/easiest solution for my group so that we can use a subset to decode
 * Read up more on LDA; it could give a 25% decrease in WER or it could do nothing... Not sure what is causing the variation between people's baselines

4/5
 * Didn't do much today besides eat food

4/6
 * Been sick and haven't got much done this week
 * Ran a train on 256 and not I need to decode
 * Talked to Dakota about fixing 125hr train since it only contains 5 hours of audio right now. Still waiting on a word back from him to see if it has been fixed
 * Trying to figure out the best process for decoding the 256 hour train so we can use 5 hours to decode instead of all 256hr. This is just so we save time.
 * I think I have found a variable value for tuning that seems to have lowered the baseline a significant amount on a 5hr train. I want to apply this to my next 256 hour train in hopes it has a similar result.
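The subset idea above amounts to cutting the first N utterances out of the paired control files so the decode only covers a few hours of audio. A sketch of that, assuming SphinxTrain-style `.fileids` and `.transcription` files where line i of each file describes the same utterance (the file names here are hypothetical; the real ones live in the training setup on Caesar):

```python
# Sketch: carve a small decode set out of a big train set by taking the first
# n_utts utterances from paired control files. Both files must be cut at the
# same point, because their lines are matched up by position.
def make_subset(fileids_in, trans_in, fileids_out, trans_out, n_utts):
    with open(fileids_in) as f:
        ids = f.read().splitlines()[:n_utts]
    with open(trans_in) as f:
        trans = f.read().splitlines()[:n_utts]
    with open(fileids_out, "w") as f:
        f.write("\n".join(ids) + "\n")
    with open(trans_out, "w") as f:
        f.write("\n".join(trans) + "\n")
    return len(ids)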

4/7
 * Russ has informed me that he and Mohammad achieved a 15% WER on a 5hr train
 * We will be analyzing what they did and build off of it to try to get that even lower and also apply it to the 256hr
 * My train will be done running shortly. I have a good feeling about it!


 * Results:
 * No results yet
 * Russ got 61.5% error on untuned 256hr


 * Plan:
 * TRAINS TRAINS TRAINS
 * Some tuning
 * and more TRAINS
 * Concerns:
 * No concerns right now. We seem to have figured out workarounds for all major issues we were having previously.

Week Ending April 14, 2015
 * Task:
4/07
 * Worked with my group today, and figured out where we should go from here
 * Zach found some good information for us to use that helped him achieve a very good result
 * We are going to search for the perfect tuning for this parameter as well as adjust our current parameters to get this even lower
 * Drone machines were assigned to us and we divided them among our team members
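The "search for the perfect tuning for this parameter" is essentially a grid sweep: try each candidate value and keep whichever gives the lowest WER. A toy sketch of that loop (the parameter, candidate values, and scoring function here are made up; in practice each evaluation is a full train/decode run on a drone machine):

```python
# Hypothetical grid sweep: evaluate each candidate value and keep the one
# with the lowest WER. Strict "<" keeps the first value in case of ties.
def grid_sweep(candidates, evaluate):
    best_value, best_wer = None, float("inf")
    for value in candidates:
        wer = evaluate(value)
        if wer < best_wer:
            best_value, best_wer = value, wer
    return best_value, best_wer

# toy stand-in for a decode: pretend WER bottoms out near a value of 10
best, wer = grid_sweep([8, 9, 10, 11, 12], lambda w: abs(w - 10) * 2 + 15)
print(best, wer)  # 10 15
```

Dividing the candidate list across the drone machines is what makes this practical, since each point in the sweep is an independent run.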

4/12
 * Not much today. Was reading the intense email chain between Dakota and Professor Jonas.
 * Good to see 125hr is now fixed (need to test)
 * Will start running 125hrs as the 256hr are very time consuming

4/13
 * Trying to diagnose issues that Kayla and Morgan are having with decoding
 * Think that the transcript file in 125hr is named wrong. Need to talk to Dakota or someone else in the data group about it

4/14
 * Decode for my 256hr train finished and the results are extremely promising
 * Will be formulating new plans for my next train based on these results
 * Should be able to get the WER down by at least another 5% on the next run
 * Would like to get an experiment running with the current tunings on a 125hr to see how it compares to last years results
 * Will do this when 125hr is working 100%
 * My WER keeps going down as the decode continues... good news


 * Results:
 * Results are being kept secret from this wiki currently


 * Plan:
 * Continue on with the information Zach presented the group this week
 * Run 125hr trains, and 256hr trains
 * Hopefully get a lower baseline than last week


 * Concerns:
 * No concerns currently. Everyone seems to be on track and issues that were unresolved last week are now resolved.

Week Ending April 21, 2015

 * Task:

4/15
 * We built bridges


 * Results:


 * Plan:


 * Concerns:

Week Ending April 28, 2015
 * Task:
4/23
 * Started a new 125hr train with secret configurations
 * Talked to Stephen and Zach about where to proceed from here and the strategy we should take
 * Did some research into some more classified stuff that will be disclosed at the end of the semester
 * Reran a decode on a previous train to test some changes I made to some files

4/24
 * Decode finished and I scored it
 * Results are what I expected
 * 125hr is finishing up today so I did some prep work for the decode
 * Running the decode on the 125hr when it finishes
 * Will do some midway scoring to see if the end results may be promising

4/27
 * I tried to run a train with force align because I read about how bracketed dictionary words could be used for force align.
 * I knew that at least some words in the dictionary were bracketed so I thought this would be worth a shot as it could have led to a decrease in WER by a few %.
 * The train did run successfully, but its output into model_parameters is much different from when force_align is not set.
 * The current run_decode.pl script cannot currently decode the force_align output
 * I will be searching online to see if there is a solution for running the decode with this output
 * If not I may try to hack together a script to account for the new model_parameters
 * Will decide if it is worth the time and either terminate the experiment or go forward with it
 * New train starting later with a few new ideas I have
 * Decoding on existing trains to get the real factor time correct
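The "real time factor" I'm trying to get correct is conventionally the ratio of processing time to audio duration, so RTF below 1.0 means the decode runs faster than real time. A one-function sketch:

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration.
    RTF < 1.0 means faster than real time; RTF 2.0 means a 5-hour
    decode set takes 10 hours of wall-clock time."""
    return decode_seconds / audio_seconds

print(real_time_factor(10 * 3600, 5 * 3600))  # 2.0
```

This is also why decoding on a 5-hour subset instead of the full 256 hours matters: the same RTF applied to a much smaller set means results in hours instead of days.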

4/28
 * Going to concentrate on decoding and the LM, as I believe the training portion is as concrete as it can get right now
 * Running some decodes and trying to figure out why the decode settings persist after the initial run
 * Caught up on people's logs to see what both groups have been up to


 * Results:
 * Kept under wraps in secret document

 * Plan:
 * Run 125hr trains
 * Run decodes on previously run trains
 * Research into a new area that has yet to be looked at
 * Concerns:
 * No concerns as of now

Week Ending May 5, 2015

 * Task:


 * Results:


 * Plan:


 * Concerns: