Speech:Spring 2016 Matthew Heyner Log



Wednesday, 2/3/16

 * Task:
 * Read the documentation/logs and cross-reference them with the actual configuration on Caesar.
 * Consult with the Systems group to see if they need assistance, or have them let us know when the DNS issue is resolved so that we can connect to foss.unh.edu and begin using the createWikiExperiment.pl script again.


 * Results:
 * It appears that the Experiment Directory page is an inaccurate representation of what is currently being generated inside the experiment folders.
 * The current experiments are standardized in the following location: /mnt/main/Exp/%expNum%/%subExpNum%/. Inside each of these experiment folders is a set of generated directories that serve as repositories for scripts, logs, or wav files. The following directories are being generated; these are subject to change through the semester.

LM  qmanager  trees  bin  etc  logdir  model_parameters  scripts_pl
bwaccumdir  feat  model_architecture  python  wav
005.html --> an HTML file


 * Plan:
 * Discover how the scripts interact with each other to build the experiment directories, what the directories are supposed to be used for, and what they do. Potential task: repairing the inconsistencies in the documentation.


 * Concerns:
 * Not sure what direction our group is moving in just yet, but I'm sure that we can find something that needs to be done.

Thursday, 2/4/2016

 * Task:
 * Review the createWikiExperiment.pl and createWiki_Sub_Experiment.pl scripts and see how they could be improved from Spring 2015. This will involve learning Perl, specifically the Wiki API Morgan used to develop those scripts.
 * Results:
 * I ran the scripts on Caesar and they appear to work, but with a few oddities to note.
 * When entering values, the backspace and arrow keys echo their ASCII equivalents rather than performing the intended actions.
 * When the experiment directory is created, it is not nested within the Speech:ExpDir page, so it must be corrected manually. (The nesting issue is no longer a problem.)


 * Plan:
 * Dissect the script in more detail so that I can automate the process better; maybe also include the creation of the /mnt/main/Exp/%expID%/ directory so that users will only need to run this script to get the train started.


 * Concerns:
 * Lack of knowledge of Perl.

Friday, 2/5/16

 * Task:
 * Starting a train in the Exp directory 0282/001 to see how the training process works.


 * Results:
 * I attempted to run the following script:

perl /mnt/main/scripts/user/prepareTrainExperiment.pl first_5hr/train
 * This failed.
 * Then I realized the following code was formatting the arguments for me. The snippet below is from lines 13 and 14 of prepareTrainExperiment.pl:

$corpus = "/mnt/main/corpus/" . $ARGV[0];
$corpus_dir = $corpus . "/" . $ARGV[1];
 * I changed the execution of the script to the following to account for the automated formatting.

perl /mnt/main/scripts/user/prepareTrainExperiment.pl switchboard first_5hr/train
 * Then I followed the wiki's instructions on step 5 and ran the next command.

perl /mnt/main/scripts/user/generateFeats.pl
 * I observed the flow of text on the screen and waited.
 * I did some looking into what the generateFeats.pl script actually does.
 * It appears to call make_feats.pl in /mnt/main/Exp/0282/001/scripts_pl/, which generates the .sph files required for the train. I'm not sure if this is correct; I'll have to consult with another group member or group to see what their thoughts are.
 * Once the script finished and I was satisfied with my understanding of what it did, I followed step 6 in the wiki instructions and ran the following command:

nohup scripts_pl/RunAll.pl . &
 * The train has been started in /mnt/main/Exp/0282/001/ at roughly 7:40PM.


 * Plan:
 * I'll check tomorrow morning to see how the training has gone.


 * Concerns:
 * I logged back onto Caesar to check something after starting the train, and when I used the exit command to log out it stated that jobs had been suspended. I hope this does not mean the RunAll.pl script has stopped; otherwise I will need to run another sub-experiment.

Saturday 2/6/2016

 * Task:
 * <8:30> Check to see if the train that I started last night finished successfully. If it has, then build a language model and run a decode.
 * Reviewed some of Sam Sweet's logs to see how he had to do this the first time. It appears that it was far more complex to start a train last spring. Thanks to the work of the Spring 2015 Capstone class, this year's Spring 2016 class is able to successfully begin experiments in the first week.


 * <9:25>: Followed the wiki to score my decode.


 * Results:
 * <8:30> Followed the wiki on building the language model with no issues.
 * Followed the Decoding tutorial on the wiki with no issues.
 * Note
 * The last step of beginning the decoding process is done by issuing the following command

nohup perl run_decode.pl 001 0282/001 1000 &
 * Just to make sure the command continued running after I logged out, I logged back into Caesar, changed to the directory containing my decode.log, and ran the following command:

ls -l
 * I checked the size of the decode.log file.
 * A couple of seconds later I ran the same command, and the decode.log did appear to be increasing in size.
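A small helper along the same lines (a sketch; decode.log from this entry would be the argument) to check whether a log file is still growing, instead of eyeballing repeated `ls -l` output:

```shell
# log_growing FILE SECONDS -- succeed if FILE grows within SECONDS.
log_growing() {
    size1=$(stat -c %s "$1")   # GNU stat: file size in bytes
    sleep "$2"
    size2=$(stat -c %s "$1")
    [ "$size2" -gt "$size1" ]
}

# Example (assumes the decode is writing to decode.log):
# log_growing decode.log 5 && echo "decode is still writing"
```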


 * <9:25>: Ran into no issues while running sclite.
 * Results from scoring.log:

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg | 1000  12903 | 74.0   18.9    7.1   17.8   43.8   96.4 |
|=================================================================|
|  Mean   | 35.7  460.8 | 74.9   18.6    6.5   21.5   46.6   96.6 |
|  S.D.   | 16.3  229.0 |  7.5    5.9    3.3   12.6   14.8    5.0 |
| Median  | 33.5  459.5 | 76.3   16.9    5.3   17.6   43.3  100.0 |
`-----------------------------------------------------------------'


 * Plan:
 * Decipher the scoring results to get a better understanding of what the values represent.


 * Concerns:
 * This seems to be running smoothly so far... almost too smoothly. I thought it would have been more difficult to start my first train, decode, and scoring.

Sunday 2/7/2016

 * Task:
 * Understand how the scoring process works so that I get a better understanding of the whole process from the top down.


 * Results:
 * I spent some time reading the SCLITE documentation at http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm#scoring_process.
 * The document talks about the different ways you can score a decode, how the scoring works, and some of the algorithms used to get different interpretations of the data.
 * I followed a link in the SCLITE documentation to see what other output formats it supports: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/outputs.htm#output_reports_name_0. I discovered that with the -o parameter I could output the score in more detail. For example...

sclite -r 001_train.trans -h hyp.trans -i swb -o dtl >> scoring.dtl.log
 * The previous command created two files, to my surprise: scoring.dtl.log (not a surprise) and hyp.trans.dtl (a surprise). When I looked at scoring.dtl.log it was useless to me; all the data I wanted was in the hyp.trans.dtl file. It gives the benefit of seeing which words are causing the issues in the trans files, so that we can troubleshoot how those files are created during the training and decoding process.


 * Plan:
 * I plan on spending time going through the training and decoding process step by step and cross referencing it with the hyp.trans.dtl file so that I may be able to see why the scores are so bad.
 * I also plan on documenting (a) the commands called within each script and (b) the scripts that are called. This will help me discover exactly what is happening while training and decoding and give me a more visual understanding of it.
 * Once I understand the process I can start thinking about how the experiment group can improve the creation of experiments for future years.


 * Concerns:
 * I have absolutely no understanding of what the values mean in both the sclite -o sum and -o dtl score sheets. An example of what I mean: the first line below is the command I used to generate the dtl scoring format, and the first section of its output is displayed beneath it.

sclite -r 001_train.trans -h hyp.trans -i swb -o dtl >> scoring.dtl.log

DETAILED OVERALL REPORT FOR THE SYSTEM: hyp.trans

SENTENCE RECOGNITION PERFORMANCE

sentences                                    1000
   with errors                      96.4%   ( 964)
   with substitutions               73.8%   ( 738)
   with deletions                   32.9%   ( 329)
   with insertions                  81.3%   ( 813)

WORD RECOGNITION PERFORMANCE

Percent Total Error      =   43.8%   (5650)
Percent Correct          =   74.0%   (9549)
Percent Substitution     =   18.9%   (2440)
Percent Deletions        =    7.1%   ( 914)
Percent Insertions       =   17.8%   (2296)
Percent Word Accuracy    =   56.2%

Ref. words               =          (12903)
Hyp. words               =          (14285)
Aligned words            =          (15199)


 * The next example is the sum scoring output, just to show the comparison of the two formats. The first line is the command I used to get the following output.

sclite -r 001_train.trans -h hyp.trans -i swb >> scoring.log

SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg | 1000  12903 | 74.0   18.9    7.1   17.8   43.8   96.4 |
|=================================================================|
|  Mean   | 35.7  460.8 | 74.9   18.6    6.5   21.5   46.6   96.6 |
|  S.D.   | 16.3  229.0 |  7.5    5.9    3.3   12.6   14.8    5.0 |
| Median  | 33.5  459.5 | 76.3   16.9    5.3   17.6   43.3  100.0 |
`-----------------------------------------------------------------'
 * What does it mean when it states substitutions, deletions, and insertions in relation to speech recognition?

sentences      = # Snt
words          = # Wrd
correct        = Corr
substitutions  = Sub
deletions      = Del
insertions     = Ins
total error    = Err
sentence error = S.Err

Wednesday 10, 2016

 * Task:
 * Consulted with the Modeling group to see what progress they had made in training and decoding.
 * Ben informed me that he thinks something may be wrong with the genTrans.pl script when running it on the 125hr corpus. I plan on taking the train.trans file inside the experiment they're running in 0283/002 to see if I can find any kind of inconsistencies. I will continue this work in the following days and follow up with the modeling group when I'm done.


 * Spent time reading through Colby Johnson's and Sam Sweet's logs again, cross-referencing the work they performed.


 * Results:
 * Learned how to alter density, senone count, and other variables before running a decode.
 * Before running the decode, the two .cfg files for changing these settings are located at /mnt/main/Exp/%expID%/%subExpID%/etc/sphinx_decode.cfg and, in the same path, sphinx_train.cfg.
 * Sam made a point in his log at 4-6-2015 about the proper way to configure the files.


 * Plan:
 * Spend the weekend running a train without scripts so that I can see how the process works from the ground up to get a better understanding.

 * Concerns:
 * None other than the continued learning curve this program requires. I feel confident in my Linux abilities.

Friday 12, 2016

 * Task:
 * Reading the following pages Training Acoustic Model, Training with LDA and MLLT, and Basic Concepts of Speech


 * Results:
 * Note on testing size of trained data: "A Database should have the two parts mentioned above - training part and test part. Usually test part is about 1/10th of the full data size, but we don't recommend you to have test data more than 4 hours of recordings."


 * Plan:
 * Waiting for the modeling groups train to be finished.


 * Concerns:
 * None as of yet.

Saturday 13, 2016

 * Task:
 * Emailed the modeling group with some questions regarding what I read yesterday about training and how we could get to decoding and testing configurations faster.
 * Emailed Experiment group seeing how their research has been going and sent a few proposal ideas for them to think about.


 * Responded to emails with Justin Schumaker and Peter Ferro and Meagan Wolf


 * Results:
 * Through reading the tutorials provided by the Sphinx team at CMU, I am attempting to gain a better understanding of what scripts need to be run and what configurations are best to use in specific situations. I let the experiment group know of my findings.


 * <Afternoon>:
 * Peter Ferro found some inconsistencies in the Wiki documentation about scripts that are not properly documented. These are noted in his log (Peter's Log). We are going to discuss what actions can be taken to help sort out the issues. Possibly a good project for some of our group in the future.


 * Plan:
 * Wait for replies from my emails.


 * This was just emailing back and forth


 * Concerns:
 * I feel like I'm getting confused between what the Capstone wiki tutorial says to do and what CMU's tutorial tells me to do, only because Capstone has built a set of scripts to automate the training process, which hides a lot of the back-end configuration. The Capstone tutorial is very good, but we need to explain what is happening in each of the scripts so it's not just spray and pray every time a new person wants to attempt a train and decode.

Tuesday 16, 2016

 * Task:
 * Generate a train without the use of the Capstone-provided scripts to get a better understanding of the training and decoding process.


 * Plan:
 * Followed the prepareTrainExperiment.pl line by line running each command manually and altering the configuration files.
 * sphinx_train.cfg
 * - $CFG_DB_NAME = "003": sets the database name to use.
 * - $CFG_BASE_DIR = "/mnt/main/Exp/0282/003": sets the location where all the needed files are found.
 * - $CFG_HMM_TYPE = '.cont': sets the train to continuous rather than semi-continuous.
 * - $CFG_N_TIED_STATES = 5000: changed the senone value to 5000.
 * - $CFG_NPART = 6: utilize more CPUs for a possibly faster train.
 * - $CFG_CONVERGENCE_RATIO = 0.004: controls the number of iterations performed; every iteration lowers the convergence ratio, and training stops once it falls between 0.1 and 0.001. This setting is important to prevent over-training the sample.
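Taken together, the edits above would look roughly like this inside sphinx_train.cfg (a sketch of just the changed lines, following the file's Perl-style config syntax):

```perl
# Changed settings in /mnt/main/Exp/0282/003/etc/sphinx_train.cfg
$CFG_DB_NAME  = "003";                    # database name
$CFG_BASE_DIR = "/mnt/main/Exp/0282/003"; # where all needed files live
$CFG_HMM_TYPE = '.cont';                  # continuous rather than semi-continuous
$CFG_N_TIED_STATES = 5000;                # senone count
$CFG_NPART = 6;                           # split training across more CPUs
$CFG_CONVERGENCE_RATIO = 0.004;           # stop threshold; guards against over-training
```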


 * Results:
 * Unknown; the Modeling group is running a decode on the 125hr corpus utilizing the resources of Caesar, so I will wait until they're finished.


 * Concerns:
 * Resources are limited at this point because Caesar is the only server that can be utilized for training and decoding. I feel this is going to be a setback if we don't get more servers online before the end of the semester.

Wednesday 17, 2016

 * Task:
 * Form a better idea of what projects the Experiment group is going to tackle.


 * Results:
 * We've been given three primary tasks.
 * Build a software repository and archive old experiment directories.
 * Create an addExp.pl script.
 * Create a createDecode.pl script.


 * Plan:
 * Delegated the work to members of the experiment group, please refer to the 2016 proposal to review task assignments.
 * Planned for a Skype meeting this weekend.


 * Concerns:
 * I will need to make sure to contact everyone before archiving old experiments so they have a general idea of when and what is happening. This applies to the software repository as well.


 * Notes:
 * I read Benjamin Leith's The Pragmatist's Week 3 Guide to Sphinx in his log. I found it immensely helpful to have a good explanation of what is actually happening during a train and decode. I plan on referring others to his log in the future.

Friday 19, 2016

 * Task:
 * Read logs
 * Send emails to each group individually to inform them of my plans of reorganizing both the software and the deprecated experiments.


 * Results:
 * I was impressed with Peter Ferro's logs on decoding and plan to ask him some more details on his findings. I would encourage others to review his logs if you have questions on the decoding process in the future.
 * Emails sent and awaiting replies.
 * prepareDecode.pl appears to be an early version of the current prepareTrainExperiment.pl. I am moving it to the DELETE folder in the scripts/user/ folder and renaming it prepareDecode-OLD.pl. I will consult the others in the Experiment group about my findings on Sunday.


 * Plan:
 * Work on archiving experiments either Saturday or Sunday depending on the responses I get from other groups.


 * Concerns:
 * I may disrupt work that other groups are performing.

Saturday 20, 2016

 * Task:
 * Archive old experiments.


 * Results:
 * After thinking about permissions for the archival folders, I thought the best would be rwxr-wr--, with the folders created by the root account and the cis790 group applied to them. I tried doing this using the sudo command on my user account mru567, but the root password was not working for my account. I decided to try logging in as root to test the password, which worked. I tested the creation of folders with specific permissions as root, then logged out.
 * Sent an email to the Systems group to either get their permission to login as root and create the folders or to have them create them for me.
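The folder creation I tested as root can be sketched as follows (the cis790 group is from this entry; the mode shown is only an example, since the final permissions are still up for discussion):

```shell
# make_archive_dir DIR GROUP MODE -- create an archive folder with the
# given group and permissions (a sketch of the manual steps run as root).
make_archive_dir() {
    mkdir -p "$1"
    chgrp "$2" "$1" 2>/dev/null || true  # group may not exist off Caesar
    chmod "$3" "$1"
}

# Example: make_archive_dir /mnt/main/Exp/sp12 cis790 2775
```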


 * Plan:
 * Wait for the System group to reply to my request


 * Concerns:
 * Not sure if the permissions I have chosen are the most ideal but I hope others will give me their opinions on it.


 * Notes:
 * The Data group is currently using /mnt/main/Exp/0271/. Leave this untouched for the week of Feb 23, 2016.

Sunday 21, 2016

 * Task:
 * Read through logs and read some papers found by other groups.
 * Update proposal page to meet the proposed standard by Thomas Rubino.
 * Tele-conference with Experiment group.


 * Results:
 * Read a paper found by Brenden Collins here
 * Update of proposal went well. Explained the tasks at hand.
 * After some technical difficulties getting everyone connected we were able to start the meeting and get everyone on the same page for the current projects the Experiment group is working on.
 * Peter Ferro came up with pseudo Perl code for the development of the mkDec.pl script we are developing.
 * Consulted with Kevin Soucey about how his addExp.pl script is going and showed him how to transfer files from Caesar onto his local machine for testing purposes.
 * Gave Meagan Wolf the job of researching and creating the software repository.


 * Plan:
 * Continue working on archiving the old experiments.


 * Concerns:
 * None

Wednesday 24, 2016

 * Task:
 * Perform the archival process within the /mnt/main/Exp/ directory.


 * Results:
 * The following experiment directories have been placed in their new semester directories as follows:

Experiments #0001-0006 ---> sp12/
Experiments #0007-0017 ---> su12/
Experiments #0018-0100 ---> sp13/
Experiments #0101-0130 ---> su13/
Experiments #0131-0142 ---> undocumented/   #Note: These were experiments undocumented on the Wiki
Experiments #0143-0252 ---> sp14/
Experiments #0253-0255 ---> su14/
Experiments #0256-0259 ---> fa14/
Experiments #0260-0279 ---> sp15/
Experiment  #0280      ---> su15/


 * I left 0271/ alone for the Data group, and 0272/ alone because it's Prof. Jonas's sandbox.
 * I left the current semester's 0281-0286 alone so as not to mess up any trains or decodes that may be in progress.


 * Plan:
 * Created a Perl script and tested it inside my root home folder with dummy directories. The following is the script I created, with a usage example. This was a quick, customized script for a specific task; I didn't want to spend a lot of time trying to make it reusable, since it's something we won't need to perform often.

#!/usr/bin/perl
$TARGET_DIR    = $ARGV[0];
$BEGIN_EXP_DIR = $ARGV[1];
$END_EXP_DIR   = $ARGV[2];

while ($BEGIN_EXP_DIR <= $END_EXP_DIR) {
    # Change the 000 to 00 if you are moving experiments from 10 to 99
    # Change the 000 to 0 if you are moving experiments from 100 to 999
    system("mv -t $TARGET_DIR 000$BEGIN_EXP_DIR");
    $BEGIN_EXP_DIR++;
}

perl archiveExp.pl sp12 1 6    # TARGET_DIR=sp12, BEGIN_EXP_DIR=1, END_EXP_DIR=6


 * Concerns:
 * I wish I could justify making a better, more reusable Perl script, but it just didn't seem necessary in this situation.
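If a reusable version is ever justified, zero-padding the number with printf would remove the need to hand-edit the 000 prefix. A sketch taking the same three arguments as archiveExp.pl:

```shell
# archive_exps TARGET BEGIN END -- move experiment dirs numbered
# BEGIN..END into TARGET, zero-padding each number to the 4-digit
# directory names (0001, 0042, 0142, ...).
archive_exps() {
    target=$1; begin=$2; end=$3
    for n in $(seq "$begin" "$end"); do
        mv -t "$target" "$(printf '%04d' "$n")"
    done
}

# Example: archive_exps sp12 1 6    # moves 0001..0006 into sp12/
```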

Saturday 27, 2016

 * Task:
 * Read logs
 * Re-write the Experiment group's proposal section and send an email to Jonathan Shallow to get the formatting done.


 * Results:
 * Finished the proposal.


 * Plan:
 * Unfortunately, had to re-schedule our meeting for tonight to Sunday evening.


 * Concerns:
 * Our section of the proposal isn't good enough.

Sunday 28, 2016

 * Task:
 * Meet with group members.


 * Results:
 * Updated others on how the archival job went, got caught up on where my fellow group members are in their projects.


 * Plan:
 * Continue finishing projects.
 * Updating scripts so they can be found via the $PATH variable, allowing users to call scripts without needing to invoke the full path.


 * Concerns:
 * None

Tuesday 1, 2016

 * Task:
 * Read logs
 * Research how to update the $PATH variable with all the current scripts


 * Results:
 * Got informed by logs
 * Decided a script could be used to put all the scripts in the $PATH variable


 * Plan:
 * Work on $PATH variable project tomorrow during class


 * Concerns:
 * Running out of things to work on

Wednesday 2, 2016

 * Task:
 * Begin research on adding user scripts to $PATH variable
 * Troubleshot prepareTrainExperiment2.pl with modeling group


 * Results:
 * Option 1: add the /mnt/main/scripts/user/ directory to the $PATH variable directly – less secure.
 * Option 2: create a series of soft links inside the appropriate $PATH location – more secure, but more maintenance in the future. This maintenance shouldn't be too hard to keep up with, since scripts don't get generated all the time.
 * Pathing errors to /mnt/main/root/tools/SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl were hotfixed in prepareTrainExperiment2.pl by altering the path to /mnt/main/misc/root/tools/SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl... Not sure who moved the Sphinx scripts or why the move wasn't documented.
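The soft-link option can be sketched like this (the destination directory is an example; it just needs to already be on $PATH):

```shell
# link_scripts SRCDIR DESTDIR -- create one soft link per Perl script so
# the scripts resolve from $PATH without adding SRCDIR to it directly.
link_scripts() {
    for script in "$1"/*.pl; do
        ln -sf "$script" "$2/$(basename "$script")"
    done
}

# Example: link_scripts /mnt/main/scripts/user /usr/local/bin
```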


 * Plan:
 * I think creating soft links is the best course of action here. I will be performing this over the weekend.
 * Going to spend tomorrow morning fixing prepareTrainExperiment.pl/prepareTrainExperiment2.pl, documenting the changes and putting them in version control, as well as informing others of the changes.


 * Concerns:
 * None

Thursday 3, 2016

 * Task:
 * Research the current state of the /mnt/main/root directory.
 * Jon Shallow informed me of an unnecessary prompt in generateFeats.pl: it prints "Complete! Run "nohup scripts_pl/RunAll.pl . &" to begin training." The "." is unnecessary; RunAll.pl takes no parameters, and "." in this context is just a shortcut for the current working directory. (Quoted from Jon Shallow's email to me.)


 * Results:
 * The following folders appear not to be used for anything we currently need, and I believe they can be safely moved to the /mnt/main/misc/ folder at this point.
 * root/sphinx3: the Sphinx 3.7 distro; the README file in here is a good reference if you want to read more about dependencies.
 * root/sphinxbase-0.6.1: a set of dependency scripts for Sphinx.
 * root/tools/an4: a directory set up much like our own experiment directories. The README file states that the /wav directory contains speech collected by Carnegie Mellon University in 1991; this includes trained speech as well.
 * root/tools/CMU-Cam_Toolkit_v2: another set of tools to use for training and decoding; we can keep this in the tools directory for those who want to use it.
 * root/test/: the entirety of the root/test directory appears to be a playground installation of Sphinx 3.7 to be used with CMU-Cam_Toolkit_v2, sclite, and the same version of sphinxbase we are using in the root directory. I believe this directory can be moved to /mnt/main/misc safely.
 * <Evening:> Added a comment at the top of generateFeats.pl stating:

#Wrapper script for make_feats.pl
 * Removed the "." in the print statement to better reflect what command should be run.

print "\nComplete!\nRun \"nohup scripts_pl/RunAll.pl &\" to begin training.\n";


 * Plan:
 * Contact the Tools, Modeling, and rest of the Experiment group with my findings to see what course of action to take on the possible cleanup.


 * Concerns:
 * That no one will reply to my email, and the possibility of breaking Sphinx scripts if we alter the root directory.

Saturday 5, 2016

 * Task:
 * Set up the environment variable for the /mnt/main/scripts/user directory, allowing users to just type the name of a script located in that directory rather than its entire path.
 * Make directories in the History directory to reflect the scripts in the root software repository directory.


 * Results:
 * Success: logged in as root and created a custom .csh file inside /etc/profile.d called scripts.csh that runs the following command on all user logins:

set path = ($path /usr/local/bin /mnt/main/scripts/user)
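For completeness, Bourne-style shells typically read *.sh rather than *.csh files from /etc/profile.d; an equivalent scripts.sh (a sketch, assuming we want the same directories added) would be:

```shell
# /etc/profile.d/scripts.sh -- sh/bash equivalent of scripts.csh above.
PATH="$PATH:/usr/local/bin:/mnt/main/scripts/user"
export PATH
```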


 * Complete: made the following directories in History: buildData, copySph, corpusSize, exp_dir_setup, dictionary, master_run_train, ParseTranscript, child_exp_dir_setup, child_exp_sphinx_config, gen_errors. I then moved the scripts into their respective version folders.


 * Plan:
 * Inform Capstone about the change and alter scripts to reflect the change so users will not be prompted to input the entire path to the scripts.
 * Make sure to keep the software repository up to date and begin documenting scripts usages on the scripts information page.


 * Concerns:
 * Hope this was done correctly.

Wednesday 16, 2016

 * Task:
 * Assisted Peter Ferro with the development of MakeTest.pl by helping debug and flesh out some of the code.


 * Results:
 * These are excerpts from the email I sent to Peter explaining what I did.
 * I have created a version 7 in /mnt/main/scripts/user/History/makeTest/. I haven't updated the /mnt/main/scripts/user/ file yet because I wasn't sure how you wanted to handle the distro of the script, nor did I replace the current version. I would suggest that you review the changes I made in version 7 and let me know what you think.


 * I added the mkdir("DECODE"); line because this folder isn't generated by the creation of the train or the LM, as far as I can tell. Another thing to think about would be making this compatible with the way we archive the experiments under /mnt/main/Exp/sp** so that users can pass sp** as the source argument, giving them quick access to old experiments.


 * Plan:
 * Continue to assist when needed


 * Concerns:
 * None

Monday 21, 2016

 * Task:
 * Add the scriptsPath.csh script to /etc/profile.d/ on Obelix, Idefix, Asterix, and Miraculix


 * Results:
 * Decided to change the name of the script to scriptsPath.csh to better reflect the function it has.
 * Documented the change by creating a copy of it in the /mnt/main/scripts/user/History directory as well as the root scripts directory.
 * Updated Wiki documentation to reflect the change.


 * Plan:
 * No plan


 * Concerns:
 * None

Wednesday 23, 2016

 * Task:
 * Cleaned up the experiment group experiment directory with Peter
 * Debug makeTest.pl with Peter
 * Debug addExp.pl with Kevin


 * Results:
 * The Wiki and Caesar experiment sections are cleaned up.
 * Peter is going to need to create soft links in the decode directory pointing to the source directory.
 * Kevin contacted Professor Jonas to get the contact information for the Foss admin and more information on how to access HTTPS through Perl.


 * Plan:
 * Assist Peter with debugging makeTest.pl
 * Clean up the foss scripts page with Meagan's document.


 * Concerns:
 * None

Friday 25, 2016

 * Task:
 * Research forced alignment while training


 * Results:
 * [| Forced Alignment Explanation]: they can explain it better than I can.


 * Plan:
 * Start an experiment tomorrow with the forced alignment variables turned on.


 * Concerns:
 * I wonder if the train will be a success

Saturday 26, 2016

 * Task:
 * Create experiment 0282/013


 * Results:
 * Experiment has started 3-26-2016 2:25PM
 * [| 0282/13] Description of the variables I altered for the train.


 * Plan:
 * Wait for train to finish


 * Concerns:
 * That forced alignment will fail. If it does, it's troubleshooting time.

Week Ending March 29, 2016

 * Task:


 * Results:


 * Plan:


 * Concerns:

Thursday 31, 2016

 * Task:
 * Create experiment 0289/003


 * Results:
 * Experiment Log found here


 * Plan:
 * [REDACTED]

Thursday Night 31, 2016

 * Task:
 * Create experiment 0289/004


 * Results:
 * Experiment Log found here
 * [REDACTED]


 * Plan:
 * Begin performing trains with more core usage in the future.

Friday 1, 2016

 * Task:
 * Create experiment 0289/005; the experiment log is found here. This is meant to be a 145hr Team Iron Man baseline. It was started on Obelix.


 * Results:
 * Could not make a remote connection to Obelix, Idefix, or Miraculix, so I'm unsure of the state of experiment 0289/005 since I started it on Obelix. Will update on Monday, since we have no access to the server room on the weekend.


 * Plan:
 * Wait for servers to come back.

Sunday 3, 2016

 * Task:
 * Research the more obscure scripts located in the software repository.
 * Clean up the scripts page and order them by the status of use and what function they perform.


 * Results:
 * A more structured scripts page for future Capstone classes


 * Plan:
 * Wait for servers to come back.

Wednesday 6, 2016

 * Task:
 * Start a train to gain a baseline for parameters chosen by the group so that we can then alter them in the future to see if we get a better WER.


 * Results:
 * The train has started under 0289/001, using nohup commands for prepareTrainExperiment.pl and generateFeats.pl.


 * Plan:
 * Check back tomorrow to see how the train is going


 * Concerns:
 * That the train won't be successful.

Thursday 7, 2016

 * Task:
 * Check up on the 0289/001 train that I started yesterday.


 * Results:
 * The train failed. After some research, it appeared that the 001.phones file didn't get populated from the 001.dic file properly, which threw a flag during the RunAll.pl script. This has been documented and will be posted later in the semester.


 * Plan:
 * Start a new experiment under 0289/002 without using the nohup command.


 * Concerns:
 * The amount of time I wasted on a failed experiment.

Friday 8, 2016

 * Task:
 * Create some organizational resources for the group project.


 * Results:
 * Created a centralized location for our team to create documentation so that we can post it to the wiki at a later time. This will allow for more collaboration in the future.


 * Plan:
 * Make sure to keep the documentation clean so the transfer back to the Wiki is easily performed.


 * Concerns:
 * None

Saturday 9, 2016

 * Task:
 * Begin decode on 0289/002.


 * Results:
 * Decode started at 12:30


 * Plan:
 * Work on understanding how decoding on unseen data works. I plan on testing Peter Ferro's makeTest.pl script in my home directory to see if I can get a decode running with it. It appears that when you use prepareTrainExperiment.pl to generate the file structure, it uses train.trans by default to populate the <exp#>_trans.fileids file, meaning the fileids don't match the ones needed for unseen data, since unseen data takes its fileids from either the dev.trans or eval.trans file.


 * Concerns:
 * Not confident in my knowledge of how decoding on unseen data works.

Sunday 10, 2016

 * Task:
 * Begin decode on 0289/002 within the 0289/008 experiment.
 * Clean up prepareTrainExperiment.pl
 * Refactor genTrans.pl
 * Consulted with Peter Ferro about how makeTest.pl works and made some minor alterations.


 * Results:
 * Decode started at 7:30PM
 * I forked the cur of prepareTrainExperiment.pl into my home directory, renaming it makeTrain.pl.
 * Refactored code --> tested, works
 * Added a flag to allow makeTrain.pl to be run against dev.trans, eval.trans, or train.trans --> tested, works
 * Forked the cur of genTrans.pl into my home directory for alterations and testing.
 * Added a flag to allow genTrans.pl to be run against dev.trans, eval.trans, or train.trans --> tested, works
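A sketch of the flag handling described above, rendered as shell for illustration: the flag picks which transcript file the script reads. The -e/-d/-t mapping is inferred from this log; the real makeTrain.pl/genTrans.pl logic lives in Perl on Caesar.

```shell
# Map a command-line flag to the transcript file it selects.
pick_trans() {
  case "$1" in
    -t) echo "train.trans" ;;
    -d) echo "dev.trans"   ;;
    -e) echo "eval.trans"  ;;
    *)  echo "unknown flag: $1" >&2; return 1 ;;
  esac
}

pick_trans -e   # prints: eval.trans
pick_trans -d   # prints: dev.trans
```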


 * Plan:
 * Inform the class on Wednesday of the script changes and ask for iterative feedback on them. Hopefully we can get them into a working version and alter the current documentation to reflect the changes in scripts.


 * Concerns:

Monday 11, 2016

 * Task:
 * Have a Google Hangouts with team Iron Man to discuss how to perform specific tasks.


 * Results:
 * I think it went well, although a few errors impeded me somewhat; later that night I discovered the cause: an improper path in the script I was using was creating dead soft links.


 * Plan:
 * Create documentation for team Iron Man to use for future trains/decodes.


 * Concerns:

Tuesday 12, 2016

 * Task:
 * Create documentation for team Iron Man to use for doing decodes on unseen data.


 * Results:
 * The process of running a decode on unseen data is similar in outline to a seen decode but very different in the details. It's more difficult because most of the known scripts were designed for a specific function, and decoding on unseen data was not one of them. I spent some time today refining the genTrans.pl script (v13) for this very reason, although the linkTransAudio.pl script has since deprecated genTrans.pl (v13), because in this case it makes more sense to link directly to the .mfc files rather than linking to the .sph files and then running generateFeats.pl. I will update the documentation before class tomorrow to reflect this new script and how it can be utilized.
 * In the meantime, I believe that this version of prepareTrainExperiment.pl (makeTest.pl v2) can be used by future capstones if Professor Jonas would like to move forward with refining the script. Possibly decouple genTrans.pl from it entirely?


 * Plan:
 * Discuss new decoding process on unseen data.
 * Discuss the updating of prepareTrainExperiment.pl to makeTrain.pl with the class tomorrow.


 * Concerns:

Wednesday 13, 2016

 * Task:
 * Touch up makeTrain.pl (v4), and begin writing a new script with Ben Leith for team Iron Man.
 * Test genTrans.pl (v13) to be sure it functions as intended.


 * Results:
 * makeTrain.pl (v4) runs as intended.
 * genTrans.pl (v13) works.
 * Script writing for team Iron Man still needs work.


 * Plan:
 * Make the $flag variable optional in makeTrain.pl tomorrow and put it into production by moving it into the software repository and updating the wiki documentation for future capstones. I may try to integrate linkTransAudio.pl, which was created by Jon Shallow; I'll need to send him an email if it doesn't work out for me. I may also email Professor Jonas to see what he thinks, or whether he would prefer to keep it decoupled.
 * Link the prepareTrainExperiment.pl directory to the new makeTrain.pl directory. This will help avoid confusion about why the previous script existed; I'll also document the change in the script repository.
 * Make the $flag optional with genTrans.pl
 * Troubleshoot the new script.


 * Concerns:

Thursday 14, 2016

 * Task:
 * Touch up makeTrain.pl (v4), and begin writing a new script with Ben Leith for team Iron Man.
 * Test genTrans.pl (v13) to be sure it functions as intended.


 * Results:
 * Tasks for 4/14/2016
 * - Soft link from prepareTrainExperiment to makeTrain --> TODO!


 * - Make flag optional in makeTrain v4 --> Functional
 * - Getting error on awk for dictionary creation. --> Fixed
 * - Repaired the dictionary error --> The path the pruneDictionary.pl script was receiving was switchboard/30hr/test/trans... It should have been getting /mnt/main/corpus/switchboard/30hr/test/trans. I resolved this by creating a $corpus_path variable that builds the full path and passes it as an argument to pruneDictionary.pl --> Fixed
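A hypothetical reconstruction of the $corpus_path fix described above: the script was handing pruneDictionary.pl a relative path, so the repair was to prefix the corpus root. Variable names and the demo argument are illustrative; only the /mnt/main/corpus root comes from this log.

```shell
# The corpus root on Caesar, per this log.
corpus_root="/mnt/main/corpus"
# The relative path the script had been receiving (demo value).
corpus_arg="switchboard/30hr/test/trans"

# Build the full path before handing it to pruneDictionary.pl.
corpus_path="$corpus_root/$corpus_arg"
echo "$corpus_path"   # prints: /mnt/main/corpus/switchboard/30hr/test/trans
```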


 * - Make flag optional in genTrans v13 -- Functional


 * Proposed updated scripts version to cur:
 * makeTrain (v5) --> copied scripts from prepareTrainExperiment directory to makeTest directory
 * genTrans (v15) --> Need to update
 * generateFeats (v3) --> Need to update


 * Notes for testing following scripts:


 * All tests were performed on the 30hr/test and 30hr/train corpora


 * Results from testing makeTrain.pl (v4) & make_feats.pl --> Works
 * - makeTrain.pl -e switchboard 30hr/test --> wav folder size run#1 = 159744, run#2 = 159744
 * - make_feats.pl --> Success
 * - makeTrain.pl -d switchboard 30hr/test --> wav folder size run#1 = 167936, run#2 = 167936
 * - make_feats.pl --> Success
 * - makeTrain.pl -t switchboard 30hr/test --> wav folder size run#1 = 180224, run#2 = 180224
 * - make_feats.pl --> Success
 * Success


 * Results from testing genTrans.pl (v13) & make_feats.pl --> Works
 * - genTrans.pl -e switchboard 30hr/test --> wav folder size run#1 = 159744, run#2 = 159744
 * - make_feats.pl --> Success
 * - genTrans.pl -d switchboard 30hr/test --> wav folder size run#1 = 167936, run#2 = 167936
 * - make_feats.pl --> Success
 * - genTrans.pl -t switchboard 30hr/test --> wav folder size run#1 = 180224, run#2 = 180224
 * - make_feats.pl --> Success
 * Success


 * Results from testing makeTrain.pl (v4) & generateFeats.pl (v3) --> Success
 * - makeTrain.pl -d switchboard 30hr/test --> Works
 * - generateFeats.pl -t --> Works
 * - makeTrain.pl -e switchboard 30hr/test --> Works
 * - generateFeats.pl -t --> Works


 * Making the $flag optional in both scripts
 * Results from testing makeTrain.pl (v5) & gentrans.pl (v15) --> Success
 * - makeTrain.pl switchboard 30hr/test --> Both success
 * - makeTrain.pl switchboard 30hr/train --> Success --> wav folder size run#1 = 1040384


 * Full train creation using makeTrain.pl (v5), gentrans.pl (v15), and generateFeats.pl (v3)
 * - makeTrain.pl -d switchboard 30hr/test --> success
 * - generateFeats.pl -t --> success
 * - makeTrain.pl -e switchboard 30hr/test --> wav file size run#1 = 180224...This is the size of 30hr/test/train.trans
 * - repaired genTrans.pl (v15) to path to the correct trans file with flag
 * - generateFeats.pl -t --> test cancelled
 * Attempt #2 - makeTrain.pl -e switchboard 30hr/test --> wav file size run#2 = 159744 --> Success
 * - generateFeats.pl -t --> Success


 * Plan:
 * Email Capstone and state that I will be updating the documentation tonight to reflect the changes to the scripts.


 * Concerns:
 * Errors in the script

Thursday 14 (Night), 2016

 * Task:
 * Create Soft links from prepareTrainExperiment to makeTrain
 * Create Soft links from generateFeats to genFeats


 * makeTrain.pl (v5) Move to software repository root directory
 * genTrans.pl (v15) Move to software repository root directory
 * genFeats.pl (v3) Move to software repository root directory


 * Update Wiki documentation


 * Results:
 * Made soft links from prepareTrainExperiment to makeTrain
 * Made soft links from generateFeats to genFeats
 * Updated the Train Setup page to reflect the changes to the scripts. Nothing much really changed for running an existing train other than the name and the addition of advanced features for special use cases.
 * Updated Scripts Documentation to reflect changes to scripts themselves.
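A minimal sketch of the renaming soft links described above, done in a scratch directory (the real links live in the software repository on Caesar; directory names here mirror the log).

```shell
# Scratch area standing in for the software repository.
mkdir -p /tmp/link_demo && cd /tmp/link_demo
mkdir -p makeTrain genFeats

# Old names point at the new directories, so existing habits and docs keep working.
# -n replaces an existing symlink instead of descending into the target directory.
ln -sfn makeTrain prepareTrainExperiment
ln -sfn genFeats  generateFeats

readlink prepareTrainExperiment   # prints: makeTrain
```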


 * Plan:
 * Wait for feedback on the scripts and begin working on another script.


 * Concerns:

Saturday 16, 2016

 * Task:
 * Perform some work on scripts


 * Results:
 * I got the script working. Jonas informed me about how something works, and the solution we have will need to be altered in another way to achieve the desired results.


 * Plan:
 * Share results with group members and see if they want to collaborate this week.


 * Concerns:

Sunday 17, 2016

 * Task:
 * Perform research on a function within Sphinx to better understand how it works.


 * Results:
 * I believe I found some good information. I've compiled my findings into a document for others to review.


 * Plan:
 * Inform Jonas and group members about my findings and wait for a response back.


 * Concerns:
 * The section of Sphinx I was researching was not what we were looking for.

Wednesday 13, 2016

 * Task:
 * Troubleshoot addExp.pl v1
 * Perform research into using Perl to access https websites


 * Results:
 * I found the website for LWP::UserAgent 5.837, the documentation for the Perl module Morgan Gaythworpe used to access the Wiki. The version of LWP::UserAgent installed on Caesar is 5.833, but that version isn't on the site.
 * I ran the following command to see what modules are installed on Caesar. I found the command here.

perl -MFile::Find=find -MFile::Spec::Functions -Tlwe 'find { wanted => sub { print canonpath $_ if /\.pm\z/ }, no_chdir => 1 }, @INC'
 * I then searched through the list and went to the locations mentioned in the documentation that LWP::UserAgent references. It does appear we have the dependencies required to access HTTPS using Perl.


 * I figured out the issue. The creation of the cookie needs to be passed a port number; it was being passed port 80 for HTTP, so I changed it to port 443 for HTTPS. With the change I was able to successfully create master experiment 0290 on Foss.


 * Plan:
 * Inform Tom and Kevin that the script works.


 * Concerns:
 * I had my doubts about my knowledge of MediaWiki and Perl. Luckily Morgan commented his code well enough that it made for easier troubleshooting.

Wednesday 20, 2016

 * Task:
 * Start some tasks for team Iron Man
 * Research stuff for team Iron Man
 * Make plans to work with James Schumacher and Jon Shallow
 * Make sure addExp.pl v2 is working
 * Send email to Peter Ferro about makeTest.pl v15


 * Results:
 * Task started
 * Researched
 * Sent an email stating my availability to Jon and James
 * addExp.pl was working when I tried it again on Caesar while at UNHM, as you can see by the new experiment named 0292 on foss.unh.edu.
 * Tried it at home on the VPN --> Worked
 * Tried it at home through CISUNIX.UNH.EDU --> Worked
 * Conclusion --> Not sure what the error was this morning and last night, but it may have something to do with the repairs to DNS done by Michael Jonas.


 * Plan:
 * Going to email Kevin Soucey to see what his findings are on addExp.pl when he tested.
 * Wait for responses on emails and continue research.


 * Concerns:

Saturday 23, 2016

 * Task:
 * Work on makeTest.pl v15 with Peter Ferro
 * Create genFeats.pl v7
 * Create addExp.pl v3
 * Create addExp.pl v4


 * Results:
 * makeTest.pl v15 functions as intended by generating soft links to the source experiments model_parameters folder.
 * genFeats.pl v7 now takes an optional argument to allow for a source experiment argument to be passed thus allowing feats to be generated while still using makeTest.pl v15.
 * addExp.pl v3 now takes two flags -r or -s. -r will utilize the createWikiExp sub-routine allowing the user to generate the root experiment location on Foss. -s will utilize createWikiSubExp sub-routine allowing for the user to generate a sub experiment under the desired root experiment on Foss.
 * addExp.pl v4, refactored the login portion to be called by default outside of the two sub-routines since the code was redundant.
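A sketch of the addExp.pl v4 refactor described above, rendered as shell for illustration (the real script is Perl). The idea: hoist the shared login out of both sub-routines and run it once before dispatching on the -r/-s flag. All function names and values here are stand-ins.

```shell
# Stand-ins for the real Perl sub-routines.
login()           { echo "logged in"; }
create_root_exp() { echo "created root experiment $1"; }
create_sub_exp()  { echo "created sub experiment $1"; }

flag="-r"; exp="0292"

login    # shared step, no longer duplicated inside each sub-routine
case "$flag" in
  -r) create_root_exp "$exp" ;;   # root experiment on Foss
  -s) create_sub_exp  "$exp" ;;   # sub experiment under a root
esac
```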


 * Plan:
 * In regards to makeTest.pl v15 and genFeats.pl v7 - wait for a response from Professor Jonas about the email that Peter Ferro sent. Although both scripts are set up and ready to be put into production we would like Professor Jonas to weigh his opinion on the matter after our concerns were expressed. In the end Professor Jonas will be the one that utilizes these scripts so it is his decision.
 * In regards to addExp.pl v3 - Both sub-routines have been tested and work, although the script needs substantial refactoring, as much of it is redundant. For example, there is no reason both sub-routines need to contain the connection logic to Foss; I would propose we move the connection into a sub-routine named makeConnection that is called within the previously mentioned sub-routines.
 * In regards to addExp.pl v4 - decided to not create another sub-routine and instead just call the login implicitly before the calling of the sub-routines to create the root or sub experiments.
 * Email Kevin Soucey with the changes I've made.

 * Concerns:
 * None

Sunday 24, 2016

 * Task:
 * Work on makeTrain.pl v6
 * Create genTrans.pl v16


 * Results:
 * makeTrain.pl v6: Allow the call to pruneDictionary.pl to accept a variable named $trans_id that would pass either the _decode.trans or _train.trans to pull the required data from.
 * genTrans.pl v16: Will now generate a _decode.trans and _decode.fileids based on whether the flag is used in the argument or not. If a flag is used then the user is generating a decode experiment so the naming gets relayed that way.


 * Plan:
 * Wait for iterative feedback on the script alterations and resolve any errors that may be caused by them.


 * Concerns:
 * None

Tuesday 26, 2016

 * Task:
 * Work on addExp.pl v5


 * Results:
 * Completed addExp.pl v5. Added error control for root experiment detection and sub experiment detection, and refactored the login functionality of the script.


 * Plan:
 * Emailed Kevin Soucey... Emailed Tom Rubino to relay the progress of addExp.pl v5 in class for me tomorrow.


 * Concerns:
 * Have not gotten a lot of communication from some people in capstone which makes it difficult to collaborate on projects.

Wednesday 27, 2016

 * Task:
 * Get caught up with the merger that occurred today in the class that I missed.
 * Start systematically decoding the experiments that are tainted in 0289. This would be all of our unseen decode experiments.
 * The experiment planned will be using the AM and LM of 0289/019 to decode on the unseen transcript /mnt/main/corpus/switchboard/30hr/test/train/dev.eval


 * Results:
 * Spent a few hours reading through Captain America's logs; learned some things that I wish we had tried.
 * Started [https://foss.unh.edu/projects/index.php/Speech:Exps_0294_003 0294/003]; the experiment log tells all.
 * Results were not good: WER of 52.8%.
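For context, WER here is the standard word error rate produced by the scoring step: substitutions (S), deletions (D), and insertions (I) counted against the N words of the reference transcript:

```latex
\mathrm{WER} = \frac{S + D + I}{N} \times 100\%
```

Lower is better, which is why the experiments below chase parameter settings that bring this number down.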


 * Plan:
 * Relay my experiments results to team capstone.

 * Concerns:
 * None

Friday 29, 2016

 * Task:
 * Scored 0294/004 single core decode
 * Start 0294/004 multi core decode
 * Start 0294/005
 * Review and use [sp16Decode] module that Jon Shallow created.


 * Results:
 * 0294/004 single core WER = 29.0%
 * All details about 0294/005 can be found here... [0294/005]
 * I reviewed the work that Jon Shallow performed on the creation of a way to alter and add arguments to sphinx3_decode. I found it easy to use and the instructions were straightforward. I used the module to run 0294/005 with the LW set to 25, I confirmed that it did change the LW to the desired amount.


 * Plan:
 * I sent an email to team capstone about the 005 experiment stating the reasoning of it and what steps I did to start 005.
 * Regarding the sp16Decode module, I liked it, and I sent an email in response stating the need to start standardizing run_decode.pl by unifying some of the other versions that are floating around. This could come in the form of extra modules within sp16Decode that give us more utilities while leaving run_decode.pl decoupled from them. I'll wait and see what others think about that suggestion.
 * Wait for decodes to finish for scoring.

 * Concerns:
 * None

Saturday 30, 2016

 * Task:
 * Restart 0294/005 decode (again)...
 * Research how modules work in perl.
 * Create a decode_util.pm module that can be built onto and used.


 * Results:
 * 0294/005 is finally started and set up properly.
 * Modules in Perl have a lot of potential but require things to be set up in a very specific manner to implement using best practices.
 * Jon Shallow and I were creating our own versions of decode_util.pm at the same time. Having seen Jon Shallow's version, I've found that mine is not workable due to the nature of how run_decode.pl is set up.


 * Plan:
 * Consult with people on what we need to do next


 * Concerns:
 * 0294/005 had a WER of 74.5%, which is really bad...

Tuesday 3, 2016

 * Task:
 * Start experiment 0294/006.
 * Test sp16Decode Module on 0294/006.


 * Results:
 * Started 0294/006 on Idefix using 8 cores, which should be done within an hour or so. It tests the LW at 13, the setting CMU states is within the optimal range; this link explains some of it.


 * The link to the experiment is here 0294/006.


 * Plan:
 * If the WER goes down on the unseen data, then I would like to try the Word Insertion Penalty that they state in the link goes along with it.


 * Concerns:
 * Worried the WER will not improve.

Wednesday 4, 2016

 * Task:
 * Talk with Benjamin Leith about how to proceed with the final report for each team.
 * Talk with Jon Shallow on how we can push the run_decode module and prepare it for production usage.
 * Create 0294/007 with the LW = 6 rather than 13 from 0294/006. This will give us a comparison between the values that CMU recommended both the low end and the high end.


 * Results:
 * Ben said we should think about one more 300hr unseen train and decode this weekend to see if we get a good WER with the most optimized settings. I emailed the class to get input on this experiment and to see if anyone wants to take the lead on it. If not, I'll probably start it this weekend and hopefully get a result by the end of Saturday.
 * Jon is taking the lead with altering run_decode to not need any parameters and moving it to /mnt/main/scripts/user/. I'll make sure to create the version control folders once he states everything is ready for code review.
 * Waiting for the results of the decode.


 * Plan:
 * Wait for the class to weigh in on the 300hr train/decode; until then, work with Ben on the final report for both Captain America and Iron Man.
 * Wait for Jon
 * Wait for decode results


 * Concerns:
 * None, because sp16 is awesome.

Friday 6, 2016

 * Task:
 * Score 0294/007
 * Research Word Insertion Penalty
 * Email team capstone with results


 * Results:
 * WER was 59.8% with the LW set to 6, a 6.7% increase from when it was set to 13.
 * Found an interesting forum posting about WIP here, and a post about the difference between Sphinx3 and Sphinx4 methods for handling WIP here. The second link gives good insight on where to find how the WIP is calculated into the equation with the Language Weight (LW).
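As a hedged note on how these two knobs interact (this is the conventional formulation for Sphinx-style decoders; consult the linked posts for the exact sphinx3 details): the language weight scales the language-model log-probability, while the word insertion penalty is applied once per hypothesized word, roughly

```latex
\log \mathrm{score}(W) \;\approx\; \log P_{\mathrm{AC}}(O \mid W) \;+\; \mathrm{lw} \cdot \log P_{\mathrm{LM}}(W) \;+\; N_W \cdot \log(\mathrm{wip})
```

where $N_W$ is the number of words in hypothesis $W$. This is why a change to the LW (as in 0294/006 and 0294/007) shifts the balance that the WIP then fine-tunes.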


 * Plan:
 * Email team capstone with the results of 0294/007 and go from there.


 * Concerns:
 * None

Saturday 7, 2016

 * Task:
 * Respond to Meagan's email
 * Transfer the Experiment groups final report to the Wiki that Kevin, Peter, and Meagan worked on


 * Results:
 * Meagan emailed me with a revised copy of the final report. I liked it and responded.
 * Formatted the final report to mesh with the formatting that the rest of capstone decided to move forward with. Added the documentation section to explain where to find the work that we performed.


 * Plan:
 * Begin writing the documentation for running unseen decodes for the 2017 capstone class.


 * Concerns:
 * None

Tuesday 10, 2016

 * Task:
 * Work on the final team report.


 * Results:
 * Plan:
 * Wait for others to look at the final report and see what their thoughts are.


 * Concerns:
 * Unsure what table we should use to portray our experiment results.
