Speech:Spring 2017 Jacob Sprague Log



Week Ending February 7, 2017

 * Task:

2/3 - Log into Caesar and get to know the addExp.pl command.

2/5 - Checking in.

2/7 - Log into Caesar, create sub experiment 001 on the wiki, and try to understand how to run a train.

 * Results:

2/3 - Logged in. Had trouble using addExp.pl because I was entering my Caesar password instead of my AD password. Worked with Nick to create a root experiment on the wiki and on Caesar.

2/5 - Checking in.

2/7 - Created the sub experiment on the wiki and did some reading on trains and on running the experiment on Caesar.

 * Plan:

2/3 - Add sub experiments next time.

2/5 - Checking in.

2/7 - Learn what the master data corpus and the user-generated sub-corpora are.

 * Concerns:

2/3 - All is well.

2/5 - Checking in.

2/7 - A little confused about how to run the makeTrain.pl command.

Week Ending February 14, 2017

 * Task:

2/8 - In class we went over what our group needed to do and got some guidance from Jonas. Our first task was to remove the Wildcat domain option from the wiki login page as well as from one of the scripts. We also need to finish filling out our proposal, since that is likely where our day-to-day tasks will come from.

2/12 - Checking in.

2/13 - Checking out the wiki and any updates by my group.

 * Results:

2/8 - Vitali quickly removed the Wildcat domain option from the wiki login page. As a group we dove right into the addExp.pl script and removed the option to specify the login domain, so that it automatically treats you as an AD user.

2/12 - Checking in.

2/13 - Checking in.

 * Plan:

2/8 - Take a look at the tasks that we need to complete as a group, and finish up the proposal.

2/12 - Checking in.

2/13 - Checking in.

 * Concerns:

2/8 - The overwhelming number of scripts we need to understand.

2/12 - Checking in.

2/13 - Checking in.

Week Ending February 21, 2017

 * Task:

2/15 - In class we looked over our draft proposal and took some suggestions from Jonas. We looked into makeTrain, makeDecode, copyTrain, and copyDecode to get a better understanding of them, so that we can document them more clearly and make them easier to use in the future. Our main goal is to make the lives of people running experiments easier! I helped Cody test his change to the addExp.pl script, which automatically creates a sub experiment when a new root experiment is created. We also looked into some miscellaneous wiki items and some Perl commands that we can use.

2/17 - Check logs.

2/19 - Today I ran an experiment from start to finish: a 30-hour train, which took roughly 2 hours. I got a better idea of what the scripts are doing, and we finally have experiment files in our sub experiment directory that we can probably look at next class.

2/21 - Check logs.


 * Results:

2/15 - Made good progress on addExp.pl and did some research and code reading on the other scripts we need to work with. I wanted to see whether we could make addExp.pl automatically create the experiment directories on Caesar, but that didn't happen. Decided that I would run a train over the weekend to get a better grasp of the process.

2/17 - Check logs.

2/19 - 30-hr train executed. Sub experiment directory populated.

2/21 - Check logs.


 * Plan:

2/15 - Run the train, read up on the scripts and Perl.

2/17 - Check logs.

2/19 - Go through the files in the sub experiment directory alongside the wiki to understand them. Possibly do this in class as a group; I'll suggest the idea.

2/21 - Check logs.


 * Concerns:

2/15 - There are quite a few existing scripts which I know nothing about. Also, we are going to want to clean out the Exp folder soon.

2/17 - Check logs.

2/19 - The experiment ran well, but it would have been good to know how to log off while it was running, which I'm 99% sure you can do.
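A common way to keep a long train running after logging off is to start it under `nohup` and put it in the background with `&`. A minimal bash sketch; the function name `run_detached` and whatever command it is handed are hypothetical stand-ins for however the train is actually launched:

```bash
#!/usr/bin/env bash
# Sketch: run a long job so it keeps going after you log out.
#   run_detached LOGFILE COMMAND [ARGS...]
# nohup makes the command ignore the hangup signal sent at logout,
# all output goes to LOGFILE, and & puts the job in the background.
run_detached() {
    local log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"   # PID of the background job, for checking on it later
}
```

For example, `run_detached train.log ./run_train.sh` (a hypothetical script name) prints the job's PID and returns immediately; after logging back in, progress can be followed with `tail -f train.log`.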

2/21 - Check logs.

Week Ending February 28, 2017

 * Task:

2/22 - Look at some documents on the wiki and figure out a plan for the week.

2/26 - Checking Logs.

2/27 - Today my task was to figure out exactly what the two scripts copyTrain and copyDecode need to do. I spent time working out what these scripts should be doing and concluded that they should copy specific parts of an experiment (train files for copyTrain, decode files for copyDecode).

2/28 - Build a solid base for writing the copyTrain.pl and copyDecode.pl scripts.

 * Results:

2/22 - Looked at some documents on the wiki and figured out a plan for the week.

2/26 - Checking Logs.

2/27 - Got a solid understanding of copyTrain/copyDecode. Got to know makeTrain.pl in more depth and determined that the copy scripts will need to update paths, because the experiment files use absolute paths to reference each other. Looked into more Perl commands to get a better grasp of the language, specifically the commands used in the scripts I'm working with.

2/28 - I was able to create and test a very generic copy script that invokes the Unix command for copying files and folders recursively. Everything is set up and ready to be expanded upon, which I'll discuss in the plan section.
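The generic copy step can be sketched in bash as follows. This is an illustration, not the actual copyExp.pl: the function name `copy_exp` and its checks are assumptions, and it relies on the standard `cp -R` (recursive) with `-p` (preserve modes and timestamps).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a generic experiment copy:
#   copy_exp SOURCE_DIR DEST_DIR
copy_exp() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "copy_exp: source '$src' is not a directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "copy_exp: destination '$dest' already exists" >&2
        return 1
    fi
    # -R copies the whole tree; -p keeps permissions and timestamps.
    cp -Rp "$src" "$dest"
}
```

Refusing an existing destination matters because `cp -R src dest` copies `src` into `dest` when `dest` is already a directory, which would nest the experiment one level too deep.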

 * Plan:

2/22 - Make the copy scripts.

2/26 - Checking Logs.

2/27 - Make a generic copyExp.pl script as a base for both the copyTrain and copyDecode scripts and have it ready for 3/1.

2/28 - The next step is to add functionality to the copy script to update the links (file paths) in the .cfg files so that the experiment files know how to reference each other after copying. Another near-term goal is to copy specific parts of the experiment file system, so that files from the training process and files from the decode process can be copied separately using flags in the script.

 * Concerns:

2/22 - None.

2/26 - Checking Logs.

2/27 - I'm concerned that there will be a lot of small details involved in making sure the experiment files can still reference each other after being copied.

2/28 - It could be annoying to find all the links to files that need to be changed during the copy process.

Week Ending March 7, 2017

 * Task:

3/1 - In class, continue working on copyTrain.pl, with main focus being updating links in config files.

3/3 - Checking in.

3/5 - Checking in.

3/7 - Finish up the updating of links in the .cfg files.


 * Results:

3/1 - Learned about the Unix command sed, which is used to modify, add, or remove lines in a file. Two files contained values that needed updating: sphinx_train.cfg and sphinx_decode.cfg. The sed command we used takes a flag for the edit type and a substitution expression containing a regex. The tricky part is putting file paths in the expression, because the substitute command uses forward slashes to separate its arguments (i.e., s/findTxt/replaceTxt/), and since the paths we are modifying also contain slashes, it took some work to make sed happy. We left off by adding an escape character (a backslash) before each slash in the paths.

3/3 - Checking in.

3/5 - Checking in.

3/7 - Abandoned the escape-character idea because Nick and I learned that you can use any character as the delimiter in a substitution expression. So we simply used s:findTxt:replaceTxt: instead of s/findTxt/replaceTxt/, and it worked nicely. The next issue was the quotation marks inside the line we were modifying, which we solved with some character escaping. Another issue was the spaces inside the string we were replacing. Spaces are fine in a regex, but the shell splits command arguments at spaces, so sed received an expression that was cut off at the first space. We fixed this by placing the expression inside single quotes. Finally, in the second file, sphinx_decode.cfg, the line that needed replacing was very similar, but it used single quotes instead of double quotes, which caused a lot more trouble than it should have, since a single quote cannot appear inside a single-quoted shell string. When we finished, we had successfully updated the links in both files.
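A small bash sketch of the two substitutions described above. The line formats (`$CFG_BASE_DIR = "...";` in sphinx_train.cfg and `$DEC_CFG_BASE_DIR = '...';` in sphinx_decode.cfg), the `etc/` location, and the function name `update_cfg_paths` are assumptions for illustration. The techniques are the ones from the log: `:` as the delimiter so slashes in paths need no escaping, the expression quoted so the shell keeps its spaces together, and a double-quoted expression for the line that itself contains single quotes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: point the copied configs at the new sub experiment.
#   update_cfg_paths EXP_DIR OLD_DIR NEW_DIR
# Config variable names, quoting, and file locations are assumptions.
update_cfg_paths() {
    local exp_dir="$1" old_dir="$2" new_dir="$3"

    # Train config: path in double quotes. The sed expression is single-quoted
    # (so the shell does not split it at spaces), with the two shell variables
    # spliced in between the quoted pieces. ':' is the s delimiter, so the '/'
    # in the paths need no escaping. \$ makes sed match a literal '$'.
    sed -i 's:\$CFG_BASE_DIR = "'"$old_dir"'":$CFG_BASE_DIR = "'"$new_dir"'":' \
        "$exp_dir/etc/sphinx_train.cfg" || return 1

    # Decode config: path in single quotes, which cannot appear inside a
    # single-quoted shell string, so this expression is double-quoted instead.
    # Inside double quotes, \\\$ reaches sed as \$ (literal '$'), and \$
    # becomes a plain '$'.
    sed -i "s:\\\$DEC_CFG_BASE_DIR = '$old_dir':\$DEC_CFG_BASE_DIR = '$new_dir':" \
        "$exp_dir/etc/sphinx_decode.cfg"
}
```

`sed -i` with no suffix is GNU sed behavior (BSD and macOS sed expect `-i ''`), and the old path is still read as a regex, so a character like `.` in it would match any character.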


 * Plan:

3/1 - Finish up updating the file references in the cfg files.

3/3 - Checking in.

3/5 - Checking in.

3/7 - Rename the files that have the experiment number in their names (e.g., 001_train).


 * Concerns:

3/1 - Still need to decide how to split copyTrain from copyDecode, and which files need to be copied for each.

3/3 - Checking in.

3/5 - Checking in.

3/7 - Whether there are more links that need to be updated, and testing the train and decode after a full copy to make sure everything still works.

Week Ending March 21, 2017

 * Task:

3/8 - We need to rename some files that are copied over because they contain references to the old sub experiment number.

3/18 - Check logs.

3/19 - Check logs.

3/21 - Continue renaming files.


 * Results:

3/8 - Since a lot of files need to be renamed, we will need a Unix or Perl command that does this recursively through the file system. We researched the best way to do this, but as of today we have not found a good one.

3/18 - Check logs.

3/19 - Check logs.

3/21 - Nick found a command and tested it on his Ubuntu virtual machine, but unfortunately it relies on a command that is not native to Unix, so we can't use it as our solution.


 * Plan:

3/8 - Keep looking for a command that does what we need.

3/18 - Check logs.

3/19 - Check logs.

3/21 - Keep looking for a solution that recursively renames files in a directory and all its subdirectories using find and replace.
 * Concerns:

3/8 - Not sure whether it's smarter to do this with Unix commands executed from our Perl script, or with Perl's built-in File library.

3/18 - Check logs.

3/19 - Check logs.

3/21 - Same as before.

Week Ending March 28, 2017

 * Task:

3/23 - We need to rename the files that have the old sub experiment number in their names so that they use the destination sub experiment number.

3/25 - Check logs.

3/27 - Check logs.

3/28 - Set up the flag functionality to copy only the specified files.


 * Results:

3/23 - After hours of experimenting with many different Unix commands, we finally found one that worked for recursively renaming files using find and replace:

$cmd = "find $dest -name \'*$sourceSub*\' -type f -exec bash -c \'mv \"\$1\" \"\${1/$destSub/$sourceSub}\"\' -- {} \\;";

find - the command used to locate every folder and the files it contains.

-name - precedes the pattern that find matches file names against, which is built from the variable $sourceSub.

-type f - restricts the search to files; we later run the exact same command with -type d for directories.

-exec - starts a separate process for each file found.

bash -c ... -- {} \; - for each file found, runs a bash command (with the file's path passed in as $1) that renames it with mv, using a find and replace between the source and destination sub experiment numbers.
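Two details of the command above are worth double-checking. In bash, `${1/pattern/replacement}` replaces the first match of `pattern` with `replacement`, so `${1/$destSub/$sourceSub}` turns the destination number into the source number rather than the other way around; and `$1` is the whole path, so the first match can land in a parent directory instead of the file name. A hedged sketch of the same find-and-rename idea that edits only the last path component, goes from old number to new, and handles files and directories in one pass (`rename_sub` and its arguments are hypothetical names):

```bash
#!/usr/bin/env bash
# Sketch: below DEST, rename every file and directory whose name contains OLD
# so that the first OLD in the name becomes NEW.
#   rename_sub DEST OLD NEW
rename_sub() {
    local dest="$1" old="$2" new="$3"
    # -mindepth 1: never rename DEST itself.
    # -depth: list a directory's contents before the directory, so renaming a
    #         parent never breaks a path that is still waiting to be renamed.
    # -exec ... {} +: hand the matches to bash in batches, in that order.
    find "$dest" -mindepth 1 -depth -name "*$old*" -exec bash -c '
        old=$1 new=$2
        shift 2
        for path in "$@"; do
            dir=${path%/*}      # everything before the last "/"
            base=${path##*/}    # the name itself
            mv -- "$path" "$dir/${base/"$old"/$new}"
        done
    ' _ "$old" "$new" {} +
}
```

For example, `rename_sub /path/to/new/sub 001 002` (placeholder path) would turn `001_train` into `002_train` anywhere below that directory, leaving the directory's own path untouched.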

3/25 - Check logs.

3/27 - Check logs.

3/28 - Added functionality for the -t and -d flags so that the script looks for specific directories. I believe the decode section only creates three new files, so it is pretty simple. I was also thinking today about our approach to this script, and whether there could have been a smarter way to do it. What would be the benefits of simply copying over the bulk of the experiment data while regenerating the configuration files? Configuration details that would otherwise be lost would have to be noted and copied, but it would simplify quite a lot of what we had to do in a rather brute-force way. Depending on whether the script works as planned, I would like to explore this option.
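Flag handling like this is commonly done with the shell's built-in `getopts` (Perl's `Getopt::Std` plays the same role in a .pl script). A hedged bash sketch; the function name `select_dirs` and both directory lists are placeholders, not the actual split between train and decode files, which the log does not spell out:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose which parts of an experiment to copy.
#   -t  copy the training directories
#   -d  copy the decode directories
#   (neither flag: copy both)
# The directory names below are placeholders, not the real experiment layout.
TRAIN_DIRS=(etc feats model_parameters)
DECODE_DIRS=(DECODE)

select_dirs() {
    local opt want_train=0 want_decode=0
    local OPTIND=1   # reset so the function can be called more than once
    while getopts "td" opt; do
        case "$opt" in
            t) want_train=1 ;;
            d) want_decode=1 ;;
            *) echo "usage: select_dirs [-t] [-d]" >&2; return 1 ;;
        esac
    done
    if [ "$want_train" -eq 0 ] && [ "$want_decode" -eq 0 ]; then
        want_train=1
        want_decode=1
    fi
    # Print one directory per line for the copy step to consume.
    [ "$want_train" -eq 1 ] && printf '%s\n' "${TRAIN_DIRS[@]}"
    [ "$want_decode" -eq 1 ] && printf '%s\n' "${DECODE_DIRS[@]}"
    return 0
}
```

With neither flag the sketch selects everything, so the default behavior matches the generic copy.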


 * Plan:

3/23 - Make sure

3/25 - Check logs.

3/27 - Check logs.

3/28 - Test script.
 * Concerns:

3/23 - The files in the feats folder still contain instances of the source sub experiment number. The number appears in those names only by coincidence and they are not meant to change, yet given the recursive nature of the command they probably should have been renamed, and they were not.

3/25 - Check logs.

3/27 - Check logs.

3/28 - Script seems to be finished but there will probably be a number of unforeseen issues before it is really completed.

Week Ending April 4, 2017

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 11, 2017

 * Task:

4/9 - Check Logs.

4/10 - Check logs.


 * Results:

4/9 - Check Logs.

4/10 - Check logs.


 * Plan:

4/9 - Check Logs.

4/10 - Check logs.
 * Concerns:

4/9 - Check Logs.

4/10 - Check logs.

Week Ending April 18, 2017

 * Task:

4/16 - Check logs.

4/17 - Check logs.

4/18 - Begin looking into making one compact script responsible for creating an experiment, training, and decoding.


 * Results:

4/16 - Check logs.

4/17 - Check logs.

4/18 - Created a template and description for a script that serves this purpose, called createExp.pl. My goal is to make it a quick way for advanced users to create an experiment while still allowing customization. The main issue is that since it combines so many different steps, it will require a lot of command-line parameters, which will be confusing, prone to typos, and not something the user wants to deal with. Another option is to take input while it runs, which is much nicer, but it prevents the user from simply starting the script and walking away while it finishes. My team members and I will decide on an approach later. A third option is to make three separate scripts, one for each of the main experiment-creation stages (train, LM, decode), which would still compact things nicely.


 * Plan:

4/16 - Check logs.

4/17 - Check logs.

4/18 - Figure out direction to go with this script and how to organize it.
 * Concerns:

4/16 - Check logs.

4/17 - Check logs.

4/18 - Again, I'm not sure of the best way to build this script, since it will require a lot of parameters.

Week Ending April 25, 2017

 * Task:

4/23 - Check logs.

4/24 - Check logs.

4/25 - Make progress on the createExp script.

4/26 - Finish up the train part of createExp.

 * Results:

4/23 - Check logs.

4/24 - Check logs.

4/25 - Edited the script to remove a library that wasn't available on the server. That library was used to change the current directory to a user-supplied directory where the experiment files would be created. To work around this, I tried to make the script callable globally, from any directory, but I couldn't get that working tonight. I have confirmed that the script otherwise works as intended, except for the directory it is supposed to run in.

4/26 - Changed the script to use a hard-coded path to the sub experiment directory for now; it now correctly changes to that directory, or creates it first if it doesn't exist. Had an issue where genFeats.pl wasn't working correctly, but that appears to be fixed now. I haven't had an opportunity to run the nohup script that finalizes the training process, but I'm 95% sure it will work fine. Tested with a 5-hour train. I still need to figure out the most user-friendly way to run the script (since convenience is the real reason behind it), but currently it reads its inputs (directory, train size) one line at a time.
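The "create it, then change to it" step can be sketched with `mkdir -p`, which succeeds whether or not the directory already exists and also creates any missing parents. A minimal bash sketch; the function name `enter_exp_dir` and its path argument are placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: move into the sub experiment directory, creating it
# first if needed.
#   enter_exp_dir DIR
enter_exp_dir() {
    local dir="$1"
    # mkdir -p is a no-op for an existing directory and creates missing parents.
    mkdir -p -- "$dir" && cd -- "$dir" || {
        echo "enter_exp_dir: cannot use '$dir'" >&2
        return 1
    }
}
```

Because `cd` only affects the process it runs in, this works for the steps inside the script itself, but it cannot change the directory of the shell the user launched the script from.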

 * Plan:

4/23 - Check logs.

4/24 - Check logs.

4/25 - Figure out how to call the script directly from a sub experiment folder, or find another way to change to the correct sub experiment directory.

4/26 - Find the most user-friendly way to use the script, and work on getting the LM and decode parts working.

 * Concerns:

4/23 - Check logs.

4/24 - Check logs.

4/25 -.

4/26 - How to combine ~10 different commands, some requiring multiple inputs, into one and make it user-friendly.

Week Ending May 2, 2017

 * Task:

4/30 - Check logs.

5/1 - Check logs.

5/2 - Add most of the code to run the make language model and make decode scripts.

 * Results:

4/30 - Check logs.

5/1 - Check logs.

5/2 - Filled out the method stubs for makeLM and makeDecode with all(?) of the commands needed to run the scripts correctly. I still need to test and finish up a few things tomorrow morning.

 * Plan:

4/30 - Check logs.

5/1 - Check logs.

5/2 - Finish up script and test by creating a full experiment with all the parts working.

 * Concerns:

4/30 - Check logs.

5/1 - Check logs.

5/2 - Still need to find the best way to make this script user-friendly and convenient.

Week Ending May 9, 2017

 * Task:

5/6 - Check logs.

5/7 - Check logs.

5/9 - Make script more user friendly.


 * Results:

5/6 - Check logs.

5/7 - Check logs.

5/9 - Added a nicer way to get user input that doesn't require the user to stay and wait for the script to ask for more parameters.
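One common way to avoid mid-run prompts is to ask every question up front and only then start the long-running steps. A minimal bash sketch of that collection step; the function and variable names are hypothetical, the two prompts mirror the inputs mentioned on 4/26 (directory, train size), and the whole-number rule for train size is an assumption:

```bash
#!/usr/bin/env bash
# Sketch: ask every question before any long-running step starts.
# Sets EXP_DIR and TRAIN_HOURS for the later steps (names are hypothetical).
collect_params() {
    # -r keeps backslashes literal; -p prints the prompt (bash shows it only
    # when input comes from a terminal).
    read -r -p "Sub experiment directory: " EXP_DIR || return 1
    read -r -p "Train size in hours: " TRAIN_HOURS || return 1
    # Validate now, while the user is still at the keyboard.
    if [ -z "$EXP_DIR" ]; then
        echo "collect_params: directory must not be empty" >&2
        return 1
    fi
    case "$TRAIN_HOURS" in
        '' | *[!0-9]*)
            echo "collect_params: train size must be a whole number" >&2
            return 1 ;;
    esac
}
```

Because every answer is checked before any train, LM, or decode step starts, the rest of the run can proceed unattended.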


 * Plan:

5/6 - Check logs.

5/7 - Check logs.

5/9 - END OF SEMESTER!
 * Concerns:

5/6 - Check logs.

5/7 - Check logs.

5/9 - ...