Speech:Spring 2015 Zachery Boynton Log



Week Ending February 3, 2015

 * Task:
 * 1/31
 * Read logs/scripts to get a better idea of what we're going to be doing this semester.
 * Started doing a bit of research on Perl and Unix.
 * 2/1
 * Played around more with Perl, did not have a chance to read much more on the project.
 * 2/2
 * Did some reading on running trains, creating language models, and running decodes.
 * 2/3
 * Talked with my group about how to split up the work and the best way to go about creating a schedule.


 * Results:
 * 1/31
 * The logs give some indication as to the scope of work we will be doing but not much in terms of context and specifics (not totally unexpected).
 * Got "Hello World" working in Perl inside a VirtualBox session.
 * 2/1
 * Got more basic scripts working, starting to understand the flow of a couple of the project scripts a bit more.
 * 2/2
 * I feel I have a better understanding of the exact processes we will be performing.
 * 2/3
 * We are not entirely sure of a lot of the specifics surrounding our project, but we feel we have a good idea of where to start.


 * Plan:
 * 1/31
 * I will likely be focusing more on the 'Information' section in my readings tomorrow to get more familiar with the technical aspects of the project.
 * Need more time to get comfortable with Perl and Unix. This will be a large focus over the next couple of days.
 * 2/1
 * Still need to do more reading and see if I can start drafting some comments/cleanup for the scripts.
 * 2/2
 * See above.
 * 2/3
 * We plan on meeting tomorrow to try and gauge the amount of time each task will take to the best of our ability.
 * Concerns:
 * 1/31
 * I think all of the scripts are un-commented. Seriously.
 * 2/1
 * N/A
 * 2/2
 * Though I feel I do have a better understanding of the general processes, I am still unclear on many of the specifics.
 * 2/3
 * We are afraid that our limited exposure at this point to the required tasks will make it very difficult to project completion times for each part of the project.

Week Ending February 10, 2015

 * Task:
 * 2/8
 * Read logs
 * 2/9
 * Read through logs/documentation
 * Spoke with my group a bit about how best to go about constructing a schedule
 * Played around with Unix
 * 2/10
 * Logged into Caesar from home
 * Did a bit of browsing to acquaint myself with the structure of the folders
 * Read over the documentation on running trains now that I have more context with the environment


 * Results:
 * 2/9
 * Density and Senone values seem significant, but I can't find anything in either the logs or the Sphinx documentation that indicates what these are.
 * Getting more comfortable with Unix commands
 * 2/10
 * I feel much more comfortable using the system and am ready to start running trains


 * Plan:
 * 2/9
 * I'll have to either do more digging or see if Professor Jonas can give me some direction
 * I'm going to switch one of my computers to run Linux so that it becomes more intuitive
 * We will probably meet before our COMP725 class tomorrow to get a more solid handle on how our schedule should look
 * 2/10
 * Will start running trains tomorrow


 * Concerns:
 * 2/9
 * The Sphinx documentation seems to assume a certain level of knowledge that we do not have, while the logs are very high-level in their process descriptions.
 * 2/10
 * Though I have a fair grasp of the procedure for running trains, there are a couple of specifics I need confirmation on.

Week Ending February 17, 2015

 * Task:
 * 2/14
 * Read logs
 * 2/15
 * Read logs
 * 2/16
 * Am going to attempt to follow Sam's instructions to run my own train and ensure that I can get it to work.
 * Created experiment page on wiki.
 * Created project directory.
 * Ran RunAll.pl. Train appears to be running. Anticipating failure in slave_confg.pl, which is known to be problematic.
 * 2/17
 * Apparently, the process for running a train still needs to be fleshed out a bit. The scripts are an absolute mess.

 * Results:
 * 2/16
 * Train appeared to finish running without errors, but ended with this information, which I'm not sure about:

Can not create models used by Sphinx-II.
If you intend to create models to use with Sphinx-II models, please rerun with:
$ST::CFG_HMM_TYPE = '.semi.' or
$ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'

 * Sam's log mentions creating a model, but this appears to be after the train has already been run.


 * Plan:
 * 2/16
 * I am unsure of whether or not my results are typical and am unclear on how to proceed. I will have to speak to Sam for some clarification.
 * 2/17
 * Talk to Sam and Garret tomorrow and figure out how we're going to clean these scripts up.


 * Concerns:

Week Ending February 24, 2015

 * Task:
 * 2/21
 * Discussed the proposal with Garret and got a better idea of how we want to organize our task list.
 * 2/22
 * Wrote the rest of our task list.
 * 2/23
 * Read logs
 * 2/24
 * Read through Sam's logs to stay current on a conceptual basis.


 * Results:


 * Plan:
 * 2/21
 * Will draft a version of the proposal a bit later tonight or tomorrow morning for the other groups to review at their leisure.
 * 2/23
 * Talk to Sam tomorrow about drafting a new train tutorial, or at the least figure out what steps he is taking so I can do it myself.
 * 2/24
 * Still need to speak to Sam. Tomorrow, we will attempt to get a more concrete idea of how to proceed as a group.


 * Concerns:
 * 2/24
 * Sam was not in class today, and we still don't have an up-to-date set of instructions on how to run a train. That makes it difficult to get much context on the issues with the latter parts of the project.

Week Ending March 3, 2015

 * Task:
 * 2/28
 * Read logs
 * 3/1
 * Read logs
 * 3/2
 * Scanned the following link to evaluate the merits of semi-continuous models: http://www.inference.phy.cam.ac.uk/is/papers/baseline_wsj_recipes.pdf
 * Sam asked me to run a 125hr train
 * 3/3
 * Experiment 0264 is redundant. I'm going to delete the contents and attempt to use it to run a 125hr train. I will post the results here as they are available.


 * Results:
 * 3/2
 * The link states that continuous models provide a lower error rate than semi-continuous, but all of the numbers cited in the article are of semi-continuous models. There is no data presented on a continuous model for comparison. I looked at the cited online sources and none of them seem to have any of this data either. I'm not entirely sure what that's about.
 * 3/3
 * Hit a permissions error when trying to generate a transcript.


 * Plan:
 * 3/2
 * Continue researching continuous models and see if I can find any hard numbers that confirm the conclusion of the article.
 * Clear up the sub-experiment structure (and possibly add this to the documentation somewhere), clean up experiment directory 0264, and run a 125hr train.
 * Concerns:
 * 3/2
 * I am not entirely sure of how sub-experiments are meant to be structured as I cannot find any documentation on it.

Week Ending March 10, 2015

 * Task:
 * 3/4
 * Waiting on Data Group to fix file references before running trains
 * Worked with Garrett to fix the appearance of the Experiment logs
 * 3/7
 * Read logs
 * Took a look at GenTrans11.pl to get a feel for how it functions. Did some browsing on Caesar to check what paths it references.
 * 3/8
 * Read logs
 * 3/9
 * Worked on fixing GenTrans11.pl with Garrett today. Changed a path reference to ../full/train/audio/.. from ../full/audio/..
 * Results:


 * Plan:
 * 3/7
 * Going to start actively debugging GenTrans11.pl either tonight or tomorrow.
 * 3/9
 * Going to test GenTrans11.pl tonight
 * Concerns:

Week Ending March 24, 2015

 * Task:
 * 3/21
 * Read logs.
 * Read a bit about speech recognition science.
 * 3/22
 * Re-designated experiment 0269 as a 'start-up' experiment for the inexperienced members of our group to run their first experiments and sent out an e-mail explaining this.
 * 3/23
 * Read logs.
 * 3/24
 * Server still down. Going to try doing some research and see if anything stands out.


 * Results:
 * 3/21
 * Found out some interesting stuff about speech recognition, but nothing that seems immediately helpful.


 * Plan:
 * 3/21
 * Going to run a 5-hr train to make sure I can get it to work.
 * Going to do more reading.
 * Concerns:
 * 3/22
 * Servers are down. Probably has to do with the campus move.

Week Ending March 31, 2015

 * Task:
 * 3/27
 * Ran a 5hr train under 0264/001 and successfully decoded
 * Communicated with Adam to figure out what parameter we should try testing (details in group log)
 * Successfully prepared a 256hr train. Changed proper settings and ran generateFeats.pl
 * 3/28
 * Copied the phonelist from a previous experiment and re-ran. Seems to be running fine.
 * Stephen's train apparently also had an issue during Phase 7. Looked around his experiment and nothing obvious stood out to me.
 * 3/30
 * The TIME attribute of the training script doesn't seem to be updating. The process may be dead.
 * Looks like there are some authentication issues with Caesar.
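
One way to tell whether a training process has stalled is to sample its CPU time twice and see whether it has advanced. The sketch below assumes the TIME value is in the `ps` style `[[dd-]hh:]mm:ss`; the helper names `cpu_seconds` and `looks_stalled` are made up for illustration and are not part of the project scripts.

```python
def cpu_seconds(time_field: str) -> int:
    """Convert a ps-style TIME value ([[dd-]hh:]mm:ss) to whole seconds."""
    days = 0
    if "-" in time_field:
        # An optional day count precedes the clock part, e.g. "2-00:00:10".
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in time_field.split(":")]
    # Pad missing leading fields so "05:30" is read as 0 h, 5 min, 30 s.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def looks_stalled(earlier: str, later: str) -> bool:
    """True if no CPU time was accumulated between the two samples."""
    return cpu_seconds(later) <= cpu_seconds(earlier)
```

A process that is blocked on disk or network I/O also accumulates no CPU time, so an unchanged TIME value is a hint that something is wrong rather than proof that the process is dead.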

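 * Results:
 * 3/27
 * 5hr train:

,-----------------------------------------------------------------.
|                         hyp.trans                               |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
|=================================================================|
| Sum/Avg | 3506  42940 | 74.9   18.5    6.6   17.1   42.2   94.1 |
|=================================================================|
|  Mean   | 43.8  536.8 | 75.3   18.6    6.1   19.6   44.3   95.0 |
|  S.D.   | 20.2  247.5 |  7.4    5.8    2.8    9.9   12.6    6.3 |
| Median  | 40.0  486.0 | 76.1   17.5    5.4   17.5   42.9   97.1 |
`-----------------------------------------------------------------'
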
 * 3/28
 * 256hr train failed at RunAll.pl:

Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once Something failed: (/mnt/main/Exp/0273/001/scripts_pl/00.verify/verify_all.pl)
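
The Phase 7 message describes a two-way check between the phones used in the transcript and the phones in the phonelist. A minimal sketch of that idea follows; it is not the code in verify_all.pl, and the function name `check_phones` and the word-to-phones dictionary layout are assumptions made for illustration.

```python
def check_phones(transcript, dictionary, phonelist):
    """Compare phones used by the transcript against the phonelist.

    transcript: iterable of utterance strings (space-separated words)
    dictionary: hypothetical mapping of word -> list of phones
    phonelist:  iterable of phone names
    Returns (missing, unused): phones used but not listed, and phones
    listed but never used.
    """
    used = set()
    for utterance in transcript:
        for word in utterance.split():
            # A word absent from the dictionary raises KeyError here.
            used.update(dictionary[word])
    listed = set(phonelist)
    missing = sorted(used - listed)
    unused = sorted(listed - used)
    return missing, unused
```

With a blank phonelist, every phone the transcript uses ends up in `missing`, so a check of this kind would be expected to fail in that situation.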


 * Plan:
 * 3/29
 * I'm going to do some investigating on Stephen's experiment.
 * Concerns:
 * 3/28
 * Phonelist is blank? Not sure if I should just manually insert them, or if something else may have messed up too.
 * 3/29
 * There may be an issue in the setup process for 256hr trains.

Week Ending April 7, 2015

 * Task:
 * 4/4
 * Read logs
 * 4/5
 * Checking on my train from last week. Going to start another if it failed.
 * 4/6
 * Starting to decode my train, only 1000 audio files at the moment.

 * Results:
 * 4/5
 * Last week's train did complete.
 * 4/7

|=================================================================|
| Sum/Avg | 1000  16473 | 72.0   20.9    7.1   14.5   42.5   97.1 |
|=================================================================|
|  Mean   | 45.5  748.8 | 72.8   20.6    6.5   16.1   43.3   97.4 |
|  S.D.   | 21.1  342.5 |  7.1    5.3    2.8    8.2   11.0    2.6 |
| Median  | 39.0  628.5 | 73.4   19.4    5.6   13.8   39.7   97.5 |
`-----------------------------------------------------------------'

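
As a sanity check on the Sum/Avg rows of the two decode summaries in this log, the Err column should equal Sub + Del + Ins, and Corr should equal 100 minus Sub and Del, since all are percentages of the reference word count. The short sketch below redoes that arithmetic; the function names are illustrative only.

```python
def word_error_rate(sub: float, dele: float, ins: float) -> float:
    """Err (%) = substitutions + deletions + insertions, all in percent."""
    return round(sub + dele + ins, 1)


def percent_correct(sub: float, dele: float) -> float:
    """Corr (%) = 100 - substitutions - deletions, all in percent."""
    return round(100.0 - sub - dele, 1)


# Sum/Avg rows copied from the log: (Corr, Sub, Del, Ins, Err)
rows = {
    "5hr train, 3506 sentences": (74.9, 18.5, 6.6, 17.1, 42.2),
    "1000-sentence decode": (72.0, 20.9, 7.1, 14.5, 42.5),
}

for label, (corr, sub, dele, ins, err) in rows.items():
    consistent = (word_error_rate(sub, dele, ins) == err
                  and percent_correct(sub, dele) == corr)
    print(label, "consistent" if consistent else "inconsistent")
```

Because insertions count as errors but are not subtracted from Corr, Corr and Err do not add up to 100 in either summary.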


 * Plan:
 * 4/5
 * Going to start another train if the last one failed.
 * May start some 5hr trains with some other parameters.
 * 4/7
 * Going to decode another couple thousand files.
 * Going to try some other values
 * Concerns:

Week Ending April 14, 2015

 * Task:
 * 4/10
 * Decoding another 1,000 files on Traubadix to check its speed.
 * 4/11
 * Getting ready to copy my train into multiple directories to test different tunings for the decode at once.
 * 4/12
 * Looking into altering the decode script so I can run multiple decodes with different settings on the same train at once.
 * Was initially going to alter the decode script to allow me to run multiple decodes on a train at once, but things became a bit too complicated. Opting to copy my train to multiple sub-experiments.
 * 4/14
 * Checked my simultaneous decodes. They all seem to be running on the same configurations for some reason. Going to do more investigating.
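
When copied sub-experiments all appear to decode with the same settings, one quick thing to rule out is that the configuration files in the copies are still byte-identical. The sketch below hashes one config file per sub-experiment directory and groups the duplicates; the directory layout and the config file name passed in are assumptions, not the project's actual structure.

```python
import hashlib
from pathlib import Path


def config_digests(sub_experiments, config_name):
    """Map each sub-experiment directory to the SHA-256 of its config file."""
    digests = {}
    for directory in sub_experiments:
        data = (Path(directory) / config_name).read_bytes()
        digests[str(directory)] = hashlib.sha256(data).hexdigest()
    return digests


def duplicated_configs(digests):
    """Return groups of directories whose config files are byte-identical."""
    groups = {}
    for directory, digest in digests.items():
        groups.setdefault(digest, []).append(directory)
    return [sorted(dirs) for dirs in groups.values() if len(dirs) > 1]
```

If no duplicates turn up, another possibility worth checking is that the decode script reads its settings from a path shared by all the copies, which a comparison like this cannot detect.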


 * Results:


 * Plan:
 * 4/11
 * I think two decodes on Traubadix and one on Caesar is a good goal. If it looks like more members of my group are setting up to run trains, I will dial it back a bit.
 * Concerns:

Week Ending April 21, 2015

 * Task:
 * 4/17
 * Read logs
 * 4/19
 * Read logs
 * 4/20
 * Began running some 5hr trains to verify an issue with decoding
 * Experiment page can be found here: http://foss.unh.edu/projects/index.php/Speech:Exps_0278#Description
 * 4/21
 * Confirmed that this is a problem. Going to attempt to figure out where the issue is.


 * Results:


 * Plan:


 * Concerns:

Week Ending April 28, 2015

 * Task:
 * 4/24
 * Read logs
 * 4/25
 * As per Professor Jonas' suggestion, began looking into the language model.
 * 4/26
 * Read logs


 * Results:
 * 4/25
 * I found some documentation on the language model toolkit, but it doesn't seem terribly useful for this point in the semester. Still, though, this may be something to note for future semesters.
 * http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html


 * Plan:


 * Concerns:

Week Ending May 5, 2015

 * Task:
 * 5/1
 * Spoke with my group about including some vital information in the report that hadn't been considered yet.
 * 5/2
 * Testing a specific setting on 5hr trains with some control variables to infer its effect on a 256hr train.
 * 5/3
 * Working on re-running the decode behind our lowest result on a drone to measure its real-time factor.
 * 5/4
 * Got the first draft of the report from Kenneth. Going to review it tomorrow.


 * Results:
 * 5/1
 * With this newly considered information, I'm pretty sure our report is going to be airtight. I think we've got this in the bag.


 * Plan:


 * Concerns: