Speech:Spring 2017 Gregory Tinkham Log



Week Ending February 7th, 2017

02-Feb

Task
From our first team meeting we were tasked with organizing our group (Tucker, Alex, Vitali, and me), finding modes of communication within our group as well as with the other groups, and starting to gather ideas for our rough draft proposal due next week. By the end of class we were also tasked with having Vitali run our first experiment, 0295.
Results
Yesterday was our first class meeting; we split off into our modelling group and saw where everyone stands in terms of modelling experience and familiarity with the system. We also set up our mode of correspondence (Slack.com).
Vitali and I have a small amount of prior educational experience in data modeling, so as a group we were able to start discussing possible changes to acoustic modelling techniques and practices, such as data sampling and acoustic model parameter tweaking.
Vitali started our experiment (0295) after class, and it has carried into today.
Plan
From our initial meeting we set a deadline of Saturday night to come up with initial ideas for our rough draft. Areas we are actively looking into include Linear Discriminant Analysis (LDA) with Maximum Likelihood Linear Transform (MLLT) for feature transformations, vocal tract length normalization (VTLN) to improve the model's adaptation to different speakers, boosted Maximum Mutual Information (bMMI) models, speaker adaptive training, and maximum likelihood linear regression (MLLR) along with its constrained variant (CMLLR). All of these would have large implications for the other groups in terms of data (data group), dependencies (tools & systems groups), and how experiments are run (experiment group).
We will look to finish our first experiment in the next few days.
We will also transfer the knowledge of our first experiment to the next group and guide them through the process.
Concerns
There are no major concerns for now. There are plenty of opportunities for our modelling group to explore this semester in terms of searching for new modelling techniques which could possibly have large implications for the other groups as well. This will be expanded on in our proposal and further logs.


04-Feb

Task
Complete our rough draft proposal; start other experiments once either the other groups have all completed their first experiments or the drones have been set up for experiments.
Results
I have outlined a general "goals" section.
Vitali has completed our first test.
Research has been done into parameter tweaks for both the current language and acoustic models, as well as other options for both model types.
Plan
Continuing work on our rough draft proposal through the weekend into next week.
Narrow down our points of interest.
Concerns
My only concern is finding the right balance: enough work to make good progress both this semester and next, while keeping tasks doable within the semester.


06-Feb

Task
Continue collaborating on Slack toward completing our rough draft.
More specifically, create a timeline of actions and also look into the possibility of creating an environment in which we can test many model types.
Results
Progress is being made on the proposal. I think our team is building a solid understanding of speech modelling.
Plan
Continue work on the proposal. Finish and refine the details.
Concerns
Same concerns as the previous log: finding the right fit for our proposal and timeline. There are many possibilities; a good problem to have.


07-Feb

Task
We are currently still finishing our rough draft proposal.
Start thinking about how to accomplish the next tasks that will be laid out in the proposal. This will involve speaking with the other groups to discuss ideas and starting to run more experiments.
Results
The proposal is nearly complete for our group. I think the goals and other things we'd like to accomplish are clearly laid out, and there seems to be a viable path to reach them.
I think we've done a great job, in relation to past years' modelling groups, of laying out specific action items and quickly finding areas of interest to explore. Last year's group giving a heads-up about different acoustic models contributed to this, and it shows how much good documentation and looking to the future can help in an environment where personnel turnover is one hundred percent.
Plan
The plan for now is to continue to nail down the details of our proposal over the next twenty-four hours.
More communication must be made with the other groups to fill them in on our findings and proposed ideas. This includes the different software we have looked at (for the tools group), the possibility of using GPUs with the scipy libraries (which would involve both the tools and systems groups), and some clarification/discussion about the cleanliness of the data we're using, as well as looking into more exploratory analysis, as I point out in my last concern below.
Concerns
I am less concerned now that we have nailed down action items for the semester. They include tweaking both the language and acoustic models currently in use, researching and implementing a recurrent neural network model to use as the language model, and utilizing linear discriminant analysis feature transformations in the acoustic model. The concern here is the timeline. Do we set up exploratory model building with the current acoustic and language models to look at all possibilities of parameter tweaking?
Along with the above point, I am a little concerned about throwing too much work at the other groups based on our ideas. Our ideas require the systems and tools groups to research and set up the environment for these tools.
One more general concern I still have is the lack of exploratory analysis done originally in this Capstone project. This would include statistics about the data itself and visualizations of the features in the model. It is something we could explore with LDA transformations: if you graph the features before and after the transformation, you can see the improved class separation the model relies on when assigning classes based on the feature values. A sketch of the kind of plot I mean follows.
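Here is a minimal sketch of that before/after plot, using scikit-learn on synthetic stand-in data (the real input would be our 39-dimensional acoustic feature vectors; everything here is illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
n_classes, n_per_class, n_dims = 3, 200, 39
# Synthetic stand-in for 39-dimensional feature vectors, one cluster per class.
X = np.vstack([rng.randn(n_per_class, n_dims) + 2 * rng.randn(n_dims)
               for _ in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Project down to 2 discriminative dimensions.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(X[:, 0], X[:, 1], c=y)          # first two raw feature dimensions
ax1.set_title('Raw features (dims 0-1)')
ax2.scatter(X_lda[:, 0], X_lda[:, 1], c=y)  # the LDA-projected features
ax2.set_title('After LDA')
plt.show()

The classes should overlap heavily in the left panel and separate much more cleanly in the right one, which is exactly the property the acoustic model benefits from.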

Week Ending February 14, 2017

09-Feb

Task
At yesterday's class meeting we received new tasks. Some of these were things we came up with in our proposal and some were things we thought of in the class meeting.
Investigate Recurrent Neural Network Language Models. Jonas gave me this video to watch [1]. The tools group will look at the necessary actions for implementation.
Linear Discriminant Analysis is embedded in the Sphinx toolkit, but Python and specific packages such as scipy and numpy are needed to run it. We will investigate the status of this on Caesar.
Gather results and information on the parameter values that last year's modelling group used to get their best models.
Results
We finalized the action items above from yesterday's class meeting.
Our proposal is now 90% done.
Established a baseline for our semester to work from. In other words, move on to LDA and RNNLM.
Status of RNNLM: the toolkit is supported by Sphinx but needs a C compiler.
Plan
In the coming days we will look to divide the workload of the tasks above.
For me, I will look into the RNNLM toolkit and neural networks in general. They're a step up in complexity compared to other models.
Gather information on LDA and learn why it will actually help improve the model. From my understanding, LDA is used as a dimension reducer, which allows for easier classification of an utterance from its 39 features.
Gather information from last year on their best model run and the parameters they used.
Concerns
Our proposal timeline is still not completed, although we have developed some tasks for the near future.
Finding the best way to divide up work.
The status of the drone machines and therefore the availability to run experiments. Right now we are sharing Caesar with other groups.

11-Feb

Task
Tasks are still the same as before. Vitali has taken up recreating last year's model. Some issues with testing the model on unseen data have come up.
I have been focusing this portion of the week on understanding the RNNLM toolkit and neural networks in general.
Results
We have started to divide up the tasks for this week.
I have gained a better understanding of neural networks and recurrent neural networks.


Plan
Drill down from the general education on neural networks to the RNNLM toolkit specifically, and how it improves over the current language model.
Do the same for Linear Discriminant Analysis.


Concerns
My only concern for this week is still planning out the long-term tasks and timeline of the semester.
Being able to communicate this new knowledge to the other groups.
The status of the other machines.


13-Feb

Task
Tasks are still the same for the rest of this week. This week's action items are sizable.
I am currently trying to help Vitali with figuring out testing on unseen data.
Results
I watched a great tutorial on neural nets built in Python using the same packages that the RNNLM toolkit uses, along with doing a lot of other reading on neural networks.
Group members are working on individual tasks.


Plan
Finish testing on new data. See if this matches up with last year's best work.
Documenting this work will be crucial, as it is the next step to creating better models. We will now be creating models to use in the real world, not just on their own training data. This will be crucial when comparing results from new parameter tweaks, other toolkits, and model add-ons or changes.
Try out the lda_train.py script in the experiment scripts folder to see if there is an error and to find out what will happen.


Concerns
Still planning out concrete action items for the week to come.
Laying out the implementation for the RNNLM toolkit.
Formulating this past week's results into a status update.


14-Feb

Task
Research Linear Discriminant Analysis and why it is useful.
Research the RNNLM toolkit: how it is used, why it's useful for language models (and neural networks in general), and what is needed for the toolkit.
Replicate last year's results.
Results
Linear Discriminant Analysis
- Linear discriminant analysis is a dimension reduction technique that transforms features into a smaller-dimensional space. Because of this, the classes used in the model are more easily separable, and the model can more accurately predict parameters. The improvement over plain Gaussian mixtures can be thought of, in a general sense, as changing the distribution of the data so that the tails of the different Gaussian distributions are not mixed.
- I tried to run lda_train.pl from within the scripts_pl directory of sub-experiment 004 in 0295. I received an error about a missing requirement at line 49; this line holds the path for the Sphinx config file. This will need to be looked into further.
- There is also an option in the sphinx_train.cfg file to turn on LDA and MLLT transformations. This should also be looked into (see the excerpt below).
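For reference, the switch appears to live in sphinx_train.cfg as something like the following (an excerpt from memory, so the exact variable names and defaults should be confirmed against our copy):

$CFG_LDA_MLLT = 'yes';     # enable the LDA/MLLT feature transform
$CFG_LDA_DIMENSION = 29;   # feature dimensionality after the transform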
Recurrent Neural Network Language Model Toolkit
A great video series explaining neural networks and how to code one in Python can be seen here [2]; it acts as a good base before looking into the RNNLM toolkit.
Recurrent neural networks are neural networks that can look backwards over the word history. A great paper on the toolkit can be found here [3], and this FAQ on the toolkit is extremely helpful [4].

From watching this video [5] I was able to gather the small tips below on using the toolkit. The toolkit also implements n-gram models, the kind currently being used for our language models. (A toy sketch of the recurrence itself follows the list.)

- More hidden nodes are needed for more data, though they decrease performance (training speed).
- It uses a cross-entropy error cost function. Increasing the hidden layer size adds performance boosts as well.
- Tweaking L2 regularization does not make much of a difference. Could this be different when combining with acoustic models?
- Training is done on only a single core (does this matter if neural nets do better with more data? Are these models big enough for that to have a significant impact?).
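To ground the "looking backwards" idea for myself, here is a toy Elman-style RNN language model step in numpy (my own illustration, nothing to do with the toolkit's internals): the hidden state h carries context from every previous word, unlike a fixed-window n-gram.

import numpy as np

V, H = 1000, 50                  # vocabulary size, hidden layer size
rng = np.random.RandomState(0)
U = rng.randn(H, V) * 0.01       # input (one-hot word) -> hidden weights
W = rng.randn(H, H) * 0.01       # previous hidden -> hidden weights
O = rng.randn(V, H) * 0.01       # hidden -> next-word output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(word_id, h_prev):
    # The new hidden state mixes the current word with the carried context.
    h = np.tanh(U[:, word_id] + W.dot(h_prev))
    p_next = softmax(O.dot(h))   # distribution over the next word
    return h, p_next

h = np.zeros(H)
for w in [3, 17, 42]:            # a toy word-id sequence
    h, p_next = step(w, h)

Training (backpropagation through time) is what the toolkit's -bptt option controls; this sketch only shows the forward recurrence.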
Recreating Last Year's Results
As a group we have had trouble finding the exact parameters last year's group used to create their best model.
However, we have found exactly where to tweak our acoustic model parameters in sphinx_train.cfg, so now we can start to build on our own ideas.
One issue Vitali ran into was running our model on unseen data. This raises questions about the language model seeing new words (and possibly the acoustic model too). A small thought I had: take a giant language model, like one used by Google or Microsoft, and blend ours into it to tailor it to phone conversations. Something to look into; a sketch of the blending idea follows.
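The standard way to blend two language models is linear interpolation of their probabilities: p(w) = lam * p_domain(w) + (1 - lam) * p_general(w). A toy sketch of the idea (made-up unigram tables, hypothetical words):

# Toy interpolation of a big general LM with our phone-conversation LM.
p_general = {'hello': 0.02, 'invoice': 0.001, 'uh': 0.0005}
p_domain  = {'hello': 0.03, 'invoice': 0.0001, 'uh': 0.02}

def interpolate(word, lam=0.7):
    # lam would be tuned on held-out transcript data.
    return lam * p_domain.get(word, 0.0) + (1 - lam) * p_general.get(word, 0.0)

for w in ('hello', 'uh'):
    print(w, interpolate(w))

The general model covers words our transcripts never see, while the domain model keeps conversational fillers like 'uh' appropriately likely.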


Plan
Create action items in tomorrow's class meeting to divide the work further, so we gain deeper knowledge and more usable information. If we spread ourselves across too many things, we'll each take longer to gain a deep enough standing to actually make decisions.
Coordinate with the tools group on the RNNLM toolkit. Possibly look into the lda_train.pl script and brainstorm on the error.
Come up with a timeline for parameter testing on the models we're using now.


Concerns
Getting all of the information I have learned down into a written format. I think this really calls for a Visio-like map with explanations.
Laying out concrete goals on all of the gathered information for the next 13 weeks.

Week Ending February 21, 2017

16-Feb

Task
From yesterday's team meeting we will be focusing on our own tasks starting this week. The main course of action the modelling group will take this semester is implementing LDA for the acoustic models and recurrent neural networks for language models. Tucker and Vitali are going to work on LDA while Alex and I focus on RNNLM.
For this week the tasks are to establish the baseline experiment and start to work on installing the necessary packages (numpy and scipy) for LDA to work.
Add to the proposal a task of mapping out the current state of our research and results, and what the world-class baseline is. This would be a great exercise to map out in a Visio-type format with links describing the different processes (different models, techniques, data, etc.).
Results
We have been having issues running decodes on our drone machine. It may be a path issue within the decode scripts, but something else odd is going on as well: in order to run sclite we have to state its path explicitly. We're troubleshooting that now.
Beyond these issues, I believe our group will be comfortable creating new experiments and moving on to the real task of improving the models.
Plan
Finish our proposal for next week. I will be putting dates on our tasks in the next two days for the rest of the group to review, and then most likely for Jonas to review. This is a hard part because a lot of our tasks rely on other groups. To split up tasks more efficiently, LDA will be implemented first, which leaves Alex and me to work on improving the models. These roles will switch when Alex and I work on RNNLM and Tucker and Vitali work on improving the models. There will be significant overlap between these, though, because we must all stay aware of what the other members are doing in order to stay current.
So, for this week we plan to establish a baseline experiment that tests on unseen data on our drone machine. Moving beyond that would be ideal.
Concerns
Planning out action items down the line.
Making sure tasks are reasonable in their timelines.
The issues we're currently having on our drone machine are quite troubling at this point. I'd like to be past the point of just testing on unseen data and on to running some different model tests.

17-Feb

Task
Same tasks as before. I've updated our proposal for the group to review and the timeline as well.
Focus on debugging the decode in experiment 0295/004. Run more experiments if possible.
Results
Updated proposal is waiting to be reviewed again by Jonas.
I found a log of an experiment from last year's modelling group that hit the same error as me (segmentation fault, core dumped) when trying to decode. I also checked my decode.log file and noticed a path issue with the run_decode.pl script. I changed the path to where sphinx3_decode is on Idefix (our drone machine) but still saw the same message in the log.
Plan
I'm going to look through the other experiments to see what the logs look like throughout the process to try and compare with mine.
Start looking into the transcripts to gain a better statistical understanding of them: distributions of cutoff/filler words, the implications a better language model would have on using all of the data, etc.
Concerns
The only concern I have right now is getting comfortable with running trains that test on unseen data. I'm skeptical of the documentation on this side of training and decoding. Either way, from this point forward, testing on unseen data is the only kind of experiment that should be done.

18-Feb

Task
Tasks are still the same for this week.
Results
I have only made progress on the proposal: I fixed some of the date issues to align them better with the other groups.
I'm still receiving the same path-type error when trying to run a decode.
Plan
Possibly look into other scripts from past years where I know experiments were done on a drone machine.
Email Jonas to have another look through our proposal.
Concerns
Fixing this error in time to run some more experiments this week.

21-Feb

Task
The current tasks are installing the numpy and scipy libraries on Idefix, fixing the problems causing errors in the decoding process, and making sure the team as a whole has their parts of the proposal set.
Fix the issues stopping us from completing our decodes, which in turn stop us from establishing our baseline model for the semester.
Results
The proposal is now complete. We were able to clearly define and plan out our goals for the rest of the semester. I also think we have laid out our goals and timeline as the modelling group in a way that future semesters can look to when implementing new tools and ideas.
The drones don't all seem to be set up to run experiments all the way through.
Plan
The plan for tomorrow is to get a better idea from the other groups about the problems occurring in their experiments. From there we can establish action items to fix the issues and put out a plan to create more models. First, if allowed, finish the unseen-data test on Caesar and then start again on the drone machines.
Concerns
I am very concerned that not only I but a few others on the team have been unable to finish an experiment. Not only were other people unable to get through running a train, they failed at different points than I did. The only successful experiment run this semester has been on training data on Caesar. So, as a team, we need to run an experiment using unseen data on Caesar successfully, then start the process over again on the drone machines. Everyone's work could be affected by this, because without being able to run experiments end to end, nothing we change in terms of data, tools, or scripts can be verified against model results.

Week Ending February 28, 2017

23-Feb

Task
The tasks for this week are as follows:
Install Numpy and Scipy packages on Idefix (Tucker, Vitali)
Install RNNLM toolkit on Idefix (Greg, Alex)
Use settings from experiment 0288 011 to recreate the experiment to act as a baseline for this semester (Tucker, Greg, Alex, Vitali)
Results
I'm currently working through replicating last year's model for testing on unseen data. While we as a group have been slow to get to this point, I think we'll make a lot of progress once we get the process down (and document it correctly). Vitali's and John Schallow's (2016) logs are going to be a big help. I'm very familiar with splitting up training and test data, even with complicated mechanisms to ensure data quality, but how it is done here seems a bit convoluted.
We were having trouble as a group running experiments on Idefix earlier in the week, but I think we are in the clear. All we needed to do was copy /usr/local over from Caesar to Idefix; without it, the training ran fine (because the paths are hardcoded) but the language model building would fail.
Plan
Finish running my experiment tomorrow, debug the errors if they occur.
Look into what it will take to get the RNNLM toolkit onto Idefix. This will involve some collaboration with my teammates to see what the best solution is. Beyond that, if it's a simple enough install, I'll look into how we could incorporate the toolkit into what we use now; in other words, what changes to the transcripts we need to make in order to use the toolkit, and what changes to its output language model we need in order to run decoding.
It would be very exciting to have both of these tasks accomplished soon so that as a group we can start running different models to experiment with.
Concerns
At this point the only concern I have is finishing up the unseen experiment. Beyond that, I don't see anything holding our group up for very long at all.
Looking into the inputs and outputs of the RNNLM toolkit could reveal some unforeseen issues. That's why I have made it a high priority for the next day or two, so that I can bring up any major concerns I find.

24 Feb

Task
My main task at this point is to recreate the testing on unseen data experiment.
Results
I am nearing the last few steps but am receiving an error when I try to score (segmentation fault, core dumped). I think my issue is in copying over the experiment (train and language model), as I was having trouble finding the train.trans file when trying to score.
Plan
Continue working through my errors, which are different than the ones that Vitali had previously.
Concerns
Finishing up this test. I think it will be pretty smooth sailing once the initial run is done and a working concept is in order. We have already gone through and understand the tweaks that we can make throughout the process.

25 Feb

Task
Continue with the process of figuring out testing on unseen data. I ran into numerous issues throughout the first half of this week trying to get the 0295/004 model to test in the 0295/009 experiment.
Investigate the requirements for placing the RNNLM toolkit onto Idefix.
Results
I was finally able to get the decode to run in 0295/009. If all goes well, this will act as the starting point for the rest of the semester. The problem was that I was not using the right flag when calling the makeTest.pl script: I was using -d when I should have been using -t. This also means that using the -d flag does not produce the train_fileids file needed for genFeats.pl; I will have to look into this more as well. This coincides with what I found in John Schallow's (2016) log, that the 30hr training transcript contains everything needed for a test.
Plan
Continue scoring this 0295/009 experiment tomorrow. If all goes well, I will document the changes that need to occur in the wiki to better reflect, and more explicitly tell, what goes on during the process. The process should be tailored toward testing on unseen data, not trained data, from this point forward. This will also set the stage for building different language models in some experiments and different acoustic models in others, then copying them into a new directory we can test on.
Concerns
Finding the correct process for our group to install the RNNLM toolkit. Also, we could be in for a few surprises as far as the toolkit's model requirements and output go. Although these toolkits use standard practices, there could be subtle differences we need to be aware of.
I'm excited to be able to start putting some real changes into the models and testing them.


28 Feb

Task
Based on the results of the past few days the task at this point is to complete a few different trains on some new models.
These models will include parameter tweaks and build off of the baseline that I completed over the weekend.
Results
The test on unseen data over the weekend was a success. After getting past the flag issue with the makeTest.pl script and copying over the train.trans file from the training experiment (0295/004), the decode ran well. So, the steps to complete a test on unseen data (on 30 hours of data) are as follows:
0. Run addExp.pl to create the experiment.
1. Run a training experiment on training data the way you normally would. (The documentation here is confusing, as it gives a step to make a train for test data, but that should be included in the "run test on decode" instructions.)
2. Build the language model as you normally would.
3. Create a new directory for your unseen data test.
4. Follow the instructions on "decoding on unseen data". You will need to use the -t flag instead of the -d flag.
5. The instructions after this command list out how to do by hand what the makeTest.pl script does. These should be hidden by default or moved to a separate "makeTest.pl errors" page.
6. CD into the etc folder.
7. Run "genFeats.pl -d".
8. Run "nohup run_decode.pl <train exp> <test exp> <senone count> &"
9. Transform the decode.log file into a hyp.trans file by running "parseDecode.pl decode.log hyp.trans"
10. Copy over the <Exp#>/etc/<Exp#>_train.trans file from the training experiment.
11. Run "sclite -r <exp#>_train.trans -h hyp.trans -i swb >> scoring.log"
Along with the results I got, as a group we were able to install many packages this week: Numpy, Scipy, GCC, and the RNNLM toolkit were all installed. So, we finished all of our tasks for this week.


Plan
The plan is to create a new training model tonight with different parameters. Along with critiquing the above instructions for testing on unseen data, we need as a group to map out the model as it stands now and the changes being made to the parameters (and the effects they may have on it), and to think about the directory structure for running tests on unseen data (changing the instructions to treat this as the standard).
Concerns
There are no major concerns right now. Linking up the RNNLM toolkit into our process and maintaining a good routine of constant experiments is something I'm looking forward to figuring out.

Week Ending March 7, 2017

March 2

Task
The tasks for this week include:
Implement LDA on the Idefix drone machine (Tucker, Vitali)
Build language model as a proof of concept (Greg, Alex)
So, my job this week will be to help build a language model as a proof of concept through using the RNNLM toolkit.
I also took on the job of retesting our baseline experiment, because we were able to successfully load all of our software onto Idefix last week.
Results
The baseline experiment was successful last week, and building off that baseline I changed initial_num_densities from 1 to 2 and final_num_densities from 8 to 16 (the corresponding config variables are sketched below). This decreased the word error rate from 44% to 43%. The reason our baseline experiment for this semester was not created with the exact same settings as last year's best model is that they were using higher density values than should be used on 30 hours of data, since they were testing on 300 hours of data.
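For the record, here is roughly how those settings look in sphinx_train.cfg (variable names are my reading of the SphinxTrain config, so they are worth confirming against our copy):

$CFG_INITIAL_NUM_DENSITIES = 2;  # was 1
$CFG_FINAL_NUM_DENSITIES = 16;   # was 8; Gaussians per senone mixture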
Plan
The plan for this week is to retest the baseline again by tomorrow night.
I will then work with Alex to build a language model over the weekend using the RNNLM toolkit. I believe building a basic language model with the toolkit will be an easy process; if it is successful, I would like to try building one from our transcripts. From there, the process would be to incorporate it into a train and test.
Somewhere in between the language model building and Vitali & Tucker's usage of LDA in a train, I would like to keep improving the models through more experiments.
Concerns
My only concern for now is managing the transfer of the modelling group's improvements in software and models to the rest of the groups. This will be easier when we split into teams, but it also presents the problem of both groups having to use Idefix if what we have done actually improves results.


March 3

Task
I'm continuing the plan of retesting the baseline after our software upgrades. As a group we got a bit sidetracked analyzing and proposing changes to the experiment directory structure.
After I complete that retest I'm going to focus on building a language model with the RNNLM toolkit.
Results
I have a decode running in the 0295/015 experiment folder for testing on unseen data. It seems to be running without problem and if all goes well will lead to a nice clean process for running experiments as a whole for the rest of the semester.
Plan
After finishing this experiment tomorrow I will attempt a language model using the RNNLM toolkit. If all goes well I will try to incorporate the language model into an experiment.
Run more experiments while changing other parameters. So far we as a group have only tested changing the num_densities, both initial and final. Senone counts would be a good step up from here.
Concerns
No major concerns at this point in the week. Looking forward to using the RNNLM toolkit and utilizing neural networks in the experiments.

March 5

Task
Continue to use the RNNLM toolkit.
Build a model using LDA/MLLT (both are turned on as one switch in the train configuration file).
Results
Training is running okay on the LDA experiment. I used the same initial and final num_densities of 2 and 16. We'll see if LDA drops the WER below 43%. Trying to decode this attempt resulted in an error of an mdef file not found in the decode log. I think this may be due to not building the training model first, then training it again with the LDA feature transformation turned on; the features must be created before they can be transformed in Sphinx.
Plan
Try to fix experiment 017 in the coming days.
Continue work on the RNNLM toolkit, whether it be converting the binary language model into something useful or using it to rescore our current language models.
Concerns
Although we have time, I'm a little worried about the lack of documentation I'm finding around the web on some of these toolkits, sphinx included.
I think it would be helpful if we as a group were running more experiments.

March 6

Task
Work through the two sets of errors we're having running experiments. The first deals with running trains after installing Miniconda: we receive an error in decoding that an mdef file is missing. Looking through the script shows that the config file isn't being pointed to correctly and that some files from the run_all script are not being populated. The second, still ongoing, issue is with the RNNLM toolkit and fitting its new language model into our experiments.
Results
Training stops at the decoding step right now, whether on seen or unseen data. This helps verify that the issue is with Sphinx and not in a supporting script written by someone in Capstone. The proposed culprit right now is the Python installations, although I was able to run trains successfully after them, and there have been no changes to the system since. Somewhere in that line there is an issue.
Plan
To rule out the first issue, Vitali is attempting to run a clean train where he doesn't point Python to Miniconda. This way we can rule out any miscellaneous practices. (A quick interpreter check is sketched after this list.)
Look through the decode script to trace down a possible error for LDA.
Continue to look into converting the binary format of the language model from RNNLM to either ARPA or DMP. This may be accomplished using the CMU toolkit, but it's not totally clear whether that will work.
Try to run a few more experiments as a group this week.
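One small diagnostic that might help pin down which interpreter the scripts actually get (a hypothetical check to run on Idefix, nothing official):

import sys
print(sys.executable)   # e.g. a Miniconda path vs. the system /usr/bin/python
print(sys.version)

If a Sphinx helper invokes "python" somewhere many layers down, this shows exactly which installation answers that call.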
Concerns
I think tracing down this error may be complicated, more complicated than the installs, because it may deal with how Sphinx utilizes Python in its scripts many layers down. This would require a lot of digging.
I'm a little worried about the rest of the capstone team being versed in model training and testing.
Hopefully our group can accomplish our tasks in the next few weeks, possibly ahead of schedule, to allow more time to work within the new teams we will be put on Wednesday and to focus more on running experiments.

Week Ending March 21, 2017

March 8

Task
Implement the RNN Language Model into the 0295/016 decode.
Continue to debug the LDA experiment 0295/017.
Results
Steps to build a Recurrent Neural Network Language Model.

1. Make an LM directory and cd into it.
2. cp -i /mnt/main/corpus/switchboard/30hr/train/trans/train.trans trans_unedited
3. /mnt/main/corpus/switchboard/dist/transcripts/ICSI_Transcriptions/trans/icsi/ParseTranscript.perl trans_unedited trans_parsed
4. Copy in the validation file (FIND VALIDATION FILE).
5. rnnlm -train trans_parsed -rnnlm model.bin -valid trans_valid -hidden 50 -binary
6. Run lm_create.pl

While trying to implement the RNN language model again, Tucker and I ran into an issue: we couldn't get the RNN model to fit into the Sphinx decoder, because Sphinx expects an n-gram model. On top of that, sphinx3 (according to a forum post) has a known bug and cannot decode with unigram models. For this reason we started to look into using RNNLM to rescore the n-gram model's output instead. This would still offer many of the benefits of neural networks but would also require the use of the SRILM toolkit.
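The usual rescoring scheme is to have the decoder emit an n-best list with its n-gram scores, score each hypothesis with the RNN model, and pick the hypothesis with the best combined score. A rough sketch of the combination step (the scorer functions here are placeholders of my own, not toolkit calls):

def rescore(nbest, ngram_logprob, rnn_logprob, lam=0.5):
    # Higher combined log-probability wins; lam would be tuned on held-out data.
    return max(nbest, key=lambda hyp: lam * rnn_logprob(hyp)
                                      + (1 - lam) * ngram_logprob(hyp))

# Dummy scorers standing in for the real models:
def ngram_logprob(hyp):
    return -2.0 * len(hyp)

def rnn_logprob(hyp):
    return -1.8 * len(hyp) - hyp.count('uh')

nbest = [['hello', 'there'], ['hello', 'uh', 'there']]
print(rescore(nbest, ngram_logprob, rnn_logprob))

The appeal is that the decoder itself stays untouched; the RNN only reorders the decoder's candidate transcriptions.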

Steps for combining the n-gram and RNN models (started):

rnnlm -train trans_parsed -valid trans_valid -rnnlm model -hidden 15 -rand-seed 1 -debug 2 -class 100 -bptt 4 -bptt-block 10 -direct-order 3 -direct 2 -binary
Plan
The plan is to look at the requirements for the SRILM toolkit, download it to my own personal laptop, and see what sort of file RNNLM takes from SRILM to use for model combination. The idea would be to combine the n-gram model we currently produce with the RNN model.
As far as the LDA issue is concerned, we have narrowed the problem down to Miniconda. We will need to test which Python version is being used, and at what point of the decode it matters.
Concerns
Figuring out the RNN situation.
Fixing the python situation for LDA to occur.
Aside from these two issues I am very confident in building better models for the rest of the semester, especially if these two tweaks to the system are fully implemented.

March 17

Task
Continue new trains with the number of senones at 4000. This is experiment 0295/018. According to Sphinx, this, along with final_num_densities set at 16, should be the ideal set of parameters for 30 hours of data.
Continue to look into the RNNLM toolkit and the issues we're having as a group with the LDA implementation.
Results
It seems that at this point in the semester it will be better to look into using RNNLM for rescoring via the SRILM toolkit. That toolkit's dependencies are already all on Idefix.
I'm thinking this increase in senone count will bring us to 41%. It's hard to see what other parameters are left to tweak on 30-hour training data before getting to RNNLM and LDA.
Plan
Finish the 0295/018 experiment.
Look at how SRILM works with RNNLM on my virtual machine. See if it reproduces a language model in a format that is easier to use.
Concerns
Metrocast's infrastructure throughout the state of New Hampshire.

March 20

Task
Finish experiment 018. I restarted it to see if I had made any errors in running the train, as this was my first experiment back from break.
Continue working on using the SRILM toolkit with the RNNLM toolkit to retrain n-gram models using recurrent neural networks. This should still offer better results than plain n-gram models, because the recurrent neural network lets the n-gram benefit from longer backwards context. It remains to be seen what the difference in word error rate would be, and whether the improvement would be smaller or larger than using RNNs alone.
Results
Trying to decode 018 on unseen data produced an error in the decode.log file.
Upon further investigation, the train fails on iteration 1 with an error in /mnt/main/Exp/0295/018/scripts_pl/20.ci_hmm/slave_convg.pl.
Plan
Finish decoding experiment 018 tomorrow.
Finish using SRILM and RNNLM together on a virtual machine to see the output of the RNNLM toolkit. This will mostly be to see whether it will just edit the ARPA file created by the CMU toolkit.
The Python issue will have to be resolved before anything else can continue. There's a really good chance this has to do with our Python updates.
Concerns
Things have been slow to pick back up after spring break.

March 21

Task
Successfully run 0295/018 with parameter tweaks.
Continue to work on RNN and LDA.
Results
I successfully ran a train and what looks to be a decode on 018, although I had to change the parameters back to defaults. My guess is that either I did something wrong along the way, or the number of senones I used (4,000) created an error in creating features (maybe creating too many for a script later in the process to use, hence the missing file?).
The same problem persists with the model output from RNN for language models.
Plan
Run another train and decode tomorrow morning into the afternoon to see whether 0295/018 failed because of my error.
Concerns
Getting RNN and LDA to work.
Getting the groups back on track. It would also be nice to see what results we could get with updated transcripts.
I'm not sure what to focus on for the Rebels group as a whole. It seems that at this point, maximizing the tweaking of current parameters would be best if RNN and LDA are a stretch. I think a lot of gain will come from tweaking the n-gram language models, which have not been tweaked in the past.

Week Ending March 28, 2017

March 23

Task
At the modelling group meeting we established the task of using Miraculix to try to get RNN and LDA to work.
The Rebels group will set out individually to run our own trains for now and document our findings.
Results
We have yet to start our upgrading (if needed) of Miraculix. We need to find out whether it runs a 64-bit operating system, so that we can more easily install Scipy and Numpy using wheels and also install the SRILM toolkit to use alongside RNNLM (a quick check is sketched below).
I just finished off experiment 0295/018 with some pretty good results. I changed multiple parameters within the config file, including the number of densities, skip states, and senones.
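For the 64-bit question, a one-liner should answer it (assuming any Python is present on Miraculix; uname -m gives the same answer from the shell):

import platform
print(platform.machine())   # prints 'x86_64' on a 64-bit install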
Plan
I will continue to tweak these, along with convergence ratios and the language model weight. I'm very skeptical about how good the language model can be at this point, as we are dealing with telephone conversations, limited vocabularies, and some pretty messy data, but there do seem to be enough parameters to tweak for the rest of the semester.
This weekend the modelling group needs to get in contact with systems to 1) figure out whether Miraculix is 64-bit, and if not, reinstall Red Hat so that it is, and 2) start working on LDA and RNN on it.
Concerns
I'm still concerned with the utilization of RNN. It is not so much the RNNLM toolkit as how rigid Sphinx seems to be. Ideally we would have RNNLM create an ARPA file, but apparently RNN models cannot be put into ARPA format, because the context they use around a word can be unbounded, according to this paper [6]. This is something I'll be looking into more.


March 24

Task
My task at this point has moved to figuring out how to build n-gram models from RNN models by following these guidelines. I think this would be the ideal situation, as Sphinx cannot use the models from the RNNLM toolkit directly, and installing the SRILM toolkit seems like a lot of effort when we cannot fully take advantage of RNN anyway. These are the steps (quoted from the paper):

Alternatively, one can approximate the RNNLM model by n-gram models. This can be accomplished by following these steps:

• train RNN language model
• generate large amount of random sentences from the RNN model
• build n-gram model based on the random sentences
• interpolate the RNN model approximated by n-gram model with the baseline n-gram model
• decode utterances with the new LM
Results
These are the commands I ran to create an RNN model, generate sentences from it, and swap the generated text in as trans_parsed for building the n-gram model. Note that I used the same transcript to validate this model as I did to build it, so there is some bad practice there; ideally you would want a proper held-out transcript, or to run without validation.
• rnnlm -train trans_parsed -rnnlm model -one-iter -hidden 50 (train the RNN model for a single pass over the transcript)
• rnnlm -gen 1000000 -rnnlm model >> rnn_trans (generate 1,000,000 words of text from the trained model)
• rm -rf trans_parsed
• cp -i rnn_trans trans_parsed (swap the generated text in as the new "transcript")
• ./lm_create.pl trans_parsed (build the n-gram model from the generated text)
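One thing to flag about the commands above: they replace the transcript outright, so step 4 of the paper's recipe (interpolating the approximated model with the baseline n-gram) is effectively skipped. A crude stand-in, assuming we keep a copy of the original parsed transcript as, say, trans_orig (a hypothetical file name), would be to pool the two text sources before running lm_create.pl:

# Pool real transcript lines with the RNN-generated sentences so the
# resulting n-gram sees both; the 3x weighting of real data is just a guess.
with open('trans_orig') as f:
    real_lines = f.readlines()
with open('rnn_trans') as f:
    generated_lines = f.readlines()
with open('trans_pooled', 'w') as out:
    out.writelines(real_lines * 3)
    out.writelines(generated_lines)

Then ./lm_create.pl trans_pooled would build the mixed model.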


Plan
I am starting the decode of this experiment (0301/001) now and will continue to work on it tomorrow. It is exciting because I am basing it off the 0295/018 model, which got a score of 37%.
I will also look into the missing words in decoding that Jonas brought up with us and the data group this past week in discussions.
Concerns
This is finally looking promising. A lot of parameters would open up if this works correctly. It may take some testing to see how much benefit can actually be gained by using RNN.

March 25

Task
Continue working on experiment 0301/002, which uses an RNN model trained with no validation file to build an n-gram language model from.
Continue to help with LDA implementation.
Document the RNN process more, and at the same time improve the documentation for running decodes on unseen data (the flag given in the wiki for the first command is wrong).
Look into the missing or substituted words from the decodes and gather stats on them.
Results
Experiment 0301/001 succeeded in using a recurrent neural network. However, it brought the WER all the way back up to 50% from 37%. There could be a number of reasons for this, which I am looking into this week.
Plan
Finish 0301/001 this weekend. Based on the results look into other areas to improve the RNN models and subsequently the language model. This is where I see the most improvement coming from in the coming weeks.
Concerns
My only concern left now is that we as a group get LDA to work; I think then we could say that, from a tools standpoint, we have had a successful semester. Hopefully the language model can be of greater use, as it looks like we can generate some great results from the acoustic model with some further tweaks.

March 27

Task
The tasks for this week were to finalize the use of LDA and RNN.
We also have the added task of looking into missing words in the transcripts.
We did not need to reinstall the operating system on Miraculix. This was a great relief, and we were able to accomplish our goals for the semester for our added tools. It is now time to use them.
Results
We were able to successfully use both LDA and RNN in our model building. LDA improved WER, while RNN made it worse. However, given the way the language model is used and the configuration of the RNN, this was expected. It remains to be seen how effective RNN will be in the configuration Capstone is set up in.
Plan
I'm going to start looking into the transcripts more in the coming days and next week. While I want to continue to build RNN models I want to make sure I am headed in the right direction as there are other parameters to consider (LDA and MLLT).
Concerns
I am concerned with how the transcripts are being used in the language model building and scoring processes. They seem to be the same file, which would not be good for scoring purposes. It remains to be seen, but I remember hearing that one of the teams from the last two years had trouble running experiments on unseen data because they would get an error about a word the training model had not seen before, so they added the test transcript into the train.

Week Ending April 4, 2017

March 29

Task
The tasks this week include:
Helping the tools group install RNNLM Toolkit and Miniconda for LDA.
Start to delegate more time to our different groups for modelling and parameter tweaking.
I have made a list of subjects I would like to explore: LDA dimensions, the convergence ratio, using a different (larger) dictionary, the hidden layer size in the RNN model, using a validation file in RNN, and the language weight of either the n-gram or the RNN-based n-gram for decoding. A lot to look through in a few weeks.
Results
I'm currently waiting to decode 0301/002, which I retrained with LDA. It was built from my 0295/018 experiment that got 37% WER, and I'm thinking that with LDA it will go much lower. Then I would like to explore more with RNN and the other parameters.
Plan
I think leaving the language model parameters for last would be best, as they will take the most exploring. There will be more to look at than tweaking a single parameter such as the convergence ratio; there are many things to consider, and being able to use a different dictionary adds a lot of opportunity and complexity to the process.
Concerns
My concern for now is having enough CPU time to work through all of these parameters in the coming weeks. Outside of the language model features, which could take around 10 experiments, these other parameters will require around five trains each to get close to the best estimates, I presume. So this is a lot of CPU time for training and decoding.
One other thing to keep in mind is taking the parameters from these 30hr trains and applying them to 300 hours. This will be very important when looking at how RNN is utilized on larger dictionaries and datasets.

March 30

Task
I'm currently training experiment 0301/005, looking at an increased senone count as my next parameter tweak.
Results
I have found some good documentation on the Sphinx site and other various sources on the ranges of parameters to use. For instance, the language model weight is typically between 6 and 13, with an insertion penalty typically between 0.2 and 0.7. I know all experiments have been decoded using 13 for the model weight, but I am not sure of the insertion penalty; I think this may be the next thing I look into. I'm still searching for information on real values to use for the convergence ratio. (A small sketch for enumerating the weight/penalty combinations follows.)
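To work through those ranges systematically, something like this could enumerate the decode settings to try (the -lw and -wip flag names are my assumption about how sphinx3 takes the weight and penalty; this needs checking against run_decode.pl):

from itertools import product

# Candidate (language weight, word insertion penalty) pairs to test.
for lw, wip in product([8, 10, 13], [0.2, 0.5, 0.7]):
    print('decode with -lw %s -wip %s' % (lw, wip))

Nine decodes of the same trained model would cover this grid without any retraining.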
Plan
The plan from here is to experiment with whatever language model tweaks I can use with the n-gram model and then try to move onto using a different dictionary or RNN based models.
Concerns
There are no concerns for now. I hope I can continue to run trains as often as I have been; I think that will be needed to get through all of the testing that I want to do.

April 1

Task
Document the steps needed to run a train and decode using LDA.
Start the 0301/006 train. This will be to finalize the senone counts for 30-hour experiments.
Results
The result of the 0301/005 experiment was very bad. I had forgotten that I needed to add Python 2.7 to the path for the root user on Idefix before executing the train and decode; this brought the word error rate to 91%. I did try a very high senone count, but I think turning on LDA without the correct Python dependencies contributed more to the poor score.
Plan
I think the rest of this week can be used to finalize the correct senone count. Then, next week, I can look into some other smaller parameters of the acoustic model before switching focus to RNN and new dictionaries.
Concerns
Using LDA is a pretty complicated process. Having to add Python 2.7 to the path as root every time, just to run the same scripts, adds a lot of work. Hopefully the tools group and the modelling group can figure out a way to make this easier.

April 4

Task
I'm currently redoing experiment 0301/006 to test higher senone counts.
Go over the results thus far with the Rebels group and get their thoughts on where to go from here.
Start working on the poster with the modelling group. I think the largest focus should be on LDA, followed by an introduction to neural networks / recurrent neural networks / long short-term memory models.
Results
I will have to wait until tomorrow to decode this train. I have been fumbling around trying to find the correct way to select the right Python installation for training and decoding. This process really needs to be documented for the groups to use for the rest of the semester, before everything is set up on Caesar by the tools group.
Plan
Decode experiment 0301/006 with the higher senone counts from my previous successful models and go over the results with the Rebels group.
Start work on the URC poster, possibly including our results. A lot of the posters I have seen in the past don't delve far into the results and what the models are actually doing. This could be a good way for our modelling group to document some of our changes in a way that makes it easier for future groups to understand.
Concerns
My only concern at this point is documenting this arduous way of using the correct Python.

Week Ending April 11, 2017

April 6

Task
The tasks for this week are largely the same as last week.
First, some more experiments need to be run by the Rebels group.
Secondly, the modelling group will be completing our poster this weekend. We'll be explaining how LDA and RNN have been implemented and the results they have given us. For this reason, more needs to be done with RNN to determine its true effectiveness at this point.
Results
I started the train of 0301/006; below are the configuration values I changed. I also tried starting 0301/007 at the same time: since there are a total of eight cores on Idefix, I think we can get away with training more than one experiment at a time. However, this didn't seem to work, as after I started 007 there were still only two processes running. So, this is the configuration for 007:
initial_num_densities: 1
final_num_densities: 16
number of states: 5
skip state: yes
senone count: 5000
LDA: yes
LDA dimensions: 32
convergence ratio: 0004
Plan
I need to narrow down this senone count by the weekend. Also, I plan on trying some 5hr experiments to work with RNN a bit more, for both the experiments themselves and the poster. I want to see if I can get close to decent results (or even better than the n-gram) to show on the poster, and to find something that may point me in a better direction for the larger experiments.
Concerns
A concern that has come up in the last few days is the lack of support I see in Sphinx for tweaking the features in the model. What I mean specifically is using regularization to change or eliminate which of the 39 features get used in creating the model. I don't see this supported in PocketSphinx either.

April 8

Task
All tasks are still relevant at this point in the week. The poster is coming along nicely and is about 95% done. I'm just waiting to finish an experiment trying out some new things with RNN, which I'll highlight below. Both the poster and RNN are the tasks for this week, so I am getting both done at once by scoring an experiment with RNN implemented in it.
Results
Something went wrong with the first train of 0301/007, so I am retraining it. The trains seem to be taking much longer than usual, probably due to the changes to other parameters. I was playing around with the RNNLM toolkit and came up with a way to use the validation feature while training an RNN model: I looked at the total number of lines in the train.trans file, separated them with an 80/20 split (80 for training, 20 for validation), and ran the training with a hidden layer size of 80 and backpropagation-through-time set to 5. These are two parameters I would like to explore more if there's time. The generated words seemed to make as much sense as the transcripts themselves, which surprised me. I'm curious to see the results after retraining this model. It's also based off the 31% score, with an increase in senone count and a slight decrease in convergence ratio, so that will be interesting as well.
Plan
I'm going to finish experiment 007 tomorrow morning. This uses the new validation technique with the RNNLM toolkit: a hidden layer size of 80, backpropagation-through-time of 5, and 1 million generated words to form the n-gram. I'm really not sure what the results will be; hopefully something in the right direction. I would also be curious to see what using the 300hr transcript would do for results, and whether that could be seen as a form of cheating (in a deployed speech recognition system you would never be given the transcripts beforehand, but you would definitely have a language model to use).
Concerns
I'm a little concerned with how long these experiments are taking, especially because many of the experiments we have performed as the modelling group have run into errors and been scrapped. Hopefully our documentation on LDA is solid enough that we won't run into these issues anymore.

April 9

Task
Finish the URC poster today.
Finish experiment 007 to see if the RNN based experiment brought any added results.
Results
These are the steps I took to build an RNN-based language model. I did a roughly 85/15 split of trans_parsed to build the RNN model, then generated 1 million words to use as a new transcript. Part of me wonders whether it is even possible to beat the n-gram model built from the transcripts themselves, because they are a perfect distribution of what is going on.
split -l 19000 trans_parsed (splits after 19,000 lines, producing xaa for training and xab for validation)
nohup rnnlm -train xaa -valid xab -rnnlm model -hidden 80 -rand-seed 1 -debug 2 -bptt 5 &
rnnlm -gen 1000000 -rnnlm model >> trans_parsed
cp -i /mnt/main/scripts/user/lm_create.pl .
./lm_create.pl trans_parsed
Plan
I will finish the poster and the experiment tonight and also start a new experiment, probably another RNN-based one. What I do will largely depend on what I see from the results of 0301/007. A positive result would be something in the 30% WER range; it's hard to judge against the experiment that failed, which brought a 90% WER.
Concerns
By Wednesday this week I would like to run another RNN-based experiment and also rerun 006, as I think I can crack 30% without using RNN. I think I'll achieve this by using a different convergence ratio along with a higher senone count.

April 11

Task
To run RNN-based models.
Start putting together the Rebels paper for our 300hr experiments.
Results
From the 5hr RNN model, the WER was 5% worse, compared to only 3% worse for the 30hr model. I am very hopeful that we can get positive results out of RNN for 300hr with enough tweaking. I think this would also mean that the more data the model sees, the better RNN does.
Plan
Continue working with RNN models to try to get positive results on 30 hours. Maybe run a baseline RNN on 300hr?
Concerns
Not being able to run 300hr experiments regularly.

Week Ending April 18, 2017

April 12

Task
To run a 300hr experiment for the rebels group.
Start collecting thoughts down for the rebels group paper.
Coordinate a Google Hangouts session with the rest of the Rebels team to go over the different parameters that can be tweaked.
Results
I started a 300hr experiment (0301/011). It is based off 0301/006, but I changed the final number of densities to 32 and the senone count to 8000 to better reflect 300 hours rather than 30. This also uses LDA, so it will be nice to see what our baseline is. I will have to look for other parameters to change, and I would like to start a baseline RNN 300hr experiment once this one is done, to see whether RNN with the defaults I have used could be worth it.
Plan
Finish the 0301/006 experiment as soon as possible. Relay the information to the rest of the Rebels team. Post some of my results/findings to Slack to prepare them for the Hangouts session we will do together.
Concerns
At this point I'm only concerned with having enough time to run all of the 300hr experiments I would like to.


April 15

Task
The tasks are still the same for this week: complete experiment 0301/011, the 300hr experiment that will act as the Rebels group's baseline.
The paper for the Rebels group also needs to be started.
Results
The experiment is still in progress. It has somewhat stopped other experiments from running for me, as team members are utilizing our other machine.
Plan
When the 0301/011 experiment is done I will use a standard n-gram model.
I will also start the paper this weekend. We were supposed to have a meetup on Google Hangouts, but the plans fell through; hopefully we can make it up somehow.
Concerns
I'm a little concerned with the lack of work that's been done on our paper. I'll have to start posting some of my research to Slack and have my team members read my logs.

April 16

Task
Continue the 0301/011 experiment.
Complete the Rebels group paper for the competition.
Results
I started the decode of 0301/011 today. Everything seems to be going well. Hopefully it finishes as expected so we can take two more shots at new experiments: one adjusting the parameters we have now, and another that I think should try RNN.
Plan
Score the 0301/011 experiment as soon as it finishes decoding.
Get feedback from the group on our next steps.
Concerns
Having enough time to run more week-long experiments this semester.

April 18

Task
Look into the results of last year's group.
Start another 300hr experiment tomorrow.
Coordinate with the Rebels team for a Hangouts session this week. Many of these tasks will be the same throughout the rest of the semester, but some concerns have come up in the modelling group about last year's results versus ours.
Results
Vitali ran a baseline 300hr based on last year's best results. The WER was significantly better, and we are currently trying to figure out why: my experiment, 0301/011, got 28% WER while his baseline (without LDA) got 25%. Concerning. Everything looks fine in terms of transcripts and scoring, but I noticed that I can't find where last year's group was actually getting their WER score from. When I go into the experiments or even look at the logs, they claim numbers that don't match the fields from the sclite scorer. Hopefully I can find out more tomorrow during our reports.
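For anyone retracing where a WER number comes from: sclite's .sys summary report has a Sum/Avg row, and the Err column in that row is the overall WER. Something like this pulls it out (the *.sys glob assumes the scoring output sits in the current directory):
grep 'Sum/Avg' *.sys
# the Err column of this row is the WER sclite reports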
Plan
Find out the discrepancy between last year's results and their claimed results. I also want to look into the <s> tag issue a bit and see if that has been taken care of. It's not something we have looked at, as I never saw any <s> tags in any of my transcripts.
Concerns
The WER situation between baseline experiments is concerning, but there could also be human error somewhere along the way. I would still like to run an RNN model on 300hr if my research into our scoring leaves me confident.

Week Ending April 25, 2017

April 21

Task
The tasks for this week will include running our second 300hr experiment as the Rebels group. This time we're going with last year's best train while changing only LDA. The Rebels group will also be putting together its experiment document this week.
Results
There aren't any results to announce right now, other than that our last experiment did not do better than last year's baseline, which is a little concerning considering we used LDA. However, skip_state was turned on and the number of states was 5; it could be that this made the model overfit the training data.
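For context, both of those settings are stock SphinxTrain config variables; a quick check in the experiment's config (the etc/sphinx_train.cfg path assumes the default layout):
grep -E 'CFG_(STATESPERHMM|SKIPSTATE)' etc/sphinx_train.cfg
# e.g. $CFG_STATESPERHMM = 5; and $CFG_SKIPSTATE = 'yes';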
Plan
I will continue to monitor experiment 0301/013 as it finishes training in the next few days. I'm going to run some RNN tests with the <s> tags removed on our other drone machine. I think this could offer some really promising results, since <s> tags left in hyp.trans will only make the results worse, no matter how well the RNN builds the distribution of generated words for the n-gram model.
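A minimal sketch of the tag removal, assuming the markers appear literally as <s> and </s> in the file; writing to a new file keeps the original intact (hyp_nostags.trans is just my placeholder name):
sed 's|</\?s>||g' hyp.trans > hyp_nostags.trans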
Concerns
I don't have any concerns for now. Figuring out the <s> tag situation will be paramount to having clear results.

April 22

Task
The tasks remain the same to this point: finish experiment 0301/013. I restarted the train, as I had forgotten to turn LDA on.
Results
The trains are running as planned. No hiccups with Python, as I have been sshing in as root on Idefix. Also, Alex was able to rescore Vitali's experiment that got 25%; it returned 33 percent, so I think our LDA experiment is valid at 28 percent. That is pretty good.
Plan
I will finish 0301/013 as soon as I can to try out one more test. I will be starting RNN work tomorrow and Monday so that I can really see what difference the <s> tags make.
Concerns
No concerns for now.

April 24

Task
For the next two days the tasks will be as follows: continue work on 5 hour trains with RNN to see what difference <s> tag removal makes for scoring. I will also look to do this for a 30hr baseline train.
Results
I'm having a bit of trouble rescoring experiment 0301/009 after removing the <s> tags from the hyp.trans file and the 009_train.trans file. I'm wondering if I have to keep the same naming conventions in order for sclite to work properly. It is nice not having to retrain the experiment to try out different results.
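If sclite really does depend on the original filenames, one workaround is stripping the tags in place while keeping a backup, so the expected names are preserved; a sketch using GNU sed's -i (the .bak suffix is just my convention):
sed -i.bak 's|</\?s>||g' hyp.trans 009_train.trans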
Plan
I'll continue the language model creation and decode of the rebel group's second 300hr experiment tomorrow.
Tomorrow I will also continue my work with RNN models.
Concerns
At this point I am only concerned with having enough time to see RNN's potential fully explored. I think next year's group could really take off with it, and it would have great implications for capstone in the future.

April 25

Task
The running tasks are still to finish the language model creation and decode of experiment 0301/011. I'll need to quickly start another one in order to fit it in before the competition.
I'll also rescore our old experiment as the Rebel group without the <s> tags once I figure out how to do so successfully.
Results
I have been having trouble rescoring experiments after removing the <s> tags. Although the tags are properly removed, whenever I try to score, the sclite output stops after running through the speaker list.
Plan
Create the language model and decode the 300hr experiment tomorrow.
Look over the <s> tag issue with Vitali tomorrow and get a rescore on the RNN models.
Concerns

Week Ending May 2, 2017

April 26

Task
The tasks from last week have extended into this week. The decode of the Rebel group's 2nd 300hr train is running right now. I also have the task of finishing my research on RNN without the <s> tags.
Results
The 300hr experiment is running well. I'm glad to see that there are no issues running it with LDA.
As far as I can tell from the 5hr RNN experiments compared to the baseline 5hr, removing the <s> tags did not make much of a difference. I still want to try with the 30hr experiments, but I think the reason the scores stayed the same is that both language models were created without the <s> tags, so it is not a matter of building a different distribution with them but simply of scoring. I hope to finalize these findings with the 30hr experiments.
Plan
Continue to monitor the 300hr experiment. I would really like to try one RNN-based 300hr to see if I can't nail down an improvement with it.
I'll continue to run 5hr trains with RNN models to try and get a good feel for what parameters to use for the 300hr experiment.
Concerns
I'm only concerned with being able to fit in the 2, maybe 3 tops, 300hr experiments left. Not much time.

April 27

Task
The tasks are still the same. I am in the middle of completing the 3rd 300hr experiment for the Rebel group. This requires the subtask of finding an optimal setting for an RNN-based language model.
Results
Experiment 0301/016 is running. I need to find an RNN model that does better than the current 3-4% deficit against a normal model on a 5 hour train of unseen data.
Plan
I plan on experimenting with RNN in between running the various steps for the 300hr, because I need Idefix to use the RNNLM toolkit.
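For the RNN sweeps themselves, these are the kinds of knobs the RNNLM toolkit exposes; the values below are only an illustrative starting point, not settings we have validated, and model_h200 is just a placeholder name:
nohup rnnlm -train xaa -valid xab -rnnlm model_h200 -hidden 200 -class 100 -bptt 4 -debug 2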
Concerns
No concerns for now. I am wary of finishing this train this week, though. Playing around with RNN could take a while, as recurrent neural networks can be quite sophisticated, especially when trying to apply them to a dataset magnitudes larger than what you're training on.

April 30

Task
I am still in the middle of experiment 0301/016. There is still the task of seeing the scores of the other 300hr experiments with the <s> tags removed. I also have the task of figuring out a better approach to bringing RNN language models to a 300hr model.
Results
I have been having trouble getting this 300hr experiment to run, mostly because some steps, such as generating features or running the makeTrain.pl script, take so long. I have a good feeling about the results I'll get.
Plan
When the decode of this experiment is running I will very quickly have to do some work with RNN on Idefix, although I have been thinking of doing that on the next round instead, to see if my change in convergence ratio makes a big difference.
Concerns
I hope that there is still enough time to get this experiment done by Wednesday. I'll need some time to experiment with RNN more as well.

May 1

Task
Tasks are still the same. This will be a lighter log, as I am in the middle of a 300hr experiment and need that machine for the other work I would like to do (RNN). The other large outstanding task is to finish the Rebels group paper.
Results
I have been working on the Rebels group paper. One thing that we have done this semester is use the test/train transcript. While the name suggests it is training data, it is still unseen data.
Plan
The plan is to have a rough draft of our report done Wednesday. I believe this is possible. It's looking like we'll get one more experiment in after this one. I think I'm going to save RNN for last, as I'll have more time to do 5hr trains next week, and we can possibly get a better score with a decrease in convergence ratio over our last model.
Concerns
No concerns for now. I do want to keep reminding team members to go back and update the documentation for the things they have noticed over the semester.

Week Ending May 9, 2017

May 3

Task
These are the tasks for the final week. I will finish the 0301/016 experiment. I also started another decode on 011 with the -e eval transcript; I will have to do the <s> tag removal for this as well. Besides these individual tasks, the Rebel group must finish the team report, and the modelling group will start a bit on the final report.
Results
The result of the 0301/016 experiment after removing the <s> tags was 33%. I believe this will be the same when decoding on the -e transcript, as both are unseen. Since all I changed on this experiment was the convergence ratio (0.0004, from 0.004), I only expect a marginal improvement or loss in WER.
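For reference, the convergence ratio is the stock SphinxTrain setting that controls when Baum-Welch re-estimation stops iterating; a quick check of the changed value (the etc/sphinx_train.cfg path assumes the default layout):
grep CFG_CONVERGENCE_RATIO etc/sphinx_train.cfg
# expected after the change: $CFG_CONVERGENCE_RATIO = 0.0004;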
Plan
I plan to have all of my RNN experiments and 0301/016 run by the weekend. I will also finish the decode of 016 on evaluation data. This way I can start the RNN 300hr in time for Wednesday.
Concerns
I am a little concerned about being able to start the last train in time. Also, finding a useful configuration of RNN will be challenging in such a short time period.

May 4

Task
The tasks are still the same. I do have a side task of getting a feel for using all three different test flags, -t, -e, and -d, to try running decodes and scoring with them. I'm doing this so I can score our 300hr experiments on a few different unseen transcripts.
Results
I struggled at first using the other transcripts with the 5 hour experiment (0301/017), but it is running fine now. I am curious to see the results of even a five hour train using a different unseen transcript, to see if it's relatively close to the other one we have been using.
Plan
As soon as the 0301/016 experiment is done running, I'm going to let the tools group use Idefix to test out the GCC installation. After that I will have to decode the experiment quickly to get the results for Wednesday. I also plan on finishing the write-up for the Rebels group this weekend so that we then just have to plug in our final results.
Concerns
I'm not sure at this point if I'll be able to fit another 300hr experiment in. Idefix is the only machine right now that is working with Python 2.7.

May 5

Task
The main task at this point is to finish the 0301/016 experiment. I'm having a really hard time getting the scoring to run right. I keep getting errors about missing reference files, or that the smp format of the hyp.trans file is not usable. Odd errors, so I reran the decode.
Results
I'm really not sure what the issue is. I'm going to look into the transcripts in more depth tomorrow and try out some other 5 hour experiments. The 0301/016 experiment is still running on Idefix, so it looks like an RNN 300hr train for the contest isn't going to happen. I'd still like to run one afterwards just to see the results.
Plan
Tomorrow afternoon and evening I will run some 5 hour experiments from start to finish to try and get this flag issue resolved. I'll also finish the writeup for the Rebel group.
Concerns
Still concerned about the eval and dev transcripts not wanting to work. It's almost as if they are too similar to the training set.

May 7

Task
I am still working on the 0301/017 experiment, as well as 0301/018 and 0301/016. 0301/018 is a new experiment I'm trying to decode on the dev transcript. 0301/016 is the 300hr that will most likely be used for the final results Wednesday. I still have the task of finishing the Rebels report; it is almost complete at this point.
Results
The final report is almost complete. Experiment 0301/016 finally finished training. I'm going to try to start the decode tomorrow, but I'm not sure if there is enough time because the tools group needs to use Idefix. I have yet to try rescoring 0301/011 on the eval transcript. I will try that tomorrow as well.
Plan
Tomorrow morning I'll build the language model for the 0301/016 experiment and then start the decode as soon as I hear from the tools group. I really think this experiment will give better results, maybe 31% with <s> tags removed, over the last one (33%).
Concerns
No concerns.