Speech:Spring 2018 Brian Barnes Log



Week Ending February 4, 2018
 * Task:
Tuesday, January 30: Formed groups in class and discussed group roles. Learned about the focus of the Modeling group and discussed plans with team members.
 * Results:
Thursday, February 1: Just checking in to read other students' logs and check out more of the wiki.


 * Plan:


 * Concerns:

Week Ending February 25, 2018
 * Task:
Tuesday, February 20: Knowing now that we can put anything and everything in these logs, it seems like a good time to buckle down and actually fill them out. I hadn't written anything because I hadn't seen any results, but I suppose failures should go in here too. With that in mind, here's my current status.

Wednesday, February 21: Met with the Data team to discuss experiments they have run and that we (the Modeling team) should replicate. They have worked out three experiments using different combinations of scripts, and they would like us to run 30- and 300-hour tests using those three combinations. After meeting with them and figuring out what their scripts do, I ran a test based on Tri's instructions to see if I could replicate his results. From here, I'll work out the commands for the corresponding 30- and 300-hour experiments and pass those along to my teammates.
 * Results:
Sunday, February 25: I didn't have much of an active role in our tasks at the end of this week, as Steve and Hannah took care of most of the actual experiment runs. However, I kept in contact with the Data team to make sure their scripts were ready to go, and I helped Steve prepare to run our experiments.
 * Plan:

 * Concerns:
Tuesday, February 20: It's week five, and I'm honestly still not sure what I'm supposed to be doing. Today in class, Prof. Jonas said that we (the Modeling group) should get together with Tri and Rose about running experiments on their changes to the data model. My understanding is that "run experiments" means the same thing as "run the train/decode process." I hope I'm right about that.

On that note, I haven't yet been able to run a successful train/decode. Maybe it's on me for not following directions correctly, or for not knowing what the results of each individual step should look like. I'd ask for help, but I'm not sure who to ask. Prof. Jonas seems to know, but he also said not to ask him "bad questions," with the implication that anything we're supposed to "learn for ourselves," including experiments, falls into that category. So that doesn't seem to be a viable avenue.

Week Ending March 4, 2018
 * Task:
Saturday, March 3: Met with Steve and discussed his recent results and the issues he's been having with unseen-data experiments.

 * Results:
Saturday, March 3: Steve has been trying to run unseen decodes by copying language models from older experiments and running decode on them. Occasionally he would hit an error in decode complaining that he had no mdef file. After some research, I found that since he was copying data from older experiments, the file names were not what the decode expected (i.e., generated files had a prefix of 001 instead of 006, the actual experiment number he was working with). Because the program was looking for the wrong names, it could not find the files, which resulted in an unhandled exception and the decode quitting prematurely. We fixed that issue, and it looks like we got the results we were looking for; Steve has more details on those.
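
The fix itself was mechanical once we spotted the pattern. A minimal sketch, assuming the copied files just need their experiment-number prefix renamed; the directory path is hypothetical, and only the 001-to-006 prefix pattern comes from the actual error:

    OLD=001
    NEW=006
    # e.g. 001_train.trans -> 006_train.trans
    for f in /mnt/main/Exp/0303/006/etc/${OLD}_*; do
        mv "$f" "${f/${OLD}_/${NEW}_}"
    done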


 * Plan:


 * Concerns:

Week Ending March 11, 2018
 * Task:
Tuesday, March 6: We learned in class today that our decodes have had issues all along. Great! (It's not actually great.) For some reason, the decode script quits after only a few sentences, acting as if it had completed successfully. Priority one this week is figuring out why that happens and what needs to be done to fix it. Is it a problem in our training process? It doesn't seem like it: as far as I can tell, we train on the entire corpus of whatever size we specify, and we have definitely never tried to train on a corpus of only six sentences. I'll start looking into that today. Additionally, so that the Systems group knows what we need for our experiments, we need to learn how our LDA process is executed. Hopefully we can figure that out before the end of class today. Shouldn't be too hard.
 * Results:

Saturday, March 10: Talked with Steve to get to the bottom of what data needs to be copied from a previous experiment in order to run an unseen decode. Dan Beitel pointed us to makeTest.pl, which looks like it does exactly what we need. It would have been nice to find this earlier, but I suppose it's good that we found it at all. Thanks, Dan! I also tried again to get an LDA train to work, but I can't make heads or tails of the error it gives.

Sunday, March 11: Today's efforts focused on helping Steve solve his unseen decode woes. I used the directions from [Run Decode Unseen Data], which had me use makeTest.pl to copy over data from a previous experiment to decode. I used my 0303/042 experiment, since I know that it ran to completion. The decode finished without any problems, but scoring proved problematic; the hyp.trans file from my new decode didn't match up at all with 0303/042's train.trans file. Tomorrow I'll be focusing on figuring out why.
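
Before going further, I want to check whether the two transcripts even cover the same utterances. A rough sketch, assuming the usual Sphinx convention of ending each transcript line with the utterance ID in parentheses (the file locations are assumed as well):

    # Pull the trailing (utterance_id) off each line and diff the ID lists.
    sed 's/.*(\(.*\))[[:space:]]*$/\1/' hyp.trans   | sort > hyp_ids.txt
    sed 's/.*(\(.*\))[[:space:]]*$/\1/' train.trans | sort > ref_ids.txt
    # IDs unique to one file would point at a data mismatch:
    comm -3 hyp_ids.txt ref_ids.txt | head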

LDA executable: My research took me to some Python scripts in our experiment directories, and from there to the Sphinx project's site on GitHub. According to the documentation, the LDA module requires Python 2.3 or higher, along with the NumPy and SciPy modules. We have Python 2.6 on our machines, and each machine has NumPy and SciPy installed in its own /mnt/main/root directory. However, Camden said he's looking into getting those installed in a shared directory so that they're the same on every server.
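
In the meantime, a quick way to check any one server is to try the imports directly. This is just an assumed sanity check, not an official procedure (Python 2 print syntax, since the machines run 2.6):

    # Confirm this machine's Python can import the modules the LDA code needs.
    python -c "import numpy, scipy; print 'numpy', numpy.__version__; print 'scipy', scipy.__version__"
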
Decode sentence count: It looks like people have been assuming that the decode process is quite fast, when in fact it is not. The culprit is that people have been running the decode script with nohup, which puts the process in the background and drops the user back to the command prompt. So the decodes themselves have been running as expected, but scoring has been done on incomplete decodes. I'd like to write up experiment instructions that cover the problems we've run into, so that future semesters don't hit issues like these.
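
In other words, the failure mode looks something like this. The script name below is a placeholder for whichever decode script is being run; the point is that nohup hands the prompt back immediately while the decode keeps going:

    # run_decode.sh is a placeholder name, not our actual script.
    nohup ./run_decode.sh > decode.log 2>&1 &

    # The prompt returns right away, so it's tempting to score now.
    # Instead, wait for the background job to finish (same shell session):
    wait
    # ...or watch the log until the decoder reports completion:
    tail -f decode.log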

Saturday, March 10: When trying to train using LDA, I get this in the output:

Phase 2: Flat initialize

mk_flat completed

init_gau FATAL_ERROR: "main.c", line 98: Failed to read LDA matrix

This happens in the "MLLT Transformation" step, which runs right after the "LDA Transformation" step. It seems that the LDA step completes successfully, but does not write its output to wherever the MLLT step expects it to. I'm not sure why, and I'm still investigating it.
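
My next step is to figure out where (or whether) the LDA step wrote its matrix, and to compare that against wherever init_gau is looking. A hypothetical starting point, with the file-name pattern and log location guessed rather than confirmed:

    # Look for anything LDA-related that the train produced:
    find . -iname '*lda*' -ls
    # Check the training logs for the output path the LDA step used:
    grep -ri 'lda' logdir | tail -20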

 * Plan:
LDA executable hunt: To start, I found the configuration value that gets changed to enable LDA. (It's $CFG_LDA_MLLT in sphinx_train.cfg.) From there, I'll search online and through the files on the server for anything that refers to that config value, along the lines of the sketch below.
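
A starting sketch for that search; only the variable name is confirmed (from sphinx_train.cfg), while the directories searched are assumptions about our layout:

    # Find every script that reads the LDA flag:
    grep -rn "CFG_LDA_MLLT" /mnt/main/scripts_pl /mnt/main/python 2>/dev/null
    # And confirm how it's set in the experiment's own config:
    grep -n "CFG_LDA_MLLT" etc/sphinx_train.cfg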

Sunday, March 11: Tomorrow evening, I'll look further into why my transcripts don't match up after an unseen decode. Hopefully I can get to the bottom of it and produce some verifiable unseen results. I swear, next week we'll be able to give a confident report on this issue.

Tuesday, March 6: It looks like there is a "python" directory, where the LDA script resides, that gets copied into some experiments. I don't believe we have been copying that directory in at all, so I should run an experiment with LDA and verify whether that directory needs to be there (I suspect it does) and what the results of an LDA-enabled experiment look like.
 * Concerns:

Week Ending March 25, 2018

 * Task:

 * Results:
I lost much of this week to sickness, but I still managed to get a bit of work done. First, after talking with Prof. Jonas in class, we figured out that Asterix might not in fact have the right resources to run LDA trains. He suggested that I run on Miraculix instead. Lo and behold, the train worked. Great! My goal was simply to have a successful LDA run, so I stopped there.

Second, Hannah and I were able to run a copy of last spring's 0301/011 experiment; strangely, our run yielded a better word error rate than last spring's. We're not sure why, and it didn't help our goal of matching last year's baseline, but at least we were able to run the decode. We didn't know about the "run_decode_lda" script, so we were running around like chickens with our heads cut off trying to figure out why the decode process wouldn't work.
 * Plan:

 * Concerns:
The week feels unproductive, but I was also sick, so I think that'll be my excuse. Works for me, at least.

Week Ending April 1, 2018
 * Task:
This week I was mainly working on team tasks. Should that go in this log? Is that how it works? I'll assume it is. Anyway, my goal was to find fun and interesting parameters for the decode process. Basically, we're looking for values to change to get better results on a decode. Kind of a throwing-stuff-at-the-wall approach.
 * Results:
Hannah and I spent some time going through the Sphinx documentation and found a page full of decode parameters. Great! However, we couldn't find where these parameters are actually set on our machines. I'll write more about this under "Concerns" below; it could be a problem.
 * Plan:
Figure out how to actually change some of these parameters. Even if they don't help us get better results, they'll give us an idea of what won't work and help us refine our search. So, here's the thing: experiment directories have a sphinx_decode.cfg file in them, full of decode parameters. Great! We can change these to see differences in our decodes, right?
 * Concerns:
Nope! This file is never actually read by the decoder. Why? Who knows! Presumably there's a copy of it somewhere that's treated as a "master" copy and is what the decode script actually reads, while the per-experiment ones go unused. Part of getting these configs to work will be finding where the settings are actually stored, and maybe even telling the decoder to read our individual ones. Hopefully we can find that! The sketch below shows where I plan to start.
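
This is a sketch, not a confirmed procedure: the $DEC_CFG_* parameter names follow SphinxTrain's usual style for sphinx_decode.cfg and are worth verifying against our copies, and the script path is an assumption about our server layout:

    # Note a distinctive parameter in the experiment's local config copy:
    grep -n "DEC_CFG_LANGUAGEWEIGHT\|DEC_CFG_BEAMWIDTH" etc/sphinx_decode.cfg
    # Then search the decode scripts for whatever config path they load:
    grep -rn "sphinx_decode" /mnt/main/scripts 2>/dev/null | grep -i cfg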

Week Ending April 8, 2018
 * Task:
Tuesday, April 3: The first task for the Modeling group this week is to contact the Data group and ask for their updated scripts. I think those are finally ready? Hopefully! That way, we can run our 300-hour experiments using the updated scripts and find out which one gives us the best results. We'll also have to rerun our previously successful experiment on trained data, just to make sure it gets better results, as we expect. Fingers crossed!

Wednesday, April 4: Okay, actually the first thing we're going to do is try to recreate last spring's seen decode and its 28.4% word error rate.

Friday, April 6: Having completed our last experiment, Hannah and I are now running a 300-hour train and decode using the modified parse script, which removes brackets from the language model. Tri has given us some helpful instructions, so I'm following those. Hopefully this will finish soon enough to get another 300-hour train and decode in before Tuesday, but I'm not holding my breath.

 * Results:
Friday, April 6: On Wednesday night, Hannah and I decoded a copy of last spring's 0301/011 experiment. Our goal was to recreate their 28.4% word error rate on a seen decode. There were no issues to speak of in running the decode, and by today it had finished. Looking at our results, we did in fact get a word error rate of exactly 28.4%. Great! We recreated the old experiment! That's exactly what we were looking for.

... except, for some reason, the decode only included 4034 sentences. That seems on par with a five-hour decode, yet I was under the impression that we were decoding a 300-hour data set. Maybe I'm wrong about that, but it's concerning nonetheless. I'm going to ask Prof. Jonas about it on Tuesday; hopefully it's actually supposed to be that many and there's no problem. In the meantime, I can at least verify the count myself, as sketched below.
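
The simplest check seems to be counting transcript lines directly, one utterance per line. The file names below are assumptions based on the naming patterns we've seen, not verified paths:

    # How many utterances did the decode actually produce?
    wc -l result/hyp.trans
    # And how many does the reference transcript used for scoring contain?
    wc -l etc/011_test.trans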

Sunday, April 8: I tried to run a 300-hour train using a modified dictionary/language model last night, but could not get it to work. It seems the dictionary doesn't end up matching the transcript after the transcript is processed to remove extraneous punctuation marks, even though I was under the impression that I was running the proper scripts to make everything match up. The error message just says "something failed," and there's no output in the training log directory; the console output, however, indicates that the issue really is a mismatch between the modified transcript and the dictionary. At first, I thought my problem was simply one of file names, since I hadn't named the new transcript <###>_train.trans. Even after renaming it to match that pattern, though, the train still didn't work, and it quit at exactly the same point. I'm out of ideas for today. I'll have to spend more time on it tomorrow, but I'm not expecting to stumble onto the solution.
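
Tomorrow I may try to enumerate the mismatch directly. A hypothetical check, assuming transcript lines end with "(uttid)" and the dictionary's first column is the word; the file names are placeholders for the 300-hour inputs:

    # List words that appear in the processed transcript but not the dictionary.
    tr ' ' '\n' < etc/300hr_train.trans | grep -v '^(' | grep -v '^$' | sort -u > trans_words.txt
    awk '{print $1}' etc/300hr.dic | sort -u > dict_words.txt
    comm -23 trans_words.txt dict_words.txt | head -20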

 * Plan:
Sunday, April 8: My plan at this point is to meet with the Data group (maybe over Discord tomorrow, in advance of our class meeting on Tuesday) to ask for help running this train. They've done five-hour runs before, so they should at least have an idea of what my issue is. Hopefully it's as easy as plugging in "300hr" wherever "5hr" appears in their procedure.
 * Concerns:
Sunday, April 8: I had really hoped to have our two 300-hour trains done by this point. I'm fairly optimistic about getting them done in the next few days, but it's still disconcerting that we haven't managed it yet. I guess that's on me, though. At least we recreated the baseline experiments! That's a good thing.

Week Ending April 15, 2018
 * Task:
Tuesday, April 10: Met with the Guardians team in class and worked on planning and troubleshooting. Secret stuff!


 * Results:

 * Plan:
Tuesday, April 10: I've got some Guardians work to do, so I won't go into any detail about that. However, I'll also be contacting the Data group for help with their scripts, and I'll need to research the fixed-mismatch seen baseline. I'm not sure exactly what the "mismatch" in question is, but I'll, uh... figure it out? Something like that.
 * Concerns:

Week Ending April 22, 2018

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 29, 2018

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 6, 2018

 * Task:


 * Results:


 * Plan:


 * Concerns: