Speech:Spring 2017 Vitali Taranto Log



Week Ending February 7th, 2017

2/2 and 2/3

Task

2/2: run the first experiment for the modeling group

Results

2/2: Ran a train and created the language model successfully. I was able to get to the last step (running sclite) before hitting an error message. UPDATE 2/3: succeeded. https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data

Error Message received: Segmentation Fault (core dumped).

Investigation of the files involved revealed that hyp.trans is empty (it shouldn't be). This possibly indicates a problem with parseDecode.pl, or user error.
UPDATE 2/3: this was because the LM was in the wrong place, so user error.

Other items of note:

  • The addExp.pl script is the script used to add experiments to the wiki, not createWikiExperiment.
  • The LM directory should be created in the sub-experiment rather than the base experiment.
  • Be careful with copy-paste; it's probably better to type everything out the first time to avoid mistakes.
  • The wiki "scripts" page claims that user scripts are on the path, but this does not appear to be the case (I had to run some scripts with the full path). Update: this only applies to the addExp.pl script.
  • The 30hr corpus should be used to minimize processing time on Caesar, since we all have to share it for now. It looks like there are old corpora that are smaller; perhaps those should be updated.
  • BEFORE running a train, use the "top" command to view processes and make certain nobody else is currently running a train (see the sketch below).
  • There are two resource-heavy steps: training should take about 2 hours on the 30hr corpus, and run_decode seems to take about 2 hours as well.
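
As a quick sketch of that pre-flight check (these are standard Linux tools; the grep pattern is just a guess at what a running train or decode looks like in the process list):

  # Interactive view of everything running on the box:
  top
  # Non-interactive check for SphinxTrain/decode processes (pattern is an assumption):
  ps aux | grep -iE 'sphinx|decode' | grep -v grep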


Plan

2/2: In order to rule out user error, we should attempt to run an experiment from the beginning. If the same error is encountered, then the parseDecode.pl script seems like a decent place to start looking.

This is major.
UPDATE 2/3: it was user error.

addExp.pl should be updated so that AD is the default domain.

This is minor.
Concerns

2/2: Obviously, I need to figure this out.

Also, from here on out, I am writing each update as a separate log.

2/6

Task

Log in and look at logs.

Result

Read logs. Many have not started yet. Tools group has some interesting plans.

Plan

Now that I have run an experiment, next week I will look into replicating the results of Spring 2016 to establish a control.

Concerns

People are not writing logs, and among those who did write logs, I see overall confusion as to what to do. This is understandable, as it is the first week, but it could still lead to issues down the road.

2/8

Task

Checking in

Result

Checked in.

Plan

To meet.

Concerns

None

Week Ending February 14, 2017

2/11

Task

My goal was to attempt to recreate the results that Spring 2016 had achieved, which was a WER of 48.4% according to the final report located at this link: https://pubpages.unh.edu/~jax472/capstone/Competition_Report/report.txt

In order to do that, I need to figure out 3 things:

  1. How do I use non-default values while training and decoding?
  2. How do I run experiments on unseen data?
  3. What values were used to get those results (unfortunately this was not listed in the final report)?


Results

In answer to the first question: when you first run the makeTrain.pl script, a sphinx_train.cfg file is generated inside the etc directory which contains the values to be used during training. Some of the values that can be changed are $CFG_FINAL_NUM_DENSITIES and $CFG_N_TIED_STATES. I chose those two as examples because Spring 2016 used them.
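
A quick way to see those two knobs in the generated config (grep is standard; the values shown in the comments are purely illustrative, not Spring 2016's actual settings):

  # From inside the experiment directory:
  grep -E 'CFG_N_TIED_STATES|CFG_FINAL_NUM_DENSITIES' etc/sphinx_train.cfg
  # Output looks something like:
  #   $CFG_N_TIED_STATES = 1000;
  #   $CFG_FINAL_NUM_DENSITIES = 8;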

The second question is where I ran into some trouble. According to the wiki's instructions, running makeTrain.pl -t switchboard 30hr/test is the proper way to set up an experiment that will run on test data rather than train data. But if I do so, the train fails claiming that 001_train.fileids does not exist (which makes sense). Attempting to first run makeTrain.pl switchboard 30hr/train and then makeTrain.pl -t switchboard 30hr/test asks the user whether to overwrite files; no matter the choice, the train fails claiming that a word was not in the dictionary.
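
To recap, the two attempts (commands as described above) and where each one failed:

  makeTrain.pl -t switchboard 30hr/test   # train fails: 001_train.fileids does not exist
  makeTrain.pl switchboard 30hr/train     # then...
  makeTrain.pl -t switchboard 30hr/test   # prompts to overwrite files; the train then
                                          # fails on a word missing from the dictionary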

I checked the reunited Captain America + Iron Man teams' experiments for the results in their final paper so that I could attempt to replicate them, but I was unable to find an experiment as successful as the one reported in their final report. I think we might have to track down one of Spring 2016's key players to find what parameters they used. (It is less critical that we copy their results and more important that we know how to modify parameters, but still.)


Plan

Looking under the scripts folder I see a makeTest.pl script. Perhaps it is analogous to makeTrain.pl? This is what I will look into tomorrow.

I need to update the wiki with my findings once I figure this out. It is unacceptable for there to be incomplete instructions for running Experiments on unseen data. I will ask Jonas if it is necessary for me to save the current page somewhere before editing it.


Concerns

Only what can be inferred from reading the above. This is just a status update.

2/12

Task
Reading logs of my group members.
Results
Well, I read the logs.
Plan
Go do what is detailed in the log above this one.
Concerns
makeTest.pl is used during the decoding process in Step 3, not the training process. This is going to be annoying.
Bonus

OK, I changed my mind. If what John Shallow wrote here is correct, then the train.trans I (and everyone else) used before should be fine, since it already had the testing data cut out of it. (That leaves me wondering why a train.trans is present in the "test" directory, but whatever.)

https://foss.unh.edu/projects/index.php/Speech:SampleTrans.pl

But I am paranoid and don't want to run into trouble later, so I think I will pull down copies of the relevant files and search them for repeated lines.
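
A shell version of that overlap check (the file names follow the trans files discussed in this and the next entry; the paths on the server aren't shown here, so adjust as needed):

  # Lines that appear in both transcripts; no output means no overlap.
  comm -12 <(sort train.trans) <(sort dev.trans)
  # And duplicated lines within a single file:
  sort train.trans | uniq -d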

If I don't find repeats, I will start a train as normal (with config alterations) and then follow the relevant steps in part 3 to adjust for testing.

More Bonus

My hypothesis was correct: dev.trans and train.trans (in the train folder...) do not contain overlapping content. I am going to try to duplicate experiment 0294 006 now. It meets the requirements of editing cfg options and running on test data. I still wish I knew which settings were used for the experiment in the final report.

This should go well, but we will see if I run into new problems.


2/13

Task
Read logs. Also, I found the experiment used to get the numbers in the final report: 0288 011. Thank you John Shallow, and thank you James for keeping good records.
Results
Logs were read.
Plan

I think I am going to finish out the week by looking into LDA: why we want to use it, and what it will take to start using it (specifically, the Python versions and packages). "The Modeling group needs to figure out as soon as possible what machine the '16 Modeling group used and what, if any, work they did configuring that machine." I think I will also do that.

Concerns
That solves my biggest concern. I found an experiment that can be used as a baseline to measure future gains.


2/14

Task

First, finish the experiment with non-default values run on unseen data. Second, determine the benefit of LDA and the requirements. Third, figure out what machine the Spring 2016 modeling group used.

Results

After decoding experiment 0295 003, I ran into an error when scoring using sclite: "Not enough reference files loaded, Missing:" with no entries after the error. I attempted the steps listed in the instructions for when this error is received and had no luck. Examining the files using "uniq -d" revealed that there were no duplicated lines, so I have no idea what caused this error. With that said, it isn't a total loss. Before, I thought only sphinx_train.cfg contained changeable parameters, but it turns out the decode can be modified as well (so if you wanted to change the weight of the language model, that could be done). I also found out a great deal more about what these scripts are actually doing. I want to go update the wiki, but I will wait until I figure out why I am getting the above error.

The benefit of LDA is as follows: "First of all, it can dramatically reduce the word error rate (up to 25% relative in some of our tests). Second, it also makes the decoder slightly faster since it reduces the dimensionality of the features, and also reduces the size of the acoustic model." http://cmusphinx.sourceforge.net/wiki/ldamllt


How does it do this? Well, speech recognition is a branch of data science, and one commonly recognized challenge of data science is the "curse of dimensionality". Basically, as we add more features, we need exponentially more data to avoid over-fitting. It isn't uncommon for speech to have some 50 to 100 features per 10 milliseconds in its acoustic model, which means our paltry 300 hours simply isn't enough. At the same time, we really do need those 50 to 100 features to distinguish between similar phonemes. LDA lets us "compress" the dimensionality while keeping the useful information, using "mathmagic". The trade-off is that training becomes much more complicated. http://www.datasciencecentral.com/profiles/blogs/about-the-curse-of-dimensionality
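
For reference, here is roughly what turning this on looks like in a SphinxTrain config (the variable names are standard SphinxTrain options; whether our wrapper scripts need anything else on top of this is a separate question):

  # View the relevant options in the experiment's config:
  grep -E 'CFG_LDA_MLLT|CFG_LDA_DIMENSION' etc/sphinx_train.cfg
  # To enable LDA/MLLT, edit the file so the flag reads:
  #   $CFG_LDA_MLLT = 'yes';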

Speaking of using it, the tools requirements are simple. We do not need to upgrade our Python, but we do need numpy and scipy. Thank god for the shoulders of giants.
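
A one-liner to verify the packages once they are in place (Python 2 syntax, since Idefix is on 2.6.6):

  python -c "import numpy, scipy; print numpy.__version__, scipy.__version__"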

As for what the previous modeling team has done with LDA... I cannot find any mention of it in any of their logs. It also seems to be the case that the modeling group used caesar for training before they broke into groups.

Plan
Meet with other group members tomorrow. Do a class meeting and set out task for that week.
Concerns
Somewhat annoyed that my unseen test failed. I will have to ask Greg how his went.

Week Ending February 21, 2017

2/18

Task

Check Logs

Results

Checked Logs

Plan

Install Scipy and Numpy on Idefix

Concerns

None

2/19

Task

Check Logs

Results

Checked Logs

Plan

Install Scipy and Numpy on Idefix

Concerns

None

2/20

Task

Install Scipy and Numpy on Idefix.

Results

Found the versions of scipy and numpy to use with the version of Python (2.6.6) that exists on Idefix. Given that we do not want packages other than scipy and numpy, I have elected not to use a package manager and will instead build scipy and numpy from source if possible. Once I downloaded the packages I needed, I transferred them onto Idefix via flash drive. One thing I noted was that the flash drive actually had to be manually mounted on Idefix before I could copy the contents (guess I have been spoiled by personal computers).
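
Roughly what the manual mount looked like (the device name and mount point are assumptions; check dmesg or fdisk -l for the real device, and the package file names are illustrative):

  mkdir -p /mnt/usb
  mount /dev/sdb1 /mnt/usb
  cp /mnt/usb/numpy-*.tar.gz /mnt/usb/scipy-*.tar.gz ~/packages/
  umount /mnt/usb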

Plan

Now that the files are on Idefix, I will not have to physically travel to UNHM tomorrow to complete the installation, but completing it is still the plan. I will also ask Greg if I can copy some files from Caesar to Idefix in an attempt to get his decode working.

Concerns

I hope that the source compiles without issue. It will make the tools group's job much easier in the future.

2/21

Task

Install Scipy and Numpy on Idefix.

Results

Scipy and Numpy will not compile from source on Idefix without the installation of several programs which are not needed. So we have three options:

  1. Compile Scipy and Numpy somewhere else, and then push the compiled packages to Idefix.
  2. Install Scipy and Numpy via their Python wheels. We would have to compile wheel from source or update Python to do this, so either way we get the drawbacks of option 1 or option 3.
  3. Install Scipy and Numpy via a distribution like Anaconda. This would be the easiest option for us (the modeling group), but the tools group might hate us if we pick this option.

I will discuss with the group so that we can reach a consensus.

Plan

Meet in class, review objectives. I imagine that Greg's difficulties with decodes on Idefix will be the top priority.

Concerns

I will be honest, I probably could have done more this week. But I can't say I have made no progress.

Week Ending February 28, 2017

2/22

Task

Fix idefix so we can actually run experiments on it. In our meeting, attempting to build the language model resulted in an error that was not replicated on Caesar.

Results

The error was caused by broken symlinks on Idefix. The symlinks were broken because we used scp to copy /mnt/main/local to Idefix, and scp breaks symlinks; rsync doesn't. I deleted Idefix's copy of /usr/local and used rsync to grab a new copy from Caesar. Doing so fixed (or more accurately, preserved) the symlinks. The language model was built without issue after that.
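
The gist of the difference, and roughly the re-copy that fixed it (hostname and paths as described above; the exact flags used are not recorded here, so treat this as a sketch):

  # scp -r follows symlinks and copies their targets, which is how the links broke.
  # rsync -a preserves symlinks as symlinks.
  rm -rf /usr/local                          # on Idefix, after checking nothing unique is lost
  rsync -a caesar:/usr/local/ /usr/local/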

Plan

That should allow us to run experiments on idefix. Next step is Unseen decodes.

Concerns

None at this time.

2/27

Task

Install Python 2.7 on Idefix so we can get numpy and scipy, which we need for LDA.

Results

First I attempted to install numpy and scipy on Python 2.6 using pip. That failed, claiming Python needed to be 2.7 or greater. Then I tried installing older versions of numpy. That failed, claiming I needed development versions of Python to build numpy.

Given that installing a new version of Python was inevitable, I then attempted to search the CentOS repositories (I configured Red Hat to check the CentOS repositories) for compiled versions of Python 2.7. No dice there either. So I attempted to build it from source following the tutorial here (and in the process I installed gcc): https://www.digitalocean.com/community/tutorials/how-to-set-up-python-2-7-6-and-3-3-3-on-centos-6-4

The long and short of it is that many of the requirements needed to build Python don't exist in the CentOS repositories. Getting them manually may have been possible, but that was not the route I ended up taking; I sincerely doubt that would have been the last of my problems had I continued down that road.

I decided to try a Miniconda distribution (Anaconda stripped of everything but Python and the conda package installer) instead, following this tutorial. It worked perfectly: pip installed numpy and scipy, and it should be good to go. https://www.atlantic.net/community/howto/install-python-2-7-centos-anaconda/
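
For the record, the shape of that install (the installer file name and URL reflect what Continuum shipped at the time and may change; the install prefix is an assumption — see the linked tutorial for the exact steps):

  wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
  bash Miniconda2-latest-Linux-x86_64.sh -b -p $HOME/miniconda2
  $HOME/miniconda2/bin/pip install numpy scipy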

Our initial reason not to use an all-in-one installer like Anaconda was that it would make the Tools group's life more difficult. Well, I promise their life will be a hell of a lot easier with a precompiled Python binary than it would be trying to build Python from source (and tracking the changes of every single required dependency). My other concern was that Anaconda might not be free for use by organizations, but Anaconda's business model is based on a split between Anaconda and Anaconda Pro, which contains more features. So we are fine there as well. https://www.continuum.io/anaconda-subscriptions

Plan

I need to figure out a way to put Miniconda's Python on the PATH permanently; for some reason, PATH alterations don't seem to stick. Then run a train and decode using Miniconda's Python and compare to see if differences exist. Try out LDA.
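
One common way to make it stick (assumes bash and the miniconda2 prefix used above):

  echo 'export PATH="$HOME/miniconda2/bin:$PATH"' >> ~/.bashrc
  source ~/.bashrc
  which python    # should now point at Miniconda's python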

Concerns

Miniconda is not open source. That could cause issues for the tools group at some point, depending on exactly how they want to track changes to files. I also installed lots of things on Idefix in my attempt to get Python to build. It may just be a drone, but still.

2/28

Task

run a train and decode using miniconda's python, and compare to see if differences exist.

Results

I started up a train. I will check on it tomorrow.

Plan

I need to figure out a way to put Miniconda's Python on the PATH permanently; for some reason, PATH alterations don't seem to stick.

Concerns

None at this time.

Week Ending March 7, 2017

3/4

Task

Checking in

Results

Checked in

Plan

To do things

Concerns

Not doing things

3/5

Task

Reading Logs

Result

Read Logs

Plan

Learn to run experiments with LDA and RNNLM.

Concern

Possible problems

3/6

Task

Figure out why my decode on unseen data failed.

Result

After examining decode.log, I found an error indicating that a file that was supposed to exist under model_parameters did not in fact exist. I suspected this was because I had run the train using Miniconda's Python rather than the Python 2.6 present by default, and I needed to confirm that. So I started up a train using the default Python, and sure enough the proper file was generated. I still haven't decoded this experiment, but I am confident it will succeed. This points to a new task: figure out what the heck is wrong with using the new Python.

Plan

At this point, all of my efforts will be focused on debugging the new Python so I can locate the source of this problem. I am betting it is a problem with our user-generated scripts, but if we are unlucky it could be a compatibility issue between our Sphinx and our Python.

Concerns

I am glad I now understand why Greg was able to run an unseen test while I was not, but now we need to figure out what is wrong.

Week Ending March 21, 2017

3/19

Task

Checking In

Results

Checked In

Plan

Organize the team.

Concerns

I missed the last class, and need some explanation of things. I asked on slack, but nobody answered. Will ask in person.

3/20

Task

Checking In

Results

Checked In

Plan

Organize the team.

Concerns

Can't do anything because we don't have servers.

Week Ending March 28, 2017

3/25

Task

First, run a sample experiment on asterix to make sure all is well. Second, get LDA working (first step is finding the bug).

Results

I ran a 5hr experiment on Asterix and it failed. I ran a 30hr on Asterix and it succeeded. I think I screwed up the first time, and that the 5hr is fine. I ran a 5hr experiment on Idefix with Miniconda but no LDA and it worked. I ran a second experiment with LDA on and ran into an error during decoding; this was because the output of training is slightly different for LDA/MLLT than it is for normal training. I copied the run_decode.pl script to my computer, made a small adjustment, and sent it back up as run_decode_lda.pl so it didn't overwrite the original script.

Plan

I haven't actually looked at the score other than a sanity check (i.e., does it exist?). I need to run an experiment to compare the performance of LDA-enhanced experiments to non-enhanced experiments. I should install the needed components for LDA and RNNLM on Asterix.

Concerns

Actually None. I am so happy right now.

3/26

Task

Check logs

Results

Checked logs

Plan

See previous

Concerns

See previous

3/27

Task

Look at the score for the 5hr LDA compared to 5hr non-LDA. Start a 30hr train for LDA.

Results

On train data, use of LDA improved the score by 8 percentage points (from 34% to 26%). Now I need to do the 30hr, and make sure to check on test data. Started the 30hr train.

Plan

I need to finish running an experiment to compare the performance of LDA enhanced experiments to non-enhanced experiments. I should install needed components for LDA and RNNLM on asterix.

Concerns

Still None.

3/28

Task

Check Logs

Results

Checked Logs

Plan

Meeting

Concerns

None

Week Ending April 4, 2017

4/1

Task

Run an LDA experiment on both seen and unseen data to compare to 30hr without LDA.

Results

First, an LDA-assisted experiment on 30hr seen data showed almost no improvement vs. a non-LDA experiment on 30hr seen data. I was running the unseen data experiment, but I realized at decoding that I had forgotten to turn on LDA like an idiot, so I deleted that experiment and started fresh.

Plan

I will check on that experiment tomorrow. I will also test LDA on Asterix.

Concerns

None at this time.

4/2

Task

Decode experiment 024/025, and test LDA on asterix.

Results

I decoded the experiment on 30hr unseen data with LDA and compared the results to 30hr unseen data with no LDA. LDA improved the score by about 3.5% absolute. This may improve with parameter tweaking, but that is the effect of LDA on a default experiment.

I wanted to start an experiment on Asterix to test LDA, but I can't get into Asterix for some reason. I suspect it is similar to the problem Maryjean had getting into Miraculix, but I don't remember how to fix that.

Plan

When I can get into asterix, test LDA on there. As soon as I know it works, start changing default values.

Concerns

Can't get into asterix.

4/3

Task

test LDA on asterix

Results

Started a train on Asterix using LDA. Will check on it tomorrow.

Plan

Check on it tomorrow.

Concerns

None.

4/4

Task

Check on my experiment.

Results

Experiment failed due to not having numpy and scipy. Installed those Python libraries on Asterix. Retrained and decoded the experiment.

Plan

Meeting tomorrow.

Concerns

None

Week Ending April 11, 2017

4/8

Task

I had two tasks. The first was to run a train with 2016's best training parameters and add LDA to that. The second was to edit the LDA column for the URC poster.

Results

I ran, decoded, and scored the train for 0300 006/007. I got a WER of 8%, which I think is due to an error or some kind of anomaly. I edited the URC poster, added information about LDA, and found a nice picture to go along with the explanation.

Plan

I am going to attempt further experiments to see if the results of 0300 006/007 are an anomaly or what the heck is going on.

Concerns

None.

4/9

Task

Check Logs

Results

Checked Logs

Plan

Logs

Concerns

None

4/10

Task

Check Logs

Results

Checked Logs

Plan

Logs

Concerns

None

4/11

Task

Check on 300hr experiment. Look into 8% results from 0300/007.

Results

For some reason, the LM script errors out for the 300hr experiment. Started a 5hr experiment with the same config options to see what happens with the 8% experiment.

Plan

Go to class meeting tommorow.

Concerns

None

Week Ending April 18, 2017

4/16

Task

Check Logs

Results

Checked Logs

Plan

I have some ideas

Concerns

Will write tonight

4/17

Task

Check Logs

Results

Checked Logs

Plan

I have some ideas

Concerns

4/17 (second entry)

Task

Decode 300hr experiment on caesar. Start new train testing some of my own config options.

Results

The 300hr experiment on Caesar performed extremely well. Too well. The Spring 2016 experiment that used the same config settings did not perform nearly that well, so something must be messing with the score. I trained this experiment on Caesar, and as far as I know Caesar has not yet been changed.

The experiment I started is still training. The config setting I am using appears to take a great deal more processing power than I anticipated.

Plan

Check in on my experiment when I can. Look into why the results on 300hr decode are too good to be true.

Concerns

Something might be deeply wrong with our experiments. I really hope this is not the case.

4/18

Task

Decode my experiment tested using my own settings.

Results

0300 012/013 was an interesting experiment. I got a WER of 7.9%, which is 0.2% better than my last experiment. But I can't see any difference in file generation, which means I can't know whether that was due to chance. In any case, I deem my secret weapon a failure, so I will reveal it for the future.

MMIE. I don't even know if Sphinx 3 really supports it.

Plan

Continue running experiments with different testing parameters.

Concerns

Something still isn't right about the results we are getting. I don't know what is going on but I don't like it.

Week Ending April 25, 2017

4/22

Task

Check Logs

Results

Checked Logs

Plan

To check logs

Concerns

None

4/24

Task

Solve the mystery of the deflated WER.

Results

I had the opportunity to speak to a member of the Spring 2016 group today. Based on our discussion, we were able to work out a few things.

  • The 300hr tests on seen data scored in the mid-20s. That is suspiciously similar to where mine scored. I think it would be prudent to triple-check that I am running unseen tests correctly.
  • The "s" tags are definitely something we need to check and make certain are being processed correctly for scoring. After looking, I have determined that they do indeed make a difference, and we have not been accounting for it. Experiment 0283 019 has instructions for removing tags (the general idea is sketched below); we just need to update the wiki instructions so future classes will know.
Plan

I am going to attempt to use VTLN to estimate the warp parameter and improve WER. I don't know if I will succeed in the limited time I have left, but this is my only shot at beating the Rebels, so I might as well take it.

Concerns

None

4/25

Task

Check Logs

Results

Checked Logs

Plan

To check logs

Concerns

None

Week Ending May 2, 2017

4/30

Task

Check Logs

Results

Checked Logs

Plan

Edit final report with score and methods

Concerns

Can't access final report.

5/1

Task

Edit final report with score and methods. Test more config options

Results

Started editing final report with results and methods. Started up a 30hr train to test config options one at a time.

Plan

Test all the config options

Concerns

None

5/3

Task

Finish report so it is in a workable state in case it is due today. Decode my experiment.

Results

I edited a useless parameter that determines how many processes the forward-backward estimation is split across, so I got no change in WER. I will go test something else, maybe the senones.
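
A guess at the knobs in question (hedged: $CFG_NPART is SphinxTrain's usual name for the number of parallel parts in Baum-Welch, and "the senones" maps to the $CFG_N_TIED_STATES option mentioned back on 2/11 — worth double-checking which line was actually edited):

  grep -E 'CFG_NPART|CFG_N_TIED_STATES' etc/sphinx_train.cfg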

I finished the report; it is in a workable state. I still want other people to look at it if it's due next Wednesday.

Plan

Test all the config options, and settle on one final experiment.

Concerns

None

Week Ending May 9, 2017

5/5

Task

Start the final 300 hour train

Results

I have experimented with a few final config edits along with Alex, but they either didn't decrease WER, unacceptably increased training time, or simply broke the experiment. But we discovered an error in the last 300 hour experiment we trained (we edited the semi-continuous instead of the continuous config set), so we could still get a better result.

So I started the train.

Plan

Check on this on Sunday.

Concerns

None


5/7

Task

Check on my 300 hour experiment.

Results

I messed up and trained a 30 hour instead of a 300 hour experiment. I rebuilt it, but at this point it is not going to finish in time for our results. I should have built some redundancy into the effort. Damn.

Plan

Check in on the experiment I am training. Do final edits to report.

Concerns

None.