Speech:Spring 2016 James Schumacher Log



2/3/16

 * Tasks:

Get together with group members and try to run an experiment. Also, share contact info.


 * Results:

On Wednesday, our group attempted to run an experiment (led by Ben, who has prior experience running experiments). The first four attempts failed; after reviewing the Perl code, we discovered that one of the commands we were trying to execute wasn't written correctly: one of the arguments to prepareTrainExperiment.pl had the wrong path value. Checking the tutorial confirmed that all four failures were due to this human error. After correcting the command, the experiment started running (whether it succeeds remains to be seen, as it takes several hours to complete).

On my own, I looked through Caesar's contents to get familiar with the project's file structure and files. Also, while we were hunting for the "bugs" in the Perl scripts that we thought were causing the issue mentioned above, I became a bit more familiar with how the Perl scripts work.

We shared contact info.


 * Plans:


 * Concerns:

2/4/16

 * Tasks:

Meet up with Jon to attempt to fix an issue that Ben emailed us about. The issue dealt with the scoring not running successfully.


 * Results:

After taking a closer look, we realized, again, that human error was most likely the reason why the scoring was unsuccessful. For more details on this, refer to the group log here (04 Feb 16): link.


 * Plans:


 * Concerns:

2/7/16

 * Tasks:

Review logs for the current week.


 * Results:

After reading Peter Ferro's log (Week Ending February 9, 2016), I fixed my issue with logging into cisunix.unh.edu. Now, I can log into caesar either through cisunix or the VPN. In addition, reading Matthew Heyner's log was quite helpful. He had some links in there that I went to and read up on regarding scoring (under Week Ending February 9, 2016).


 * Plans:

Read through the Spring 2015 Project Report again, specifically the modeling group's section, and start to get an idea of where our group can head this semester in terms of project goals.


 * Concerns:

2/8/16

 * Tasks:

Read through the modeling section of the Spring 2015 Report and read through the student logs of the modeling group of the Spring 2015 semester.


 * Results:

After reading through the modeling section of the Spring 2015 Report, I found that the group made the following suggestions for future semesters to follow up on:
 * 1) The language model has been somewhat neglected over past semesters. More focus was put on the acoustic model, which is also important; however, they found that when they tweaked the language model, they were able to generate better results. They mention that many variables can be tweaked when creating the language model, but they didn't have much time to explore them all. So, for this semester, it would probably be wise to investigate the language model further, particularly since the 2015 modeling group got better results even with their limited time tweaking it.
 * 2) Enlarging the dictionary could improve results. In addition, they mention that one possibly substantial source of error is audio transcripts containing words that are cut off at the end (like: examp-). At this point, it's a little unclear how increasing the dictionary will help with that; however, the problem remains that there are a lot of cutoff words in the audio transcripts (people aren't always going to speak perfectly), so researching how to recognize a cutoff word and make sense of it would be worth looking into this semester.
 * 3) Two more suggestions from the group: find an optimal value for the convergence ratio (they changed it from 0.04 -> 0.004; however, they say the lower the convergence ratio, the worse the real time factor, so finding a good balance is important), and experiment with density (they say a higher value has a better outcome on seen data, but they suspect a higher value will have a negative impact on unseen data). For these reasons, it's likely a good idea to experiment with these values and any others we find important.

After skimming through the student logs of the modeling group from the Spring 2015 semester, I gained an appreciation for what they were able to accomplish. They had to fix a lot of the documentation for running experiments and had to fix several of the scripts that are used to create and run experiments. I also found Samuel Sweet's logs to be quite detailed and informative, which potentially makes his logs a solid reference if we run into any issues with running experiments in the future.


 * Plans:


 * Concerns:

2/10/16

 * Tasks:

Communicate with the group and determine our tasks to perform for the immediate future. Run a 125 Hr train to determine a baseline.


 * Results:

With some help from Ben, I ran the decode on the 0283/002 sub experiment. I ran into an issue with permissions (permission denied). To fix it, I had to log in as root and run the following: 'chmod g+w '. After logging out of root and back in as my user, the permission issue was resolved. The decoding was successful and the scoring also ran successfully. The results can be found here: link

Started running the generateFeats script
 * Convergence ratio set to 0.004 -- $CFG_CONVERGENCE_RATIO
 * Density set to 64 -- $CFG_FINAL_NUM_DENSITIES
 * Senones set to 8000 -- $CFG_N_TIED_STATES
 * Other settings are default
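For reference, the settings above correspond to variables in the experiment's Sphinx train configuration file; as a fragment (treat the comments as my own glosses, not documentation):

```perl
# sphinx_train.cfg fragment (variable names as noted above)
$CFG_CONVERGENCE_RATIO   = 0.004;   # lower ratio -> more training iterations
$CFG_FINAL_NUM_DENSITIES = 64;      # Gaussian densities per state
$CFG_N_TIED_STATES       = 8000;    # senones
```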


 * Plans:

Wait for the generateFeats script to complete.


 * Concerns:

None, currently.

2/11/16

 * Tasks:

The generateFeats script has completed and it's now time to run the train.


 * Results:

Read the following experiment log. It highlights the process I went through: Experiment 0283/003


 * Plans:

Wait for the 125 Hr train to complete (should be done around 11:30 PM on 2-13-16). Also, experiment some more with Linux commands and their options. Also, look into some of the Sphinx train configuration variables.


 * Concerns:

Hoping that the initial train that failed didn't mess with the currently ongoing train.

2/13/16

 * Tasks:

Write up a rough draft of the modeling group's goals for the Spring 2016 proposal. Also, learn about permissions in Linux.


 * Results:

I added a rough draft of the modeling group's goals to the overall group proposal Word document that Thomas created.


 * I also gained a much better idea of how permissions are set in Linux.
 * There are three permission sections:
 * Owner   Group    Other
 * rwx     rwx      rwx


 * The maximum value for any section is 7 (111 in binary) and the minimum value is 0 (000 in binary).


 * Example of how to set permissions:
 * Say you want to set rwx for Owner, rwx for Group, and only r for Other:
 * Owner   Group    Other
 * rwx     rwx      rwx
 * 111     111      100 <-- Binary values: the 1's set (turn on) and the 0's clear (turn off)
 * 7       7        4


 * Linux command:
 * chmod 774 // Add a -R right after chmod and before the permission value to set these permissions on every file and directory under the directory you're changing.


 * A good resource: chmod
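A quick sanity check of the example above on a scratch file (GNU stat assumed for the octal-mode query):

```shell
# set rwx/rwx/r-- on a throwaway file and confirm the octal mode
f=$(mktemp)
chmod 774 "$f"
stat -c '%a' "$f"   # prints 774
rm -f "$f"
```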


 * Plans:

Still waiting for the train to complete. Only a matter of hours now.


 * Concerns:

Still hoping that the initial train that failed didn't mess with the currently ongoing train.

2/14/16

 * Tasks:

Create the language model and get the decode running.


 * Results:

I created the language model, and after some discussion with Ben as to what values to use for the decode, I started the decode (audio files copied over: 10,000; senones: 8000).


 * Plans:

Wait for the decode to finish so I can score it.


 * Concerns:

None, currently.

2/17/16

 * Tasks:

Do the score on the decode for experiment 0283/003.


 * Results:

I performed the score. See the experiment log for 0283/003: link

I also made a hypothesis as to why the decode took so long. Here is a snippet from the experiment log.

Jonas told me that the decode should run at around real time. This decode had roughly 12 hours of audio to decode and took over 48 hours: about four times real time. However, I set the senones value to 8000 instead of 1000 as in our previous successful experiment, whose decode took about 30 minutes. So, this is my hypothesis concerning the senones value's effect on real time:
 * 1000 senones will result in 1/2 real time
 * 2000 senones will result in real time
 * 8000 senones will result in 4 times real time
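The hypothesis boils down to RTF ≈ senones / 2000. A throwaway check against the numbers above (12 hours of audio, 8000 senones):

```shell
# predicted decode wall time in hours = audio_hours * (senones / 2000)
awk -v audio_hours=12 -v senones=8000 \
    'BEGIN { printf "%.0f\n", audio_hours * senones / 2000 }'   # prints 48
```

which lines up with the 48+ hours the decode actually took.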


 * Plans:

Investigate why the senones value affects the real time factor of the decoding process.


 * Concerns:

None, currently.

2/18/16

 * Tasks:

Add hyperlinks to the Introduction section to make accessing individual logs and the various group sections in the proposal easier.


 * Results:

I added the hyperlinks with no issues.


 * Plans:

Read logs from the modeling group and also the data group, since we will likely work with them in the future.

 * Concerns:

None, currently.

2/20/16

 * Tasks:

Read the logs of the modeling group, data group, and experiment group. Also, see if any improvements can be made to the modeling section of the proposal.


 * Results:

I read through the logs mentioned under Tasks and found some helpful information from Ben (link) and Brenden (link). I also made an update to the modeling section of the proposal.

Ended up helping Ryan out a bit with the process of creating a language model and starting a decode. The primary issue Ryan had was starting a decode. I looked in the decode.log file in the DECODE directory and saw that it was incredibly short and ended with a system error, which essentially said "can't open this file." After reading this, I looked at the root of the sub experiment directory to see if everything looked right and noticed that the language model directory wasn't capitalized (lm instead of LM). This was causing the issue. I renamed lm to LM using 'mv lm LM' and asked Ryan to start the decode again, and it worked! Now, we wait to see the results.
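Since this failure came down to a miscapitalized directory name, a small guard could catch it before a decode is started. This is only a sketch; the expected names it checks (LM, DECODE) and the helper name are my assumptions based on this incident:

```shell
# warn about missing or misnamed directories in a sub experiment root
# (expected names LM and DECODE are assumptions from the incident above)
check_exp_dirs() {
  for d in LM DECODE; do
    [ -d "$1/$d" ] || echo "missing or misnamed: $d"
  done
}
```

For example, run check_exp_dirs /mnt/main/Exp/0283/004 before starting the decode. Note the check only distinguishes lm from LM on a case-sensitive filesystem.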


 * Plans:

Communicate with the rest of our group about whether or not improvements can be made to the proposal. Read the information found in the links above from Ben and Brenden.

 * Concerns:

How well our proposal will go over with Professor Jonas.

2/22/16

 * Tasks:

Read the information found in the links from the previous log (2-20-16). Also, see if any information can be found on the effect that the senone value can have on the RTF of the decoding process.


 * Results:

I read the paper titled, RESEGMENTATION OF SWITCHBOARD. It details a few causes of higher Word Error Rates (WER) due to the switchboard data. These include a large variation in pronunciations, the high frequency of monosyllabic words, and acoustic model mismatch. They highlight the following steps to improve switchboard data transcripts, which yields a better WER through a better acoustic model: improve the quality of the segmentations and transcripts of the training data.

I also read through a lab document from MIT. It contains information on some of the parameters used in the training process, such as: statesperhmm, skipstate, gaussiansperstate, n_tied_states, convergence ratio, and maxiter.

I couldn't find any information on the effect that the senone value has on the RTF of the decoding process. However, my earlier hypothesis is looking good. Our decode on the 0283/004 experiment took roughly 5:24 hrs and my hypothesis predicted 5:45 hrs. Pretty close.


 * Plans:

Run another decode on the 0283/003 experiment using only 1,000 audio files and 8,000 senones, instead of our initial decode's 10,000 audio files and 8,000 senones. This way we'll have better comparisons with the upcoming experiments.

 * Concerns:

None, currently.

2/24/16

 * Tasks:

Determine how long our 0283/003 experiment took compared to our 0283/004 experiment. Investigate increasing the dictionary size.


 * Results:

Helpful Links:
 * Information Regarding the Dictionary:
 * C Programs
 * The CMU-Cambridge Statistical Language Modeling Toolkit v2
 * Building Language Model
 * Speech Recognition Toolkit


 * Determining duration of trains (example):
 * Go to /mnt/main/Exp/0283/004/logdir
 * Take the time stamp of the last file and subtract the time stamp of the first file
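The timestamp subtraction can be scripted; a sketch using GNU ls/stat (train_minutes is a name I made up, not an existing utility):

```shell
# wall time of a train in minutes: newest logdir file's mtime minus oldest's
train_minutes() {
  first=$(ls -tr "$1" | head -n 1)   # oldest file
  last=$(ls -tr "$1" | tail -n 1)    # newest file
  echo $(( ( $(stat -c %Y "$1/$last") - $(stat -c %Y "$1/$first") ) / 60 ))
}
```

For example: train_minutes /mnt/main/Exp/0283/004/logdir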

 * Increase max dictionary cap:

We found that the lm_create.pl script uses a C program called wfreq2vocab, which has a -top option that essentially caps the number of words you can have in the dictionary. Right now, the lm_create.pl script doesn't specify it. We would like to increase the dictionary word cap to 30000, so we would add "-top 30000" to the command. This value matches the recommended ratio of dictionary size to total audio hours.

 * New Train:

Refer to the following experiment log: Experiment 0283/005

 * Plans:

Wait for generateFeats.pl to finish and then modify the sphinx train configuration to the same specs as Experiment 0283/004. Then, start the train.

 * Concerns:

Getting our proposal up to snuff by Sunday.

2/27/16

 * Tasks:

Create the language model, start the decode, and score the hyp.trans file when the decode finishes.


 * Results:
 * 0283/005 Experiment Log
 * I went through all of the steps to create the language model, except executing the lm_create.pl script.
 * Awaiting feedback from the group for my proposed changes to the script.
 * My proposed changes have been accepted.
 * I've run the lm_create.pl without any issues.
 * Next, I started the decode. It's currently running without issue. Will be able to score in under 6 hours from now.
 * The decode finished and I scored it. See the experiment log above for details.

 * Plans:

Work with the group to finalize our proposal tomorrow.

 * Concerns:

Getting our proposal up to snuff by Sunday.

2/28/16

 * Tasks:

Meet up with Jon and Ryan to work on the Implementation Tasks section of our section of the proposal.

 * Results:

We divided up the work ahead of us and came up with an effective Implementation Tasks section. Also, I received an email from Ben: he wanted to know how adding the -top 30000 option affected the tmp.vocab file in the LM directory of Experiment 0283/005. So, I checked the line count of each file using $ wc -l tmp.vocab in Experiment 0283/005 and Experiment 0283/004. Under the results section of the experiment log, you will see that I added a comparison of the line counts: roughly 3,500 words were added to the tmp.vocab file in Experiment 0283/005 compared to Experiment 0283/004.

 * Plans:

Consider which parameter of the language model to tweak next and which value to use.

 * Concerns:

Fingers crossed our proposal is a letter grade or more better than the first draft.

3/1/16

 * Tasks:

Catch up on logs.

 * Results:

Read logs.

 * Plans:

Meet up with the group on Wednesday to determine what our next experiment will be. We'll likely tweak the gt parameter for the language model creation.

 * Concerns:

Fingers crossed our proposal is a letter grade or more better than the first draft.

3/2/16

 * Task:

Communicate with the Data group about creating a new corpus to train on. Reasons to create the new corpus:
 * The current data we are training on has errors in it that the Data group discovered
 * To generate a world class baseline, it doesn't make sense to train on data that has errors in it


 * Results:

After communicating with the Data group, we decided to create a new corpus called "fixed 30k" in which we will copy over the first 32,000 utterances (in audio/utt), first 32,000 lines in the transcript file (trans/train.trans), and first 247 conversations (in audio/conv) (there isn't a one-to-one relationship between utterances and conversations -- a conversation has many utterances in it). Jon determined the exact number of conversations to copy over. Can't remember exactly how he did it.
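The transcript and utterance copying can be sketched in shell (the "<utt-id>.sph" filename convention and the helper name are my assumptions; the conversation copying that Jon worked out is omitted):

```shell
# copy the first N transcript lines of a corpus and the matching utterance
# audio into a new corpus directory (filenames assumed to be "<utt-id>.sph")
make_subcorpus() {  # usage: make_subcorpus SRC_TRAIN_DIR DST_TRAIN_DIR N
  mkdir -p "$2/trans" "$2/audio/utt"
  head -n "$3" "$1/trans/train.trans" > "$2/trans/train.trans"
  awk '{ print $1 ".sph" }' "$2/trans/train.trans" |
    while read -r f; do cp "$1/audio/utt/$f" "$2/audio/utt/"; done
}
```

For example: make_subcorpus /mnt/main/corpus/switchboard/full/train fixed_30k/train 32000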

After creating the new corpus, we tried running prepareTrainExperiment.pl. Unfortunately, it fails for reasons unknown for now.

See our modeling group log entry on 3/2/16 for more information on creating this new corpus.

UPDATE: I received an email from Jon and he noted that there was a path issue. After investigating the file structure of /mnt/main/, I discovered that someone (likely Jonas) moved the root directory from /mnt/main to /mnt/main/misc. As a result, the prepareTrainExperiment.pl needs to be modified so that the correct path is used or else running trains won't be possible.

UPDATE: Jon has created a prepareTrainExperiment2.pl script to hopefully fix the issue.

UPDATE: After I texted him one issue in his prepareTrainExperiment2 script and Matthew from the Experiments group emailed him the same issue (one more line needed to be modified to fix the path issue), Jon got the script working again. He is currently going through the steps of running a train on our new corpus.

UPDATE: After looking at Jonas' Twitter feed, I noticed that he mentioned that the root directory was one of the directories he moved. I'm guessing he didn't know that doing that would break prepareTrainExperiment. Regardless, the issue is fixed now, so students can continue running trains.


 * Plan:

Find out why the prepareTrainExperiment.pl is failing on our new corpus.

UPDATE: From above, Jon and I found out why. Awaiting the results from our new corpus.

 * Concerns:

Making our new corpus compatible with training. UPDATE: None, now.

3/4/16

 * Task:


 * Follow the instructions from Jonas to enable sshing to the drone servers by setting up an automatic login to them.
 * Determine how long the train for Experiment 0283/008 took.
 * Determine how long our new corpus (fixed_30k) is in hours.


 * Results:

I successfully followed the instructions and was able to ssh into asterix by executing the command below:

ssh asterix

The train seems to only have taken 2:32 hours, which is far under what I expected to see (maybe our new speech corpus wasn't constructed properly?). I'll be curious to see the score.

Using awk, I was able to come up with an exact number of hours for our new corpus (fixed_30k):

cd /mnt/main/corpus/switchboard/fixed_30k/train/trans
awk '{total += $3 - $2} END {print total / 3600}' train.trans

Result: 39.5256 hours

Also, by using this command, I found that our 125hr_3170 corpus is actually 173.92 hours. Sort of surprising how far off the mark that is. I then checked the 256hr corpus and that was at 311.76113 hours. I'm now wondering if my command is... off. I'll email Jonas about it. He knows his Unix commands.

Jonas got back to me with some interesting information. After considering it, I decided to find out how many hours of audio are in the /utt directory to compare against the number of hours of utterances in the /trans/train.trans file. I came up with the following command to sum up the bytes in the /utt directory for the fixed_30k corpus:

ls -l | awk '{total += $5 - 40} END {print total / (8000 * 60 * 60)}'

Result: 40.6174 hours

40.6174 hours vs. 39.5256 hours: ((40.6174 - 39.5256) / 39.5256) * 100% = 2.76% more hours of audio in the /utt directory than in /trans/train.trans.

I also tested the 256hr corpus. The command resulted in the following value: 320.752 hours of audio.

320.752 hours vs. 311.76113 hours: ((320.752 - 311.76113) / 311.76113) * 100% = 2.88% more hours of audio in the /utt directory than in /trans/train.trans.

Ideally, they would be equal.
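Both comparisons use the same formula, (utt hours - trans hours) / trans hours * 100; for the fixed_30k numbers above:

```shell
# percent difference between /utt audio hours and transcript hours
awk -v utt=40.6174 -v trans=39.5256 \
    'BEGIN { printf "%.2f%%\n", (utt - trans) / trans * 100 }'   # prints 2.76%
```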


 * Plan:

Continue to investigate the data discrepancies.

 * Concerns:

My command is incorrect for checking the number of hours in a speech corpus. UPDATE: None, currently.

3/6/16

 * Task:

Read logs and get caught up on email stream from Jonas.


 * Results:

I've caught up on the logs and on Jonas' email stream. The 40 value that I used in my previous log above was just a guess at the size of the header in the sphinx audio files; Jonas found that the headers are actually 1024 bytes. He still found a discrepancy between the audio length of the sphinx files and the transcript file in our 5hr_3170 corpus. He also discovered that our 5hr_3170 corpus is actually only 3.87039930555555555555 hours and the corresponding transcript is only 3.84019319444444444444 hours, a discrepancy of .03101597261836851112 hours. We'll need to look into why that is.

I also reran my command on the fixed_30k and 256hr corpora with a header of 1024 bytes. Results:

fixed_30k/train/audio/utt: 39.5241 hours vs. 39.5256 hours from the transcript -- ((39.5241 - 39.5256) / 39.5256) * 100% = -0.003795%
256hr/train/audio/utt: 312.199 hours vs. 311.76113 hours from the transcript -- ((312.199 - 311.76113) / 311.76113) * 100% = 0.140450%
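Folding Jonas' 1024-byte header figure into the byte-summing approach gives a small helper; a sketch assuming GNU stat, 8000 audio bytes per second, and files named *.sph (the helper name is mine):

```shell
# total hours of audio in a directory of sphere files,
# assuming a 1024-byte header and 8000 audio bytes per second
audio_hours() {
  stat -c %s "$1"/*.sph |
    awk '{ total += $1 - 1024 } END { printf "%.4f\n", total / (8000 * 3600) }'
}
```

For example: audio_hours /mnt/main/corpus/switchboard/fixed_30k/train/audio/utt. Using stat instead of ls -l also avoids the "total" summary line that ls emits, which would otherwise subtract a spurious header from the sum.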


 * Plan:

Figure out a plan of attack for what Jonas wants us to do.

 * Concerns:

Nothing specific.

3/8/16

 * Task:

Skimmed through some logs from Spring 2012 (Data Group), Spring 2013 (Data Group), and Spring 2014 (Data Group) and also David Meehan's logs.


 * Results:

I didn't find anything specific to the issue Jonas mentioned in his emails (why there is a slight difference between an audio file's duration and the duration given by its corresponding line in the transcript file). But I did find something interesting in David Meehan's logs: he created a corpusSize.pl script in the /mnt/main/scripts/user directory. It accounts for the transcript having overlap data.

Example of using corpusSize.pl:

/mnt/main/scripts/user/corpusSize.pl "fixed_30k"


 * Plan:

Continue to investigate why there is a discrepancy between an audio file's duration and the duration given by its corresponding line in the transcript file.

 * Concerns:

Nothing significant, right now.

3/9/16

 * Task:

Investigate the master corpus and master transcript.


 * Results:

Ben and I went into /mnt/main/corpus/switchboard/dist/disk1, disk2, etc., and summed up all of the hours of audio files, coming up with 517.7126 hours (the picture shows 517.7135 hours, but that includes the headers -- I took those out afterward). The transcript in /mnt/main/corpus/switchboard/dist/master_trans/full_trans.text contains 518.293 hours.

% Difference: ((517.7126 hours - 518.293 hours) / 518.293 hours) * 100% = -0.1120%

Also, I just summed up all of the audio files in /mnt/main/corpus/switchboard/full/train/audio/utt and came up with 1256.36 hours. This is just odd.


 * Plan:

Start working on a new script that will generate new utterance audio files given /corpus/switchboard/full/train/trans/train.trans.

 * Concerns:

Nothing significant, right now.

3/10/16

 * Task:

Meet up with Jon to work on getting a new corpus up and running.


 * Results:

Script: I created an empty Perl script called createUtts.pl, then wrote pseudocode for the process of creating new utterances given a transcript file. It takes in a transcript file (i.e. train.trans) and generates utts from the conversation audio files in /mnt/main/corpus/switchboard/dist/flat.

Usage: createUtts.pl /absolute/path/to/train.trans /absolute/path/to/the/directory/you/want/the/utts/in

Pseudocode:

Get arguments
Open file
Loop: successively read each line
    Throw the full file name into a variable
    Throw a formatted file name (i.e. sw2345 with a 0 added after the w: sw02345) into a variable
    Throw the start time into a variable
    Throw the end time into a variable
    Throw the difference between the end time and the start time into a variable
    Use the sox command like so: sox /mnt/main/corpus/dist/flat/<formatted file name> <target utt directory> [start time] [end time - start time]
End Loop
Close file

Testing Script: So far, I was able to implement this pseudocode in actual code with some help from Jon. We have tested the script successfully, with a catch: instead of actually calling the sox command with system(args), we simply print the sox command. We have yet to test the script with the sox command actually getting called.

Jonas Drops By: Jonas dropped by, going over the log files we need to generate: train.log, conv.log, and utt.log.
 * train.log: utterance start time -- utterance stop time -- diff
 * conv.log: utterance stop time -- length of conversation -- diff
 * utt.log: expected utterance -- actual utterance -- diff

New Corpus Structure:

corpus/
    test/
        audio/
            conv/
            utt/
        etc/
        trans/
    train/
        audio/
            conv/
            utt/
        etc/
        trans/
    info/
        log/
        misc/

Jonas also explained why there are so many hours in the full_trans.text: it's filled with 173.83 hours' worth of non-verbal utterances. He came up with a UNIX command to calculate that total. IMPRESSIVE!

NOTE: /tmp/full.txt contains /mnt/main/corpus/switchboard/full/train/trans/train.trans and /tmp/all.txt contains /mnt/main/corpus/switchboard/dist/master_trans/full_trans.text

echo `diff /tmp/full.txt /tmp/all.txt | sed 's/$/#/g'` | sed "s/# >/>/g" | sed "s/#/\n/g" | egrep '[0-9]+a[0-9]+>' | awk '{total += $4 - $3} END {print total / 60 /60}'


 * Plan:

To test createUtts.pl, I'm thinking of copying a train.trans file into a test directory and chop off all but the first 10 lines. And then create another test directory to be the target directory for the new utterance audio files. If the script is successful, there will be 10 new utterance audio files.

Once this is done, I'll continue working on the script and add logging to it.

 * Concerns:

Nothing significant, right now.

3/11/16

 * Task:

Test the createUtts.pl script on the first 10 lines of the train.trans from /mnt/main/corpus/switchboard/full/train/trans/train.trans


 * Results:

My first test was unsuccessful. Failure:

[jrs1036@caesar ~/Scripts]$ ./createUtts.pl ~/TrainDirectory/train10.trans ~/TargetDirectory/
sox FAIL formats: can't open input file `0.000000': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0001.sph 0.000000 2.655625
sox FAIL formats: can't open input file `0.977625': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0002.sph 0.977625 10.58375
sox FAIL formats: can't open input file `10.166375': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0003.sph 10.166375 12.647125
sox FAIL formats: can't open input file `19.804875': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0004.sph 19.804875 1.5075
sox FAIL formats: can't open input file `22.813500': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0004.sph 22.813500 4.133375
sox FAIL formats: can't open input file `26.946875': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0005.sph 26.946875 7.864875
sox FAIL formats: can't open input file `27.362000': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0006.sph 27.362000 1.57075
sox FAIL formats: can't open input file `33.898250': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0008.sph 33.898250 6.225
sox FAIL formats: can't open input file `34.811750': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0006.sph 34.811750 1.279
sox FAIL formats: can't open input file `38.555375': No such file or directory
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0008.sph 38.555375 2.2305

It turned out I forgot to add 'trim' before the start time.

After making that change, success:

[jrs1036@caesar ~/Scripts]$ ./createUtts.pl ~/TrainDirectory/train10.trans ~/TargetDirectory/
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0001.sph trim 0.000000 2.655625
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0002.sph trim 0.977625 10.58375
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0003.sph trim 10.166375 12.647125
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0004.sph trim 19.804875 1.5075
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0004.sph trim 22.813500 4.133375
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0005.sph trim 26.946875 7.864875
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0006.sph trim 27.362000 1.57075
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001A-ms98-a-0008.sph trim 33.898250 6.225
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0006.sph trim 34.811750 1.279
sox /mnt/main/corpus/switchboard/dist/flat/sw02001.sph /mnt/main/home/sp16/jrs1036/TargetDirectory/sw2001B-ms98-a-0008.sph trim 38.555375 2.2305

sox usage: sox filein fileout trim start duration
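For illustration, the corrected print-only loop can be sketched in a few lines of shell (the real script is Perl; print_sox_cmds is just a name I made up, and the transcript field layout is the one shown in train.trans):

```shell
# print (not run) one sox command per transcript line:
# fields are <utt-id> <start> <end> <words...>
print_sox_cmds() {  # usage: print_sox_cmds train.trans /target/dir
  while read -r utt start end _; do
    conv="sw0$(printf %s "$utt" | cut -c3-6)"   # sw2001B-ms98-a-0001 -> sw02001
    dur=$(awk -v s="$start" -v e="$end" 'BEGIN { print e - s }')
    echo sox "/mnt/main/corpus/switchboard/dist/flat/$conv.sph" \
      "$2/$utt.sph" trim "$start" "$dur"
  done < "$1"
}
```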

Taking a look in the target directory: [jrs1036@caesar ~/TargetDirectory]$ ls sw2001A-ms98-a-0002.sph sw2001A-ms98-a-0006.sph  sw2001B-ms98-a-0001.sph  sw2001B-ms98-a-0004.sph  sw2001B-ms98-a-0006.sph sw2001A-ms98-a-0004.sph sw2001A-ms98-a-0008.sph  sw2001B-ms98-a-0003.sph  sw2001B-ms98-a-0005.sph  sw2001B-ms98-a-0008.sph

After downloading these utterances from CAESAR to my laptop, I played them and transcribed them:

train.trans:

[jrs1036@caesar ~/TrainDirectory]$ head -10 train10.trans | sort
sw2001A-ms98-a-0002 0.977625 11.561375 hi um yeah i'd like to talk about how you dress for work and and um what do you normally what type of outfit do you normally have to wear
sw2001A-ms98-a-0004 19.804875 21.312375 um-hum
sw2001A-ms98-a-0006 27.362000 28.932750 and is
sw2001A-ms98-a-0008 33.898250 40.123250 right right is there is there um an[y]- is there a like a code of dress where you work do they ask
sw2001B-ms98-a-0001 0.000000 2.655625 okay hi
sw2001B-ms98-a-0003 10.166375 22.813500 well i work in uh corporate control so we have to dress kind of nice so i usually wear skirts and sweaters in the winter time slacks i guess [noise] and in the summer just dresses
sw2001B-ms98-a-0004 22.813500 26.946875 we can't even well we're not even really supposed to wear jeans very often
sw2001B-ms98-a-0005 26.946875 34.811750 so it really doesn't vary that much from season to season since the office is kind of you know always the same temperature
sw2001B-ms98-a-0006 34.811750 36.090750 so
sw2001B-ms98-a-0008 38.555375 40.785875 [noise] not formally

My transcription from playing these utterance files:

sw2001A-ms98-a-0002 0.977625 11.561375 k hi hi um yeah uh I'd like to talk about how you dress for work and and um what do you normally what type of outfit do you normally have to wear well I work at
sw2001A-ms98-a-0004 19.804875 21.312375 again um-hum
sw2001A-ms98-a-0006 27.362000 28.932750 so and it really doesn't va[cutoff]
sw2001A-ms98-a-0008 33.898250 40.123250 right right is there is there um is there like a code of dress where you work not form[cutoff]
sw2001B-ms98-a-0001 0.000000 2.655625 ok hi hi ahh
sw2001B-ms98-a-0003 10.166375 22.813500 what do you have to wear well I work in ah corporate control so we have to dress kind of nice so I usually wear skirts and sweaters in the wintertime slacks I guess and in the summer just dresses
sw2001B-ms98-a-0004 22.813500 26.946875 we can't even well we're not even really supposed to wear jeans very often
sw2001B-ms98-a-0005 26.946875 34.811750 so and it really doesn't vary that much from season to season since the office is kind of you know always the same temperature right
sw2001B-ms98-a-0006 34.811750 36.090750 right is there is there
sw2001B-ms98-a-0008 38.555375 40.785875 where you worked not formally ri[cutoff]

These results aren't great, which would help explain our poor WER on trained data (assuming that whoever generated the utterances for the full corpus used the same exact sox command, with no options thrown in, like I did).

UPDATE After receiving an email from Jonas, he mentioned the use of remix.

Usage: sox filein fileout start duration remix 1 0
   or: sox filein fileout start duration remix 0 1

(note: remix 1 0 will select the first channel -- A; remix 0 1 will select the second channel -- B)

I updated the script so that it can recognize whether to use remix 1 0 or remix 0 1 depending on the speaker (A or B).

UPDATE remix 1 0 and remix 0 1 are returning the same audio. Utterances for speaker A are fantastic but utterances for speaker B aren't. Not sure what's wrong here. Emailed Jonas about it.

UPDATE Got it working. For channel 1 it's remix 1 and for channel 2 it's remix 2, not remix 1 0 or remix 0 1, respectively. Link that helped me: https://www.nesono.com/node/275. I will note that in speaker B's utterances, speaker A is sometimes barely audible in the background (it should be silent, of course) while speaker B is talking. And for speaker A, it's perfect (can't hear speaker B at all), at least for the 10 utterances I checked.
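In case it helps future semesters, the channel-selection logic can be sketched as a small shell helper (the function name is mine, and the sox call appears only as a comment; it assumes Switchboard utterance IDs like sw2001A-ms98-a-0002, where the letter after the conversation number identifies the speaker):

```shell
# Map a Switchboard utterance ID to the sox remix argument:
# speaker A lives on channel 1, speaker B on channel 2.
channel_for_utt() {
  case "$1" in
    sw????A-*) echo "remix 1" ;;   # speaker A -> first channel
    sw????B-*) echo "remix 2" ;;   # speaker B -> second channel
    *) echo "cannot determine speaker for $1" >&2; return 1 ;;
  esac
}

# The extraction command would then look like (not run here):
# sox "$convFile" "$uttFile" trim "$start" "$duration" $(channel_for_utt "$uttId")
```

The case patterns just key off the single speaker letter, so the same helper works for every conversation number.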


 * Plan:

Look into the different options you can give sox and see if I can get better results.

UPDATE Test the updated script to see if the audio matches up with the transcript.

UPDATE Await Jonas' reply.

UPDATE Finish the script by adding the logging.

UPDATE None, now.
 * Concerns:

The transcript utterances aren't matching up well with the utterances generated by the createUtts.pl script from the conversation files in /mnt/main/corpus/switchboard/dist/flat

UPDATE remix 1 0 and remix 0 1 return the same audio.

UPDATE None, now, again.

3/12/16

 * Tasks:

Get the log information working in the createUtts.pl script.


 * Results:

I got it working, after much trial and error. My main issue lay in outputting the log information for utt.log and conv.log, because I had to use backticks (``) to capture the output of my commands into a variable. The primary problem was that the result had an extra character on the end. I suspected a newline character, but after both chomp and a regex I found (which supposedly removes whitespace from the beginning and end) failed to fix it, I wasn't sure anymore. After some thinking I found a very simple solution: get the length of the returned variable and set it equal to a substring of itself starting at index 0 and ending at length - 1. Can't believe I didn't think to do that first, but oh well. You live, you learn.
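For the record, the same trailing-character trap exists in shell scripting. A tiny sketch (the values are made up): $( ) command substitution strips trailing newlines on its own, and if a stray final character does survive, chopping the last character mirrors the substr(0, length-1) fix I used in Perl.

```shell
# $() strips the trailing newline that command output carries.
captured="$(printf '0.128\n')"

# If a stray final character survives anyway, drop it the same way the
# Perl fix does: keep everything but the last character.
val="0.128X"       # pretend the capture left one extra character behind
val="${val%?}"     # shell equivalent of substr(0, length - 1)
```

In Perl itself, chomp only removes the input record separator, which is why a stray non-newline character would slip past it.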

Log Output

[jrs1036@caesar LogDirectory]$ more conv.log
2.655625 504.72475000000000000000 502.069125
11.561375 504.72475000000000000000 493.163375
22.813500 504.72475000000000000000 481.91125
21.312375 504.72475000000000000000 483.412375
26.946875 504.72475000000000000000 477.777875
34.811750 504.72475000000000000000 469.913
28.932750 504.72475000000000000000 475.792
40.123250 504.72475000000000000000 464.6015
36.090750 504.72475000000000000000 468.634
40.785875 504.72475000000000000000 463.938875
[jrs1036@caesar LogDirectory]$ more utt.log
2.655625 2.78362500000000000000 0.128
10.58375 10.71175000000000000000 0.128
12.647125 12.77512500000000000000 0.127999999999998
1.5075 1.63550000000000000000 0.128
4.133375 4.26137500000000000000 0.128000000000003
7.864875 7.99287500000000000000 0.127999999999995
1.57075 1.69875000000000000000 0.128
6.225 6.35300000000000000000 0.127999999999998
1.279 1.40700000000000000000 0.128000000000004
2.2305 2.35850000000000000000 0.128000000000001
[jrs1036@caesar LogDirectory]$ more train.log
0.000000 2.655625 2.655625
0.977625 11.561375 10.58375
10.166375 22.813500 12.647125
19.804875 21.312375 1.5075
22.813500 26.946875 4.133375
26.946875 34.811750 7.864875
27.362000 28.932750 1.57075
33.898250 40.123250 6.225
34.811750 36.090750 1.279
38.555375 40.785875 2.2305

 * Plans:

Check with Jonas to make sure the logs I'm generating are of the right content and formatting.
 * Concerns:

None

3/14/16

 * Task:

Meet up with Jon and Ryan to work on Capstone stuff. Document createUtts.pl script.


 * Results:


 * Documented createUtts.pl. See here: /Scripts Page/createUtts.pl

UPDATE


 * Created a new corpus (first_4hr) and ran the createUtts.pl script.

UPDATE
 * Jon started a train on our new corpus. Fingers crossed the WER is stellar.


 * Plan:


 * Make a new corpus.

UPDATE
 * Run a train with the new corpus

UPDATE
 * Wait for train to finish and do a decode and score.
 * Concerns:

Nothing significant, right now

3/16/16

 * Task:

Modify createUtts.pl script and update wiki accordingly.


 * Results:


 * Changed name from createUtts.pl to genUttAudio.pl
 * Subtracted off headers for utts and convs
 * Added divide by 2 for convs
 * Added corpus.log
 * Polished code
 * Updated wiki to reflect changes


 * Plan:

Get together with Jon on Friday to create new corpora.
 * Concerns:

Nothing significant, right now

3/18/16

 * Task:

Generate new audio utterances for the full directory by running the genUttAudio script.


 * Results:

I'm not 100% sure it was successful. See below:

[jrs1036@caesar trans]$ awk '{total += $3-$2} END {print total / 3600}' train.trans
312.15
 * The total number of hours in the full transcript

[jrs1036@caesar audio]$ ls -l ./utt-old/ | awk '{total += $5} END {print total / 8000 / 3600 / 4}'
316.319
[jrs1036@caesar audio]$ ls ./utt-old/ | wc -l
250587
 * The total number of hours in the utt-old in the full corpus

[jrs1036@caesar audio]$ ls -l ./utt/ | awk '{total += $5} END {print total / 8000 / 3600}'
320.662
[jrs1036@caesar audio]$ ls ./utt/ | wc -l
250330
 * The total number of hours in the utt in the full corpus

Close, but not the same.
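The hour totals above come from a simple size-to-duration conversion: bytes / (sample rate x bytes per sample frame) gives seconds. The /4 for utt-old reflects its 2-channel, 16-bit frames, while the single-channel 8-bit u-law utt files take 1 byte per frame. Note also that each sph file carries a 1024-byte NIST header that ls -l counts, which likely accounts for some of the gap. A sketch of the arithmetic with a made-up payload size:

```shell
# Duration from file size: bytes / (rate * bytes per frame), at 8000 Hz.
# utt-old: 2 channels x 16-bit -> 4 bytes per frame
# utt:     1 channel  x 8-bit  -> 1 byte per frame
bytes=339200   # hypothetical payload size, header already excluded

old_secs=$(awk -v b="$bytes" 'BEGIN {print b / 8000 / 4}')   # 2ch 16-bit
new_secs=$(awk -v b="$bytes" 'BEGIN {print b / 8000 / 1}')   # 1ch 8-bit
```

The same number of bytes is four times as much audio in the u-law files, which is why the two awk one-liners above need different divisors.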

[jrs1036@caesar log]$ ls   conv.log  corpus.log  trans.log  utt.log
 * Logs

[jrs1036@caesar log]$ grep - conv.log
4.814625 -.06400000000000000000 -4.878625
9.758875 -.06400000000000000000 -9.822875
19.984000 -.06400000000000000000 -20.048
.
.
.
281.813000 -.06400000000000000000 -281.877
295.436250 -.06400000000000000000 -295.50025
287.915125 -.06400000000000000000 -287.979125
295.976375 -.06400000000000000000 -296.040375
300.386000 -.06400000000000000000 -300.45
[jrs1036@caesar log]$ grep - conv.log | wc -l
256
 * Conv Log

This is potentially of concern, even though it's such a small number compared to the total number of utterances (250330).

Expected:

awk '{total += $1} END {print total / 3600}' utt.log
312.15
 * Utt Log

Actual:

[jrs1036@caesar log]$ awk '{total += $2} END {print total / 3600}' utt.log
311.752

The differences between the expected audio utterance length and the actual audio utterance length never exceed 0.000000000001.

[jrs1036@caesar log]$ awk '{if ($3 > 0.000000000001) print $3}' utt.log
[jrs1036@caesar log]$

[jrs1036@caesar log]$ grep - trans.log
[jrs1036@caesar log]$
 * Trans Log

No negative numbers. Good.

[jrs1036@caesar log]$ more corpus.log
Date: Fri Mar 18 15:48:30 2016 EDT
Generated utt sph files for: /mnt/main/corpus/switchboard/full
Used transcript: .../full/train/trans/train.trans
Created following logs:
---trans.log
---utt.log
---conv.log
Finished processing, file count: 250330
Time of total audio in hours: 320.662
 * Corpus Log


 * Plan:

Check with Jonas about the results. If he checks off on them, we'll go on to create the new corpora based on the new audio utterances.
 * Concerns:

A little concerned about the newly generated audio utterance files.

3/20/16

 * Task:


 * Modify genUttAudio.pl script so that the logging is better.
 * Update the wiki after confirming the script is logging info better.


 * Results:


 * Successfully modified the script so that it has more informative logs.
 * Tested the script on a transcript containing 10 utterances and it looked successful. Right now, I've run the script on the full transcript in my home directory; however, I disabled the sox system calls so that sph files aren't generated.

UPDATE
 * The script was unsuccessful for the utt.log. This was because the act column in the utt.log file requires generated sph files, which were not produced since the sox system calls were disabled.
 * Reran the script on full with Sox enabled and it was successful
 * Copied over updated script to /mnt/main/scripts/user
 * Updated wiki for updated script


 * Plan:


 * Wait for the script to finish.
 * If successful, copy and replace the script in the /mnt/main/scripts/user directory. Then, update the wiki.

UPDATE
 * Rename utt to utt-old-2 and make new utt dir
 * Rename log to log-old and make new log dir
 * Rerun the script for real

 * Concerns:

Nothing significant, right now

3/21/16

 * Task:

Find out why the genUttAudio script didn't generate 256 utterances.


 * Results:

After looking at the conv.log and awking for only the negative numbers (awk '{if ($5 < 0) print $1, $2, $3, $4, $5}' conv.log), I discovered that three conversation files (sw02289, sw04361, and sw04379) were associated with the 256 utterances that weren't generated. I then looked to see if these conversation files existed in the /mnt/main/corpus/switchboard/dist/disk# directories. They should exist in disk3 and disk22, but they do not. This explains why those 256 utterances weren't generated: their conversation files don't exist.
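The filtering step can be sketched like this (the conv.log sample lines below are made up, with the negative value in the 5th field as in the awk above; the real log follows the script's format):

```shell
# Keep only rows whose 5th field is negative, then reduce them to the
# unique conversation IDs so the missing conversation files stand out.
bad=$(printf '%s\n' \
  'sw02289 B 0007 4.814625 -4.878625' \
  'sw04361 A 0012 9.758875 -9.822875' \
  'sw02289 B 0009 19.984000 -20.048' |
  awk '{if ($5 < 0) print $1}' | sort -u)
```

sort -u collapses the 256 flagged rows down to the handful of conversations actually at fault.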

One feasible plan is to simply remove the utterances that rely on the non-existent conversation files from the full transcript, after making a copy, of course.

I've emailed Jonas, the Modeling group, and the Data group with this information and plan suggestion.


 * Plan:

Wait to hear back from Jonas.
 * Concerns:

Nothing significant, right now

3/23/16

 * Task:

Find out if the missing conv files are in the switchboard disks.


 * Results:

Jon and I looked at the switchboard disks and we didn't find the missing conversations files in either the folder containing the audio files or the text file containing a list of all of the conversation files.

UPDATE: A couple members of the Systems group (Neil and another -- I don't know his name) came forward and noted that they couldn't run trains recently. Ben went to help them out to see if it was something they were doing wrong. They weren't doing anything wrong. I also ran another train to verify for myself, and the training fails when you run the generateFeats script. We get a warning: WARNING: "wave2feat.c", line 682: Can't find byte format in header, setting to machine's endian.

It turns out the newly generated audio utterance files, as they are now, don't play nice with training. This was due to sox options not being specified (--bits, --encoding). Also, I don't believe the train we ran on the first_4hr corpus with the new audio utterance files actually used them. If you look at the /wav directory of our 0283/009 experiment (using the first_4hr corpus), there are links to the utterances in /full/audio/utt/, which at the time contained the old utt files, so we didn't actually test the new ones.

See the differences between sox info for the following utterances generated in different ways:

Old Utt File sox info:

[jrs1036@caesar utt-old]$ sox --i sw2001A-ms98-a-0002.sph
Input File     : 'sw2001A-ms98-a-0002.sph'
Channels       : 2 <-- This should actually be 1; we don't want the speakers talking over each other
Sample Rate    : 8000
Precision      : 16-bit
Duration       : 00:00:10.62 = 85000 samples ~ 796.875 CDDA sectors
File Size      : 341k
Bit Rate       : 257k
Sample Encoding: 16-bit Signed Integer PCM

New Utt File sox info:

[jrs1036@caesar utt]$ sox --i sw2001A-ms98-a-0002.sph
Input File     : 'sw2001A-ms98-a-0002.sph'
Channels       : 1 <-- Better
Sample Rate    : 8000
Precision      : 14-bit <-- Worse
Duration       : 00:00:10.58 = 84670 samples ~ 793.781 CDDA sectors
File Size      : 85.7k
Bit Rate       : 64.8k
Sample Encoding: 8-bit u-law <-- Worse

Notice the differences.

After Tom, Ben, and John investigated the differences between the header files in the utt-old sph files and the utt sph files in full, we found there was a discrepancy. Two parameters needed to be changed: bits and encoding. Tom found the necessary options for Sox and I made a test copy of genUttAudio.pl to test whether or not this change helps. To test, I created an extremely small corpus that contains only 10 utterances. We did this to quickly see if the generateFeats would fail. The script didn't error out this time and we feel... decent about this being a fix to our problem.

After adding --bits 16 --encoding signed-integer to the sox cmd:

[jrs1036@caesar utt]$ sox --i sw2001A-ms98-a-0002.sph
Input File     : 'sw2001A-ms98-a-0002.sph'
Channels       : 1 <-- Good
Sample Rate    : 8000
Precision      : 16-bit <-- Good now
Duration       : 00:00:10.58 = 84670 samples ~ 793.781 CDDA sectors
File Size      : 170k
Bit Rate       : 129k
Sample Encoding: 16-bit Signed Integer PCM <-- Good now
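Putting the fixes together, the per-utterance command the script should now emit looks like this (a sketch that just assembles and prints the string rather than running sox; the variable names are illustrative):

```shell
# Build the corrected sox invocation: force 16-bit signed-integer
# output and pick the speaker's channel with remix.
convFile=/mnt/main/corpus/switchboard/dist/flat/sw02001.sph
uttFile=sw2001A-ms98-a-0002.sph
start=0.977625
duration=10.58375
chan=1   # 1 for speaker A, 2 for speaker B

cmd="sox $convFile --bits 16 --encoding signed-integer $uttFile trim $start $duration remix $chan"
```

Running $cmd would produce a 1-channel, 16-bit signed-integer PCM sph file like the sox --i output above.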


 * Plan:

Parse through the full transcript and copy over the utterances into a new transcript full file not containing the bad utterances.

UPDATE: In /full/train/audio/, make a new directory: new-utt. Call the modified genUttAudio.pl script and fill new-utt. Once done, rename utt to utt-old-3 and rename new-utt to utt. Then run a 5 hour train to see if this problem resolves itself. If it does, Jon is very close to being done with the makeCorpus script, and the data group will be able to use that script to create the desired corpora very soon.

 * Concerns:

Nothing significant, right now

3/24/16

 * Task:


 * Update the genUttAudio script to include the --bits and --encoding, as well as add warning messages to the logs files.
 * Execute the script
 * Run a 5 hour train on audio utterances files


 * Results:


 * I updated the genUttAudio script with the proper sox command: sox inFile --bits 16 --encoding signed-integer outFile trim startTime duration remix (1 or 2), as well as proper logging.
 * Take a look at the updated genUttAudio script page: here.
 * I executed the script after making new directories in /full/train/audio/ and /full/info/. I added utt-new to hold the new utt files and added logs to hold the new log files.
 * The script finished successfully and I renamed utt to utt-old-3 and utt-new to utt.
 * Scored the decode: 45.8%. See the log: here


 * Plan:

If all goes well, I'll let the data group know they can make some new corpora.

UPDATE: It may have gone well. Jon's making a quick test corpus for 150 hours in his home directory.
 * Concerns:

Nothing significant, right now

3/25/16

 * Task:


 * Perform the decode on the 0283/014 experiment and score.
 * UPDATE: Download some utterance audio files associated with really poor scores and compare against transcript


 * Results:


 * Downloaded: sw2005B-ms98-a-0007, sw2005B-ms98-a-0008, sw2020A-ms98-a-0002, sw2020A-ms98-a-0004, sw2020A-ms98-a-0005
 * Compared them against the transcript and they all match up.
 * Just need to use better values in the train configuration file, I guess.


 * Plan:

Email the group about this.
 * Concerns:

Nothing significant, right now

3/26/16

 * Task:

UPDATE: 1:55PM
 * Redo the decode in exp 0283/014 but on /mnt/main/corpus/switchboard/145hr/test/trans/train.trans
 * Make a copy of genUttAudio.pl and modify it so that sample_n_bytes is 4 and not 2.


 * Results:

UPDATE: 1:55PM UPDATE: 5:23PM

[jrs1036@caesar utt]$ sox --i sw2001A-ms98-a-0002.sph
Input File     : 'sw2001A-ms98-a-0002.sph'
Channels       : 1
Sample Rate    : 8000
Precision      : 32-bit
Duration       : 00:00:10.58 = 84670 samples ~ 793.781 CDDA sectors
File Size      : 340k
Bit Rate       : 257k
Sample Encoding: 32-bit Signed Integer PCM
[jrs1036@caesar utt]$ more sw2001A-ms98-a-0002.sph
NIST_1A
   1024
sample_count -i 84670
sample_n_bytes -i 4
channel_count -i 1
sample_byte_format -s2 01
sample_rate -i 8000
sample_coding -s3 pcm
end_head
 * I redid the decode and it ran and scored successfully. Details
 * I made a copy of genUttAudio.pl and modified it so that sample_n_bytes is 4 and not 2. Currently residing in my home folder.
 * I verified that my changes are good (4 for sample_n_bytes and 32 bit audio)
 * I executed the script
 * The utt folder now contains the new audio utterance files
 * Running a new train on the new audio utterance files


 * Plan:

UPDATE: 1:55PM
 * Email the group about this.
 * Wait for script to finish and then run a train on it.
 * Concerns:
 * Nothing significant, right now

3/27/16

 * Task:


 * Score the decode of Exp 0283/015


 * Results:


 * I scored the decode (WER: 47.1%). Results: here


 * Plan:


 * Send an email about this.
 * Concerns:
 * Nothing significant, right now

3/30/16

 * Task:


 * Update the genUttAudio script so that the logging is more flexible: using soxi -D audioFile, the script will work for an audio file with 2 channels and 2 bytes per sample, or 1 channel and 4 bytes per sample, etc.
 * Start a new train using the newly created 300hr corpus. We are using the same values as we did from Experiment 0283/016. We are doing this experiment to determine if more data will result in a better WER, which it theoretically should.


 * Results:


 * I updated the genUttAudio script. See the updated script: here.
 * Experiment Log: here.


 * Plan:


 * Wait for the train to finish.
 * Concerns:
 * Nothing significant, right now

4/1/16

 * Task:


 * Decode experiment 0283/017.
 * UPDATE 1:
 * Since that failed, create new experiment 0283/018 (post-Emacs-incident era) based on the values in experiment 0283/015 (pre-Emacs-incident era). Then compare the two to see if training has been affected.


 * Results:


 * Attempted to decode experiment 0283/017 but was unsuccessful. It turned out that the sphinx_train.cfg file was copied from 0283/016 straight to 0283/017. Unfortunately, we were unaware that this would yield a path issue. Also, as a result of this, 0283/016 was overwritten.
 * UPDATE 1:
 * The train for experiment 0283/018 has been started.


 * Plan:


 * Wait for the train to finish.
 * Concerns:
 * Hoping that the results of 015 and 018 are the same.

4/2/16

 * Task:


 * Start the decode for Experiment 0283/018
 * Score the decode


 * Results:


 * I started the decode and it ran successfully
 * I scored the decode and the results indicate that the Emacs incident had no adverse effects on training.
 * See the experiment log for more details: link


 * Plan:

Email Jonas about this so that he knows training can continue without the results being affected by the Emacs incident.
 * Concerns:

None, now.

4/4/16

 * Task:


 * Research my assigned parameter from the sphinx_train.cfg file
 * Update the decode tutorial


 * Results:


 * I did some light research into my assigned parameter and put what I found into our unified research document.
 * I updated the decode section of the model building tutorial to show how we should be generating the _decode.fileids file. Take a look here.


 * Plan:

 * Have Jon double check my tutorial update to see if it's understandable enough.
 * Do a little more research into my assigned parameter
 * Concerns:

None, now.

4/6/16

 * Task:
 * In class, start a new train for research purposes


 * Results:


 * In class, we decided to start a train that will act as a baseline for future research.
 * The train finished a couple hours later and I created the language model and started the decode. So far, the decode looks like it's going to take 4.5 hours, which puts the total time to test out a new parameter at ~8 hours. Not bad.


 * Plan:

 * Wait for the decode to finish and then score it.
 * Concerns:

None, currently.

4/7/16

 * Task:
 * Start a new train (0288/005) to determine if a certain parameter makes an improvement to our testing baseline.


 * Results:

UPDATE 8:59 PM:
 * After deciding on the parameter value to change, I went through the process to start a train to determine the effectiveness of the parameter value change.
 * I've updated our log files for Experiment 0288/005.
 * I've started the decode and updated our log files


 * Plan:

UPDATE 8:59 PM:
 * Wait for the train to finish, so I can decode and score it.
 * Wait for the decode to finish, so I can score
 * Concerns:

None, currently.

4/8/16

 * Task:
 * Score the 0288/005 decode


 * Results:


 * I scored the 0288/005 decode and the results marked an improvement in WER. Our future 300hr train will likely use this updated parameter value from 0288/005.
 * I then updated our documentation files with the appropriate information.
 * Jon and I also discovered that, currently, performing decodes on unseen data is not possible because the mfc files aren't being generated for the dev.trans and eval.trans audio utterances. Right now, the only mfc files being generated are based off of the train.trans (seen data) audio utterances.


 * Plan:

Making an attractive poster for the URC event.
 * Work on a poster with the guys on Saturday.
 * Work on fixing the issue with decoding on unseen data.
 * Concerns:

4/9/16

 * Task:
 * Meet up with Jon and Ryan to make a poster.


 * Results:

awk '{print $1 ".mfc"}' 018_train.fileids | xargs ln -s -t ../feat/ /mnt/main/corpus/switchboard/full/train/audio/mfc/

Unfortunately, this doesn't work: xargs appends each file id as a separate argument (with a space before it) rather than concatenating it onto the mfc/ path, which creates improper links. The one-liner that does work:

awk '{print "ln -s /mnt/main/corpus/switchboard/full/train/audio/mfc/" $1 ".mfc ../feat/" $1 ".mfc;"}' 018_decode.fileids | csh
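To see why letting awk build each full command sidesteps the xargs problem, here's a self-contained version that generates the ln commands from a made-up fileids list (printed rather than piped to csh):

```shell
# awk concatenates the file id directly into both paths, so there is no
# stray space before the id like the xargs version introduces.
links=$(printf 'sw2001A-ms98-a-0002\nsw2001B-ms98-a-0001\n' |
  awk '{print "ln -s /mnt/main/corpus/switchboard/full/train/audio/mfc/" $1 ".mfc ../feat/" $1 ".mfc;"}')

# Piping $links to csh (or sh) would then create the symlinks.
```

Each generated line is a complete, correctly concatenated ln -s command.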
 * I met up with my modeling buddies and we pumped out a poster. It actually ended up being simpler than I thought it would be. Creating a poster with PowerPoint is pretty straightforward.
 * Also, Jon and I scored our latest 300hr train and we were happy with the result.
 * I also tried coming up with a UNIX one-liner to create links in a directory and have them point to another directory.
 * After emailing Jonas, he came back with a UNIX one-liner that would work.
 * However, Jon has a script that does this, but it was still cool to know that this could be done in a neat UNIX one-liner.


 * Plan:


 * We will be starting up another 300hr train soon with parameter values that we've learned offer an improvement to WER.
 * Concerns:
 * None, currently.

4/11/16

 * Task:
 * Start the decode for 0288/009
 * Start the decode for 0288/010


 * Results:

UPDATE 10:39 AM: UPDATE 5:59 PM:
 * Started the decode for 0288/009 and updated the appropriate log
 * Started the decode for 0288/010 and updated the appropriate log
 * Jon scored the other two decodes
 * Configured our cfg file for 0288/011 (300 hr train) and started it


 * Plan:

UPDATE 10:39 AM: UPDATE 5:59 PM:
 * Score the decode when 0288/009 finishes
 * Wait for 0288/010 to finish training, then start the decode
 * Wait for 0288/010 to finish decoding, then score once it finishes
 * Compare the model files in 0283/015 and 0283/018 just to triple check that the emacs incident had no impact on training.
 * Concerns:
 * None, currently.

4/13/16

 * Task:
 * Meet with team
 * Decide whether or not to do a trade
 * Plan for the week ahead


 * Results:


 * We decided not to trade. We feel confident with the team we have. Also, it seems that the other team is confident as well, since they opted to not trade either. Let the competition formally commence.
 * So we had a constructive team meeting. All of us got on the same page and we divvied up different parameters from the sphinx_train.cfg file and CMU Sphinx site to research into.
 * Also, Jonas showed Jon and me how to use a mixture of UNIX commands and a foreach loop to generate the appropriate reference scoring transcript. However, an easier way is available: since Jon ran an experiment on the full corpus to generate all of the feats, running prepareTrainExperiment created a reference scoring transcript by default. That reference scoring transcript has everything in it, so for future decodes, the easiest way to score is to simply copy this file over to whichever experiment we want to do an unseen decode and score on.


 * Plan:

 * Decode the 0288/002 Experiment on unseen data (dev.trans). Hoping for a WER in the 40s.
 * Concerns:

None, currently.

4/14/16

 * Task:
 * Score the unseen decode for Experiment 0288/002.


 * Results:


 * I scored the unseen decode for Experiment 0288/002 and our result was quite positive, so I emailed Jonas about it. After emailing back and forth with him, he initially said that if our result was achieved by building the language model using train.trans, it would be considered a cheating result; however, he later said that he had it backwards. So here is the non-cheating way: train on train.trans and decode on another transcript. And here is the cheating way: train on train.trans and decode on train.trans. Preferably, you want to decode on unseen data.


 * Plan:

 * Continue to do unseen decodes on our testing experiments. Unseen is what matters. Seen isn't particularly useful.
 * Concerns:

None, currently.

4/15/16

 * Task:
 * Score 0288/007
 * Start an unseen decode on 0288/009


 * Results:


 * I scored 0288/007. It looks like the parameter value we tested had a negative effect on WER. We will definitely go back to the previous value that was used.
 * I started the unseen decode on 0288/009. Once it ended, I scored it. Again, this other parameter value had a negative impact on WER. We will stick to the previous value for future.
 * I started the unseen decode on 0288/010. Once it ended, I scored it. This parameter value also had a negative impact on WER.


 * Plan:


 * Wait for 0288/011 to finish training.
 * Concerns:
 * None, currently.

4/16/16

 * Task:
 * Score 0288/010
 * Score 0288/012
 * Start an unseen decode on 0288/011


 * Results:


 * I scored 0288/010 and 0288/012.
 * I also started an unseen decode on our third big train 0288/011.
 * I also started a new train 0288/013 testing a parameter value change.


 * Plan:


 * Wait for 0288/011 to finish decoding.
 * Wait for 0288/013 to finish training.
 * Concerns:
 * None, currently.

4/17/16

 * Task:
 * Start unseen decode on 0288/013 once it finishes training.
 * Score 013 when it finishes decoding.
 * Score 011 when it finishes decoding.


 * Results:


 * I started an unseen decode on 0288/013.
 * After 0288/013 and 0288/011 finished decoding, I scored them. The results from both were an improvement over previous experiments but only by a small margin.


 * Plan:


 * Continue testing different parameter values
 * Concerns:
 * None, currently.

4/20/16

 * Task:
 * Do speech research


 * Results:

awk '{print substr($0, index($0, $4))}' train.trans >> /path/to/file

UPDATE (3:52 PM): UPDATE (10:43 PM): UPDATE (11:30 PM):
 * Research led me to find an awk command that prints out only the utterances in the transcript file
 * Started a train on miraculix and asterix testing new parameter values. The research document Jon found indicates by switching them to optimal values, a decrease in WER should occur.
 * The above trains failed due to a missing binary. After finding it, I restarted the two trains.
 * The first train finished and I started the decode on it. The second train, testing the other parameter, failed: a file isn't being generated, and a script later on needs to open that file. Sent an email to the team to see if they can help.


 * Plan:

UPDATE (10:43 PM):
 * Continue researching
 * Wait for the trains to finish training and decode and score
 * Concerns:
 * None, currently.

4/21/16

 * Task:
 * Run trains testing the information Jon found in the research document


 * Results:


 * I scored the first train ran last night and it showed a slight improvement, which is good.
 * I also started a new train testing another parameter value that was shown in the research document Jon found.


 * Plan:


 * Continue testing
 * Concerns:
 * None, currently.

4/22/16

 * Task:
 * Figure out how to setup the sphinx_train.cfg file for a certain type of training


 * Results:


 * After following along with the information provided in the research document and a sphinx tutorial, I started a train with hopefully all of the correct parameter values set.
 * The train succeeded; however, the type of training I wanted to do didn't occur because I hadn't set a certain parameter value that wasn't shown in the research document or the sphinx tutorial. I ended up finding a useful link that showed me which parameter needed to be added and what value to set it to.
 * I ran the train again; however, the train stopped because Miraculix was unmounted. Another one of my teammates needed to unmount it to do some software installations and I thought that the train would keep running; however, I was incorrect in that assumption.
 * Once Miraculix was mounted again, I ran the train again.


 * Plan:


 * Wait for train to finish in the morning.
 * Concerns:
 * None, currently.

4/23/16

 * Task:
 * Testing out a parameter value to see if training time can be decreased


 * Results:


 * Started the train
 * Train finished -- training was much faster
 * Created the LM -- creating the LM and starting the decode failed on first attempt on miraculix -- some of the binaries and libs aren't being seen -- switched to asterix to create the LM and start the decode since it's currently not in use
 * Started the decode -- hoping the decode verifies the more quickly generated train
 * The score was only 0.1% worse -- fair trade-off for the speed, particularly when testing -- once a good configuration is achieved, we can rerun more slowly and expect a very slightly better result
 * Started 020 over again -- fingers crossed this time -- regarding miraculix not working, that was simply because miraculix wasn't properly remounted


 * Plan:


 * Wait for 020 to finish and hope it works
 * Concerns:
 * None, currently.

4/27/16

 * Task:


 * Discuss with Jonas the idea of combining both teams to further our research and quest to achieve a lower WER.
 * Start a train using MMIE. Also, copy over the modules 60, 62, and 65 from link.


 * Results:


 * Multiple trains that I've run trying to get MMIE training have not gotten to the MMIE training part. I found out this is due to three scripts folders not being in scripts_pl/ as mentioned in the Tasks above. With these script folders, hopefully MMIE training will work. Also, RunAll.pl had to be modified so that these scripts are called. Started a train with the appropriate script folders.


 * Plan:


 * Wait for train to finish.
 * Concerns:
 * The train might fail.

4/28/16

 * Task:


 * Once the train finishes, decode and score.


 * Results:


 * The train finished, but it failed on module 60.lattice_generation (so there was no point in decoding and scoring). From the error logs, it's hard to tell what actually caused the failure. Honestly, to get all of these training options working, a clean and complete re-install of Sphinx is likely necessary.


 * Plan:


 * Since I'm going to a programming competition in NY, I'm just going to keep up to date with emails.
 * Concerns:
 * None, currently.

5/1/16

 * Task:


 * Read up on all of the emails I've missed, since I've been busy and away at a programming competition/conference.


 * Results:


 * I read through all of the emails. It looks like we will be using a Perl module created by Jon in the decoding process. The module is called decode_config.pm and, from what I can tell, it contains the arguments for the sphinx3_decode binary. The idea is that you modify this module to reflect the arguments and values you want included in the decode. This is what sphinx_decode.cfg was meant for, but it was never used, for various reasons I can't remember off the top of my head. Regardless, decode_config.pm will be supplanting sphinx_decode.cfg.
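The design is essentially one editable config module that the decode script flattens into a command line for the decoder binary. A rough analogue of that idea in Python -- the flag names and paths below are illustrative stand-ins, not the actual contents of decode_config.pm:

```python
# Hypothetical stand-in for decode_config.pm: keep all decoder arguments in one
# editable mapping. Flag names and paths here are illustrative only.
DECODE_ARGS = {
    "-hmm":  "/path/to/model_parameters",
    "-lm":   "/path/to/language_model.lm",
    "-dict": "/path/to/pronunciation.dict",
    "-ctl":  "/path/to/test.fileids",
}

def build_command(binary: str, args: dict) -> list:
    """Flatten the config mapping into an argv list for the decoder binary."""
    cmd = [binary]
    for flag, value in args.items():
        cmd += [flag, str(value)]
    return cmd

print(build_command("sphinx3_decode", DECODE_ARGS))
```

The win is the same as with decode_config.pm: changing a decode run means editing one mapping rather than hunting through the script that launches the binary.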


 * Plan:


 * Check out the decode_config.pm file on Caesar and get familiar with it and how it would be used in a decode.
 * Concerns:
 * None, currently.

5/3/16

 * Task:


 * Check out the decode_config.pm file on Caesar


 * Results:


 * I checked out the README text file for decode_config.pm as well as decode_config.pm itself. They're well put together and straightforward. I also checked out the altered run_decode.pl script that now uses decode_config.pm. The code changes made to incorporate decode_config.pm were pretty simple and easy to understand.


 * Plan:


 * Unknown
 * Concerns:
 * None, currently.

5/4/16

 * Task:


 * Write the introduction for the modeling group for the final report.


 * Results:


 * I wrote a first draft of the introduction for the modeling group for the final report. Here is a link to the section I wrote: introduction.


 * Plan:


 * Help out on the competition report, if necessary. Have my fellow members of the modeling group verify my introduction.
 * Concerns:
 * None, currently.

5/8/16

 * Task:


 * Check to see how important the <s> tags are in the _train.trans file.


 * Results:


 * I performed two related experiments under .../0283/: 019 and 020. Full details on my findings are in there, but I give a brief summary here.
 * For experiment 019:
 * Part 1: I modified the 019_train.trans file to remove the <s> tags and then ran a train to see whether it would fail or produce altered acoustic models. In short, the training succeeded, and after decoding and scoring I found it produced slightly worse results than experiment 020, where the same train was run except that the 020_train.trans file was left unaltered.
 * Part 2: I scored 0288/011 again, but after stripping away the <s> tags so that a more accurate score was produced. The original score with the <s> tags included was 41.8%, and the score after removing them was 48.4%. That's a 6.6 percentage-point increase in WER . . . not so great, but at least it's a real score and not a cheating one.
 * For experiment 020:
 * I ran a train with the same configuration values as 019; however, this time I didn't modify the 020_train.trans file. The decode and score produced better results than 019.
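The tag-stripping step above amounts to deleting the sentence-boundary markers from each transcript line before use. A sketch of that edit, assuming Sphinx-style transcript lines that end in a parenthesized utterance id (the sample line is made up):

```python
import re

def strip_s_tags(line: str) -> str:
    """Remove <s> and </s> sentence-boundary tags and tidy up the whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"</?s>", "", line)).strip()

# Hypothetical transcript line in the <s> WORDS </s> (utt_id) shape.
print(strip_s_tags("<s> HELLO WORLD </s> (utt_001)"))  # -> HELLO WORLD (utt_001)
```

Running this over every line of a copy of the .trans file reproduces the kind of altered transcript used in Part 1 and Part 2.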


 * Conclusions:
 * The <s> tags are definitely needed for training. If you compare the appropriate values in the experiment logs, you will see that the WERs achieved in 020 were better than those achieved in 019. In addition, out of curiosity I tried running an experiment without the _train.trans file at all, and it failed immediately, so the file itself is absolutely necessary to have.


 * Plan:


 * Unknown
 * Concerns:
 * None, currently.