Speech:Spring 2015 Kayla Mackiewicz Log



Week Ending February 3, 2015

 * Task:

1/31 - My goal for tonight is to read logs, specifically past Tools group logs since that is the group I will be working in.

2/1 - Read logs. Read up on the current software which is being used by Caesar on the Speech System Information page. This page is helpful to me because it compares the current software with the newest versions. I will do some research to determine if there are even newer versions of the software that have come out, since that list was probably created last Spring and there may be changes.

2/2 - Read logs.

2/3 - Read some more logs. Did some research on software and discovered there are a few new updates, which I will elaborate on later.

2/3 - Discovered new software updates that do not match the 'Newer Versions' under the software information page. That was to be expected, since the software list was updated last Spring.
 * Results:

Get a better understanding of the project and the software it uses, and work on developing the proposal.
 * Plan:
 * Concerns:

Week Ending February 10, 2015

 * Task:

2/7 - Continue to read more logs and practice SSHing into Caesar/Asterix on my own, based on notes taken during class on Wednesday. As a group, we have divided up a few tasks to be worked on over the weekend and before our next meeting:


 * Me - Continue gathering information and work on writing the proposal
 * Ken - Install a VM to be used as the group machine for testing software tools and running experiments
 * Refik - Research the possibility of adding Emacs, an extensible, customizable text editor, as a new software tool, per Professor Jonas' request
 * Nathaniel - Research compatibility of software tools with RedHat version 6.6 32-bit OS

We have decided to install a VM only on Ken's personal computer because, based on past logs, it seems to take a few weeks to get a VM up and running properly. It makes more sense for all of us to collaborate on one PC to test and run tools and experiments than for each of us to get caught up with issues trying to install a VM. This way, time can be better spent tackling other obstacles.

2/8 - Continue to gather information to put together the rough draft of the proposal tonight and over the course of the next couple of days.

2/9 - Read logs to see what the other teams have accomplished/been up to so far with this project. I was emailing with Ken and he had mentioned that he could not find a version of RedHat that was free for him to download to his computer. I believe that, instead, we will be sticking with openSUSE.

2/10 - Continued reading logs and researching. I am looking forward to our group meeting tomorrow so that we can set up a main course of action. I am a bit curious about how the tasks will be divided up going forward, since we have four people in our group, which seems like a lot just for working on software tools given that last semester's Tools group only had two people.
 * Results:


 * Plan:

Since we cannot find a free version of RedHat for testing on a VM, my main concern is that we won't have easy access to test compatibility of the software tools with this particular OS. Perhaps we will be able to run tests and experiments on a drone machine, once the Systems group has installed RedHat onto the machines in the server room.
 * Concerns:

Week Ending February 17, 2015
2/12 - Finished writing up the proposal. My next step is to read the documentation on how to run a train, since Nate and I will be working together to run a small 1-2 hour train on an existing experiment on Obelix, hopefully by next week. The purpose of our "training" is to learn how to test the efficiency of the software tools and be able to compare the older versions with the newer ones. Then, by the end, we will have an idea of whether a system-wide upgrade of the software tools would be beneficial or not.
 * Task:
 * Note: The Tools group has dropped the idea of installing a virtual machine, since it is too time consuming. Hence we will be accessing Obelix for training and analyzing software tools.

2/15 - Got in touch with Melissa from the Systems group and she was able to successfully give our group access to Obelix.

2/12 - The proposal has been completed and sent to the proposal group.
 * Results:

2/15 - I was able to successfully log into Obelix from home using PuTTY:

[kml2227@obelix ~]$ uname -a
Linux obelix 2.6.32-504.el6.i686 #1 SMP Tue Sep 16 01:56:19 EDT 2014 i686 i686 i386 GNU/Linux
[kml2227@obelix ~]$ whoami
kml2227

 * First logged into Caesar
 * SSH'd into Obelix with my username
 * Never saw an 'Obelix Welcomes you!' message, like Caesar displays, so used the command 'uname -a'. This command prints all the system information, which let me know I was in fact logged into Obelix successfully.
 * Used the 'whoami' command to make sure I was logged in as the correct user.
 * Successfully changed my password on Obelix using the 'passwd' command
 * Poked around the files on Obelix a little bit to get familiar.
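
The host check in the steps above can be sketched in a few lines of Python. This is only an illustration, assuming the `uname -a` output has the usual layout where the second field is the node name; the helper names `parse_uname` and `on_expected_host` are made up for this example and are not part of any project script.

```python
def parse_uname(line):
    """Split a `uname -a` line into its first three fields."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("not a full `uname -a` line")
    # Field order on Linux: kernel name, node (host) name, kernel release, ...
    return {"kernel": fields[0], "node": fields[1], "release": fields[2]}


def on_expected_host(uname_line, expected):
    """True when the node name matches the expected host (case-insensitive)."""
    node = parse_uname(uname_line)["node"].split(".")[0]
    return node.lower() == expected.lower()


# The line below is the output recorded in this log entry.
sample = ("Linux obelix 2.6.32-504.el6.i686 #1 SMP Tue Sep 16 01:56:19 "
          "EDT 2014 i686 i686 i386 GNU/Linux")
print(on_expected_host(sample, "obelix"))
```

Checking the node name this way gives the same confirmation as reading the output by eye, which is useful on machines like Obelix that do not print a welcome banner.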

2/16 - Caesar was down tonight which prevented logging in earlier, but I just received word that it is back up and running again. Unfortunately it is pretty late at this point so I will have to work more with Obelix tomorrow or Wednesday.

2/17 - Read logs to see what other groups have been up to, as well as my own group.
 * Plan:

2/12 - Before we can even begin training, we need to wait until we have access to Obelix. The Systems group will be creating accounts for us; I'm just hoping that we will be able to attempt training by next week so we can make progress.
 * Concerns:

Week Ending February 24, 2015
2/18 - Caught up with the rest of my group today and worked a little bit with Nate, who created a new experiment (number 0265) and ran a train. We looked into completing the next step, which is to create the Language Model, but we didn't get too far with it unfortunately.
 * Task:

2/21 - Revised the proposal according to Professor Jonas' edits on the rough draft. I also changed up a couple of passages that were a bit vague, so that overall it is a better proposal and makes more sense. I have sent my revisions to the other members of my group so that they can make any other changes before giving it to the Proposal group tomorrow.

2/23 - Checking in to read some logs.

2/24 - Since I have had very little experience with Linux, I am looking into the process of downloading and installing software in a Linux-based system. I have found a couple different web sites that do a good job of explaining so I'm looking forward to our group meeting tomorrow. Hopefully if everything goes well we will be able to work with updating some software tools tomorrow and/or work on some trains. It doesn't appear to be a complicated process, but who knows what will happen or if we will run into any issues.


 * Results:

2/18 - The plan as of right now is to revise the proposal and get it back to the Proposal group within the next couple of days.
 * Plan:
 * Concerns:

Week Ending March 3, 2015

 * Task:

3/1 - Read through articles/literature PDFs. Write a summary for the Wiki.

3/2 - Look into creating a sub experiment under 0265 for our meeting this week.


 * Results:

3/3 - Logged in multiple times over the weekend, but unfortunately I was very bogged down with schoolwork, so I didn't get a chance to record much. I was able to read through some software literature PDFs that Ken had found. Nate and I have split the list of PDFs in half so we can get through them all faster; then we will document them on the Wiki with links to the PDF files.
 * Plan:

3/3 - I am concerned because I was so busy over these last few days and over the weekend that I wasn't able to fully accomplish my goals for the week. Thankfully this weekend I will have some more time to focus.
 * Concerns:

Week Ending March 10, 2015

 * Task:

3/4 - Created a new informational page called "Speech Recognition Related Readings" on the Wiki for references. This page will contain links to literary articles that go into detail about speech recognition and the tools involved.

3/8 - Got in touch with David Meehan, one of the Tech Consultants, who updated the Wiki's upload file options to accept PDFs. Now, instead of uploading each individual page of a PDF file as a PNG or other image type, the entire PDF can hopefully be uploaded as a single file in one step. Tomorrow I am going to upload some related speech readings to test this out.

3/9 - Uploaded a few different academic readings under the Speech Information page, with brief descriptions. The papers included are titled "Mathifier – Speech Recognition of Math Equations", "CMU Sphinx4 Speech Recognizer in a Service-oriented Computing Style", and "Learning-Based Auditory Encoding for Robust Speech Recognition". The descriptions at this point are a sentence long and a bit vague, so we may need to go into more detail.

3/10 - Did some reading to be prepared for the speech bootcamp that is set to begin tomorrow. Unfortunately, there were a couple of issues with our group's original experiment 0265, and we attempted to remove the contents of the directory in order to possibly redo or restart it. However, this seems to have caused some issues, and we would need to create a whole new experiment to continue, which we aren't supposed to be doing at this point. So I am now looking forward to our meeting tomorrow, where I can learn more about the process and actually be able to run a train myself.


 * Results:


 * Plan:


 * Concerns:

Week Ending March 24, 2015
3/21 - Tried to gain access to the server, but it seems that it is currently down. There isn't really much else that can be done until it is back up and running again, unfortunately.
 * Task:

3/22 - Attempted to log on to the server again tonight, but it is still down. Will try again tomorrow to see if it is back up and running again.

3/23 - Made attempt #3 at logging into Caesar.

3/24 - As of 8:00 pm, I was not able to access Caesar.

3/21 - No results.
 * Results:

3/22 - No results.

3/23 - This time around, it appeared as though I was able to get through, which is weird because I know that the school seems to be having networking issues. When I connected, I received a warning from PuTTY about a "potential security breach". Apparently the host key that was cached for Caesar did not match the host key presented by the server I was connecting to. I had the option to either:
 * Cancel connection
 * Continue with the connection, but not store the new host key
 * Continue with the connection and store the new host key

I chose to continue on with connecting to Caesar, without storing the new host key, to see what would happen and I logged in with my username. However, I chose not to do anything with it once I was logged in because I wasn't quite sure what was going on with the server and, reading other people's logs, I knew that everyone else was having connectivity issues as well. So, I decided it might be best to wait until the networking conflict is completely resolved and we are all on the same page.
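
For context on the warning described above: an SSH client remembers a fingerprint of the server's public host key and complains when the server later presents a different one, which can happen legitimately after a server is rebuilt or moved. Below is a minimal sketch of that comparison, assuming OpenSSH-style public key lines (`<type> <base64-blob> [comment]`) and the SHA-256 fingerprint format that OpenSSH prints. The two key lines are placeholder bytes made up for this example, not Caesar's real keys, and this is not how PuTTY itself stores or compares keys.

```python
import base64
import hashlib


def key_blob(public_key_line):
    """Decode the base64 key blob from a '<type> <blob> [comment]' line."""
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    return base64.b64decode(parts[1])


def sha256_fingerprint(public_key_line):
    """OpenSSH-style fingerprint: 'SHA256:' + unpadded base64 of the digest."""
    digest = hashlib.sha256(key_blob(public_key_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_changed(cached_line, presented_line):
    """True when the presented key does not match the cached one."""
    return sha256_fingerprint(cached_line) != sha256_fingerprint(presented_line)


# Placeholder key material (assumption: stands in for real host keys).
cached = "ssh-rsa " + base64.b64encode(b"placeholder-old-key").decode() + " caesar"
presented = "ssh-rsa " + base64.b64encode(b"placeholder-new-key").decode() + " caesar"

print(sha256_fingerprint(cached))
print(host_key_changed(cached, presented))
```

When the keys differ, the safe course is to confirm the new fingerprint with whoever administers the server before storing it, which is in line with the decision above to wait until the networking issue was resolved.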

3/24 - No results.


 * Plan:

My concern is that the server has been down for quite a while, probably because of the move from the 400 building to Pandora. The school is having some network issues, and after reading other people's logs I know that this is currently being investigated. Hopefully this gets figured out soon, since I have not had the chance to run a train yet and I feel like this is putting the groups behind in making any progress. According to Sam's log, he will be going over the training process tomorrow in our group meeting, which should be good since I know I am not the only one who hasn't run a train.
 * Concerns:

Week Ending March 31, 2015

 * Task:

3/25 - We, the Bruins group, had our first group meeting and I think it went very well. We divided up tasks to be completed before the next meeting, and Sam went over the process of running a train for those of us who are new to it. My tasks to complete before our next meeting are as follows:


 * Run first 5hr train under an existing experiment - number 0265 which is the Tools group master experiment directory
 * Run a 256 hr train

I am hoping the training process goes smoothly, but am not too worried about it because I think Sam did a great job explaining it to us!

3/29 - Created a sub experiment under the 0265 experiment directory in which I began running a 5hr train.

3/30 - Could not connect to Caesar tonight, even through cisunix (which was working before).

3/31 - I could not access Caesar again this morning, so I decided to email Melissa about the issues I was having. Luckily, she was able to fix the problem and I can now access Caesar. My goal is to check up on how my 5hr train went, then hopefully finish it up so that I can move on to the 256hr train.


 * Results:

3/25 - No results.

3/29 - Tomorrow I will need to finish up the training process on the 5hr train. Then, the plan will be to continue on with another train for 256hr.
 * Note: One issue I ran into when creating this experiment: when I tried to run the createWiki_Sub_Experiment.pl script, it said 'Permission denied'. I am curious as to why that is happening.
 * Update 3/30 - Talked to Morgan and Melissa, and it is most likely not the script, but an issue with the server permissions. The systems group is working later tonight to fix the issue.

3/30 - When I attempt to SSH into Caesar with my username from cisunix, it tells me 'Permission denied'.

3/31 - The 5hr train was run successfully. I was able to create the language model, but when I began the decode process, I ran into an issue:

[kml2227@caesar DECODE]$ nohup run_decode5.pl 003 0265/003 1000
run_decode5.pl: Command not found.

I understand there have been issues over the past few days with running some scripts, including this one. Not really sure why this is happening, but it is preventing me from finishing the process.
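
A "Command not found" message generally means the shell could not locate an executable of that name on its search path. The log does not record what the actual fix was, so the following is only a hedged diagnostic sketch; `diagnose_command` is a hypothetical helper, and its suggestions are common remedies rather than the instructions that were eventually used.

```python
import os
import shutil


def diagnose_command(name, cwd="."):
    """Return a short guess at why `name` may not run from the shell."""
    found = shutil.which(name)  # searches the directories listed in PATH
    if found:
        return f"found on PATH at {found}"
    local = os.path.join(cwd, name)
    if os.path.isfile(local):
        if os.access(local, os.X_OK):
            return f"present in {cwd} but not on PATH; try ./{name} or a full path"
        return f"present in {cwd} but not executable; try perl {name} or chmod +x"
    return "not found on PATH or in the working directory"


print(diagnose_command("run_decode5.pl"))
```

The output depends on the machine it runs on, so it is most useful when run from the same directory where the decode command failed.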


 * Update as of 10:24 pm - Sam gave me some instructions on how to deal with the script error and get the decode to run. I got the following results back:

SYSTEM SUMMARY PERCENTAGES by SPEAKER

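,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001b |   18    163 | 77.9   17.8    4.3   40.5   62.6  100.0 |
|---------+-------------+-----------------------------------------|
| sw2001a |   14    101 | 85.1   12.9    2.0   55.4   70.3  100.0 |
|---------+-------------+-----------------------------------------|
| sw2005a |   39    701 | 82.5   11.6    6.0   13.7   31.2   94.9 |
|---------+-------------+-----------------------------------------|
| sw2005b |   67    613 | 60.2   27.2   12.6   29.4   69.2  100.0 |
|---------+-------------+-----------------------------------------|
| sw2006b |    6     69 | 79.7   10.1   10.1   11.6   31.9  100.0 |
|---------+-------------+-----------------------------------------|
| sw2006a |    9    233 | 87.1   10.7    2.1    6.0   18.9  100.0 |
|=================================================================|
| Sum/Avg |  153   1880 | 75.4   17.1    7.4   22.3   46.9   98.7 |
|=================================================================|
|  Mean   | 25.5  313.3 | 78.8   15.1    6.2   26.1   47.3   99.1 |
|  S.D.   | 23.4  273.5 |  9.7    6.6    4.3   19.2   22.6    2.1 |
| Median  | 16.0  198.0 | 81.1   12.2    5.1   21.5   47.2  100.0 |
`-----------------------------------------------------------------'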


 * Plan:


 * Concerns:

Week Ending April 7, 2015
4/4 - Read some logs to see what everyone has been busy working on the past few days. Our group has been in touch over email about what is to be done before next Wednesday, since a bunch of us had to leave the group meeting early for a job fair this week.
 * Task:

4/5 - My intention for today was to run a train with some alterations to the settings. Since I have so far only run one train, I am not quite sure how or where to make these alterations, so I have been in contact with other members from my group. Sam will be sharing some useful information about how to make these changes.

4/6 - Tonight's task was to run a simple 5hr train with some modifications to a few of the settings (which have been recorded secretly, just in case). As I am writing this, the training process is still underway so I will need to create the LM and decode and then report back a little later with the results.

4/7 - The train I started last night ran successfully.

4/6 - 4/7 Here are the results of my train and decode:
 * Results:

SYSTEM SUMMARY PERCENTAGES by SPEAKER

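,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001b |   18    163 | 95.1    4.9    0.0    4.3    9.2   44.4 |
|---------+-------------+-----------------------------------------|
| sw2001a |   14    101 | 98.0    2.0    0.0   20.8   22.8   64.3 |
|---------+-------------+-----------------------------------------|
| sw2005a |   39    701 | 84.7    3.3   12.0    2.4   17.7   61.5 |
|---------+-------------+-----------------------------------------|
| sw2005b |   67    613 | 87.6    4.1    8.3    9.5   21.9   65.7 |
|---------+-------------+-----------------------------------------|
| sw2006b |   29    618 | 74.1    5.8   20.1    2.8   28.6   86.2 |
|---------+-------------+-----------------------------------------|
| sw2006a |   33    455 | 85.3    4.6   10.1    3.1   17.8   54.5 |
|---------+-------------+-----------------------------------------|
| sw2007a |   31    287 | 90.2    4.5    5.2    4.2   13.9   41.9 |
|---------+-------------+-----------------------------------------|
| sw2007b |   29    472 | 87.1    7.4    5.5    3.2   16.1   82.8 |
|=================================================================|
| Sum/Avg |  260   3410 | 85.1    4.8   10.1    4.7   19.6   63.5 |
|=================================================================|
|  Mean   | 32.5  426.3 | 87.8    4.6    7.7    6.3   18.5   62.7 |
|  S.D.   | 16.1  221.9 |  7.2    1.6    6.6    6.3    5.9   16.0 |
| Median  | 30.0  463.5 | 87.3    4.6    6.9    3.7   17.7   62.9 |
`-----------------------------------------------------------------'

Successful Completion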


 * Plan:


 * Concerns:

Week Ending April 14, 2015
4/11 - Read logs; discussed some potential improvements to our group's course of action when it comes to achieving the best baseline.
 * Task:

4/12 - After reading through Professor Jonas' email about changes made to the 125 hour corpus, I want to attempt to run a 125 hour train to see if it would work. The 125 hour train has been having issues and was "broken" for a time, but Professor Jonas has made some fixes.

4/13 - Talked with Sam and Morgan about the issues I was having with the 125 hour train. Sam thinks that it may be causing errors because in the 125 hour corpus, the name of the trans file is 125_train.trans and it should be train.trans to be consistent.

4/14 - Received an update that 125 hour trains seem to be working now, so I will retry my intended 125 hour experiment and see how it goes.

4/12 - Created a new sub experiment for running a 125 hour train. I ran the prepareExperiment command for 125hr_3170/train and it seemed as though everything worked fine. But there were a couple of issues, and I am not sure if they are normal.
 * Results:

One of the issues read:

sed: can't read etc/002_train.trans: No such file or directory.
Processing 0 words against dictionary...
Added 0 files to add.txt

I can't remember what this looked like in the other trains I have run, or whether this is a normal response. So I am wondering if I should go on with the LM and decode anyway. I may wait and email my group in the morning about this, just so I am not wasting my time and server space on a train that is dysfunctional.
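
One way to answer that question before spending time on the LM and decode is to check whether the transcript file named in the `sed` error actually exists. The sketch below assumes, based only on that error message, that the preparation step expects `etc/<experiment id>_train.trans` inside the experiment directory; the real script may use a different layout, and `transcript_problem` is a hypothetical helper.

```python
from pathlib import Path


def transcript_problem(exp_dir, exp_id):
    """Return a description of the problem, or None if the file looks usable."""
    etc = Path(exp_dir) / "etc"
    expected = etc / f"{exp_id}_train.trans"
    if not expected.is_file():
        # List any other .trans files so a naming mismatch is easy to spot.
        others = sorted(p.name for p in etc.glob("*.trans")) if etc.is_dir() else []
        found = ", ".join(others) if others else "no .trans files"
        return f"missing {expected.name}; found instead: {found}"
    if expected.stat().st_size == 0:
        return f"{expected.name} is empty"
    return None


print(transcript_problem(".", "002"))
```

A result that names a differently titled `.trans` file would point to a file-naming mismatch, while `None` would suggest the transcript is in place and the problem lies elsewhere.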

4/14 - 125 hour train seems to be running well and with no issues this time around! Generating the feats data is the current step I have been on for quite some time.


 * Plan:


 * Concerns:

Week Ending April 21, 2015

 * Task:

4/18 - Continued on with decoding my 125 hour train I started a few days back.

4/19 - Checked up on my decode for the 125 hour train and it is currently still running. It is on hour 16. Also, I had been tasked with finding a means to create secret Wiki pages. I have found a way but have run into some obstacles that I needed to email Professor Jonas about. Although there may only be a few weeks left in the semester, this could be an overall beneficial find if we can figure it out.

4/20 - Went back to check on my decode again and found that the process had completed. Starting another decode process on it tonight.

4/21 - Started a second decode process on my train, but it is still running (on 1 day and 5 hours right now). Heard back from Professor Jonas and we are planning on discussing how to approach the private wiki pages further, hopefully tomorrow.

4/18 - No results to report as of yet!
 * Results:

4/20 - Scored the result of this first decode and the result was about 59% WER.

SYSTEM SUMMARY PERCENTAGES by SPEAKER

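,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
| sw2001a |   25    473 | 64.5   32.3    3.2   23.9   59.4  100.0 |
|=================================================================|
| Sum/Avg |   25    473 | 64.5   32.3    3.2   23.9   59.4  100.0 |
|=================================================================|
|  Mean   | 25.0  473.0 | 64.5   32.3    3.2   23.9   59.4  100.0 |
|  S.D.   |  0.0    0.0 |  0.0    0.0    0.0    0.0    0.0    0.0 |
| Median  | 25.0  473.0 | 64.5   32.3    3.2   23.9   59.4  100.0 |
`-----------------------------------------------------------------'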
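
The WER quoted above is the "Err" column of the summary, which is the sum of the substitution, deletion, and insertion percentages. Because insertions are counted, Err can be larger than 100 minus Corr. The sketch below reads one speaker or Sum/Avg row and recomputes the figure; it assumes the pipe-separated row layout shown in this log, and `parse_score_row` and `wer_from_row` are names made up for this example.

```python
def parse_score_row(row):
    """Parse a speaker or Sum/Avg row of the summary into a dict."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    n_snt, n_wrd = (int(x) for x in cells[1].split())
    corr, sub, dele, ins, err, s_err = (float(x) for x in cells[2].split())
    return {"label": cells[0], "snt": n_snt, "wrd": n_wrd, "corr": corr,
            "sub": sub, "del": dele, "ins": ins, "err": err, "s_err": s_err}


def wer_from_row(parsed):
    """Word error rate in percent: substitutions + deletions + insertions."""
    return round(parsed["sub"] + parsed["del"] + parsed["ins"], 1)


# Sum/Avg row from the 4/20 decode recorded in this log.
row = "| Sum/Avg |   25    473 | 64.5   32.3    3.2   23.9   59.4  100.0 |"
parsed = parse_score_row(row)
print(wer_from_row(parsed), parsed["err"])
```

The Mean, S.D., and Median rows hold fractional sentence and word counts, so this parser is meant only for the per-speaker and Sum/Avg rows. Recomputed sums can differ from the printed Err by 0.1 because each column is rounded separately.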


 * Plan:


 * Concerns:

Week Ending April 28, 2015

 * Task:

4/25 - Read logs.

4/26 - Read logs. Combed through past group emails from Sam and Zach and others to compile a list of how to change configurations within Sphinx. Added these to a word document to be eventually uploaded to Wiki.

4/27 - Took a peek at the scoring log from my 125 hour train, because I had forgotten all about the decode I had started last week. The result I received for WER (just after the first decode I did) was 32.9%. I believe I did this with the command head -7000 for the first 7000 audio files. Zach has explained to me how to decode a second time, starting from where I left off, so I hope that I can do it correctly.
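
Decoding in two passes amounts to splitting the list of audio files at a fixed position: the first pass takes the first 7000 entries (what `head -7000` selects) and the second pass takes everything after them (what `tail -n +7001` would select). The log does not record the exact commands Zach suggested, so the sketch below only illustrates the split itself, with made-up file IDs.

```python
def split_file_list(entries, first_n):
    """Return (first_pass, remainder) for a two-pass decode."""
    if first_n < 0:
        raise ValueError("first_n must be non-negative")
    # entries[:first_n] mirrors `head -first_n`;
    # entries[first_n:] mirrors `tail -n +(first_n + 1)`.
    return entries[:first_n], entries[first_n:]


# Hypothetical file IDs standing in for the real list of audio files.
file_ids = [f"utt{i:05d}" for i in range(10)]
first_pass, second_pass = split_file_list(file_ids, 7)
print(len(first_pass), second_pass[0])
```

Taken together, the two parts cover every entry exactly once, so scoring the combined output reflects the whole set rather than only the first portion.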

4/28 - Created the "Altering Configurations During Train and Decode" page.


 * Results:


 * Plan:


 * Concerns:

Week Ending May 5, 2015

 * Task:

5/2 - Scored my 125 hour train now that I have decoded the train in its entirety. I was having issues using the head command, so I was not decoding properly the first couple times around.

5/2 - The resulting WER after scoring my 125 hour train was an even 34%
 * Results:


 * Plan:


 * Concerns: