Speech:Spring 2013 Joseph Gallagher Log



Week Ending February 5th, 2013
 * Task:
- Familiarize myself with the setup of the experiments folders and read the logs and information from previous groups

- SSH into Caesar

 * Results:
1/31/2013 - Tried using PuTTY to SSH into Caesar but ran into what seemed to be 64-bit conflicts. I instead opted to use a Linux live CD, Fedora 17, and used the SSH function in the Nautilus file manager. I was successful in logging in using the root login. Briefly browsed through the file system in Linux, which seems to be far easier than using only a command line.

2/1/2013 - Purchased some additional reading for referencing Perl and scripting. Also installed Fedora 17 to dual boot on my system because using a live CD seemed counterproductive. Had to repartition my hard drive by removing the recovery drive (HP) to allow space for the dual boot.

 * Plan:
- Use some reference material for learning some scripting tips and using Perl.
- Figure out the easiest way to SSH into Caesar

 * Concerns:
- Retaining the knowledge I learn as I go!

Week Ending February 12, 2013
 * Task:
- Continue to understand tasks for running experiments and the structure of Caesar
- Use reference material to learn more about scripting

 * Results:
2/7 - Communicated with Justin about my situation and tried to set up a conference session over Skype. Read some previous logs from last semester

2/11 - Spent a few hours learning more shell commands in Linux, started the reference material on PERL scripts

2/12 - Read into reference material regarding PERL, attempted to view some of the scripts on Caesar but had no access

 * Plan:
- Make progress with reference materials and make sure the group understands the scope of what tasks are needed for the project
 * Concerns:
- Keeping focused on the project tasks and not getting lost in learning more than is necessary to accomplish our part of the project

Week Ending February 19, 2013
 * Task:
- Continue to work on and understand the purpose of the experiment group
- Download and install necessary software to get a better grasp of what each component does and how it pertains to the groups

 * Results:
2/14/13 - Began downloading the Sphinx software and installing it on Fedora 17. Ran into some issues because some required resources had to be installed first. Used the logs created for installation but had to translate what worked for openSUSE into what would work with Fedora 17. This took the better part of the evening and I still have to locate some resources to finish.

2/15/13 - Communicated with Justin and Prof Jonas to set up a meeting to further understand the group's purpose and how we can efficiently stay on task to accomplish the group goals. Finished locating the necessary items to install Sphinx but have not progressed further.

2/17/13 - Read more logs from previous class to try and understand their methods of progressing through the project.

2/18/13 - Having trouble getting Sphinx installed properly, which again relates back to my ability to adapt the instructions to Fedora instead of a VM with openSUSE. I want to stick with it because using Linux directly is going to be faster than a VM with openSUSE, although I am not opposed to trying the VM. There is a vast amount of help available online so I should be able to get it straightened out shortly. Also having problems logging into Caesar again (permission denied).

 * Plan:
- Keep on task and clarify the role of the experiment group
- Communicate more with Justin and understand what his capabilities are outside of class (download Sphinx, openSUSE)
 * Concerns:
- Getting enough time this week to sit down with Prof Jonas

Week Ending February 26, 2013
 * Task:
Clarify plan of action for the group and submit a proposal

Start reading through the scripts that are currently part of the experiment and train process

 * Results:
2/21/2013 - Met with Prof Jonas and with Justin in order to get a better course of action moving forward for our sub group. Clarified some questions about how the role of our group would benefit the project as a whole. Justin and I then set a plan for looking over the scripts being used currently and adapting them to be more efficient for the modeling group.

2/22/2013 - Looked at the logs of the student(s) who were previously handling the file format and file capture via scripts for the experiment group in order to understand challenges they faced and any issues they felt could be possibly fixed in future classes. Looking at these logs and then considering the current plan for our group I drafted a proposal with input from Justin. After some editing and communication, we submitted the proposal.

2/25/2013 - Have started to read through the scripts in order to see what we will be dealing with in learning and modifying the lines of code. Since programming languages have common ground there are some parts that I can see what is happening but there are also areas where Perl commands that I am not familiar with get me confused. It does help to have reference materials available while reading through the scripts, but being able to piece together more efficient scripts will be challenging.

2/26/2013 - Spent some more time in the logs of Spring 12 and of classmates to see where some of the groups are on their tasks. Specifically spent more time in createTranscript.pl and copySph.pl to see if some of the arguments could be automated. Still have to reference back to Perl materials to make sure I know what I am reading, but it is easier (quicker) to go over the scripts using the links than going through them in vi. Looked over some of the issues seen by the modeling group. Eric documents his efforts pretty well, so it helps to see where his and the group's efforts are going and where the experiment group can assist.

 * Plan:
Learn as much as possible about the current scripts and understand how they can be condensed to run more efficiently for the modeling group. Justin and I also agreed that we both need to improve our knowledge of Perl so we do not cause more problems than we solve.

 * Concerns:
Learning Perl well enough to make a positive impact on the project

Week Ending March 5, 2013

 * Task:

Experiment with editing the createTranscript.pl and copySph.pl scripts into one script by combining tasks

 * Results:
3/3 - Read logs - Saw that Eric from modeling group created updateDict.pl to speed up the process and have read through the script. Helps to see what tasks are being accomplished in the script.

3/4 - Read more logs today as well as more materials on using shell, made some connections with alternate ways to speed up some experiment running other than using .pl but still need to determine if it is possible to connect the scripts as a step by step execution in shell.
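The step by step idea can be sketched in shell as a small wrapper that runs each command in order and stops at the first failure. This is only a sketch under assumptions: run_steps is a made-up helper name, and the echo commands stand in for the real .pl invocations, whose arguments are not recorded in this log.

```shell
#!/bin/sh
# run_steps is a hypothetical helper, not one of the project scripts.
# It runs each argument as a command, in order, and stops at the first failure.
run_steps() {
    for step in "$@"; do
        echo "running: $step"
        sh -c "$step" || { echo "failed: $step" >&2; return 1; }
    done
}

# Stand-in commands; the real steps would be the perl script calls.
run_steps "echo create transcript" "echo copy audio files"
```

Stopping at the first failure matters here because each later step depends on the files written by the one before it.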

3/5 - Looking through the create script, it is not clear whether it can be modified to take input automatically, i.e. getting argument parameters from previous experiment files. It seems (as is described in the instructions) that it is not always necessary to run the create script, but when it is used it takes long parameters. It might be much easier to use if we created some aliases for the specific file locations, but we need to find out if it's possible to use them in root or if we should load them specifically in the folders where they will be used. We could also just add them as environment variables as another potential speed up.
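A minimal sketch of the environment variable idea, in shell. The names EXP_ROOT and exp_dir are assumptions made up for illustration; only the /mnt/main/Exp location and the scripts_pl folder come from the experiment paths seen elsewhere in this log.

```shell
#!/bin/sh
# EXP_ROOT and exp_dir are hypothetical names used only for this sketch.
EXP_ROOT=/mnt/main/Exp
export EXP_ROOT

# Print the folder for a given experiment number, e.g. 0039.
exp_dir() {
    printf '%s/%s\n' "$EXP_ROOT" "$1"
}

# Example: show where the scripts for experiment 0039 would live.
echo "$(exp_dir 0039)/scripts_pl"
```

Lines like these could go in a shell startup file so the long paths never have to be typed as script arguments, though whether that works under the root login still has to be checked.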


 * Plan:

Trial and error
 * Concerns:

Understanding if we can pull data from previous files to help speed up the scripts

Week Ending March 12, 2013

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending March 26, 2013

 * Task:
 * Read and understand the steps for setting up an experiment and running a train


 * Experiment logs:
 * 0039

 * Results:
- 3/23/2013 Went back into Cedric Woodbury's steps for experiment and train setup. During class on 3/20/2013 it was concluded that the instructions needed to be revamped into one defined set, because Cedric's steps conflicted with those of the previous modeling group from Sp12. Although Cedric's proved to be the better set, some clean up was still needed to make them more concise. Went over available logs from experiments performing tiny trains and mini trains to see what to expect.

- 3/24/2013 Saw that the experiment setup and train instructions have been edited, they are much clearer now! Went through the directions up to starting a train, as most of the setup part was confusing before it was edited by the modeling group. The instructions are easy enough to follow, and it is good to see the possible outcomes due to errors so that care is taken when performing the steps.

- 3/25/2013 Read logs of class members; it appears a lot of frustration started from the conflicting instructions and experiment setup. As a member of the experiment group, I found that the feedback in the logs helped point out where a lot of the flaws were in the structure of the experiments. I think that the wiki was really disorganized when it came to this because a few different groups had attempted to rectify the process but did not colocate the revised instructions. Started reading the Exp logs from SP13 members who had tried to run a train. There are a few errors that I think have been addressed within the modeling group but still occur on different machines because SoX is apparently missing from them.

- 3/26/2013 Claimed an Exp number for myself (0039) to get started and be ready for the group meeting during class 3/27. Plan on getting the folder structure set up so I can get a mini train done prior to class, because Justin and I share the same machine and running two trains at once will cause issues.

Setup the experiment without issues. Started the train and received this error:

Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Something failed: (/mnt/main/Exp/0039/scripts_pl/00.verify/verify_all.pl)

Going to read more into it and start an Exp log.

Update - realized this was the dictionary issue that is common when running trains. Need to add the missing words, of which there seem to be quite a few.
 * Plan:
 * Attempt a mini train to weed out any potential problems and be ready to ask questions during the next meeting


 * Concerns:

Week Ending April 2, 2013

 * Task:
 * Run a successful train and decode


 * Experiment logs:
 * 0039
 * 0057

 * Results:
- 3/27/2013 During the group meeting I was able to continue my troubleshooting of Exp 0039. The dictionary issue was fixed and I was able to use updateDict.pl in order to add the missing words to my pruned dictionary. After getting the updated dictionary made and regenerating the Phones list I was able to start a train. After about 5-8 minutes the train threw a few errors listed below:

MODULE: 45 Prune Trees
Phase 1: Tree Pruning
FATAL: "main.c", line 167: Unable to open /mnt/main/Exp/0039/trees/0039.unpruned/D-1.dtree for reading; No such file or directory
MODULE: 50 Training Context dependent models
Phase 1: Cleaning up directories: accumulator...logs...qmanager...
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0%
FATAL_ERROR: "main.c", line 1054: initialization failed
Failed to start bw
Only 0 parts of 1 of Baum Welch were successfully completed
Parts 1 failed to run!
Training failed in iteration 1
Something failed: (/mnt/main/Exp/0039/scripts_pl/50.cd_hmm_tied/slave_convg.pl)

I also created an experiment log for 0039 where I list the occurrences of the errors and my attempts to troubleshoot them. The group is meeting tomorrow (3/28) to go over and hopefully resolve some of the errors we are seeing so we can move to the decode portion of the experiment.

-3/28/2013 Had a group meeting today to continue going over train and decode process. Attempted to figure out what was causing the errors we were seeing previously. I tried running my train for 0039 after a few small modifications to the dtree section but still returned the same errors. Looking further into the Baum Welch errors did not reveal any possible fixes so there needs to be further discussion regarding the errors. I also attempted to run my train on Traubadix to see if Majestix had a setup issue but I received the same errors. Mike and Justin are running Mini evals with some success so I will try this as well.

-3/29/2013 Moved to running a Mini eval instead of a Mini train, have the experiment setup and appropriate files created. I started the train with no issues and it completed a short while after initiation with no fatal errors. Success! After completion I then ran the decode and scoring with some issues but was able to troubleshoot through them. More details of the experiment are in the Exp log 0057.

-4/2/2013 Read some group logs to see where other people were in their experiment trials. It appears that the errors our group was running into might be fixed so I'll have to verify this and make another attempt at running a mini train. Also attempted to log into SPEAK but couldn't and I am not sure if I forgot my password or not so I'll have to talk to the SPEAK guys about figuring that out so I can add my experiment information.


 * Plan:
 * Keep working with the group to move forward with train and decode so we are ready for the next stage


 * Concerns:
 * Getting the errors figured out

Week Ending April 9, 2013

 * Task:
 * Create a new corpus
 * Work on getting a 5 hour train going
 * Experiment logs:
 * 0077

 * Results:
-4/3/2013 Groups gave updates during class; it seems we have all been able to create and run a mini eval with success, along with scoring the experiment, which was recorded previously. After we combined into larger groups, Mike, Justin and I attempted to create a new corpus for the first 5 hours of the transcript. We were running into permission errors so we tried using root to create a transcript. After a few attempts we were able to get it created but had problems with the copySph.pl script. These problems were actually documented by Cedric, but his notes did not help us solve the issue. We tried to start again from scratch but learned that group A had created a 5 hour transcript successfully.

-4/4/2013 Attempted to create a new corpus for the mini train I ran using exp 0039 to see if that would make any difference to the errors I was seeing before. I have heard that there is an issue with the lexical stress values not being properly added during the dictionary creation but I have not heard anything further. I picked a random starting point 1 hour into the transcript and grabbed 10 minutes' worth of data. The transcript was created successfully in an alternate folder so as not to replace the current mini train trans. I again ran into the copySph.pl issue I had experienced during the group meeting. I'll have to bring this up again unless I see it has been fixed or troubleshot.

-4/8/2013 Read through logs to see what kind of progress is being made on our 5 hour corpus creation and train attempt. Communication seems to be a bit sparse but there is still some taking place. Logged into Caesar to see what the experiment folder looks like; there are quite a few exp folders but not as many logs going. As far as I can tell it is mainly people getting themselves caught up with running trains. I am not clear if we have an Exp folder for the 5 hour train yet so I'll have to ask.

-4/9/2013 Explored the first_5hr folder in the corpus switchboard directory; I noticed there was both a train.trans file and .sph files in the wav folder. Looking into logs I saw that group BC had progressed further in their train session thanks to some modifications and alternate steps for getting the dictionary updated. Seeing this, I made an effort to get some progress made for group AD and went so far as to create 0077 as a start to the 5 hr train. Again using logs, I was able to generate a list of missing words which we can distribute among the group in the same manner as group BC did. This will get the group more involved and also help us get to the next steps in finishing the AM and creating the LM. Since there are currently 111 words to be added I won't be progressing the experiment any further until the group meets.
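For reference, a missing word list of this kind can be produced with standard shell tools. The sketch below is not the method taken from the logs; it assumes the usual Sphinx layout (transcript lines wrapped in `<s> ... </s>` with an utterance id in parentheses, and dictionary lines starting with the word), and the sample data and file names are made up.

```shell
#!/bin/sh
# Sketch: list transcript words that have no dictionary entry.
# The sample data and file names below are invented for illustration.
tmp=$(mktemp -d)

cat > "$tmp/train.trans" <<'EOF'
<s> HELLO WORLD </s> (utt_0001)
<s> HELLO CAESAR </s> (utt_0002)
EOF

cat > "$tmp/pruned.dic" <<'EOF'
HELLO HH AH L OW
WORLD W ER L D
EOF

# Words used in the transcript: drop <s>, </s> and the (utterance id),
# then put one word per line and remove duplicates.
sed -e 's/([^)]*)$//' -e 's/<\/*s>//g' "$tmp/train.trans" |
    tr -s ' ' '\n' | sed '/^$/d' | sort -u > "$tmp/words.txt"

# Words defined in the dictionary: the first field of each line.
awk '{print $1}' "$tmp/pruned.dic" | sort -u > "$tmp/dict_words.txt"

# comm -23 keeps the lines that appear only in the first file.
missing=$(comm -23 "$tmp/words.txt" "$tmp/dict_words.txt")
echo "$missing"
rm -rf "$tmp"
```

The resulting list can then be cut into equal chunks so each group member adds pronunciations for their share.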


 * Plan:
 * Keep in contact with group to figure out the progress of the corpora being created
 * Clarify any questions regarding the corpus creation process


 * Concerns:
 * Getting a 5 hour corpus setup successfully (had issues in first attempts)

Week Ending April 16, 2013

 * Task:
 * Continue to work with the group to get the error rate down on the train and decode
 * Keep up to date with group efforts and communication

 * Results:
-4/10/2013 Group met to get everyone on the same page. I understand that Tyler ran a 5hr train on his own and was able to complete it without errors. The downside was the high error rate, which is believed to be caused by certain text entries that are not actual words, just emotional reactions in the recordings, not being properly removed or altered. The groups need to move forward and create a script to fix this using regular expressions.

-4/13/2013 Group communication took place in order to get a meet time set. Read logs that were available to see progress on getting a transcript generated.

-4/16/2013 Read logs again to catch up with progress from group AD. I can see where there has been some significant effort in getting the train to run with a new script but it seems there are still some bugs to work on. I was able to watch some of the Google hangout that I missed because of complications that occurred on Monday (thanks for making the video!) so I am in tune with the steps taken so far and some of the items to be worked on.


 * Concerns:
 * Working out a good script that will take care of the issue with forward slashes and hyphens in the transcripts
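One possible shape for such a script, as a sketch only: it assumes the desired fix is to split tokens joined by a forward slash or hyphen into separate words, which has not been confirmed by the group. clean_line is a made-up name and the sample text is invented.

```shell
#!/bin/sh
# Sketch only: replace '/' and '-' in transcript text with spaces,
# then squeeze runs of spaces. Splitting is an assumed fix, not the agreed one.
clean_line() {
    printf '%s\n' "$1" | sed 's|[/-]| |g' | tr -s ' '
}

clean_line "UH-HUH AND/OR MAYBE"
```

A real version would need to leave the utterance ids alone, since those may contain hyphens of their own.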

Week Ending April 23, 2013

 * Task:
 * Work on posters for URC
 * Continue to work on the 5 hour train for group AD and lower the score

 * Results:
- 4/17/2013 Poster work during class for the modeling group. Talked about the design and most important factors to be conveyed to anyone viewing the poster. Used previous Speech group posters to get a good idea of what message they wanted to convey and came up with our own.

- 4/18/2013 Made a couple of small revisions to the URC poster (background, alignments, content). It seems for the most part we were able to get it done during the class time and it just needs a couple of tweaks to be ready. Also read through the Abstract that was put together.

- 4/21/2013 Looked through the additional words that need to be defined for the train. Tyler had these setup on a google doc that everybody looked through to help pitch in. Hopefully we can get a better score this time around! I believe he will be running the next train for the group.

- 4/23/2013 Read through the recent experiment logs to see where AD and BC were at, seems as though BC may have had a better error rate from their previous run but who knows if that is just because we have a higher rate of words not being recognized in the train and decode. It seems we still have some of the same results as the previous weeks' scoring so it is not clear on whether we made some type of progress or not.


 * Plan:
 * Work on poster elements
 * Stay on track with group train


 * Concerns:
 * Getting the error rate lower than BC!

Week Ending April 30, 2013

 * Task:
 * Begin work on the final report
 * Review previous reports for references

 * Results:
- 4/24/2013 Group AD made a last ditch effort to improve our word error rate after determining that our current score was based on 5 hours and not a 30 minute test on train. After some deliberation and a second train it was decided that group AD would be writing the final report! The final report sections were assigned amongst the team and we began the collaborative effort of creating a document with Google Docs. My section is the Experiment directory.

- 4/28/2013 Checked to see if group BC had made any progress on running their train and/or increasing the time trained; it does not appear that much has occurred since class on 4/24. Also began reading and trying to understand the script move_to_expdir.pl to see if it can be altered to perform copy tasks in the experiment directory folder instead of its current purpose of copying over train files. More to come...

- 4/29/2013 Read class logs and also reviewed the previous class report to get an idea of what my portion of the report should look like and/or discuss. The report is fairly thorough, though I think some parts are somewhat overdone because they repeat portions of the wiki. The wiki needs more work overall, and I know a lot of people have mentioned it already, but maybe before the semester's over we'll make a few more changes.

- 4/30/2013 Worked on my portion of the final report for the experiment directory section. Also read through what other group members have written so far to understand where everyone's head is at in regards to pulling together this lengthy report. So far it looks fairly decent, though there might be some editing to reduce some unnecessary content. I'm sure the group will come together and discuss editing during class.


 * Plan:
 * Work on the Report with the group via Google docs
 * Keep up with other members' contributions


 * Concerns:
 * Keeping the report condensed and trimmed so as not to overdo it and make people fall asleep while reading it!

Week Ending May 7, 2013

 * Task:
 * Continue work on editing/revising the final report

 * Results:
- 5/1/2013 Group meeting to discuss more additions to the final report. Discussed the focus of tying in the semester goal with Charlie's introduction and being able to get a conclusion or explanation of results from Eric and group BC after they complete further work on the 5 hour train.

- 5/5/2013 Looked over the final report additions, changes or edits that had been done to date. Minor grammatical error corrections.

- 5/7/2013 Wrote up some content for the semester goal section for the final report. Also made some corrections to the current Experiment directory setup section that were suggested by other group members.


 * Plan:
 * Keep working on the document with group


 * Concerns:
 * None