Speech:Spring 2015 Stephen Griffin Log



Week Ending February 3, 2015
 * Task:

Gain a better understanding of data team requirements
 * Results:

'''01/29/15 Reviewed previous Data team logs, including Jared Rohrdanz's log. This appears to be a well-kept log that I will use for reference in the future. I have also been learning the wiki system, as it is something I have not used before. Editing seems to be fairly straightforward and is done entirely with plain-text formatting.

'''01/31/15 Installed the PuTTY terminal program and got familiar with its usage. Also reviewed the Sphinx documentation further. Considered setting up a virtual machine and installing Sphinx, but I will wait to see if we get our system login information soon. If not, I will set up a local environment so I can practice running trains and other tasks as soon as possible.

'''02/01/15 Emailed the data team in an effort to coordinate a meeting for tomorrow evening to discuss our proposal. I also continued to review semester logs, including those of the other teams this semester. I noticed that Mohamed was able to log in to Caesar, and I have emailed him about it. He is on the systems team, so he may have early access to the system. Once I have my login information I will begin familiarizing myself with the server and will try to run my first train.

'''02/02/2015 Group met via hangouts today to discuss proposal work. I drafted some initial proposal documentation, as has Krista. We are going to combine and refine what we have, as well as piece together the timelines for each group member. We are scheduled to meet again tomorrow to do a final proposal review and then send it off to the documentation team.

'''Rough Draft of initial proposal:

This semester, the Data group has several tasks to fulfill, including:

Becoming proficient in running experiments and trains: All members must be capable of these basic tasks that form the basis of all research conducted in the speech project. This process will begin as soon as we are given access to the Caesar server.

Organization of files and elimination of redundancies: Speech data is in need of organization, particularly the collection of .wav files. We will attempt to make sure the data is efficiently structured and organized, and create a collection of "soft links" to existing data for ease of access. The soft links will allow for a cleaner and more robust organization system.

Documentation Creation: We will work to fully and comprehensively document the organization and new structure of data storage that is to be implemented, so that other teams can quickly take advantage of the structural changes. Any additional functionality that we develop in the form of scripts or other types of tools will be documented fully as well.

We expect other objectives to come to light as we begin actual work on the project, and will adjust our weekly plans based on our discoveries when working with the speech system.

'''02/03/2015 Revised draft of proposal and developed individual timeline. Met online with group members to finalize document before it is sent off to the documentation team today.

'''Revised Draft of Proposal (with individual timeline) Introduction

The Data group’s primary objective is to deal with the following types of data pertaining to the speech project:

 * Transcripts
 * Word Alignment
 * Audio files (.wav)

Our main objective this semester is to clean up the audio files and provide much needed documentation on where the files are located. We will make sure the data is efficiently structured and organized, with all of the files centrally located. We will also go through previous experiments, creating “soft links” to the new location of the .wav files. This will assist in eliminating duplicate data. The soft links will allow for a cleaner and more robust organization system that will benefit every team assigned to the project.

The Data group also plans to become proficient in running experiments and trains: All members must be capable of these basic tasks that form the basis of all research conducted in the speech project. This process will begin as soon as we are given access to the Caesar server.

Implementation Plan

This semester the Data group fully intends to be diligent with the documentation of the audio file cleanup and data structure implementation. We will work to fully and comprehensively document the organization and new structure of data storage that is to be implemented, so that other teams can quickly take advantage of the structural changes. Any additional functionality that we develop in the form of scripts or other types of tools will be documented fully as well. Existing documentation will be reviewed and clarified if necessary.

Experiments and trains will be run by each member to gain a better understanding of the system as a whole, and how we can best assist the other teams in achieving their objectives. We will maintain open communication with the other teams as the project evolves, so that we can provide the best data support possible for all those who are currently working on the speech project.

Estimated Timeline

Stephen

'''Week Ending Feb 3rd: Research past Data team logs and Sphinx documentation to become familiar with the speech project system. Work with the team to develop initial proposal documentation.

'''Week Ending Feb 10th: Gain access to the Caesar server and become familiarized with the current project file structure. Attempt to run first train and experiments.

'''Week Ending Feb 17th: Review data files and structure with the team, and begin making appropriate changes to the system. Document any and all procedures and changes.

'''Week Ending Feb 24th: Continue with data organization based on additional feedback from other teams and the Data team.

'''Week Ending March 3rd: Complete any further work necessary with the Data team in preparation for scheduled team reassignment.

 * Plan:

Produce proposal, gain access to Caesar, become familiarized with the system as a whole.

 * Concerns:

No server login information available yet, as far as I know. Making inquiries to gain access.

Week Ending February 10, 2015

 * Task:


 * Results:

'''02/04/2015 Managed to log in to Caesar as well as Asterix with my new account. SSH'd into the server and set up my SSH public key. Examined directory structures.

Tasks:

1. ssh on port 22 to caesar.unh.edu
2. Generated an ssh key with ssh-keygen
3. Created a link to the public key via the command: ln -s id_rsa.pub authorized_keys
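A minimal, runnable sketch of step 3 above (the symlink that lets the public key double as authorized_keys). A dummy key file and a scratch directory stand in for the real id_rsa.pub and ~/.ssh:

```shell
# Sketch of the authorized_keys symlink from step 3. The scratch
# directory and placeholder key stand in for ~/.ssh and a real key
# generated by ssh-keygen.
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "ssh-rsa AAAA... user@host" > id_rsa.pub   # placeholder for the real key
ln -s id_rsa.pub authorized_keys                # step 3 from the log

ls -l authorized_keys
```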

'''02/07/2015

Today I logged in to Caesar from home, and continued to further examine the files in the various directories, including the contents of the different experiment and train components in order to get a feel for how these files are constructed. The dictionary (.dic) files seemed a little overwhelming, and while I read that the files are supposed to have one word element per line, the files that I examined had no such formatting. I will try with a different editor next time, as I suspect it may not be an issue with the file itself.

I reviewed everyone's logs to see where the different teams are at in their progress. I saw that Mohamed actually attempted to run a train, but it seemed to have failed for some reason. I would like to attempt to run one of the sample experiments at some point this week, to further my understanding of the system as a whole. At the moment it is a little overwhelming, but the more I read through the documentation and the files, the more comfortable I am becoming with the material.

I also decided to scrap PuTTY as my terminal after seeing the in-class demonstration of the Windows SSH Secure Shell client. Being able to issue commands at the terminal level, while having a Windows-style GUI view of the file structure, is very helpful. I have not been able to find something quite like it yet for Linux, which I generally prefer over Windows for tasks such as this. I look forward to doing a lot of real work with Linux this semester; even though I have a pretty good grasp of it from daily use at work, there is so much to it, and I still have a lot to learn about the nuances of the command shell.

'''02/08/2015

Today we communicated as a team to coordinate our findings presentation for our next class. We were unable to find a definitive time today that everyone could meet, so we have rescheduled for tomorrow afternoon so that everyone may be present.

I have continued to review additional log entries from the rest of the class, and further explored the caesar.unh.edu system. I had initially wanted to run a train this week, but after reviewing the logs and seeing the difficulty that Mohamed has had so far, I will wait until the modeling team has worked out any issues that might be preventing successful train runs.

I have reviewed some of the existing perl scripts, as well as documentation on general perl scripting. It is fairly straightforward, and should not be difficult to use if necessary. Once we have identified specific tasks that we may want to complete, we will look into whether we can automate or speed up the process via scripts as opposed to executing such tasks manually.

Regarding Perl, the most useful command for us will probably be the system function, which allows you to run shell commands and act upon the results. A basic outline of how it works is available at Perl HowTo, and includes the following snippet:

Using system:

system executes the command specified. It doesn't capture the output of the command.

system accepts as argument either a scalar or an array. If the argument is a scalar, system uses a shell to execute the command ("/bin/sh -c command"); if the argument is an array, it executes the command directly, considering the first element of the array as the command name and the remaining array elements as arguments to the command to be executed.

For that reason, it's highly recommended for efficiency and safety reasons (especially if you're running a CGI script) that you use an array to pass arguments to system.

Example:

#-- calling 'command' with arguments
system("command arg1 arg2 arg3");

#-- better way of calling the same command
system("command", "arg1", "arg2", "arg3");

The return value is set in $?; this value is the exit status of the command as returned by the 'wait' call; to get the real exit status of the command you have to shift right by 8 the value of $? ($? >> 8).

If the value of $? is -1, then the command failed to execute, in that case you may check the value of $! for the reason of the failure.

Example:

system("command", "arg1");
if ( $? == -1 ) {
    print "command failed: $!\n";
}
else {
    printf "command exited with value %d", $? >> 8;
}
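For comparison, the shell makes this simpler than Perl: in the shell, $? already holds the command's exit status directly, with no >> 8 shift required. A quick illustration:

```shell
# Shell analogue of the Perl exit-status handling above: the shell's $?
# is the exit status itself (Perl's $? packs the raw wait() status,
# hence the shift there).
sh -c 'exit 3'                      # a command that fails with status 3
echo "command exited with value $?"
```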

'''02/10/2015

The Data group met over hangouts this evening to discuss our progress and our upcoming goals. One of our main tasks involves creating soft links for the .wav files contained in the corpora, to be used in the existing and upcoming experiments. We discussed the possibility of automating this task with a perl script, and how to document the process. Beyond this main task, we are still a little unclear as a team on what else we should be responsible for in the overall project scope. We will meet before class tomorrow to further discuss the matter, and hopefully discuss possible additional tasks to complete with the project lead in class tomorrow.
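The soft-link automation we discussed could look something like the following sketch. The loop and paths here are illustrative stand-ins (scratch directories rather than the real /mnt/main corpus layout), not the actual script we may end up writing:

```shell
# Hedged sketch of the soft-linking idea: for every .wav in a central
# corpus directory, drop a symlink into an experiment directory.
# mktemp directories stand in for the real /mnt/main locations.
set -e
corpus=$(mktemp -d)    # stands in for the central .wav location
expdir=$(mktemp -d)    # stands in for an experiment's wav directory

touch "$corpus/sw2062a.wav" "$corpus/sw2062b.wav"   # sample corpus files

for wav in "$corpus"/*.wav; do
    ln -s "$wav" "$expdir/$(basename "$wav")"       # soft link, not a copy
done

ls "$expdir"
```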

Beyond this, additional file structure examination and review of different file contents continued in order to better understand the system as a whole.


 * Plan:


 * Concerns:

Week Ending February 17, 2015
 * Task:

Begin eliminating .wav file redundancies


 * Results:

'''02/11/2015

Our group met before our class today to go over our progress for the week and to prepare information that was presented to the class. We also worked after our class meeting to revise our individual proposal timeline schedules, to be more in line with the proposal requirements discussed in class. We continued to investigate the file/directory structure, in preparation for our upcoming tasks this week, which will include reducing the amount of redundant .wav data contained on the servers. We also plan to create an experiment this week, and a set of sub-experiments where we can each begin practicing running experiments/trains. The group has typically met on hangouts a few days before the next class session, and we will most likely do the same this week on Monday or Tuesday.

'''02/14/2015 Read through the class logs to see how progress is coming this week. Dakota has started work on our tasks for this week, and has a detailed description of his discoveries in his log entries. I will be using them as a guide for my own work this week. I also hope to create a new experiment in the next day or two. Will consult with the team about doing a hangout probably Monday or Tuesday night to coordinate our efforts before our next class.

'''02/16/2015

Today I went ahead and created a new experiment in the project Wiki, as well as a new directory to store the experiment on Caesar. The purpose of this experiment is to have a place where all members of the Data team can set up individual experiments. Each team member will be able to create a child experiment in order to learn how to run trains, without cluttering the experiment directory structure.

Steps Taken:

Added new experiment to main Wiki page: *0263 Data Group Experiment Test

Filled in experiment data at Experiment page for #0263

Logged in to Caesar, and navigated to /mnt/main/Exp

Created new directory with command: mkdir 0263

From here, I will coordinate with the team and see how we would like to set up our individual experiments. We will hold an online meeting either tonight or tomorrow evening to discuss the particulars of the week's progress.

I have also reviewed the rest of the class's logs to see where everyone is at. Dakota had an especially good log entry this week, and he began making progress on our task of ensuring that soft linking data of existing files is up to date and correct. Next, we will have to figure out the best way to reduce .wav file redundancy to clear unnecessary files that are taking up valuable space on the server.

Some of the helpful commands used this week include:

ls -1 | wc -l : Return count of files in directory

mkdir: create new directory
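A quick throwaway demonstration of the file-count one-liner above (note that ls -1 counts directories as well as regular files):

```shell
# ls -1 prints one entry per line; wc -l counts the lines.
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir subdir                     # mkdir: create new directory
touch a.wav b.wav c.wav
ls -1 | wc -l                    # 3 files + 1 directory = 4 entries
```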

'''02/17/2015

The Data team met via Hangouts this evening to discuss our progress this week. We have inventoried the various tasks we must accomplish, and assigned specific items to each team member so that everyone has a definitive task to work on for the time being. We are not sure how long some of them may take, and obviously some may take much longer than others. If one member manages to finish up before the others, we have agreed to have that team member assist one of the other members with one of the other tasks at hand.

Tasks have been tentatively divided as follows:

Krista: Begin eliminating .wav file redundancies in the file structure

Dakota: Continue with soft linking of data files

Russ: Work with the Switchboard corpus and attempt to determine the total amount of hours of data that it currently contains

Stephen: Establish initial experiment and successfully run a train. Document process, and then show other team members how to perform the same tasks

The Data team will meet before class tomorrow to discuss any further progress and prepare for class progress presentations.

Existing logs from other teams have been reviewed as well to keep updated on overall project progress.

 * Plan:

Online meeting, .wav file redundancy check, initial experiment creation


 * Concerns:

Week Ending February 24, 2015

 * Task:


 * Results:

'''02/19/2015

Reviewed logs and documentation regarding setting up experiments and running trains. Will be attempting first train this week.

'''02/21/2015

Today marked the Data team's first train run for our experiment, #0263. I spent time researching the process by reading other semesters' logs, the logs of those who have already run trains this semester, and the Wiki instructions page. It would probably be helpful if the Wiki had a single start-to-finish guide to running your first train; it took a good amount of effort to parse out the relevant information from past semester logs and the wiki from a first-time experimenter's point of view.

Everyone seemed to follow the same initial first steps, which was a good template to get started with the train. I chose the default 5 hour train that was referenced in the Wiki to keep things simple. I have outlined the steps that I have taken, and they are as follows:

'''Steps:

1. Created child experiment in Wiki

2. Created child experiment under /mnt/main/Exp/0263

cd /mnt/main/Exp/0263

mkdir 001

3. Create directory structure

/mnt/main/scripts/user/prepareExperiment3.pl first_5hr/train

4. Generate feats

/mnt/main/scripts/user/generateFeats2.pl

5. Run train command

nohup scripts_pl/RunAll.pl &
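The nohup pattern in step 5 can be demonstrated generically; in this sketch a short sleep stands in for the long-running RunAll.pl:

```shell
# Run a long job detached with nohup, keep its pid, and check its
# output later. 'sleep 1; echo done' stands in for RunAll.pl.
set -e
cd "$(mktemp -d)"
nohup sh -c 'sleep 1; echo done' > nohup.out 2>&1 &
pid=$!
wait "$pid"      # in practice you would log out and check back later
cat nohup.out    # RunAll.pl progress messages would accumulate here
```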

The train continues to run with no errors as of yet. I will try to keep an eye on things through the day, and will check for completion this evening or tomorrow morning. I will record the results, and share them with the rest of the Data team when we do a remote meeting in the coming days. I will also be reaching out to the Modeling team upon completion, to attempt to learn more about altering train configurations and running more advanced experiments that will produce meaningful data.

'''02/22/2015

Today I ran the additional steps after the initial train run yesterday, in order to generate a scoring log result. At the end of the train run, I encountered the following output:

Baum-Welch gaussians 8 iteration 4 Average log-likelihood 5.66991046069966
Training for 8 Gaussian(s) completed after 5 iterations
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: 99 Convert to Sphinx2 format models
Can not create models used by Sphinx-II. If you intend to create models to use with Sphinx-II models, please rerun with:
$ST::CFG_HMM_TYPE = '.semi.' or $ST::CFG_HMM_TYPE = '.cont'
and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd'
and $ST::CFG_STATESPERHMM = '5'

After researching the error received, it looks like the process still completed successfully (according to the Sphinx Forum).

I continued with the procedure and completed the following steps to generate the language model:

Create LM directory

mkdir LM

Enter directory for setup

cd LM

Copy transcript from corpus directory

cp -i /mnt/main/corpus/switchboard/first_5hr/train/trans/train.trans trans_unedited

Prepared the transcript

/mnt/main/corpus/switchboard/dist/transcripts/ICSI_Transcriptions/trans/icsi/ParseTranscript.perl trans_unedited trans_parsed

Copied the script that creates the language model.

cp -i /mnt/main/scripts/user/lm_create.pl .

Executed the script

./lm_create.pl trans_parsed

After this process was completed, I moved on to the decoding process. Working through the process, I followed the wiki guide to complete the following steps:

Decode:

Create directory

mkdir DECODE

Enter directory for setup

cd DECODE

Copied decode_2 script into directory

cp -i /mnt/main/scripts/user/run_decode2.pl .

Ran Decode:

nohup run_decode2.pl 001 001 1000

Transformed decode log file to hyp.trans

/mnt/main/scripts/user/parseDecode.pl decode.log ../etc/hyp.trans

Switched to etc directory

cd ../etc

Ran SCLite

sclite -r 001_train.trans -h hyp.trans -i swb >> scoring.log

Instead of getting a proper scoring log, I simply got the error: Segmentation Fault

Further investigation is needed to determine what has gone wrong with the process.
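One inexpensive check before rerunning sclite is to confirm the input files exist and are non-empty, since a bad or empty input is a common cause of crashes like this. The file contents below are placeholders, not the real transcripts:

```shell
# Sanity-check sclite's inputs before invoking it. File names mirror
# the log entry above; the directory and contents are stand-ins.
set -e
cd "$(mktemp -d)"
printf 'hello world (sw2062a)\n' > 001_train.trans   # placeholder reference
printf 'hello word (sw2062a)\n'  > hyp.trans         # placeholder hypothesis

for f in 001_train.trans hyp.trans; do
    [ -s "$f" ] || { echo "$f missing or empty"; exit 1; }
done
echo "inputs look sane"
```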

'''02/23/2015

Met on Google Hangouts with the Data team to discuss the week's progress and where we stand for next class. Shared my results related to running a train, and several team members plan to attempt one for themselves before our next class. Will compare results with my own outcome to try to figure out what exactly went wrong during the last step of my decode.

Also reviewed other classmates' logs.


 * Plan:


 * Concerns:

Week Ending March 3, 2015

 * Task:

Run a successful train

Successfully complete file soft linking


 * Results:

'''02/26/2015

After consulting with Sam from the modeling team in class yesterday, I discovered that I was not the only one having issues running a successful decode. Today, I attempted to retrace my steps, and ran through the decoding process of my existing experiment once again, to try to achieve a proper result. The steps taken were as follows:

Ran Decode:

nohup run_decode2.pl 001 001 1000

Transformed decode log file to hyp.trans

/mnt/main/scripts/user/parseDecode.pl decode.log ../etc/hyp.trans

Switched to etc directory

cd ../etc

Ran SCLite

sclite -r 001_train.trans -h hyp.trans -i swb >> scoring.log

Unfortunately, the same error has persisted, and I received a notice of a segmentation fault once again. I have dispatched an email to Sam regarding the issue, and hope to work with him this week to resolve the issue.

Also coordinated with the Data team today via emails on the various tasks being completed at the moment. They have made great progress on the soft linking portion of our task list.

'''02/27/2015

Waiting to hear back from the modeling team before attempting to rerun train and/or decode. Reviewed logs in the meantime. Other members of the data team have been actively working on the soft linking of .wav files, and it appears the task is just about entirely complete.

'''03/01/2015

After speaking with Sam from the modeling team via email, I attempted to run a new train experiment to see if I could get a better result than my first attempt.

Everything seemed to go well, until I reached the decode portion of the experiment.

I ran the following command: nohup run_decode2.pl 005 005 1000

These are the default settings for the decode, along with the experiment number I am using. The process took only a second, which concerned me, as Sam had said this part should take around an hour to complete. I noticed a decode.log file was generated, and analyzed the contents using the nano utility:

nano decode.log

The log ended with the following information:

INFO:  Reading HMM in Sphinx 3 Model format
INFO:  Model Definition File: (null)
INFO:  Mean File: (null)
INFO:  Variance File: (null)
INFO:  Mixture Weight File: (null)
INFO:  Transition Matrices File: (null)
FATAL_ERROR: "mdef.c", line 680: No mdef-file

I will contact Sam regarding the fatal error, as I am not sure what has happened here. Due to this, I am unable to continue the experiment at this moment, but will try again once I get some feedback on the process.

'''03/02/2015

Spoke to Sam over the weekend regarding the problems I was having. He gave a few suggestions, including running run_decode5.pl as opposed to run_decode2.pl, and I was able to get the experiment to actively decode. I have not had a chance to review the results as of yet, but will do so this week. The Data group is set to meet tomorrow night via Hangouts to go over progress for the week before our next class meeting.

Also, reviewed other classmates' logs to see how things are going with the other teams.


 * Plan:


 * Concerns:

Week Ending March 10, 2015

 * Task:


 * Results:

'''03/05/2015

Yesterday before and after our class meeting, the Data group worked with Sam to iron out various issues with training, and data in the switchboard corpus. Before the meeting, I was able to successfully decode the results of my second experiment. A few minor adjustments had to be made in order to generate a proper decode result.

Instead of running:

cp -i /mnt/main/scripts/user/run_decode2.pl .

nohup run_decode2.pl 001 001 1000

A different script was used, with slightly different parameters:

cp -i /mnt/main/scripts/user/run_decode5.pl .

nohup run_decode5.pl 005 0263/005 1000

The rest of the steps remained the same. In the end, the final output was achieved:

SYSTEM SUMMARY PERCENTAGES by SPEAKER

    ,-----------------------------------------------------------------.
    |                            hyp.trans                            |
    |---------+--------------+---------------------------------------|
    | SPKR    | # Snt  # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
    |---------+--------------+---------------------------------------|
    | sw2062a |    40    755 | 75.1   17.1    7.8    8.3   33.2   92.5 |
    | sw2062b |    71   1483 | 77.1   14.2    8.8    7.8   30.7   98.6 |
    |=========+==============+=======================================|
    | Sum/Avg |  4659  69209 | 69.2   22.8    8.0   13.6   44.4   95.7 |
    | Mean    |  58.2  865.1 | 69.1   23.1    7.8   14.8   45.7   96.1 |
    | S.D.    |  22.1  334.6 |  7.3    5.7    2.6    6.8   10.9    4.8 |
    | Median  |  55.5  817.5 | 69.9   21.8    7.3   14.0   44.5   97.6 |
    `-----------------------------------------------------------------'

I will continue to assist the group with anything they may need to get their initial trains functioning correctly. Have also reviewed logs, but not much has been written as it is early in the week. I will monitor log outputs as the week goes on.

'''03/07/2015

Reviewed class logs, especially the contents of Sam's discoveries from last week. I may attempt to run a different train this week, and perhaps alter the default values in order to see what sort of effect that may have on the accuracy results.

'''03/09/2015

Reviewed class logs again. Will reach out to the Data team tonight to schedule an online meeting for tomorrow evening. May run another experiment tomorrow for further preparation before the scheduled "boot camp" begins.

'''03/10/2015

The Data team met via hangouts this evening to discuss the week's progress. It seems we have basically achieved everything we have set out to accomplish in our timeline goals. The data is cleaned up and more readily usable, as well as correctly soft linked. I began setting up an additional experiment to run with the following procedure:

Steps Taken:

Added new child experiment to main Wiki page: *0263 Data Group Experiment Test

Filled in experiment data at Experiment page for #0263

Logged in to Caesar, and navigated to /mnt/main/Exp

Created new directory with command: mkdir 0263

From here, I will set up the new experiment.

I would like to review a few questions with Sam before proceeding, but also may defer the experiment completely depending on how things go during our next class. Depending on the status of our planned boot camp (designed to get everyone in the class up to speed on training etc) it may be best to wait until new groups are formed to continue with experimenting. Once we have designated new groups, I will continue with the experimentation as planned.

Also, reviewed logs from the rest of the class.


 * Plan:


 * Concerns:

Week Ending March 24, 2015

 * Task:


 * Results:

'''03/19/2015

Reviewed logs.

Data group met via Hangouts last night to discuss our remaining tasks. We have divided up what is left among the team members, and will proceed with the workload this week.

Caesar was offline yesterday and remains offline today. Will continue progress once it is available.

Emailed Sam this week regarding the coordination of our team to start testing our current baseline.

'''03/21/2015

Read logs, and have been attempting to log in to Caesar daily. System still appears to be down, with no mention in the system team logs or Wiki page. Will follow up with email to the team and see if there is any information.

'''03/23/2015

Followed up with Adam and Mohamed regarding the current state of Caesar. I eventually heard back from both of them, and Mohamed said he was going in today to see what is happening with the server. Hopefully we will hear something back soon and proceed with our work in the Data group, as well as with our Bruins team.

This evening I met with Sam via hangouts, in order to discuss our upcoming strategy to optimize our trains and improve our baseline. I will refrain from mentioning details in this public log, as it is a competition and I don't want to divulge the particulars of our discussion. We are ready to get started as soon as the system is available and start honing our baseline result.

Also reviewed logs for the rest of the class.

'''03/24/2015

Heard back from Mohamed today, after he went to Pandora to address the issue. It appears to be a network problem, and even the laptops in the classrooms are having issues connecting to the network according to him. I am not sure what the current state of the matter is as far as having the issue resolved, but Caesar remains inaccessible. I had hoped to finish our Data team tasks this week, but they will have to wait until the networking issue has been resolved.


 * Plan:


 * Concerns:

Week Ending March 31, 2015

 * Task:


 * Results:

'''03/25/2015

Bruins group met during class today and covered a lot of ground. Sam did a good job of outlining the training process for the group, and showed everyone how to get started. Afterward, we discussed various points and potential strategies to improve our baseline result. We identified several core approaches that we will be using in the coming weeks. The group was broken down into pairs, which will run trains with different strategies and settings across the 125hr and 256hr trains, respectively.

Caesar appears to be online again, and reachable to begin work this week. I will start by wrapping up my remaining tasks with the Data group, and move on to the more advanced training that we have decided to execute as a group. We will coordinate via email this week to track group progress.

Also linked my account via Blackboard to enable the cisunix-to-caesar connection.

'''03/26/2015

Logged into Caesar via cisunix system and checked out current experiments to confirm I am able to use the system remotely. I was unable to access the system last night, but it appears to be working for me today. Sent email to confirm settings being used for upcoming train before I get it running. Also reviewed class logs.

'''03/29/2015

Created a new experiment today in order to test a particular setting's effect on the baseline. Experiment was established on the Wiki first, and then created on Caesar.


 * Steps:

cd /mnt/main/Exp

mkdir 0274

cd 0274

mkdir 001

cd 001

/mnt/main/scripts/user/prepareExperiment.pl switchboard 125hr_3170/train

/mnt/main/scripts/user/generateFeats.pl

nohup scripts_pl/RunAll.pl &

Unfortunately the last step appears to be failing for an unknown reason. I am currently getting an error when the runall script runs.

Error Message: Something failed: (/mnt/main/Exp/0274/001/scripts_pl/00.verify/verify_all.pl)

I have emailed Sam and Ben regarding the issue and will continue to look into the matter. I know there were some script issues a few days ago, so perhaps others are having similar issues to the one that I am experiencing.
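A general debugging tactic for errors like the one above, where the wrapper only reports which script failed: run the named script directly and capture its stderr. A dummy failing script stands in for verify_all.pl here:

```shell
# When RunAll.pl only reports "Something failed: (<script>)", invoking
# the named script by itself usually surfaces the underlying error.
# A stand-in failing script is used in place of verify_all.pl.
set -e
cd "$(mktemp -d)"
cat > failing_step.sh <<'EOF'
#!/bin/sh
echo "transcript/fileids length mismatch" >&2   # example hidden error
exit 1
EOF
chmod +x failing_step.sh

./failing_step.sh 2> step.err || true   # capture stderr, tolerate failure
cat step.err                            # the real reason for the failure
```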

'''03/31/2015

Apparently many people have had several issues related to running their experiments and accessing Caesar this week. I attempted to rerun my assigned experiment using:

nohup scripts_pl/RunAll.pl &

Unfortunately, the same error message was received, and I have been unable to find a solution. I had hoped there was still some ongoing system configuration work that would alleviate the issue, but it remains nonfunctional as of 03/31.

Error message:

Something failed: (/mnt/main/Exp/0274/001/scripts_pl/00.verify/verify_all.pl)

Looking forward to discussing with the team tomorrow to figure out how best to proceed in light of all the issues we have been experiencing. I have also reviewed class logs to see how others are doing with their work this week. The group was in contact through the week via email to discuss the various issues that came up as people worked on their experiments.


 * Plan:


 * Concerns:

Week Ending April 7, 2015

 * Task:


 * Results:

'''04/03/2015

After discussing the various issues the group had trying to run certain trains this past week, we have refined our strategy to focus on different objectives instead of continuing to debug the problems we experienced. Our group also elected against doing a team trade, as did the other group. No one really wanted to disrupt the teams, and so far people seem to be doing a pretty good job with their work. We have continued to converse via email this week about our current strategies.

It was also mentioned that we should archive some of our more productive discussions for future classes. It might help future Capstone classes get a better understanding of how to conduct this research, instead of having to relearn everything from scratch each semester. We decided not to publish these documents until after the competition is over, as publishing sooner would serve no purpose in achieving our current goals.

I will be conducting my next train over this weekend with altered settings that I have previously discussed with Sam. Details will be listed in our new group page log.

'''04/04/2015

Today I am attempting to run a new 256 hour train, since it appears the issues with the 125 hour train have not been resolved. I am altering several settings that were suggested to me by Sam and will see how they affect the baseline. I will cover the differences on the new group page once it has been established. The experiment has been listed in the experiment wiki.


 * Steps for today:

cd /mnt/main/Exp/0274

mkdir 002

cd 002

/mnt/main/scripts/user/prepareExperiment.pl switchboard 125hr_3170/train

* altered config settings *

/mnt/main/scripts/user/generateFeats.pl

nohup scripts_pl/RunAll.pl &

Will update process information as these steps are completed.
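The steps above can be collected into one dry-run checklist. Since the /mnt/main paths only exist on the project servers, this sketch just echoes each command; all names are copied from this log, not verified:

```shell
# Dry-run sketch of the experiment setup sequence; commands are echoed, not
# executed, because /mnt/main only exists on Caesar.
for cmd in \
    'cd /mnt/main/Exp/0274' \
    'mkdir 002' \
    'cd 002' \
    '/mnt/main/scripts/user/prepareExperiment.pl switchboard 125hr_3170/train' \
    '# edit config settings here' \
    '/mnt/main/scripts/user/generateFeats.pl' \
    'nohup scripts_pl/RunAll.pl &'
do
    echo "$cmd"
done > plan.txt
cat plan.txt
```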

'''04/05/2015

The 256 hour train that I initialized yesterday continues to run successfully. I will continue to monitor the log as it progresses. I notice that certain steps produce an extreme number of error messages, but I am not yet sure of their significance. I will confer with Sam and the rest of the team to see if others are getting similar results in their experiments.

'''04/06/2015

The current train continues to run as of this evening. I reviewed other team member logs and have been in touch via email with the Bruins team regarding several issues, as well as the overhaul of the decoding protocol. The current setup creates too much system strain, so Professor Jonas has made some changes and suggestions to help the team achieve results in a more timely fashion.


 * Plan:


 * Concerns:

Week Ending April 14, 2015

 * Task:


 * Results:

'''04/10/2015

The 256hr training finally finished, and now we are waiting on the new 5 hr decoding test set to be prepared before moving forward. In the meantime, I created the language model by doing the following once the train finished:

Create LM directory

mkdir LM

Enter directory for setup

cd LM

Copy transcript from corpus directory

cp -i /mnt/main/corpus/switchboard/first_5hr/train/trans/train.trans trans_unedited

Prepared the transcript

/mnt/main/corpus/switchboard/dist/transcripts/ICSI_Transcriptions/trans/icsi/ParseTranscript.perl trans_unedited trans_parsed

Copied the script that creates the language model.

cp -i /mnt/main/scripts/user/lm_create.pl .

Executed the script

./lm_create.pl trans_parsed
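Before feeding the parsed transcript to lm_create.pl, a quick sanity check on the file is cheap insurance against an empty or truncated parse. The demo file below stands in for trans_parsed, and the `<s> ... </s> (uttid)` layout is the usual Sphinx transcript convention, not something verified against this corpus:

```shell
# Hypothetical sanity check: the parsed transcript should be non-empty and
# every line should carry a trailing (utterance_id). Demo data stands in
# for the real trans_parsed.
printf '<s> hello world </s> (sw_utt1)\n<s> okay </s> (sw_utt2)\n' > trans_parsed.demo
total=$(wc -l < trans_parsed.demo)
tagged=$(grep -c ')$' trans_parsed.demo)
echo "$total lines, $tagged with utterance ids"
```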

Our group has been in constant communication about our current efforts, and also ways that we might be able to optimize our system resources to achieve faster results. The servers in use at the moment are under heavy load, so if we can find a way to get more done in less time it would be a big help in our experimentation process.

'''04/12/2015

Today I began the decoding process of my 256hr train. After reading through the various emails sent out over the weekend regarding the project and the new decode test set, I went ahead and set up the decoding phase. The current test set data resides in 0272/002, so it was copied to my local 0274 directory for use.

I have also gone ahead and utilized the miraculix server for my decoding today, in order to take some of the strain off the main Caesar server. I also updated the team via email that it is being used for this purpose, in case anyone else was looking for a faster server to run their tasks.

One note regarding the miraculix server: I had to use the "perl" command to run the decode script, which is not mentioned in the wiki instructions.

'''Steps

ssh cisunix.unh.edu
ssh caesar.unh.edu
ssh miraculix


 * Set up the decode directory:

mkdir DECODE


 * Copy decode script:

cp -i /mnt/main/scripts/user/run_decode.pl .

 * Copy the test set from 0272:

cp -i 256_train.fileids /mnt/main/Exp/0274/002/etc/256_train.fileids

 * Take the first 5 hours:

head -5000 256_train.fileids > 256_decode.fileids


 * Rename file for script:

mv 256_decode.fileids 002_decode.fileids

 * Run decode in the DECODE folder:

nohup perl run_decode.pl 002 0274/002 1000

I will monitor progress today and review the results of the decode.
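The head -5000 cut above appears to assume roughly 1000 utterance files per hour. A generated stand-in fileids list makes the mechanics easy to check; file names here carry a .demo suffix so nothing real is touched:

```shell
# Sketch of the 5-hour test-set cut using a generated stand-in fileids list.
seq 1 20000 | sed 's/^/sw_utt_/' > 256_train.fileids.demo
head -5000 256_train.fileids.demo > 002_decode.fileids.demo
wc -l < 002_decode.fileids.demo
```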

'''04/13/2015

The decode of the 256hr train finished, and I proceeded with generating the scored results. The end result was 49.3%, which is not too bad. I spoke to Sam, and he suggested running another decode with additional setting changes that could potentially yield a stronger result. I will begin work on that this week.

Steps utilized to score the decode:

/mnt/main/scripts/user/parseDecode.pl decode.log ../etc/hyp.trans

cd ../etc

sclite -r 002_train.trans -h hyp.trans -i swb >> scoring.log

Will post status of new decode when new data is available.
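Assuming the 49.3% figure is the word error rate sclite reports, it is computed as (substitutions + deletions + insertions) divided by the number of reference words. The counts below are hypothetical, chosen only to reproduce a 49.3% result; they are not the actual counts from this decode:

```shell
# Hypothetical WER arithmetic; S, D, I, N are illustrative, not real counts.
S=3500; D=900; I=530; N=10000
awk -v s="$S" -v d="$D" -v i="$I" -v n="$N" \
    'BEGIN { printf "%.1f%%\n", 100 * (s + d + i) / n }'
```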

'''04/14/2015

Reviewed logs of classmates. Will run second decode this week and review the results with the Bruins team.


 * Plan:


 * Concerns:

Week Ending April 21, 2015

 * Task:


 * Results:

'''04/17/2015

After reviewing the results of the previous decode, I have changed an additional setting after consulting with Sam, and have initialized a new decode. Decode process is currently running on the miraculix server. Will post results when the process is complete.

I was not in class this week, but have been reading the group emails, and it seems a few group members are going to be undertaking tasks other than training/decoding. I think this is a good idea and a more efficient approach to our efforts, especially since there are only a few weeks left in the semester to generate the best baseline possible.


 * Steps for decode

Login

ssh cisunix.unh.edu
ssh caesar.unh.edu
ssh miraculix

Rename scoring log

cd /mnt/main/Exp/0274/002/etc
mv scoring.log scoring.log.old

Establish new decode process (First changed settings before initializing)

cd ../DECODE
nohup perl run_decode.pl 002 0274/002 1000

'''04/19/2015

Results of the second decode run were disappointing - the changes that I had made had no effect whatsoever on the scoring result. I executed the scoring procedure more than once to ensure that I had not made a mistake, but still came out with a final result of 49.3%. I was anticipating some sort of change, given the success other team members have had with modifying the settings I was working with, but the result remained static. I have updated the team on my results, and will be starting another train once I can confer with a few of my teammates. This week is also the URC, so we must determine if we will be meeting to discuss our strategies when we attend the poster presentations.

Also reviewed logs of class members to see what sort of progress is being made.

Steps utilized to score the decode:

cd /mnt/main/Exp/0274/002/DECODE

/mnt/main/scripts/user/parseDecode.pl decode.log ../etc/hyp.trans

cd ../etc

sclite -r 002_train.trans -h hyp.trans -i swb >> scoring.log

'''04/20/2015

Review of class logs.

'''04/20/2015

Additional review of class logs.

Also a note: it seems I was not the only one getting identical results when redoing decodes with different parameters. We suspect something in the process is affecting the final result, and we will have to investigate to see what is going on.
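One cheap way to investigate would be comparing the hypothesis transcripts from two runs byte for byte: identical hyp.trans files would mean the parameter change never reached the decoder, while differing files with equal scores would point at the scoring step instead. The file names are assumptions, with demo files standing in:

```shell
# Sketch: compare hypothesis transcripts from two decode runs. Demo files
# stand in for hyp.trans and a saved copy from the earlier run.
printf 'sw_utt1 hello there\n' > hyp.trans.old.demo
printf 'sw_utt1 hello there\n' > hyp.trans.demo
if cmp -s hyp.trans.demo hyp.trans.old.demo; then
    echo "identical decoder output"
else
    echo "decoder outputs differ"
fi
```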


 * Plan:


 * Concerns:

Week Ending April 28, 2015

 * Task:


 * Results:

'''04/23/2015

Have been in contact with the group via email discussions on how to best utilize our final weeks of experimentation. Some new ideas have been floated that sound promising, but we will have to test them out to see how effective they may be. Not much was achieved during our usual class period as it coincided with the URC poster presentations.

Also reviewed logs for any additional information that might help us.

I am not going to have internet access this weekend due to moving, but I will be resuming my efforts early next week to assist the other members who will be conducting experiments over the weekend.

'''04/24/2015

Zach and Sam have been in touch about current strategies, and we were discussing a few potential ideas for experimentation. I will be offline due to no internet connection over the weekend, but will be back Monday.

Also reviewed log activity of the class.

'''04/27/2015

Back online after the move. I have been in touch with Sam and emailed him and Zach regarding our next moves on the baseline. I will hopefully build on the work they have been doing and enhance our current results if possible.

'''04/28/2015

Have been reading the group correspondence, as people are working on their assigned tasks and everything is coming together to create the deliverables for our final results. There are still a few experiments being conducted, and depending on what Zach and Sam have going on at the moment, I may start another one this week if it will assist our efforts. Tomorrow is the second to last class of the semester.

Also reviewing logs of classmates.


 * Plan:


 * Concerns:

Week Ending May 5, 2015

 * Task:


 * Results:


 * Plan:


 * Concerns: