Speech:Spring 2014 Experiment Group
- Information - General Project Information
- Experiments - List of speech experiments
Group Member Logs
Assigned machines are: methusalix & verleihnix
|Joshua Anderson||6:30PM - 9PM||6PM - 9PM||4PM - 5:30PM||Not Available||Not Available||Morning and early afternoon||Morning and early afternoon|
|Brian Gailis||6PM - 9PM||6PM - 9PM||4PM - 9PM||6PM - 9PM||6PM - 9PM||All Day||All Day|
|Ramon Whitman||All Day||Not Available||5PM - 9PM||Not Available||6PM - 9PM||All Day||All Day|
|Pauline Wilk||9AM - 12PM||12PM - 9PM||4PM - 5:30PM||4PM - 9PM||Fluctuates||9AM - 1PM||All Day|
- Times shown are available start and end times in the Eastern Time Zone.
- For example, 2PM - 11PM means the member is available from 2PM until 11PM.
- Name format for signing up for URC: CIS Capstone: Experiment Group
- Poster should be about the focus of the group.
Week of Feb 12, 2014
After our group Capstone meeting today, our goals shifted a little bit. We have a new set of goals listed below that we will be following as of now.
- We need to collectively learn the structure of the experiment directory. We will each work on this individually throughout the week and log our research in our personal logs. Then, before the next meeting, we will combine our efforts into a guide and add it to the Information page.
- To learn the structure of the experiment directory, focus on the following:
- Where things are
- How they are stored
- What does sphinx create when it runs a train?
- There are currently guides on the wiki that reference file paths that no longer exist. Go through these guides and update them with the correct file paths and file names.
- Decipher and explain Eric Beikman's experiment automation scripts. (Articulate what already exists)
- Describe in detail what each of them does and document it on the Information MediaWiki page.
- Gain an understanding of why each step in the train runs and what each step creates.
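A minimal Python sketch for the directory research above, assuming the experiment directory is readable from the machine you run it on (the path is passed in, not assumed). `survey_experiment` counts the files in each subdirectory by extension, and `snapshot` records every file path, so comparing a snapshot taken before a train with one taken after shows what the train created. Both functions are hypothetical helpers, not existing group scripts.

```python
import os
import sys
from collections import Counter
from pathlib import Path


def survey_experiment(exp_dir):
    """Map each subdirectory (relative to exp_dir, with the root shown as '.')
    to a count of the files it directly contains, grouped by extension."""
    root = Path(exp_dir)
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root).as_posix()
        summary[rel] = dict(Counter(Path(n).suffix or "(none)" for n in filenames))
    return summary


def snapshot(exp_dir):
    """Return the set of all file paths under exp_dir, relative to exp_dir."""
    root = Path(exp_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python survey.py <experiment directory>
    for subdir, counts in sorted(survey_experiment(sys.argv[1]).items()):
        print(f"{subdir}: {counts}")
```

Taking `before = snapshot(exp)` before a train and computing `snapshot(exp) - before` afterwards lists the files the train added, which is one way to answer "what does sphinx create when it runs a train?"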
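To find guide paths that no longer exist, a small checker can pull absolute Unix paths out of a guide's wiki text and test each one. This is a sketch under two assumptions: the guides write paths as absolute paths such as `/dir/sub/file`, and the check runs on a machine with the same file system layout as the servers the guides describe. The regular expression deliberately skips the path part of URLs; `stale_paths` is a hypothetical helper name.

```python
import os
import re

# An absolute path: a "/" not preceded by a word character, ".", ":" or "/"
# (which skips URLs such as http://host/path), followed by slash-separated names.
PATH_PATTERN = re.compile(r"(?<![\w.:/])/(?:[\w.\-]+/)*[\w.\-]+")


def stale_paths(guide_text, exists=os.path.exists):
    """Return the sorted, de-duplicated paths in guide_text that do not exist.

    A trailing sentence period is stripped from each match. `exists` is
    injectable so the check can be tested or pointed at another file system view."""
    found = {match.rstrip(".") for match in PATH_PATTERN.findall(guide_text)}
    return sorted(path for path in found if not exists(path))
```

Pasting a guide's wiki source into `stale_paths` gives a checklist of paths to correct.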
Perhaps if members have time, here are a few more tasks that we could tackle:
- Continue to check out the condition of the SpEAK application. Find out how to get it running off of Rome.
- Go through the experiment logs and determine the type of each experiment. Then reorganize them so that a group looking for past experiments of a specific type won't have to open hundreds of experiment logs.
- Automate backups for experiments. For this we have to wait for the Systems Group to decide where the backups will go, set that location up, and configure it for us.
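Until the Systems Group settles where backups will live, the archiving step itself can be sketched independently. This Python sketch writes one experiment directory into a timestamped `.tar.gz`; `dest_dir` is a placeholder for whatever location the Systems Group sets up, `backup_experiment` is a hypothetical name, and scheduling (for example with cron) is left out.

```python
import tarfile
import time
from pathlib import Path


def backup_experiment(exp_dir, dest_dir):
    """Archive exp_dir as <name>-<timestamp>.tar.gz under dest_dir.

    dest_dir is a placeholder until the real backup location exists.
    Returns the path of the archive that was written."""
    exp = Path(exp_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{exp.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the experiment name
        tar.add(str(exp), arcname=exp.name)
    return archive
```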
Week of Feb 5, 2014
- Learn the current system and how to create a new experiment with the existing scripts
- The most important task for each of us is to go through the process of running a train on an experiment. Everyone should do this, take notes, and fix any bugs in the scripts; then we should reconvene and discuss how to tackle this. Detailed instructions are at Steps for Running a Train. Read the notes at the top before starting to avoid creating a mess.
- Work on a proposal of what we plan to incorporate this semester.
Implementation/Long Term Goals
Modify the Perl scripts that run all the scripts needed for a new train; the current priority is to automate the following steps:
- Set up the task directory - http://foss.unh.edu/projects/index.php/Speech:Training#Set_up_the_task_directory:
- Set up the Sphinx Train Configuration file - http://foss.unh.edu/projects/index.php/Speech:Training#Set_up_the_Sphinx_Train_Configuration_file:
- Generate the transcript and its associated audio-file list - http://foss.unh.edu/projects/index.php/Speech:Training#Generate_the_transcript_and_its_associated_audio-file_list.
NOTE: Last summer, Eric wrote three new scripts that help automate some processes. Two of them automate parts of the first steps of running a train: train_01.pl (http://foss.unh.edu/projects/index.php/Speech:Train_01.pl) and train_02.pl (http://foss.unh.edu/projects/index.php/Speech:Train_02.pl). He tested both during the summer. This semester, it would help the Modeling group if these two scripts were combined into one, so the end user calls a single script and passes it all the parameters both scripts require.
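A sketch of the combined entry point, written in Python for illustration (the real scripts are Perl). It assumes both scripts are run as `perl train_01.pl ...` and `perl train_02.pl ...` from the current directory; their actual parameters are not listed on this page, so the hypothetical `run_train_setup` wrapper passes each script's arguments through unchanged and stops as soon as one script fails.

```python
import subprocess


def run_train_setup(train01_args, train02_args, run=subprocess.run):
    """Run train_01.pl, then train_02.pl, with the caller's arguments.

    Invoking the scripts via `perl` from the current directory is an
    assumption. `run` is injectable for testing; with subprocess.run and
    check=True a non-zero exit raises CalledProcessError, so train_02.pl
    never runs after a failed train_01.pl."""
    commands = [
        ["perl", "train_01.pl", *train01_args],
        ["perl", "train_02.pl", *train02_args],
    ]
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

For example, `run_train_setup(["0123"], ["0123"])` would run both scripts for a hypothetical experiment number 0123. Passing arguments through keeps the wrapper correct even before the scripts' parameters are documented; once they are, named options can replace the raw lists.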
Complete the clone_exp.pl script (http://foss.unh.edu/projects/index.php/Speech:Clone_exp.pl)
- This is the third script Eric wrote. He has tested this as well.
- Designed to clone one experiment based on the contents of another.
- From Eric: "This script is designed to clone an existing experiment. It will either clone the dictionaries and phone list; the transcripts, file list, and wav-files; or it will do both. It will not touch the sphinx_train.cfg file or create feats from the copied wav files; use train_02.pl and make_feats.pl respectively to do those tasks."
I think it would be good to expand this script to include the steps Eric left out (make_feats.pl and train_02.pl), making it more robust and usable.
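Eric's note already implies which follow-up scripts an extended clone_exp.pl would need. A Python sketch of that decision, assuming train_02.pl must run after every clone (because clone_exp.pl never writes sphinx_train.cfg) and make_feats.pl only when audio data was copied; `follow_up_steps` is a hypothetical name.

```python
def follow_up_steps(cloned_dictionaries, cloned_audio):
    """Return the scripts to run after clone_exp.pl, in order.

    cloned_dictionaries: the dictionaries and phone list were copied.
    cloned_audio: the transcripts, file list and wav files were copied."""
    if not (cloned_dictionaries or cloned_audio):
        raise ValueError("clone_exp.pl copies dictionaries, audio data, or both")
    steps = ["train_02.pl"]  # clone_exp.pl does not touch sphinx_train.cfg
    if cloned_audio:
        steps.append("make_feats.pl")  # feats are not created from copied wav files
    return steps
```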
Create a master Perl script that covers all the different TYPES of experiments that can be run, including (need confirmation that these are the types needed):
- Language Model
This is a more ambitious task. According to Professor Jonas, there are a variety of experiment types that can be run. The list above is what we received during the 2nd class meeting. First, we need to confirm the list and make sure it includes every unique type of experiment. Then we can create a Perl script that the end user could run like this:
=> Please enter type (train, dict, t/dic, decode, t/dec, langmodel): ...
The script would then prompt the user for the parameters that type requires.
I think the first step toward a full-fledged master script is to combine the Run a New Train scripts, then finish and test them so we have a base.
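A sketch of that prompt loop, in Python for illustration. The six type codes come from the prompt above; the parameter names attached to each type are hypothetical placeholders until the list of experiment types is confirmed. `read` defaults to `input` but can be swapped out for testing.

```python
# Type codes from the prompt above; parameter names are hypothetical placeholders.
EXPERIMENT_TYPES = {
    "train": ["experiment_id", "corpus"],
    "dict": ["experiment_id", "dictionary"],
    "t/dic": ["experiment_id", "corpus", "dictionary"],
    "decode": ["experiment_id", "model_dir"],
    "t/dec": ["experiment_id", "corpus", "model_dir"],
    "langmodel": ["experiment_id", "transcript"],
}


def prompt_experiment(read=input):
    """Ask for an experiment type until a known one is entered, then ask
    for each parameter that type needs. Returns (type, {name: value})."""
    names = ", ".join(EXPERIMENT_TYPES)
    exp_type = ""
    while exp_type not in EXPERIMENT_TYPES:
        exp_type = read(f"Please enter type ({names}): ").strip()
    params = {name: read(f"{name}: ").strip() for name in EXPERIMENT_TYPES[exp_type]}
    return exp_type, params
```

Keeping the types and their parameters in one table means adding a confirmed type is a one-line change rather than a new branch in the script.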
12Feb2014 - Colby Johnson
I had discussed making a master script that automates running a train. It turns out that train_01.pl already does the first part, train_02.pl does the second part, genTrans#.pl (i.e., genTrans6.pl) does the third part, pruneDictionary2.pl does some of the fourth part, and make_feats.pl does the sixth part. The fourth part also needs a file copied in addition to running the script, but that could be automated easily. The fifth part also needs a file copied, plus an entry added to that file. This would be the order of operations for the master script. Each of these steps has required parameters, such as which dictionaries to use. These could be optional and default to certain values, or the script could simply require a lot of parameters. Such a script would be useful to those who already understand the process and just need to speed it up for data collection.
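A Python sketch of the ordering described above, with optional parameters that fall back to defaults. Only the script names come from this entry; the `dictionary` parameter, its `DEFAULT_DICT` value, and the `build_plan` helper are hypothetical, and the manual file-copy steps in parts 4 and 5 are only noted in comments.

```python
# Part numbers and scripts follow the log entry above; parameter names and
# default values are hypothetical placeholders. A script of None means the
# part has no script yet and must be done by hand.
PIPELINE = [
    (1, "train_01.pl", {}),
    (2, "train_02.pl", {}),
    (3, "genTrans6.pl", {}),
    (4, "pruneDictionary2.pl", {"dictionary": "DEFAULT_DICT"}),  # a file copy comes first
    (5, None, {}),  # copy a file and add an entry to it
    (6, "make_feats.pl", {}),
]


def build_plan(overrides=None):
    """Return [(part, action, params)] in run order.

    overrides maps a part number to parameter values that replace that
    part's defaults; unknown part numbers are rejected."""
    overrides = overrides or {}
    known = {part for part, _, _ in PIPELINE}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown parts: {sorted(unknown)}")
    plan = []
    for part, script, defaults in PIPELINE:
        params = {**defaults, **overrides.get(part, {})}
        plan.append((part, script or "MANUAL", params))
    return plan
```

Keeping the steps as data lets experienced users override only the parameters they care about, which matches the "optional with defaults" option above.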
- Week of Feb 5 - Feb 12
- Everyone understands the process of running a train, which includes creating a new experiment. Also look into the experiment system as a whole to learn more about the different types of experiments that can be run.
- Week of Feb 13 - Feb 19
- Final draft of project proposal/timeline is due.
- Understand the experiment directory structure
- Describe all scripts by reading through them and through the MediaWiki logs.
- Get a better understanding of the directories
- Look at directories and understand what is in them and why.
- Week of Feb 20 - Feb 26
- Week of Feb 27 - Mar 5