Speech:Spring 2013 Proposal

From Openitware



The following proposal is for the Spring 2013 semester. The proposal is broken into sections based on the groups working in the class. The end goal of this project is to develop speech models to be used in conjunction with various tools to improve our speech recognition technology. The major tools we will be using are the LDC Switchboard corpus and the CMU Sphinx speech recognition toolkit. Our team intends to build on what the previous classes have accomplished and ultimately get our models working with high success rates.


Group Members: Brian Drouin, Michelangelo Bianchi, Tyler Martin, Michael Tormos

Our tasks throughout the semester are divided into four parts. We ran into a minor issue when powering up the system for the first time: two of Caesar's fans had died and required replacement. Fans have since been donated and will be installed by the team. On 2/11 Caesar lost a drive, which took down the server along with access to the rest of the system. Members of our team, along with a few other students, were able to replace the drive and restore the server after a few days spent figuring out the system's configuration. An upgrade of the system was attempted during the restoration efforts, but the decision was made to keep the current openSUSE version. Upgrade research may continue throughout the semester, but an OS upgrade appears to be out of the question for now. Our team will now focus on the other three system tasks along with any new issues that arise. We will also reassess the current value of each server, as the information posted during the Spring 2012 semester is most likely out of date, and improve the documentation of this system as it relates specifically to the hardware.


Another team goal is to research the distribution of memory throughout the system. We have recorded each server's memory configuration, including the size of the modules and which banks they occupy, and this will be posted to the wiki. We will draft a plan to purchase or acquire more memory to create an equal distribution of resources across the system. This memory upgrade has the potential to greatly increase performance; however, memory upgrades can be expensive. A cost-benefit analysis will be drafted and presented for discussion.
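The per-server DIMM inventory could be collected with dmidecode and summarized before posting to the wiki. The sketch below parses a saved copy of `dmidecode -t memory` output; the sample data and file name are invented for illustration only.

```shell
#!/bin/sh
# Sketch: summarize DIMM sizes per bank from saved `dmidecode -t memory`
# output (run as root on each server: dmidecode -t memory > host-mem.txt).
# The sample below is invented for illustration.
cat > host-mem.txt <<'EOF'
  Size: 2048 MB
  Locator: DIMM_A1
  Size: 1024 MB
  Locator: DIMM_B1
  Size: No Module Installed
  Locator: DIMM_A2
EOF

# List populated banks and total the installed memory in MB.
awk '/Size: [0-9]+ MB/ { mb = $2; total += mb }
     /Locator:/        { if (mb) printf "%s: %d MB\n", $2, mb; mb = 0 }
     END               { printf "total: %d MB\n", total }' host-mem.txt
```

Running this on each server and pasting the per-bank lines into the wiki would keep the memory page reproducible as modules are moved around.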

Our plan is to purchase six 2GB memory sticks to be distributed throughout the servers. This upgrade will balance out all of the servers except for Caesar. Caesar is still getting a memory upgrade, but it was not the priority since the other servers will be carrying the workload. Altogether this proposed upgrade will cost $48. Funding for the upgrade has already been approved, and we are awaiting the funds.



Expected downtime:

1 - 2 hrs

Operating System Updates

Currently the openSUSE system installed on the Caesar server and client drives (Asterix, Obelix, etc.) is version 11.3. This system works very well and is able to run the Sphinx speech experiments that the speech project team members conduct on it. However, as noted in last year's log, 11.3 is outdated and no longer supported by its developers. Initially we had considered upgrading to the next version of openSUSE, 12.2, since the original 11.3 install became corrupted after Caesar's hard drive failed, but this did not work out well: 12.2 ran slowly and could not connect to the Internet, which led us to downgrade back to 11.3. This raises the question of whether we should switch to an alternative Linux distribution such as Fedora or Ubuntu. We want to look into this because the newer versions of both Fedora (17) and Ubuntu (12) may have updated drivers, better security features, and/or run better with whatever version of Sphinx the Tools team decides to install on Caesar. The Systems group plans to spend time not only researching backup options, LDAP, and hardware upgrades, but also researching whether there are any notable differences between openSUSE 11.3 and the newer versions of the two alternative distributions that would warrant upgrading the current system, such as the aforementioned major bug fixes or security patches. We will also make sure that any new system supports the RAID array already set up for the server (which was very helpful during the last hard drive failure), verify that it will work on the given hardware, and research the hardware requirements of the new systems. A decision on whether to update the system should be made by the time specified in the timeline below.


2/6-3/12 - Decision, and possible proposal, on updating the system (moved up to allow time to back up the system and files).


The backup option we are currently investigating is Clonezilla, with which we will not only back up the data but also image the operating systems of the servers. Clonezilla is a free, open-source tool that supports GNU/Linux, is easy to install, and provides a simple way of backing up the system in its entirety. Two issues have yet to be worked out. First, we will need to acquire additional external hard drive space to perform the backup. Second, Clonezilla cannot image a system whose file system is mounted, which requires a scheduled shutdown of the system for a specified time while the backups are performed. Research is ongoing on exactly how much external disk space is required and how to manage this backup. If we do decide to upgrade, a backup of the previous configuration is essential, along with periodic backups throughout the semester to ensure that data, models, trains, and decodes are preserved in the event of a system failure.
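As a starting point for the disk-space research, the used space on the file systems to be imaged gives an upper bound on the image size (Clonezilla compresses, so the real image is usually smaller). The mount points in this sketch are placeholders for Caesar's actual partitions.

```shell
#!/bin/sh
# Rough upper bound for the Clonezilla image size: sum the used 1K blocks
# on the file systems to be imaged. The mount points here are placeholders;
# substitute Caesar's real partitions.
used_kb=$(df -Pk / /tmp | awk 'NR > 1 { sum += $3 } END { print sum }')
echo "upper bound: $((used_kb / 1024 / 1024)) GB used"
```

Comparing that figure against the capacity of candidate external drives would settle how much space must be acquired before the scheduled shutdown.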


Test on one of the batch machines - 2/27/13

System backup - 3/6/13

User Accounts

With respect to user accounts, the implementation of LDAP is under consideration. Our team will research this option, and a final decision on whether to move forward will be made once there is a consensus on whether the system is getting an upgrade.


Group Members: Charles Haynes, Michael Brown, Bego Terzimustafic, Harry Dodson

The Tools group will be responsible for testing the software previous tools groups have installed and maintained. We will be testing the installations of Sphinx, the CMU LM Toolkit, and SCLite to ensure program stability. Since last year's tools team created a soft link from Caesar's /mnt/bin to the batch machines, this year's group will make sure the soft link is operational and that no other groups are having problems using the tools on their batch machines.
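Checking that the soft link is operational could look like the sketch below. It assumes the link lives at the path last year's team set up on each batch machine; the demonstration builds a stand-in link in a temp directory so the check can be run anywhere.

```shell
#!/bin/sh
# Sketch: verify that a tools soft link resolves to a real directory.
# On a batch machine the argument would be the real link path; the temp
# setup below only stands in for it so the check is demonstrable anywhere.
check_link() {
  if target=$(readlink -e "$1") && [ -d "$target" ]; then
    echo "ok: $1 -> $target"
  else
    echo "BROKEN: $1 does not resolve" >&2
    return 1
  fi
}

# Stand-in for the tools link on a batch machine:
tmp=$(mktemp -d)
mkdir "$tmp/tools"
ln -s "$tmp/tools" "$tmp/bin"
check_link "$tmp/bin"
```

Run with the real link path on each batch machine, this gives a quick pass/fail per host before the other groups start their experiments.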

We will also update the software chart listing the versions of software currently installed on the server, and post our findings on whether an upgrade was necessary. It is important to document why our team has chosen not to upgrade a particular piece of software when that is the case. Furthermore, our group has decided to help consolidate the work of previous tools groups by highlighting the key points those groups accomplished. The goal is to give future tools groups a more centralized location to find the information needed to accomplish what is required during the semester. We will be making small updates to Sphinx and the Toolkit on a virtual machine to test stability without compromising Caesar. This requires creating a virtual machine with Sphinx, the CMU LM Toolkit, and SCLite installed. Our goal is also to configure the machine with SSH access, which would not only give our group more flexibility but also give other groups a testing environment for changes they might deem necessary.

We will also spend time familiarizing ourselves with a Unix-based operating system. This is crucial for seamless operation of the software and will allow us to build on previous semesters' work. Furthermore, we need to understand how to create scripts in the Perl programming language, which will allow us to troubleshoot problems and refine code already created. The most important aspect our group must focus on is proper documentation of what we are doing. This cannot be overstated given the importance of our role within the class. Our peers should be able to easily follow step-by-step instructions that will allow them to complete their goals as well. The tasks to be divided among our four group members are as follows:

  • Understanding and running Sphinx
  • Maintaining and updating related software within the OpenSUSE environment
  • Understanding and reworking Perl scripts to load into Sphinx
  • Updating and documenting the individual processes for posting on the Wiki

Weekly Goals for the semester

  • 2/20/13 – 2/27/13: Test current software installations to ensure program stability.
  • 3/4/13 – 3/11/13: Compile accomplishments from previous semesters to place in a central location on the wiki
  • 3/11/13 – 3/18/13: Update wiki with the information we have learned over the semester in combination with previous tool groups
  • 3/18/13 – 3/25/13: Complete virtual machine to provide testing environment and create proper documentation on how to configure the virtual machine


Group Members: Matthew Henninger, Sam Workman, Kevin Annis

Switchboard Corpus

The Data group plans to organize all the transcripts and match them with the audio files. Our first step is to verify that we have all the audio files and corresponding transcripts. We will first calculate the total times for both the 2,400 audio files and the various transcripts. Once we have the time stamps, we can begin matching the audio. We will then take all the transcripts and append them into a single transcript file.
  • Week of 2/27 - Verify that all transcripts are up to date and match the audio
  • Week of 3/6 - Append the transcripts into one transcript file
  • Week of 3/13 - Make the data available to the other groups
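The append step itself can be sketched as a simple concatenation. The directory layout and file naming below are assumptions, with a tiny invented example so the commands can be run anywhere.

```shell
#!/bin/sh
# Sketch: append per-conversation transcripts into one master file.
# The layout (one .txt per conversation under trans/) is an assumption;
# the sample files are invented for illustration.
mkdir -p trans
printf 'sw2001: hello there\n' > trans/sw2001.txt
printf 'sw2002: okay so\n'     > trans/sw2002.txt

# Concatenate in sorted order so the master file is reproducible.
ls trans/*.txt | sort | xargs cat > master_transcript.txt
wc -l < master_transcript.txt   # one line per conversation in this toy case
```

Sorting before concatenating means the master transcript comes out in the same order every time, which makes later diffs against regenerated versions meaningful.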

Language Modeling

Missing Transcripts

This team needs to find the transcripts missing from the data used in previous semesters. Earlier classes thought they were using all one hundred hours of recorded audio with transcripts; in reality they were using a fraction of that data. The goal is to find the missing data in the first part of the semester.


Group Members: Scott Adie, Marc Southard, Nicholas Regan, Joseph Gallagher, Justin Mulholland

File Server, File Capture and File Format

As part of the Experiment group, this sub-group will assist the Modeling group by reducing the number of scripts needed to copy experiments and run trains. This will be the main focus because of the intricate details that must be understood in order to adapt the current scripts to the needs of the Modeling group. The group will spend time reading the current scripts and learning their functions while also getting familiar with Perl, in order to better write and consolidate the functions of each script. Knowing what was accomplished by the previous class, and using their logs of what they felt might be lacking in the scripts they created, will also help guide our group in continuing to improve the system in place. The reduction and consolidation of script files will help facilitate a more efficient process, not only for the Modeling group but for any user in the future.

Tasks / Goals:

  • Familiarize ourselves with Perl scripting
  • Familiarize ourselves with the scripts created by previous classes
  • Communicate with the Modeling group in order to adapt scripts to function more efficiently
  • Understand the current process used for trains and look for areas to improve upon


The SpEAK website used with this project is an online database for speech experiments. The goals of this team are to upgrade and maintain the SpEAK website. One such upgrade is the addition of user accounts to the site. This will make it easier for everyone on the team to know who is uploading to the site, so that problems can be resolved more quickly by going straight to the source. Another feature is the ability to upload files and edit the experiments that go along with them. This will make it easier to create a central repository for our project-related speech files.



  • Add the ability to upload files to the SpEAK site.
  • The ability to edit experiments.
  • Update user permissions per page and account management.
  • Allow admins to create new users.
  • Install jQuery 1.8.3 functionality, including the addition of DatePicker to add/search
  • Fix the description that isn't showing when viewing.


Members: Eric Beikman, Josh MacPherson, Drew Mather, Thomas McCarthy

Our goal is to achieve proficiency with the process of building the models which are a core component of Sphinx, and indeed all other voice-recognition software. Once this proficiency has been met, we will disseminate this information to the other teams.

There are two key models required by Sphinx: an acoustic model and a language model. The acoustic model can be summarized as a mathematical representation of each sound component that makes up every word. Likewise, the language model is a statistical analysis of a single language, mainly looking at how often certain words appear in certain positions and combinations within sentences. By utilizing good acoustic and language models, a speech recognition system such as Sphinx can determine spoken words based on how each is pronounced and on its context within a sentence.
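To make the "combinations" idea concrete, a language model is built from counts like the ones below: a toy word-bigram tally over a single invented sentence (not from the Switchboard data).

```shell
#!/bin/sh
# Toy illustration of the statistics behind a language model: count
# adjacent word pairs (bigrams) in a tiny invented corpus.
echo "the cat sat on the mat the cat slept" |
tr ' ' '\n' |
awk 'NR > 1 { count[prev " " $0]++ } { prev = $0 }
     END { for (b in count) print count[b], b }' |
sort -rn
```

Here "the cat" appears twice while every other pair appears once; a real language model turns such counts over millions of words into probabilities the decoder uses to prefer likely word sequences.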

The creation of these models is a key aspect of the UNHM Speech project. Experiments are built around building each of these models utilizing a set of audio recordings with a corresponding set of written transcripts. There are two main parts to modeling: training, which creates the actual acoustic and language models; and decoding and scoring, which tests the models using a separate set of audio and transcripts, producing a score that reflects the quality of the new models based on the accuracy of the decoding step. The modeling team will need to be proficient in all of these aspects.


Currently the assembly of a train is a time-intensive process that involves eight different scripts and a number of intermediary steps. With the help of the Experiment group we plan to reduce the process to two larger scripts that automate the most time-consuming parts, enabling trains to be set up quickly and easily. The training process entails creating an experiment directory, then creating a transcript file from a specific portion of the Switchboard corpus. Next, the sphinx_train.cfg file has to be modified with the experiment-specific variables, and the dictionary is generated from the Sphinx transcript files. The filler dictionary, which provides the phones necessary for handling silences and gaps in speech, is then copied into the experiment directory. Finally, the genPhones.pl script is used to sort through the dictionary and copy the necessary phones. At this point the train is prepared and ready to be run. The bulk of this process is what will be automated, though it is important to understand the steps in some detail.
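The setup steps above could be collapsed into one script along these lines. Apart from sphinx_train.cfg and genPhones.pl, which the process already uses, every path and file name here is an assumption about the layout, and the demonstration runs against dummy files rather than the real corpus.

```shell
#!/bin/sh
# Sketch of the manual train-setup steps. Directory layout and names other
# than sphinx_train.cfg / genPhones.pl are assumptions; the genPhones.pl
# call is left commented out since its real arguments may differ.
set -e

setup_train() {
  exp="$1"   # new experiment directory
  src="$2"   # location of the shared config and filler dictionary
  mkdir -p "$exp/etc"
  cp "$src/sphinx_train.cfg" "$exp/etc/"  # then edit experiment-specific variables
  cp "$src/train.filler" "$exp/etc/"      # filler dictionary (hypothetical name)
  # perl genPhones.pl ...                 # sort the dictionary, copy needed phones
  echo "train prepared in $exp"
}

# Demonstration against dummy files:
src=$(mktemp -d)
exp=$(mktemp -d)/exp0001
touch "$src/sphinx_train.cfg" "$src/train.filler"
setup_train "$exp" "$src"
```

Wrapping the steps this way leaves only the experiment-specific edits to sphinx_train.cfg as manual work, which is the kind of consolidation the Experiment group is being asked for.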

Decoding & Scoring

After a training session has been completed (which can take between 30 minutes and 10 hours depending on the size of the train), the results have to be decoded and scored. Decoding is used to verify the acoustic models generated from the training session. The final step in the experiment is to score the decoded training session; this will depend on the Tools group successfully getting SCLite working properly.


The modeling team's work will have two phases. The first phase is to thoroughly understand the process of running an experiment: creating both the acoustic and language models and verifying their accuracy.

By week 8, all members of the modeling group will be able to create and successfully run a train to completion, including decoding and the subsequent analysis and scoring.

The second phase will be to distribute this knowledge to the other groups. Members of the modeling group will be incorporated into the other groups, where we will share the knowledge gathered during the first phase. This phase will start after the completion of the first phase and continue until the end of the semester.