Speech:Spring 2012 Proposal

Introduction

The following is the class project proposal for the Spring 2012 semester. The proposal is broken down into subsections that detail specific team members and their contributions to the Speech project. As a whole, the team plans to develop speech models using the Sphinx Speech Recognition Toolkit. The overarching goal for the semester is to execute a "mini train set"; however, the team also intends to improve many other areas, including organization of the system's hardware and software, documentation of work progress, and creation of backup mechanisms. By the end of the Spring 2012 class, future groups should be in a position to pursue execution of the "full train set" and improve the acoustic model, which will improve decoding accuracy and overall speech recognition results.

Infrastructure

COMMENT: All work on Infrastructure needs to be complete by March 27th. (Note that you lose a week for Spring Break.)

Capture

Hardware Configuration

Assigned to: Aaron Green

This portion of the class proposal outlines the hardware makeup of the Speech Tools system and Aaron Green's individual contributions toward moving the system closer to full functionality. The hardware makeup is a key component of the project, and properly identifying the major components is one of the first parts of this proposal. The hardware proposal contains three categories, which outline the hardware makeup, specific system performance aspects, and the overall capabilities of each piece of equipment that makes up the hardware footprint. After the key components are identified, research must be done to determine whether they are up to date and suitable for the task. Possible upgrade requirements will need to be properly vetted and weighed against cost, labor hours, and school network restrictions before any upgrade can be completed.

The first phase of the hardware work is to properly identify each piece of equipment in an informative table for future users to reference. The table will list each component's speed, memory, disc space, CPU information, and estimated value. To properly map each component's capabilities, Aaron will need to use Linux commands, similar to those in the resource links below, to run queries against the hardware. This will take time to investigate, but from the command line he will be able to extract all the data with a few commands. Once all the data is compiled, it will be inserted into a functional and usable table.
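
A few standard Linux commands could serve as a starting point for these queries (shown here only as a sketch; the exact commands used may differ):

  cat /proc/cpuinfo   # CPU model, core count, and clock speed
  free -m             # installed and available memory in megabytes
  df -h               # disk space for each mounted filesystem
  uname -a            # kernel version and architecture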

The table will be drafted in a spreadsheet program and then rebuilt on the wiki using HTML table tags. This will give the table future linking capabilities and allow system changes to be reflected with small edits to the HTML tags. The table will have a header and border and will clearly show all the components listed by name. The name of each component can be given a wiki link for future development. The table will be simple but informative, and will also include additional information about each component that may not be readily apparent to system users.

The additional information about the components will take some investigation and research into hardware makeup, possible upgrade opportunities, and cost estimates of the current components. The makeup will be covered in the first part of the hardware outline, but upgrade possibilities will require online research into the server and system performance limits versus the cost of upgrading to newer components. This can all be done online after the system has been effectively outlined.

Aaron plans to use all available tools to complete this work prior to the project deadline.

TimeLine:

  • (2/28 - 3/5) Identify all the components in the system within a spreadsheet. For each component, identify CPU information, speed, memory, max memory, and disc space. Start on the wiki table and insert data from the spreadsheet into the live wiki site. Extra table blocks will be made for the following weeks' work.
  • (3/6 - 3/12) Look at each component anatomically. Research the estimated cost of the current hardware components. Look at possible hardware upgrade opportunities for each component. This information needs to be inserted into the wiki table. This process will continue into the next week.
  • (3/13 - 3/20) Continue: look at each component anatomically, research estimated costs, and look at possible upgrade opportunities for each component. Also, estimate the labor needed to upgrade system components and complete a risk-versus-gain assessment of the upgrade information researched. This information needs to be inserted into the wiki table.
  • (3/21 - 3/26) Finalize the hardware table and write documentation/description for the hardware section.

System Software Configuration

Assigned to: Damir Ibrahimovic

Having the correct software configuration on every system is very important. Each of the team's current machines must have the correct software installed to operate properly, and it is important to follow up with updates for the release being used and find out which updates can increase system performance or resolve known bugs or other issues. OpenSUSE announced on November 30th that support for the openSUSE 11.3 Linux distribution would be officially dropped as of January 16th, 2012; starting on that date, the openSUSE project stops "feeding" openSUSE 11.3 with security fixes, other critical fixes, and software updates. The research team will have a system that runs this version, and knowing that they will be stuck with what they have is a bit concerning, but if it works for the project then they are okay!

What the team proposes to do is to find which version of the OS is currently being used. Because the research team will be using Sphinx3 for the project, Damir will find out whether this version is fully compatible with the current OS version and with future versions of the OS. There are also additional packages that come with Sphinx3 that need to be checked. A new release usually brings new features, bug fixes, and other improvements to the OS, but that does not mean the system will maintain all of its current functionality. For example, packages that are intended for use on openSUSE 11.3 are unlikely to work on other releases, due to numerous differences in compilers, libraries, and release contents.

Having all software correctly installed and running versions that are supported by the current operating system is a must. There is a newer version of openSUSE that supports cloud technologies, better handling of smaller screens and multi-screen setups, more robust notifications, and centralized online account configuration. The current version is working well for now, but the team is discussing backup options, including cloud solutions! Damir proposes to document the minimum system requirements necessary before any updates are instituted.

The time line for these and other procedures will be:

  • On (3/5) access Caesar and the other machines and check the current software configuration using YaST or the command line (a possible command-line sketch follows this timeline).
  • On (3/12) compare the current software configuration with the new one; create a comparison table for the current, new, and beta versions (if one exists). Write down the benefits of upgrading to the new software version and the reasons not to upgrade.
  • On (3/30) finish up the proposal and post it to the wiki.
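
As a rough sketch of the command-line check in the first timeline item (assuming a standard openSUSE install; YaST can show the same information graphically):

  cat /etc/SuSE-release            # reports the installed openSUSE version
  rpm -qa | sort > installed.txt   # snapshot of installed packages for later comparison
  zypper lu                        # list any updates still offered for this release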

Speech Software Configuration

Assigned to: Bethany Ross

The Speech Software Configuration work will produce a table that shows compatibility. The first column will show the current version of each of the five speech software tools: Sphinx, the language model, the Dictionary, the Trainer, and the Decoder. The current version of Sphinx in use is Sphinx 3, which is installed on Caesar under the openSUSE 11.3 operating system. The next column will show the newest version of each speech software tool, and the last column will show the differences between the current and newest versions, including bug fixes and patches.

Aside from configuration, the table will contain information about where to find the current version of each speech tool hosted on Caesar and where to find the newest version, with direct links to the files online where possible. Of all the parts of the table, this will probably take the most time and research, because in some cases the current version may be outdated or contain bugs; when that happens, the creators of the software often remove the old versions and post only the newest one. If a direct link cannot be found, an explanation of other possible ways to retrieve the software will be provided.

The main purpose of this table will be to assist its users in determining whether or not the operating system will support the newest version of the software tools. To gather this information, searches will be conducted online, and tests will be performed to see whether the newest versions of the tools work correctly on a machine running the openSUSE 11.3 operating system. This step will require additional research and input from team members to learn how the system works.

Timeline:

  • (2/28 - 3/5) Begin creating the table and determine what the current versions of the speech tools are and where to find them.
  • (3/6 - 3/12) Determine what the newest versions of the speech tools are and where to find them.
  • (3/13 - 3/19) Research the differences between the two versions, such as which bugs have been fixed.
  • (3/20 - 3/27) Test and/or research whether the newest versions are compatible with the openSUSE 11.3 operating system.

Modify

Speech Software Installation

Assigned to: Jonathan Schultz & Matt Vartanian

Matt Vartanian and Jonathan Schultz of the software team will be documenting and installing CMU's Speech Recognition software on a server called Majestix. Majestix is one of several networked servers connected to a main server, Caesar. As of 2/24/2012, Caesar is the only server to have CMU's Speech Recognition software installed. The software team plans to locate the files that will be created during the installation of the Speech software, perform the installation of the Speech software and tools on Majestix, and make soft links to those files in the mounted directory /mnt/main. The primary purpose of this installation is to provide students with more convenient access to the Speech software, but also to organize the file system and help students gain familiarity with the files that comprise the Speech software.

The directory /mnt/main is physically located on Caesar and is shared by all servers on the network. Each server networked to Caesar (including Majestix) has a mounted directory called /mnt/main which points to Caesar's /mnt/main directory. The software team will be installing the Speech Recognition software in the directory /usr/local/bin under the root of Majestix. Once this has been accomplished, they will then create soft links in Caesar's /mnt/main which point to the Speech Recognition software on Majestix, making it readily available on all servers.

To complete this task the software team must first create a directory under /mnt/main on Caesar called /root/tools. Next, they will create a soft link in /root/tools which points to Majestix's /usr/local/bin folder after installation of the Speech software has been completed on Majestix. Once the soft link is created, every server on Caesar's network should be able to run the tools needed to decode and train with the CMU Speech Recognition software by navigating to the /mnt/main/tools directory. If not, installation of the speech tools and Sphinx 3 will have to be performed on each server individually.

To create the soft links, the software team will rename the bin folder in /usr/local to binold. Then they will create a soft link named bin in the /usr/local directory and copy the files over to this link location. Once this is done, they will create another link in the /mnt/main/root/tools directory to complete the chain. Once the links are created, they will delete binold. In theory, this will create the install point of the software and make it possible for the other servers to run the software off the shared /mnt/main directory.
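
A minimal sketch of the soft-link step, assuming Sphinx 3 and the speech tools end up under /usr/local/bin on Majestix and that /mnt/main is the shared mount (exact paths and ordering may change as the plan above is tested):

  mkdir -p /mnt/main/root/tools
  ln -s /usr/local/bin /mnt/main/root/tools/bin   # expose the Majestix install through the shared mount
  ls -l /mnt/main/root/tools/bin                  # verify the link resolves before removing binold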

These tasks will take time and some experimentation to complete. The software team will need to research the behavior of soft links in the SUSE Linux environment and adjust plans according to any unforeseen circumstances. All tasks will be completed by March 19th. A more detailed schedule is listed below, but exact dates may be subject to change pending research outcomes. The timeline for these tasks is as follows:

  • (2/29 - 3/1) Jonathan Schultz will create the folders on /mnt/main
  • (3/2 - 3/4) Jonathan Schultz will install Sphinx 3 and locate all files on Majestix
  • (3/2 - 3/4) Matt Vartanian will install the Speech Tools and locate all files on Majestix
  • Jonathan Schultz and Matt Vartanian will decide whether the soft link will work or whether everything needs to be installed on each server
  • (3/5 - 3/6) If the files are in the right place, Jonathan Schultz will create the link on Majestix
  • (3/5 - 3/6) If the files are in the right place, Matt Vartanian will create the link on /mnt/main
  • (3/5 - 3/19) If the files are not in /usr/local/bin, Jonathan Schultz will install Sphinx 3 on each server
  • (3/5 - 3/19) If the files are not in /usr/local/bin, Matt Vartanian will install the Speech Tools on each server

Speech Data Corpora

Assigned to: Brandon McLaughlin & Michael Henenberg

Divide into Mini & Full with train, dev & eval sets for each

The speech data corpora will consist of all the data transcriptions and the .sph files converted into .wav files. All the disks currently reside in the /media/data/switchboard directory and must be moved to the /mnt/main/corpus/dist directory. To convert the .sph files into .wav files, the sox command will be used; for example: sox filename.sph filename.wav. The best way to do this will be to bring a whole disk into Brandon's or Mike's testing folder so no damage is done to the real files.
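
Assuming sox is installed, the conversion could be batched with a loop along these lines (run inside a testing copy, as described above):

  for f in *.sph; do
      sox "$f" "${f%.sph}.wav"    # convert each SPHERE file to a .wav of the same name
  done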

The folders needed to hold the files will be created first, with the appropriate files moved into them as they are completed. There is a script that can create the needed directories and files; Michael will attempt to learn to use the script to create the folders. However, if the deadline becomes a problem, he will instead create the folders manually in the directories so progress can continue.

The second part of the file conversions involves cleaning up the transcripts. Right now all the transcripts still have headers and brackets that indicate emotions, as well as brackets that indicate points in the audio where the speaker did not fully say a word. All of these files will be moved to Brandon's or Mike's testing folder to avoid damaging the real files. The sed command will be used to run through the text and eliminate whatever is specified in the command's parameters, cleaning the transcripts to the point that they are about 95% complete. Writing a script to issue this command will be much more efficient, eliminating the need for the team to issue the command manually on each file.
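
For illustration, a sed command of the following form could strip bracketed annotations such as [laughter] from a copy of a transcript (the file names are placeholders, and the real pattern set will depend on the Switchboard markup):

  sed -e 's/\[[^]]*\]//g' transcript.txt > transcript.clean.txt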

The switchboard directory created in corpus will hold the directories "mini" and "full". These folders will hold the files needed for the mini and full trains and will be divided into train, dev, and eval. The train folder in each will store the cleaned transcripts and the .wav files created in the second part of the process. The dev folders will hold the file samples used to test how accurately the trains transcribe. The eval folders will hold the files used to analyze the results of the train in order to judge the accuracy of the train's transcriptions.

The timeline for this section is as follows:

Michael

  • (2/28 - 3/6) Create folders
  • (2/28 - 3/6) Copy files in old folders to the new folders
  • (3/6) Make sure everything works properly

Brandon/Mike (once his tasks above are done)

  • (2/28 - 3/10) Convert .sph files to .wav files using the sox command and put the files in the correct folder
  • (3/2 - 3/27) Copy the converted files to the correct folder (/mnt/main/corpus/dist)
  • (3/9 - 3/24) Clean up the transcripts using the sed command
  • (3/9 - 3/24) Copy the converted transcripts to the Switchboard/(mini,full)/train folders


Dates subject to small changes.

Develop

Network Bridge

Assigned to: Skyler Swendsboe & Evan Morse

The main task for Skyler and Evan of the hardware group is to set up a network bridge. This will be accomplished by using Caesar as a DHCP and DNS server. Caesar will act as a gateway for nine other servers so that they will always have a connection to the internet. To accomplish this task, an additional network card will be installed on Caesar to which the other servers will connect.

The setup will consist of two networks: Caesar's connection to the UNH network, and the local network created by Caesar for the other nine servers. Caesar will have a UNH IP address for its internet connection. The network Caesar creates locally will be a 192.168.X.X network. Each server should also have its own static IP for ease of use.

The physical setup of this bridge will be simple: Caesar has two NIC cards. One card is connected to the UNH network; the other will connect to a switch. The switch connected to Caesar will also have a cable from each of the other servers feeding into it. This is how all nine servers can connect simultaneously to Caesar.

The purpose of this setup will be to allow each machine to maintain a connection to the internet so that they will be able to update the software being used. This will allow all groups in the project to get whichever packages their machines need. This will also allow the servers to perform any other type of internet-related task necessary.

The process of creating a “DHCP/DNS server” is one that both members of this group are familiar with on Windows machines. The main obstacle for this task is learning how to perform the same configuration on the OpenSUSE operating system through a terminal. This is going to be tested on two OpenSUSE test machines before full deployment onto Caesar.

Testing the network bridge is a simple task. The only things needed are two openSUSE systems: one with two NIC cards and one with a single NIC card. Getting this test system working will directly translate onto Caesar. The only difference for the real setup is that the extra NIC will connect to a switch, not to another NIC.


The team proposes to have Caesar act as both a DHCP server and a DNS server. The other nine servers will access the internet via Caesar's second network card. This second network card, along with the other nine servers, will exist on the 192.168.X.X network and use Caesar's primary network card as the default gateway.

This method has some advantages. Namely, all nine servers will access the internet via network address translation. This means that additional connections won't be necessary. This will increase the load on Caesar's bandwidth, but users will not need to physically switch ethernet cables around on a switch, or access the root system through software and enable or disable network cards before connecting to the internet.

The proposed configuration will also increase software flexibility. Servers will be able to update their software automatically or pick and choose when to apply certain updates. This will make the updating process more agile and eliminate other configuration needs.

Skyler and Evan will accomplish this by utilizing DNS and DHCP with Caesar's extra network card. One network card will become the default gateway for the other nine servers, while the other line will be inside the local private network 192.168.X.X. In this manner, Caesar will become a dedicated router and default gateway for the private network.
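
The routing half of that setup might look roughly like the following on Caesar (a sketch only; the interface names eth0/eth1 are assumptions, and the DHCP/DNS services themselves would be configured separately through YaST or dhcpd/BIND):

  echo 1 > /proc/sys/net/ipv4/ip_forward                # allow Caesar to forward packets
  iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE  # NAT private traffic out the UNH-facing card
  iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT         # private network -> internet
  iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT  # replies back in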

Skyler and Evan both have experience with this concept and have configured it on a Windows-based system. Their challenge in this procedure is figuring out how to set up internet sharing on openSUSE Linux. They will need two Linux-powered machines to develop their solution: one with two network cards and one with a single network card.

After completing the tests successfully on the Linux machines, they will proceed to set up the design on Caesar. The other machines will only need to be configured with a static address and a default gateway back to Caesar.
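
On each of the other servers, that static configuration could be as simple as the following sketch (the addresses shown are placeholders; the real numbering is still to be decided):

  ip addr add 192.168.1.10/24 dev eth0               # example static address for this server
  ip route add default via 192.168.1.1               # Caesar's private-side card as the gateway
  echo "nameserver 192.168.1.1" > /etc/resolv.conf   # use Caesar for DNS lookups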

Timeline: (just a draft)

  • (2/21) Evan Morse: Set up the hardware for the test systems
  • (2/22) Evan Morse: Research openSUSE bridging
  • (2/22 - 2/26) Sky Swendsboe: Set up an experimental Linux machine at home
  • (2/25 - 2/27) Sky Swendsboe: Research the Linux networking environment
  • (2/28 - 3/6) Sky Swendsboe: Start network tests/commands on SUSE
  • (3/6 - 3/27) Sky Swendsboe: Write instructions based on the results of the SUSE networking tests
  • (4/17 - 5/8) Sky Swendsboe: Arrive at a final tested configuration on SUSE and gradually implement it on Caesar

Backup Mechanism

Assigned to: Bethany Ross

Once the team identifies the files that need to be backed up, they will need to be organized. The team will then find an open-source cloud service to store these files for protection. This needs to be a free service that can be accessed from anywhere in case both the Caesar machine and the physical backups fail. Once this service is identified, the information will be posted to the wiki so that everyone can access the files if needed. This could also be helpful if team members would like to keep a local copy on their own machines.
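
Once the files are identified, they might be bundled into a single dated archive before being uploaded or copied locally; a sketch (the paths shown are placeholders):

  tar -czf speech_backup_$(date +%Y%m%d).tar.gz /mnt/main/corpus /mnt/main/exp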

Timeline:

  • (2/28-3/6) Identify the files that need to be backed up and an open source service to use.
  • (3/6- 3/12) Upload files and post sign in information to wiki.

Experiment Repository

Assigned to: Johnny Mom

The Experiment Repository will include the decoder experiment files created by the current script, run_decode.pl. This script runs a Sphinx 3 decode job, which creates various files for the specific task name, or experiment ID in this case. The decode experiment script uses the models produced by the training experiment to decode. The script will be edited so that the decode job organizes the files it creates into the correct experiment folders based on the experiment ID.

The new decoder script will be able to run a decode job based on the experiment ID (e.g., 1003). The script will be run by typing the command "run_decode.pl <taskname>", where <taskname> represents an ID number such as 1003. The script will then create the appropriate folders in /mnt/main/exp/1003. The folder 1003 will include subfolders such as trains, wavs, config, trans, logs, and more, based on what is created by the decode experiment. This will make it easier for users who run an experiment to see where certain files are, and it will give the experiment structure rather than leaving random files strewn in one folder.
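
For illustration, the layout the modified script would produce for experiment ID 1003 might look like this (the subfolder names follow the examples above and may change):

  mkdir -p /mnt/main/exp/1003/{wavs,trans,config,logs}   # one subfolder per kind of decode output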

The types of files and the structure of the folders will also be explained through documentation on foss.unh.edu. The documentation will explain the purpose each file in these folders serves for Sphinx 3. Although the folders are not created by Sphinx itself, the documentation will also explain why particular files belong in particular folders. This categorization will help users understand the significance of the folders and files, which in turn will help show how Sphinx works when running a decode job.

The timeline for this section is as follows: Johnny

  • (2/25 - 3/5) Look at the current decode script to understand how it works and start to document which files and folders need to go where, and why.
  • (3/6 - 3/12) Create a new script to automatically create the specific experiment folders and place files correctly in the corresponding folders based on what was found.
  • (3/13 - 3/20) Document on foss.unh.edu why the decode output created those files and folders, based on the previous weeks of analysis.
  • (3/1 - 3/26) Finalize and make sure everything works properly.

Timeline may be subject to change.

Speech Modeling

Sample Run

COMMENT: All work for Sample Run excluding Dissemination needs to be complete by March 27th. (Note that you lose a week for Spring Break.)

Data Preparation

Assigned to: Chad Connors

Capture what input files are needed for train and decode and how they were created...this includes dictionary creation.

In order for the team to set up a train and decode process, they must first prepare the data that will be used in the test. To do this, they will navigate around Caesar to find the current audio files. These appear to be in the Sphinx train section as Sphere audio files with an .sph extension; these are the files the data group will be converting to .wav files for use in the train and decode. It looks as though the files will be moved to another, more streamlined directory, which should be /mnt/main/corpus/dist.

All sound files are paired with transcripts of what is being said. As of right now there are extra brackets and other items in the transcripts that could disrupt the train and decode processes. The Speech Data Corpora group will clean up these files, so their progress will be monitored while the information needed for the train and decode is being prepared. In this task the team will work with the information they have found and look through the transcripts and .wav files to document which items are needed for the train and decode. This may amount to simple observation and documentation of the data group's work, or something more detailed, depending on how far along that work is.

Once all .wav files and transcripts are accounted for, the team will need a working dictionary. There are currently dictionaries on Caesar that will be inspected to see whether they need to be updated or completely redone. A second consideration is that speech recognition software usually requires two dictionaries. The first is a language dictionary, used for standard words (in our case, English). The second, called the filler dictionary, is used for non-speech sounds that are mapped to corresponding non-speech or speech-like sound units called phonemes. Further investigation is needed into whether the team will need both dictionaries, but currently it appears that only the language dictionary will be needed. Once all of these tasks are completed, the rest of the team will be able to start the train and decode processes.
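
For reference, a language dictionary maps each word to a phoneme sequence, while a filler dictionary maps non-speech events to filler units. Entries typically look something like the following two language-dictionary lines and two filler-dictionary lines (CMU-style phonemes, purely for illustration):

  HELLO      HH AH L OW
  WORLD      W ER L D
  <sil>      SIL
  ++NOISE++  +NOISE+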

Projected Timeline

  • (3/5) Look into the current .wav files and transcripts and get a general idea of what the data group will be doing with the files
  • (3/12) Finish the log of where the transcripts and .wav files will be; track their names, contents, and all associated information. Begin looking into dictionary creation
  • (3/26) Create and finish the dictionary

Language Modeling

Assigned to: Ted Pulkowski

Capture what steps are needed to generate language model.

In order to generate a language model, there are a few requirements and steps that need to be followed. There are two Perl scripts located in the /media/data/trans directory on Caesar. One is called CreateLanguageModelFromText.perl; this is the script that actually generates the language model. The other is called ParseTranscript.perl. Ted will research how this script works and in what capacity it is needed to generate the language model. The ParseTranscript.perl script is currently located in the /media/data/Switchboard/transcripts/ICSI Transcriptions/trans directory.

  • (3/5)

There are five different files created during the process of generating a language model. The script first takes a text file and generates a word frequency file from it. The word frequency file is then used to make a vocab file, which in turn is used to create an ID 3-gram file. Once that is completed, two language models are created, one in ARPA format and one in binary format. The text file is simply a copy of the transcripts. As of February 27, the team has not identified the purpose of the other files; the proposed plan is to read each of the files to learn what it does and why it is required.
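
The sequence described above matches the usual CMU-Cambridge language model toolkit pipeline; if that is what CreateLanguageModelFromText.perl wraps, the underlying steps would look roughly like this (the file names are illustrative):

  text2wfreq  < transcripts.txt   > transcripts.wfreq   # word frequency file
  wfreq2vocab < transcripts.wfreq > transcripts.vocab   # vocab file
  text2idngram -vocab transcripts.vocab -idngram transcripts.idngram < transcripts.txt        # ID 3-gram file
  idngram2lm -idngram transcripts.idngram -vocab transcripts.vocab -arpa   transcripts.arpa   # ARPA-format LM
  idngram2lm -idngram transcripts.idngram -vocab transcripts.vocab -binary transcripts.binlm  # binary LM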

  • (3/19)

After learning what data is stored in each file in the /media/data/trans directory, the team will attempt to run both the CreateLanguageModelFromText.perl and ParseTranscript.perl scripts. The objective will be to gain enough of an understanding of how the scripts and files work together to be able to present the findings to the rest of the class. If things progress nicely, the language modeling team will have this done by March 19; the March 26th deadline should allow ample time to understand it thoroughly.

  • (3/26)

Building and Verifying Models

Assigned to: Aaron Jarzombek & Brice Rader

Capture what steps are needed specifically in training and what it generates that the decoder then uses...verification is a test on the train data.

In order to understand how the Sphinx decode system works, Brice and Aaron will first break it down into its base components. By breaking it down they will be able to analyze each step the system goes through to perform a functioning decode. Scripts will be found by digging around on Caesar and locating them within the file system. The run_decode script has already been found; the others are still missing. After the scripts are found, the cat program can be run on them to show their contents and the directories associated with them.

Once they have acquired the list of files, directories, and scripts that the main scripts use, they will be able to track down what each of the individual scripts does. Cataloging this information in a reference datasheet will make it much easier to understand. The internal commands link the necessary files, directories, and scripts by path name and assign them to static variables. Knowing each path is key to using the information, unless everything is located in the local directory with the script; this is not usually the case, so exploring the main scripts is necessary. Once the other scripts are found, the team will learn how they work in order to figure out whether they are applicable to the trainer and decoder.
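
As a simple starting point, something like the following could pull out the paths a script references (the search pattern is only a guess at what to look for):

  cat run_decode.pl                                         # view the full script contents
  grep -nE '(/mnt|/media|/usr)[^[:space:]]*' run_decode.pl  # list lines referencing absolute paths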

Testing the scripts will make it possible to see what the output is with the input already in the system. With some sample data, an attempt will be made to change the input data, which will hopefully produce different output. By obtaining different output, a better understanding will be gained of how to manipulate the data to get the information desired. This task could be very cumbersome, but it would provide essential information for understanding how the decoder works.


Along with the data being manipulated and changed, .wav files will be found and matched against the decoder's output. The sounds in the .wav files should match the input values. On Caesar, the .wav files are named swxxxxx. If the information comes out differently, a discussion of training methods will occur. Once the training is matched up well, correct sample output should result. After this step is complete, the system will be in good shape to run a sample train on the data to make sure everything is correct. Additionally, it might be possible to grab utterances, find the audio corresponding to each utterance, and put them together.


  • (3/5) Scripts will be found and documented as to how they work
  • (3/12) A catalog of the scripts and what they do will be created
  • (3/19) Testing of the scripts will be done
  • (3/26) Take .wav files and match them up with transcripts (overall progress at this point will include documentation of how the Perl scripts pertaining to the decoder and trainer work, and documentation of how the trainer works)

Dissemination

COMMENT: Give approx two weeks to have 4 groups, each led by one of Speech Tools members, repeat training & decoding using new infrastructure


Mini Run

A mini run will take a small part of the audio and transcripts, about one hour's worth, and run it through both training and decoding. A mini Switchboard set will need to be created using the Data Group's transcripts and dictionary. A language model will also be needed to run the mini train. The team will then move on to decoding the audio files and comparing the output to the transcripts created by the Data Group.

Training

The objective of training in general is to create an acoustic model. The creation of an acoustic model relies upon a phonetic dictionary. To create an updated acoustic model, both a phonetic dictionary and the current acoustic model can be used. The team proposes to complete a "mini train run" in which one hour of audio and transcripts will be used to create a new acoustic model from scratch. The language model they will use will be created with the CMU language model toolkit. Performing the "mini train run" will give the team experience with training and help prepare them for a full run of all one hundred hours of audio.

Decoding

To create a mini test set, the team will need to develop a set consisting of a small part of the Switchboard Corpus. They will then need to break the audio and transcripts into smaller pieces to test. They will then evaluate performance by decoding with the dictionary and trained models to see how accurately the models recognize the audio.

Full Run

The full train will take all 100 hours of audio and transcripts and run them in parallel across the servers in pieces of 10 hours each. A bigger, more robust language model will need to be created, all of the transcripts will need to be cleaned, and a dictionary covering all 100 hours will need to be created.

Training

The team will take knowledge from the "mini train run" and put it to use on all one hundred hours of audio. Ideally, parallelization will be used to break up the work load amongst ten separate machines. The "full train run" is complete when one hundred hours of audio result in a new acoustic model.

Parallelization

In the parallelization process, each machine will have a subset of ten hours of audio with which to derive an acoustic model. These separate acoustic models will then be merged together to create one acoustic model. This will signify the completion of one iteration. Theoretically, more accurate models are constructed from many iterations.

Decoding

To decode a full test set, we will utilize all one hundred hours of the Switchboard Corpus. The team will then evaluate the performance of the decoding process by taking the dictionary and trained models and determining with what accuracy the models recognize the audio.