Speech:Spring 2016 Proposal

From Openitware

Overview

The Spring 2016 Capstone Project involves research and experimentation in speech recognition using Sphinx, the CMU Language Model (LM) Toolkit, and various other tools on the Switchboard Corpus speech data. The primary goal of the project is to create a world class baseline that lays the foundation for future speech recognition research. This will be achieved chiefly by reducing the Word Error Rate (WER) on both seen and unseen data, through changes to the Acoustic Model, the Language Model, and the dictionary, and through verification of the Switchboard Corpus speech data.

Work will also be undertaken to streamline, optimize, or replace the supporting hardware, software, and automation scripts to improve the overall efficiency of the project.

Team Sub-Groups and Process Improvements

Work in the first half of the Spring 2016 Capstone Project is divided into five primary categories, each headed by a team with a specific area of responsibility. Each team will lead process improvements, and the development of new processes or scripts, within its subject area.

For more information on each group's sub-tasks, see the corresponding sub-section for that group.
For more information on each member, see their personal log.

Group Membership

  • Modeling Group: Ben, James, Jonathan S., Ryan
  • Systems Group: Aaron, Michael, Neil, Saverna
  • Tools Group: Daisuke, Jonathan T., Nigel, Thomas
  • Data Group: Brenden, Brian A., Brian D., Justin
  • Experiments Group: Kevin, Matthew, Meagan, Peter

Competitive Groups and Performance Improvements

In the second half of the project, all group members will be divided evenly between two teams. The teams will compete to achieve the lowest word error rate on either seen or unseen data; this division is intended to promote competition and drive better results. The team with the poorer results will be tasked with writing the final report of all experimentation results.

Modeling Group

Team Members

Overview

The role of the modeling group is to develop effective models for the speech recognition process, to be the primary driver toward reducing Word Error Rate (WER), and to serve as "subject matter experts" in the speech recognition field. The group is expected to have an in-depth understanding of the topic and its related literature, and to advise the other groups on all facets of speech recognition research and practice.

Sphinx uses two models, an acoustic model and a language model, together with a phonetic dictionary to perform speech recognition. As subject matter experts and drivers of Word Error Rate reduction, the modeling group is expected to be familiar with, and to modify, the following:

  • Acoustic Modeling
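An acoustic model contains the acoustic properties of each senone (a tied, context-dependent sub-phonetic state). Acoustic models are complex, and training them exposes many parameters that can be tuned, such as the number of senones and the number of Gaussian densities per state.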
  • Phonetic Dictionary
A phonetic dictionary maps each word to its sequence of phones. The decoder uses this mapping to determine which sequences of phones correspond to which spoken words.
  • Language Modeling
The language model defines how likely a word is to follow the previously recognized words. For example, after recognizing the word "Merry", the model can assign a high probability to words such as "Christmas", "Hanukkah", or "Holidays".
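To make the "Merry" example concrete, here is a minimal sketch of how a bigram language model estimates the probability of a word given the previous word, using unsmoothed maximum-likelihood counts over a small made-up corpus (not Switchboard data). Real toolkits such as the CMU LM Toolkit also apply discounting and back-off so that word pairs never seen in training still receive a nonzero probability; this sketch deliberately omits that.

```python
from collections import Counter, defaultdict


def bigram_probabilities(sentences):
    """Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1 as history).

    No smoothing or back-off is applied, so unseen pairs simply have no entry.
    """
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # Sentence boundary markers, as used in Sphinx-style LM training text.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            pair_counts[w1][w2] += 1

    probs = {}
    for w1, followers in pair_counts.items():
        total = sum(followers.values())
        probs[w1] = {w2: count / total for w2, count in followers.items()}
    return probs


# Hypothetical toy training text, for illustration only.
corpus = [
    "merry christmas to you",
    "merry christmas everyone",
    "merry holidays",
    "have a merry hanukkah",
]
p = bigram_probabilities(corpus)
print(p["merry"])  # {'christmas': 0.5, 'holidays': 0.25, 'hanukkah': 0.25}
```

Because "christmas" follows "merry" in two of the four occurrences, it receives half of the probability mass; a decoder combines such scores with acoustic scores when choosing between competing word hypotheses.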

Goals

Previous semesters achieved a baseline recognition result of 38.2%. The modeling group will lead the effort to improve upon it.
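Since progress in this project is measured by Word Error Rate, the following sketch shows how WER is computed: the reference and hypothesis word sequences are aligned with a minimum edit distance, and WER = (substitutions + deletions + insertions) / number of reference words. This is a plain Levenshtein implementation for illustration; the project's scoring tool, sclite, performs its own alignment and text normalization, so its figures may differ in edge cases. The example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, using a word-level minimum edit distance alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


ref = "i was going to the store yesterday"
hyp = "i was going to store yesterday today"
print(f"{word_error_rate(ref, hyp):.1%}")  # 28.6%  (one deletion + one insertion over 7 words)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is reported as an error rate rather than as an accuracy.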

To achieve this, the modeling group will make changes to the models, dictionaries, configurations, and scripts as necessary. Because the research is largely exploratory, all timelines presented here are tentative and subject to change. All tasks are assigned to the modeling group as a whole, and responsibility for their completion is shared equally among its members.

Plan

By the dates below, the modeling group will, at minimum, have achieved the indicated milestones.

Implementation Timeline
Feb 24th, 2016

  • Replicated and verified the previous semester's best result
  • Performed research into training and decoding with Sphinx and the CMU Toolkit
  • Assumed part of their advisory role for non-modeling-group team members.

Mar 2nd, 2016

  • Performed one experiment intended to reduce word error rate and reported their results
  • Continued and expanded their advisory role for non-modeling-group team members
  • Added documentation of their changes to the team wiki, either in tutorial modifications or logs

Mar 9th, 2016

  • Performed two experiments intended to reduce word error rate and reported their results
  • Assisted the Data group in their analysis and validation of the Switchboard corpus speech data
  • Further expanded their advisory role for non-modeling-group members
  • Added documentation of their changes to the team wiki, either in tutorial modifications or logs

Implementation Tasks
Ben

  • Act as group supervisor due to previous Sphinx experience.
  • Coordinate with other teams to disseminate information and act as a train/decode/score advisor.
  • Continue to research Sphinx language models and variables that affect the WER.

Jon

  • Update train/decode wiki tutorial to reflect modeling group research.
  • Create vocabulary document of commonly used terminology
  • Continue to research Sphinx language models and variables that affect the WER.

Ryan

  • Coordinate with other teams to disseminate information and act as a train/decode/score advisor.
  • Research the feasibility of switching to phonetically tied mixture (PTM) acoustic models in order to reduce training time while preserving accuracy
  • Continue to research Sphinx language models and variables that affect the WER.
  • Find inconsistencies in the wiki tutorials and correct them using proper wiki markup.

James

  • Responsible for running the train/decode/score process and tracking results.
  • Coordinate with other teams to disseminate information and act as a train/decode/score advisor.
  • Continue to research Sphinx language models and variables that affect the WER.

Systems Group

Team Members

Overview

The main objective of the Systems Group is to upgrade the equipment the project currently runs on. The existing Dell PowerEdge 1750 servers are old and are being replaced because the project needs servers that run reliably and efficiently. Because of the age of the supplemental PowerEdge 1750 drones, Caesar is currently essentially the only server used to run experiments.

The group has been assigned five Dell PowerEdge 1950 servers to replace the old equipment. Care must be taken, because a student working on an independent project is using two of the already installed servers. The group must not remove these systems, as it is crucial that they remain online.

Once the equipment has been fully installed and configured, the team will convert the Dell PowerEdge 2950 currently in the rack into a new server named "Rome." This machine will host the IRC (Internet Relay Chat) server, which the Systems Group will be responsible for installing and managing. In addition, the group will work with the Experiments Group to fix the permissions on the experiments folder.

Goals

The goal of the Systems Group is to replace five PowerEdge 1750 servers with five newer PowerEdge 1950 servers. Once the replacement and configuration are complete, the Systems Group will allocate one server to the Tools Group for experimentation and testing.

The remaining four servers will be used to develop a world class baseline for the Speech Project that future students will use for research. These servers will spread the load of running experiments over the course of the project, so that Caesar is no longer the only machine used for development.

The team will also create an IRC server on Rome to facilitate and encourage communication between teams for the duration of the project.

The Systems Group will update the following wiki documentation:

  • Relocate the Red Hat installation instructions to a more visible place on the wiki, and generalize them, since the current documentation is geared toward replacing Caesar.
  • Networking configuration
  • Update the physical equipment page with current 1950 specs
  • Verify and fix the actual labels on the servers
  • Add documentation regarding the installation, usage, and management of the soon-to-be IRC server.

The group will be creating new documentation for the following:

  • Dell PowerEdge 1950 installation
  • Dell PowerEdge 2950 (Rome) conversion
  • IRC server installation, management, and usage

Plan

Implementation Timeline:
Feb 22nd, 2016

  • Replaced the designated Dell 1750 servers with Dell 1950 servers
  • Started Red Hat installations

Mar 2nd, 2016

  • Red Hat installed and network settings configured for most, if not all, new equipment
  • Bugs in the installation have been documented and resolved
  • Allocated Majestix to Tools group
  • Servers are now properly and visibly labeled

Mar 9th, 2016

  • Permissions have been addressed and fixed on the experiments directory
  • All drones have been pointed to Caesar’s /mnt/main
  • Began or completed conversion of Dell PowerEdge 2950 into Rome
  • Began IRC server installation and configuration
  • The documentation for Red Hat installation has been moved and updated
  • The hardware configurations page has been updated to reflect the current equipment

Mar 16th, 2016

  • IRC has been installed and configured; Rome is now fully operational.
  • Documentation for Rome and for the IRC server has been written
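Checking that each drone really is "pointed to Caesar's /mnt/main" is easy to automate. The sketch below reads Linux's /proc/mounts and reports whether /mnt/main is mounted and with what filesystem type. It assumes, which this plan does not state, that the drones mount Caesar's export over NFS; the function names are hypothetical.

```python
import os


def parse_mounts(text: str) -> dict:
    """Map mount point -> (source, fstype) from /proc/mounts-style text.

    Each line has the form: source mountpoint fstype options dump pass
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            source, mountpoint, fstype = fields[:3]
            mounts[mountpoint] = (source, fstype)
    return mounts


def shared_mount_status(mounts: dict, mountpoint: str = "/mnt/main") -> str:
    """Return 'missing', 'ok (<source>)' for an NFS mount, or a note on an unexpected type."""
    if mountpoint not in mounts:
        return "missing"
    source, fstype = mounts[mountpoint]
    if fstype.startswith("nfs"):  # matches both nfs and nfs4
        return f"ok ({source})"
    return f"unexpected fstype {fstype}"


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(shared_mount_status(parse_mounts(f.read())))
```

A "missing" result on a drone means jobs writing to /mnt/main would silently land on that drone's local disk instead of on Caesar, which is the failure this check is meant to catch.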

Implementation Tasks:
Mike

  • Install the five new Dell PowerEdge 1950 servers
  • Install and configure Red Hat
  • Configure networking on new equipment and point drone servers to Caesar

Saverna

  • Install the five new Dell PowerEdge 1950 servers
  • Install and configure Red Hat
  • Notes/Initial documentation of server installs, Red Hat installs, bugs etc.

Neil

  • Install the five new Dell PowerEdge 1950 servers
  • Install and configure Red Hat

Aaron

  • Install the five new Dell PowerEdge 1950 servers
  • Install and configure Red Hat
  • Fix the hardware configurations page
  • Configure "Rome" server for IRC implementation
  • Install and configure IRC on "Rome"

Tools Group

Team Members

Overview

The project relies on many software tools. The tools group is responsible for ensuring that these tools contribute as much as possible to the overall project goal of creating a world class baseline. It does this by documenting the software tools currently in use, researching upgrades to them, and exploring potential software additions. Documenting all of this work is also a key responsibility of the group; documentation goes on the individual tools' pages, the 'Speech Software Functionality' page, the 'Speech Recognition Related Readings' page, and team members' logs.

Goals

To meet these responsibilities, the tools group will document, on the 'Speech Software Functionality' page, the versions of the tools currently used by the project and the most recent available versions. Articles comparing speech tools will be listed on the 'Speech Recognition Related Readings' page and used to determine whether any of the current tools need upgrading.

The sox application currently does not run because it is missing some dependencies. This will be fixed and documented. A hotfix directory will be created on Caesar at /mnt/main/local/lib/hotfix to hold sox's dependencies, as well as those of any other tools that encounter dependency issues in the future.
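One common way to make a program find libraries placed in a directory like the planned hotfix directory is to prepend that directory to LD_LIBRARY_PATH before launching the program. The sketch below assumes the missing sox dependencies are shared libraries; the helper names are hypothetical, and a shell wrapper script or an ldconfig entry would be equally valid ways to achieve the same result.

```python
import os
import subprocess

HOTFIX_DIR = "/mnt/main/local/lib/hotfix"  # directory named in this plan


def env_with_hotfix(base_env=None, hotfix_dir=HOTFIX_DIR):
    """Return a copy of the environment in which hotfix_dir is searched first for shared libraries."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; earlier entries take precedence.
    env["LD_LIBRARY_PATH"] = hotfix_dir + (":" + existing if existing else "")
    return env


def run_with_hotfix(cmd):
    """Run a command, e.g. ["sox", "--version"], with the hotfix libraries visible to it."""
    return subprocess.run(cmd, env=env_with_hotfix(), check=True)
```

Scoping the change to the launched process, rather than exporting it system-wide, keeps the hotfix libraries from shadowing system libraries for every other program on Caesar.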

Every user of the project has root access. This is dangerous because it can lead to accidental deletion or modification of files on Caesar, which is not fully backed up. An incremental backup system will be implemented to provide version history and prevent data loss. Hard-link backups will be used rather than full backups to reduce storage consumption.
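The hard-link idea works as follows: each backup run creates a new snapshot directory that looks like a complete copy, but any file unchanged since the previous snapshot is hard-linked to the previous snapshot's copy instead of being copied again, so only changed files consume new space. The sketch below is a minimal illustration that detects "unchanged" by size and modification time; rsync's --link-dest option implements the same technique and would be the more usual choice in practice. The function name and arguments are hypothetical.

```python
import os
import shutil
from typing import Optional


def snapshot(source: str, dest: str, previous: Optional[str] = None) -> None:
    """Back up `source` into a new snapshot directory `dest`.

    Files whose size and mtime match the copy in `previous` are hard-linked to it;
    new or changed files are copied. Symlinks are recreated as symlinks.
    """
    for root, dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)

        # os.walk lists symlinked directories in `dirs`; recreate them and do not descend.
        for name in list(dirs):
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.symlink(os.readlink(path), os.path.join(target_dir, name))
                dirs.remove(name)

        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            if os.path.islink(src):
                os.symlink(os.readlink(src), dst)
                continue
            if previous:
                old = os.path.join(previous, rel, name)
                if os.path.isfile(old) and not os.path.islink(old):
                    s, o = os.stat(src), os.stat(old)
                    if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                        os.link(old, dst)  # unchanged: share the previous snapshot's data
                        continue
            shutil.copy2(src, dst)  # new or changed: copy contents and metadata (incl. mtime)
```

Because every snapshot is a full directory tree, restoring a file is just a copy from the snapshot of the desired date, and deleting an old snapshot never affects newer ones: a hard-linked file's data is freed only when its last link is removed.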

Additional tools will also be researched, tested, and installed on Majestix, a sandbox server. These include emacs, a more convenient and feature-rich text editor; screen, a console session sharing tool that can be useful for long-distance collaboration; and tree, a program that displays the structure of a file system. A means of sharing documents will also be researched to enhance collaboration.

Plan

By the dates below, the tools group will, at minimum, have achieved the indicated milestones.

Implementation Timeline
Feb 24th, 2016

  • Research collaboration tool and make a manual for the group.
  • Research Sphinx Trainer, document, and make recommendation on upgrade if needed.
  • Research CMU Dictionary, document, and make recommendation on upgrade if needed.
  • Research Sphinx Decoder, document, and make recommendation on upgrade if needed.
  • Research CMU Language Model Toolkit, document, and make recommendation on upgrade if needed.

Mar 2nd, 2016

  • Research Sphinx base, document, and make recommendation on upgrade if needed.
  • Research CMU Cam Toolkit, document, and make recommendation on upgrade if needed.

Mar 9th, 2016

  • Repair and document sox fix.

Mar 16th, 2016

  • Research SCLITE, document, and make recommendation on upgrade if needed.
  • Develop and implement a backup system for Caesar system level files.
  • Create a hotfix directory on Caesar.
  • Install and configure tree on Majestix. Document tree on wiki.
  • Install and configure emacs on Majestix. Document emacs on wiki.

Implementation Tasks
Daisuke Matsukura

  • Research collaboration tool and make a manual for the group.
  • Research Sphinx base, document, and make recommendation on upgrade if needed.
  • Research CMU Cam Toolkit, document, and make recommendation on upgrade if needed.
  • Research SCLITE, document, and make recommendation on upgrade if needed.

Thomas Rubino

  • Repair and document sox fix.
  • Develop and implement a backup system for Caesar system level files.
  • Create a hotfix directory on Caesar.

Nigel Swanson

  • Research Sphinx Trainer, document, and make recommendation on upgrade if needed.
  • Research CMU Dictionary, document, and make recommendation on upgrade if needed.
  • Install and configure tree on Majestix. Document tree on wiki.

Jonathan Trimble

  • Research Sphinx Decoder, document, and make recommendation on upgrade if needed.
  • Research CMU Language Model Toolkit, document, and make recommendation on upgrade if needed.
  • Install and configure emacs on Majestix. Document emacs on wiki.

Data Team

Team Members

Overview

The Data Group is responsible for the Switchboard Corpus data used to run trains and decodes, with the aim of achieving a low word error rate and creating a world class baseline for speech recognition. The Switchboard Corpus consists of 256 hours of recorded telephone conversations. The audio is divided into conversation files, each containing a conversation between two speakers, and utterance files, which further segment the audio so that each captures a single phrase or sentence spoken by one speaker. Every audio file has a corresponding transcript that records the spoken words, which is essential for correct analysis when creating a baseline. The Data Group manages and organizes these audio files and verifies their integrity so that word error rate calculations are accurate. While software configuration tweaks and longer trains can decrease WER (Word Error Rate), if the Capstone Project trains on inaccurate or incomplete data, the results of training and decoding will be seriously flawed.

Goals

Previous semesters focused on organizing and managing the Switchboard Corpus audio data, with little attention paid to ensuring that the audio files are complete and error-free. Upon analysis, the Data Group has identified over 250,000 utterance files that need to be evaluated and documented as valid data. Because of the large number of files, the group will survey a random sample of 1% of the data and document its findings. Audio files will also be compared to their transcripts to ensure not only that the audio is clear and error-free, but also that the spoken words match the transcript text exactly. To reach the goal of evaluating 1% of the data, each team member will:

  • Convert 60 utterance audio files a week to .wav format and listen to them.
  • Document all potential problems regarding these files.
  • Search for 256 utterances that are suspected to be missing. Experiment 0272/004 revealed that some utterance files may be missing, because the transcript lists audio files that repeatedly produced the same transcript text.
  • Fix or remove any unnecessary soft links.
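The search for missing utterances and broken soft links can be partly automated. The sketch below assumes, as labeled in the comments, that each transcript line ends with the utterance ID in parentheses (the SphinxTrain transcription convention) and that the audio files sit in one directory named after their IDs; the project's actual layout may be nested, and the helper names are hypothetical.

```python
import os
import re

# Assumed transcript format: "<s> some words </s> (utterance_id)"
UTT_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def transcript_ids(lines):
    """Collect utterance IDs from the trailing "(id)" of each transcript line."""
    ids = set()
    for line in lines:
        match = UTT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def missing_utterances(transcript_lines, audio_dir, ext=".sph"):
    """IDs listed in the transcript with no matching <id><ext> file in audio_dir (assumed flat layout)."""
    present = {os.path.splitext(f)[0] for f in os.listdir(audio_dir) if f.endswith(ext)}
    return sorted(transcript_ids(transcript_lines) - present)


def broken_symlinks(top):
    """Soft links under `top` whose targets no longer exist."""
    broken = []
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(root, name)
            # islink() is true for the link itself; exists() follows it and fails if the target is gone.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

The output of missing_utterances gives a concrete list to check against the 256 suspected missing utterances, and broken_symlinks produces candidates for removal that can be reviewed before anything is deleted.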

Plan

The Data Group will use UNIX commands to collect a random sample from the data set. To reach a 1% sample, roughly 2,500 of the 250,000 audio files will be evaluated this semester. With 4 group members, this is approximately 60 audio files per member per week, each converted from .sph to .wav format and evaluated. Checked audio files will be annotated so that future teams can pick up where the Spring 2016 semester left off. SoX, though not currently working on Caesar, will be run on members' personal laptops to convert .sph files to .wav format for easy listening. Utterance audio files will be compared to their transcripts to ensure the transcripts are accurate. Through this process, soft links between directories will be verified, and unnecessary or broken soft links and any other unneeded data will be cleaned up.
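The sampling arithmetic above (1% of 250,000 files is 2,500; split four ways that is 625 files per member, or a little over ten weeks at 60 files a week) can be expressed as a reproducible script. The plan calls for UNIX commands; this is the same logic sketched in Python. The fixed seed, the round-robin assignment, and the helper names are assumptions for illustration, and the sox command shown is the basic form: compressed SPHERE files may need a different conversion path.

```python
import os
import random


def sample_and_assign(all_files, members, fraction=0.01, seed=2016):
    """Draw a reproducible random sample of `fraction` of all_files and deal it out round-robin."""
    rng = random.Random(seed)  # fixed seed: the same sample can be regenerated by future teams
    k = round(len(all_files) * fraction)
    sample = rng.sample(sorted(all_files), k)  # sort first so the result does not depend on input order
    assignments = {member: [] for member in members}
    for i, path in enumerate(sample):
        assignments[members[i % len(members)]].append(path)
    return assignments


def sox_command(sph_path):
    """Argument list for converting one .sph utterance to a .wav file beside it."""
    return ["sox", sph_path, os.path.splitext(sph_path)[0] + ".wav"]


# Hypothetical file names standing in for the real utterance list.
files = [f"utt{i:06d}.sph" for i in range(250000)]
team = ["Justin", "Brenden", "Brian A.", "Brian D."]
work = sample_and_assign(files, team)
print({member: len(paths) for member, paths in work.items()})
```

Recording the seed alongside the annotations lets a later semester regenerate exactly the same sample and continue from the files that were not yet checked.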

Implementation Timeline

Feb 24th, 2016 - Mar 1st, 2016

  • Compare at least 240 of the utterances.
  • Document each utterance that can cause problems with training.

Mar 2nd, 2016 - Mar 8th, 2016

  • Compare at least 240 of the utterances.
  • Document each utterance that can cause problems with training.
  • Search for 256 missing utterances.

Implementation Tasks

Justin

  • Create a random sample of 250 audio files per week
  • Review a minimum of 60 utterances each week.
  • Search for the 256 missing utterances.
  • Search/fix broken soft links in Caesar.

Brenden

  • Review a minimum of 60 utterances each week.
  • Search for the 256 missing utterances.
  • Search/fix broken soft links in Caesar.

Brian A.

  • Review a minimum of 60 utterances each week.
  • Search for the 256 missing utterances.
  • Search/fix broken soft links in Caesar.

Brian D.

  • Review a minimum of 60 utterances each week.
  • Search for the 256 missing utterances.
  • Search/fix broken soft links in Caesar.

Experiments Group

Team Members

Overview

The Spring 2015 Experiments group focused on developing a way to easily generate an experiment page on foss.unh.edu for documenting ongoing experiments. The Spring 2016 Experiments group is expanding on that goal by developing two scripts, addExp.pl and mkDec.pl, and is also creating and organizing a software repository and archiving old experiments. The group's end goal this year is to organize Caesar, which has grown over the past six years without a structured archival system for either experiments or user-generated scripts. Completing these projects will help future semesters create a world class baseline by making the structure of Caesar easier to understand and training and decoding experiments easier to start.

Goals

The Spring 2016 Experiments group aims to make documenting and organizing work on foss.unh.edu and Caesar easier. The scripts addExp.pl and mkDec.pl, together with the software repository and the archival process, will achieve this goal as follows:

  • addExp.pl will combine the existing createWikiExperiment.pl and createWiki_Sub_Experiment.pl scripts developed by the Spring 2015 group. Both scripts walk a user through creating an experiment on foss.unh.edu, so that the user does not need to access two systems to start an experiment. Currently, the user must know whether an experiment already exists and, if it does, run createWiki_Sub_Experiment.pl. addExp.pl will turn each of those scripts into a function and ask the user "Does the experiment exist?"; if it does, the script will call the sub-experiment function, and otherwise the experiment function.
  • mkDec.pl will allow the user to decode using a previously trained experiment's data by inserting the path to the experiment the user specifies when running the script. It will take as parameters the experiment directory to decode on and the configuration settings to decode with.
  • The software repository will expand on the existing system in /mnt/main/scripts/user/History/. Inside History, each script will have a directory named after it, containing numbered directories (1, 2, and so on) for every version generated so far, plus a directory named cur that holds the current working script to be used during training and decoding. All deprecated scripts will be moved to the /mnt/main/scripts/user/DELETE directory, where they will be kept in case they are needed in the future.
    • As a secondary goal for the software repository, all current scripts will be properly documented on foss.unh.edu.
  • The archival process will store all previous semesters' experiments in the /mnt/main/Exp directory, inside a directory corresponding to the semester and year in which they were generated. For example, /mnt/main/Exp/sp13 will hold all of the experiments generated by the Spring 2013 Capstone.
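The versioned layout described for the software repository (History/<script name>/1, 2, ... plus cur) can be maintained by a small helper. The sketch below is written in Python for illustration, although the project's own scripts are Perl; the function names are hypothetical, and it copies each new version into the next numbered directory and also into cur.

```python
import os
import shutil

HISTORY = "/mnt/main/scripts/user/History"  # path named in this plan


def numbered_versions(script_dir):
    """Existing version numbers under History/<script>/, ignoring cur and other names."""
    return sorted(int(name) for name in os.listdir(script_dir) if name.isdigit())


def publish_version(script_path, history=HISTORY):
    """Store script_path as the next numbered version and make it the current one.

    Returns the version number assigned.
    """
    name = os.path.basename(script_path)
    script_dir = os.path.join(history, name)
    os.makedirs(script_dir, exist_ok=True)

    versions = numbered_versions(script_dir)
    next_n = versions[-1] + 1 if versions else 1
    version_dir = os.path.join(script_dir, str(next_n))
    os.makedirs(version_dir)  # fails loudly rather than overwrite an existing version
    shutil.copy2(script_path, os.path.join(version_dir, name))

    cur_dir = os.path.join(script_dir, "cur")
    os.makedirs(cur_dir, exist_ok=True)
    shutil.copy2(script_path, os.path.join(cur_dir, name))  # cur always mirrors the newest version
    return next_n
```

Numbering versions from the directory contents, rather than from a separate counter file, means the repository stays consistent even if someone adds a version directory by hand.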

Plan

By the dates below, the experiment group will, at minimum, have achieved the indicated milestones.

Implementation Timeline
Feb 24th, 2016

  • Get the first version working for addExp.pl.
  • Get the first version working for mkDec.pl
  • Organize notes on scripts for the software repository.
  • Finish the archival of experiments from previous semesters.

Mar 2nd, 2016

  • Begin the pilot phase of addExp.pl for general usage.
  • Get the second version working for mkDec.pl
  • Implement the movement of deprecated scripts and versions of scripts for the software repository.

Mar 9th, 2016

  • Document the changes to createWikiExperiment.pl and createWiki_Sub_Experiment.pl that produced addExp.pl.
  • Confirm that the final version of addExp.pl is functional; if not, make changes as necessary.
  • Begin the pilot phase of mkDec.pl for general usage.
  • Document the software repository changes.

Implementation Tasks
Kevin Soucey

  • Start creating addExp.pl, with the rest of the group providing support

Meagan Wolf

  • Creation of the software repository and organizing of experiment directories.

Peter Ferro & Matthew Heyner

  • Development of prepareDecode.pl script
  • Prepare for creation of the script
  • Begin creation of script
  • Assess where we are in the process of the prepareDecode.pl script

Matthew Heyner

  • Archival of older experiments
  • Work on proposal for Experiment Group