Speech:Spring 2012 Poster Notes



Abstract
The goal of the Capstone Program at the University of New Hampshire at Manchester is to provide students with hands-on work experience that will enhance their degree and help prepare them for successful careers. Major employers consider a "Capstone" project a key part of gaining knowledge and preparing students for future job positions. The class will use the Sphinx Speech Recognition Toolkit and further the research and development of previous semesters' work. The goal is to execute a "mini train set"; however, the team also intends to improve many other areas, including the organization of the system's hardware and software, documentation of work progress, and the creation of backup mechanisms. In conclusion, the Spring 2012 class should leave future groups in a position to pursue the execution of the "full train set" and improve the acoustic model. This will improve the accuracy of decoding with speech software and improve speech recognition technology overall.

(New) The goal of the Capstone Program at the University of New Hampshire at Manchester is to provide students with hands-on experience that will enhance their degree and help prepare them for successful careers. Employers consider Capstone a key part of a student’s educational background in gaining knowledge and preparing for future job positions. The class will use CMU’s Sphinx Speech Recognition Toolkit and further the research and development of previous semesters’ work. The goal is to learn and apply speech recognition technology, as well as to improve many other areas, including the organization of the system’s hardware and software, documentation of work progress, and the creation of backup mechanisms. In conclusion, the Spring 2012 class should leave future groups in a position to pursue further research into the overall quality of speech recognition, with the primary goal of improving its accuracy.

Solution
I added a reference to the full train because that is what appears on the poster. If it overlaps too much with the Training and Decoding section, feel free to edit it.

Our solution is to run a full train and decode set, which includes 100 hours of audio and transcripts. We will run the work in parallel on 10 separate servers, each responsible for processing 10 hours of audio. Each process will produce its own acoustic model, and these acoustic models will then be merged into one improved model. This process will improve the accuracy of decoding with speech software and improve speech recognition technology overall.
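Splitting 100 hours evenly across 10 servers is a load-balancing problem: utterances have different lengths, so dividing by file count alone can leave some servers with much more audio than others. The sketch below is a minimal, hypothetical helper (`partition_by_duration` is not part of Sphinx or SphinxTrain) that assumes each utterance's duration in seconds is already known. It uses the greedy "longest processing time first" heuristic; it only decides which audio goes to which server and does not perform the model merge.

```python
import heapq
from typing import List, Tuple


def partition_by_duration(
    utterances: List[Tuple[str, float]], n_servers: int
) -> List[List[str]]:
    """Split (utterance_id, seconds) pairs into n_servers shards of similar total duration.

    Hypothetical helper: takes utterances from longest to shortest and gives
    each one to the shard that currently holds the least audio.
    """
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    shards: List[List[str]] = [[] for _ in range(n_servers)]
    # Min-heap of (seconds assigned so far, shard index); ties go to the lower index.
    heap = [(0.0, i) for i in range(n_servers)]
    heapq.heapify(heap)
    for utt_id, seconds in sorted(utterances, key=lambda u: u[1], reverse=True):
        total, idx = heapq.heappop(heap)  # least-loaded shard
        shards[idx].append(utt_id)
        heapq.heappush(heap, (total + seconds, idx))
    return shards


if __name__ == "__main__":
    # Hypothetical corpus: 1,200 utterances of 5 minutes each = 100 hours.
    corpus = [(f"utt{i:04d}", 300.0) for i in range(1200)]
    for i, shard in enumerate(partition_by_duration(corpus, 10)):
        hours = len(shard) * 300.0 / 3600  # valid here because every utterance is 300 s
        print(f"server {i}: {len(shard)} utterances, {hours:.1f} hours")
```

With this uniform example, each of the 10 shards receives 120 utterances, or 10 hours of audio. With real recordings of varying length, the greedy rule keeps the difference between the busiest and least busy server no larger than the longest single utterance.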

History of Speech Recognition
1950s - Olson and Belar of RCA Laboratories built a system to recognize 10 syllables of a single talker.

1950s - Forgie and Forgie built a speaker-independent 10-vowel recognizer.

1960s - Suzuki and Nakata at the Radio Research Lab in Tokyo built a vowel recognizer.

1960s - Sakai and Doshita at Kyoto University built a phoneme recognizer, and NEC Laboratories built a digit recognizer.

1962 - At the World's Fair in Seattle, IBM demonstrated a device it had built called the "Shoebox", which could recognize 16 spoken words, including the digits 0 through 9 and the commands "plus", "minus", and "total".

Present - Leading speech recognition companies include Dragon Systems, IBM, and AT&T.

Infrastructure



 * openSUSE 11.3: A free and open-source operating system built on the Linux kernel, developed and maintained by the openSUSE Project. openSUSE offers both a command-line interface and a graphical user interface (GUI).
 * Sphinx 3: An open-source speech recognition toolkit developed and maintained by Carnegie Mellon University (CMU). Sphinx 3 is written in C and intended for research use.
 * CMU-Cambridge Statistical Language Modeling toolkit: A collection of UNIX tools that are used to construct and test statistical language models. This toolkit is maintained by CMU.
 * SphinxTrain: The set of tools used to train Sphinx acoustic models, which describe how each phone sounds, from the audio and transcripts in the training corpus. SphinxTrain is developed and maintained by CMU.
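To show what "construct and test a statistical language model" means, here is a toy bigram model. It is only an illustration, not the CMU-Cambridge toolkit's method: the toolkit builds n-gram models with discounting schemes such as Good-Turing and Witten-Bell, while this sketch uses simpler add-one (Laplace) smoothing. "Construct" means counting word pairs in training transcripts, and "test" means measuring perplexity on other text (lower is better). The class name `BigramLM` is hypothetical.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Toy bigram language model with add-one smoothing (illustration only)."""

    def __init__(self, sentences: List[List[str]]):
        self.history: Counter = Counter()  # how often each word precedes another
        self.bigrams: Counter = Counter()  # how often each (prev, word) pair occurs
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Words the model can predict: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def perplexity(self, sentences: List[List[str]]) -> float:
        """exp of the average negative log-probability per predicted token."""
        log_prob, n = 0.0, 0
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            for prev, word in zip(tokens[:-1], tokens[1:]):
                log_prob += math.log(self.prob(prev, word))
                n += 1
        return math.exp(-log_prob / n)


if __name__ == "__main__":
    lm = BigramLM([["call", "home"], ["call", "work"], ["call", "home"]])
    print(lm.perplexity([["call", "home"]]))  # word order seen in training
    print(lm.perplexity([["home", "call"]]))  # unseen word order, higher perplexity
```

Because smoothing gives every pair a small nonzero probability, unseen word orders are penalized rather than ruled out.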

Training and Decoding
The purpose of training and decoding is to get speech recognition working. The team created a "mini-train", which takes conversations saved in .wav format and matches them with the transcripts of those conversations to build a speech recognition model: words in the transcript are matched to the audio so that each written word is related to how it sounds. To complete this process, the train requires a dictionary of every word in the transcripts, along with a phonetic dictionary that gives the phonemes for every word listed in that dictionary. Decoding then tests whether the train worked and how accurate the resulting model is. This is done by running a single script, run_decode.pl.
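A train usually fails if a transcript contains a word that the dictionary does not cover, so it helps to check the inputs before starting. The sketch below assumes the common SphinxTrain file layouts: transcript lines such as `<s> HELLO WORLD </s> (utt001)`, dictionary lines such as `HELLO HH AH L OW` with alternate pronunciations written `HELLO(2) ...`, and a phone list with one phone per line. The function names are hypothetical, and filler words (normally kept in a separate filler dictionary) are ignored except for `<sil>`.

```python
import re
from typing import Dict, List, Set, Tuple


def load_dictionary(lines: List[str]) -> Dict[str, List[List[str]]]:
    """Parse 'WORD PH1 PH2 ...' lines into {word: [pronunciation, ...]}."""
    prons: Dict[str, List[List[str]]] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = re.sub(r"\(\d+\)$", "", parts[0])  # 'HELLO(2)' -> 'HELLO'
        prons.setdefault(word, []).append(parts[1:])
    return prons


def transcript_words(line: str) -> List[str]:
    """Extract the words from '<s> WORD WORD </s> (utt_id)'."""
    line = re.sub(r"\([^)]*\)\s*$", "", line)  # drop the trailing (utt_id)
    return [w for w in line.split() if w not in ("<s>", "</s>", "<sil>")]


def check_training_inputs(
    transcripts: List[str], dict_lines: List[str], phone_list: List[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (transcript words missing from the dictionary,
    dictionary phones missing from the phone list)."""
    prons = load_dictionary(dict_lines)
    phones = set(phone_list)
    missing_words = {
        w for line in transcripts for w in transcript_words(line) if w not in prons
    }
    missing_phones = {
        p for variants in prons.values() for pron in variants for p in pron
        if p not in phones
    }
    return missing_words, missing_phones


if __name__ == "__main__":
    words, phones = check_training_inputs(
        ["<s> HELLO WORLD </s> (utt001)", "<s> HELLO THERE </s> (utt002)"],
        ["HELLO HH AH L OW", "HELLO(2) HH EH L OW", "WORLD W ER L D"],
        ["AH", "D", "ER", "HH", "L", "OW", "W", "SIL"],
    )
    print("missing words:", sorted(words))
    print("missing phones:", sorted(phones))
```

In the example, THERE has no dictionary entry and the phone EH is used by a pronunciation but absent from the phone list; both would need fixing before the train.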
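Decoding accuracy is normally reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the decoder's output into the reference transcript, divided by the number of reference words. The sketch below computes WER with a standard edit-distance table. It is an illustration of the metric, not the scoring code used by run_decode.pl.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # d[i][j] = fewest edits turning reference[:i] into hypothesis[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i reference words
    for j in range(cols):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match when sub == 0)
            )
    return d[rows - 1][cols - 1] / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Here "sat" was recognized as "sit" (one substitution) and the second "the" was dropped (one deletion), giving 2 errors out of 6 reference words, about 33%. Because insertions count as errors, WER can exceed 100%.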