Speech:Spring 2012 Poster Notes

From Openitware
Revision as of 08:03, 16 April 2012 by Cpc2 (Talk | contribs)



Notes


Poster Notes

First Template for poster

  • Abstract

The goal of the Capstone Program at the University of New Hampshire at Manchester is to provide students with hands-on work experience that will enhance their degree and help prepare them for successful careers. Major employers consider a "Capstone" project a key part of gaining knowledge and preparing students for future job positions. The class will use the Sphinx Speech Recognition Toolkit to further the research and development of previous semesters' work. The goal is to execute a "mini train set"; however, the team also intends to improve many other areas, including organization of the system's hardware and software, documentation of work progress, and creation of backup mechanisms. In conclusion, the Spring 2012 class should leave future groups in a position to pursue the execution of the "full train set" and improve upon the acoustic model. This will improve the accuracy of decoding with the speech software and advance speech recognition technology overall.

(New) The goal of the Capstone Program at the University of New Hampshire at Manchester is to provide students with hands-on experience that will enhance their degree and help prepare them for successful careers. Employers consider Capstone a key part of a student’s educational background in gaining knowledge and preparing for future job positions. The class will use CMU’s Sphinx Speech Recognition Toolkit to further the research and development of previous semesters’ work. The goal is to learn and apply speech recognition technology and to improve many other areas, including organization of the system’s hardware and software, documentation of work progress, and creation of backup mechanisms. In conclusion, the Spring 2012 class should leave future groups in a position to pursue further research on the overall quality of speech recognition, with the primary goal of improving the accuracy of the technology.

  • Solution

I just put in a reference to the full train since that is what is on the poster. If it overlaps the train and decode section too much, feel free to edit it.

Our solution is to run a full train and decode set, which includes 100 hours of audio and transcripts. We will run the jobs in parallel on 10 separate servers, each responsible for processing 10 hours of audio. At the end of its run, each server will produce its own acoustic model. These acoustic models will then be merged to create one improved model. This process will improve the accuracy of decoding with the speech software and advance speech recognition technology overall.
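As a rough illustration of the split described above, the sketch below divides a corpus into 10 shards of about equal audio duration, one per server. It is not the team's actual tooling: the function name `partition_by_duration`, the utterance IDs, and the uniform 360-second clip length are all hypothetical, and the step of merging the resulting acoustic models is not shown.

```python
def partition_by_duration(utterances, num_servers):
    """Greedily assign (utterance_id, seconds) pairs to servers.

    Longest utterances are placed first, each on the server that
    currently holds the least audio, so the totals stay balanced.
    """
    shards = [[] for _ in range(num_servers)]
    totals = [0.0] * num_servers
    for utt_id, seconds in sorted(utterances, key=lambda u: u[1], reverse=True):
        lightest = totals.index(min(totals))  # server with least audio so far
        shards[lightest].append(utt_id)
        totals[lightest] += seconds
    return shards, totals


if __name__ == "__main__":
    # Hypothetical corpus: 1000 clips of 360 s each, i.e. 100 hours in total.
    corpus = [("utt%04d" % i, 360.0) for i in range(1000)]
    shards, totals = partition_by_duration(corpus, 10)
    for server, total in enumerate(totals):
        hours = total / 3600.0
        print("server %d: %d utterances, %.1f hours" % (server, len(shards[server]), hours))
```

Splitting by duration rather than by file count keeps each server close to the intended 10 hours even when clip lengths vary.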

  • History of Speech

1950's - Olson and Belar of RCA Laboratories built a system to recognize 10 syllables of a single talker

1950's - Forgie and Forgie built a speaker-independent 10-vowel recognizer.

1960's - Suzuki and Nakata at the Radio Research Lab in Tokyo built a vowel recognizer

1960's - Sakai and Doshita at Kyoto University built a phoneme recognizer

1960's - NEC Laboratories built a digit recognizer

1962 - At the World's Fair in Seattle, IBM displayed a device it had built called the "Shoebox". It could recognize 16 spoken words, including the digits 0 through 9 and commands such as "plus", "minus", and "total".

Present - Leading speech recognition companies include Dragon Systems, IBM, and AT&T.

Infrastructure


  • openSUSE 11.3: A free and open source operating system built on the Linux kernel. It is developed and maintained by the openSUSE Project and offers both a command line interface and a graphical user interface (GUI).
  • Sphinx 3: An open source speech recognition toolkit developed and maintained by Carnegie Mellon University (CMU). Sphinx 3 is written in C and intended for research use.
  • CMU-Cambridge Statistical Language Modeling toolkit: A collection of UNIX tools that are used to construct and test statistical language models. This toolkit is maintained by CMU.
  • SphinxTrain: The acoustic model trainer for Sphinx. It uses the audio and transcripts in the training corpus to build models of the phones. The trainer is developed and maintained by CMU.