Speech:Readings

From Openitware


Project Notes


Related Readings

Academic paper describing functional and performance differences between Sphinx 3.6 and Sphinx 4.0


Academic paper discussing baseline scores for speech recognition when speech is transmitted over telephone networks, and the sources of degradation.


Academic paper describing changes made within Sphinx 3.X for improved efficiency.


Academic paper discussing speech recognition experiments using Sphinx 3.X


Academic paper that discusses converting speech to digital math equations.


Academic paper that proposes a new framework by refactoring Sphinx4 in a service-oriented computing style.


Academic paper that discusses Learning-Based Auditory Encoding for Robust Speech Recognition.


Academic paper that discusses Recent Advances in Speech Recognition.


Academic paper describing a speech recognition architecture. Second page has a technical Sphinx decoder description.


Academic paper showing how using an MWF (filter) during voice recording improves speech recognition performance. Uses Sphinx 4 to test the difference.


CMU article on Sphinx 3 vs Sphinx 4 performance comparison.


Catalog page with more information on the Switchboard audio data: https://catalog.ldc.upenn.edu/LDC97S62 (current as of 3/26/2018)


IBM reports a word error rate (WER) of 5.5%.
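
For readers new to the metric: WER (word error rate) is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The sketch below illustrates that arithmetic in Python. It is not IBM's scoring procedure; the function name and the sample sentences are made up for illustration, and real scoring tools also normalize case and punctuation, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only: one of six reference words is dropped.
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.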


The IBM article noted that most of the speakers in their training data set were also in their testing data set. Two papers had differing opinions as to whether this could be regarded as cheating (among other useful information on speech recognition in general):

Helpful Links

Overview of ASR (Automatic Speech Recognition) https://www.youtube.com/watch?v=q67z7PTGRi8&feature=youtu.be

The main guide for the CMU Sphinx3 Toolkit. You will refer to this site throughout the semester. https://cmusphinx.github.io/wiki/tutorial/

A guide to regular expressions https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

The closest thing to an online textbook for Sphinx3 http://www.cs.cmu.edu/~archan/documentation/sphinxDocDraft3.pdf

A huge and valuable collection of links about Sphinx3 http://www.speech.cs.cmu.edu/sphinxman/fr4.html

The second half of this document is a useful Sphinx3 tutorial http://www.cs.cmu.edu/~archan/documentation/sphinxDocDraft3.pdf

http://www.cs.cmu.edu/~archan/s_info/Sphinx3/doc/s3_description.html#sec_dec_prune

The CMU Pronouncing Dictionary tool. It gives you the phonemes and lexical stresses for a word http://www.speech.cs.cmu.edu/cgi-bin/cmudict
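
To show what "phonemes and lexical stresses" look like in practice, here is a minimal Python sketch that splits one dictionary entry into its phonemes and stress marks. It assumes the plain-text dictionary format, where a word is followed by ARPAbet symbols and vowels carry a trailing digit (0 = no stress, 1 = primary, 2 = secondary). The function name is hypothetical, and the web tool's own output layout may differ.

```python
import re


def parse_cmudict_line(line: str):
    """Split one entry into (word, [(phoneme, stress_or_None), ...])."""
    word, *symbols = line.split()
    phones = []
    for symbol in symbols:
        # A symbol is uppercase letters, optionally followed by a stress digit 0-2.
        match = re.fullmatch(r"([A-Z]+)([0-2])?", symbol)
        if match is None:
            raise ValueError(f"unexpected symbol: {symbol!r}")
        phone, stress = match.groups()
        phones.append((phone, int(stress) if stress is not None else None))
    return word, phones


if __name__ == "__main__":
    # Sample entry written in the assumed plain-text format.
    word, phones = parse_cmudict_line("SPEECH  S P IY1 CH")
    for phone, stress in phones:
        label = "consonant" if stress is None else f"vowel, stress {stress}"
        print(f"{word}: {phone} ({label})")
```

Only vowels carry a stress digit, so consonants come back with a stress of `None`.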

Udemy course on Perl https://www.udemy.com/learn-perl-in-just-7-days/

An online text editor for testing Perl scripts http://rextester.com/l/perl_online_compiler