Speech:Readings

Related Readings
Academic paper describing functional and performance differences between Sphinx 3.6 and Sphinx 4.0.
 * [[Media:SphinxVerDifferences.pdf]]

Academic paper discussing baseline scores for speech recognition on speech transmitted over telephone networks, and the sources of degradation.
 * [[Media:Paper1.pdf]]

Academic paper describing changes made within Sphinx 3.X for improved efficiency.
 * [[Media:Paper3.pdf]]

Academic paper discussing speech recognition experiments using Sphinx 3.X.
 * [[Media:Paper4.pdf]]

Academic paper that discusses converting speech to digital math equations.
 * [[Media:Speech_Recognition_of_Math_Equations.pdf]]

Academic paper that proposes a new framework by refactoring Sphinx4 in a service-oriented computing style.
 * [[Media:Sphinx4_Service-oriented_Computing.pdf]]

Academic paper that discusses Learning-Based Auditory Encoding for Robust Speech Recognition.
 * [[Media:Learning-Based_Auditory_Encoding.pdf]]

Academic paper that discusses Recent Advances in Speech Recognition.
 * [[Media:advancesinspeechrecog.pdf]]

Academic paper describing a speech recognition architecture. The second page has a technical description of the Sphinx decoder.
 * [[Media:Sphinx_architecture.pdf]]

Academic paper showing how using an MWF (filter) during voice recording improves speech recognition performance. Uses Sphinx 4 to test the difference.
 * [[Media:Sphinx_noise_test.pdf]]

CMU article on Sphinx 3 vs Sphinx 4 performance comparison.
 * link here

Catalog page with more information on the Switchboard audio data: https://catalog.ldc.upenn.edu/LDC97S62 (current as of 3/26/2018)

IBM reports a word error rate (WER) of 5.5%.
 * https://www.ibm.com/blogs/watson/2017/03/reaching-new-records-in-speech-recognition/
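
For readers new to the metric, WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The following is only a minimal sketch of that standard definition; the function name `word_error_rate` is hypothetical, and the text normalization is an assumption (plain whitespace splitting), so it will not reproduce the scoring used in the IBM article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization only
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.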

The IBM article noted that most of the speakers in its training data set also appeared in its test data sets. Two papers take differing positions on whether this can be regarded as cheating (and contain other useful information on speech recognition in general):

 * https://arxiv.org/pdf/1708.08615.pdf
 * https://arxiv.org/pdf/1703.02136.pdf
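
When building our own train/test splits, the same question can be checked directly. The sketch below is an illustration only: the function name `speaker_overlap` and the speaker IDs are hypothetical, and it assumes a list of speaker IDs is already available for each split (how those IDs are extracted from the corpus is not covered here).

```python
def speaker_overlap(train_speakers, test_speakers):
    """Return the speakers shared by both splits and the fraction of
    test speakers that also appear in the training split."""
    train = set(train_speakers)
    test = set(test_speakers)
    shared = train & test
    fraction = len(shared) / len(test) if test else 0.0
    return shared, fraction


if __name__ == "__main__":
    # Hypothetical speaker IDs, for illustration only.
    train_ids = ["spk1001", "spk1002", "spk1003"]
    test_ids = ["spk1002", "spk1004"]
    shared, fraction = speaker_overlap(train_ids, test_ids)
    print(sorted(shared), fraction)
```

A fraction near zero means the test speakers are mostly unseen in training; a high fraction means results may partly reflect speaker familiarity.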