Speech:Models Data Prep

From Openitware
Revision as of 08:41, 10 April 2012 by Cpc2 (Talk | contribs)

Class Notes

Speech System Model Building

  1. [Data Preparation]
  2. Language Modeling
  3. Building & Verifying Models

Data Preparation Steps

  • To run the Train and Decode steps, all of the required files must be in place. The three main things we need are the actual audio files in .SPH format, a transcript of those files, and a working dictionary that lists each current word together with its phonetic spelling (its pronunciation).
  • You can find a current copy of the transcripts on Caesar under caesar:/media/data/Switchboard/disk1/swb1.
  • A copy of the transcripts is also currently under caesar:~/speechtools/SphinxTrain-1.0/train1/etc, in the file called trans_uneditied.
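
Since the transcript and the dictionary have to agree before training, one useful sanity check is to list every transcript word that has no dictionary entry. The following is only a minimal sketch under stated assumptions, not the procedure used in class: it assumes transcript lines of the form `<s> words </s> (utterance_id)` and dictionary lines of the form `WORD PH1 PH2 ...`, with alternate pronunciations marked as `WORD(2)`. The actual layout of trans_uneditied may differ, and the function names and sample data here are hypothetical.

```python
import re


def dictionary_words(lines):
    """Collect the words defined by dictionary lines like 'WORD PH1 PH2 ...'."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            # Assumed convention: alternate pronunciations look like 'WORD(2)'.
            words.add(re.sub(r"\(\d+\)$", "", parts[0]).upper())
    return words


def transcript_words(lines):
    """Collect the words used in transcript lines, ignoring markers and IDs."""
    words = set()
    for line in lines:
        # Assumed convention: an optional trailing '(utterance_id)'.
        text = re.sub(r"\([^()]*\)\s*$", "", line)
        for token in text.split():
            if token not in ("<s>", "</s>"):
                words.add(token.upper())
    return words


def missing_words(transcript_lines, dictionary_lines):
    """Return the sorted transcript words that have no dictionary entry."""
    return sorted(transcript_words(transcript_lines) - dictionary_words(dictionary_lines))


# Hypothetical sample data, used only to illustrate the check.
sample_transcript = [
    "<s> hello world </s> (utt_0001)",
    "<s> hello caesar </s> (utt_0002)",
]
sample_dictionary = [
    "HELLO HH AH L OW",
    "HELLO(2) HH EH L OW",
    "WORLD W ER L D",
]

print(missing_words(sample_transcript, sample_dictionary))  # prints ['CAESAR']
```

To run the same check on real files, the two lists could be replaced by the lines read from the transcript and dictionary files. Any word the check reports would need either a dictionary entry or a correction in the transcript.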