Speech:Exps 0257 018

<< 0257 Marcel NOAA exp

In this experiment I build a statistical language model using CMUCLMTK (the CMU-Cambridge Statistical Language Modeling Toolkit).

The starting point was the corrected translation.txt file, located in /mnt/main/corpus/noaa/full/trans.


The steps below generate a new language model.

 * Generate the vocabulary file. This is a list of all the words in the file:

    text2wfreq < /mnt/main/corpus/noaa/full/trans/translation.txt | wfreq2vocab > noaa.vocab

 * Generate the ARPA-format language model with the commands:

    text2idngram -vocab noaa.vocab -idngram noaa.idngram < transcript.txt
    idngram2lm -vocab_type 0 -idngram noaa.idngram -vocab noaa.vocab -arpa noaa.arpa

 * Generate the CMU binary form (DMP):

    sphinx_lm_convert -i noaa.arpa -o noaa.lm.DMP
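
To make the vocabulary step more concrete, here is a rough stand-in built only from standard Unix tools. It is a sketch of the idea (count each word, then list the distinct words), not how text2wfreq or wfreq2vocab are implemented, and the real tools may differ in details such as header lines or limits on vocabulary size. The file sample.txt and its two lines are made up for illustration.

```shell
#!/bin/sh
# Rough illustration of a word-frequency list and a vocabulary file.
# NOT the CMUCLMTK implementation; sample.txt is a hypothetical input.
LC_ALL=C
export LC_ALL

# Create a tiny made-up transcript (two utterances).
printf '%s\n' 'WINDS NORTHWEST TEN KNOTS' 'WINDS WEST FIVE KNOTS' > sample.txt

# Put one word per line, drop empty lines, then count each distinct word.
# Output format here is "WORD COUNT", one pair per line.
tr -s ' ' '\n' < sample.txt | sed '/^$/d' | sort | uniq -c \
    | awk '{print $2, $1}' > sample.wfreq

# The vocabulary is the sorted list of distinct words.
cut -d' ' -f1 sample.wfreq | sort > sample.vocab

cat sample.vocab
```

Comparing a list produced this way against noaa.vocab can serve as a quick sanity check that the vocabulary contains the words expected from the transcript.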