Speech:Create LM



Create the Language Model
Read the following before starting:


 * 1) Replace all instances of:  with your experiment number!
    * Experiment numbers are 4 digits long (including any leading zeros), ranging from 0001 to 9999.
    * Do not include the '<' or '>'.
 * 2) Similarly, replace all items enclosed in < and > with the appropriate text.
    * Usually it's a filename/path.
    * Do not include the '<' or '>'.
 * 3) Pay attention to which directory you execute scripts in!
    * Certain scripts need to be executed in specific directories.
 * 4) DO copy and paste commands from this page, but do NOT copy and paste multiple commands at once.
    * Most commands/scripts on this page need information specific to your experiment added to them. If you paste multiple commands into the terminal at once without adding this information, bad things may result.
 * 5) Percent signs (%) indicate a command to be executed in the shell.
    * Leave them out when copying a command from this page.
 * 6) Do NOT execute any of the following commands as root.
    * While it won't cause any of the consequences listed below, it does mess up the permissions for any directories and files created during the process.
       * This effectively blocks others from accessing the data derived from the experiment, which isn't a very nice thing to do.


 * Please note:
    * The Base Experiment directory is specific to each experiment, and refers to
    * The Root Experiment directory is generic to all experiments, and refers to


 * Failure to pay heed to the above may result in:


 * 1) At best: Script failure.
 * 2) At worst: Data deletion.
 * 3) Very annoyingly: Will create a mess.
 * 4) But most annoyingly: Will create a mess in a publicly used directory such as /mnt/main/Exp.
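Two of the rules above (a 4-digit experiment number from 0001 to 9999, and never running as root) can be checked before you start. The following is a minimal POSIX shell sketch; the function names `exp_number_ok` and `not_root` are hypothetical and are not part of the course scripts.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the course scripts.

# Succeeds only if $1 is exactly 4 digits and not 0000 (i.e. 0001 to 9999).
exp_number_ok() {
    case "$1" in
        [0-9][0-9][0-9][0-9]) [ "$1" != "0000" ] ;;
        *) return 1 ;;
    esac
}

# Succeeds if the current user is not root; otherwise prints a warning.
not_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "Do not run these steps as root." >&2
        return 1
    fi
    return 0
}
```

For example, `exp_number_ok 0042 && not_root` succeeds for a normal user, while `exp_number_ok 42` fails because the leading zeros are missing.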

Steps for Creating the Language Model
September 6th (Cedric Woodbury): Major changes were made to the entire process during the Summer 2012 semester. To see the new revised process, click here.

March 22, 2013 (Eric Beikman): The following instructions are current.


 * Set up the Language Model folder and copy over the unedited transcript.
 * 1) From your Base Experiment folder, make a folder called LM.
    * % mkdir LM
 * 2) Go into this new directory.
    * % cd LM
 * 3) Copy over the transcript from the corpus directory. Put the corpus path you used when creating your transcript (with genTrans.pl) in front of /train/trans/train.trans:
    * % cp -i /train/trans/train.trans trans_unedited
    * For example, if we are using the 30hr/train corpus:
       * % cp -i /mnt/main/corpus/switchboard/30hr/train/trans/train.trans trans_unedited
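Steps 1 to 3 can also be wrapped in a small shell function. This is a hypothetical helper (`setup_lm` is not one of the course scripts), assuming the transcript lives at the corpus path followed by /train/trans/train.trans, as in the example above. It uses `mkdir -p` so it can be rerun safely, and it works in a subshell, so afterwards you still need to `cd` into the LM folder yourself.

```shell
#!/bin/sh
# setup_lm: hypothetical helper mirroring the manual steps above.
#   $1 = your Base Experiment directory
#   $2 = the corpus path you used with genTrans.pl
#        (e.g. /mnt/main/corpus/switchboard/30hr)
setup_lm() {
    base_dir=$1
    corpus=$2
    src="$corpus/train/trans/train.trans"

    # Refuse to continue if the transcript is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "Transcript not found: $src" >&2
        return 1
    fi

    # Subshell: the caller's working directory is left unchanged.
    (
        cd "$base_dir" || exit 1
        mkdir -p LM || exit 1
        cd LM || exit 1
        # -i: prompt before overwriting an existing trans_unedited
        cp -i "$src" trans_unedited
    )
}
```

Usage, with your own paths filled in: `setup_lm <Base Experiment directory> /mnt/main/corpus/switchboard/30hr`.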


 * Prepare the transcript and execute the script that will build the language model.
 * 1) Prepare the transcript:
    * % parseLMTrans.pl trans_unedited trans_parsed
 * 2) Execute the script:
    * % lm_create.pl trans_parsed
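The two commands above can be chained so the build stops early if an input is missing. This is a hypothetical wrapper (`build_lm` is not a course script); it only assumes that `parseLMTrans.pl` and `lm_create.pl` are on your PATH and that `parseLMTrans.pl` writes the file named by its second argument. It does not check what `lm_create.pl` produces.

```shell
#!/bin/sh
# build_lm: hypothetical wrapper around the two commands above.
# Assumes you are inside the LM directory that holds trans_unedited.
build_lm() {
    # Certain scripts must run in specific directories; check we are in LM.
    if [ "$(basename "$PWD")" != "LM" ]; then
        echo "Run this from inside the LM directory." >&2
        return 1
    fi
    if [ ! -s trans_unedited ]; then
        echo "trans_unedited is missing or empty." >&2
        return 1
    fi
    parseLMTrans.pl trans_unedited trans_parsed || return 1
    # Stop before building if parsing produced nothing.
    if [ ! -s trans_parsed ]; then
        echo "trans_parsed is missing or empty." >&2
        return 1
    fi
    lm_create.pl trans_parsed
}
```

Usage: from inside your LM folder, run `build_lm`.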

The Language Model has been created. Move on to Speech:Run_Decode.