Speech:Scripts


Overview
/mnt/main/scripts/user is the root directory for the Capstone user-generated scripts, meaning scripts students have created over the years. The History directory is our version control: it contains one directory per script, named after the corresponding script in the root software repository. Each of those directories contains a set of directories in the format 1 2 3 4 cur, where each numbered directory holds that version of the script for reference purposes and the cur directory holds the current version of the script.

 /mnt/main/scripts/user --> Root software repository directory
 |                       History --> Software repository version control
 |                       %nameOfScript% --> Named after the script we want to keep versions of
 |                                1...cur --> Directories from 1 to cur, one per version of the script
 * Important to Note: the script located in the History/%nameOfScript%/cur directory should be identical to the one that gets called from the root software repository directory.
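
The rule above (the copy in cur must match the copy in the root) can be checked mechanically. The following is a minimal sketch, not an existing Capstone script: the function name check_sync is hypothetical, and it assumes that each History directory and the file inside its cur directory carry exactly the same name as the script in the root. It builds a throwaway example tree under a temporary directory instead of touching /mnt/main/scripts/user.

```shell
#!/bin/sh
# Sketch: verify that every script in the root matches History/<script>/cur/<script>.
# Assumption: the History directory and the file in cur share the script's name.
set -eu

root=$(mktemp -d)                 # stand-in for /mnt/main/scripts/user
trap 'rm -rf "$root"' EXIT        # clean up the example tree on exit

# Build a tiny example: one script with version 1 and a current version.
mkdir -p "$root/History/makeTrain.pl/1" "$root/History/makeTrain.pl/cur"
printf '#!/usr/bin/perl\nprint "v1\\n";\n' > "$root/History/makeTrain.pl/1/makeTrain.pl"
printf '#!/usr/bin/perl\nprint "v2\\n";\n' > "$root/History/makeTrain.pl/cur/makeTrain.pl"
cp "$root/History/makeTrain.pl/cur/makeTrain.pl" "$root/makeTrain.pl"

# check_sync ROOT (hypothetical helper): print one line per versioned script
# and return non-zero if any root copy differs from its cur copy.
check_sync() {
    status=0
    for dir in "$1"/History/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        if cmp -s "$1/$name" "$dir/cur/$name"; then
            echo "OK   $name"
        else
            echo "DIFF $name"
            status=1
        fi
    done
    return "$status"
}

check_sync "$root"
```

Because cmp only reads the files it compares, the same function could in principle be pointed at the real repository root without modifying anything, provided the naming assumption holds there.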

As of 3-5-2016, /mnt/main/scripts/user has been added to each user's login $PATH variable, allowing users to call the scripts located in that directory without typing the entire path. This was accomplished by creating a script named scriptsPath.csh in the directory /etc/profile.d. The script simply runs the following command on user login:

 set path = ($path /usr/local/bin /mnt/main/scripts/user)
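
To confirm that the login script took effect, a user can test whether the directory appears in PATH. The sketch below is written in POSIX sh rather than csh (csh and tcsh keep the path shell variable and the PATH environment variable in sync, so an sh script started from a login session sees the same value). The helper name on_path is hypothetical and not part of the scripts directory.

```shell
#!/bin/sh
# Sketch: report whether a directory is one of the colon-separated PATH entries.

# on_path DIR [PATHSTRING] (hypothetical helper): succeeds if DIR is an entry
# of PATHSTRING, which defaults to the current $PATH.
on_path() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if on_path /mnt/main/scripts/user; then
    echo "scripts directory is on PATH"
else
    echo "scripts directory is missing from PATH"
fi
```

Wrapping the value in colons before matching avoids false positives on entries that merely start with the same prefix, such as /mnt/main/scripts/user_old.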

Production

 * linkTransUtt ( Created 2017 ) Creates utterance soft links to main utterance files (for use in creation of new corpora).
 * genUttAudio ( Created 2016 ) Generates the corresponding audio utterance files given a transcript file.
 * makeCorpus ( Created 2016 ) Generates the directory structure for corpora.

Used for Training

 * makeTrain - ( Was prepareTrainExperiment.pl ) This script prepares an experiment directory for running a train, performing all steps up to genFeats.pl.
 * genTrans - ( Updated 2016 ) This script generates transcripts and wave files.
 * pruneDictionary - This script prunes the master dictionary, creating a new dictionary with only the words we are interested in. Used by the current version of makeTrain.pl.
 * genFeats - ( Updated 2016 ) This script is the second part of makeTrain.pl. It generates the feats files and changes the symbolic link of the wav directory to point to the source audio.
 * linkTransAudio ( Created 2016 ) Creates utterance soft links to main utterance files.
 * lm_create - ( Created 2014 ) This script creates a Sphinx Language Model from a text file.

Used for Decoding

 * makeTest ( Created 2016 ) Prepares the decode from a source acoustic model/train.
 * parseDecode - ( Created 2014 ) This script generates a list of hypothesis (decoded) transcripts from decode output.

Dictionary Creation/Manipulation

 * updateDict - This script takes a list of words and their corresponding pronunciations and adds them in sorted order to the dictionary.
 * find - Appears to search for a term in the cmudict.0.6d dictionary specifically.
 * dictionary - Compares the list of words in a file to the words in a dictionary and outputs the words that have pronunciations available.
 * copySph - ( Created 2014 ) This script will make symbolic links to all the required sph files that are noted in a transcript file located in a particular corpus directory.

Misc

 * createExp ( Created 2017 ) Allows the user to create a train, language model, or decode more easily. It can also run all of them in one pass, taking you through the process step by step.
 * copyExp ( Created 2017 ) Allows the user to copy a full experiment, train files, or decode files into another subdirectory, making it easy to duplicate experiments.
 * addExp ( Created 2016 ) Combines the functionality of createWikiExperiment.pl and createWiki_Sub_Experiment.pl into one script.
 * convert - ( Created 2014 ) This script makes symbolic links to all required sph files from a transcript file located in a corpus directory.
 * checkTrain - ( Created 2014 ) This script checks the training transcript against the .sph files to see if they match.
 * createTranscript - ( new for 2014 ) This script will create a transcript where the spoken dialog lasts for the amount of time specified by length_of_time.
 * createSubTranscript - ( new for 2014 ) This script extends createTranscript further by using the same algorithm we used to calculate the corpus size. It takes three arguments, the base transcript, the duration in hours and the start time in hours.
 * gen_errors - This script is used when training the acoustic model for an experiment.
 * genFileIDs - This script will generate wave file IDs for transcripts.
 * sampleTrans ( Created 2016 ) Creates sample transcripts from a given transcript.

Deprecated

 * createWikiExperiment - ( Created 2015 )( Updated 2016 ) This script grabs the next available experiment number, adds a link to the experiment page on the Speech:Exps page, and creates a wiki page for the new experiment.
 * createWiki_Sub_Experiment - ( Created 2015 )( Updated 2016 ) This script grabs the next available sub experiment number for a given experiment, adds a link to the sub experiment page on the main experiment's page, and creates a wiki page for the new sub experiment.
 * prepareTrainExperiment - ( Created for 2015 )( Name changed to makeTrain.pl ) This script prepares an experiment directory for running a train, performing all steps up to genFeats.pl.
 * buildData - ( Created 2014 )( Obsolete ) Links and copies all files needed to run a train from a corpus data subset.
 * train_01 - ( Created 2014 )( Obsolete ) This script sets up the experiment directory.
 * train_02 - ( Created 2014 )( Obsolete ) This script sets up the experiment configuration file.
 * trans_time - ( Created 2014 )( Obsolete ) This script counts the lines and durations of a transcript file.
 * mkDec ( Created 2016 )( Obsolete ) Executes a decode and creates the corresponding directories. Deprecated in favor of makeTest.pl.
 * clone_exp - ( Created 2013 ) This script will clone an experiment.
 * master_run_train - ( Created 2014 )( Obsolete ) Training Master script