Speech:Master run train.pl

Summary
Title: master_run_train.pl

Author: Joshua Anderson

Location: /mnt/main/scripts/user/master_run_train.pl

Date: Feb, 2014

Modified: Mar, 2014

NOTE: I plan to release this formally at the class meeting slated for March 19th.

Revision History:
 * Version 2.5: March 16th, 2014
 * Added the following functionality:
 * Included the last 2 steps in Running a Train:
 * Insert 'SIL' in the .phone file located in the /etc folder.
 * Run the make_feats.pl script to create the feats directory.
 * Added the ability for the script to automatically determine the next available Experiment Number when the user wants to create a new Master Experiment (/mnt/main/Exp/0200).
 * Added the ability for the script to prompt the user for the Dictionary file they want to use instead of hard-coding cmudict.0.6d. This adds significant flexibility, as this file changes often.
 * Version 2.0
 * Added functionality to create child-experiments inside one that was already created
 * Version 1.5
 * Added the ability to read all output from the Perl scripts called inside master_run_train.pl.
 * Version 1.0
 * Initial release
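
Version 2.5 inserts 'SIL' into the sorted .phone file. The following is a minimal sketch of a sorted insert on an in-memory list, not the script's own code; `insert_sorted` is a hypothetical helper name. Unlike a loop that only inserts before the first larger entry, it also handles the case where 'SIL' belongs at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $item into an already sorted list of
# strings, keeping the order. Returns the new list.
sub insert_sorted {
    my ($item, @sorted) = @_;
    my $pos = 0;
    # Advance past every entry that sorts before the new item.
    $pos++ while $pos < @sorted && $sorted[$pos] lt $item;
    splice @sorted, $pos, 0, $item;
    return @sorted;
}

# 'SIL' lands between existing entries ...
my @phones = insert_sorted('SIL', qw(AA B S T));
print "@phones\n";

# ... or at the end when nothing sorts after it.
my @at_end = insert_sorted('SIL', qw(AA B));
print "@at_end\n";
```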

Usage:

This script is built as a wizard-style program. It walks through each step and simply prompts the end user for input, rather than requiring them to call multiple scripts with confusing parameter names; the script takes the input and calls those scripts itself.
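
The wizard pattern can be sketched as a small prompt loop. This is only an illustration under assumptions: `prompt_until_valid` is a hypothetical helper that does not exist in master_run_train.pl, and the demo reads from in-memory handles instead of the terminal so it runs unattended.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of master_run_train.pl): print a prompt,
# read one line from $in, and repeat until $is_valid accepts the answer.
# Returns undef if the input ends before a valid answer arrives.
sub prompt_until_valid {
    my ($in, $out, $question, $is_valid) = @_;
    while (1) {
        print {$out} $question;
        my $answer = <$in>;
        return undef unless defined $answer;    # end of input
        chomp $answer;
        return $answer if $is_valid->($answer);
        print {$out} "Invalid input, please try again.\n";
    }
}

# Demo with in-memory handles: the first answer ('x') is rejected,
# the second ('c') is accepted.
open my $in,  '<', \"x\nc\n"       or die "Cannot open input: $!";
open my $out, '>', \my $transcript or die "Cannot open output: $!";
my $type = prompt_until_valid($in, $out,
    "MASTER or CHILD? (m/c) ", sub { $_[0] =~ /^[mc]$/ });
print "Chosen type: $type\n";
```

In the real script the handles would be STDIN and STDOUT; re-prompting on bad input is a design choice that differs from the current behaviour, which dies on an invalid answer.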

Dependencies: Scripts this master script uses that have been modified:
 * Creating a MASTER Experiment (/mnt/main/Exp/0200)
 * exp_dir_setup - Master Experiment Directory Setup script (was train_01.pl), used to set up the new Experiment directory by creating all necessary sub-folders and copying over essential Perl scripts.
 * exp_sphinx_config - Master Sphinx Configuration script (was train_02.pl), used to configure the sphinx_train.cfg file with new Experiment data and custom parameter values.
 * Creating a CHILD Experiment (/mnt/main/Exp/0200/d8/s2000)
 * child_exp_dir_setup - Child Experiment Directory Setup script (was train_01.pl), used to set up the new Experiment directory by creating all necessary sub-folders and copying over essential Perl scripts.
 * child_exp_sphinx_config - Child Sphinx Configuration script (was train_02.pl), used to configure the sphinx_train.cfg file with new Experiment data and custom parameter values.
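
The master script passes its answers to these scripts as flag/value strings (-a, -x, -p, -n). A rough sketch of that step follows; `build_arg_list` and `child_db_name` are hypothetical helper names, and the script itself builds the strings inline with repeated `.=` appends.

```perl
use strict;
use warnings;

# Hypothetical helper: turn flag/value pairs into one argument string,
# skipping flags whose value is undefined or empty (optional answers).
sub build_arg_list {
    my @pairs = @_;
    my @parts;
    while (my ($flag, $value) = splice(@pairs, 0, 2)) {
        next unless defined $value && length $value;
        push @parts, "-$flag $value";
    }
    return join ' ', @parts;
}

# Derive the CHILD DB name as the script does: replace '/' with '_' in
# the relative path and prefix the MASTER Experiment Number.
sub child_db_name {
    my ($exp_number, $child_path) = @_;
    (my $flat = $child_path) =~ tr{/}{_};
    return $exp_number . '_' . $flat;
}

my $db_name = child_db_name('0200', 'd12/s2000');
my $args    = build_arg_list(a => 1, x => '0200', p => 'd12/s2000', n => $db_name);
print "perl child_exp_dir_setup.pl $args\n";
```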

Description
This script is designed to ease the tedious process of Running a Train. Its basic premise is that, instead of calling each script with a large number of hard-to-understand arguments (e.g. -a, -e, -n, -x), the user simply enters the required values at the terminal when prompted.

Future Features
 * Add more error checking after receiving input
 * Create a fast parameter driven execution version of this script
 * i.e.
 * Include all steps
 * Insert 'SIL' in the phones file
 * Generate the feats
 * (Added in Version 1.5) Show output from the scripts called inside master_run_train.pl as it happens. For example, genTrans5.pl has a foreach loop that prints the percentage of work done after each iteration, but the only output I get back is "Status: 100% Complete", which is not what I want. Currently looking for solutions to this.
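
One possible way to show a called script's output as it happens is a piped open read line by line. This is a sketch under assumptions, not the approach the script uses (it relies on IPC::Open3 and IO::Select): `run_and_stream` is a hypothetical helper, and the child process here is a stand-in one-liner rather than genTrans5.pl. The child must also flush its own output, otherwise its lines still arrive in one batch.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and hand each output line to a
# callback as soon as it arrives. Returns the child's exit status.
sub run_and_stream {
    my ($on_line, @command) = @_;
    # List-form piped open: no shell is involved, so arguments need no quoting.
    open(my $child, '-|', @command) or die "Cannot run @command: $!";
    while (my $line = <$child>) {
        $on_line->($line);
    }
    close($child);    # waits for the child and sets $?
    return $? >> 8;
}

$| = 1;    # flush our own STDOUT after every print

# Stand-in child process (an assumption for this demo): prints three
# progress lines with its own output buffering disabled.
my @child = ($^X, '-e',
    '$| = 1; print "Status: $_% Complete\n" for (50, 75, 100);');

my @seen;
my $exit = run_and_stream(sub { push @seen, $_[0]; print $_[0] }, @child);
print "Child exit status: $exit\n";
```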

Code
NOTE: Some of the formatting is off because of how MediaWiki renders it. Version 2.5
#!/usr/bin/perl

=comment

NOTE: This script is still being built.

Title: master_run_train.pl
Author: Joshua Anderson
Date Released: Feb 22nd, 2014
Date Modified: March 16th, 2014
Version: 2.5
Summary: This script encompasses a number of scripts that go through the process of Running a Train (http://foss.unh.edu/projects/index.php/Speech:Training). It exists to simplify and streamline that process for those who have to run a significant number of scripts during the semester. Calling only this script and supplying ALL the arguments the underlying scripts require saves a good chunk of time.

=cut

use Getopt::Std;
use File::Path;
use File::Basename;
use Tie::File;
use Cwd;
use POSIX ":sys_wait_h";
use IPC::Open3;
use IO::Select;

# Global variables for this script.
$SCRIPT_NAME = "master_run_train2.pl";
$VERSION_NO = "2.5";

# TODO: Change this to the /mnt/main/Exp
# $EXPERIMENT_DIR_ABSOLUTE_PATH = "/mnt/main/home/sp14/jsm69/Exp";
$EXPERIMENT_NUMBER;
$EXPERIMENT_DIR_ABSOLUTE_PATH = "/mnt/main/Exp";

# Scripts that are called within this script.
$EXPERIMENT_DIR_SETUP_SCRIPT = "/mnt/main/scripts/user/exp_dir_setup.pl";
$EXPERIMENT_SPHINX_CONFIG_SCRIPT = "/mnt/main/scripts/user/exp_sphinx_config.pl";

$CHILD_EXPERIMENT_DIR_SETUP_SCRIPT = "/mnt/main/scripts/user/child_exp_dir_setup.pl";
$CHILD_EXPERIMENT_SPHINX_CONFIG_SCRIPT = "/mnt/main/scripts/user/child_exp_sphinx_config.pl";

# TODO: Change this pruneDictionary script to /mnt/main/scripts/train/scripts_pl/pruneDictionary2.pl when David puts it there.
# $PRUNE_DICTIONARY_SCRIPT = "/mnt/main/scripts/user/pruneDictionary2_josh.pl";
$GEN_TRANS_SCRIPT = "/mnt/main/scripts/user/genTrans5.pl";
$PRUNE_DICTIONARY_SCRIPT = "/mnt/main/scripts/train/scripts_pl/pruneDictionary2.pl";

# The example dictionary file below will be used to show end users a common dictionary file that could be used.
$EXAMPLE_DICTIONARY_FILE = "cmudict.0.6d";
$EXAMPLE_DICTIONARY_FILE_CUSTOM = "custom/cmudict.0.6d.custom";
$DICTIONARY_ROOT_PATH = "/mnt/main/corpus/dist/";
$DICTONARY_CUSTOM_ROOT_PATH = "/mnt/main/corpus/dist/custom/";
$SELECTED_DICTIONARY;

$FILLER_DICTIONARY_FILE_TO_COPY = "/mnt/main/root/tools/SphinxTrain-1.0/train1/etc/train1.filler";
$GEN_PHONES_SCRIPT_TO_COPY = "/mnt/main/scripts/user/genPhones.csh . ";
$MAKE_FEATS_SCRIPT = "/mnt/main/scripts/train/scripts_pl/make_feats.pl";

# By default, we will set this as false.
$IS_CHILD_EXPERIMENT = 0;
$CHILD_EXP_PATH;
$CHILD_EXP_DB_NAME;

# Variables to notify if User is ready to move onto next step.
$PART_2_READY = 0;
$PART_3_READY = 0;
$PART_4_READY = 0;
$PART_5_READY = 0;

# Modify this help message with something more useful.
$help = "$SCRIPT_NAME: The Training Master Script.\n";

# Get the help argument flag
getopts('h', \%my_args) or die "$help";

if($my_args{h}) {
    print "$help";
    exit;
}

# Print the Introduction header.
# The parentheses make this a sub call even though the sub is defined further down.
print_master_header();

{
    ##################################################
    # Part 1. Getting Information about this Experiment
    # We need to find out if this experiment is a new MASTER experiment OR a new CHILD experiment.
    # A MASTER experiment is a new Experiment Number created directly in the /mnt/main/Exp directory.
    # A CHILD experiment is a new Experiment created INSIDE a MASTER experiment,
    # i.e. /mnt/main/Exp/0200/d12/s2000
    #
    # The reason why someone would create a CHILD experiment would be to use the same data and models as the MASTER
    # experiment, but change the Density (d) and Senone (s) values in the sphinx_train.cfg file.
    ##################################################

    # Print Part 1 header.
    print_partone_header();

    print "Is this a new MASTER or CHILD Experiment? (type 'm' for MASTER or 'c' for CHILD) ";
    $input = <STDIN>;
    chomp $input;

    if(!(defined($input))) { die "Error: No Experiment type given."; }

    if($input eq 'm') {
        # If they want to make a new MASTER Experiment, then we can just go to Step 2 now.
        # In Part 2, we will automatically determine the next available experiment number and set our global variable.
        $PART_2_READY = 1;
    } elsif($input eq 'c') {
        $IS_CHILD_EXPERIMENT = 1;

        # Ask for the MASTER experiment this will belong to.
        print "Please enter the existing Experiment Number that this new experiment will go inside: ";
        $EXPERIMENT_NUMBER = <STDIN>;
        chomp $EXPERIMENT_NUMBER;
        # TODO: Check for bad input.

        # Ask for the relative path this new experiment will be housed in.
        print "Please enter the relative path for this new CHILD Experiment (EXACT Format: d12/s2000): ";
        $temp_child_exp_path = <STDIN>;
        chomp $temp_child_exp_path;
        # TODO: Check for bad input.

        # Build the entire relative Experiment path with given information. Should be like
        # $EXPERIMENT_NUMBER/$CHILD_EXP_PATH (0200/d12/s2000)
        $CHILD_EXP_PATH = $temp_child_exp_path;
        chomp $CHILD_EXP_PATH;

        # Build the DB Name for the CHILD Experiment to be used later in configuring the sphinx_train.cfg file.
        # Should be like: 0200_d12_s2000
        # Set $CHILD_EXP_DB_NAME by replacing the '/' in $temp_child_exp_path with '_' and prepending the $EXPERIMENT_NUMBER.
        $child_epx_path_no_slash = $temp_child_exp_path;
        $child_epx_path_no_slash =~ tr{/}{_};
        $CHILD_EXP_DB_NAME = $EXPERIMENT_NUMBER . "_" . $child_epx_path_no_slash;

        # Print all gathered information and prompt User to continue.
        print "\nBelow is information regarding your new experiment ...\n";
        print "Master Experiment number: $EXPERIMENT_NUMBER\n";
        print "Child Experiment absolute path: " . $EXPERIMENT_DIR_ABSOLUTE_PATH . "/" . $EXPERIMENT_NUMBER . "/" . $CHILD_EXP_PATH . "\n";
        print "Child Experiment db name: $CHILD_EXP_DB_NAME\n";

        # Prompt user to continue along.
        print "Please type '1' to continue: ";
        $PART_2_READY = <STDIN>;
        chomp $PART_2_READY;
    } else {
        # They entered invalid input - die.
        die "Please enter a 'm' or 'c' next time.";
    }
}

{
    ##################################################
    # Part 2. Setting up Experiment Directory
    # Scripts being used: exp_dir_setup.pl OR child_exp_dir_setup.pl
    #
    # From this moment on, we first have to check whether the user is creating a new MASTER or CHILD Experiment.
    # This matters because the information we give to sphinx_train.cfg and other important areas is totally different.
    # The code below branches on the Experiment type and adjusts accordingly.
    ##################################################

    if($PART_2_READY) {
        if($IS_CHILD_EXPERIMENT) {
            # Print Part 2 CHILD Experiment header.
            print_parttwochild_header();

            # We have all the information we need from above, so we can just call our child_exp_dir_setup.pl script to create the directory.
            # Params we need to send to child_exp_dir_setup.pl are:
            # -a // Gives the User read/write permissions to the new Experiment directory.
            # -x // MASTER Experiment Number. i.e. 0200
            # -p // Relative path for CHILD Experiment. i.e. 0200/d12/s2000
            # -n // The DB Name for the new CHILD Experiment. i.e. 0200_d12_s2000

            $child_exp_dir_setup_arg_list;
            $child_exp_dir_setup_arg_list .= "-a 1 ";
            $child_exp_dir_setup_arg_list .= "-x $EXPERIMENT_NUMBER ";
            $child_exp_dir_setup_arg_list .= "-p $CHILD_EXP_PATH ";
            $child_exp_dir_setup_arg_list .= "-n $CHILD_EXP_DB_NAME";

            chomp $child_exp_dir_setup_arg_list;

            # Run script.
            print "Executing Part 2 - perl $CHILD_EXPERIMENT_DIR_SETUP_SCRIPT $child_exp_dir_setup_arg_list\n\n";

            # This prints the setup script's output once the script has finished.
            print "Output from script: \n";
            print `perl $CHILD_EXPERIMENT_DIR_SETUP_SCRIPT $child_exp_dir_setup_arg_list\n\n`;
            # print "perl $CHILD_EXPERIMENT_DIR_SETUP_SCRIPT $child_exp_dir_setup_arg_list\n\n";

            # Prompt user to continue along.
            print "Please type '1' to continue: ";
            $PART_3_READY = <STDIN>;
            chomp $PART_3_READY;
        } else {
            # Print Part 2 MASTER Experiment header.
            print_parttwomaster_header();

            # Params we need to send to exp_dir_setup.pl are:
            # -a // Gives the User read/write permissions to the new Experiment directory.
            # -x // MASTER Experiment Number. i.e. 0200

            $exp_dir_setup_arg_list;

            # Need to find the next available Experiment Number in the /mnt/main/Exp directory.
            print "Finding next Master Experiment Number ...\n";
            $EXPERIMENT_NUMBER = get_available_exp_number();
            chomp $EXPERIMENT_NUMBER;
            print "Using Experiment Number: $EXPERIMENT_NUMBER\n";

            # If we don't find an Experiment number, die.
            if(!(defined($EXPERIMENT_NUMBER))) { die "Error: No experiment number found - please try running script again." }

            # The -a argument, which gives us full write access to our new Experiment directory, is always true (1).
            $exp_dir_setup_arg_list .= "-a 1 ";

            # Prep the argument list dynamically for the exp_dir_setup.pl script.
            if($EXPERIMENT_NUMBER) {
                # Need to make sure we have the " " after "-x $EXPERIMENT_NUMBER " so any other arguments will be formatted correctly.
                $exp_dir_setup_arg_list .= "-x $EXPERIMENT_NUMBER ";
            }

            # Clean the argument string up.
            chomp $exp_dir_setup_arg_list;

            print "Executing Part 2 - perl $EXPERIMENT_DIR_SETUP_SCRIPT $exp_dir_setup_arg_list\n\n";

            # This prints the setup script's output once the script has finished.
            print "Output from script: \n";
            print `perl $EXPERIMENT_DIR_SETUP_SCRIPT $exp_dir_setup_arg_list\n\n`;
            # print "perl $EXPERIMENT_DIR_SETUP_SCRIPT $exp_dir_setup_arg_list\n\n";

            # Prompt user to continue along.
            print "Please type '1' to continue: ";
            $PART_3_READY = <STDIN>;
            chomp $PART_3_READY;
        }
    } else {
        die "Part 2 was not accepted.";
    }

    print get_small_border() . " End Part 2 " . get_small_border() . "\n\n";
}

{
    ##################################################
    # Part 3. Configuring the Sphinx Configuration CFG File
    # Scripts being used: exp_sphinx_config.pl OR child_exp_sphinx_config.pl
    #
    # This part is configuring the sphinx_train.cfg file.
    ##################################################

    $density_value;
    $senone_value;

    if($PART_3_READY) {
        # Print Part 3 header.
        print_partthree_header();

        # Params we need to send to exp_sphinx_config.pl are:
        # -x // MASTER Experiment Number. i.e. 0200
        # -d // OPTIONAL - Density value we'd like to use.
        # -s // OPTIONAL - Senone value we'd like to use.

        $exp_sphinx_config_arg_list;

        print "Arguments needed:\n";
        print "OPTIONAL - Please enter a value to be used as the Density (MUST BE MULTIPLE OF 2): ";
        # TODO: Add some error checking here to make sure number given is multiple of 2.
        $density_value = <STDIN>;
        chomp $density_value;

        print "OPTIONAL - Please enter a value to be used as the Senone (default: 1000): ";
        # TODO: Add some error checking here if it is needed - ask Colby J.
        $senone_value = <STDIN>;
        chomp $senone_value;

        # Prep the argument list dynamically for the exp_sphinx_config.pl script.
        $exp_sphinx_config_arg_list .= "-x $EXPERIMENT_NUMBER ";
        if($density_value) {
            $exp_sphinx_config_arg_list .= "-d $density_value ";
            print "Using Density value: $density_value\n";
        }
        if($senone_value) {
            $exp_sphinx_config_arg_list .= "-s $senone_value ";
            print "Using Senone value: $senone_value\n";
        }

        if($IS_CHILD_EXPERIMENT) {
            # If it's a CHILD Experiment, we have to add the CHILD Path and the DB Name.
            $exp_sphinx_config_arg_list .= "-p $CHILD_EXP_PATH ";
            $exp_sphinx_config_arg_list .= "-n $CHILD_EXP_DB_NAME ";
            print "Using CHILD Experiment relative path: $CHILD_EXP_PATH\n";
            print "Using CHILD Experiment DB Name: $CHILD_EXP_DB_NAME\n";
        }

        # Clean up the string.
        chomp $exp_sphinx_config_arg_list;

        # Execute script.
        if($IS_CHILD_EXPERIMENT) {
            print "Executing Part 3 - perl $CHILD_EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n";
            print "Output from script: \n";
            print `perl $CHILD_EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n`;
            # print "perl $CHILD_EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n";
        } else {
            print "Executing Part 3 - perl $EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n";
            print "Output from script: \n";
            print `perl $EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n`;
            # print "perl $EXPERIMENT_SPHINX_CONFIG_SCRIPT $exp_sphinx_config_arg_list\n\n";
        }

        # Prompt user to continue along.
        print "Please type '1' to continue: ";
        $PART_4_READY = <STDIN>;
        chomp $PART_4_READY;
        print get_small_border() . " End Part 3 " . get_small_border() . "\n\n";
    }
}

{
    ##################################################
    # Part 4. Generate the Transcripts
    # Need to move to the base experiment folder we just created.
    # Scripts being used: genTrans5.pl
    #
    # Arguments:
    # REQUIRED: Transcript Dictionary name (i.e. first5_hr/train, 10hr/train, tiny/train)
    ##################################################

    $gen_trans_arg_list;
    $CORPUS_SWITCHBOARD_BASE_PATH = "/mnt/main/corpus/switchboard/";
    $corpus_subset_path;

    if($PART_4_READY) {
        # Print Part 4 header.
        print_partfour_header();

        if($IS_CHILD_EXPERIMENT) {
            # Move to the base CHILD experiment directory we are working with.
            print "Moving to our experiment directory ... \n";
            $path = "$EXPERIMENT_DIR_ABSOLUTE_PATH/$EXPERIMENT_NUMBER/$CHILD_EXP_PATH";
            chdir($path) or die "Cannot change to: $!\n";
            print "Now inside our Child Experiment directory: ";
            print(cwd() . "\n\n");

            print "Arguments needed:\n";
            print "REQUIRED - Please enter valid corpus subset train path (i.e. first5_hr/train, mini/train): ";
            # TODO: Add some error checking here to make sure the given input actually exists.
            $corpus_subset_path = <STDIN>;
            chomp $corpus_subset_path;

            # If they don't enter a Corpus Subset Path, die.
            if(!(defined($corpus_subset_path))) { die "Error: No Corpus Subset Path given!" }

            $CORPUS_SWITCHBOARD_BASE_PATH .= $corpus_subset_path;

            # Build the argument list - CHILD Experiment needs to pass the DB Name instead of ExpID
            $gen_trans_arg_list .= "$CORPUS_SWITCHBOARD_BASE_PATH $CHILD_EXP_DB_NAME";
        } else {
            # Move to the base experiment directory we are working with.
            print "Moving to our experiment directory ... \n";
            $path = "$EXPERIMENT_DIR_ABSOLUTE_PATH/$EXPERIMENT_NUMBER";
            chdir($path) or die "Cannot change to: $!\n";
            print "Now inside our Master Experiment directory: ";
            print(cwd() . "\n\n");

            print "Arguments needed:\n";
            print "REQUIRED - Please enter valid corpus subset train path (i.e. first5_hr/train, mini/train): ";
            # TODO: Add some error checking here to make sure the given input actually exists.
            $corpus_subset_path = <STDIN>;
            chomp $corpus_subset_path;

            # If they don't enter a Corpus Subset Path, die.
            if(!(defined($corpus_subset_path))) { die "Error: No Corpus Subset Path given!" }

            $CORPUS_SWITCHBOARD_BASE_PATH .= $corpus_subset_path;

            # Build the argument list.
            $gen_trans_arg_list .= "$CORPUS_SWITCHBOARD_BASE_PATH $EXPERIMENT_NUMBER";
        }

        # Execute script - same script for CHILD and MASTER Experiments.
        # Only difference is we pass the DB Name for CHILD Experiments instead of ExperimentID.
        print "Executing Part 4 - perl $GEN_TRANS_SCRIPT $gen_trans_arg_list\n\n";
        print "Output from script: \n";

        # Below is code that allows me to print all the outputs from the genTrans5.pl script as it runs.
        # Reference: http://stackoverflow.com/questions/22031169/perl-calling-other-script-and-showing-output-as-it-runs
        # TODO: Put this code in a function and pass it the script we have to call ("perl ...")

        # Forces a flush to stdout.
        $| = 1;

        # Open a process for writing, reading, and error
        my $pid = open3(0, \*READ, 0, "perl $GEN_TRANS_SCRIPT $gen_trans_arg_list");

        # Reading process output in non-blocking mode.
        my $select = new IO::Select;
        $select->add(\*READ);

        do {
            # If there's an output to read, DO IT.
            foreach my $h ($select->can_read) {
                my $buffer = "";
                sysread(READ, $buffer, 4096);
                if($buffer) { print "$buffer\n"; }
            }
            $kid = waitpid(-1, WNOHANG);
        } while $kid > -1;

        close(READ);

        print "\n" . get_small_border() . " End Part 4 " . get_small_border() . "\n\n";

        # Prompt user to continue along.
        print "Please type '1' to continue: ";
        $PART_5_READY = <STDIN>;
        chomp $PART_5_READY;
    }
}

{
    ##################################################
    # Part 5. Create the Dictionary and Insert SIL Into
    # Need to move to the /etc directory before doing anything.
    # Scripts being used: pruneDictionary2.pl and genPhones.csh
    #
    # cd etc
    # /mnt/main/scripts/train/scripts_pl/pruneDictionary2.pl _train.trans /mnt/main/corpus/dist/cmudict.0.6d .dic
    ##################################################
    if($PART_5_READY) {
        # We have to first run the pruneDictionary2.pl script.
        # Params we need to send to pruneDictionary2.pl are:
        # _train.trans OR _train.trans
        # Master Dictionary path - $MASTER_DICTIONARY
        # .dic OR .dic

        print_partfive_header();

        $prune_dict_arg_list;
        print "Setting up the argument string for pruneDictionary2.pl ...\n";
        if($IS_CHILD_EXPERIMENT) {
            $prune_dict_arg_list .= $CHILD_EXP_DB_NAME . "_train.trans ";
            print "1. Name of Transcript file: $CHILD_EXP_DB_NAME" . "_train.trans\n\n";

            # Get the Dictionary the user wants to use.
            print "We need to know which Dictionary you want to use for this experiment ... \n";
            # List the current Dictionary files inside the root corpus AND the custom corpus directories.
            print "Here are the current Dictionary files inside $DICTIONARY_ROOT_PATH\n";
            $cmd = "ls $DICTIONARY_ROOT_PATH";
            system($cmd);
            print "\n\n";
            print "Here are the current Dictionary files inside $DICTONARY_CUSTOM_ROOT_PATH\n";
            $cmd = "ls $DICTONARY_CUSTOM_ROOT_PATH";
            system($cmd);
            print "\n\n";
            print "An example name for a dictionary would be: $EXAMPLE_DICTIONARY_FILE or $EXAMPLE_DICTIONARY_FILE_CUSTOM\n";
            print "REQUIRED - Please enter the name of the Dictionary $DICTIONARY_ROOT_PATH:";
            $SELECTED_DICTIONARY = $DICTIONARY_ROOT_PATH . <STDIN>;
            chomp $SELECTED_DICTIONARY;
            $prune_dict_arg_list .= "$SELECTED_DICTIONARY ";
            print "2. Selected Dictionary file: $SELECTED_DICTIONARY\n";
            $prune_dict_arg_list .= "$CHILD_EXP_DB_NAME.dic";
            print "3. Name of new Dictionary file: $CHILD_EXP_DB_NAME.dic\n";

            # Change our directory to the /etc folder
            $path = $EXPERIMENT_DIR_ABSOLUTE_PATH . "/" . $EXPERIMENT_NUMBER . "/" . $CHILD_EXP_PATH . "/etc";
            chdir $path or die "Error: $!";
            print "Now inside directory: ";
            print(cwd() . "\n\n");
        } else {
            $prune_dict_arg_list .= $EXPERIMENT_NUMBER . "_train.trans ";
            print "1. Name of Transcript file: $EXPERIMENT_NUMBER" . "_train.trans\n\n";

            # Get the Dictionary the user wants to use.
            print "We need to know which Dictionary you want to use for this experiment ... \n";
            # Show the current Dictionary files inside the root corpus AND the custom corpus directories.
            print "Here are the current Dictionary files inside $DICTIONARY_ROOT_PATH\n";
            $cmd = "ls $DICTIONARY_ROOT_PATH";
            system($cmd);
            print "\n\n";
            print "Here are the current Dictionary files inside $DICTONARY_CUSTOM_ROOT_PATH\n";
            $cmd = "ls $DICTONARY_CUSTOM_ROOT_PATH";
            system($cmd);
            print "\n\n";
            print "An example name for a dictionary would be: $EXAMPLE_DICTIONARY_FILE or $EXAMPLE_DICTIONARY_FILE_CUSTOM\n";
            print "If you are using a Dictionary inside the custom directory, please type the name exactly like the example above - including the custom/...\n";
            print "REQUIRED - Please enter the name of the Dictionary $DICTIONARY_ROOT_PATH";
            $SELECTED_DICTIONARY = $DICTIONARY_ROOT_PATH . <STDIN>;
            chomp $SELECTED_DICTIONARY;
            $prune_dict_arg_list .= "$SELECTED_DICTIONARY ";
            print "2. Selected Dictionary file: $SELECTED_DICTIONARY\n";

            $prune_dict_arg_list .= "$EXPERIMENT_NUMBER.dic";
            print "3. Name of new Dictionary file: $EXPERIMENT_NUMBER.dic\n";

            # Change our directory to the /etc folder.
            $path = $EXPERIMENT_DIR_ABSOLUTE_PATH . "/" . $EXPERIMENT_NUMBER . "/etc";
            chdir $path or die "Error: $!";
            print "Now inside directory: ";
            print(cwd() . "\n\n");
        }

        print "\n";

        # Run pruneDictionary2.pl - reading all print outs from it.
        # $| = 1 forces a flush to stdout.
        # TODO: Put this code in a function and pass it the script we have to call ("perl ...")
        $| = 1;

        # Open a process for writing, reading, and error
        my $pid = open3(0, \*READ, 0, "perl $PRUNE_DICTIONARY_SCRIPT $prune_dict_arg_list");

        # Reading process output in non-blocking mode.
        my $select = new IO::Select;
        $select->add(\*READ);

        print "Executing pruneDictionary2.pl: $PRUNE_DICTIONARY_SCRIPT $prune_dict_arg_list\n\n";
        print "Output from script: \n";
        do {
            # If there's an output to read, DO IT.
            foreach my $h ($select->can_read) {
                my $buffer = "";
                sysread(READ, $buffer, 4096);
                if($buffer) { print "$buffer\n"; }
            }
            $kid = waitpid(-1, WNOHANG);
        } while $kid > -1;

        close(READ);

        # We now need to copy over the Filler Dictionary to our Experiment's /etc directory.
        $filler_arg;
        $gen_phones_arg;
        if($IS_CHILD_EXPERIMENT) {
            $filler_arg = $CHILD_EXP_DB_NAME . ".filler";
            $gen_phones_arg = $CHILD_EXP_DB_NAME;
        } else {
            $filler_arg = $EXPERIMENT_NUMBER . ".filler";
            $gen_phones_arg = $EXPERIMENT_NUMBER;
        }

        # Copy over the filler dictionary to the /etc directory which we're already in.
        print "Now inside directory: ";
        print(cwd() . "\n\n");

        print "Copying over the filler dictionary ...\n";
        $cmd = "cp -i $FILLER_DICTIONARY_FILE_TO_COPY $filler_arg";
        print "$cmd \n";
        system($cmd);
        print "Success!\n\n";

        # Copy over the genPhones.csh script
        print "Copying over the genPhones.csh script ...\n";
        $cmd = "cp -i $GEN_PHONES_SCRIPT_TO_COPY";
        print "$cmd \n";
        system($cmd);
        print "Success!\n\n";

        # Run the genPhones script passing gen_phones_arg.
        print "Executing genPhones.csh: ./genPhones.csh $gen_phones_arg\n";
        $cmd = "./genPhones.csh $gen_phones_arg";
        print "$cmd \n";
        system($cmd);
        print "Success!\n\n";

        print "Successfully created the Phones file located in the /etc directory.\n\n";

        # Now we need to insert 'SIL' into the generated .phone file in alphabetical order - then save it.
        print "Now inserting 'SIL' into generated .phone file inside the /etc directory ...\n";
        $phone_file_name;
        if($IS_CHILD_EXPERIMENT) {
            $phone_file_name = $CHILD_EXP_DB_NAME . ".phone";
        } else {
            $phone_file_name = $EXPERIMENT_NUMBER . ".phone";
        }

        tie my @phone_file, 'Tie::File', "$phone_file_name" or die $!;
        my $to_be_inserted = 'SIL';
        my $inserted = 0;

        for my $i (0..$#phone_file) {
            if (($phone_file[$i] cmp $to_be_inserted) == 1) {
                splice @phone_file, $i, 0, $to_be_inserted;
                $inserted = 1;
                last;
            }
        }

        # If no existing entry sorts after 'SIL', it belongs at the end of the file.
        push @phone_file, $to_be_inserted unless $inserted;
        untie @phone_file;

        print "Successfully inserted 'SIL' into the .phone file\n\n";

        # Run the make_feats.pl script.
        # Have to move to the base experiment directory.
        # /mnt/main/scripts/train/scripts_pl/make_feats.pl -ctl /mnt/main/Exp/0028/etc/0028_train.fileids
        print "Setting up to run the make_feats.pl script ...\n";
        $make_feats_args;
        $exp_fileids_file;

        if($IS_CHILD_EXPERIMENT) {
            $child_exp_path = "$EXPERIMENT_DIR_ABSOLUTE_PATH/$EXPERIMENT_NUMBER/$CHILD_EXP_PATH";
            $exp_fileids_file = "$child_exp_path/etc/$CHILD_EXP_DB_NAME" . "_train.fileids";

            # Change our directory to the base CHILD experiment directory.
            $path = "$child_exp_path";
            chdir $path or die "Error: $!";
            print "Now inside directory: ";
            print(cwd() . "\n\n");
        } else {
            $master_exp_path = "$EXPERIMENT_DIR_ABSOLUTE_PATH/$EXPERIMENT_NUMBER";
            $exp_fileids_file = "$master_exp_path/etc/$EXPERIMENT_NUMBER" . "_train.fileids";

            # Change our directory to the base MASTER experiment directory.
            $path = "$master_exp_path";
            chdir $path or die "Error: $!";
            print "Now inside directory: ";
            print(cwd() . "\n\n");
        }

        print "Using experiment train.fileids: $exp_fileids_file\n";

        $make_feats_args = "-ctl $exp_fileids_file";

        print "Executing make_feats.pl: $MAKE_FEATS_SCRIPT $make_feats_args\n\n";
        print "Output from script: \n";
        print `perl $MAKE_FEATS_SCRIPT $make_feats_args\n\n`;

        print "\n" . get_small_border() . " End Part 5 " . get_small_border() . "\n\n";
    }
}

print "You have successfully completed the configuration process of Running a Train. Please now continue along and run the RunAll.pl script\n\n";


# Sub-routines and methods used throughout the process.
# http://stackoverflow.com/questions/2972952/how-to-add-record-in-alphabetical-order - this is how to insert record based on alphabetic order.

sub get_large_border {
    return "\n==================================================\n";
}

sub get_small_border {
    return "=====================";
}

sub print_master_header {
    print get_large_border();
    print "Script Title: Training Master Script\n";
    print "Version: $VERSION_NO\n";
    print "Date Modified: 2/25/2014\n";
    print "Purpose: This script is designed to make the process of Running a Train faster and more efficient, as this process encompasses a lot of steps. Please follow the instructions for each step and carefully read which arguments it is asking for.";
    print get_large_border();
}

sub print_partone_header {
    print "\n";
    print get_small_border() . " PART 1 " . get_small_border();
    print "\n";
    print "Before we begin, we must get some simple information.\n";
    print "Terms we might ask:\n";
    print "MASTER Experiment example: /mnt/main/Exp/0200\n";
    print "CHILD Experiment example: /mnt/main/Exp/0200/d12/s2000\n";
    print "\n";
}

sub print_parttwochild_header {
    print "\n";
    print get_small_border() . " PART 2 " . get_small_border();
    print "\n";
    print "We now are going to create the new CHILD Experiment directory. We have all the input we need, so we'll let you know when it's finished and we can continue.\n";
    print "\n";
}

sub print_parttwomaster_header {
    print "\n";
    print get_small_border() . " PART 2 " . get_small_border();
    print "\n";
    print "Tasks:\n";
    print "- Create the new Experiment directory on Caesar.\n";
    print "- Prep the new Experiment directory with needed sub-folders and essential Perl scripts.\n\n";
}

sub print_partthree_header {
    print get_small_border() . " PART 3 " . get_small_border();
    print "\n";
    print "Tasks:\n";
    print "- Modify the sphinx_train.cfg Configuration file.\n";
    print " - Lines 6, 7, 8 we have to specify our Experiment number, our base Experiment directory path, and the base Experiment directory path on Caesar.\n";
    print " - Lines 79 and 80 we have to switch around which line is commented out and which one isn't. We want to use the line: CFG_HMM_TYPE = '.cont.'\n";
    print " - Line 107 we can specify a custom value for the Density.\n";
    print " - Line 120 we can specify a custom value for the Senone setting.\n\n";
}

sub print_partfour_header {
    print get_small_border() . " PART 4 " . get_small_border();
    print "\n";
    print "Tasks:\n";
    print "- Generate the Transcripts we will use for this Train.\n\n";
}


# Part 5. Create the Dictionary and Insert SIL Into
# Need to move to the /etc directory before doing anything.
# Scripts being used: pruneDictionary2.pl and genPhones.csh

sub print_partfive_header {
    print get_small_border() . " PART 5 " . get_small_border();
    print "\n";
    print "Tasks:\n";
    print "- Move to the /etc directory of current Experiment\n";
    print "- Setup the argument list for the pruneDictionary2.pl script\n";
    print "- Run pruneDictionary2.pl script\n";
    print "- Copy over the filler dictionary and genPhones.csh script to our /etc directory\n";
    print "- Run the genPhones script to generate the .phone file\n";
    print "- Insert 'SIL' into our .phone file\n";
}

sub get_available_exp_number {
    $next_exp_number;
    $exp_path = $EXPERIMENT_DIR_ABSOLUTE_PATH . "/";
    opendir(EXPDIR, $exp_path) or die "Cannot open experiment directory";

    @experiments = readdir(EXPDIR);
    closedir(EXPDIR);

    @experiments = sort(@experiments);

    # Need to strip out the non-experiment directories.
    for($i = 0; $i < scalar(@experiments); $i++) {
        $temp = $experiments[$i];
        if($temp =~ m/\D/g || $temp =~ m/^0+$/) {
            splice(@experiments, $i, 1);
            $i--;
        }
    }

    # Need to find the lowest available experiment number.
    for($i = 0; $i < scalar(@experiments); $i++) {
        if($i == scalar(@experiments) - 1) {
            # The next available number is the next number in the sequence.
            $next_exp_number = $experiments[$i];
            $next_exp_number++;
            last;
        } elsif(!(&isInArray(\@experiments, $experiments[$i] + 1))) {
            $next_exp_number = $experiments[$i];
            $next_exp_number++;
            last;
        }
    }

    return $next_exp_number;
}

# Determines if a given input is contained within an entry in a given array.
# Returns: The index of the first instance of the given item, "" otherwise.
sub isInArray {
    @arr = @{$_[0]};
    $exp = $_[1];
    my $i = 0;
    foreach(@arr) {
        if($_ == $exp) { return $i; }
        $i++;
    }
    return "";
}
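
The search for the lowest available Experiment Number can also be sketched with a hash lookup instead of repeated array scans. This is a simplified illustration, not the script's implementation: `next_experiment_number` is a hypothetical name, it takes a list of directory entry names rather than reading /mnt/main/Exp itself, and the four-digit zero padding is an assumption based on names such as 0200.

```perl
use strict;
use warnings;

# Hypothetical helper: given directory entry names, return the first
# unused experiment number after the smallest existing one, zero-padded
# to four digits (padding width is an assumption).
sub next_experiment_number {
    my @names = @_;
    # Keep purely numeric, non-zero names and index them by numeric value.
    my %used = map { ($_ + 0, 1) } grep { /^\d+$/ && $_ > 0 } @names;
    return sprintf('%04d', 1) unless %used;
    my ($candidate) = sort { $a <=> $b } keys %used;
    $candidate++ while $used{$candidate};
    return sprintf('%04d', $candidate);
}

# 0003 is the first gap in this listing.
print next_experiment_number(qw(. .. 0001 0002 0004 notes)), "\n";
```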