Speech:MakeTrain.pl

Summary
Title: makeTrain.pl
Author: David Meehan
Updates: v2-v5 performed by Matthew Heyner, Sp16
Location: /mnt/main/scripts/user/
Usage: makeTrain.pl [-d] corpus corpus_dir
Examples:
makeTrain.pl [-d] switchboard 30hr/test
makeTrain.pl switchboard 30hr/train

Description
This script automates experiment creation up to feats generation. It does not update the senone value, density, or other trainer parameters. Audio files are symlinked rather than copied, which reduces the disk space each experiment needs. The script also uses the newest versions of genTrans and pruneDictionary, which drastically improve the performance of generating the transcript and dictionary files.
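The space saving from symlinks can be illustrated with a minimal sketch. This is not the code genTrans uses: the helper name link_audio and the directory arguments are assumptions made for illustration. It links every .sph file found in a source directory into a destination directory instead of copying it.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of makeTrain.pl or genTrans):
# create a symlink in $dst_dir for every .sph file in $src_dir.
# Returns the number of links created.
sub link_audio {
    my ($src_dir, $dst_dir) = @_;

    opendir(my $dh, $src_dir) or die "Cannot open $src_dir: $!\n";
    my @files = sort grep { /\.sph\z/ } readdir($dh);
    closedir($dh);

    my $count = 0;
    for my $name (@files) {
        # Use an absolute target so the link works from any directory.
        my $src = File::Spec->rel2abs(File::Spec->catfile($src_dir, $name));
        my $dst = File::Spec->catfile($dst_dir, $name);

        next if -e $dst || -l $dst;    # skip names that already exist
        symlink($src, $dst) or die "Cannot link $src to $dst: $!\n";
        $count++;
    }
    return $count;
}
```

A call such as link_audio($audio_dir, "wav") would populate the experiment's wav directory with links, so each experiment stores only link entries rather than a second copy of the audio.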

Code
#!/usr/bin/perl
use Cwd;
use File::Basename;

=begin comment

makeTrain.pl (was prepareTrainExperiment.pl)
Semester: Spring 2016
Start Date: 4/10/2016
Last Modified: 4/14/2016

Recent Changes:
- Updated help information

=cut

# Expect two positional arguments, optionally preceded by one flag.
if (($#ARGV != 2) and ($#ARGV != 1)) {
    print "Usage: makeTrain.pl [-d] corpus corpus_dir\n";
    print "Example #1: makeTrain.pl [-d] switchboard 30hr/test\n";
    print "Example #2: makeTrain.pl switchboard 30hr/train\n";
    print "Information: Run from the main experiment directory you wish to make a train for.\n";
    print "Flag -d: Points to the /trans/dev.trans\n";
    print "Flag -e: Points to the /trans/eval.trans\n";
    print "Flag -t: Points to the /trans/train.trans\n";
    exit -1;
}

if ($#ARGV == 2) {
    $flag       = $ARGV[0];    # set flag
    $corpus     = $ARGV[1];    # set corpus
    $corpus_dir = $ARGV[2];    # set corpus directory
} else {
    $corpus     = $ARGV[0];    # set corpus
    $corpus_dir = $ARGV[1];    # set corpus directory
}

$corpus_path = "/mnt/main/corpus/$corpus";

$path = getcwd;
$exp = basename($path);
print "$exp\n";

print "Creating directory structure...\n";
$cmd = "/mnt/main/root/tools/SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl -task $exp > /dev/null";
system($cmd);
$cmd = "/mnt/main/root/sphinx3/scripts/setup_sphinx3.pl -task $exp > /dev/null";
system($cmd);
print "Done!\n";

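print "Modifying sphinx_train.cfg...\n";
# Mark the currently active CFG_HMM lines with a TEMP prefix.
$cmd = "sed -i s/^.CFG_HMM/TEMP/g etc/sphinx_train.cfg";
system($cmd);
# Uncomment the CFG_HMM lines that were commented out.
$cmd = "sed -i s/\#.CFG_HMM/\\\$CFG_HMM/g etc/sphinx_train.cfg";
system($cmd);
# Replace the template path with this experiment's directory.
$cmd = "sed -i s#/root/speechtools/SphinxTrain\-1\.0/train1#$path#g etc/sphinx_train.cfg";
system($cmd);
# Rename the remaining train1 references to this experiment.
$cmd = "sed -i s/train1/$exp/g etc/sphinx_train.cfg";
system($cmd);
# Comment out the lines that were marked TEMP in the first step.
$cmd = "sed -i s/^TEMP/\#TEMP/g etc/sphinx_train.cfg";
system($cmd);
print "Done!\n";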

$out = "data";
$filename = "etc/.train_info";
# prefix is the exp_id

# Recreate an empty wav directory.
$cmd = "rm wav/*.sph; rmdir wav";
system($cmd);
if ($? == -1) {
    print "Failed!\n";
    exit 1;
}
$cmd = "mkdir wav";
system($cmd);
if ($? == -1) {
    print "Failed!\n";
    exit 1;
}
print "Done!\n";

print "Preparing data input files...\n";

print "Generating transcript file and linking utterance files...\n";
$cmd = "/mnt/main/scripts/user/History/genTrans/15/genTrans.pl $flag $corpus $corpus_dir $exp";
system($cmd);
#if($flag eq '') {
#  $cmd = "/mnt/main/scripts/user/History/genTrans/15/genTrans.pl $corpus $corpus_dir $exp";
#  system($cmd);
#} else {
#}

print "Generating dictionary file...\n";
$cmd = "/mnt/main/scripts/user/pruneDictionary.pl ". $corpus_path. " etc/". $exp. "_train.trans ". $exp;
system($cmd);
print "Done!\n";

print "Generating filler dictionary...\n";
$cmd = "cp -i /mnt/main/root/tools/SphinxTrain-1.0/train1/etc/train1.filler etc/$exp.filler";
system($cmd);
$cmd = "echo '\[NOISE\] \+noise\+\n\[LAUGHTER\] \+laugh\+\n\[VOCALIZED-NOISE\] \+vocalized\+' >> etc/$exp.filler";
system($cmd);
print "Done!\n";

print "Generating phones list...\n";
$cmd = "cp -i /mnt/main/scripts/user/genPhones.csh etc/. ";
system($cmd);
$cmd = "etc/genPhones.csh etc/$exp";
system($cmd);
$cmd = "echo 'SIL\n\+laugh\+\n\+noise\+\n\+vocalized\+' >> etc/$exp.phone";
system($cmd);
$cmd = "sort etc/$exp.phone -o etc/$exp.phone";
system($cmd);

print "Done!\n";
print "Preparation complete!\n";
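
The script tests for $? == -1 after some of its system calls. According to the Perl documentation, that value only means the command could not be started; a command that runs and then fails reports its exit code in the high byte of $? instead. The following is a minimal sketch of a fuller check. The helper name run_cmd is hypothetical and is not part of makeTrain.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its exit code.
# Dies if the command could not be started or was killed by a signal.
sub run_cmd {
    my ($cmd) = @_;

    system($cmd);

    if ($? == -1) {
        # The program could not be started at all.
        die "Failed to execute '$cmd': $!\n";
    }
    if ($? & 127) {
        # The low 7 bits hold the signal number that ended the command.
        die sprintf("'%s' died with signal %d\n", $cmd, $? & 127);
    }
    return $? >> 8;    # the high byte holds the command's exit code
}

# Usage: treat any non-zero exit code as a failure.
my $code = run_cmd("sh -c 'exit 3'");
print "Command failed with exit code $code\n" if $code != 0;
```

With a helper like this, a step such as mkdir wav could stop the script with a non-zero exit status when the directory cannot be created, rather than continuing silently.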