Speech:SampleTrans.pl

=Summary=
Title: sampleTrans.pl
Authors: Jon Shallow -- Modeling Group SP16
Location: /mnt/main/scripts/user/
Usage: sampleTrans.pl [-r] <sample_rate> <transcript>
 * Example: perl /mnt/main/scripts/user/sampleTrans.pl -r 100 train/train.trans

=Description=
This script creates samples of a transcript file.

Parameters:
 * Optional
 ** -r
 *** "remove". Takes every nth line from the transcript, where n = <sample_rate>, and places those lines in a file called train.trans-sampled. The remaining lines are placed in a file called train.trans-remaining. The original transcript file is not modified.
 * Required
 ** <sample_rate>
 *** An integer that sets the rate at which the transcript file is sampled. Every nth line of the original transcript is sampled, where n = <sample_rate>. Example: "sampleTrans.pl 100 train.trans" copies every 100th line (without removal) from train.trans into train.trans-sampled.
 ** <transcript>
 *** Path to a transcript file.
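The behaviour of the two modes can be sketched in a few lines of Perl. This is an illustration only, not the script itself: the helper name sample_file is hypothetical, lines are counted from 1 as in the description above, and the output names are derived from the input path (an assumption; the script below always writes its sample to train.trans-sampled in the current directory).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, not part of sampleTrans.pl: copy every $rate-th line of
# $path to "$path-sampled". When $remove is true, also copy all other lines
# to "$path-remaining". The input file itself is never modified.
sub sample_file {
    my ($path, $rate, $remove) = @_;
    open my $in,  '<', $path           or die "Cannot open $path: $!\n";
    open my $out, '>', "$path-sampled" or die "Cannot write $path-sampled: $!\n";
    my $rest;
    if ($remove) {
        open $rest, '>', "$path-remaining"
            or die "Cannot write $path-remaining: $!\n";
    }
    my $line_no = 0;
    while (my $line = <$in>) {
        $line_no++;                       # 1-based line number
        if ($line_no % $rate == 0) {
            print {$out} $line;           # every nth line
        } elsif ($rest) {
            print {$rest} $line;          # the complement, only with "remove"
        }
    }
    close $in;
    close $out;
    close $rest if $rest;
    return;
}

# Demo on a throwaway 7-line transcript in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/train.trans";
open my $fh, '>', $file or die "Cannot create $file: $!\n";
print {$fh} "utterance $_\n" for 1 .. 7;
close $fh;

sample_file($file, 3, 1);    # comparable to: sampleTrans.pl -r 3 train.trans
```

With a rate of 3 on seven lines, the sampled file holds lines 3 and 6 and the remaining file holds the other five.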

Intent: The intent of the script is to allow command line sampling in order to create test/dev.trans, test/eval.trans, test/train.trans, and train/train.trans while creating new corpora.

From your corpus/info/misc folder (with a train.trans in it), run as follows:
 * Run "perl /mnt/main/scripts/user/sampleTrans.pl -r <sample_rate> train.trans"
 * This will create train.trans-sampled (every nth line) and train.trans-remaining (the remainder)
 * Rename train.trans-sampled to dev.trans and move it into the /test/trans directory
 * Rename train.trans to train.trans-orig1 (archiving the untouched train.trans file)
 * Rename train.trans-remaining to train.trans (this lets us repeat the sampling on a transcript that has the dev.trans lines removed from it)
 * Run "perl /mnt/main/scripts/user/sampleTrans.pl -r <sample_rate> train.trans"
 * This will create train.trans-sampled (every nth line) and train.trans-remaining (the remainder)
 * Rename train.trans-sampled to eval.trans and move it into the /test/trans directory
 * Rename train.trans to train.trans-orig2 (archiving the train.trans file from the previous step)
 * Rename train.trans-remaining to train.trans (this lets us repeat the sampling on a transcript that has the dev.trans and eval.trans lines removed from it)
 * Run "perl /mnt/main/scripts/user/sampleTrans.pl <sample_rate> train.trans" (no -r here)
 * This will create a train.trans-sampled file; no train.trans-remaining file will be created
 * Move train.trans-sampled to /test/utt/trans and rename it train.trans
 * Copy /info/misc/train.trans to /train/trans/train.trans (this is the transcript remaining after all our samples; it is what we will use for the trains)
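The effect of the steps above on the data can be sketched on an in-memory transcript. This is a rough illustration under stated assumptions: the helper names split_every_nth and build_splits are hypothetical, the same sample rate is used for all three passes (the steps do not say which rates to use), and lines are counted from 1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (every nth line, all other lines), 1-based.
sub split_every_nth {
    my ($n, @lines) = @_;
    my (@picked, @rest);
    for my $i (0 .. $#lines) {
        if (($i + 1) % $n == 0) {
            push @picked, $lines[$i];
        } else {
            push @rest, $lines[$i];
        }
    }
    return (\@picked, \@rest);
}

# Hypothetical driver that mirrors the three passes on a list of lines.
sub build_splits {
    my ($rate, @train) = @_;
    my ($dev,  $after_dev)  = split_every_nth($rate, @train);        # first -r pass
    my ($eval, $after_eval) = split_every_nth($rate, @$after_dev);   # second -r pass
    my ($seen)              = split_every_nth($rate, @$after_eval);  # pass without -r
    return {
        dev        => $dev,         # becomes dev.trans
        eval       => $eval,        # becomes eval.trans
        test_train => $seen,        # test copy; these lines stay in training
        train      => $after_eval,  # final train.trans
    };
}

my $splits = build_splits(10, map { "utt$_" } 1 .. 100);
printf "%-10s %d lines\n", $_, scalar @{ $splits->{$_} }
    for qw(dev eval test_train train);
```

For a 100-line transcript and a rate of 10, this gives 10 dev lines, 9 eval lines, 8 test lines sampled without removal, and 81 lines left for training. The dev and eval lines never appear in the training set, while the final sample is deliberately drawn from it.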

=Code=
 #!/usr/bin/perl
 use Getopt::Long; # Getopt::Long implements the extended getopt function GetOptions

 my $r = ''; # optional command-line flag, defaults to false
 my $h = ''; # optional command-line flag, defaults to false
 GetOptions ('r' => \$r, 'h' => \$h) or die ("Error in command line arguments\n");

 # Check the number of arguments; it must equal 2
 if (scalar @ARGV != 2) {
     die("Invalid argument, requires 2 arguments!\n");
 }

 $samples = $ARGV[0];
 $transcript = $ARGV[1];

 if ($r) {
     # The -r option creates a -remaining file (e.g. train.trans-remaining) that
     # leaves out the sampled utterances from the $transcript file.
     # IT DOES NOT ALTER THE $TRANSCRIPT FILE; the altered transcript is
     # written to $transcript-remaining.
     # Open $transcript for reading
     open FIN, "<", $transcript or die ("Can not open transcript file!\n");
     # Open two output files: train.trans-sampled (every nth line, where n = $samples)
     # and $transcript-remaining, which holds the remainder.
     open SAMPLED, ">", "train.trans-sampled" or die ("Can not open or create train.trans-sampled\n");
     open REMAIN, ">", "$transcript-remaining" or die ("Can not open or create $transcript-remaining!\n");
     $i = 0;
     while (my $entry = <FIN>) {
         if ($i % $samples == 0) {
             print {SAMPLED} $entry;
         } else {
             print {REMAIN} $entry;
         }
         $i += 1;
     }
     close(FIN);
     close(SAMPLED);
     close(REMAIN);
 } else {
     # -r is false: we do not want to remove lines from the transcript file
     open FIN, "<", $transcript or die ("Can not open transcript file!\n");
     close(FIN);
     $cmd = "awk 'NR%$samples==0' $transcript >> train.trans-sampled";
     system($cmd);
 }

 if ($h) {
     print "help\n";
 }
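One detail of the code above is worth seeing in isolation: the two branches do not pick the same line numbers. The -r branch starts its counter at 0, so it samples the first line and every nth line after it, while the awk branch tests NR, which starts at 1, so it samples lines n, 2n, and so on. The sketch below only reproduces that arithmetic; the function names are hypothetical and it does not read any files.

```perl
use strict;
use warnings;

# 1-based line numbers chosen by the -r branch: a counter that starts at 0
# is tested with counter % rate == 0, so line 1 is always sampled.
sub picked_with_r {
    my ($rate, $total) = @_;
    return grep { ($_ - 1) % $rate == 0 } 1 .. $total;
}

# 1-based line numbers chosen by the awk branch: NR % rate == 0.
sub picked_without_r {
    my ($rate, $total) = @_;
    return grep { $_ % $rate == 0 } 1 .. $total;
}

print "with -r:    @{[ picked_with_r(3, 10) ]}\n";     # 1 4 7 10
print "without -r: @{[ picked_without_r(3, 10) ]}\n";  # 3 6 9
```

Note also that the awk branch appends with >>, so running it twice in the same directory adds a second copy of the sample to an existing train.trans-sampled instead of replacing it.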