Speech:Summer 2012 createTranscript.pl



createTranscript Perl Script
When setting up training sessions, we need to be able to use a portion of the master transcript so that we don't have to run a train on the whole thing. The easiest way to do this is to decide how much spoken dialog we want, measured by how long the people are speaking. This script takes a length in seconds and copies that much spoken dialog to a new transcript. Elapsed time is computed by adding up the duration of each spoken phrase (its stop time minus its start time). The script also lets the user specify a start time, also in seconds, so that it is possible to start grabbing dialog from anywhere in the transcript.
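As a rough illustration of how the duration of one spoken phrase is obtained, here is a minimal sketch. The sample line is made up, and its layout (utterance ID, start time, stop time, then the words) is an assumption inferred from the patterns the script matches. It uses split rather than the substitutions in the script itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical transcript line: utterance ID, start time, stop time, words.
# The layout is assumed from the script's patterns; the values are invented.
my $line = "sw02001A-ms98-a-0001 0.500000 2.750000 hello there";

# Split on whitespace into at most four fields so the spoken text stays whole.
my ($id, $start, $stop, $text) = split ' ', $line, 4;

# The duration of the phrase is its stop time minus its start time.
my $duration = $stop - $start;

printf "%s lasts %.2f seconds\n", $id, $duration;
```

For this sample line the phrase lasts 2.25 seconds; the script adds up values like this one line at a time to track how much speech it has seen.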

Source Code

#!/usr/bin/perl

# This script creates a smaller transcript covering a length of time specified by the user.
# The length_of_time is in seconds. The script creates a transcript in which the spoken dialog
# lasts for the amount of time specified by length_of_time.
# Start time indicates how far into the transcript the script should go before it starts to copy
# dialog to the new transcript. Start time is also in seconds.

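# expects four arguments: input file, output file, length of time, start time
if ($#ARGV != 3) {
    print "usage: createTranscript.pl <input_file> <output_file> <length_of_time> <start_time>\n" .
          "Duration is in seconds\n";
    exit -1;
}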

$input_file = $ARGV[0];     # set input file name
$output_file = $ARGV[1];    # set output file name
$length = $ARGV[2];         # set the length, in seconds
$start_time = $ARGV[3];     # set the start point, in seconds
$count = 0;                 # set counter to track how far we are in the transcript

open(MYINPUTFILE, "<$input_file") || die("can't open file: $!");
open(MYOUTPUTFILE, ">>$output_file") || die("can't open file: $!");    # output is appended to the file

while (<MYINPUTFILE>)    # read in file line by line
{
    $line = $_;
    chomp $line;

    $start = $line;                                 # copy line to new variable
    $start =~ s/sw[0-9]*[A-B]-ms98-a-[0-9]* //;     # remove all characters up to and including the first whitespace
    $start =~ s/ .*//;                              # remove everything after the whitespace, this pulls out start time

    $stop = $line;                                  # copy line to new variable
    $stop =~ s/sw[0-9]*[A-B]-ms98-a-[0-9]* \d+\.(\d+) //;    # remove all characters up to and including the second whitespace
    $stop =~ s/ .*//;                               # remove everything after the whitespace, this pulls out stop time

    # get the duration
    $duration = $stop - $start;

    # add that to the total count
    $count += $duration;

    # if we are at the starting point, add the line to the new transcript
    if ($count >= $start_time) {
        print MYOUTPUTFILE "$line\n";               # send transcript to new file
    }

    # if we have reached the desired length, stop the script
    if ($count > ($length + $start_time)) {
        last;
    }
}

close(MYINPUTFILE);
close(MYOUTPUTFILE);
print "done\n";
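To see how the length and start time interact, the selection rule can be tried on a few in-memory lines. This is only a sketch: select_lines is a hypothetical helper, not part of createTranscript.pl, and the sample lines are made up, assuming the layout of utterance ID, start time, stop time, text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of createTranscript.pl): applies the same
# selection rule as the script to a list of lines instead of a file.
sub select_lines {
    my ($lines, $length, $start_time) = @_;
    my $count = 0;    # running total of spoken time, in seconds
    my @kept;
    for my $line (@$lines) {
        # Assumed fields: utterance ID, start time, stop time, text.
        my (undef, $start, $stop) = split ' ', $line;
        $count += $stop - $start;
        # Keep the line once the running total has reached the start point.
        push @kept, $line if $count >= $start_time;
        # Stop once the running total has passed start point plus length.
        last if $count > $length + $start_time;
    }
    return @kept;
}

# Made-up sample data: four phrases of 2 seconds each.
my @sample = (
    "sw00001A-ms98-a-0001 0.0 2.0 one",
    "sw00001A-ms98-a-0002 2.0 4.0 two",
    "sw00001A-ms98-a-0003 4.0 6.0 three",
    "sw00001A-ms98-a-0004 6.0 8.0 four",
);

# Skip the first 4 seconds of speech, then ask for 2 seconds of dialog.
my @result = select_lines(\@sample, 2, 4);
print "$_\n" for @result;
```

With this sample, the second, third, and fourth lines are kept. Both checks are made against the running total after each phrase has been added, and whole lines are copied, so the output can run longer than the requested length.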