Speech:CreateTranscript.pl

Summary
Title: createTranscript.pl

Author: Unknown

Location: mnt/main/scripts/user/createTranscript.pl

Usage: The user specifies the base transcript to trim, the output file, the duration in seconds, and the start time in seconds. The script produces a new transcript containing [duration] seconds of dialog, starting [offset] seconds into the base transcript.

Description
This script works similarly to createTranscript.pl. That script was based on the algorithm I used to calculate the total time as 308 hours, a figure we determined was inaccurate because of channel overlap. This script instead builds a new transcript using the algorithm I used to calculate 250 hours. It also lets the user specify a start time, so we can build transcripts from data in the middle of our full transcript rather than being constrained to the start or end.
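The page does not spell out either calculation, but the loop in the script below accumulates the duration of each utterance (stop time minus start time). A minimal sketch of that accumulation follows, assuming each line has the layout the script's regular expressions expect: an utterance ID, a start time, a stop time, then the words. The helper name total_seconds and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: sum (stop - start) over every utterance line.
# Returns the total in seconds.
sub total_seconds {
    my @lines = @_;
    my $total = 0;
    for my $line (@lines) {
        # Fields: utterance ID, start time, stop time, then the words.
        my (undef, $start, $stop) = split ' ', $line;
        next unless defined $stop;    # skip blank or short lines
        $total += $stop - $start;
    }
    return $total;
}

# Made-up sample lines; the two channels (A and B) overlap in time.
my @sample = (
    "sw2001A-ms98-a-0001 0.00 1.50 hello there",
    "sw2001B-ms98-a-0001 1.00 3.00 hi how are you",
);
printf "%.2f seconds of speech\n", total_seconds(@sample);   # prints "3.50 seconds of speech"
```

Dividing the result by 3600 gives hours, which is the unit of the 308 and 250 figures above.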

This script creates a smaller transcript in which the spoken dialog lasts for a length of time specified by the user. The length_of_time is in seconds.

The start time indicates how far into the transcript the script should go before it starts copying dialog to the new transcript. It is also in seconds.
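Both values are measured against a running total of utterance durations, not against the timestamps in the file. The sketch below mirrors the two comparisons in the script's loop on a plain list of durations. The helper name select_window and the numbers are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indexes of the utterances that would be copied.
# @durations holds per-utterance durations in seconds.
sub select_window {
    my ($start_time, $length, @durations) = @_;
    my $count = 0;    # cumulative seconds of dialog seen so far
    my @kept;
    for my $i (0 .. $#durations) {
        $count += $durations[$i];
        last if $count > $start_time + $length;    # past the end of the window
        push @kept, $i if $count > $start_time;    # inside the window
    }
    return @kept;
}

# Cumulative totals are 2, 5, 9, 14, 20; the window is (4, 14].
my @kept = select_window(4, 10, 2, 3, 4, 5, 6);
print "kept utterances: @kept\n";   # prints "kept utterances: 1 2 3"
```

Because the counter is updated before either test, an utterance is copied when the running total after adding it is greater than the start time and no greater than start time plus length.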

Code

#!/usr/bin/perl

# This script will create a smaller transcript that is of a length of time specified by the user.
# The length_of_time is in seconds.  This script will create a transcript where the spoken dialog
# lasts for the amount of time specified by length_of_time.
# Start time indicates how far into the transcript the script should go before it starts to copy
# dialog to the new transcript.  Time is also in seconds.

if ($#ARGV != 3) {
    print "usage: createTranscript.pl <input_file> <output_file> <length_of_time> <start_time>\n" .
          "Duration is in seconds\n";
    exit -1;
}

$input_file  = $ARGV[0];    # set input file name
$output_file = $ARGV[1];    # set output file name
$length      = $ARGV[2];    # set the length
$start_time  = $ARGV[3];    # set the start point
$count       = 0;           # set counter to track how far we are in the transcript

open(MYINPUTFILE, "<$input_file") || die("can't open file: $!");
open(MYOUTPUTFILE, ">>$output_file") || die("can't open file: $!");    # append mode

while (<MYINPUTFILE>)    # read in file line by line
{
    $line = $_;
    chomp $line;

    $start = $line;                                # copy line to new variable
    $start =~ s/sw[0-9]*[A-B]-ms98-a-[0-9]* //;    # remove all characters up to and including the first whitespace
    $start =~ s/ .*//;                             # remove everything after the whitespace, this pulls out start time

    $stop = $line;                                           # copy line to new variable
    $stop =~ s/sw[0-9]*[A-B]-ms98-a-[0-9]* \d+\.(\d+) //;    # remove the ID and start time, up to and including the second whitespace
    $stop =~ s/ .*//;                                        # remove everything after the whitespace, this pulls out stop time

    # get the duration
    $duration = $stop - $start;

    # add that to the total count
    $count += $duration;

    # if we have reached the desired length, stop the script
    if ($count > ($length + $start_time)) {
        last;
    }

    # if we are at the starting point, add the line to the new transcript
    if ($count > $start_time) {
        print MYOUTPUTFILE "$line\n";    # send transcript to new file
    }
}

close(MYINPUTFILE);
close(MYOUTPUTFILE);
print "done\n";
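
The script extracts the start and stop times with two chains of substitutions. As an alternative sketch, not the original author's method, a single match with capture groups can pull out both fields at once. The helper name parse_times and the sample line are hypothetical. Unlike the script's pattern, this one accepts any non-whitespace utterance ID and also accepts times written without a decimal point.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the start and stop times from one transcript line.
# Returns (start, stop), or an empty list if the line does not match.
sub parse_times {
    my ($line) = @_;
    if ($line =~ /^\S+\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)(?:\s|$)/) {
        return ($1, $2);
    }
    return;
}

my ($start, $stop) = parse_times("sw2001A-ms98-a-0001 12.25 15.75 okay so");
print "start=$start stop=$stop duration=", $stop - $start, "\n";
# prints "start=12.25 stop=15.75 duration=3.5"
```

Returning an empty list for non-matching lines lets a caller skip them explicitly, whereas the substitution approach leaves the whole line in $start and $stop when the pattern does not match.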