Speech:CreateSubTranscript.pl

Summary
Title: createSubTranscript.pl

Author: David Meehan

Location: /mnt/main/scripts/user/createSubTranscript

Usage: /mnt/main/scripts/user/createSubTranscript.pl <transcript> <hours> <offset> > outfile.trans

Description
This script creates a new transcript derived from the base transcript provided, covering the specified number of hours and starting at the specified hour. The result is printed to standard output, so redirect it to a file. The script uses the same time calculation as corpusSize2.pl, which differs from the way Sphinx calculates time (Sphinx does not account for overlap in the audio files).
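The overlap point can be illustrated with a minimal sketch. It assumes segments are given as whole-second [start, end] pairs, inclusive at both ends, which mirrors how the script marks seconds. The segment values and the helper names covered_seconds and summed_seconds are hypothetical. The plain sum is shown only as a contrast and is not claimed to be Sphinx's exact method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the seconds covered by at least one segment, so that
# overlapping segments are counted only once.
sub covered_seconds {
    my @segments = @_;
    my %seen;
    for my $seg (@segments) {
        my ($start, $end) = @$seg;
        $seen{$_} = 1 for $start .. $end;
    }
    return scalar keys %seen;
}

# Plain sum of segment lengths: overlapping seconds are counted twice.
sub summed_seconds {
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 for @_;
    return $total;
}

# Hypothetical segments from one audio file; the second overlaps the first.
my @segments = ([0, 9], [5, 14], [20, 24]);

print "covered: ", covered_seconds(@segments), "\n";   # 20
print "summed:  ", summed_seconds(@segments), "\n";    # 25
```

Seconds 5 through 9 belong to both of the first two segments, so the two totals differ by five seconds here. Over a large corpus that gap is why an hour count from this script need not match one that ignores overlap.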

Code

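#!/usr/bin/perl

# Expect exactly three arguments: transcript, hours to keep, starting hour.
if($#ARGV != 2) {
    print "Usage: createSubTranscript.pl <transcript> <hours> <offset>\n";
    exit -1;
}

$corpus = $ARGV[0];
$limit = $ARGV[1] * 3600;   # hours to keep, in seconds
$offset = $ARGV[2] * 3600;  # starting hour, in seconds
open(MYINPUTFILE, "<$corpus") || die("Error");
$time = 0;
$totalTime = 0;
# One flag per second of the current audio file (first 1000 seconds).
@times = ();
for($i=0; $i<1000; $i++) { $times[$i] = 0; }

while(my $line = <MYINPUTFILE>) {
    @temp = split(' ', $line);
    $file = substr($temp[0], 0, 6);  # audio file ID
    $start = int($temp[1]);
    $end = int($temp[2]);
    # Once the offset has been used up, copy lines to the output.
    if($offset < 0) { print "$line"; }
    if($file ne $currentFile) {
        #print("File $currentFile:\n");
        # New audio file: total the seconds marked for the previous one.
        for($i=0; $i<1000; $i++) { $time += $times[$i]; $times[$i] = 0; }
        $offset -= $time;
        if($offset < 0) {
            $totalTime += $time;
            # Stop once the requested number of hours has been reached.
            if($totalTime > $limit) {
                #print("Total Time: " . ($totalTime/3600) . "\n");
                exit -1;
            }
        }
        $time = 0;
        $currentFile = $file;
    }
    # Mark each second this segment covers; overlaps count once.
    for($i=$start; $i<=$end; $i++) { $times[$i] = 1; }
}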
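The field handling in the script can be tried in isolation. This is a minimal sketch, assuming each transcript line starts with an utterance ID whose first six characters identify the audio file, followed by start and end times in seconds. The sample line and the helper name parse_line are hypothetical and not taken from a real transcript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one transcript line the way the script does: whitespace-separated
# fields, file ID from the first six characters of field 0, and start/end
# times truncated to whole seconds with int().
sub parse_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return {
        file  => substr($fields[0], 0, 6),
        start => int($fields[1]),
        end   => int($fields[2]),
    };
}

# Hypothetical line: utterance ID, start time, end time, then the words.
my $rec = parse_line("sp0001A-0007 12.48 15.93 hello there\n");

print "file:  $rec->{file}\n";
print "start: $rec->{start}\n";
print "end:   $rec->{end}\n";
```

Because of the int() truncation, this segment marks seconds 12 through 15, so it contributes four whole seconds to the file's total.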