Speech:Summer 2012 parseDecode



parseDecode Perl Script
The decode log contains the hypotheses produced by running a decode. Each time Sphinx makes a prediction of what a statement is, it adds a line to the decode log that starts with FWDVIT:. The line looks like this:

FWDVIT: LIKE FOR EXAMPLE LET'S SEE UH BAD COMPANY WHAT LIKE (sw2032A-ms98-a-0017)
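Because each hypothesis line starts with `FWDVIT:`, selecting them is a simple filter. The script hands this step to `grep`. The following is a minimal pure-Perl sketch of the same selection. It assumes the prefix always appears at the very start of the line, as in the example above; the script's `grep FWDVIT` instead matches the word anywhere on the line. The sub name `fwdvit_lines` and the sample log lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of parseDecode.pl): keep only the decoder's
# hypothesis lines, assumed to start with "FWDVIT:", and drop everything else.
sub fwdvit_lines {
    my @log_lines = @_;
    return grep { /^FWDVIT:/ } @log_lines;
}

# Illustrative input: only the FWDVIT line is kept and printed.
my @log = (
    "INFO: some other decoder message\n",
    "FWDVIT: LIKE FOR EXAMPLE LET'S SEE UH BAD COMPANY WHAT LIKE (sw2032A-ms98-a-0017)\n",
);
print for fwdvit_lines(@log);
```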

The log also contains a lot of other information. This script grabs all of the FWDVIT lines and discards the rest. It then adjusts each of those lines to match the format of the transcript used for training, which is SOME STATEMENT (utterance ID). The line above is therefore written as:

LIKE FOR EXAMPLE LET'S SEE UH BAD COMPANY WHAT LIKE (sw2032A-ms98-a-0017)
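The reformatting amounts to stripping the `FWDVIT: ` prefix and normalizing the spacing in front of the trailing utterance ID. The following is a minimal Perl sketch of that one-line transformation. The sub name `to_transcript_line` is hypothetical. The script's own pattern only matches Switchboard-style IDs such as `sw2032A-ms98-a-0017`, whereas this sketch assumes the ID is whatever parenthesized token ends the line.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one FWDVIT line into the transcript format
# "SOME STATEMENT (utterance ID)". Returns undef for lines it cannot parse.
sub to_transcript_line {
    my ($line) = @_;
    chomp $line;
    # Assumes: "FWDVIT:" prefix, the hypothesis words, then a final "(ID)" token.
    return undef unless $line =~ /^FWDVIT:\s*(.*?)\s*(\([^()\s]+\))\s*$/;
    my ($words, $id) = ($1, $2);
    $words =~ s/\s+/ /g;    # collapse runs of whitespace between words
    return length($words) ? "$words $id" : $id;
}

print to_transcript_line(
    "FWDVIT: LIKE FOR EXAMPLE LET'S SEE UH BAD COMPANY WHAT LIKE (sw2032A-ms98-a-0017)\n"
), "\n";
# prints: LIKE FOR EXAMPLE LET'S SEE UH BAD COMPANY WHAT LIKE (sw2032A-ms98-a-0017)
```

Returning `undef` for unparseable lines lets a caller skip or report them. The script instead writes every line it reads.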

We want the hypothesis transcript to be in the same format as the training transcript because the next step runs the scoring tool. That tool compares the two files, so they must share a format.
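If the scoring tool pairs hypothesis and reference lines by their trailing utterance ID (NIST `sclite` does this for its `trn` format, for example), it can help to confirm that both files cover the same IDs before scoring. The following Perl sketch shows such a check under that assumption. The sub names `utterance_ids` and `id_mismatches` are hypothetical. Neither parseDecode.pl nor the scoring tool is known to perform this check.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the utterance IDs from transcript-format lines
# of the form "SOME STATEMENT (utterance ID)".
sub utterance_ids {
    my @lines = @_;
    my %ids;
    for my $line (@lines) {
        $ids{$1} = 1 if $line =~ /\(([^()\s]+)\)\s*$/;
    }
    return \%ids;
}

# Hypothetical helper: list the IDs that appear in only one of the two files.
# Returns two array refs: (missing from hypothesis, missing from reference).
sub id_mismatches {
    my ($hyp_lines, $ref_lines) = @_;
    my $hyp = utterance_ids(@$hyp_lines);
    my $ref = utterance_ids(@$ref_lines);
    my @missing_hyp = sort { $a cmp $b } grep { !$hyp->{$_} } keys %$ref;
    my @missing_ref = sort { $a cmp $b } grep { !$ref->{$_} } keys %$hyp;
    return (\@missing_hyp, \@missing_ref);
}

my ($no_hyp, $no_ref) = id_mismatches(
    ["HELLO THERE (utt-1)\n"],
    ["HELLO THERE (utt-1)\n", "GOOD BYE (utt-2)\n"],
);
print "missing from hypothesis: @$no_hyp\n";   # prints: missing from hypothesis: utt-2
print "missing from reference: @$no_ref\n";
```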

Note: Much of the code for this script came from genTrans.pl, which was created by another class.

Source Code

```perl
#!/usr/bin/perl
use strict;
use warnings;

if ($#ARGV != 1) {
    print "usage: parseDecode.pl <decode_file> <output_file>\n";
    exit -1;
}

my $decode_file = $ARGV[0];
my $output_file = $ARGV[1];
my $temp_file   = "temp.log";

# take all the lines that contain FWDVIT and dump them into a temp file (">" overwrites
# any leftover temp.log); these are the predicted lines to compare to the original statements
system("grep FWDVIT $decode_file > $temp_file");

# open the temp file; ">" replaces the parsed output file if it already exists
open(MYINPUTFILE, "<", $temp_file) or die "cannot open $temp_file: $!";
open(MYOUTPUTFILE, ">", $output_file) or die "cannot open $output_file: $!";

# read and manipulate each line so that it matches the training transcript format
while (<MYINPUTFILE>) {
    my $line = $_;
    chomp $line;
    $line =~ s/FWDVIT: //;

    # split the utterance ID, e.g. (sw2032A-ms98-a-0017), off the end of the line
    my $utteranceID = "";
    if ($line =~ s/\s*(\(sw[0-9]*[A-B]-ms98-a-[0-9]*\))\s*$//) {
        $utteranceID = $1;
    }
    print MYOUTPUTFILE "$line $utteranceID\n";
}

# close files and remove the temp file
close(MYINPUTFILE);
close(MYOUTPUTFILE);
unlink($temp_file);
```