Speech:PruneDictionary2.pl

Summary
Title: pruneDictionary2.pl

Author: unknown

Location: mnt/main/scripts/user/PruneDictionary2.pl

Usage: pruneDictionary2.pl 

Description

 * This script prunes the master dictionary, creating a new dictionary with only the words we are interested in.
 * Sets variables from command-line arguments.
 * Runs text2wfreq, which produces a unique list of all the words that appear in the transcript, along with how many times each word appears. Unfortunately, that list also includes the (swxxx) statements.
 * Those results are sorted and fed to grep, which removes the lines containing sw statements and writes the remaining lines to a temp file.
 * The temp file is then opened for processing.
 * For each entry in the temp word list, a loop strips any numbers that follow whitespace (so a word that itself consists of numeric characters is still allowed). It also drops any word that begins with a '<'; that character always precedes a non-word attribute, which is not defined in the dictionary.
 * It then saves the line in a temporary pruned file.
 * Then it calls the dictionary2 script, which creates a new dictionary that contains only the words in the pruned list.
 * Lastly, it removes the temporary files.
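
The filtering steps described above can be sketched as follows. This is a minimal illustration in Python, not the Perl code of the script itself. It assumes each input line has the form "word count", as the text2wfreq step suggests. The function name prune_word_list, the sample data, and the "(sw" prefix test (a stand-in for the grep step) are hypothetical.

```python
import re


def prune_word_list(lines):
    """Return the words to keep from 'word count' lines (assumed format)."""
    kept = []
    for line in lines:
        # Strip digits that follow whitespace (the trailing count).
        # A word that is itself numeric has no leading whitespace, so it stays.
        word = re.sub(r"\s+\d+\s*$", "", line.strip())
        # Skip blank lines, '<' attribute markers, and (swxxx) statements.
        # The "(sw" test is an assumed stand-in for the grep step.
        if not word or word.startswith("<") or word.startswith("(sw"):
            continue
        kept.append(word)
    return kept


# Hypothetical sample input for illustration only.
sample = ["hello 12", "<s> 40", "(sw2001) 1", "7 3", "world 5"]
print(prune_word_list(sample))
```

In this sketch the numeric word "7" survives while its count is removed, which matches the behavior described for the loop. The real script writes the surviving lines to a temporary pruned file instead of returning a list.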