Speech:Exps 0283 019

Description
Authors: James Schumacher

Date: 5/8/16

Purpose:

Part 1: Determine whether the s tags (the <s> and </s> sentence markers) are necessary for training. They appear in the training transcript, the file whose name ends in _train.trans. I removed them from that file before training to see whether training fails or its results change.

Tagged copy of the transcript:

    [jrs1036@caesar etc]$ head -1 019_train.trans-with_s_tags
    <s> RIGHT </s> (sw2001A-ms98-a-0015)

Transcript used for training, with the s tags removed:

    [jrs1036@caesar etc]$ head -1 019_train.trans
    RIGHT (sw2001A-ms98-a-0015)

Part 2: In addition, I'll rescore experiment 0288/011 without the s tags because, as we found out in class, leaving the s tags in makes the score look better than it should.

Details:
 * Train configuration
   * Corpus: /mnt/main/corpus/switchboard/30hr
   * Default values, except npart is set to 4, not the default 2
   * Train start: 11:33 AM 8 May 16
   * Train end: 12:22 PM 8 May 16
 * Decode configuration
   * Decoding on: /mnt/main/corpus/switchboard/30hr/test/trans/dev.trans
   * Decoding at: 1000 senones, to match the senone count in the train configuration
   * Decode start: 12:36 PM 8 May 16
   * Decode end: 2:26 PM 8 May 16
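If the training run was driven by a stock SphinxTrain configuration file, the two non-default settings in the list above would most likely correspond to the Perl assignments $CFG_NPART (how many parts training is split into) and $CFG_N_TIED_STATES (senone count) in etc/sphinx_train.cfg. That mapping is an assumption, not something recorded for this experiment, so the following is only a sketch: a hypothetical shell helper, set_cfg, that rewrites one such assignment line.

```shell
#!/bin/sh
# Sketch only. Assumes a SphinxTrain-style etc/sphinx_train.cfg in which the
# number of training parts and the senone count are the Perl assignments
# $CFG_NPART and $CFG_N_TIED_STATES; the course's own scripts may differ.
# set_cfg is a hypothetical helper name.

set_cfg() {
  # $1 = config file, $2 = variable name without the leading "$", $3 = value.
  # Replaces the whole "$NAME = ...;" line and leaves every other line alone.
  cfg=$1 name=$2 value=$3
  sed "s/^\\\$$name[[:space:]]*=.*/\$$name = $value;/" "$cfg" > "$cfg.tmp" &&
    mv "$cfg.tmp" "$cfg"
}

# Example for this experiment's settings (run from the experiment directory):
#   set_cfg etc/sphinx_train.cfg CFG_NPART 4
#   set_cfg etc/sphinx_train.cfg CFG_N_TIED_STATES 1000
```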

Results:

Part 1, with s tags:

    SYSTEM SUMMARY PERCENTAGES by SPEAKER (hyp.trans)

    SPKR     # Snt  # Wrd   Corr    Sub    Del    Ins    Err  S.Err
    ---------------------------------------------------------------
    Sum/Avg   3912  55254   47.8   41.3   10.9   11.0   63.1   92.4
    ---------------------------------------------------------------
    Mean       1.3   18.1   57.0   35.6    7.4   19.9   62.8   92.4
    S.D.       0.5   16.2   22.6   19.5    9.3   38.9   40.9   24.8
    Median     1.0   13.0   53.8   35.5    3.8    8.6   62.5  100.0

Without s tags:

    SYSTEM SUMMARY PERCENTAGES by SPEAKER (hyp.trans)

    SPKR     # Snt  # Wrd   Corr    Sub    Del    Ins    Err  S.Err
    ---------------------------------------------------------------
    Sum/Avg   3912  47430   39.2   48.1   12.7   12.8   73.5   92.4
    ---------------------------------------------------------------
    Mean       1.3   15.5   40.4   50.8    8.8   42.7  102.4   92.4
    S.D.       0.5   15.8   30.7   30.3   11.8  111.4  117.1   24.8
    Median     1.0   10.0   37.0   50.0    4.2   10.0   78.9  100.0
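A quick arithmetic check on the two Sum/Avg rows, assuming each sclite percentage is a count divided by # Wrd and rounded to 0.1. The reference word counts differ by exactly two words per sentence, consistent with one <s> and one </s> per utterance being scored as words in the with-tags run. Rescaling the with-tags percentages to the smaller word count, and removing those 7,824 tokens from Corr, reproduces the without-tags row:

```latex
\begin{align*}
55254 - 47430 &= 7824 = 2 \times 3912 \\[4pt]
\frac{55254}{47430} &\approx 1.16496 \\[4pt]
\text{Sub:}\quad 41.3 \times 1.16496 &\approx 48.1 \\
\text{Del:}\quad 10.9 \times 1.16496 &\approx 12.7 \\
\text{Ins:}\quad 11.0 \times 1.16496 &\approx 12.8 \\
\text{Err:}\quad 63.1 \times 1.16496 &\approx 73.5 \\
\text{Corr:}\quad \frac{0.478 \times 55254 - 7824}{47430} \times 100 &\approx 39.2
\end{align*}
```

So the numbers are consistent with both runs making the same substitution, deletion and insertion errors, with the gap between 63.1% and 73.5% accounted for by the tags being credited as correct words. Because the reported percentages are rounded, this shows consistency with identical error counts, not that the counts are identical.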

Part 2: To remove the s tags, I ran the following command on both the hypothesis transcript and the reference transcript (file and new_file stand for the input and output file names):

    sed -e 's/<s> //' -e 's/<\/s> //' file > ./new_file

Scores on 0288/011:
 * Old score, with s tags: 41.8%
 * New score, without s tags: 48.4%

Without the s tags the score is quite a bit worse: 48.4% - 41.8% = 6.6 percentage points.
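A minimal sketch of the Part 2 preparation as two small shell functions, assuming the transcripts follow the format shown in Part 1 (one utterance per line, with "<s>" and "</s>" each followed by a space). The helper names and file names are hypothetical. strip_s_tags applies the same two sed substitutions as the command above; check_no_s_tags catches any tag those substitutions miss, for example a </s> at the very end of a line with no trailing space.

```shell
#!/bin/sh
# Sketch: remove the <s> and </s> sentence markers from a transcript so the
# scorer does not count them as extra words.
# strip_s_tags and check_no_s_tags are hypothetical helper names.

strip_s_tags() {
  # $1 = input transcript, $2 = output transcript.
  # ">" overwrites $2, so rerunning does not append a second copy.
  sed -e 's/<s> //' -e 's/<\/s> //' "$1" > "$2"
}

check_no_s_tags() {
  # Succeeds only if $1 contains no <s> or </s> tokens.
  ! grep -q -e '<s>' -e '</s>' "$1"
}

# Example for Part 2 (placeholder file names):
#   strip_s_tags hyp.trans hyp_no_s.trans && check_no_s_tags hyp_no_s.trans
#   strip_s_tags ref.trans ref_no_s.trans && check_no_s_tags ref_no_s.trans
# then score hyp_no_s.trans against ref_no_s.trans with the usual sclite command.
```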