Speech:Exps 0305 003

Description
Author: Rose_Salemi

Date: 3-5-2018

Purpose: Comparing the same scripts on different servers. Prof Jonas discovered a mismatch relating to the presence of bracketed words such as [laughter] and dashes "-". A previous capstone class (Spring 2016) had tried to improve the WER by removing them from the dictionary and train_trans transcript, but they had not thought to also remove them from the language model (LM), so they got a lot of errors. This experiment, 0305/003, aims to reproduce their results. It is also a repeat of 0303/026 in Spring 2018's sandbox, which happened to reproduce the same results.

Details:

ssh obelix

cd /mnt/main/Exp/0305/003

makeTrain.pl switchboard 5hr/train

genFeats.pl -t

nohup scripts_pl/RunAll.pl &

mkdir LM

cd LM

cp -i /mnt/main/corpus/switchboard/5hr/train/trans/train.trans trans_unedited

parseLMTrans.pl trans_unedited trans_parsed

lm_create.pl trans_parsed

cd ..

cd etc

awk '{print $1}' /mnt/main/corpus/switchboard/5hr/test/trans/train.trans >> /mnt/main/Exp/0305/001/etc/001_decode.fileids

nohup run_decode.pl 0305/001 0305/001 1000 &

Monitor the decode until it completes. In another terminal window (right-click on the toolbar and choose Duplicate Session), run tail -f /mnt/main/Exp/0305/003/etc/decode.log (replace the parent and sub-experiment numbers with yours).

parseDecode.pl decode.log hyp.trans

sclite -r 005_train.trans -h hyp.trans -i swb >> scoring.log

tail -10 scoring.log

Results: I got a WER of 34.8% on obelix. This differs from the 27.6% I got on idefix, but it ought to be the correct score, since the parseDecode.pl and sclite commands were run after the decode was complete. I checked the line count to make sure I had a complete hyp2.trans file, and it had the full 4172 lines:

[ras1002@caesar etc]$ wc hyp2.trans
4172 64099 383999 hyp2.trans

[ras1002@caesar etc]$ tail -9 scoring2.log
|=================================================================|
| Sum/Avg | 4172  60215 | 72.5   19.6    7.9    7.4   34.8   87.6 |
|=================================================================|
|  Mean   |  1.3   19.1 | 75.6   18.6    5.8   15.3   39.7   88.0 |
|  S.D.   |  0.5   16.5 | 18.2   15.4    7.7   28.7   32.6   29.9 |
| Median  |  1.0   15.0 | 75.0   16.9    2.4    4.2   33.3  100.0 |
`-----------------------------------------------------------------'
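As a sanity check on the table, the Sum/Avg row can be read column by column. The sketch below assumes the usual sclite column order (# Snt, # Wrd, then Corr, Sub, Del, Ins, Err, S.Err); the function name parse_sum_avg is made up for illustration and is not part of sclite. It checks that the sentence count equals the 4172 lines reported by wc, and that Sub + Del + Ins agrees with the Err column up to rounding.

```python
# Sketch: read the Sum/Avg row of an sclite summary table.
# Assumed column order: # Snt, # Wrd | Corr, Sub, Del, Ins, Err, S.Err

def parse_sum_avg(row):
    """Split one '| Sum/Avg | counts | rates |' row into named fields."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    label, counts, rates = cells[0], cells[1].split(), cells[2].split()
    names = ["corr", "sub", "del", "ins", "err", "s_err"]
    fields = {"label": label, "snt": int(counts[0]), "wrd": int(counts[1])}
    fields.update({n: float(v) for n, v in zip(names, rates)})
    return fields

row = "| Sum/Avg | 4172  60215 | 72.5   19.6    7.9    7.4   34.8   87.6 |"
fields = parse_sum_avg(row)

# Err should be (roughly) the sum of the three error types.
summed = round(fields["sub"] + fields["del"] + fields["ins"], 1)
print(fields["snt"], fields["err"], summed)
```

The summed value comes out 0.1 above the reported Err, which is consistent with each column having been rounded separately.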

To check my results, I did a search in the LM for bracketed words, then one for dashes:

[ras1002@caesar main]$ cd /mnt/main/Exp
[ras1002@caesar Exp]$ grep -l '\[' ????/???/LM/tmp.arpa
0303/026/LM/tmp.arpa
0305/002/LM/tmp.arpa
0305/007/LM/tmp.arpa

[ras1002@caesar Exp]$ grep -l '\-' ????/???/LM/tmp.arpa
0305/003/LM/tmp.arpa

Then I did:

cd 0305/003/LM

[ras1002@caesar LM]$ more tmp.arpa

(got a huge list of words; here's a sample of those with dashes. I didn't see any brackets.)

-4.7115 VER-   -0.1165
-3.5072 W-     -0.1288
-4.1092 WA-    -0.1909

So the LM for my 0305/003 does not contain brackets, but it does contain words ending in dashes. This is the "mismatch" Prof Jonas was talking about. The dictionary and the original train.trans transcript do have brackets around words, although hyp2.trans doesn't. (Some bracketed words also have dashes after them, which were left in when the brackets were removed.)
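The same check can be done on the word field alone. The sketch below is not one of the experiment scripts; it assumes tmp.arpa is in standard ARPA format (a \1-grams: section whose lines are log probability, word, optional backoff weight). The function name arpa_unigrams and the HELLO entry in the sample are made up for illustration; the three dashed entries are the ones shown above. Looking only at the word field avoids matching the minus signs in the probability columns, which a plain grep for '-' would also hit.

```python
# Sketch: list unigrams in an ARPA-format LM that contain '[' or '-'.

def arpa_unigrams(lines):
    """Yield the word from each entry in the \\1-grams: section."""
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            parts = line.split()
            if len(parts) >= 2:
                yield parts[1]  # fields: log10 prob, word, optional backoff

# Tiny stand-in for tmp.arpa; the HELLO line is invented for contrast.
sample = """\\data\\
ngram 1=4

\\1-grams:
-4.7115 VER-   -0.1165
-3.5072 W-     -0.1288
-4.1092 WA-    -0.1909
-2.0000 HELLO  -0.3000

\\end\\
""".splitlines()

with_dash = [w for w in arpa_unigrams(sample) if "-" in w]
with_bracket = [w for w in arpa_unigrams(sample) if "[" in w]
print("dashes:", with_dash)
print("brackets:", with_bracket)
```

To run it against a real file, pass an open file object instead of sample, for example arpa_unigrams(open("tmp.arpa")).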

[ras1002@caesar etc]$ grep -l '\[' 003.dic
003.dic
[ras1002@caesar etc]$ grep -l '\-' 003.dic
003.dic

[ras1002@caesar etc]$ grep -l '\[' hyp2.trans

(no output)

(I had to run the decode a second time, so the file is labeled hyp2.trans. I also looked at the file visually with the "more" command (more hyp2.trans), and hyp2.trans does NOT have brackets.)

[ras1002@caesar etc]$ grep -l '\-' hyp2.trans
hyp2.trans

[ras1002@caesar etc]$ grep -l '\[' 003_train.trans
003_train.trans
[ras1002@caesar etc]$ grep -l '\-' 003_train.trans
003_train.trans

The problem Prof Jonas pointed out is that the LM, which is responsible for assigning a probability to each word, will assign zero probability to the bracketed words, since they do not appear in the LM. That basically means the decoder will substitute other words for the bracketed words in the resulting transcript, words that the LM considers more probable.

So when we score it, the hyp.trans file won't match the truth transcript (train.trans) very well, and the WER will be higher (that is, worse).
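To make the scoring effect concrete, here is a minimal sketch of word error rate as word-level edit distance divided by the reference length. It is a simplification and not sclite's implementation; the function name wer and the example sentences are made up for illustration. It shows how a single bracketed reference word that the decoder replaces with another word counts as one substitution.

```python
# Sketch: word error rate via word-level edit distance (not sclite itself).

def wer(ref, hyp):
    """Return (substitutions + deletions + insertions) / reference length."""
    r, h = ref.split(), hyp.split()
    # dist[i][j] = edits needed to turn the first i ref words
    # into the first j hyp words
    dist = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(r)][len(h)] / len(r)

# Made-up example: the reference keeps the bracketed token,
# the mismatched hypothesis replaces it with an ordinary word.
ref = "yeah [laughter] i know"
hyp_match = "yeah [laughter] i know"
hyp_mismatch = "yeah after i know"

print(wer(ref, hyp_match))     # 0.0
print(wer(ref, hyp_mismatch))  # 0.25
```

One substituted word out of four reference words gives 0.25, so every bracketed word in the truth transcript that the decoder cannot produce adds at least one error.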