Speech:Fall 2014 Adaptation


Contributors

 * Justin Thibeault

Description
This adaptation method takes an existing model and adapts it for use with a new domain of audio files. In this case, I use the previous semester's best Switchboard model and adapt it for use with audio that Marcel Filimon captured from NOAA broadcasts in Summer 2014.

The audio files and generated features for this experiment already existed, so to set up the audio I reorganized the audio and transcripts into 2 sub-corpora. I started by removing the first 20 minutes of audio because they came from a different recording session. I split the remaining 40 minutes into 2 approximately equal sub-corpora by placing alternating utterances into each one. One corpus, NOAA/40_min_split/adapt, is used for adaptation; the other, NOAA/40_min_split/test, is used to test the effectiveness of the adaptation.
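The alternating split described above can be sketched as a small shell helper. This is only an illustration: it assumes the fileids and transcript files hold one utterance per line in the same order, and the function name split_alternating and the file names in the usage note are hypothetical, not the ones used in the experiment.

```shell
#!/bin/sh
# Hypothetical helper: send odd-numbered lines (utterances 1, 3, 5, ...)
# to the adaptation file and even-numbered lines to the test file.
# Assumes one utterance per line.
split_alternating() {
    input=$1
    adapt_out=$2
    test_out=$3
    awk -v a="$adapt_out" -v t="$test_out" \
        'NR % 2 == 1 { print > a; next } { print > t }' "$input"
}
```

Running the helper once on the fileids list and once on the transcript, for example split_alternating all.fileids adapt.fileids test.fileids, keeps the two files aligned, because the same line numbers go to the same half in both.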

The model I started with came from Erol Aygar's work in Summer 2014. He had improved on the previous semester's work, and his best result was in Experiment 0253/B12. The setup involved making a full copy of the model in my local experiment directory because the adaptation process overwrites the model.
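A minimal sketch of the copy step follows. The helper name copy_model is hypothetical; the paths in the usage note are the source and local model directories named on this page. The helper refuses to copy onto an existing directory, on the assumption that an earlier adapted copy should not be silently mixed with a fresh one.

```shell
#!/bin/sh
# Hypothetical helper: make a private copy of a model directory so that
# a later adaptation step can overwrite it without touching the original.
copy_model() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")" && cp -R "$src" "$dst"
}
```

For this experiment the call would look like copy_model /mnt/main/Exp/0253/B12/model_parameters/012.cd_cont_10000 /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000.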

Step 1
This command runs Baum-Welch (bw) over the adaptation corpus and generates 3 files in the accumulator directory: gauden_counts, mixw_counts and tmat_counts.

nohup /mnt/main/Exp/0257/001/bin/bw \
  -hmmdir /mnt/main/Exp/0253/B12/model_parameters/012.cd_cont_10000 \
  -moddeffn /mnt/main/Exp/0253/B12/model_parameters/012.cd_cont_10000/mdef \
  -ts2cbfn .cont. \
  -feat 1s_c_d_dd \
  -ltsoov yes \
  -cmn current \
  -dictfn /mnt/main/Exp/0253/B12/etc/B12.dic \
  -fdictfn /mnt/main/Exp/0253/B12/etc/B12.filler \
  -ctlfn /mnt/main/Exp/0259/etc/noaa_40min_split_adapt.fileids \
  -lsnfn /mnt/main/Exp/0259/transcripts/noaa_40min_split_adapt.ref.txt \
  -cepdir /mnt/main/Exp/0259/audio/mfc \
  -accumdir . > /mnt/main/Exp/0259/logs/011bw.log


 * /mnt/main/Exp/0257/001/bin/ is a normal Sphinx 3 executable folder created by an experiment generation script. The server's executable folder was missing the adaptation scripts, which is why I used this local folder.
 * /mnt/main/Exp/0253/B12/model_parameters/012.cd_cont_10000 is the model that will be adapted
 * /mnt/main/Exp/0253/B12/etc holds the dictionary files
 * /mnt/main/Exp/0259/etc/noaa_40min_split_adapt.fileids lists the fileids of the target audio
 * /mnt/main/Exp/0259/transcripts/noaa_40min_split_adapt.ref.txt is the transcript of the target audio
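Before moving on, it can help to confirm that the three count files were actually written. The check below is a sketch, not part of the original procedure; the function name check_accum is hypothetical, and the file names are the ones listed in Step 1.

```shell
#!/bin/sh
# Hypothetical sanity check: verify that the accumulator directory holds
# a non-empty copy of each of the three count files named in Step 1.
check_accum() {
    accumdir=$1
    missing=0
    for f in gauden_counts mixw_counts tmat_counts; do
        if [ ! -s "$accumdir/$f" ]; then
            echo "missing or empty: $accumdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Since the command above writes to the current directory, check_accum . would be the matching call; it returns non-zero if any file is absent or empty.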

Step 2
Next, you adapt the model using the counts from the Baum-Welch pass in Step 1. This step overwrites the model:

/mnt/main/Exp/0257/001/bin/map_adapt \
  -meanfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/means \
  -varfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/variances \
  -mixwfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/mixture_weights \
  -tmatfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/transition_matrices \
  -accumdir . \
  -mapmeanfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/means \
  -mapvarfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/variances \
  -mapmixwfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/mixture_weights \
  -maptmatfn /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000/transition_matrices

/mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000 is a local copy of /mnt/main/Exp/0253/B12/model_parameters/012.cd_cont_10000 since this command overwrites model files.

Step 3
Finally I run another decode to see if the results are any better.

nohup /usr/local/bin/sphinx3_decode \
  -hmm /mnt/main/Exp/0259/011/model_parameters_adapt/012.cd_cont_10000 \
  -dict /mnt/main/Exp/0253/B12/etc/B12.dic \
  -fdict /mnt/main/Exp/0253/B12/etc/B12.filler \
  -lm /mnt/main/Exp/0259/LM/noaa_full.lm.DMP \
  -ctl /mnt/main/Exp/0259/etc/noaa_40min_split_adapt.fileids \
  -cepdir /mnt/main/Exp/0259/audio/mfc \
  -cepext .mfc \
  -hyp /mnt/main/Exp/0259/results/011.hyp.txt > /mnt/main/Exp/0259/logs/decode011.log &

Results
Results on test were:

Baseline

,-----------------------------------------------------------------------.
|                          results/010.hyp.txt                          |
|-----------------------------------------------------------------------|
| SPKR         | # Snt  # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|--------------+--------------+-----------------------------------------|
| noaa_162.450 |   57    2430 | 71.0    6.8   22.2    0.4   29.4  100.0 |
|=======================================================================|
| Sum/Avg      |   57    2430 | 71.0    6.8   22.2    0.4   29.4  100.0 |
|=======================================================================|
|     Mean     | 57.0  2430.0 | 71.0    6.8   22.2    0.4   29.4  100.0 |
|     S.D.     |  0.0    0.0  |  0.0    0.0    0.0    0.0    0.0    0.0 |
|    Median    | 57.0  2430.0 | 71.0    6.8   22.2    0.4   29.4  100.0 |
`-----------------------------------------------------------------------'

and after adaptation:

SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                           011.hyp.txt                           |
|-----------------------------------------------------------------|
| SPKR   | # Snt  # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|--------+--------------+-----------------------------------------|
| noa    |   57    2430 | 86.4    8.1    5.5    3.1   16.7   78.9 |
|=================================================================|
| Sum/Avg|   57    2430 | 86.4    8.1    5.5    3.1   16.7   78.9 |
|=================================================================|
|  Mean  | 57.0  2430.0 | 86.4    8.1    5.5    3.1   16.7   78.9 |
|  S.D.  |  0.0    0.0  |  0.0    0.0    0.0    0.0    0.0    0.0 |
| Median | 57.0  2430.0 | 86.4    8.1    5.5    3.1   16.7   78.9 |
`-----------------------------------------------------------------'
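To put the two summaries side by side, the short sketch below recomputes the word error rate as Sub + Del + Ins and the relative reduction between the baseline and adapted runs. The helper names word_error and rel_reduction are hypothetical; the numbers are taken from the two tables above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two score summaries.

# Word error rate (%) as the sum of substitution, deletion and insertion rates.
word_error() {
    awk -v s="$1" -v d="$2" -v i="$3" 'BEGIN { printf "%.1f\n", s + d + i }'
}

# Relative error reduction (%) from a baseline error rate to a new one.
rel_reduction() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

word_error 6.8 22.2 0.4     # baseline: prints 29.4
word_error 8.1 5.5 3.1      # adapted:  prints 16.7
rel_reduction 29.4 16.7     # prints 43.2
```

On these figures the error rate drops by 12.7 points, roughly a 43% relative reduction. Most of the gain comes from deletions (22.2 down to 5.5), while substitutions and insertions rise slightly.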

Next Steps

 * Do multiple iterations of Baum Welch adaptation improve the results further?
 * Can adaptation techniques be used to improve results in the same domain, rather than to adapt to a different domain?