Speech:Exps 0283 003

Description
Author: Ben Leith, James Schumacher, Ryan O'Neal

Date: 2/10/16

Purpose: Reproduce last year's score

Details: Attempting to replicate the results of last year's experiments on a 125 hr train using non-default senone and density values.
 * Train
   * Senone: 8000
   * Variance Normalization: no
   * Density: 64
   * Convergence Ratio: 0.004
 * LM
   * Default
 * Decode
   * Audio files: 10,000
   * Senone: 8000
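
As a rough sketch, the train settings listed above could be written out as sphinx_train.cfg assignments. The variable names ($CFG_N_TIED_STATES, $CFG_FINAL_NUM_DENSITIES, $CFG_CONVERGENCE_RATIO, $CFG_VARNORM) are an assumption about how these settings map onto the SphinxTrain config file; this page does not confirm them, and the helper render_cfg_lines is hypothetical.

```python
# Train settings exactly as listed in the Details section of this page.
TRAIN_SETTINGS = {
    "senones": 8000,
    "variance_normalization": "no",
    "density": 64,
    "convergence_ratio": 0.004,
}

# ASSUMPTION: which sphinx_train.cfg variable each setting corresponds to.
# Check these names against the actual config file before relying on them.
CFG_KEYS = {
    "senones": "$CFG_N_TIED_STATES",
    "variance_normalization": "$CFG_VARNORM",
    "density": "$CFG_FINAL_NUM_DENSITIES",
    "convergence_ratio": "$CFG_CONVERGENCE_RATIO",
}


def render_cfg_lines(settings, keys):
    """Render each setting as a Perl-style assignment line (hypothetical helper)."""
    lines = []
    for name, value in settings.items():
        # Strings are quoted, numbers are written bare.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        lines.append(f"{keys[name]} = {rendered};")
    return lines


if __name__ == "__main__":
    for line in render_cfg_lines(TRAIN_SETTINGS, CFG_KEYS):
        print(line)
```

Keeping the settings in one place like this makes it easier to compare the values of one experiment against another before a multi-day train is started.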

Results:
 * In progress now! Determining Train Configuration options!
 * Started the generateFeats script
 * We had not taken into account how long this script would take, and we had started it without nohup and &, so it could not keep running once the terminal was closed. We had no choice but to kill it.
 * We removed all of the contents in the /003 directory and reran the prepareTrainExperiment script
 * Once that finished, we remembered to use the nohup command and &, so that we could exit the terminal and still have the generateFeats script continue to run.
 * Checking Caesar again at 12:04 AM (2-11-16), the generateFeats script has finished.
 * Will start the train tomorrow morning; it is expected to take between 2 and 3 days to run.
 * Re-edited the sphinx_train.cfg file to set the appropriate values (had to change the file's permissions as root)
 * Started the train; however, the train failed due to permission errors. In retrospect, a recursive chmod command was likely necessary to run a successful train as I was not the owner.
 * In addition, another train was running at the time. After the train failed, I ran the top command and noticed another user had the CPU maxed out. Will have to wait until that train is finished before attempting to run this train again.
 * Okay, the other train that was running is done.
 * I have set the permissions of the 003/ directory and everything in it to 775: full access for the owner and group, while other users can read and execute but cannot write.
 * Decided it was unnecessary to wipe out the 003/ directory and instead attempted to run the train again (hoping any changes made in the initial attempt were rolled back).
 * The real-time factor of running a train is roughly 50%, so this 125 hr train should finish in 62.5 hr, or 2.6 days. It's currently 11:30 AM on 2-11-16, so the ETA is about 2:00 AM on 2-14-16.
 * The train finished sometime in the AM hours of 2-14-16
 * I created the language model
 * Currently unsure of what values to use for the decode. Going to contact Ben about it.
 * The decode is currently running
 * The decode finished
   * Took approximately 50 hrs
 * Jonas told me that the decode should run at around real time. This decode had roughly 12 hours of audio to process and took over 48 hours, about four times real time. However, I set the senone value to 8000 instead of 1000 as in our previous successful experiment, whose decode took about 30 minutes. So this is my hypothesis about the effect of the senone value on decode time:
   * 1000 senones will result in 1/2 real time
   * 2000 senones will result in real time
   * 8000 senones will result in 4 times real time
 * I scored the decode and below are the results.

SYSTEM SUMMARY PERCENTAGES by SPEAKER

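,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|---------+-------------+-----------------------------------------|
|=================================================================|
| Sum/Avg |10000  143752| 79.7   15.0    5.3   15.0   35.4   94.6 |
|=================================================================|
|  Mean   | 57.5  826.2 | 79.8   15.2    5.0   16.0   36.3   94.8 |
|  S.D.   | 25.2  341.8 |  6.2    4.8    2.3    7.2   10.5    5.7 |
| Median  | 55.0  785.0 | 80.7   14.6    4.7   14.3   34.5   96.0 |
`-----------------------------------------------------------------'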
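
The timing figures in the log and the Sum/Avg row of the score table can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the experiment scripts. The function names are hypothetical, and it assumes that the train start time, the 50% real-time factor, the 50 hr decode, and the 12 hr of audio are as recorded in the log above.

```python
from datetime import datetime, timedelta


def train_eta(start, audio_hours, rtf):
    """Return (run_hours, finish_time) for a job over audio_hours of audio
    that runs at real-time factor rtf (wall-clock time / audio time)."""
    run_hours = audio_hours * rtf
    return run_hours, start + timedelta(hours=run_hours)


def real_time_factor(wall_hours, audio_hours):
    """Wall-clock hours spent per hour of audio."""
    return wall_hours / audio_hours


def total_error(sub, dele, ins):
    """Total error percentage as substitutions + deletions + insertions."""
    return sub + dele + ins


# Train: 125 hr of audio at a real-time factor of 0.5, started 11:30 AM on 2-11-16.
run_hours, eta = train_eta(datetime(2016, 2, 11, 11, 30), 125, 0.5)

# Decode: roughly 50 hr of wall-clock time for roughly 12 hr of audio.
decode_rtf = real_time_factor(50, 12)

# Sum/Avg row of the score table: Corr, Sub, Del, Ins.
corr, sub, dele, ins = 79.7, 15.0, 5.3, 15.0
err = total_error(sub, dele, ins)
corr_check = corr + sub + dele  # should come to about 100% of reference words

if __name__ == "__main__":
    print(f"train run time: {run_hours} hr, ETA {eta:%m-%d-%y %I:%M %p}")
    print(f"decode real-time factor: {decode_rtf:.1f}")
    print(f"Err from components: {err:.1f}, Corr+Sub+Del: {corr_check:.1f}")
```

The computed finish time falls in the early hours of 2-14-16, which matches the train finishing in the AM hours of that day. The decode comes out at a little over four times real time. The summed error of 35.3 differs from the reported 35.4 only by the rounding of the individual columns.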