Speech:Exps 0019

From Openitware

Tiny Test Train 2


Author: Eric Beikman

Date: (Started) 2/7/2013

Purpose: To gain familiarity with the Sphinx trainer and the steps necessary to complete a model. This experiment also served to introduce the model-building steps to the rest of the modelling team. Since we were using the same corpus subset and methodology as Experiment 0018, we expected a similar result.

Details: As in Experiment 0018, instead of creating a new corpus subset, we used a pre-existing one (Tiny). This "Tiny" corpus contains about 15 minutes' worth of data.

Results: The first few attempts to run the experiment failed because of words missing from the dictionary. We added the missing words identified in Experiment 0018 and continued. The train itself then ran successfully, and we created the language model shortly afterwards. Together, the two tasks took about 30 minutes.
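The missing-word failure above can be caught before a train is launched by comparing the transcript vocabulary against the dictionary. A minimal Python sketch follows. It assumes SphinxTrain-style transcript lines of the form `<s> WORD WORD </s> (utterance_id)` and a CMU-style dictionary with one `WORD PHONE PHONE ...` entry per line, where alternate pronunciations are written `WORD(2)`. The function names are hypothetical, and words are compared exactly as written (no case folding).

```python
import re

# Sentence and silence markers that are not expected in the dictionary
# (assumed set; adjust to whatever markers the transcripts actually use).
MARKERS = {"<s>", "</s>", "<sil>"}


def transcript_words(transcript_lines):
    """Collect the words used in transcript lines like '<s> A B </s> (utt_id)'."""
    words = set()
    for line in transcript_lines:
        # Drop the trailing utterance ID in parentheses, if present.
        line = re.sub(r"\([^)]*\)\s*$", "", line)
        for token in line.split():
            if token.lower() not in MARKERS:
                words.add(token)
    return words


def dictionary_words(dict_lines):
    """Collect headwords from lines like 'WORD  PH1 PH2 ...'.

    Alternate pronunciations such as 'READ(2)' are folded into 'READ'.
    """
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcript_lines, dict_lines):
    """Return the transcript words that have no dictionary entry, sorted."""
    return sorted(transcript_words(transcript_lines) - dictionary_words(dict_lines))


# Hypothetical sample data for illustration only.
transcript = ["<s> HELLO WORLD </s> (utt_001)", "<s> GOODBYE WORLD </s> (utt_002)"]
dictionary = ["HELLO  HH AH L OW", "WORLD  W ER L D", "WORLD(2)  W ER L D"]
print(missing_words(transcript, dictionary))  # ['GOODBYE']
```

In practice the two lists would come from reading the experiment's transcript and dictionary files (for example with `open(path).readlines()`); the actual file names depend on how the experiment directory is set up.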

As in Experiment 0018, we ran into issues with scoring and were therefore unable to score the train.

After reviewing the logs from Experiments 0018 and 0019 and comparing them with the logs of the last successful train (0017), we determined that the train was creating a semi-continuous model in the Sphinx 2 format. The decoder expected continuous models in the Sphinx 3 format and therefore failed, because it could not load the models the train had produced. We also determined that the configuration files for Experiments 0018 and 0019 were set to produce semi-continuous models in the Sphinx 2 format.
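The configuration comparison described above can be sketched in Python. This assumes, as in the SphinxTrain configuration templates we are aware of, that the model type is chosen by a Perl assignment to `$CFG_HMM_TYPE`, with `'.semi.'` for semi-continuous and `'.cont.'` for continuous models, and that unused alternatives are disabled by commenting them out with `#`. The helper names, labels, and sample config text are hypothetical; check the actual configuration file for the exact variable and values.

```python
import re

# Hypothetical labels for illustration; the comment text next to each
# $CFG_HMM_TYPE line may differ between SphinxTrain releases.
HMM_TYPES = {
    ".semi.": "semi-continuous (Sphinx 2 style)",
    ".cont.": "continuous (Sphinx 3 style)",
}

# Matches an *uncommented* assignment such as:  $CFG_HMM_TYPE = '.semi.';
ACTIVE_HMM_TYPE = re.compile(r"""^\s*\$CFG_HMM_TYPE\s*=\s*['"]([^'"]+)['"]""")


def active_hmm_type(cfg_text):
    """Return the $CFG_HMM_TYPE value in effect, or None if none is set.

    Perl runs the assignments in order, so the last uncommented one wins.
    """
    value = None
    for line in cfg_text.splitlines():
        m = ACTIVE_HMM_TYPE.match(line)
        if m:
            value = m.group(1)
    return value


def describe(cfg_text):
    """Give a readable label for the model type a config would train."""
    value = active_hmm_type(cfg_text)
    return HMM_TYPES.get(value, f"unknown ({value!r})")


# Hypothetical sample: the semi-continuous line is active, the continuous one commented out.
sample = "$CFG_HMM_TYPE = '.semi.';\n#$CFG_HMM_TYPE = '.cont.';\n"
print(describe(sample))  # semi-continuous (Sphinx 2 style)
```

Running `describe` on the configuration of a working train and of a failing one gives a quick side-by-side check of the model type each was set to produce.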

Future trains will need to ensure that the configuration file is set to produce models in the Sphinx 3 format. This can be done by uncommenting the appropriate variable in the configuration file.
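Uncommenting the right variable can also be scripted so that the step is not missed on future trains. The sketch below assumes the same thing as before: the model type is set through `$CFG_HMM_TYPE` lines in the configuration file, with `'.cont.'` selecting continuous models. It enables the `'.cont.'` line and comments out every other `$CFG_HMM_TYPE` assignment. The function name and sample text are hypothetical, and the edited file should be reviewed before starting a train.

```python
import re

# Matches a $CFG_HMM_TYPE assignment whether or not it is commented out.
# Groups: 1 = leading indent, 2 = the assignment itself, 3 = the quoted value.
HMM_LINE = re.compile(r"""^(\s*)#*\s*(\$CFG_HMM_TYPE\s*=\s*['"]([^'"]+)['"].*)$""")


def select_hmm_type(cfg_text, wanted=".cont."):
    """Return cfg_text with only the `wanted` $CFG_HMM_TYPE line active.

    Every other $CFG_HMM_TYPE assignment is commented out with '#'.
    Raises ValueError if no line (commented or not) sets `wanted`,
    because then there is nothing to uncomment.
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        m = HMM_LINE.match(line)
        if not m:
            out.append(line)  # unrelated line: keep unchanged
            continue
        indent, assignment, value = m.groups()
        if value == wanted:
            out.append(indent + assignment)  # uncomment the wanted type
            found = True
        else:
            out.append(indent + "#" + assignment)  # disable the others
    if not found:
        raise ValueError(f"no $CFG_HMM_TYPE line for {wanted!r}")
    return "\n".join(out) + "\n"


# Hypothetical sample config fragment.
sample = "$CFG_HMM_TYPE = '.semi.';\n#$CFG_HMM_TYPE = '.cont.';\n"
print(select_hmm_type(sample), end="")
# #$CFG_HMM_TYPE = '.semi.';
# $CFG_HMM_TYPE = '.cont.';
```

Applied to a real file, the returned text would be written back over the configuration, ideally after keeping a backup copy of the original.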