Speech:Spring 2011 Appendix



This appendix contains the scripts used to convert the CMU Switchboard Corpus into the format that the Sphinx software requires.

SwitchboardToDecoderConverter Script
This script converts the Switchboard transcriptions into a format that the Sphinx decoder can interpret.


 * SwitchboardToDecoderConverter.pl
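
As a rough illustration of the kind of output this conversion produces, here is a minimal Python sketch. It is not the Perl script itself. It assumes the target is the usual SphinxTrain transcription layout, in which each utterance is wrapped in `<s>` and `</s>` markers and followed by its utterance ID in parentheses. The function name `to_sphinx_transcript_line` and the sample utterance ID are hypothetical.

```python
def to_sphinx_transcript_line(text, utterance_id):
    """Format one utterance as a Sphinx-style transcription line.

    Assumption: the target layout is "<s> words </s> (utterance_id)".
    The input text is assumed to be already cleaned of any
    Switchboard-specific annotations.
    """
    # Collapse runs of whitespace so the output is single-spaced.
    words = " ".join(text.split())
    return f"<s> {words} </s> ({utterance_id})"


if __name__ == "__main__":
    # Hypothetical utterance ID, for illustration only.
    print(to_sphinx_transcript_line("okay  so what do you think ", "sw02001_0001"))
```

The utterance ID in parentheses is what lets the trainer pair a transcript line with an entry in the ".fileids" file, so the two files would need to use the same IDs in the same order.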

FileID Script
This shell script creates a file listing every audio file on the system along with its path. Sphinx uses this file as a lookup table to associate each audio file with its transcript. The generated file has the extension ".fileids".
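
The idea behind the file listing can be sketched in Python as follows. This is an illustrative sketch, not the shell script used in the project. It assumes that the audio files carry a ".sph" extension and that each entry should be a path relative to the audio root with the extension removed. Both are assumptions about the setup, and the function names are hypothetical.

```python
import os


def build_fileids(audio_root, extension=".sph"):
    """Return sorted audio paths relative to audio_root, without extension.

    Assumptions: audio files end in `extension`, and entries are written
    relative to the audio root with the extension stripped.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(audio_root):
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() == extension:
                rel = os.path.relpath(os.path.join(dirpath, stem), audio_root)
                # Use forward slashes regardless of platform.
                entries.append(rel.replace(os.sep, "/"))
    return sorted(entries)


def write_fileids(audio_root, out_path, extension=".sph"):
    """Write one entry per line to out_path and return the entry count."""
    entries = build_fileids(audio_root, extension)
    with open(out_path, "w", encoding="utf-8") as out:
        for entry in entries:
            out.write(entry + "\n")
    return len(entries)
```

Sorting the entries keeps the output deterministic, which makes it easier to keep the listing aligned with a transcript file produced in the same order.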

Dictionary Script
This script makes a pruned copy of the large dictionary available on the CMU Sphinx website. The output dictionary contains only the words that appear in the transcript files, each with its phonemes.


 * Dictionary.pl
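
The pruning step can be sketched in Python as shown below. This is a minimal sketch under stated assumptions, not a port of Dictionary.pl. It assumes dictionary lines of the form "WORD PH1 PH2 ...", alternate pronunciations marked with a suffix such as "(2)", comment lines starting with ";;;", and transcript lines in the `<s> ... </s> (id)` layout. Matching is case-insensitive, which is a design choice of this sketch. The function names are hypothetical.

```python
import re

# Matches the "(2)"-style suffix on alternate pronunciations (assumed format).
_VARIANT = re.compile(r"\(\d+\)$")


def transcript_vocabulary(transcript_lines):
    """Collect the unique words in the transcripts.

    Tokens starting with "<" (such as <s>, </s>) or "(" (the utterance ID)
    are treated as markup and skipped.
    """
    words = set()
    for line in transcript_lines:
        for token in line.split():
            if token.startswith("<") or token.startswith("("):
                continue
            words.add(token)
    return words


def prune_dictionary(dict_lines, vocabulary):
    """Keep only the dictionary entries whose word is in the vocabulary."""
    wanted = {word.upper() for word in vocabulary}
    kept = []
    for line in dict_lines:
        parts = line.split()
        # Skip comments and lines without at least a word and one phoneme.
        if len(parts) < 2 or line.startswith(";;;"):
            continue
        base = _VARIANT.sub("", parts[0]).upper()
        if base in wanted:
            kept.append(line.rstrip("\n"))
    return kept
```

Any transcript word that has no dictionary entry is silently absent from the result here. A real run would probably want to report those words, since the trainer needs a pronunciation for every word in the transcripts.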

Phoneme Script
This shell script generates a listing of the unique phonemes in the dictionary.
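
A Python sketch of the same idea follows. It is illustrative only and is not the shell script used in the project. It assumes each dictionary line has the form "WORD PH1 PH2 ..." and that lines starting with ";;;" are comments. The function name `unique_phonemes` is hypothetical.

```python
def unique_phonemes(dict_lines):
    """Return the sorted set of phonemes used in the dictionary.

    Assumption: the first token on each line is the word and every
    remaining token is a phoneme.
    """
    phones = set()
    for line in dict_lines:
        parts = line.split()
        # Skip comments and lines without at least a word and one phoneme.
        if len(parts) < 2 or line.startswith(";;;"):
            continue
        phones.update(parts[1:])
    return sorted(phones)


if __name__ == "__main__":
    sample = ["HELLO  HH AH L OW", "WORLD  W ER L D"]
    for phone in unique_phonemes(sample):
        print(phone)
```

Sphinx training setups commonly expect a silence phone such as SIL in the phoneme list as well. Whether the original script adds one is not stated here, so this sketch leaves it out.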