Speech:NOAA


Audio Corpus Data
The audio files are recordings of radio weather messages broadcast by the National Oceanic and Atmospheric Administration (NOAA). Marcel Filimon collected this data during the summer of 2014; his collection method and results are documented in his personal log: Marcel's Log. A total of 185 audio files were collected, split across three directories: 'full', 'half', and '40min_split'. Unlike Switchboard, these audio files are .wav files rather than .sph files, so within each directory they are placed in a 'wav' directory instead of the 'conv' directory used for Switchboard. The full paths to the audio files are:

/mnt/main/corpus/noaa/full/audio/wav
/mnt/main/corpus/noaa/half/adapt/audio/wav
/mnt/main/corpus/noaa/40min_split/adapt/audio/wav
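
As a quick sanity check on the layout described above, the following sketch counts the .wav files in each of the three directories. It assumes the paths exist exactly as listed and that the audio files sit directly inside each 'wav' directory; the helper name `count_wavs` is hypothetical and not part of any project script.

```python
from pathlib import Path

# Directories copied from the list above.
NOAA_WAV_DIRS = [
    "/mnt/main/corpus/noaa/full/audio/wav",
    "/mnt/main/corpus/noaa/half/adapt/audio/wav",
    "/mnt/main/corpus/noaa/40min_split/adapt/audio/wav",
]


def count_wavs(directories):
    """Return {directory: number of .wav files}; a missing directory counts as 0."""
    counts = {}
    for directory in directories:
        path = Path(directory)
        # Only files directly inside the directory are counted (no recursion).
        counts[str(directory)] = len(list(path.glob("*.wav"))) if path.is_dir() else 0
    return counts


if __name__ == "__main__":
    for directory, n in count_wavs(NOAA_WAV_DIRS).items():
        print(f"{n:4d}  {directory}")
```

Run on the machine where the corpus is mounted, this prints one count per directory; how those counts relate to the 185 collected files depends on how the 'half' and '40min_split' sets were derived, which this page does not specify.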

Transcript Files
The transcripts for each of the directories are located in the 'trans' directories.

Example: PORTLAND WAS PARTLY CLOUDY THE TEMPERATURE WAS SEVENTY THREE DEGREES (NOAA_162.450-001)
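
The example line above shows the transcribed words followed by an identifier in parentheses. A minimal sketch for splitting such a line into its words and its identifier is given below. It assumes every transcript line follows the same pattern as that single example; the function name `parse_transcript_line` is hypothetical.

```python
import re

# Words, then optional whitespace, then "(identifier)" at the end of the line.
# Assumption: the identifier contains no spaces or parentheses.
TRANSCRIPT_LINE = re.compile(r"^(?P<text>.*\S)\s*\((?P<utt_id>[^()\s]+)\)\s*$")


def parse_transcript_line(line):
    """Split a transcript line into (list of words, identifier)."""
    match = TRANSCRIPT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"Unexpected transcript line format: {line!r}")
    return match.group("text").split(), match.group("utt_id")


if __name__ == "__main__":
    example = ("PORTLAND WAS PARTLY CLOUDY THE TEMPERATURE WAS "
               "SEVENTY THREE DEGREES (NOAA_162.450-001)")
    words, utt_id = parse_transcript_line(example)
    print(utt_id, len(words), "words")
```

Lines that do not match the assumed pattern raise a `ValueError` rather than being silently skipped, which makes formatting problems in a 'trans' file easier to spot.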