Speech:NOAA



Project Notes


NOAA Corpus

Audio Corpus Data

The audio files originate from radio weather messages broadcast by the National Oceanic and Atmospheric Administration (NOAA). Marcel Filimon collected this data during the summer of 2014; his collection method and results are described in his personal log: Marcel's Log. A total of 185 audio files were collected, split into three directories: 'full', 'half', and '40min_split'. Unlike Switchboard, these audio files are .wav files rather than .sph files, so within each directory they are placed in a 'wav' directory rather than a 'conv' directory as in Switchboard. The full paths to the audio files are:

/mnt/main/corpus/noaa/full/audio/wav
/mnt/main/corpus/noaa/half/adapt/audio/wav
/mnt/main/corpus/noaa/40min_split/adapt/audio/wav
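
As a quick sanity check on the layout above, the following minimal Python sketch counts the .wav files found directly inside each directory. The three paths are copied from the list above; the helper name count_wav_files is hypothetical, and the sketch assumes the files carry a .wav extension and are not nested in further subdirectories.

```python
from pathlib import Path

# Paths copied from the list above; they only exist on a machine
# that has the corpus mounted under /mnt/main.
AUDIO_DIRS = [
    "/mnt/main/corpus/noaa/full/audio/wav",
    "/mnt/main/corpus/noaa/half/adapt/audio/wav",
    "/mnt/main/corpus/noaa/40min_split/adapt/audio/wav",
]


def count_wav_files(directory):
    """Return the number of .wav files directly inside `directory`.

    Hypothetical helper: returns 0 if the directory does not exist.
    """
    path = Path(directory)
    if not path.is_dir():
        return 0
    return sum(
        1 for p in path.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


if __name__ == "__main__":
    for d in AUDIO_DIRS:
        print(d, count_wav_files(d))
```

If the corpus is mounted, the per-directory counts can be compared against the 185 files reported above; how the files are distributed across the three directories is not stated here, so no particular breakdown should be expected.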

Transcript Files

The transcripts for each of these directories are located in the corresponding 'trans' directory.

Example:
 PORTLAND WAS PARTLY CLOUDY THE TEMPERATURE WAS SEVENTY THREE DEGREES (NOAA_162.450-001)
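
A transcript line in the form shown above can be split into its text and its trailing identifier with a short Python sketch. This assumes every line ends with a single parenthesized identifier, as in the example; the function name parse_transcript_line and the reading of the identifier as an utterance ID are assumptions, not something the transcript files themselves document.

```python
import re

# Assumed line format: "<TEXT> (<ID>)", with the ID in parentheses
# at the end of the line and containing no spaces.
TRANSCRIPT_LINE = re.compile(
    r"^\s*(?P<text>.*?)\s*\((?P<utt_id>[^()\s]+)\)\s*$"
)


def parse_transcript_line(line):
    """Split one transcript line into (text, identifier).

    Hypothetical helper: raises ValueError if the line does not end
    with a parenthesized identifier.
    """
    match = TRANSCRIPT_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized transcript line: %r" % line)
    return match.group("text"), match.group("utt_id")


if __name__ == "__main__":
    example = (
        " PORTLAND WAS PARTLY CLOUDY THE TEMPERATURE WAS "
        "SEVENTY THREE DEGREES (NOAA_162.450-001)"
    )
    text, utt_id = parse_transcript_line(example)
    print(utt_id)
    print(text)
```

Keeping the identifier separate from the text makes it easier to match each transcript line to its audio, provided the identifier does correspond to an audio file or segment.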