Speech:Spring 2018 Proposal


 * The University of New Hampshire at Manchester -- Science and Technology Capstone Class of 2018
 * UNH Manchester Speech Project
 * With Dr. Michael Jonas

=Introduction=
The goal of the Spring 2018 Capstone class is to improve upon the current speech models and use them, in combination with various tools, to advance our speech recognition technology. Speech recognition allows spoken language to be understood by machines, and the success of these techniques is measured by the Word Error Rate (WER). This year's class aims for a WER lower than that of all previous classes: a 1-3% reduction in the error rate by the end of the Spring semester, achieved through research and implementation of Hidden Markov Models, Recurrent Neural Networks, and improvements to the models already established.

The current speech system runs on Red Hat, a Linux distribution, together with the CMU Sphinx speech recognition tools. After minor adjustments by the Systems Team, we will be testing with Sphinx 3 on seven drone servers centered around one parent machine called Caesar. We also plan to add three drones to increase the CPU and memory capacity of the drone cluster.

The 2018 class has been split into the five teams described below. Each team has specific goals tied to its responsibilities, along with weekly timelines of who will complete which task. Our teams will build on the success of previous semesters, as well as create and implement better models to achieve higher accuracy in speech recognition.

=Modeling Team=

Overview
The purpose of the Modeling Team is to construct and implement different models to be used for speech recognition. The system currently combines Acoustic Modeling and Language Modeling with a phonetic dictionary to conduct speech recognition analysis. The Acoustic Model (AM) is the most difficult to build, as many different criteria determine the relationship it learns between transcripts and audio files. A statistical Language Model estimates the probability distribution over sequences of words within an audio file.

This Team will incorporate LDA and, to whatever extent possible, RNNs into the HMM in order to achieve a WER below the 41.3% mark on a 300-hour eval.trans transcript achieved by the Spring 2017 Modeling Team. The HMM is the basis for large-vocabulary continuous speech recognition (LVCSR) systems (mi.eng.cam.ac.uk). LDA reduces dimensionality by finding the projection that best separates the classes while retaining as much of the discriminative information in the data points as possible. RNNs add greater context to the model being developed by carrying forward a memory of what has already been computed.
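
To make the LDA step concrete, the sketch below uses scikit-learn to project feature frames onto a lower-dimensional, more separable subspace. It is only a sketch: the array shapes and class labels are random stand-ins for real acoustic features and their senone labels, not the project's actual data.

 # Minimal LDA sketch: project 39-dimensional acoustic frames onto a
 # smaller discriminative subspace. The arrays are random placeholders
 # for real MFCC-style features and their class (senone) labels.
 import numpy as np
 from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
 
 rng = np.random.default_rng(0)
 frames = rng.normal(size=(1000, 39))    # placeholder acoustic features
 labels = rng.integers(0, 8, size=1000)  # placeholder class labels
 
 lda = LinearDiscriminantAnalysis(n_components=7)  # at most n_classes - 1
 reduced = lda.fit_transform(frames, labels)
 print(reduced.shape)  # (1000, 7): same frames, fewer dimensions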

The Modeling Team's intent in augmenting the HMM with LDA and possibly RNNs is to obtain the best possible functional baseline. The AM will be a challenge to implement to its full potential, and thus to turn into a reliable baseline, because multiple parameters can be adjusted when evaluating the relationship between transcripts and the audio files from which they were derived.

Configuring Sphinx
Previous work by the 2017 Spring Modeling Team yielded a WER of 41% on unseen data in a 300-hour experiment, down from 47% on the previous semester's 145-hour experiment. Improving the WER and increasing system accuracy may be attained by configuring settings such as the senone count, which we are currently using at 1000 although it has been 3000 in the past. The number of densities is also a consideration: after consulting the CMU documentation, the 2017 Modeling Team stated that a final count of 32 Gaussian densities should be used when building features.

Using that as a baseline, we will have to determine which density works best in a 300-hour experiment. The number of states per HMM, meaning the number of states in each HMM built into the AM, will also be a factor. The 2017 Modeling Team's final number was 5, after determining that 7 made their models too flexible. Further research will be needed to determine whether allowing the HMM to skip a state would improve results. Regarding the senone count, different values will have to be examined against corpora of different sizes.
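
These values live in each experiment's sphinx_train.cfg, which is a Perl-syntax file, so sweeps can be scripted by rewriting the relevant lines before each train. The sketch below is a hedged illustration: the variable names ($CFG_N_TIED_STATES for the senone count, $CFG_FINAL_NUM_DENSITIES for the Gaussian densities) are taken from SphinxTrain's standard configuration and should be verified against our local file, and the path is a placeholder.

 # Sketch: rewrite senone/density values in a SphinxTrain config before
 # a train. Variable names are assumed from SphinxTrain's documented
 # sphinx_train.cfg; verify them against the local file before use.
 import re
 from pathlib import Path
 
 def set_cfg_value(cfg_text: str, name: str, value: str) -> str:
     # sphinx_train.cfg lines look like: $CFG_N_TIED_STATES = 1000;
     return re.sub(rf"(\${name}\s*=\s*)[^;]+;", rf"\g<1>{value};", cfg_text)
 
 cfg_path = Path("etc/sphinx_train.cfg")  # placeholder experiment path
 text = cfg_path.read_text()
 text = set_cfg_value(text, "CFG_N_TIED_STATES", "8000")      # senones
 text = set_cfg_value(text, "CFG_FINAL_NUM_DENSITIES", "32")  # Gaussians
 cfg_path.write_text(text)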

The 2017 Modeling Team's iteration used 8000 senones; again, we will take that number as a baseline and see how well it applies to the full 300-hour experiment. The 2017 Spring Modeling Team did use an RNN LM to build n-gram LMs but was unable to improve the WER in 5-hour or 30-hour experiments. The team believed this was because an n-gram model must be built from sentences generated by the RNN model, since the Sphinx decoder's structure precludes taking in RNN models directly. It will be beneficial to attempt this ourselves on similar 5- and 30-hour experiments and, if we find that it works, implement it on the full 300 hours.
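
If we retry that indirect route, the counting half is straightforward. The sketch below tallies trigrams from a hypothetical file of sentences sampled from the RNN LM; a real ARPA-format n-gram model would then be built from such text with the CMU LM toolkit.

 # Sketch of the indirect RNN-to-n-gram route: count trigrams from text
 # sampled from an RNN language model. "rnn_generated.txt" is a
 # hypothetical file of generated sentences, one per line.
 from collections import Counter
 
 trigrams = Counter()
 with open("rnn_generated.txt") as fh:
     for line in fh:
         words = ["<s>"] + line.split() + ["</s>"]
         for i in range(len(words) - 2):
             trigrams[tuple(words[i:i + 3])] += 1
 
 for gram, count in trigrams.most_common(10):
     print(count, " ".join(gram))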

Parallelization Through Torque
Parallelization is the ability to run training as parallel processes on multicore machines, and it would considerably reduce the time required to train and decode models. Sphinx is configured to use Torque, which splits an experiment's processes across all available drones and theoretically reduces the total experiment time to one seventh. The 2017 Spring Systems Team reported that they were able to use Rome as the main Torque node and Majestix as the compute node. Consulting with the Systems Team determined that Torque is indeed available on all of the drones and Caesar, so we will determine how to run trains on the PBS mom, Rome, and the compute node, Majestix. Once an experiment has run successfully across those two drones, we will attempt it across all available drones.
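
As a sketch of what submitting one training run to Torque could look like, the snippet below writes a small PBS job script and hands it to qsub. The resource request, file names, and the SphinxTrain entry point are assumptions to confirm with the Systems Team before use.

 # Sketch: submit one training run to Torque via qsub. The resource
 # request and the command being run are placeholders; the real entry
 # point and node layout must come from the local setup.
 import subprocess
 
 pbs_script = """#!/bin/bash
 #PBS -N sphinx_train_300hr
 #PBS -l nodes=2:ppn=4
 #PBS -j oe
 cd $PBS_O_WORKDIR
 perl scripts_pl/RunAll.pl  # assumed SphinxTrain entry point
 """
 
 with open("train_300hr.pbs", "w") as fh:
     fh.write(pbs_script)
 
 subprocess.run(["qsub", "train_300hr.pbs"], check=True)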

Documentation
Documentation is critical: it gives future Capstones clear, concise, and coordinated instructions on how to conduct experiments properly. Processes and procedures should be clearly outlined so that they can be easily followed by anyone who is unfamiliar with Sphinx, UNIX, the directory tree structure, or programming languages such as Python, Perl, and C++. Additionally, the 2018 Modeling Team will meticulously document changes made to LMs and how those changes were made; step-by-step instructions on exactly how to train, decode, and score seen and unseen data; and the successful incorporation of LDA. Notations will be made to indicate where older information has been updated. Pitfalls, known errors, and lessons learned will be recorded so that future researchers can navigate experiments with a minimal amount of duplicated effort.

Task Timeline

 * Review artifacts, determine roles, accesses and responsibilities
 * Establish individual connectivity with Caesar and the wiki (Hannah, Brian, Steve) 1/30 - 2/6
 * Successfully run a Train, Decode and Scoring (Hannah, Brian, Steve) 2/6 - 2/13
 * Compile a list of relevant scripts and any improvements that need to be made to these scripts (Hannah, Brian) 2/20 - 3/6
 * Devise documentation improvement plan (Steve) 2/13 - 2/20
 * Consolidate and enumerate the precise steps to run a Train, Decode and Scoring
 * Record the successful steps and command syntax to run on seen data (Steve) 2/13 - 2/20
 * Review artifacts to determine correct steps and command syntax to run on unseen data (Steve) 2/13 - 2/20
 * Adjust scripts as necessary in coordination with Experiment and Data teams to fix recurring errors (Hannah, Brian) 2/27 - 3/6
 * Coordinate with the Data Team to rerun/recreate the 41.3% 300hr train and resolve the data mismatch (Brian) 2/27 - 3/6
 * Research what was done to achieve the 41.3% 300hr train (Hannah, Brian) 2/20 - 2/27
 * Determine the successful steps and command syntax to run on unseen data incorporating LDA (Steve) 2/13 - 2/27
 * Coordinate with Data and run three 300hr trains [current/fixed/previous] (Hannah, Brian, Steve) 2/20 - 2/27
 * Improve baseline of Word Error Rate below the current baseline of 41.3% by running experiments with varied senone and density values
 * Change the senone value (default is 1000) and run trains to determine a trend toward the most efficient value (Brian, Steve) 2/27 - 3/6
 * Change the density (default is 8) and run trains to determine a trend toward the most efficient value (Brian, Steve) 2/27 - 3/6
 * Determine the combination of senone value and density value that yields the best results (Hannah, Brian, Steve) 3/6 - 3/20
 * Research other variables that may have a positive impact on improving the error rate (Hannah) 3/20 - 4/3
 * Research and implement methods to augment HMM
 * Research LDA models and how they can augment HMM machine learning (Hannah, Steve) 2/27 - 3/6
 * Determine if acoustic-feature based frequency warping for normalization is a viable option based on 30-hour testing vs 300-hour testing (Hannah, Brian) 3/6 - 3/13
 * Research RNN implementation and results by previous semester (Hannah, Brian) 3/6 - 3/13
 * Utilize LDA on 300hr train to establish a baseline (Hannah, Brian) 3/13 - 3/20
 * Utilize RNN on 300hr train to establish a baseline (Hannah, Brian) 3/13 - 3/20
 * Adjust values as applicable with LDA on 300hr train for comparison (Hannah, Brian, Steve) 3/20 - 3/27
 * Adjust values as applicable with RNN on 300hr train for comparison (Hannah, Brian, Steve) 3/20 - 3/27
 * Determine and implement procedures for using Torque
 * Determine availability/viability of using Torque (Steve) 2/27 - 3/6
 * Attempt to use Torque on a 300hr train (Steve) 3/6 - 3/13
 * If Torque used successfully, utilize on all further 300hr trains and create a Standard Operating Procedure for Torque configuration and use (Steve) 3/13 - 3/27
 * Ongoing processes, future iterations
 * Research areas of improvement for follow on researchers (Brian) 3/27 - 4/24
 * Finish and perform quality assurance/review all procedural documentation for future iteration use (Steve) 3/27 - 4/24
 * Create Model Team final report input (Hannah) 3/27 - 5/1

=Software Team=

Overview
This year, the Software Team was established with the goal of reviewing, understanding, and documenting the specific files in the decoder. The team will map out how the Sphinx decoder works and all of the source code files connected to it. One file that has already proven helpful is main_decode.c.

This Team will start by analyzing main_decode.c to ensure the quality of the decode, then examine different files each week as the semester progresses to discern the processes and the connectivity between them. Furthermore, this Team will collectively do extensive research on each of the source files and analyze the unique functions of the code, as well as how those functions impact the overall project. So far, the Software Team has identified files in C/C++, Python, and Perl.

The Team has found that this project is best explored in Visual Studio (not to be confused with Visual Studio Code). The Team also found it best to create a physical visual map displaying how each file is connected and how the files interact. This will be helpful as more files are discovered and decoded.

Decoding Sphinx3
The team will look through the source files of Sphinx 3 and document what the code does and how the files interconnect, starting with main_decode.c and branching off from there. The team is currently moving on to corpus.c and sphinx3_decode.c, and will branch off of these files going forward.

Setup File Version Control
The team will research the best command-line version control software for its needs, looking at Revision Control System (RCS) and Subversion (SVN) as the two best suited for this project. Both were suggested by Professor Jonas, and each has its own benefits. RCS is useful because it allows multiple users to edit the programs and make their own modifications; it is suitable not only for programming but for documentation as well. SVN will be beneficial because it maintains past and present versions of files, including code and documentation, a feature that will come in handy for mistakes or mishaps along the way. Both will be installed on one of the drones, and the system will then be tested to ensure the installation did not affect the speech test results. This will most likely be done on Majestix, because we will already be using that system for other objectives.

Recompile system software
The team will take control of a drone and disconnect it from the rest of the servers, then update features and programs to allow for efficient usability. Following that, the team will test the drone to see whether sufficient progress can be made in the Sphinx decoder and will ensure that there are no conflicts with the system. The disconnected drone will then be reconnected to the other servers, allowing the files to be moved onto Caesar and accessed by the other servers.

Documentation
With this being the first year that the capstone has a Software Team, it is crucial that we set a standard to be followed for years to come. The team will document its findings on the team wiki page for all of the source code completed, so that future Software Teams will understand it. Toward the end of the semester we will create a format or a new wiki page that future teams can amend as they see fit. Lastly, the Software Team wants to determine how the source code affects the overall decoder and what each specific file does; the team will do this by keeping individual logs of what each person has done.

Task Timeline

 * Decode Sphinx Decoder
 * Start decoding main_decode.c, because that is where the main function is located (Team) 2/13 - 2/20
 * Comment each line of the source code and add overall conclusion of what the file does (Team) 2/13 - 2/20
 * Look at corpus.c to determine how the functions relate to main_decode.c (Team) 2/20 - 2/26
 * Look at sphinx3_decode.c to determine the connection to main_decode.c and why it searches through that file (Team) 2/20 - 2/26
 * Continue looking through different files that are connected to the main_decode.c and how they impact the overall decoder (Team) 2/20 - 5/8
 * Setup File Version Control
 * Look into Revision Control System (RCS) and Subversion (SVN) (Wesley, Danielle) 2/20 - 2/28
 * Determine which one is best suited for Sphinx3 and set up files in the version control system (Wesley, Danielle) 2/20 - 2/28
 * Set up desired version control system on Majestix, because we will be doing software updates on this drone (Wesley, Danielle) 2/28 - 3/7
 * Confirm it is compatible with Sphinx3 and overall Speech Recognition Software - (Wesley, Danielle) 2/28 - 3/7
 * Test to see if the version control system is working on Majestix, and record any problems it encounters - (Wesley, Danielle) 2/28 - 3/7
 * Recompile / Update System Programs
 * Run an experiment to get a baseline before any updates/recompiles (Josh, Lamia) 3/7 - 3/13
 * Disconnect one drone from Caesar. This drone will most likely be Majestix, because it has GCC already installed (Josh, Lamia) 3/7 - 3/13
 * Create files listing system folder names to compare after the updates, verifying what has been changed and/or updated (Josh, Lamia) 3/7 - 3/13
 * Recompile software including the Decoder (Sphinx3) (Josh, Lamia) 3/9 - 3/18
 * Compare the current system file folders with the originals to verify changes (Josh, Lamia) 3/9 - 3/18
 * Reconnect drone to Caesar (Josh, Lamia) 3/9 - 3/18
 * Run Identical experiment to make sure updates did not adversely affect results (Josh, Lamia) 3/9 - 3/18
 * Create Physical Mapping of Paths used in Sphinx Decoder
 * Take Decoded Sphinx Decoder and the various files used and map them on a wall for visualization (Team) 2/20 - 5/8
 * Use visual mapping to create an easy to understand model for the other teams (Faruk) 2/20 - 5/8
 * Use visual model in conjunction with other teams to facilitate understanding of the rest of the sphinx decoder with all individuals and to review other useful features (Team) 2/20 - 5/8

=Data Team=

Overview
The primary responsibility of the Data Team is to oversee the organization and validity of the audio files and their corresponding transcript files located in the Switchboard corpus. Previous Data Teams have done an exemplary job of locating scattered corpus data, eliminating duplicate data and empty or corrupt audio files, fixing broken symbolic links, and creating a simple, easy-to-use file structure consisting of the original audio and transcript data as well as new corpora of specific sizes in hours (300, 145, 30 and 5).

The 2018 Team’s focus will build upon these accomplishments and seek to establish new ways to examine and verify the status of the corpora data. We also need to check the audio files for corruption and their corresponding transcript files for accuracy, and if problems are found, to write or adapt existing scripts to handle them.

The Team is also responsible for the creation of new corpora, if necessary. To that end, we will thoroughly investigate the 30-step Speech Corpus Setup that is in the Information section of the wiki. We will also be in contact with the other Teams to ensure they are aware that we can create a new corpus if one is needed.

Bug Fixing
One problem that has surfaced is with a script called parseLMTrans.pl (formerly ParseTranscript.perl), or possibly the genTrans.pl script, where regular expressions have been used to try to improve the WER by filtering out characters such as brackets [] and non-word filler utterances such as [LAUGHTER] or other random noise.

Early examination of the resulting hyp.trans transcript file shows inconsistencies between it and the truth transcript. The brackets and anything inside them are filtered out. The resulting transcript either has a dash connecting an incomplete word to where the bracketed part was (for example, s[he's] becomes s—), or it has a different word chosen as a substitute by the language model, even though the bracketed word was added to the dictionary.

Three tests need to be done to address the bracket issue: modifying the regular expressions to remove all bracketed information, modifying them to remove no bracketed information, and running the script as-is, with the inconsistent output we are seeing now. The option with the lowest WER will be the one kept.
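
The filtering itself lives in the Perl scripts, but the three treatments are easy to state as regular expressions. Below is a minimal Python sketch on a made-up transcript line, just to illustrate the three cases being compared.

 # The three bracket-handling strategies under test, shown on one
 # made-up transcript line. The real filtering is in the Perl scripts
 # (parseLMTrans.pl / genTrans.pl); this only illustrates the regexes.
 import re
 
 line = "s[he's] going to the [LAUGHTER] store"
 
 # 1) Remove all bracketed information, brackets included.
 strip_all = re.sub(r"\[[^\]]*\]", "", line)
 # 2) Remove no bracketed information: keep the text, drop the brackets.
 keep_all = re.sub(r"\[([^\]]*)\]", r"\1", line)
 # 3) As-is: leave the line untouched for the baseline run.
 as_is = line
 
 print(strip_all)  # "s going to the  store"
 print(keep_all)   # "she's going to the LAUGHTER store"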

Verify Audio Data Matches Transcripts
We also plan to look for more problems with the data by listening to the audio files using a media player called VLC and comparing them to the transcripts. We will listen to a sample set of 1% of the 250,331 .sph audio files. If we discover any issues, such as empty files, we will develop scripts to fix them, or use and/or adapt those from previous capstone classes.
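
A minimal sketch of drawing the 1% sample and flagging obviously empty files is shown below; the corpus root is a placeholder path, and a real duration or content check would still need a SPHERE-aware tool such as sox.

 # Sketch: draw a 1% random sample of the .sph files and flag any that
 # are empty or suspiciously small. The corpus root is a placeholder;
 # NIST SPHERE headers alone are 1024 bytes.
 import random
 from pathlib import Path
 
 corpus_root = Path("/path/to/switchboard")  # placeholder path
 sph_files = sorted(corpus_root.rglob("*.sph"))
 sample = random.sample(sph_files, max(1, len(sph_files) // 100))
 
 for f in sample:
     size = f.stat().st_size
     if size <= 1024:
         print(f"possibly empty: {f} ({size} bytes)")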

Dictionary
We will add new words and pronunciations to the dictionary as we identify them by listening to the audio files and using the CMU Pronouncing Dictionary to obtain the correct phonemes.
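
Sphinx dictionaries are plain text, one word per line followed by its phonemes, so new entries can be added by hand or with a small script. The sketch below is illustrative only; the dictionary path and the example entry are placeholders.

 # Sketch: add a word and its CMU-style phoneme string to the
 # dictionary, keeping entries sorted. Path and entry are placeholders.
 dict_path = "etc/switchboard.dic"      # placeholder path
 new_entry = "WILDCAT W AY L D K AE T"  # word followed by its phonemes
 
 with open(dict_path) as fh:
     entries = fh.read().splitlines()
 if new_entry not in entries:
     entries.append(new_entry)
     entries.sort()
     with open(dict_path, "w") as fh:
         fh.write("\n".join(entries) + "\n")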

Noise Cleanup
Finally, in researching new ways to improve the WER, the use of a software tool for cleaning up background noise in audio files has been discussed and approved by Professor Jonas. We will research and acquire a tool that meets our needs and learn how to use it.
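
One candidate is sox, which ships noiseprof/noisered effects for two-pass noise reduction: profile a noise-only stretch of audio, then subtract it. The sketch below drives sox from Python with placeholder file names; the noise window and the 0.21 sensitivity are starting points to tune by ear.

 # Sketch of two-pass noise cleanup with sox: build a noise profile
 # from an (assumed) speech-free stretch, then apply it. File names,
 # the noise window, and the 0.21 sensitivity are placeholders.
 import subprocess
 
 # Pass 1: profile the first 0.5 s, assumed to contain only noise.
 subprocess.run(["sox", "utt.wav", "-n", "trim", "0", "0.5",
                 "noiseprof", "utt.prof"], check=True)
 # Pass 2: subtract the profiled noise from the whole file.
 subprocess.run(["sox", "utt.wav", "utt_clean.wav",
                 "noisered", "utt.prof", "0.21"], check=True)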

Corpora Creation
The Data Team will study the process for creating new corpora based on the 30-step Speech Corpus Setup, whose scripts cover everything from creating a new directory structure, copying in the full transcript, and taking a sample of the desired size, through linking the utterance files. Associated concepts such as creating data sets (training, test, or adaptation sets) and Speaker Adaptive Training (SAT) will be explored as well, to build the background needed to determine whether a new corpus is required.

Documentation
All changes to the scripts, such as those regarding regular expressions, will be commented and fully documented in student logs and the Data Team wiki page for future capstone researchers.

Task Timeline

 * Bug Fixing
 * Run the trio of experiments with the parseLMTrans.pl and genTrans.pl scripts to determine where the bug is (Tri, Rose, Isaac) 2/21 – 3/6
 * Verify Audio Data Matches Transcripts
 * Listen to 60 utterances per week and compare them to their corresponding transcript files to spot any discrepancies or other problems such as corrupt data (Rose, Tri, Isaac) 2/20 – 3/20
 * Dictionary
 * Document new words discovered during listening by adding them to the dictionary (Rose, Tri, Isaac) 2/20 – 3/20
 * Noise Cleanup
 * Research and obtain a reputable open-source application to clean up background noise in utterance files (Isaac) 2/13 – 2/20
 * Test audio tool to clean background noise from audio files (Isaac) 2/20 – 2/27
 * Run experiments with cleaned utterance files (Isaac, Rose) 2/28 – 3/6
 * Compare resulting WERs with those from utterance files with noise (Isaac, Rose) 2/28 – 3/6
 * Determine whether cleaned utterance files yield a significantly better WER score than noisy files (Isaac, Rose) 2/28 – 3/6
 * If successful, continue cleaning audio files (Isaac, Rose) 3/7 – 3/20
 * If feasible, create a new corpus from them (Isaac, Rose, Tri) 3/7 – 3/20
 * Corpora Creation
 * Research the 30-step Speech Corpus Setup and related concepts such as creating data sets and Speaker Adaptive Training (Tri, Rose) 3/7 – 3/20

=Systems Team=

Overview
The Systems Team's primary responsibility is to ensure the performance, integrity, and reliability of the system as a whole. In past years the focus was on upgrading the system hardware and tools to the most current versions, but system stability has been neglected: as a result, the system shuts down when the power is overdrawn, and multiple users may access the same machine, resulting in excessive load. To resolve these issues and prevent future concerns, the Systems Team will endeavor to complete the following:

Performance
Our first objective is to improve the performance of the systems. To do so, we intend to implement an SSH/TCP load balancer to distribute user access on a round-robin basis, as well as increase the total number of available servers, thereby expanding the CPU and memory capacity of the cluster.

Integrity
Our second objective is to improve the integrity of the systems. This will be achieved by normalizing the servers, cloning the best-performing unit to all existing drones, and by creating a network of monitors to detect and report server availability (removing 'down' servers from the round robin). Further, we will research potential tools to install for error logging and reporting.
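
A minimal sketch of how the monitors and the round robin could fit together appears below: probe each drone's SSH port and hand out only reachable hosts in rotation. The host names are placeholders, and a production version would sit behind the actual load balancer.

 # Sketch: probe each drone's SSH port and hand out only reachable
 # hosts in round-robin order. Host names are placeholders.
 import itertools
 import socket
 
 DRONES = ["rome", "majestix", "drone3"]  # placeholder host names
 _rotation = itertools.cycle(DRONES)
 
 def is_up(host: str, port: int = 22, timeout: float = 2.0) -> bool:
     try:
         with socket.create_connection((host, port), timeout=timeout):
             return True
     except OSError:
         return False
 
 def next_server() -> str:
     # Skip drones whose SSH port is unreachable ("down" servers).
     for _ in range(len(DRONES)):
         host = next(_rotation)
         if is_up(host):
             return host
     raise RuntimeError("no drones available")
 
 print(next_server())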

Reliability
Our third objective is to improve the reliability of the systems. The status of the backup system is not well known or well documented, so we will evaluate, test, and if needed repair it, as well as rebalance the power load of the servers to ensure it is not overdrawn.

Documentation
We will also thoroughly document the network and backup system architecture via a Server Map, in addition to standard operating procedures for preventing catastrophic failure: e.g., backup restoration, software installation, et cetera.

Task Timeline

 * Evaluate the state of the backup system
 * Ensure that the backup system works as expected (Chris) 02/20 - 2/27
 * Fix any issues that need to be addressed (Chris) 02/20 - 03/06
 * Document findings and/or changes in the wiki (Chris) 02/20 - 03/13


 * Install and configure new servers
 * Find out what is causing the monitor to not work with one of the existing servers (Daniel) 02/13 - 02/20
 * Evaluate the power use of the servers (Daniel, Yashna) 02/13 - 02/20
 * Add labels to the new servers (Yashna, Daniel) 02/20 - 02/27
 * Reconfigure the power distributions of the servers (Daniel) 02/20 - 02/27
 * Install and configure the three new servers (Daniel, Yashna) 02/27 - 03/06
 * Shift servers up to make a complete group (Daniel, Yashna) 02/27 - 03/06
 * Document changes in the wiki (Daniel, Yashna) 02/20 - 03/13
 * Create new ethernet cable for the server room (Yashna, Daniel, Camden, Chris) 03/06 - 03/13
 * Improve the cable management (Daniel) 03/13 - 03/20


 * Normalize all of the servers
 * Evaluate which server should be used for normalizing the others (Yashna, Daniel) 03/06 - 03/13
 * Normalize the servers (Yashna, Daniel, Camden, Chris) 03/13 - 05/15


 * Implement load balancing for when users log in
 * Receive permission to install Nginx as a load balancer and webserver on Rome as well as Nmap (Camden) 02/13 - 02/20
 * Install Nginx and Nmap (Camden) 02/20 - 02/27
 * Implement a SSH load balancing system + script to automatically forward users to the appropriate server (Camden) 02/27 - 03/06
 * Document how the system works in the wiki (Camden) 02/20 - 03/13


 * Find a tool to help evaluate the logs
 * Research potential tools to evaluate the logs (Yashna) 02/20 - 03/06
 * Get permission from Professor Jonas to install the error logging tool (Yashna) 03/06 - 03/13
 * Configure the error logging tool [if it has been approved] (Yashna) 03/13 - 03/20
 * Document how to use the tool in the wiki (Yashna) 02/27 - 03/20


 * Implement a new system to monitor the servers
 * Create a script to monitor the servers (Camden, Chris) 02/27 - 03/20
 * Create a dashboard to go along with the monitoring system (Camden, Chris) 02/27 - 03/20
 * Implement push and/or text notifications for the monitoring system (Camden, Chris) 02/27 - 03/20
 * Extensively test the monitoring system (Camden, Chris) 02/27 - 03/20
 * Document how to use the system in the wiki (Camden, Chris) 02/27 - 03/27


 * Create a map of the servers and the network
 * Create map of servers and the network (Camden, Chris) 03/13 - 03/20
 * Document the map in the wiki (Camden, Chris) 03/20 - 03/27

=Experiment Team=
https://foss.unh.edu/projects/index.php/Speech:Exps

Overview
The overall objective of the Experiment Team is to enhance the current experiment creation and documentation processes to increase the efficiency with which other Team members can run experiments. By improving the current documentation, scripts, and procedures, other Teams will be able to make significant gains against the software's baseline. The more easily other Teams can run successful decodes and receive their results, the more quickly they can increase the accuracy of speech recognition within the overall project.

The leading scripts our Team will work on are addExp and createExp, which were inherited from previous capstones. The former script assists the user by adding a new experiment entry to the wiki and the latter will successfully run a rudimentary experiment from anywhere on the server. As outlined below, our Team would like to add features to both scripts that will make them more user friendly as well as increase their functionality.

Moreover, the newly introduced Software Team will be available to assist this Team with code alterations. Fixing the createExp script will require working with the Software Team, as debugging the file has the potential to impact other pages on the Wiki. The same can be said of the Systems and Data Teams: createExp is a powerful script that must be carefully debugged to ensure it does not create system-wide problems or corrupt existing data. This Team will also work with the Modeling Team to improve documentation for experiment creation and execution.

Fix AddExp Script
Continuing where the 2017 Experiment Team left off, the most immediate tasks are ensuring that documentation regarding experiment creation and the individual scripts is correct, and fixing the broken functionality in the addExp.pl script. While addExp.pl will successfully create an experiment entry (either a root or a sub-experiment) in the given directory on the Wiki, the user experience and feedback delivered by the script need improvement. This Team will update the script to automatically find the highest-numbered sub-experiment present in the given directory and add one (auto-incrementing). In its current state, the script asks the user to input a number manually, which can easily lead to gaps in the experiment number sequence, as well as errors if the user does not first check the Wiki for the most recent experiment number.
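
The auto-increment logic itself is small. The sketch below shows the intended behavior in Python; the real addExp.pl is Perl and would read the existing experiment entries from the Wiki, so the page list here is a stand-in.

 # Sketch of the intended auto-increment: find the highest existing
 # sub-experiment number and add one. In addExp.pl the list would come
 # from the Wiki; here it is a hard-coded stand-in.
 import re
 
 existing = ["Speech:Exps_0290", "Speech:Exps_0291", "Speech:Exps_0293"]
 
 numbers = [int(m.group(1)) for page in existing
            if (m := re.search(r"_(\d+)$", page))]
 next_number = max(numbers, default=0) + 1
 print(f"next experiment number: {next_number:04d}")  # -> 0294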

Add Functionality to AddExp Script
Another feature this Team would like to introduce within the addExp script is prefixing the user's name to the Wiki experiment link and experiment page. We would also like to add the UNH login name (e.g., ajt2019) to the sub-experiment page for clarity. According to the script's documentation on the Wiki, these features worked in previous versions; if possible, this Team will use archived versions of the script to restore them.

Add Unseen Data and Original Configurations to CreateExp Script
Furthermore, one of the major pain points when running an experiment in previous years was that it required the user to run at least a dozen commands one after another. Among other milestones, last year's Experiment Team created a script titled createExp, which streamlines experiment creation by walking the user through it with prompts rather than forcing them to manually execute commands listed on the Wiki. One caveat is that the script can only create trains using the default configurations; additionally, there is no means to run an experiment on unseen data.

Our Team would like to continue their work by implementing the previously mentioned components so that other Teams may rely on this script for every aspect of the experiment creation process. After a thorough examination of both the addExp and createExp scripts, our Team would further like to investigate how to consolidate them so that the documentation for an experiment can be created, and the experiment run, from within the same script. This will be the Experiment Team's biggest goal of the semester, and it requires that addExp be fully operational.

Important links

 * Building a train: https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script
 * Building a Language Model: https://foss.unh.edu/projects/index.php/Speech:Create_LM
 * Decoding: https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data

Task Timeline

 * Documentation and archiving
 * Review documentation and learn about unseen data (Dan B., Jaden, Arias) 2/16 - 2/18
 * Work on the addExp script by adding comments to code and further understand the script as a whole (Dan B., Jaden, Arias) 2/12 - 2/18
 * Ensure outdated scripts on the server are properly archived in the correct folder (Jaden) 2/18 - 2/19
 * Review existing documentation to look for outdated or unclear commands to compare with the documentation created by our Team (Arias) 2/15 - 2/19
 * Debug and add functionality to addExp
 * Implement the ability to auto-increment experiment and sub-experiment numbers on the Wiki by editing the addExp script (Dan B., Jaden, Arias) 2/21 - 2/28
 * Add the ability to prefix the user's Wildcats login name to the sub-experiment entry when using addExp (Dan B., Jaden, Arias) 2/21 - 2/28
 * Run the createExp script and debug potential issues should they arise (Dan B.) 2/21 - 2/25
 * Edit the user prompts within the addExp script to be more straightforward. For example, replace "What is your experiment's name?" with text that gives more of a hint as to what a sufficient name would be (Jaden, Arias) 2/28 - 3/5
 * Optimize addExp, complete documentation, and begin createExp improvements
 * Further optimize functionality of the addExp script, clean up code which the Team collaborated on, and ensure the script is well-documented for future Teams (Dan B.) 3/2 - 3/7
 * Begin the process of making improvements to the createExp script, add comments and understand how to add the functionality mentioned in the Team overview (Jaden, Arias) 3/3 - 3/6
 * Work on the createExp script as well as help determine the best way to add features such as using configurations other than the default and running an experiment on unseen data (Jaden, Arias) 3/4 - 3/9
 * Optimize createExp and gather documentation feedback
 * Gather feedback from other Teams on the current state of experiment-related walkthroughs and documentation as well as the user experience of creating Wiki entries using addExp. After enough feedback is received, appropriate adjustments will be made to the documentation on the Wiki (Dan B.) 3/8 - 3/16
 * Review the Wiki edits made and revise if necessary (Jaden, Arias) 3/10 - 3/14
 * Continue to make improvements on the createExp script (Jaden, Arias, Dan B.) 3/10 - End of semester

=Conclusion=
The objectives outlined above by the five Teams of the 2018 Capstone combine toward our goal of improving and streamlining the complex speech recognition processes established by previous semesters. Utilizing Caesar and the seven drones, the Sphinx 3 trainer and decoder, SCLite, the Switchboard corpora, and Perl scripts for automation, as well as the implementation of our processes and the different established models, this capstone will build upon the efforts of previous capstone members.

All components must work efficiently to prevent a cascade of errors, and team members must become experts in their respective focuses to best advance the overall project, as well as to educate other members when the teams split up. Past the end of March, the team dynamics will change so that two new Teams are formed, focused purely on decreasing the WER. It is essential for each member to have expert knowledge of their initial Team's processes so that they may educate the members of the new Teams, building a general understanding of all aspects of the capstone project.

We will ensure the reliability, performance, and integrity of the system rather than focusing on updating hardware and tools. Improving stability by increasing the total number of available servers and balancing access to those servers will make the entire system faster. Implementing a monitoring system and maintaining a functional backup system will further improve the system's reliability.

Organized, valid data will underpin all of our Teams' efforts. The incorporation of LDA and RNNs into the HMM, combined with the efforts stated above, will allow our Teams to complete our goal of reducing the WER below 41.3% on a 300-hour eval.trans.

=Class Glossary=
The items defined below are representative of their scope in the speech recognition project. Some terms are referenced from the sources listed at the end of the glossary.
 * Acoustic Model (AM): a statistical model that maps textual speech in a transcription file with the audio that corresponds to it.
 * Baseline: minimum or starting point used for comparisons.
 * Cepstra: features derived from human speech by quefrency analysis; stored in files used for training.
 * CMU Language Model Toolkit: a set of Unix software tools designed to facilitate language modeling work in the research community.
 * CMU Dictionary: an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations (as phonemes) which is actively being maintained and expanded.
 * CMU Sphinx: a series of speech recognition systems developed by Carnegie Mellon University that utilize Hidden Markov Acoustic Models and n-gram Statistical Language Models.
 * Corpus: a database of audio files and their corresponding transcripts. Plural: corpora.
 * DAG search: a search over a directed acyclic graph (DAG), a directed graph with a topological ordering: a sequence of the vertices such that every edge is directed from earlier to later in the sequence.
 * Data
 * Seen data: data that has undergone a train and decode, so that the model has been trained on it.
 * Unseen data: data that has been decoded but not trained in the model, so that the model has not seen it. A worse WER is expected because the model has not accounted for this data.
 * Decode: using the three entities (acoustic model, dictionary, and language model) to deduce the most probable sequence of units in a given signal during speech-to-text translation.
 * Dictionary: a text file of words and their phonemes. It can be edited by hand.
 * Experiment: training and/or decoding an audio file where the system interprets the sounds, creates a transcript, compares that transcript to the official "truth" transcript, and presents statistics on two main items: Word Error Rate (WER) and Sentence Error Rate (SER).
 * Gaussian: Represents the probability density function of a normally distributed random variable.
 * Hidden Markov Modeling (HMM): an approach to speech recognition based on a complex mathematical pattern-matching strategy; an effective framework for modeling time-varying spectral vector sequences.
 * Linear Discriminant Analysis (LDA): a machine learning technique that reduces dimensionality by projecting features onto the directions that best separate the classes.
 * Language Model (LM): a model that represents word patterns and frequency in a body of text. A language model assigns a probability to each word based on how frequently it is seen, and will substitute more likely words based on this probability.
 * N-gram: a contiguous sequence of n items in a given sample of text or speech.
 * Phonemes: distinct units of sound in a spoken language. Examples: P, D, AH, OU.
 * Recurrent Neural Networks (RNN): type of artificial neural network designed to recognize patterns in sequences of data, such as text and spoken words.
 * RCS (Revision Control System): A set of UNIX commands that allow multiple users to develop and maintain program code or documents. Users can make their own revisions of a document, commit changes, and merge them together. It was originally developed for programs, but is also useful for text documents or configuration files.
 * SCLite: A tool for scoring and evaluating the output of speech recognition systems that compares the hypothesized text (HYP) output by the speech recognizer to the correct, or reference (REF) text.
 * Sox: an audio processing tool used to manipulate sections of audio before use in the speech recognition system.
 * Senone: a short sound detector; a wide variety of sounds can be represented by a small number of distinct senones.
 * Sphinx Decoder: takes a model, tests part of the database and reference transcriptions and estimates the quality (WER) of the model.
 * Sphinx Trainer: learns the parameters for the models of the sound units using a set of sample speech signals.
 * SVN (Subversion): an open-source version control system written in C, used to maintain current and historical versions of files such as source code, web pages, and documentation. Its goal is to be a compatible successor to the widely used Concurrent Versions System (CVS).
 * Train: the process of learning about the sound units to create the acoustic model.
 * Viterbi (algorithm): A dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models.
 * Word Error Rate (WER): a common metric of the performance of a speech recognition or machine translation system, generally calculated as WER = (S + D + I) / N = (S + D + I) / (S + D + C), where S is the number of word substitutions, D the number of word deletions, I the number of word insertions, C the number of correct words, and N = S + D + C the total number of words in the reference.
 * https://foss.unh.edu/projects/index.php/Speech:CapstoneTerms
 * https://cmusphinx.github.io/