Speech:Spring 2011 Chris Log



Week Ending March 8th, 2011
I am tasked with keeping the wiki up to date and handling any other tasks my group leader delegates to me, as well as adding extra content requested by Professor Jonas. I’ve added the start of a history of speech recognition section. It is much briefer than I would have liked and will be expanded as I find more reliable history resources. I’m tracking down a device that predates the “Shoebox”, but as of now I don’t have enough information on it to include it in the article.

“In 1962, at the World’s Fair in Seattle, IBM displayed a device it had built called the “Shoebox”, boasting the capability to recognize 16 spoken words, including the digits 0-9, “plus”, “minus” and “total”. It also had the ability to communicate with an adding machine that could process and print simple addition and subtraction problems. The Hidden Markov Modeling (HMM) approach to speech recognition was invented by Lenny Baum of Princeton University and shared with ARPA contractors including IBM. HMM is a complex mathematical pattern-matching strategy that eventually was adopted by many of the leading speech recognition companies including Dragon Systems, IBM, and AT&T.”

I’ve also added a section about the challenge of speech recognition:

“The challenge that speech recognition presents is deceptively complex. How hard could it be, right? I understand people even if they have heavy accents and the words are not necessarily in the right order. We as humans take for granted our innate ability to communicate, developed over millions of years. Computers, on the other hand, are literal machines; generally, to a machine, things are on or off. Speech recognition doesn't listen for words; it breaks them up into smaller parts called phones. The reality is that it is not so cut and dried. Depending on the word, its context and/or the speaker himself, the meaning of a phone and the very sound itself can have a drastically different value.”
 * Task:
 * Results:

Next week I plan to delve deeper into the history of speech recognition. I will expand my search to Safari and the other text resources available to UNH students through the library databases. It is actually quite a challenge to find information about the history of speech recognition on the web. Many sources are companies themselves, many of them claiming to have developed the first speech chip and using pseudo-history as part of their marketing.
 * Plan:
 * Concerns:

Week Ending March 22nd, 2011
My task this week included implementing a new structure for the OpenITWare Speech wiki site, updating existing content, adding the Spring 2011 Proposal, and editing it for the wiki site. The new structure is to include:
 * Task:
 * Speech:Home
 * Speech:Spring 2011
 * Speech:Spring 2011 Proposal
 * Speech:Spring 2011 Report
 * Speech:Spring 2011 Log

In addition, I will be exploring the server “Majestix” and attempting to run the Sphinx demo.

This consisted of adding a landing page that briefly describes the history and challenge of speech recognition, with links to the Capstone course pages. Currently there is only Spring 2011, and within this page are group descriptions and tasks. This page contains the links to Spring 2011 Proposal, Spring 2011 Report and Spring 2011 Log. I also edited the Spring 2011 Proposal: added wiki headers to nest it properly, removed all spaces and tabs, and replaced all bulleted items.
 * Results:

Next week's plans include editing current content, updating current documents, and adding new items to the wiki site. I also acknowledge new duties as described in the most recent Proposal document:
 * Plan:
 * Run Mini Switchboard training, to be completed by Chris Reekie on April 5th
 * Create a tool that will run a decoding job given an experiment directory containing a test corpus, to be completed by Chris Reekie on March 29th

I noticed a typo in the Proposal document. I discovered it while adding the document to the wiki, because in wiki formatting headings are nested in logical order. Section 3.1 skips straight to sub-section 3.1.2; there is no 3.1.1.
 * Concerns:

Week Ending March 29th, 2011
One of my tasks this week is to build a tool that will run a decoding job given an experiment directory containing a test corpus. I was also prompted to restart research on content management systems by an e-mail from Professor Mihaela Sabin. After coming to a greater understanding of what a content management system is and does, namely:
 * Task:
 * Results:
 * Allow for a large number of people to contribute to and share stored data
 * Control access to data, based on user roles (defining which information users or user groups can view, edit, publish, etc.)
 * Aid in easy storage and retrieval of data
 * Reduce repetitive duplicate input
 * Improve the ease of report writing
 * Improve communication between users

I then took a new look at some of the offerings. I have found that while SharePoint has added features, the consensus seems to be in line with that of other Microsoft products (nothing personal, Mike). That being said, there are a few other options:

I chose the PHP-based options for the simple reason that many of the higher-level CIS classes here at UNHM use PHP extensively. Of these options, one of the most popular and best supported is Drupal.
 * Drupal: "Drupal is an open source content management platform powering millions of websites and applications. It’s built, used, and supported by an active and diverse community of people around the world."

No progress was made on the decoding tool.

My plan is to gather feedback from the group about which content management system is preferred and to assist KC with the desired system. I also plan on communicating with my group leader to reassess timelines for the decoding tool, and to begin research and creation of said tool. Future deadlines also need adjustment. My only concerns are those aforementioned timelines.
 * Plan:
 * Concerns:

Week Ending April 5th, 2011

 * Task:
 * Wednesday: N/A
 * Thursday: N/A
 * Friday:
 * Studied the Perl scripting language, did a basic Perl exercise.
 * Fixed image link on Nick's Log page
 * Restarted mail server so users can confirm their email, allowing them to be notified when others edit their pages (they must configure this on the "my preferences" page and then click the watch tab on their page).
 * Saturday: N/A server lag much?
 * Sunday: Perl practice
 * Monday:
 * Rooting around Majestix, pun intended.
 * Added commentary to other members' pages: Brian Log, KC Log, Nick's Log.
 * Working on Perl script for decoding.

My task this week included understanding and ultimately creating a decoding script that would run the decoder in an automated fashion.

Other tasks included reading other users' logs and commenting when I had relevant information that they might need. I tried to understand the decoders by running some demos. These are the instructions from one that I followed:
 * Results:

Building

Check if the bin directory already has the HelloWorld.jar file. If not, type the following in the top level directory: ant -find demo.xml

Running

First make sure that you have JSAPI set up correctly. Then, to run the demo, type: sphinx4> java -mx256m -jar bin/HelloWorld.jar


 * 1) Make sure that you give it a large enough heap by putting in "-mx256m".
 * 2) Make sure that you are using Java 2 SDK, Standard Edition, v1.4 or higher.
 * 3) If you are running Linux and have problems with the audio, please read the Linux JavaSound section.
 * 4) If you have the source distribution, make sure that the JAR file lib/sphinx4.jar is built. If not, go to the top level directory and type: ant
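The build-and-run instructions quoted above can be condensed into a small helper. This is only a sketch: the function name hello_world_steps is made up for illustration, and it assumes the directory layout the instructions imply (bin/HelloWorld.jar and demo.xml under the top-level sphinx4 directory). It prints the commands instead of running them, so it is safe to try on a machine without Java or ant.

```shell
#!/bin/sh
# Sketch only: print the steps needed to run the Sphinx-4 HelloWorld demo.
# Assumption: $1 is the top-level sphinx4 directory (defaults to ".").
# hello_world_steps is a hypothetical helper, not part of Sphinx-4.
hello_world_steps() {
    sphinx_dir=${1:-.}
    # Build the demo first if the jar is not already in bin/.
    if [ ! -f "$sphinx_dir/bin/HelloWorld.jar" ]; then
        echo "ant -find demo.xml"
    fi
    # -mx256m gives the JVM the larger heap the instructions ask for.
    echo "java -mx256m -jar bin/HelloWorld.jar"
}
```

Calling hello_world_steps with the path to the sphinx4 directory lists what still has to be run from that directory.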

I could not run it using this tutorial, so I followed the link to ensure that JSAPI was set up correctly. I followed a simple set of instructions there, but it failed, saying it could not find JSAPI.jar.

I spent the remainder of my time on this reading up and trying to gain enough knowledge to ascertain where the error is in my method. I also made several posts on others' pages, mostly to help them with the intricacies of wiki formatting and the use of installed extensions such as the GeSHi syntax highlighter. I plan on seeking out personal help from Professor Jonas now that some of my other obligations have been met. I'm still a bit lost on this, but until now I have been unable to spend the time required to gain sufficient understanding.
 * Plan:
 * Concerns:

Week Ending April 12th, 2011
My task this week was to decode decoding and to figure out how to create a jar file, so that we can make our own decoder work without the default files overriding it. I spent too much time working on foss, which limited my time for homework, especially with a baby shower on Sunday. I did attempt to create a jar file locally and realized I was lacking some fundamental understanding of what a jar file is. I gained some insight, but I really just need to sit down and play with it for a bit to understand it. Because I went way over hours last week, I'm taking Wednesday and Friday off this coming week to catch up, and I hope to make some real progress. My only concern is that time is running out on this semester and I'm not where I'd like to be.
 * Task:
 * Friday
 * Fixed foss
 * N/A
 * Monday
 * Read logs
 * Played with decoding.
 * Formatted tables in Nick's Log
 * Results:
 * Plan:
 * Concerns:

Week Ending April 19th, 2011
I was tasked with gaining a greater understanding of and automating decoding.
 * Task:
 * Wednesday N/A
 * Friday N/A
 * Saturday
 * Installed Sphinx4 on Linux box at home.
 * Ran demos to test install.
 * Building my own demo.
 * Sunday
 * I think I've made a valid demo, but when I try to compile I get "could not make parent directory". I believe either it already exists or it's a permission issue.
 * Working on Perl script
 * Monday
 * Added image and edited text in Speech:Home

I grew frustrated with searching for files, forgetting where in the directory tree certain things were, and the occasional arbitrary error. I decided that in order to get my head around this program before I shuffle off this mortal coil, I would have to get much closer to it.
 * Results:

I happen to have a desktop box sitting around that serves as the guinea pig for various computer experiments, and it already had Fedora 14 loaded. So, with much trepidation, I set about installing the Java SDK, ant and Sphinx4. Some hours and much configuration and confusion later, I was quite sure I had done the deed, so I fired up ye olde HelloWorld.jar and watched the prompt come up, waiting for my very breath, only then realizing I was without recording apparati. Some time later, after digging through many forgotten boxes of devices that once I could not do without, I had a microphone set up and ready. And so I spoke: "Good Morning Will". After a thoughtful second and a slight whine from an overtaxed power supply unit, the screen read "You said: Good Morning Will". Huzzah! Much drinking and celebration ensued. The next morning, as I awoke on the front lawn, rubber chicken in one hand and a well-worn copy of "War and Peace" in the other, I set out to run the other demos, and run they did. I then set out to create my own demo. "Easy..." I thought. Little did I know how little I knew.

Unperturbed and with a newfound understanding, I set out to read the Sphinx documentation. Much was gleaned from its pale HTML pages, but more questions arose than answers. Who should answer such questions? I set out by email to obtain said answers, but with no replies I sit humbled and stymied at my cheap, noisy keyboard.

I plan to arrange a meeting with Professor Jonas to extract from him information vital to my tasks and to try to secure some insight into Java.
 * Plan:

I have many concerns, but as of now they elude me. I shall update when I can recall them and formulate them into the proper string of characters.
 * Concerns:

Week Ending April 26th, 2011

 * Task:
 * Find the fundamental difference between my installs of Sphinx4 and that of Majestix. This is necessary to understand why the Sphinx installs on the servers do not behave appropriately.
 * Contribute to the report.
 * Sunday:
 * Installed Sphinx4 on laptop and tested. Works!
 * Monday:
 * Reworded the Overview section of the report for readability and to be more in line with the capstone's purpose.
 * Results:
 * One difference I came across, and was not able to move past, was the absence of the uudecode utility, which is part of the sharutils package. On both of my installs I had this issue and had to download and install it. Only once uudecode is installed can you set up the Java Speech API. This could be the reason that we cannot run the demos from the .jar files, and resolving it is essential to further development of this project.
 * I am working on the arbitrary Perl section assigned to me by Matt, but I also edited some current content of the report.
 * Plan: The plan is to work with the team to find the best way to download and install uudecode. Once this has been remedied, I will continue testing Sphinx to see if we have obtained the proper configuration. If not, I will keep searching for possible causes and address those that arise.
 * Concerns: I am concerned about getting Sphinx fully operational before the end of the semester.
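The uudecode check described in this week's results could be scripted so it can be repeated on each server. A minimal sketch, assuming only a POSIX shell; the function names have_tool and check_jsapi_prereqs are made up for illustration, and how sharutils gets installed is left to the system's own package manager.

```shell
#!/bin/sh
# Sketch only: report whether the uudecode utility is on the PATH.
# have_tool and check_jsapi_prereqs are hypothetical helper names.

# Succeeds if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints a one-line status; returns non-zero when uudecode is missing.
check_jsapi_prereqs() {
    if have_tool uudecode; then
        echo "uudecode found"
    else
        echo "uudecode missing: install the sharutils package"
        return 1
    fi
}
```

Running check_jsapi_prereqs on each machine gives a quick way to compare a working install against one that is not behaving.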

Week Ending May 3rd, 2011
I've read several walkthroughs on this, and what follows combines parts of each with what worked for me. At several points additional utilities and software were needed, too many to list and mostly system dependent, but generally, if the build can't find something, you need to install it or locate it.
 * Task: Training is the task of the day.
 * Friday N/A
 * Monday
 * Set up SphinxTrain
 * Ran SphinxTrain using an4 data without decoder. Works!
 * Set up Sphinx3 and ran decoder on SphinxTrain's output from an4 data. Works!
 * Results:


 * Make a directory to install our stuff in "SphinxTrain1"
 * 1) cd SphinxTrain1
 * 2) Download the AN4
 * 3) svn co https://cmusphinx.svn.sourceforge.net:/svnroot/cmusphinx/trunk/SphinxTrain
 * 4) svn co https://cmusphinx.svn.sourceforge.net:/svnroot/cmusphinx/trunk/sphinxbase
 * 5) svn co https://cmusphinx.svn.sourceforge.net:/svnroot/cmusphinx/trunk/sphinx3


 * Compile SphinxTrain


 * 1) cd SphinxTrain
 * 2) ./autogen.sh
 * 3) make


 * Tutorial setup
 * 1) cd SphinxTrain1/SphinxTrain
 * 2) perl scripts_pl/setup_tutorial.pl an4


 * SphinxBase & Sphinx3 set up
 * Compile sphinxbase
 * If you used svn, you will need to run autogen.sh. If you downloaded the tarball, you do not need to run it.


 * 1) cd sphinxbase
 * 2) ./autogen.sh <--SVN
 * 3) ./configure <--Tar Ball
 * 4) make


 * Compile SPHINX-3


 * If you used svn, you will need to run autogen.sh. If you downloaded the tarball, you do not need to run it.


 * 1) cd sphinx3
 * 2) ./autogen.sh --prefix=`pwd`/build --with-sphinxbase=`pwd`/../sphinxbase <--SVN
 * 3) ./configure --prefix=`pwd`/build --with-sphinxbase=`pwd`/../sphinxbase <--Tar Ball
 * 4) make
 * 5) make install


 * Running Trainer


 * 1) cd an4
 * 2) perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids
 * 3) perl scripts_pl/RunAll.pl


 * Decoding it


 * 1) perl scripts_pl/make_feats.pl -ctl etc/an4_test.fileids
 * 2) perl scripts_pl/decode/slave.pl

The output should eventually be:

MODULE: DECODE Decoding using models previously trained
Decoding 130 segments starting at 0 (part 1 of 1)
0%
This step had 6 ERROR messages and 1 WARNING messages. Please check the log file for details.
Aligning results to find error rate
SENTENCE ERROR: 59.2% (77/130)  WORD ERROR RATE: 19.7% (151/773)
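The two decoding steps above lend themselves to a small wrapper, in the spirit of the decoding tool mentioned in earlier weeks. What follows is a sketch under stated assumptions, not the tool itself: it assumes the experiment directory is laid out the way the an4 tutorial leaves it (an etc/NAME_test.fileids file and a scripts_pl directory, where NAME is the directory's own name). The function name decode_experiment and the RUN variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run feature extraction and decoding for one experiment.
# Assumptions: $1 is an experiment directory such as .../an4 that holds
# etc/<name>_test.fileids and scripts_pl/, as in the an4 tutorial.
# decode_experiment and RUN are hypothetical names.
decode_experiment() {
    exp_dir=$1
    name=$(basename "$exp_dir")
    ctl="etc/${name}_test.fileids"

    # Refuse to start when the test corpus file list is missing.
    if [ ! -f "$exp_dir/$ctl" ]; then
        echo "missing $exp_dir/$ctl" >&2
        return 1
    fi

    # RUN is empty by default; set RUN=echo to preview the commands.
    (
        cd "$exp_dir" || exit 1
        ${RUN:-} perl scripts_pl/make_feats.pl -ctl "$ctl" &&
        ${RUN:-} perl scripts_pl/decode/slave.pl
    )
}
```

For example, RUN=echo decode_experiment /path/to/an4 prints the two perl commands without executing them, which is handy for checking the paths before a long decode.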

Bmq29: Chris, I tried to run this earlier this week and repeated it today to see if anything had changed. For whatever reason, on Verleihnix it doesn't create the an4 directory or any of its contents. One directory up there is an an4 directory, but it doesn't have a fileids file. Also, the data that was generated last Tuesday is on Caesar. Look in /root/speechtools/SphinxTrain-1.0/time/. In etc, there are transcriptions (etc/time.transcriptions), and there are wav and sph files in wav. If you look on Verleihnix in /root/speechtools/SphinxTrain-1.0/train1, there is a script to generate the fileids file called genFileIDs_withoutPath.sh. Having that file, you should be able to generate your features.
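The contents of genFileIDs_withoutPath.sh are not shown here, so the following is not that script. It is a minimal sketch of one way a fileids list could be produced, assuming the usual Sphinx convention of one utterance per line, given relative to the wav directory with the extension removed. The function name gen_fileids is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list audio files under a wav directory as fileids entries.
# Assumption: entries are paths relative to the wav directory with the
# .wav or .sph extension stripped. gen_fileids is a hypothetical name.
gen_fileids() {
    wav_dir=$1
    (
        cd "$wav_dir" || exit 1
        find . -type f \( -name '*.wav' -o -name '*.sph' \) |
            sed -e 's|^\./||' -e 's|\.[^.]*$||' |
            sort -u
    )
}
```

Redirecting the output, for example gen_fileids wav > etc/time_test.fileids (a hypothetical file name), would give make_feats.pl a control file to work from.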


 * Plan: I plan to take the training a bit further in the next week. I hope to receive some feedback from the rest of the group about how to proceed further with my task.
 * Concerns: None