Speech:Summer 2015 Erol Aygar



Week Ending June 10th, 2015
This document summarizes my individual research activities on Automatic Speech Recognition under the supervision of Prof. Michael Jonas. I attended the Capstone Project course last year and took an independent study course, “CS980 - Advanced Speech Recognition”, with Prof. Jonas during Summer 2014, where I started contributing to speech research by building models on an existing speech recognition system based on Carnegie Mellon's Sphinx recognition toolkit. We successfully built acoustic and language models using subsets of a 250-hour corpus of recorded audio and its transcripts, and we tested these models on discrete recognition tasks. Our main metric was the error rate of the recognition process, and our aspirational objective was to improve the system to the point where it could serve as a world-class baseline. We started at an error rate of around .48 and ended at approximately .43. We also concentrated on learning the fundamental terminology and methodology of automatic speech recognition.
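The notes above do not say which error metric was used. As a rough worked example, assuming the figures are word error rate (WER), where S, D, and I are the substituted, deleted, and inserted words and N is the number of words in the reference transcript, the change from .48 to .43 corresponds to roughly a 10% relative reduction:

```latex
\[
\mathrm{WER} = \frac{S + D + I}{N}
\]
\[
\text{relative reduction} = \frac{0.48 - 0.43}{0.48} \approx 0.104
\]
```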

We plan to approach this study on two fronts. First, the integral part of our study is to continue improving the efficiency and effectiveness of the recognition process. Second, we will test the applicability of our research findings by putting them to use in a standalone application that interacts with users: it answers phone calls, recognizes the user's directives, and takes the desired actions, such as routing a call to a specific user on a system. We will continue using the existing speech recognition system, the CMU-Sphinx recognition toolkit. Namely, we will continue researching Natural Language Processing techniques and use these findings to design and implement a call router using an appropriate fork of CMU-Sphinx.

This week I concentrated on reviewing previous semester reports, as well as Prof. Jonas’ PhD dissertation [Tufts, 2003]. I read the abstract and introduction in detail and skimmed the rest of the document. Since the dissertation was written a while ago, the technology has likely improved since then, so I searched online for recent papers and courses in adjacent fields. According to Wikipedia, natural language processing is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages. Following these keywords, I looked up articles related to our task. I also came across an online academic course whose authors cover a wide range of natural language tasks together with the underlying theory and fundamental algorithms of speech recognition, such as sequence models like Hidden Markov Models. The associated textbook is Speech and Language Processing, 2nd Edition, by Daniel Jurafsky and James H. Martin. I plan to find and review this book next.

On the basis of previous semester's findings and insights from our previous meetings, we envisioned the following directions in our study:
 * language model and acoustic model improvements
 * data cleansing: cleaning up the data and re-generating the models
 * benchmarking the frameworks: compare the performance of CMU-Sphinx versions (sphinx3, written in C++; sphinx4, based on Java; and other alternatives)

Week Ending June 17th, 2015
Notes from meeting: update
 * research workflow framework (asterisk)
 * review last semester's artifacts
 * start setting up the environment
 * HW/SW installations
 * Dell PE 2950(development, takehome)
 * Dell PE 2850(mirror)
 * Dell PE 2650(backups)
 * HDD, RAM needed..etc.
 * Red Hat version
 * decide the place on the rack

Week Ending June 24th, 2015
Initially, hard drives, trays, and power supply units were identified as missing.
 * Carried the servers home, checked each system's configuration, and identified the missing hardware components


 * Booted both machines; the configuration summary follows:

2850: 2x Xeon 3GHz, 6 SCSI slots, 800 MHz bus speed, 1 GB RAM [Dell Power Edge 2850]

2950: 2x Dual Core Xeon 2GHz, 1333 MHz bus speed, 6 GB RAM [Dell Power Edge 2950]

Followed the instructions online (foss)
Started communicating with the previous semester's team
Obtained Linux distro ISO files online; will update with education licenses
Installed on a virtual machine
 * Operating System: Red Hat Enterprise Edition 6.6


 * Obtained 5 disks, trying to fix the servers at hand
 * planning to get 2650's


 * Found alternative servers and will be checking them on Tuesday.
 * Need 2.5" hard-drives and caddies.


 * Installed RH6.6 on a virtual server and trying to get familiarized with the environment.
 * Planning to contact Prof. Jonas to get the rack space.


 * Hard-drives/HDD: As far as I can tell from the product specs online, the 2950 and 2850 ship with the option of 3.5" SAS drives with a SCSI interface, in 6 slots. I tried to find the parts online, but it looks like they are very expensive because of their shipping costs.


 * Power-supply/PSU: One of the servers (2950) doesn't have a power supply. There are two slots on the machines, and they look very clean. I got a power supply, but it seems it is not for that server. The other server (2850) looks good in this respect.


 * Memory/RAM: One of the servers has 1 GB of memory and the other has 6 GB. Both machines have 8 slots, so I am searching for more physical memory.

Week Ending July 1st, 2015

 * Obtained Red Hat installation media from UNH-M IT (Red Hat 6.5 Enterprise Server x64)
 * setting up the development environment at home
 * 2950 and 2850 are not working well, but obtained new PE 1950s, testing HW
 * purchased SAS hard drives online, waiting for their arrival
 * temporarily enabled one unit for development by using a 2.5" SATA disk

Week Ending July 8th, 2015

 * Received the SAS drives that were purchased online, tested all of them, and have started the OS installation.
 * Read online about the different types of RAID (0, 1, 2, 3, 5, 10, etc.) and decided to use RAID 0 for the application server to get better I/O performance.
 * Trying to find a gigabit switch to connect the machines.
 * Installation is still in progress: trying to figure out Red Hat licensing by following the procedure received from UNH-M IT (Bruce Johnstone); installing Red Hat Server 6.6 64-bit.
 * Bruce gave the following information about registration:
 * UNH still doesn't have a way to register version 7.0/7.1, so we'll use version 6.6.
 * UNH's "rhn.unh.edu" is down for a rebuild. Bob Kenney is working on the systems level at UNH Enterprise Computing, and contacting him directly is not recommended.
 * Bruce checked redhat.com and confirmed that we have sufficient licenses available.
 * Received the keys and a script, written by Bruce, for licensing the server and desktop versions. According to the document, the machines can be grouped automatically on the licensing server.
 * Licensing is done. I followed the instructions in Bruce's document and registered the server on the Red Hat licensing server. Tested with yum; still updating (as of July 12th, 2015).


 * Started skimming the speech-related articles posted under the following link: [Related Readings].
 * One of the articles examines the performance difference between versions of the Sphinx tools [[SphinxVerDifferences.pdf]]; I found it interesting. The article raises a question: do we need portability (and is Java more portable than C++?) or recognition performance? I will read the article in detail and am considering testing both versions.

Week Ending July 15th, 2015

 * Setting up a network for multiple servers at home. I received a Cisco switch on Monday and am trying to use it to create a home network.
 * Reading online about ways of integrating CMU-Sphinx with telephony servers: Asterisk/FreeSWITCH and OpenIVR (Zanzibar)
 * Reviewing the following paper [[Zanzibar OpenIVR]]

Week Ending July 22nd, 2015

 * Following our naming convention based on the Asterix cartoon [pattern], I picked the following names for the call router servers.
 * Piraten - 192.168.10.13
 * Falbala - 192.168.10.14
 * Checking the OS installation using the following instructions from last semester: [Configuration]. I found that the desktop edition does not need to be installed; however, the installation instructions configure the servers with this version, so I updated the servers (piraten and falbala) with desktop features enabled.

Java 7 is not the most recent version; however, it is likely to be more stable. Therefore, JDK 7 (1.7) will be installed.

    yum install java-1.7.0-openjdk-devel
    yum install gcc
    yum install glibc.i686
    # wget http://download.oracle.com/otn-pub/java/jmf/2.1.1e/jmf-2_1_1e-linux-i586.bin (this will not work)
    # general form: scp user@host:directory/SourceFile TargetFile
    scp jmf-2_1_1e-linux-i586.bin root@192.168.1.139:/mnt/install/
    chmod +x jmf-2_1_1e-linux-i586.bin
 * Sketched the draft version of software and hardware architecture by using the recommendations on the OpenIVR article
 * installing following components, requirements
 * JRE - Java Runtime Environment: JRE is an implementation of the Java Virtual Machine (JVM), which allows you to run compiled Java applications and applets. JDK includes JRE and other software that is required for writing, developing, and compiling Java applications and applets. So, installing JDK.
 * The following fix is important to install JMF-2.1.1e - follow the instructions on this [post].
 * JMF Configuration [link]


 * openIVR [instructions]
 * JMF 2.1.1 (embedded MRCPv2 Server requires Java Media Framework (JMF) version 2.1.1)
 * JSAPI

Week Ending July 29th, 2015

 * continued setting up the development environment

    export JMFHOME=/mnt/JMF-2.1.1e/
    export CLASSPATH=$JMFHOME/lib/jmf.jar:.:${CLASSPATH}
    export LD_LIBRARY_PATH=$JMFHOME/lib:${LD_LIBRARY_PATH}
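A quick sanity check after setting these variables is to confirm that the jar the class path points at is really on disk. The following is a minimal sketch, not part of the original setup: the helper name `missing_files` is hypothetical, and the only file checked is `lib/jmf.jar`, the one path the exports above name explicitly.

```shell
# Hypothetical helper: print each given path that is not a regular file.
missing_files() {
    for f in "$@"; do
        [ -f "$f" ] || printf '%s\n' "$f"
    done
}

# Default to the location used in the notes above (trailing slash dropped
# here to avoid a double slash in the joined path).
JMFHOME=${JMFHOME:-/mnt/JMF-2.1.1e}

missing=$(missing_files "$JMFHOME/lib/jmf.jar")
if [ -n "$missing" ]; then
    echo "missing: $missing"
else
    echo "JMF jar found under $JMFHOME"
fi
```

On a machine without JMF installed the script simply reports the missing path, so it is safe to run before or after the installation.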


 * Installing JSAPI


 * Installing Cairo--an open source speech resource server written entirely in the Java programming language. [[title]]


 * Reviewing documentation about Media Resource Control Protocol Version 2 (MRCPv2) [MRCPv2]


 * Client (VOIP Client)
 * Front-end (Asterisk PBX, VOIP Server)
 * IVR Server (Open IVR)
 * Speech Application Server (JVoiceXML, Zanzibar Integration Components)
 * Media Resource Server (Sphinx4, FreeTTS, Cairo MRCP Server)
 * FreeTTS : speech synthesizer [FreeTTS]
 * CMU-Sphinx4 Speech Recognition
 * Cairo MRCP Server
 * back-end (DB Server, Web Server)

 * installed the SIP client [linphone], an open source SIP client. It also supports the SIP protocol through web browsers.

    apt-get install linphone

 * installed Asterisk, and am following this [tutorial] to get familiar with the environment.
 * re-installed Asterisk, as there was a missing file called "app_hangup.so". For the compilation to generate the documentation, the system should have [Doxygen] installed.
 * established the first phone call via two SIP phones, as of Aug 4th.
 * reading Asterisk documentation.

Week Ending August 5th, 2015

 * followed asterisk tutorial on official wiki [page].
 * installed another server for backup.

Week Ending August 12th, 2015

 * Little progress due to a busy schedule.
 * During our weekly meeting, we decided to compare the servers (PE 1750, 1950, 2950)

Week Ending August 19th, 2015
 * We chose the following criteria to quantify the servers at hand:

Server Benchmarks
    Integer: processor integer performance
    Floating Point: processor floating point performance
    Memory: memory performance

 * According to the benchmarks online, which can be found at the following [link], our servers have the following values:

Dell Power Edge 1750 Server (single-core)
    Operating System: Ubuntu 14.04.1 LTS 3.13.0-32-generic i686
    Processor: Intel Xeon 3.20GHz @ 3.19 GHz, 2 processors, 4 threads
    Integer: 1193
    Flops: 909
    Memory: 412
    Overall: 923

Dell Power Edge 1750 Server (multi-core)
    Operating System: Ubuntu 14.04.1 LTS 3.13.0-32-generic i686
    Processor: Intel Xeon 3.20GHz @ 3.19 GHz, 2 processors, 4 threads
    Integer: 2304
    Memory: 384
    Overall: 2009

Please consult the following [link] to obtain further details.

Dell Power Edge 1950 Server (single-core)
    Operating System: Linux 2.6.32-504.3.3.el6.x86_64 x86_64
    Processor: Intel Xeon E5430 @ 2.66 GHz, 2 processors, 8 cores
    Integer: 1725
    Flops: 1551
    Memory: 779
    Overall: 1466

Dell Power Edge 1950 Server (multi-core)
    Operating System: Linux 2.6.32-431.29.2.el6.x86_64 x86_64
    Processor: Intel Xeon E5430 @ 2.66 GHz, 2 processors, 8 cores
    Integer: 13040
    Flops: 11997
    Memory: 1235
    Overall: 10261

Please consult the following [link] to obtain further details.

Dell Power Edge 2950 Server (single-core)
    Operating System: Linux 2.6.32-431.29.2.el6.x86_64 x86_64
    Processor: Intel Xeon E5405 @ 2.00 GHz, 2 processors, 8 cores
    Integer: 1410
    Flops: 1255
    Memory: 757
    Overall: 1217

Dell Power Edge 2950 Server (multi-core)
    Operating System: Linux 2.6.32-431.29.2.el6.x86_64 x86_64
    Processor: Intel Xeon E5405 @ 2.00 GHz, 2 processors, 8 cores
    Integer: 10327
    Flops: 9137
    Memory: 1174
    Overall: 8020

Please consult the following [link] for the benchmark details.
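One way to compare the three machines is the ratio of the multi-core to the single-core overall score, which gives a rough sense of how much each server gains from its extra cores. The sketch below is only an illustration: the helper name `speedup` is hypothetical, and the numbers are the overall scores copied from the results above.

```shell
# Hypothetical helper: ratio of multi-core to single-core overall score,
# printed with two decimals. Arguments: single-core score, multi-core score.
speedup() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2f\n", m / s }'
}

# Overall scores copied from the benchmark results above.
echo "PE 1750: $(speedup 923 2009)"
echo "PE 1950: $(speedup 1466 10261)"
echo "PE 2950: $(speedup 1217 8020)"
```

Under this measure the two 8-core machines scale by roughly 6.6x to 7x, while the PE 1750 gains a little over 2x.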

Additionally, specification documents for each server can be found on the following link.
 * Dell PowerEdge 1750 Server [PE 1750]
 * Dell PowerEdge 1950 Server [PE 1950]
 * Dell PowerEdge 2850 Server [PE 2850]
 * Dell PowerEdge 2950 Server [PE 2950]

Week Ending August 26th, 2015
 * The following names are assigned to our servers.

1. Caesar (Current)
2. Asterix (PE 1950)
3. Automatix (PE 1950)
4. Miraculix (PE 1950)
5. Traubadix (PE 1950)
6. Majestix (PE 1950)
7. Idefix (PE 1950)
8. Lutetia (PE 2950)

Week Ending Sep 29th, 2015
 * The following names are assigned to our servers.

1. Methusalix (192.168.10.3 / .33)

$ vim /etc/hosts
192.168.10.1   caesar caesar
192.168.10.3   methusalix methusalix
192.168.10.8   automatix automatix
192.168.10.11  lutetia lutetia
192.168.10.12  brutus brutus
 * replace the following

# 192.168.10.1   caesar caesar
# 192.168.10.2   asterix asterix
# 192.168.10.3   obelix obelix
# 192.168.10.4   miraculix miraculix
# 192.168.10.5   traubadix traubadix
# 192.168.10.6   majestix majestix
# 192.168.10.7   idefix idefix
# 192.168.10.8   automatix automatix
# 192.168.10.9   methusalix methusalix
# 192.168.10.10  verleihnix verleihnix
# 192.168.10.11  lutetia lutetia
# 192.168.10.12  brutus brutus
$ service network restart
$ gedit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=methusalix
NTPSERVERARGS=iburst
HOSTNAME="methusalix.unh.edu"
GATEWAY="192.168.10.1"
FORWARD_IPV4="yes"
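Because the same host name appears in both the active and the commented-out entries (methusalix moves from .9 to .3), it can be worth checking which address a name resolves to after editing. The following is a minimal sketch under stated assumptions: the helper name `lookup_ip` is hypothetical, and it reads a temporary sample file built from the entries above rather than the real /etc/hosts.

```shell
# Hypothetical helper: print the IP address for a host name from a
# hosts-format file. Arguments: file, host name.
lookup_ip() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }     # skip commented-out entries
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Sample file mirroring entries from the notes above; written to a
# temporary location so the real /etc/hosts is untouched.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.10.1   caesar caesar
192.168.10.3   methusalix methusalix
# 192.168.10.9   methusalix methusalix
192.168.10.8   automatix automatix
192.168.10.11  lutetia lutetia
192.168.10.12  brutus brutus
EOF

lookup_ip "$hosts_file" methusalix
```

Pointing `lookup_ip` at /etc/hosts on one of the servers would perform the same check against the live configuration.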

https://www.asterisk.org/sites/asterisk/files/mce_files/documents/asterisk_quick_start_guide.pdf
https://github.com/cmusphinx/sphinx4
http://my.fit.edu/~vkepuska/ece5526/SPHINX/Sphinx4.pdf

Week Ending Oct 6th, 2015
Maven setup for build

download
    wget http://apache.mirrors.tds.net/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
    tar xzfv apache-maven-3.3.3-bin.tar.gz

copy
    mv apache-maven-3.3.3 /usr/local/apache-maven

setup linux
    export M2_HOME=/usr/local/apache-maven
    export M2=$M2_HOME/bin
    export PATH=$M2:$PATH
    source ~/.bashrc

setup mvn (APPLE)
    open ~/.bash_profile
    # maven
    export M2_HOME=/usr/local/apache-maven
    export M2=$M2_HOME/bin
    export PATH=$M2:$PATH
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_75.jdk/Contents/Home

check if maven is working
    mvn -version

install sphinx4
    $ cd
    $ mvn install
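If the export lines above live in ~/.bashrc, every re-source of the file prepends the Maven directory to PATH again. The sketch below shows one way to make that step idempotent; the helper name `path_prepend` is hypothetical, and the install location is the one used in the notes above.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list only
# if it is not already present, and print the resulting list.
# Arguments: directory, existing list.
path_prepend() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *) printf '%s\n' "$dir${list:+:$list}" ;;
    esac
}

M2_HOME=/usr/local/apache-maven   # install location used in the notes above
PATH=$(path_prepend "$M2_HOME/bin" "$PATH")
export M2_HOME PATH

# Report whether mvn is reachable, without failing if it is not installed.
if command -v mvn >/dev/null 2>&1; then
    echo "mvn found at $(command -v mvn)"
else
    echo "mvn not found on PATH; check that $M2_HOME/bin exists"
fi
```

Sourcing this repeatedly leaves PATH unchanged after the first run, which keeps the variable short and easy to inspect.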