Speech:Spring 2018 Lamia Mukanovic Log



Note for Mac users: when installing Pulse Secure, do some research first, because the link that is provided is for PC users.

Two websites I found useful: 

https://www.smith.edu/its/tara/networking/pulse_mac.html

http://www.lse.ac.uk/intranet/LSEServices/IMT/guides/workingOffCampus/installing-pulse.aspx

=Week Ending February 5th, 2018 (general and useful information)=

To log in with FileZilla:
 * Host: caesar.unh.edu
 * Username: the part of your school email before the @ sign (mine is lm2020)
 * Password: your UNH email password
 * Port: 22

To figure out which drones are available, ssh into Caesar, then type: more /etc/hosts
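The /etc/hosts file uses a standard layout. Each line holds an IP address followed by one or more hostnames, and # starts a comment. The Python below is a minimal sketch that assumes this layout. It pulls out each address with its names, so the machine list can be read or filtered without paging through the file with more. The function names are hypothetical and are not part of the project's scripts.

```python
from pathlib import Path


def parse_hosts(text):
    """Return (ip, [hostnames]) pairs from the text of a hosts file.

    Assumes the standard layout: an IP address, then one or more names,
    separated by whitespace; anything after '#' on a line is a comment."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) >= 2:  # ignore lines that name no host
            entries.append((fields[0], fields[1:]))
    return entries


def list_machines(path="/etc/hosts"):
    """Read and parse the hosts file on the current machine (e.g. Caesar)."""
    return parse_hosts(Path(path).read_text())
```

For example, on Caesar: `for ip, names in list_machines(): print(ip, " ".join(names))`.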

We refer to testing on train.trans as "seen data," and to testing on dev.trans and eval.trans as "unseen data." Don't forget this!

Task
1/30- Today Jonas explained the project in more detail and what we are going to do. My group and I (the software group) will be the first semester of students to look at the code, so we don't have a direct path to follow. My team consists of Danielle LeBoeuf, Wesley Couturier, Faruk Durakovic, and Joshua Young. One of my team members created a Discord channel for us to message in and hold voice meetings. Another student from the class made a channel for everyone to join so we could all message each other and ask questions when needed. In addition, I spent some time looking into different directories on Caesar and what they contain. As the software group, our job will be to decode the code and tell Professor Jonas what it does.

1/31- Today I spent most of my time getting more familiar with Caesar and its directories. Since my task will be to decode the decoder, I wanted to see what all the directories on Caesar contain. I also went through previous students' logs to get more familiar with what they did and to try to re-create the same simulation. Even though we will be focused on the code, I want to learn what the other groups will be doing and read up as much as I can from previous years/students. A helpful site I have been reading is:


 * https://cmusphinx.github.io/wiki/tutorialconcepts/

It covers Sphinx and gives a great tutorial breakdown of what speech recognition is. Also, a couple of my group members and I talked with Jonas about our role. He told us to look at the .c files under the /mnt/main/root/sphinx3 path and try to find the main function. Once we find main, it will be easier to trace the code and figure out what some of the files do.

2/1 - Today we had a short meeting with Danielle, Stephen, Dan B, and Jaden where we all worked on understanding more of the project, how the scripts work and how to create an experiment. I went through some more .c files to find main, but some of the files are too long to read comfortably in the terminal. So I installed FileZilla to drag some of the files to my Desktop and read them there. While I did that, one of my team members, Wesley, told me to use "grep" to search through the files. I tried that, but I did not get any results. I think I might have to grep each file separately, because they may be too big to search through five or six files at once.

2/5 - One of my group members, Wesley, located the main files in the path:
 * "/mnt/main/root/sphinx3/src/programs"

I have been looking at all the .c files located in programs. Using FileZilla, I copied all of the files to my Desktop so I can look at them in one big text editor. I can access the files through the command line, but there are many lines of code and they are hard to read there, so copying the files to your Desktop makes the code easier to analyze. I have been reading about speech recognition at https://cmusphinx.github.io/wiki/tutorialconcepts/ which is a great site that breaks down each part and what it does. I also looked through previous semesters' logs to learn how to create an experiment. I would like to run my first one sometime this week.

Results
1/30- We ended up adding everyone in the group to a discord channel. We talked about meeting once a week to discuss the code we have found and answer any questions. At this point we think that weekly meetings on Sundays with our team will be helpful so we can regroup and gather everything.

1/31 - After speaking with Jonas towards the end of the day, I have been thinking about how we will split up all the .c files that are located in the decoder. I have been going through the files to find main, but so far I have not found it. In addition, I am continuing to read about the setup of the program.

2/1 - I wasn't able to grep through about 18 files at once, so I will try individual files tomorrow. I also noticed that FileZilla labels the code as C++: in the folder with the .c files, the "Filetype" column said C++, although I did see a C# file as well. The .c extension itself normally indicates C source code, so that label may just reflect how my computer associates the extension.

2/5 - I have not been able to decode any of the files located in the path /mnt/main/root/sphinx3/src/programs. It is still a little confusing what each part in speech recognition is actually doing. I have read some of the URC posters in the hallway and they are helping me understand this project more.

Plan
1/30 - Our plan is to get a tree of all the files in the decoder. Since there are a lot of files to go through, we need to figure out how many there are. One of my group members is going to create a tree with all of the folders, subfolders and files so we know the directory paths.

1/31- My plan is to continue the reading from the link provided above, take down any notes that I find useful, and find the main function. I was thinking that it might be nice to get a group page going for my team, because we need to document everything that we do. The next group of CS students will pick up where we left off, and since we are the first group doing this, it will be important to outline our plans, results, and outcomes. Having everything in our logs will be great, but I think a page where we can post code and write what it does will be helpful to us, Jonas and the next group. Having the code in one place, rather than spread across different logs, will make it easier to remember who did what and which files are completed.

2/1- Go through the smaller .c files using grep (or any other command-line tools I find) to search for the main function. I also need to document which files do not contain the main function so that my group and I do not check those files again.
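The plan is to record both which files contain main and which do not, and a small script can produce both lists in one pass. Below is a rough Python sketch. find_main_files is a hypothetical helper. Its regular expression is a heuristic that assumes main is defined at the start of a line, either as "int main(" or in GNU style with "int" on the line before. It can therefore miss a main hidden behind a macro, or match one mentioned in a comment. For a quick check from the shell, grep -l lists only the names of the files that contain a pattern, and it handles many files at once.

```python
import re
from pathlib import Path

# Heuristic: a line that starts (after optional whitespace) with an optional
# "int" return type followed by "main(". Covers "int main(" and the GNU style
# where "int" sits on the previous line and the line itself begins "main (".
MAIN_DEF = re.compile(r"^\s*(?:int\s+)?main\s*\(", re.MULTILINE)


def find_main_files(directory):
    """Split the .c files in `directory` into those that appear to define
    main() and those that do not. Returns two lists of file names."""
    with_main, without_main = [], []
    for path in sorted(Path(directory).glob("*.c")):
        text = path.read_text(errors="replace")
        (with_main if MAIN_DEF.search(text) else without_main).append(path.name)
    return with_main, without_main
```

For example, `has, lacks = find_main_files("/mnt/main/root/sphinx3/src/programs")` gives one list to work from and one list of files not to check again.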

2/5 - My plan is to run an experiment either tomorrow before class or sometime this week. In a Discord channel, some classmates shared a useful link for running an experiment. I plan to read it and create my own experiment side by side. I figure if I run the experiment at school, I can ask for advice if I get stuck. Also, my group, Jonas and I need to figure out a way to split up the files we have found and decide which are the most important files to begin with. I also want to talk to Jonas about having either a wiki page or some other kind of documentation where the software group can keep all of the decoded code in one place. That way every file we have completed and its description can be found together. I thought maybe a wiki page with images and comments, but I need to check with Jonas and the group.

Concerns
1/30- Since we don't have much to go on, I am concerned we might not be able to do everything that we want. So far we are all great at communicating and deciding what to do. As the weeks go on, we will need to create a clear path for what we want to accomplish so we don't get off track.

1/31- I feel as though I am not doing a good job of documenting what I have done so far. Looking at previous students' logs, it is clear what they did and what their steps were. I need to make my logs more detailed and easy to follow, so the next group can pick up where my group and I left off.

2/1- One concern I have is making sure we don't re-examine files that someone else has already done. So far that has not been a problem, but as we dive into the files more we will need to update each other on who has done which file. Another concern I have is not being able to understand some of the code, because I am not too familiar with C++ or C#.

2/5 - Looking at the code is a little overwhelming; some files have 500+ lines of code. Going through it will be a little difficult because we need a lot of background about each file. I hope that as a group we can come up with a way to divide up the files and take on a little at a time, so we don't have to analyze 1,000 lines of code in a couple of weeks. Another concern I have is working in parallel with the other teams. In class Jonas said we have to work on our own team's tasks but also know what the other teams are doing. I was thinking of focusing on one team per week so that I learn how and what each team does.

=Week Ending February 12, 2018 (creating and running an experiment)=

Task
2/6- I had two tasks today: one was to run an experiment successfully, and the other was to figure out how my group and I will parse through the main files. Since a couple of people in the class were running experiments, and Professor Jonas said we all should be able to run an experiment at some point, I figured sooner was better than later. I figured that after today's class I would be able to stay longer and ask my classmates how they ran their experiments. We had some class time, and that is where I ran my first experiment with other classmates.

2/8- Today I tried to finish up running an experiment. On the wiki page https://foss.unh.edu/projects/index.php/Speech:Create_LM it says
 * Model Building
 * Step 1: Run a Train - I did this by creating a 30hr train in my experiment folder, and another 5hr train just as a backup because it runs in a short time.
 * Step 2: Create the Language Model - Even though I created an LM folder, I do not think I did it correctly. I wasn't able to run all of the commands in this portion.
 * Step 3: Run the Decode- I am testing on trained data but I was unable to complete this last step.

After going through some of the wiki page, I was getting confused about where all the directories and paths were coming from. As I executed some of the commands, the terminal would say "Went to copy bin from source, and..." I think this is because my corpus path is wrong. Instead of moving on and trying a bunch of different commands, I took a step back and looked through the corpus. There are many trans and train files under different directories.
 * For example, the contents of /corpus/switchboard/full/train/trans are different from the contents of /corpus/switchboard/30hrs/train/trans

Going through the wiki page is helpful, but I also think knowing what is in each path is important. If something is not working, it might be because the path or directory it's in is not correct.

2/11 - My task for the weekend was to successfully run an experiment, but I was not able to. I have deleted my experiment and started from scratch four times. On Discord, my classmate Dan B told me that if I run into an 'error' I should delete my main exp folder (in my case Exp/0303/004) and start all over again. In the beginning I was creating sub-folders in my main folder, and that could have caused some problems when doing steps two and three (creating the LM and decoding). I did not want any problems with the directory paths, so I started from the beginning and created a 5hr train whenever something didn't work.
 * However, remember where you put your experiment. I found it useful to have only one experiment at a time, so every time I created an LM my experiment path was '/Exp/0303/004'; any other variation would not work.
 * In addition, my group and I had a meeting to finish the proposal for the class. All of the groups had to write their own part (my group did that during the meeting), and at the end it would be combined into one document. Another classmate, Camden, said he would edit it so it reads as though one person wrote it. Overall I think the document sounds good for a rough draft; it does not sound like many people wrote it, and instead it reads as cohesive.


 * Our rough draft can be found here: https://foss.unh.edu/projects/index.php/Speech:Spring_2018_Proposal

2/12- I went into my experiment folder and did not find a scoring.log file. I tried once more to execute the steps posted in Step 3: Run Decode Trained Data ( https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data ), but I still kept getting the same error: a question mark after I copy/paste the last line from the wiki page. I run
 * sclite -r 004_train.trans -h hyp.trans -i swb << scoring.log

as the last line in the terminal, and it doesn't do anything. No one else has reported this error. I have reached out to the group on Discord to see if anyone is available tomorrow before or after class to walk through the steps with me, so I can see where I went wrong. Next week I will document what I did wrong and why I kept getting a " ? " as the output. I also went through my decode.log and compared it to the one from Dan B, who ran a successful experiment.

Results
2/6 - To run my first train I did the following:
 * 1. ssh into Caesar (I ran my first experiment on Caesar, but it might be better to run it on another drone)
 * 2. go to /mnt/main/Exp/(your group exp folder, usually created by Jonas)/(the individual exp folder you created); mine was /mnt/main/Exp/0303/004
 * 3. Follow https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script (I restate the steps I took below)
 * 4. cd into the group exp. folder
 * a. ls
 * b. cd into your own exp. folder
 * 5. Set up the train
 * a. makeTrain.pl switchboard 30hr/train (Note ** you can also do 10hr; 30hr does take a while)
 * 6. Note ** the wiki link above says to run makeTrain.pl -t switchboard 30hr/test, but a few other classmates and I did not run it, because when we ran /test we encountered some problems
 * 7. Run Feats
 * a. genFeats.pl -t
 * 8. Run the train
 * a. nohup scripts_pl/RunAll.pl & (Note ** nohup keeps the program running on the server even if your ssh connection breaks, and the & runs it in the background)

If you want to delete a directory (or you messed up), you can run rm -rf followed by your directory (but be careful to be in the correct directory!), or use FileZilla: find the correct path on Caesar (the right-hand side of FileZilla; mine was /mnt/main/Exp/0303/004), right-click on 004 and hit Delete. I deleted mine through FileZilla because it was easier and I was certain I was only deleting my exp folder and its contents, nothing else.

After completing these steps, I used FileZilla to locate the .html file on Caesar. Mine was in /mnt/main/Exp/0303/004, at the bottom of my 004 directory (right-hand side of FileZilla). I found a folder on my Desktop (left-hand side of FileZilla) and dragged the .html file from Caesar into it. Then I opened that folder on my Desktop and clicked the .html file, which opened up a webpage. I compared my first experiment with a classmate's to make sure they had the same information.


 * Note ** if you get "Connection to 132.177.189.63 port 22: Broken pipe", don't panic! It means the ssh connection broke, but the process keeps running on the server even after the connection breaks. Check your experiment folder for your .html file; it should be there.

Note ** if you are comparing experiments, be sure they ran on the same machine with the same environment (e.g. caesar, Rome, asterix).

Note ** to see all the machines available:
 * 1. ssh into caesar
 * 2. more /etc/hosts

2/8- For me, writing things down helps a lot more than just memorizing them. As I went through the corpus, I wrote down some of the paths I saw in the wiki about running an experiment. I think looking into these directories/paths more will help me understand what they contain and which path to follow when running an experiment. So far I have not completed a successful experiment, but I think more knowledge about each directory will help me understand the wiki page better.

2/11- I was not able to fully run a successful experiment, because I was not getting 'scoring.log' as the final result.
 * I used : https://foss.unh.edu/projects/index.php/Speech:Create_LM for steps 2 and 3 (basically copy and paste into the terminal)
 * I ran the decode on 'trained data' https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data
 * I was able to get through the train and the LM, but at the decode portion I never received the correct output. Twice I ran the decode, and instead of getting 'scoring.log' I would get a ' "?" ' in my terminal.
 * I checked the errors section of step 3, and no one else saw a ' "?" ' as their output. I will let it run in the background, and maybe tomorrow I will have the 'scoring.log' file in my experiment's etc directory (/mnt/main/Exp/0303/004/etc). Since I tried twice, and I deleted everything before starting both experiments, I am not sure why I am getting the same result both times. I have been talking to Dan B, and he walked me through the same steps he took and the code he ran.
 * Also, during our group meeting we were able to finish up our proposal and decide our tentative plan for the rest of the semester.

2/12- Looking at the decode.log, I was not able to piece together much, because there was a lot of information. I want to check other semesters' logs and see if anyone else went through the decode.log file. Going through it quickly, my decode.log and my classmate Dan B's decode.log looked fairly similar. There were some different numbers, but that is minor compared to everything else the file contains. I also took screenshots to show my classmates what I kept getting as my output, in case something happens to my terminal sessions.
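Comparing two decode.log files by eye is hard because they are long and mostly identical. Python's standard difflib module can show only the lines that differ, in the same unified format that diff -u uses. The sketch below assumes both logs are plain text files copied somewhere readable. The function names and any file names used with them are hypothetical.

```python
import difflib
from pathlib import Path


def diff_lines(mine, theirs, label_a="mine", label_b="theirs"):
    """Return unified-diff lines between two lists of text lines."""
    return list(difflib.unified_diff(mine, theirs, fromfile=label_a,
                                     tofile=label_b, lineterm=""))


def diff_logs(path_a, path_b):
    """Compare two log files line by line, e.g. my decode.log and a classmate's."""
    a = Path(path_a).read_text(errors="replace").splitlines()
    b = Path(path_b).read_text(errors="replace").splitlines()
    return diff_lines(a, b, str(path_a), str(path_b))
```

For example, `print("\n".join(diff_logs("my_decode.log", "classmate_decode.log")))` prints nothing when the logs match. Otherwise it prints the differing lines prefixed with - and +, which makes "some different numbers" easy to pick out.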

Plan
2/6- After talking with Jonas and my group, we have set up a plan and a couple of questions for all of us to answer/complete. Our plan is to: check the CMU Sphinx 3 main files and compare them to ours; if that doesn't work, look up other Sphinx 3 main files and compare them to ours; find documentation on the individual files themselves; and all go through the main files, pick the one we each think is the most important, and describe why (this is also written in our Software group folder under tasks). Since we need to parse through a bunch of main files that each contain 500+ lines of code, we are trying to figure out which of the files are most important. Then as a group we will pick some of the files to analyze and decode. I also want to check out some of the audio files and what they contain. Then I need to finish the rest of the steps in the wiki's Model Building section, because I only did the first step, running a train, and there are two more steps (Create the Language Model and Run the Decode).

2/8- My plan is to run an experiment successfully this week or next week before class. I still need to do some research for my software group and look at the main files. Tomorrow, if possible, Dan B from the experiment group and I will get together online (Skype, Discord) and see if he can help me in any way. Last week Dan B was able to help me run my train because he had some insight on what worked for him and what didn't. I also plan to have a group meeting on Sunday at 4:30pm with my group to complete our proposal. For the next couple of days my plans are to run an experiment successfully, go through the main files, and finish our proposal.

2/11- My plan is to try to run a new experiment (from scratch) tomorrow night. I would like to finish up a successful experiment before class on Tuesday 2/13. I will check tomorrow if I have a scoring.log file in my directory, if I don't I will try once more.

2/12- My plan is to meet up with a classmate before class tomorrow and run an experiment successfully. Hopefully I am able to run an experiment successfully, and then I can backtrack through my failed experiments and check what I did wrong. I want to understand where I went wrong so I can tell others and document it for next semester. I also plan on going through the main files and choosing one to work on with my group; we will probably decide this tomorrow in class. I created a small diagram of the directory paths we will be working in, and I will upload it to my log. This will give students a quick picture of some of the paths inside this speech recognition software.

Concerns
2/6- I think our group is off to a good start. A small concern I have is how we will be able to decode all the main files. I think it will be a little complicated in the beginning to figure out where it all starts but after that we should be able to trace the function calls and figure out what they are doing. Another small concern I have is being able to understand what every other group is doing as well as working on our decoding. My plan is to work at a slower pace than the other groups just so I am able to understand everything they do.

2/8- My only concern is not having enough time to do what I want this week. I have two things I want to finish: running an experiment successfully and going through the main files. These two tasks will take some time because I am not familiar with either of them. If I am unable to run an experiment successfully tomorrow, I will put that off and focus on looking through the main files. If Dan B or anyone from the experiment group is able to run an experiment over the weekend, I may ask for some help next week so I can get caught up.

2/11- I have a couple of minor concerns. First, I am curious why I keep getting a '?' as the output after I run the decode. Second is the actual decoding of the code, but we are doing that portion as a group, which I think will be helpful. Also, I wish I had more time to spend on some material that I enjoy, but I will work on my time management skills.

2/12- I am a little concerned that some of my classmates are falling behind. I feel as though the same group of students is always doing the work while some do not put in a lot of effort. I feel that if those students put in at least 10 hours a week, we would all be getting better results. Thankfully my group is communicative and we help each other with any questions.

=Week Ending February 19, 2018 (successful experiment & decode information)=

Task
2/13- Met before class and FINALLY ran a full experiment! Today before class I met with people from different groups in my class who helped me run a full experiment. I ran all three steps about 5 times before I was able to successfully run a full experiment. Even though it took a little longer, it taught me what each line means and what it actually does when executed.

2/15- My group and I plan to meet tonight and discuss our proposal in more detail. We need to fix the draft that we handed in to Jonas.

2/18- Meeting with my group tonight to see where everyone is and the plan for next week. I need to install Visual Studio 2017, but I think I have reached my storage capacity. I think everyone in my group has downloaded it, but I am not sure if everyone has connected the Sphinx3 files to Visual Studio. I will need to free up disk space to make room for Visual Studio, because it will be important for this project. I also need to look at the main_decode.c file and do my part of it. At the end of the file I want to put a conclusion, in simple terms, of what the file does.

2/19- Reading through main_decode.c to understand what is happening. Try to decode the last couple of lines of an if/else statement and write a conclusion about the whole file. So far I think the file takes audio files, turns them into text files, and then converts them into binary files for the computer to be able to read.

Results
2/13- Below are the full steps that I took when creating a successful experiment. The steps I followed are from https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script and are separated into three sections there, but I will combine them into one below. Steps for seen data. First, create a train:
 * 1. go to /mnt/main/Exp/(your class folder)/(your folder), e.g. mine is /mnt/main/Exp/0303/004
 * 2. makeTrain.pl switchboard 5hr/train ** you can pick from 145hr, 300hr, 30hr, or 5hr (I suggest 5hr for the first time, until your experiment is successful)
 * 3. genFeats.pl -t
 * 4. nohup scripts_pl/RunAll.pl &

Then create a Language Model (LM)
 * 5. go to /mnt/main/Exp/0303/004/LM (Note ** create the LM folder in your Exp folder)
 * 6. cp -i /mnt/main/corpus/switchboard/5hr/train/trans/train.trans trans_unedited
 * 7. parseLMTrans.pl trans_unedited trans_parsed
 * 8. cp -i /mnt/main/scripts/user/lm_create.pl .
 * 9. lm_create.pl trans_parsed

Lastly run the decode- be careful executing these lines
 * 10. awk '{print $1}' /mnt/main/corpus/switchboard/5hr/test/trans/train.trans >> /mnt/main/Exp/0303/004/etc/014_decode.fileids (replace 0303/004 with your own experiment directory)
 * 11. nohup run_decode.pl 0303/004 0303/004 1000 & (replace 0303/004 with your own experiment directory; this is the actual DECODING COMMAND)
 * Note- If desired, in another terminal window in your experiment's etc folder, run tail -f decode.log to watch the decode process, which must run to completion before running the next command. BE AWARE: a 5hr train takes about 3 hours for this step, a 30hr takes 10 hours, and a 300hr... well, much longer. If you continue without waiting, your experiment will not be correct!
 * 12. parseDecode.pl decode.log hyp.trans
 * 13. sclite -r 014_train.trans -h hyp.trans -i swb >> scoring.log
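Step 10 relies on awk '{print $1}', which prints the first whitespace-separated field of every line. The fileids file therefore holds whatever the first column of the transcript is. The Python below is a sketch of the same extraction, for illustration only; the awk command from the wiki is what the project actually uses, and the function names are hypothetical. Like the >> redirection, the sketch appends, so rerunning step 10 without clearing the old file adds every id a second time.

```python
from pathlib import Path


def first_fields(lines):
    """Mimic awk '{print $1}': the first whitespace-separated field of each
    line, or an empty string for a blank line (awk prints an empty line)."""
    return [line.split()[0] if line.split() else "" for line in lines]


def write_fileids(trans_path, fileids_path):
    """Append the first field of every transcript line to a fileids file,
    like '>>' in step 10 (appends; never overwrites existing contents)."""
    lines = Path(trans_path).read_text().splitlines()
    with open(fileids_path, "a") as out:
        for field in first_fields(lines):
            out.write(field + "\n")
```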

This may take a couple of tries, but it will work. Every time I had a problem with my train, LM or decode, I would start all over again. Starting over is not ideal, but it clears your experiment folder so you can start from the beginning. To delete/clear my experiment folder I would use either FileZilla or the command line:
 * 1. In FileZilla, go to the top directory (mine would be 004), right-click and hit Delete.
 * 2. Through the command line, you need to be in the folder whose contents you want to delete (navigate to your experiment folder, mine is 004). This command deletes the contents of the folder but not the folder itself:
 * rm -rf *
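Because rm -rf * deletes everything in whatever directory you happen to be in, it is worth adding a guard. The Python below is a sketch of the same clean-up that refuses to run unless the target sits inside /mnt/main/Exp. clear_experiment and its guard are hypothetical and are not a project script. One difference: bash's * skips hidden dot-files by default, so rm -rf * leaves them behind, but this sketch removes hidden entries too.

```python
import shutil
from pathlib import Path


def clear_experiment(exp_dir, required_parent="/mnt/main/Exp"):
    """Delete everything inside exp_dir but keep exp_dir itself, like running
    'rm -rf *' from inside it. Refuses unless exp_dir is strictly below
    required_parent, as a guard against clearing the wrong directory."""
    exp = Path(exp_dir).resolve()
    parent = Path(required_parent).resolve()
    if parent not in exp.parents:
        raise ValueError(f"refusing to clear {exp}: not inside {parent}")
    for entry in exp.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # sub-folders such as etc/ or LM/
        else:
            entry.unlink()  # regular files and symlinks
```

For example, `clear_experiment("/mnt/main/Exp/0303/004")` empties that folder, while a typo that points at /mnt/main/Exp itself raises an error instead of deleting anything.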

To confirm that you have run this correctly, check for a scoring.log file in your /mnt/main/Exp/(class exp folder)/(your experiment folder)/etc directory (mine would be /mnt/main/Exp/0303/004/etc).
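A quick way to check the result is to look for scoring.log in the experiment's etc folder and make sure it is not empty. The Python below is a sketch that assumes that folder layout; scoring_log_status is a hypothetical helper. A missing or empty file suggests the sclite step did not write its output.

```python
from pathlib import Path


def scoring_log_status(exp_dir):
    """Return "missing", "empty", or "present" for <exp_dir>/etc/scoring.log."""
    log = Path(exp_dir) / "etc" / "scoring.log"
    if not log.is_file():
        return "missing"
    if log.stat().st_size == 0:
        return "empty"
    return "present"
```

For example, `scoring_log_status("/mnt/main/Exp/0303/004")` should return "present" after a complete run.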

2/15- As a team we went through the proposal and broke it down some more. Wesley downloaded Visual Studio 2017 and confirmed that it would be beneficial for all of us to download it. I have used Visual Studio 2017 before, and it is helpful when it comes to big projects. Since this project has a lot of files, it is hard to see how and where they all connect. With Visual Studio, each group member will be able to look at the code and each file in a more organized way. We have started to edit some of the code in the main_decode.c file. We have copied/pasted the code from main_decode.c and our proposal into our group wiki https://foss.unh.edu/projects/index.php/Speech:Spring_2018_Software_Group so everything is in one place.

2/18- Met with my group via Discord and talked for about 45 minutes to an hour. We also submitted our proposal, and Camden sent us a final draft of what it looks like. I like the final draft, but it was hard for my group to follow the same format as everyone else. Since we don't have anything to look at from last year, we will most likely tackle assignments and files together for the first couple of weeks and see how it goes. We also need to talk to Jonas and see what we should do next. We are almost done decoding main_decode.c (in directory /mnt/main/root/sphinx3/src/programs).

2/19- Almost finished with the main_decode.c file. We have gone through it line by line, but we still need to figure out what the file does as a whole and where it is being used. At the end of the code there is a line that says
 * system("ps -el | grep sphinx3_decode");

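Although this line is commented out, it would use system() to run the shell command ps -el | grep sphinx3_decode, which lists the running processes and keeps only the lines that mention sphinx3_decode; in other words, it checks whether a sphinx3_decode process is running. Maybe we can look at sphinx3_decode next, because it is mentioned in the main_decode.c file. I will also need to talk with my group and see if they have any theories on what main_decode.c is doing.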
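In C, system() hands its string to the shell, so the commented-out system("ps -el | grep sphinx3_decode") call would print the lines of ps -el output that contain sphinx3_decode. Below is a Python sketch of the same check, useful for trying it out without recompiling anything. The function names are hypothetical. The match is a plain substring test rather than grep's regular expression, and the sketch assumes a system where ps -el works (Linux, macOS).

```python
import subprocess


def matching_lines(ps_output, name):
    """Keep the lines of ps output that contain `name`, like '| grep name'."""
    return [line for line in ps_output.splitlines() if name in line]


def running(name="sphinx3_decode"):
    """Run 'ps -el' and return the lines mentioning `name` (empty if none)."""
    out = subprocess.run(["ps", "-el"], capture_output=True, text=True,
                         check=True).stdout
    return matching_lines(out, name)
```

For example, `bool(running())` is True while a decode is still in progress, which is the same thing you would check before moving on to parseDecode.pl.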

Plan
2/13- The main plan for today is to run a successful experiment. After working with my classmates before class I was able to run an experiment and I realized what I did wrong. The mistake that I kept making was that I did not change the < > in the last line of the decode step. Even though I had to start all over multiple times I have learned what each line of execution controls in the project.

2/15- The plan for the next couple of days is to download and set up Visual Studio 2017 on my computer. I am not sure I will have enough space on my computer to install it. I will need to make sure I give my computer enough time to install and pull everything that is needed. I also plan to look into SVN and how it can be combined with this project. I think a version control system will be helpful for this project because it lets students edit the code while keeping a history of the changes. Before creating and working with a version control system, I think it would be useful to install sphinx3 on a new server and see how SVN would work with the software.
 * In addition, I think that we should look into Git, because that is where the future is and it will be more useful than SVN in a couple of years. I also want to look into log4net (a logging library for .NET). It would be useful to have in this project because it would create a "log" of all the edits/history of the codebase from one semester to the next. Having a logging system would benefit future groups because it would have all the information about the software.

2/18- The plan is to finish the main_decode.c file and to get familiar with what the file does. I need to finish the if/else statement at the bottom of the code. Looking at the code it seems to be a little complicated. I hope to read through it and get a better understanding so that I can write a conclusion at the end of the file. Decoding this line by line isn't too complicated but figuring out what it does and how it connects to the whole project (or even a part of it) is what we need to figure out.

2/19- Talk to my team about main_decode.c and see if we all understand the primary focus of this file. Then we need to talk to Jonas and see what he wants us to do next. We can move on to another file or look into this file some more. I am not sure how in depth Jonas wants us to go when decoding the files.

Concerns
2/13- My only concern is making myself and my group useful when we split up into two big groups. At this time we are only looking at the code, but I do want to get into the server room with the system group and see what they do. I am curious about all aspects of this software and what each group does. I feel that if I have a better understanding of what each group does, I will be able to understand the codebase better.

2/15- My only concern is whether my laptop can handle Visual Studio or any other program I download. Since I have a Mac, I need to be careful about the applications I download to make sure they are the correct versions. I also need to make sure I leave enough time to do everything I want for this project.

2/18- I am a little concerned about other students' logs; they seem to be empty. A lot of students write that they have been in group meetings, which is great, but I want to know what that particular student has done and their plan for the week. I also realize that some students wait until Sunday/Monday to finish every log entry. This defeats the purpose, because as their classmate I don't know what they have been working on during the week. I am also a little concerned that I don't have as much time to spend on Capstone as I want. There is so much to learn, and I want my group and me to do the best we can this semester to build a great foundation for the next software group.

2/19- Similar concerns as before about not having enough time to do everything. I need better time management so that I can research the information I am unclear about. I think we are on a good path to decoding the files; I just don't know how much we need to know about each file and what it does.

=Week Ending February 26, 2018 (going through main_decode.c)=

Task
2/20- Get Visual Studio working on my Mac and get feedback about the proposal. I have downloaded Visual Studio 2017 on my Mac so I can have the sphinx3 project on my computer. Visual Studio would organize the project and keep all of its files in one place. Right now I just have a folder on my desktop for this class, but Visual Studio would hold all the files of the project and connect them together. For some reason I am getting errors when I load the solution file into Visual Studio. I am hoping to get some help from one of my group members during or after class.

2/24- Look at the unanswered questions at the bottom of the main_decode.c file and see if I can connect the project to Visual Studio. If connecting it to Visual Studio will take a couple of hours, I need to decide whether it is worth it. Since I am the only group member with Visual Studio installed on a Mac, that could be a problem. One of my other group members was able to install it and get the solution running without many problems.
 * Asked Jonas about poster presentations at the end of April. Nothing was mentioned in class but one student brought it up and I wanted to email and ask if we have to create a poster.

2/25- Read other students' logs in class and catch up on all the new experiments the Modeling/Data/Experiment groups have been doing. I have only run one experiment, but I would have loved to run an experiment on unseen data and compare the results. Even though this is not something the software group does, it would be interesting to compare the results.
 * I am also looking for the corpus.c file to start decoding it. Once I find it, I will post it to the group wiki page, where all of my group members can edit it.

2/26- Finish up the remaining question on the group wiki and check out what other groups have been up to. My group and I have not decided whether we will start decoding the corpus.c file or continue with the main files. I need to ask them about this tomorrow and also ask Professor Jonas about getting sub-pages for our wiki pages.

Results
2/20- Did not get a chance to look at my Visual Studio problem with my group member. Instead, the whole class and I were worried about the proposal, since Professor Jonas did not like the layout. In class we went through our group sections and fixed what Professor Jonas suggested. I then received a message on Discord that we needed to change the layout of the overall proposal. We are getting an extension on our due date because not all of the groups used the same format. We were told to follow either the 2014 or the 2015 format and the class chose 2014, but then Jonas said he wanted us to follow the 2015 format. Therefore my group and I had to get together over Discord to fix our proposal for the third time.
 * Note: follow the 2015 project proposal https://foss.unh.edu/projects/index.php/Speech:Spring_2015_Proposal
 * This is our final project proposal we (the class) submitted https://foss.unh.edu/projects/index.php/Speech:Spring_2018_Proposal

I do think that ours looks better than when we first started. Completing this proposal takes patience, because there are 19 people in the class and we all have to make it sound like one person wrote it. In the end we were able to pull it together into a great proposal that reads as if one person wrote it.

2/24- Answered some of the questions at the bottom of the group wiki page and looked into corpus.c.

2/25- corpus.c is located in /mnt/main/root/sphinx3/src/libs3decoder/libcommon
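
Rather than searching with CONTROL + F, the source tree can be searched from the shell. Below is a minimal POSIX shell sketch; the helper names `find_source` and `find_flag` are hypothetical (made up for this log), and the example path is the one recorded above.

```shell
# find_source NAME DIR: print every file called NAME anywhere under DIR.
find_source() {
  find "$2" -type f -name "$1"
}

# find_flag PATTERN DIR: print file:line:text for every line of a .c or .h
# file under DIR that contains PATTERN as a fixed string (not a regex).
# -e keeps a pattern such as "-ctl_lm" from being read as a grep option;
# the extra /dev/null argument forces grep to print file names even when
# find hands it only one file.
find_flag() {
  find "$2" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec grep -n -F -e "$1" /dev/null {} +
}
```

For example, `find_source corpus.c /mnt/main/root/sphinx3/src` prints every corpus.c in the tree, and `find_flag -ctl_lm /mnt/main/root/sphinx3/src` lists every C source or header line that mentions that flag, with its file name and line number.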

2/26- This is from our group wiki, these are the questions/comments I had about the main_decode.c file. I answered the questions that I could this week, but question 4 is still unanswered.


 * 1. look into corpus.c (any lines related to -ctl_lm)- did a CONTROL + F, couldn't find ctl_lm but did find many instances of ctl in corpus.c
 * 2. what is MLLR? MLLR (Maximum Likelihood Linear Regression) uses a linear transformation of the Gaussian model parameters to adapt them to a given speaker.
 * for more information about MLLR check out this PDF page 8 http://www.cs.jhu.edu/~juri/pdf/mllr-rwth-2005.pdf
 * 3. what does st -->tm mean?
 * 4. what does the string -ctl mean? what does it control? wmc - It appears to point to a file with acoustic WAV file references within the [Experiment]/[Sub-experiment]/etc folder. The file is typically [sub-experiment]_decode.fileids
 * 5. file to possibly look at next, sphinx3_decode and corpus.c- found the corpus.c file it contains 820 lines of code, not sure if we want to start on this one or continue with the main files we found
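
To make the MLLR answer above concrete, here is the standard mean-only MLLR transform from the speech recognition literature (Leggetter and Woodland's formulation). This is the general technique, not something read out of main_decode.c or the Sphinx3 sources:

```latex
\hat{\boldsymbol{\mu}} = \mathbf{A}\boldsymbol{\mu} + \mathbf{b} = \mathbf{W}\boldsymbol{\xi},
\qquad
\mathbf{W} = \begin{bmatrix} \mathbf{b} & \mathbf{A} \end{bmatrix},
\qquad
\boldsymbol{\xi} = \begin{bmatrix} 1 \\ \boldsymbol{\mu} \end{bmatrix}
```

Here μ is the n-dimensional mean of one Gaussian, A is an n×n matrix and b an n-dimensional bias, so W is n×(n+1). W is chosen to maximize the likelihood of the new speaker's adaptation data, and one W is usually shared by many Gaussians, which is why MLLR can adapt a model from a small amount of speech.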

Now that most of the questions have been answered, I will talk to my group or Professor Jonas about the last one so it gets resolved. I also read other students' logs, in particular the Exps folder created for this year.
 * https://foss.unh.edu/projects/index.php/Speech:Exps_0303

I read about 25-30 of the latest experiments my classmates have done. The experiments are detailed, and I can see what the groups have been doing this week. I also read other students' individual logs. I talked with Dan R from the Systems group about helping him tomorrow after class. Since I would like to know what each group does, working with the Systems group in the server room will give me firsthand exposure. In addition, I stayed active on Discord and talked with other classmates when I had questions or ideas.

Plan
2/20- The plan is to take a look at the main_decode.c and get the rest of the questions/comments answered.
 * https://foss.unh.edu/projects/index.php/Speech:Spring_2018_Software_Group

This is where we keep all of our findings for the main_decode.c file. We need to answer the questions about this file before moving on, to get a better understanding of what it does. If we cannot answer some questions, we will reach out to Jonas. We also need to decide which file to look at next: either sphinx3_decode or corpus.c. I need to ask my group what they think.

2/24- Talk to the group and see if they can answer the remaining questions on the group wiki. If they can't, we will ask Jonas for help and see if he can clarify anything else about the main_decode.c file. I also want to spend some time with the Systems group on Tuesday in the server room. I am curious what they do and what they have done so far. Last Tuesday I couldn't go into the server room with them because the proposal was due and everyone was focused on that.

2/25- I need to talk to my group and Jonas about organizing our group wiki page. So far our group wiki page has the decoding of the main_decode.c file. Now that I have found the corpus.c file, I don't want to put both on the same page because it will be a lot to look at; corpus.c is a lot longer than main_decode.c. I was thinking we could have one general page for the group, with subpages you reach by clicking into the main group wiki page. The subpages would be organized by the file we have decoded, i.e. one subpage for main_decode.c, another for corpus.c, and so on. This would keep our group from getting confused about which file we are decoding, and it would be easier for users to read.

2/26-
 * Talk with team tomorrow about answering question 4 on group wiki
 * Ask Professor Jonas about sub-pages for wiki
 * Pick a new file to start to decode/come up with group conclusion about what main_decode.c is doing
 * Print main_decode.c for the server room? (Need to check with group)
 * Meet with Dan R after class in the server room to discuss what his team has been up to and any changes they have made in the server room

Concerns
2/20- So far the only concern is if Professor Jonas will like this final project proposal that we have submitted. It took us a little bit to get every group to fix their section and for multiple people to read over the document. But it does match the format of the 2015 proposal and we have a great table that links all the individual and group logs.

2/24- Just a minor concern that it will take a while before we figure out what the files do and how we can improve anything in this project. Since we don't have much guidance, it is hard to understand what the project is doing while also understanding what each group does.

2/25- My only concern is decoding the corpus.c file. It is longer than main_decode.c, and there seems to be a lot more going on in it. But I think that as a group we will be able to work together to understand this file and hopefully find its connection to main_decode.c.

2/26- So far the only concern I have is to finish up the main_decode.c (i.e get a group conclusion and answer question 4), and then decide what file to look at next. We will need to work on decoding a file while also completing our individual responsibilities for the proposal.
 * Looking ahead in the week, this is our Team timeline that we submitted for the proposal:
 * Look into Revision Control System (RCS) and Subversion (SVN) (Wesley, Danielle, + Team) 2/20 - 2/28
 * Determine which one is best suited for Sphinx3 and set up files in the version control system (Wesley, Danielle, +Team) 2/20 - 2/28
 * Set up desired version control system on Majestix, because we will be doing software updates on this drone (Wesley, Danielle, +Team) 2/28 - 3/7
 * Confirm it is compatible with Sphinx3 and overall Speech Recognition Software - (Wesley, Danielle, +Team) 2/28 - 3/7
 * Test to see if the version control system is working on Majestix, and record any problems it encounters - (Wesley, Danielle, +Team) 2/28 - 3/7

For the proposal we divided the work into tasks that individual group members can take on. Two of my team members said they would do the tasks above, but the rest of us will also help them as a group.

=Week Ending March 5, 2018 (information about decode and questions asked by Jonas)=

Task
2/27- Had class today and scheduled a code review with Professor Jonas for the software group.

3/1- Google Hangout with my team and Jonas to go over the main_decode.c file. We have decoded main_decode.c, but we still need a little more background information about it. We planned to have some code reviews during class, but we haven't had a chance to meet yet.

3/3- Meet up with Dan R from the Systems group in the server room and find out what the Systems group has been up to. I have been reading their logs, and they seem to be doing a lot in the server room, such as installing and fixing the drones. I have been trying to meet up with the Systems group for a while now, but our schedules don't seem to match, so today was the best day for both of us.

3/5- I want to fix up the group wiki page a little and read through other students' logs.

Results
2/27- Didn't get a chance to have the code review with Professor Jonas, but scheduled it for Thursday at 9:30pm over Google Hangout. Wrote an abstract for a conference about speech recognition software with one person from each group and Professor Jonas. The conference is basically a giant URC for students, and we are trying to see if UNH Capstone would qualify to attend. We wrote the abstract during the latter part of class and submitted it for March 1st.

3/1- The meeting was successful: we were able to talk about some of the questions we had, and Professor Jonas gave our group some more guidance. During the meeting I took some helpful notes to refer to later on.

Notes from meeting:
 * Figure out if sphinx3 and sphinxbase are the same code and why we have both; do we need both, or can we just keep one?
 * Came to the conclusion to use sphinxbase over sphinx3, but why...


 * Look for redundant code files
 * - Document which files they are
 * - Check if we need both of the files
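
One way to start the redundant-file search is to group files by checksum. This is a sketch, assuming POSIX `cksum` and `awk` plus a `mktemp` that works without arguments; `find_duplicates` is a hypothetical helper name, and it assumes file names contain no spaces.

```shell
# find_duplicates DIR: print the paths of files under DIR whose content
# appears more than once, judged by identical (CRC, size) pairs.
# Assumes file names contain no whitespace (awk splits fields on spaces).
find_duplicates() {
  _dup_list=$(mktemp) || return 1
  # cksum prints "CRC SIZE PATH" for each file; sorting groups equal pairs.
  find "$1" -type f -exec cksum {} + | sort > "$_dup_list"
  # Pass 1 counts each (CRC, size) pair; pass 2 prints paths of repeated pairs.
  awk 'NR == FNR { n[$1 " " $2]++; next }
       n[$1 " " $2] > 1 { print $3 }' "$_dup_list" "$_dup_list"
  rm -f "$_dup_list"
}
```

Two files with the same CRC and size are very likely identical, but it is safer to confirm with `cmp` before moving either one into the delete directory.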


 * Look into the Test folder to see what is in there, and what the files mean
 * - Test folder in mnt/main/root/test


 * Look at usr/local and compare to mnt/main/local
 * - usr/local = bin games  include  lib  libexec  man  sbin  share  src  var
 * - mnt/main/local = bin games  include  lib  libexec  man  sbin  share  src  var
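
The listings above only compare the top level. A sketch for comparing two whole trees by file name is below, assuming standard `find`, `sort`, and `comm`; `compare_trees` is a hypothetical helper. It compares names only; `diff -r` would also compare file contents.

```shell
# compare_trees A B: print paths (relative to each root) that exist in only
# one of the two trees. Prints nothing when the file lists match.
compare_trees() {
  _tree_a=$(mktemp) && _tree_b=$(mktemp) || return 1
  (cd "$1" && find . | sort) > "$_tree_a"
  (cd "$2" && find . | sort) > "$_tree_b"
  # comm -3 hides lines common to both; column 2 is prefixed with a tab.
  comm -3 "$_tree_a" "$_tree_b"
  rm -f "$_tree_a" "$_tree_b"
}
```

Lines with no indent exist only in the first tree; lines indented with a tab exist only in the second. For example: `compare_trees /usr/local /mnt/main/local`.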


 * GCC compiles sphinx3 (not sure exactly what this means, but we need to look into it)
 * Different installations on drones: we need a plan, they should all have the same programs on them
 * GRAM (at the end of the file) means it is a toolkit
 * - figure out where these files are (forgot the path)

 * Is sphinx_fe being used?
 * Professor Jonas created a delete directory so we can put anything that we think needs to be deleted in it. This is to clean up some of the directories and figure out which directories/files are being used.
 * - Delete directory = mnt/main/root/DELETE-sp18


 * mnt/main/local = good directory
 * mnt/main/root/test = CMU-Cam_Toolkit_v2
 * mnt/main/root/tool = CMU-Cam_Toolkit_v2
 * - Figure out if we can delete one
 * - Find other code that uses these paths


 * Look for other directories to delete and move them to DELETE-sp18
 * Look at the tiny_train doc for more insight
 * mnt/main/test/sphinx doesn't work, need to verify this
 * Look at Makefile (files)
 * - this tells you which files are being used when the program is compiled
 * - the Makefile compiles the program from its source files


 * Explained st --> tm
 * - st is a variable, but tm could be a method or value, need to figure out


 * From main_decode.c
 * - kb_init(&kb, config); line 159 is important to look at

3/3- Successfully met with Dan R and Yashna from the Systems group in the server room today. Dan R taught Yashna and me a lot about servers and how to configure them. Today I learned a lot about what the Systems group has been doing in the server room and what their goals are for the rest of the semester. This is a list of what Dan R, Yashna, and I did today:
 * Relabeled all of the drones (front and back side of the rack)
 * Relabeled Methusalix (Professor Jonas gave it a new name)
 * Added a couple of more screws to secure the rack of servers
 * Changed the CMOS battery for a couple of the drones
 * - Automatix, Asterix, Idefix, and Miraculix have new CMOS batteries
 * - Dan R personally brought in 4 new CMOS batteries; he replaced 2 and allowed Yashna and me to replace the other 2


 * Manually configure date/time (under BIOS settings) after replacing the cmos battery
 * Changed power strip location for Asterix and Automatix
 * Automatix had nothing on it, so we had to install a bootable OS and partition the drive manually.
 * - this took a lot of work because everything had to be done manually
 * - Dan R taught Yashna and me all the steps for manually configuring Automatix and copying Asterix onto it


 * Rome, Caesar, Obelix, and Majestix need new CMOS batteries
 * - Research the CMOS battery for Caesar (PowerEdge 2900)

3/5-
 * Fixed up the group wiki page a little. The formatting was hard to follow, so I moved main_decode.c into its own sub-page. Before, it was hard to read, with a lot of information split between the sub-page and the main page. https://foss.unh.edu/projects/index.php/Speech:Spring_2018_Software_Group
 * Read students' logs and saw that the Data group figured out how to auto-increment the numbers using Perl! YAY! This will be helpful for our class and for the next generation.

Plan
2/27- Planning to meet with my group and Professor Jonas on Thursday to decide what we need to complete/ask any questions we have.

3/1- My group and I need to figure out how to answer all these questions. Professor Jonas asked us about all of the information above, and we weren't sure how to answer it. In the beginning we weren't given much guidance other than to decode main_decode.c and work our way from there. After this meeting we have more guidance on what we should focus on and complete by the end of the semester. If we aren't able to complete everything, next year's Software group will be able to pick up those tasks. In addition, I plan to meet with Dan R from the Systems group on Saturday in the server room. I am curious what the Systems group does and about all aspects of the servers.

3/3- During my time in the server room I took a lot of handwritten notes, which I had to transcribe into my wiki. I also took some personal notes to refer to in case I have time to go back to the server room. As for my group, I need to look into the questions Professor Jonas asked us during the Google Hangout and see what I can answer. I need to dig around some of the directories so I can map out what they all do. I think creating a visual picture of the directories would help.

3/5- I need to meet up with Josh and figure out a plan to recompile/update the system programs. That is our next big task for the Software group, and it is due in about 2 weeks. I want to go over the details with Josh and figure out the best and most efficient way to do this. If possible, I want us to do it together so I know what is going on and we can double-check each other if needed.

Concerns
2/27- No major concerns, waiting for meeting on Thursday to see what the next steps are for my group and I. We have decoded one file and want to start another one but talking with Professor Jonas on Thursday will give us a good idea of what he wants and what we need to complete.

3/1- My only concern is about looking for redundant files throughout the system. It would be great if we could find redundant files that aren't being used anywhere and delete them. But before we delete a file, or even move it into the delete folder, we need to verify that it isn't being used anywhere else. I think with the help of my group members we will be able to check all the directories/files and confirm a file isn't being used before we do anything to it.

3/3- As of now no major concerns; I just need to work on the questions Professor Jonas asked my group. I want to answer all of them so that we can leave documentation for next year's group. Just a side note: I would love to reorganize all the cables in the server room because I think it would help. I talked to Dan R about this, and he plans to fix the cables this semester.

3/5- A small concern I have is working in parallel when we split up into two big teams. I feel that most of my classmates are on a great track to completing their individual team tasks. It will be interesting to see what happens when we split into two groups and what we can come up with.

=Week Ending March 19, 2018 (research & recompile)=

Task
3/12- Jonas sent around an email about our teams and what the software group has to continue doing for the rest of the semester.
 * I am in Avengers Team for the rest of the semester
 * Jonas has me doing the recompile with Wesley. From the email this is what I have to do for the rest of the semester.
 * Look in the directory mnt/main/install/tar/cmu
 * Recompile sphinx3 in the directory /root/sphinx, not in mnt/main!
 * 1. There are multiple versions in this directory, for the recompile I need to figure out which version to use (and which version we are using).
 * Jonas says, "You'll need to first understand what we are using since I want you to re-create it and use the same versions."
 * Using tar -tvf filename.gz you can get a list of what's inside
 * gcc and g++ are installed on Rome (version 4.4.7-18), from Jonas's email
 * Create an RCS project and then re-compile the source
 * When you recompile, do a umount -a first to be sure that Caesar is not mounting its file system
 * From Jonas's email- Note that /usr/local on Rome is its own directory and not a link to Caesar and that should not change but you can do a redundancy check by doing this command:
 * ls -ald /usr/local
 * it should not show up as a (->) link to /mnt/main
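
Jonas's redundancy check (`ls -ald /usr/local`) and the `umount -a` step can also be verified from a script. This sketch assumes a Linux host, where `/proc/mounts` lists active mounts as "device mountpoint type options ..."; `is_symlink` and `is_mounted` are hypothetical helper names.

```shell
# is_symlink PATH: succeed and print "PATH -> TARGET" if PATH is a symbolic
# link (the same "->" that ls -ald shows); fail silently otherwise.
is_symlink() {
  if [ -L "$1" ]; then
    printf '%s -> %s\n' "$1" "$(readlink "$1")"
    return 0
  fi
  return 1
}

# is_mounted MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT appears in the
# second column of MOUNTS_FILE (default /proc/mounts, Linux-specific).
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, per Jonas's email `is_symlink /usr/local` should fail (print nothing) on Rome, and after `umount -a`, `is_mounted /mnt/main` should fail as well.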

3/13- Starting to look at mnt/main/install/tar/cmu to first figure out which version to recompile. Jonas has told us multiple times that there are duplicate files and we need to find them.

3/15- Start to dive deeper into the 6 directories inside mnt/main/install/tar/cmu to figure out which version I need to recompile. So far I have created a visualization in Xmind Zen that lays out what is inside mnt/main/install/tar/cmu; those findings are below. Now I need to read through the README files, the Makefiles, and anything else that tells me which version of sphinx3 we are using.

3/16- Ended up sending Jonas an email last night asking which version he thinks I should recompile.

Results
3/12- I need to organize these tasks in order to recompile sphinx within the next three weeks.

3/13- I started to create a visualization about the directories inside of mnt/main/install/tar/cmu. Below are the 6 directories that are inside of mnt/main/install/tar/cmu and the useful information that I found:
 * an4_sphere.tar.gz
 * From the README file: This directory contains the Census (AN4) database audio files. Some files from the original database were excluded, namely those with filenames starting with "cen9". The AN4 database was recorded at Carnegie Mellon University circa 1991. For more details, please see "Acoustical and environmental robustness in automatic speech recognition", by Alex Acero, published by Kluwer Academic Publishers, 1993.
 * The files are in NIST's Sphere format, where files have a 1024-byte header describing the data format.
 * The directories contain:
 * -wav/an4_clstk: training data set recorded on close talking microphone.
 * -wav/an4test_clstk: test data set recorded on close talking microphone.
 * -etc: directory containing the transcriptions, control files, dictionary etc.


 * speechtools.tar.gz
 * below are important directories that are located in speechtools
 * speechtools/CMU-Cam_Toolkit_v2.tar.gz
 * speechtools/sphinx4-1.0beta5-bin.zip
 * speechtools/sphinxbase-0.6.1.tar.gz
 * speechtools/SphinxTrain-1.0.tar.bz2


 * sphinx3.tar.gz
 * This contains the EXACT SAME files/folders in the exact SAME order as cmusphinx-sphinx3. This might be a duplicate (need to check); need to move this


 * cmusphinx-sphinx3.tar.gz
 * From Release Notes- This version requires sphinxbase 0.3.
 * From Release Notes- This version has been tested on the following platforms:
 * - i686-linux (RedHat 9, Fedora Core 5, Ubuntu Feisty)
 * - x86_64-linux (Fedora Core 3, Fedora Core 5, Ubuntu Feisty)
 * From README- This is Sphinx 3.7 (s3.7), contains Linux, Unix, and Windows installation guide
 * From README- usr/local/share/sphinx3/doc/  (below are the important files to look at)
 * - s3_codework.html : Code review document
 * - s3_description.html : Code review document
 * - s3_overview.html : Code review document
 * - s3.2.ppt : The powerpoint presentation
 * - FAQ.html : Frequently asked question page
 * - sphinxman_manual.html : Manual for training
 * - sphinxman_FAQ.html : FAQ for training
 * - sphinxman_misc.html : Miscellaneous information for training
 * - models.html : Description of the default broad cast news model


 * sphinx3-0.8.tar.bz2
 * - Very similar if not the same README as cmusphinx-sphinx3.tar.gz
 * - There are more files and folders in this than in cmusphinx-sphinx3
 * - How can I tell which one is being used, sphinx3-0.8 or cmusphinx-sphinx3???
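
To check whether two of these archives really are duplicates without extracting them, their `tar -t` listings can be compared. A sketch, assuming a `tar` that supports the `z` flag; `same_listing` is a hypothetical helper, and stripping the first path component is an assumption made so that archives with different top-level folder names can still match.

```shell
# same_listing A.tar.gz B.tar.gz: succeed if both gzip-compressed tarballs
# list the same paths once the top-level folder name is stripped, so
# "sphinx3/src/x.c" and "cmusphinx-sphinx3/src/x.c" count as equal.
# Compares names only, not file contents.
same_listing() {
  _list_a=$(mktemp) && _list_b=$(mktemp) || return 1
  tar -tzf "$1" | sed 's|^[^/]*/||' | sort > "$_list_a"
  tar -tzf "$2" | sed 's|^[^/]*/||' | sort > "$_list_b"
  cmp -s "$_list_a" "$_list_b"
  _status=$?
  rm -f "$_list_a" "$_list_b"
  return "$_status"
}
```

If the two .tar.gz files are byte-for-byte copies, `cmp -s sphinx3.tar.gz cmusphinx-sphinx3.tar.gz` settles it more directly. The .tar.bz2 archive would need `tar -tjf` instead, which requires bzip2.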

3/15- Going through countless README files and Makefiles and tracing down directories (also matching up different directories and files) to get more insight into which version to recompile. I haven't gotten far because there are many directories/folders/files to look at and compare. I have been going through the files I think are important and parsing them to see if I can find anything useful. The files are not easy to understand, so I am not sure if I am looking for the right thing.

3/16- Jonas replied back and said
 * Extract the ReleaseNotes for the two identical ones; they say 3.7 (which I'm guessing is designated as 3.0.7 by the CMU folks). You can go to their source website for more info:
 * https://osdn.net/projects/sfnet_cmusphinx/releases/
 * Also, check out our own website as it also gives you info:
 * https://foss.unh.edu/projects/index.php/Speech:Software
 * Extract and compile sphinx3-0.8.tar.gz, which seems to be the latest. So extract that into Rome's /root/sphinx3 directory and go from there. Create and run a Caesar 5-hour train and test-on-train experiment so you have your baseline. After re-compiling, use the new sources on Rome and re-run the test-on-train experiment.

Plan
3/12- Since this week is spring break, I want to start this compile and the research to figure out which versions are duplicates and which version we are using.

3/13- Now that I have a better understanding of what is inside mnt/main/install/tar/cmu, I need to figure out which version we are using and which one I need to recompile.

3/15- Keep looking through the 6 directories inside mnt/main/install/tar/cmu. If I am unable to find anything by tonight, I will email Jonas and ask for his input so he can tell me, from the notes I have taken, which version we should compile.

3/16- My plan for the next couple of days is
 * - To read the links Professor Jonas sent me.
 * - Create and run a 5 hour experiment on Caesar for baseline (train and test-on-train experiment)
 * - Extract and compile sphinx3-0.8.tar.gz on Rome's /root/sphinx3 directory
 * - Recompile and re-run the test-on-train experiment to compare against the baseline results (assuming everything is done correctly, the results should be the same)

Concerns
3/12- I am a little concerned about working in the software group as well as in the bigger Avengers group. I will need to manage my time well in order to work well in both groups. I am hoping that during spring break I can get a little ahead of schedule.

3/13- I hope I am able to find which version to recompile; if not, I will reach out to Professor Jonas and ask for his advice. Since some of the directories inside mnt/main/install/tar/cmu contain similar files, I do not want to recompile the wrong version.

3/15- Hoping to do this recompile correctly! I think meeting with the class next week will help, because I can tell them what I have found and they can tell me what to do next, or share their thoughts about it. Other people's input will be helpful.

3/16- I would like to get this recompile done sooner rather than later. Jonas set March 29th as the due date for the recompile. This is plenty of time, but I need to manage my time between my Avengers team and my software team over the next 2 weeks.

=Week Ending March 26, 2018 (recompiling)=

Task
3/20- We had class today, and I was able to meet with my software team to discuss the recompile. The Avengers team is supposed to meet today to catch up and set a goal for the next couple of weeks. I would like all of us to work together toward the overall goal of the class: getting the word error rate down.

3/24- I was talking with my team and the Systems group to straighten out the cloning situation. Since we are now split into two teams, the Guardians and the Avengers, each team has 3 drones.
 * The Guardians have
 * Majestix
 * Miraculix
 * the cloned drone
 * The Avengers (my team) has
 * Asterix
 * Obelix
 * Idefix

When we first chose these drones, my team thought that Asterix had LDA installed on it, but that is not the case. During class last week Professor Jonas said he wanted the Systems group to double-check that Miraculix has LDA and to clone it onto Automatix. This weekend we were working out who in the Systems group would do that.

3/25- Looking at my to-do list for the recompile: I need to delete all the folders/files in /root/sphinx, then copy sphinx3-0.8.tar.bz2 from Caesar to Rome. I will be following the instructions from
 * https://foss.unh.edu/projects/index.php/Speech:Install

in order to do the recompile.

3/26- Unzip sphinx3-0.8.tar.bz2 on Rome. From Jonas's email I realized that I cannot unzip sphinx3-0.8.tar.bz2 because of its .bz2 extension (it needs bzip2, which Rome likely does not have), so I needed to find another archive. I was able to unzip sphinx3-0.8.tar.gz on Rome and recompile the sphinx3 decoder in /root/sphinx/sphinx3.

Results
3/20- On Rome, in /root/sphinx, I will recompile sphinx3-0.8.tar.bz2 because of the link below
 * https://foss.unh.edu/projects/index.php/Speech:Software

It says Sphinx Decoder, version Sphinx3.7. Clicking it leads to sphinx3-0.7, but we have sphinx3-0.8; since that is the most recent one (and Jonas confirmed it), I will recompile that on Rome.


 * https://foss.unh.edu/projects/index.php/Speech:Install

Another good reference I will be reading for this recompile. The sphinx3-0.8.tar.bz2 archive contains an INSTALL file I will look at, and the README file has a small section about Linux/Unix installation.

3/24- We figured out that Dan R and Yashna from the Systems team will do the cloning, since they are on the Avengers. Dan R kept us updated on Saturday while he was in the server room at school cloning Automatix. On Saturday he sent us a message on Discord saying that Automatix is cloned and running, and he just needs to modify the IP scheme.
 * In addition, Professor Jonas sent us this in an email about the drones we picked and their current states:
 * asterix
 * /usr/local is a link to /mnt/main/local
 * it has a full copy of speech binaries (since it's accessing the central store)
 * this means this machine is the most dangerous to modify; if it is modified, tell Systems team group members Dan and Yashna


 * obelix
 * /usr/local is actual directory, not link to /mnt/main/local
 * /usr/local/bin has full copy of speech binaries


 * idefix (this is the same as obelix)
 * /usr/local is actual directory, not link to /mnt/main/local
 * /usr/local/bin has full copy of speech binaries

After the cloning takes place we will give the Guardians either obelix or idefix for their cloned machines.

3/25- I was able to delete all the files/folders in /root/sphinx to have a clean area to work in. To delete them I did the following steps:
 * 1. ssh into Caesar as root
 * 2. ssh into rome as root
 * 3. went to /root/sphinx (it's above /mnt/main)
 * 4. typed sudo rm -rf whatever (the whatever stands for anything you want to delete)
 * 5. you can also do rm -rf * (from inside the folder; this deletes all contents of the folder but not the folder itself)
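
Note that `rm -rf *` skips hidden files whose names start with a dot. A slightly safer way to empty a directory as root is sketched below; `clean_dir` is a hypothetical helper, and `-mindepth`/`-maxdepth` are GNU/BSD `find` options rather than strict POSIX.

```shell
# clean_dir DIR: delete everything inside DIR, including hidden dot-files,
# but keep DIR itself. Refuses a path that is not an existing directory,
# and refuses "/" as a guard against typos when running as root.
clean_dir() {
  [ -d "$1" ] || { echo "clean_dir: not a directory: $1" >&2; return 1; }
  # Resolve the real physical path so a symlink to / is also caught.
  _target=$(cd "$1" && pwd -P) || return 1
  [ "$_target" != "/" ] || { echo "clean_dir: refusing to clean /" >&2; return 1; }
  find "$_target" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

Usage: `clean_dir /root/sphinx`. The `pwd -P` check only guards against the worst typo (emptying `/`), so it is still worth running `ls -A` on the directory first.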

I realized the easiest way to get the compressed archive sphinx3-0.8.tar.bz2 from Caesar to Rome would be to make a folder:
 * In /mnt/main/ on Caesar I created a folder called sphinx3 and copied sphinx3-0.8.tar.bz2 into it using FileZilla.
 * - I logged in as root on FileZilla then I dragged sphinx3-0.8.tar.bz2 which was unzipped from my Desktop to Caesar
 * After sphinx3-0.8.tar.bz2 was in /mnt/main/sphinx3 on Caesar, I logged into rome
 * From rome I went into /mnt/main/sphinx3 which contained sphinx3-0.8.tar.bz2 and I moved it to /root/sphinx/sphinx3
 * To move it from /mnt/main/sphinx3 to /root/sphinx/sphinx3 I used the command
 * - sudo mv frompath topath
 * - my case it was sudo mv /mnt/main/sphinx3 /root/sphinx/sphinx3
 * Now that I had sphinx3-0.8.tar.bz2 on Rome in /root/sphinx/sphinx, I needed to extract the tar file on Rome.
 * - Using tar -xvzf ______ (the blank is for the file name) I tried to extract the compressed file on Rome, but I ran into some problems. I tried to extract it a couple of times and in a couple of different ways, but had no luck.
 * - I did figure out what -xvzf stands for:
x- extract the files from the archive
v- verbose (prints each file as it is processed)
z- filter the archive through gzip (for tar files ending in .gz)
f- use the archive file named next on the command line
 * But I kept running into the same error every time I ran that command, in my terminal it would say
 * gzip: stdin: not in gzip format
 * tar: Child returned status 1
 * tar: Error is not recoverable: exiting now
 * I googled this issue then I tried
 * which gzip
 * gzip -V
 * After all that didn't work I emailed professor Jonas and he told me to find a gzip version of sphinx 3.0.8 on sourceforge (https://sourceforge.net/projects/cmusphinx/files/) the .bz2 file requires bzip2 to decompress, and he doesn't think that rome has bzip2.
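The "not in gzip format" error comes from asking tar to gunzip a file that is really bzip2-compressed. A quick way to tell which decompressor an archive needs is to look at its first bytes. The sketch below is a minimal Python illustration using only the standard library; the helper names (compression_from_header, tar_extract_command, list_members) are hypothetical, and list_members relies on Python's tarfile picking the decompressor itself with mode "r:*", which for bzip2 only works if Python was built with its bz2 module.

```python
import tarfile

# Leading "magic" bytes of the two compression formats seen in this log.
MAGIC = {
    b"\x1f\x8b": "gzip",  # .tar.gz  -> tar -xvzf
    b"BZh": "bzip2",      # .tar.bz2 -> tar -xvjf
}

# tar flag sets matching each format (x = extract, v = verbose, f = file).
TAR_FLAGS = {"gzip": "-xvzf", "bzip2": "-xvjf"}


def compression_from_header(head: bytes) -> str:
    """Classify a file by its first few bytes: 'gzip', 'bzip2' or 'unknown'."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"


def compression_of(path: str) -> str:
    """Read the first three bytes of a file and classify them."""
    with open(path, "rb") as f:
        return compression_from_header(f.read(3))


def tar_extract_command(path: str) -> list:
    """Build the tar command line that matches the archive's real format."""
    kind = compression_of(path)
    if kind not in TAR_FLAGS:
        raise ValueError(f"{path}: not a gzip or bzip2 archive")
    return ["tar", TAR_FLAGS[kind], path]


def list_members(path: str) -> list:
    """List archive contents; 'r:*' lets tarfile detect the compression."""
    with tarfile.open(path, "r:*") as tar:
        return tar.getnames()
```

For example, tar_extract_command("sphinx3-0.8.tar.bz2") would return the -xvjf form, which still needs bzip2 on the machine; for a .tar.gz download it returns the tar -xvzf form.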

3/26- Steps needed to install the Sphinx3 decoder on Rome
 * 1. Go to the Sphinx download page and browse the folders.
 * 2. Find the line "Older releases and files could be found on SourceForge" and follow the SourceForge link.
 * 3. Since we are still running Sphinx3, navigate to the Sphinx3 folder, then click 0.8, and lastly click sphinx3-0.8.tar.gz.
 * 4. Once you click sphinx3-0.8.tar.gz it will download; I suggest placing it in a folder on your Desktop.
 * 5. Since Rome does not have the same directories as Caesar, I made a folder in /mnt/main/ called sphinx3. Because /mnt/main is shared, Rome and Caesar both had a directory called /mnt/main/sphinx3.
 * 6. On Rome, Professor Jonas created an empty folder, /root/sphinx, for the recompile.
 * If the folder doesn't exist, you can create it with mkdir.
 * 7. I have a Mac and used FileZilla as my SFTP program to put the file on the desired server (i.e. Rome).
 * 8. The left-hand side of FileZilla is your local machine; navigate to where you downloaded sphinx3-0.8.tar.gz (it should be in a folder on your Desktop).
 * 9. On the right-hand side, navigate to the common directory /mnt/main/sphinx3.
 * 10. Drag sphinx3-0.8.tar.gz from the left-hand side to the right-hand side to put it on the server.
 * 11. In the terminal, navigate to /mnt/main/sphinx3 on Rome.
 * 12. Once you see sphinx3-0.8.tar.gz in /mnt/main/sphinx3, you can move it to /root/sphinx.
 * 13. To move it, use the command sudo mv frompath topath
 * In my case it was sudo mv /mnt/main/sphinx3 /root/sphinx/sphinx3
 * This creates the folder sphinx3 for you, so there is no need to create a folder called sphinx3 in /root/sphinx first.
 * 14. Extract sphinx3-0.8.tar.gz using the command
 * tar -xvzf
 * In my case it was tar -xvzf sphinx3-0.8.tar.gz
 * 15. A bunch of files will be extracted into /root/sphinx/sphinx3.
 * 16. Run the command sphinx3-0.8/configure (from /root/sphinx/sphinx3), or ./configure (from inside sphinx3-0.8).
 * Be careful: there are other configure files (such as configure.in), so make sure you run just ./configure with no extension.
 * 17. Per Professor Jonas's instructions, I had to change the Makefile.
 * 18. After running ./configure, if you run ls in /root/sphinx/sphinx3 you will see a list of 12 entries (files and folders):
 * config.log | doc | libtool | model | sphinx3-0.8 (unzipped Sphinx3 decoder) | sphinx3.pc | config.status | include | Makefile | scripts | sphinx3-0.8.tar.gz (zipped Sphinx3 decoder) | src
 * 19. To edit the Makefile:
 * Use nano (for some reason emacs doesn't work on Rome)
 * nano Makefile
 * Ctrl + W searches
 * The current bin directory is /usr/local/bin
 * I changed it to /root/sphinx/sphinx3
 * Ctrl + O writes the changes; confirm the prompt to save the edited file
 * Ctrl + X exits
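The nano edit above changes where the build puts its binaries. As a hedged sketch, the same kind of edit can be scripted in Python by rewriting a variable assignment line in a Makefile. This assumes the autoconf-generated Makefile holds the install location in a variable such as prefix (the exact variable that held /usr/local/bin was not recorded here, so prefix is an assumption), and the helper names are hypothetical. Keep in mind that re-running ./configure regenerates the Makefile and discards a hand edit; the sphinxbase README also describes the usual route, ./configure --prefix=....

```python
import re
from pathlib import Path


def set_make_variable(text: str, name: str, value: str):
    """Replace the value of every 'NAME = ...' line; return (new_text, count).

    Only assignments at the start of a line match, so 'prefix' does not
    touch 'exec_prefix'. Only spaces/tabs are allowed around '=', so the
    match never runs onto the next line.
    """
    pattern = re.compile(
        rf"^({re.escape(name)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids backslash escapes in `value`.
    return pattern.subn(lambda m: m.group(1) + value, text)


def edit_makefile(path: str, name: str, value: str) -> int:
    """Rewrite the variable in place (keeping a .bak copy); return lines changed."""
    file = Path(path)
    original = file.read_text()
    updated, count = set_make_variable(original, name, value)
    if count:
        file.with_name(file.name + ".bak").write_text(original)
        file.write_text(updated)
    return count
```

For example, edit_makefile("Makefile", "prefix", "/root/sphinx/sphinx3") would rewrite a prefix = /usr/local line and leave a Makefile.bak copy next to it.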

Plan
3/20- Do the reading on the foss page and in the files under sphinx3-0.8.tar.bz2 to get a better understanding of the recompile. I also need to move sphinx3-0.8.tar.bz2 to /root/sphinx (on rome) and unzip it to start all of this, hopefully within the next few days.

3/24- While talking to the class/group, I want to figure out which drones have LDA on them. We know that the word error rate (WER) will be lower if we run an experiment on a drone with LDA. I was told it has something to do with Python and that it could be in one of the Python libraries. I would like to find evidence of LDA being in a library, or find a path where LDA is installed. No one really knows which drones have LDA.

3/25- I need to look on https://sourceforge.net/projects/cmusphinx/files/ for the gzip version of sphinx3-0.8 and do this whole process all over again. For now, Professor Jonas said to only recompile the Sphinx3 decoder and not SphinxTrain, the CMU Cam Toolkit, or SphinxBase as listed on https://foss.unh.edu/projects/index.php/Speech:Install

3/26- I need to extract sphinxbase onto Rome in /root/sphinx, but Professor Jonas told me to just complete the Sphinx3 decoder, which I have done. I am currently waiting to hear back on whether I should extract sphinxbase or run a 5hr experiment to test whether I did the recompile correctly. In his email, Jonas says to run two experiments to compare the existing decoder (i.e. in /mnt/main/local/bin on Caesar) with the newly compiled one (i.e. will be in /root/sphinx/sphinx3/bin on Rome). You can run a small 5 hour non-LDA training experiment (say, 0306/001) that builds your models. Then run a standard test-on-train decode using the currently installed decoder (this can be 0306/002), and then another standard test-on-train decode using the newly compiled decoder in Rome's /root/sphinx/sphinx3/bin directory (this can be 0306/003). If all goes well, the results should be identical.
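Jonas's check boils down to "the two decodes should produce identical output". As a minimal sketch, assuming each decode leaves a hyp.trans file (as the parseDecode.pl step in this log produces), Python's difflib can show exactly which hypothesis lines differ; the function names are hypothetical.

```python
import difflib


def compare_decodes(old_lines, new_lines,
                    old_name="old decoder", new_name="new decoder"):
    """Return unified-diff lines; an empty list means identical hypotheses."""
    return list(
        difflib.unified_diff(
            [line.rstrip("\n") for line in old_lines],
            [line.rstrip("\n") for line in new_lines],
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


def compare_files(old_path, new_path):
    """Diff two hyp.trans files produced by the two decoders."""
    with open(old_path) as a, open(new_path) as b:
        return compare_decodes(a.readlines(), b.readlines(), old_path, new_path)
```

With hypothetical locations such as /mnt/main/Exp/0306/002/etc/hyp.trans and /mnt/main/Exp/0306/003/etc/hyp.trans, an empty result from compare_files would mean the old and new decoders agree line for line; comparing the scoring.log results as well is still worthwhile.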

Concerns
3/20- I hope I don't mess up this recompile and I am able to do it successfully without running into too many errors!

3/24- I need to dive deeper into the recompile because it is due in 3 days. I wrote a to-do list for the recompile; it looks something like this:
 * 1. Delete all the contents of /root/sphinx; there were some files/folders that needed to be removed.
 * 2. Copy/unzip the actual sphinx3-0.8.tar.bz2 in /root/sphinx/sphinx3.

3/25- My only concern is time. This recompile is due in 2 days, and I am working all day tomorrow and have a class, so I need to find time to finish it before it is due on Tuesday. Tomorrow I will look on the website for the gzip version of sphinx3-0.8 and run the same process as I did above. I think I will be able to get through it more efficiently, since I ran through it a couple of times today even though I kept running into the same error.

3/26- I am a little concerned because when I go into Rome's /root/sphinx/sphinx3 I don't see a bin folder or directory, whereas on Caesar there is a bin located in /mnt/main/local/bin. I will talk with Professor Jonas tomorrow and see if he knows why I don't have a bin. Maybe it's because I didn't download/extract sphinxbase?

=Week Ending April 2, 2018 (more on recompile and Avengers group duty)=

Task
3/27- Today we had class and I didn't get the feedback I would have hoped for. Since the recompile is the only thing left for the software group to do, it has been my main focus in the last couple of days. Though I think I understand what the recompile is meant to do, I think I will face some difficulties along the way. Basically, the software group needs to recompile sphinx3 and sphinxbase on rome in /root/sphinx. Recompiling will show us whether it compiles without any errors, or, if it has errors, what they are and how we can fix them to make this recompile successful.

3/29- Talked with my group; it would be great to know exactly how to check whether any of the drones have LDA on them. We were told that Miraculix has it, and that it is the cloned drone of Traubadix, which is what my team now has. Since we were only told that Miraculix has LDA, we aren't sure how someone can tell; other classmates told me it could be a Python library installed somewhere.

4/1- Diving back into the recompile on rome and finishing up the URC poster. Danielle and I have created a poster that we keep adding to, but I want to go through it once more and get it ready to hand in on Tuesday.

4/2- Run 5hr seen and unseen experiments to compare results, then run them using LDA and compare the differences. I need to run a 5hr unseen-data experiment to catch up with my team, but I wanted to run a 5hr seen-data experiment first to compare its results with unseen.

Results
3/27- Not the best feedback from Professor Jonas during class. After completing the steps above, I was told that I had not compiled anything and that I was working in the wrong directory. In /root/sphinx there were some other folders and directories alongside sphinx3-0.8. While going through the above steps I did encounter some issues, and at the end I did not end up with a 'bin' folder under /root/sphinx/sphinx3-0.8. In Professor Jonas's email he said to run a test on train that compares Caesar's /mnt/main/local/bin with rome's /root/sphinx/sphinx3-0.8/bin, but I was never able to find a bin directory (therefore I don't think the recompile or compile worked at all). I guess it could have been worse, like taking down rome or messing up previous directories, which I didn't do; but at the same time I didn't recompile/compile anything. Hopefully this week I am able to look into this more.

3/29- Haven't figured out much about this yet. I pulled down zipped folders from /mnt/main/install/tar/cmu, and most of them had a python folder, which I was looking into. In the python folder I saw a setup.py file that I took a look at. It referenced library_dirs and define_macros (setup.py settings that tell the build which directories to search for libraries and which C macros to define). Next to library_dirs and define_macros it gave a directory that I tried to follow but couldn't completely re-trace, and I'm not sure why. I talked with one of my group members, and we will try to figure out a way to check exactly whether a drone has LDA or not.

4/1- Recompile/compile day! I spent all day reading about compiling sphinx3 and sphinxbase. Professor Jonas has now cleared up /root/sphinx/ on rome.
 * Currently on rome under /root/sphinx there are two folders: DELETE and sphinx3-0.8
 * DELETE- has all the previous incorrect folders/files that I "thought" I compiled but didn't.
 * sphinx3-0.8- is the sphinx3 decoder I downloaded from https://sourceforge.net/projects/cmusphinx/files/, unzipped, and moved to rome (this is where I need to compile, and then run an experiment that should succeed once the recompile is finished)

From professor Jonas's email- So, I think you need to download the sphinxbase that goes along with the 3.8 decoder (look at sourceforge for it). Then extract the sphinxbase in /root/sphinx. I don't believe you need to compile it. For configure, we have already changed the first instance of /usr/local to /root/sphinx. The other one to change is this line: -I/usr/include/sphinxbase -I/usr/local/include/sphinxbase" change it to: CPPFLAGS="-I/root/sphinx/sphinxbase/include"

Note here that sphinxbase will likely need to be replaced with something more specific like sphinxbase-0.6.1 or whatever the latest one is and also note that it should reside in the top level of /root/sphinx (i.e. don't add another level of directory).

Then you run ./configure (if you add ./ you can run it right from within the sphinx3-0.8 directory) DONE. That should generate the appropriate makefile and then you can try and run make. At this point you should see each of the .c files compile one by one. Then search for the .o files to see if you can locate them. Note you may run into some compiler errors in the source file.
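Jonas suggests searching for the .o files after make to see what actually compiled. A small Python sketch of that search, assuming the build tree is /root/sphinx/sphinx3-0.8 and also counting libtool's .lo objects (the make errors recorded in this log name adaptor.lo), grouped by directory so you can see which source folders got built; the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Plain compiler objects and libtool objects (.lo), which this build uses.
OBJECT_SUFFIXES = {".o", ".lo"}


def find_objects(root):
    """Return object files under root as sorted POSIX-style relative paths."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.suffix in OBJECT_SUFFIXES
    )


def count_by_directory(paths):
    """Count object files per directory they were found in."""
    return Counter(str(PurePosixPath(p).parent) for p in paths)
```

Calling count_by_directory(find_objects("/root/sphinx/sphinx3-0.8")) after a make run would list each source directory with how many objects it produced; a directory with none is a good place to look for the first compile error.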


 * Reading the README in /root/sphinx/sphinx3-0.8, it says
 * Linux/Unix Installation: Starting from Sphinx 3.7, you need to compile sphinxbase before
 * compilation of sphinx3.
 * So we DO need to recompile sphinxbase.


 * These are the steps I took next:
 * 1. Went to https://sourceforge.net/projects/cmusphinx/files/sphinxbase/0.8
 * 2. Saw sphinxbase-0.8.tar.gz; sphinxbase-0.8 seems to be the latest one, so I will use it and unzip it in /root/sphinx
 * 3. Downloaded sphinxbase-0.8-win32.zip to my Desktop, then unzipped it on my Desktop
 * 4. Connected with FileZilla (caesar.unh.edu, root, root password, port 22) and moved the unzipped sphinxbase-0.8 from my Desktop to /mnt/main/sphinx3
 * 5. I used /mnt/main/sphinx3 because it is a directory shared with rome, so I can move it from /mnt/main/sphinx3 to /root/sphinx while on rome
 * 6. Moved it from /mnt/main/sphinx3 to /root/sphinx using the command
 * sudo mv /mnt/main/sphinx3/sphinxbase-0.8-win32 /root/sphinx
 * 7. In /root/sphinx there is now sphinxbase-0.8-win32 (good, this is the sphinxbase)
 * 8. Moved the zipped sphinx3-0.8.tar.gz file to the DELETE folder in /root/sphinx, since there is no reason to keep it, using the command
 * sudo mv /root/sphinx/sphinx3-0.8.tar.gz /root/sphinx/DELETE
 * 9. Now sphinxbase is extracted to /root/sphinx


 * In the README (/root/sphinx/sphinx3-0.8) it says:
 * sphinxbase is used by other modules. The convention requires the
 * physical layout of the code looks like this:
 * package/
 * sphinxbase/


 * So if you get the file from a distribution, you might want to rename
 * sphinxbase-X.X to sphinxbase by typing
 * > mv sphinxbase-X.X sphinxbase (where X.X being the version of sphinxbase)
 * I did this: mv sphinxbase-0.8-win32 sphinxbase
 * Then I renamed it with mv sphinxbase sphinxbase-0.8, so the version is visible
 * Before it was sphinxbase-0.8-win32; now it's just sphinxbase-0.8


 * If you downloaded directly from the Subversion repository, you need to
 * create the "configure" file by typing
 * > ./autogen.sh


 * If you downloaded a release version or if you have already run
 * "autogen.sh", you can build simply by running
 * > ./configure
 * > make


 * If you are compiling for a platform without floating-point arithmetic,
 * you should instead use:
 * > ./configure --enable-fixed --without-lapack
 * > make
 * You can also check the validity of the package by typing
 * > make check
 * and then install it with
 * > make install


 * This defaults to installing SphinxBase under /usr/local. You may
 * customize it by running ./configure with an argument, as in
 * >./configure --prefix=/my/own/installation/directory
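The README's build recipe can be written down as an ordered list of commands. A hedged Python sketch that builds that list (autogen.sh only for a Subversion checkout, the fixed-point flags only when asked, and an optional --prefix so make install writes somewhere other than /usr/local), and a runner that stops at the first failing step; the function names are hypothetical.

```python
import subprocess


def build_steps(prefix=None, fixed_point=False, from_svn=False, run_checks=True):
    """Return the README's build commands in order, as argument lists."""
    steps = []
    if from_svn:
        steps.append(["./autogen.sh"])  # only needed when configure is missing
    configure = ["./configure"]
    if fixed_point:
        configure += ["--enable-fixed", "--without-lapack"]
    if prefix:
        configure.append(f"--prefix={prefix}")  # default install is /usr/local
    steps.append(configure)
    steps.append(["make"])
    if run_checks:
        steps.append(["make", "check"])
    steps.append(["make", "install"])
    return steps


def run_steps(steps, source_dir):
    """Run each command in source_dir; check=True raises on the first failure."""
    for cmd in steps:
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=source_dir, check=True)
```

For example, run_steps(build_steps(prefix="/root/sphinx/install"), "/root/sphinx/sphinxbase-0.8") would configure, build, test, and install into that (hypothetical) prefix, stopping with a CalledProcessError at the first command that fails.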


 * Note- I read the above and tried to run the commands it describes, but I didn't have any luck. I thought that unzipping the file on my Desktop and then dragging it over to Caesar and finally rome had created some problems. So next I did a round 2 of compiling sphinxbase, because the README says I need to do that before I compile the sphinx decoder.


 * Round 2 Steps
 * 1. Deleted, in FileZilla, the unzipped folder I had dragged over from my Desktop
 * 2. Dragged the zipped file over in FileZilla to /mnt/main/sphinx
 * 3. From /mnt/main/sphinx I moved it to /root/sphinx/
 * 4. Now /root/sphinx contains sphinx3-0.8 and sphinxbase-0.8-win32.zip
 * 5. Unzipped it in the terminal to see if I could run the steps above
 * To unzip a .zip file I ran the command
 * unzip sphinxbase-0.8-win32.zip
 * 6. The output was a bunch of lines like
 * inflating: sphinxbase-0.8-win32/README
 * 7. Went to /root/sphinx/sphinx3-0.8/configure.in to change ...
 * 8. Changed -I/usr/include/sphinxbase -I/usr/local/include/sphinxbase" to:
 * CPPFLAGS="-I/root/sphinx/sphinxbase/include"
 * CPPFLAGS="-I/root/sphinx/sphinxbase-0.8/include"- I named the folder sphinxbase-0.8 rather than sphinxbase (as the README suggests) because it's more descriptive


 * That didn't work either, and sphinxbase still wasn't compiled. I didn't know what else to do, so I went back through Professor Jonas's email and took these next steps:
 * 1. Went to /root/sphinx/sphinx3-0.8
 * 2. Ran
 * ./configure
 * 3. Got a bunch of
 * checking for ...
 * and config.status .....
 * 4. Then ran
 * make
 * got a bunch of checking ....
 * and at the end got errors

What the errors look like:
 make[3]: *** [adaptor.lo] Error 1
 make[3]: Leaving directory `/root/sphinx/sphinx3-0.8/src/libs3decoder/libam'
 make[2]: *** [all-recursive] Error 1
 make[2]: Leaving directory `/root/sphinx/sphinx3-0.8/src/libs3decoder'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/root/sphinx/sphinx3-0.8/src'
 make: *** [all-recursive] Error 1
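Recursive make reports one failure per level, so the useful line is the deepest one (make[3] here): it names the target that failed (adaptor.lo) and the directory it was being built in. The compiler's own message for that file is usually printed just before these lines. A hedged Python sketch that pulls the deepest failure out of a saved make log (for example one captured with make 2>&1 | tee make.log; the file name and function name are assumptions):

```python
import re

# "make[3]: *** [adaptor.lo] Error 1"  (plain "make:" means level 0)
ERROR_RE = re.compile(r"^make(?:\[(\d+)\])?: \*\*\* \[([^\]]+)\] Error (\d+)")
# "make[3]: Leaving directory `/path'"  (newer make quotes with '...')
LEAVING_RE = re.compile(r"^make(?:\[(\d+)\])?: Leaving directory [`'](.+)'")


def deepest_failure(log_text):
    """Return (target, directory) for the deepest make level that failed, or None."""
    failed = {}    # make level -> first target reported as failing
    leaving = {}   # make level -> directory that level was working in
    for raw in log_text.splitlines():
        line = raw.strip()
        m = ERROR_RE.match(line)
        if m:
            failed.setdefault(int(m.group(1) or 0), m.group(2))
            continue
        m = LEAVING_RE.match(line)
        if m:
            leaving[int(m.group(1) or 0)] = m.group(2)
    if not failed:
        return None
    level = max(failed)
    return failed[level], leaving.get(level)
```

On the errors above it would point at adaptor.lo in src/libs3decoder/libam, which is where to scroll back up for the first real compiler error.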

4/2-
 * 5hr experiment on seen data to get a baseline to compare to unseen data, run on Obelix
 * 1. makeTrain.pl switchboard 5hr/train
 * 2. genFeats.pl -t
 * 3. run "top" to see who else is running stuff
 * 4. nohup scripts_pl/RunAll.pl &
 * MODULE: 99 Convert to Sphinx2 format models
 * Can not create models used by Sphinx-II.
 * If you intend to create models to use with Sphinx-II models, please rerun with:
 * $ST::CFG_HMM_TYPE = '.semi.' or
 * $ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'
 * If you see this, you are done; hit Enter
 * 5. mkdir LM
 * 6. cd LM
 * 7. cp -i /mnt/main/corpus/switchboard/5hr/train/trans/train.trans trans_unedited
 * 8. parseLMTrans.pl trans_unedited trans_parsed
 * 9. lm_create.pl trans_parsed
 * cd ..
 * ls
 * 10. cd etc
 * 11. awk '{print $1}' /mnt/main/corpus/switchboard/5hr/test/trans/train.trans >> /mnt/main/Exp/0306/001/etc/001_decode.fileids
 * 12. nohup run_decode.pl 0308/034 0308/034 1000 &
 * ls
 * 13. parseDecode.pl decode.log hyp.trans
 * 14. sclite -r 001_train.trans -h hyp.trans -i swb >> scoring.log
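Step 11's awk command copies the first whitespace-separated field of every transcript line (presumably the utterance ID for these transcripts) into the decode's .fileids list. The same operation in Python, as a sketch with hypothetical function names; note that >> (and the "a" mode below) appends, so running the step twice lists every utterance twice.

```python
def first_fields(lines):
    """Mimic awk '{print $1}': first whitespace-separated field of each line."""
    out = []
    for line in lines:
        fields = line.split()
        out.append(fields[0] if fields else "")  # awk prints "" for blank lines
    return out


def append_fileids(trans_path, fileids_path):
    """Append the IDs from a transcript to a .fileids file, like '>>' does."""
    with open(trans_path) as src:
        ids = first_fields(src)
    with open(fileids_path, "a") as dst:
        for utt_id in ids:
            dst.write(utt_id + "\n")
    return len(ids)
```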

 ,-----------------------------------------------------------------.
 |                            hyp.trans                            |
 |-----------------------------------------------------------------|
 | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
 |=================================================================|
 | Sum/Avg |   59    975 | 72.8   19.5    7.7    6.8   33.9   88.1 |
 |=================================================================|
 |  Mean   |  1.3   20.7 | 76.0   17.8    6.2   12.7   36.8   86.2 |
 |  S.D.   |  0.4   19.5 | 20.6   17.1   10.0   21.0   25.3   34.1 |
 | Median  |  1.0   15.0 | 76.2   18.6    0.0    3.4   37.1  100.0 |
 `-----------------------------------------------------------------'

Successful Completion

The results can also be found on https://foss.unh.edu/projects/index.php/Speech:Exps_0306_001
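The Err column in the sclite table is the word error rate: substitutions + deletions + insertions as a percentage of the reference words (# Wrd), and Corr is 100 minus Sub and Del. A short Python sketch of that arithmetic, with hypothetical function names, which also checks a reported row for consistency (each column is rounded to one decimal, so a gap of about 0.1 is expected):

```python
def wer(substitutions, deletions, insertions, reference_words):
    """Word error rate in percent: (S + D + I) / N * 100."""
    if reference_words <= 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (substitutions + deletions + insertions) / reference_words


def row_gaps(corr, sub, dele, ins, err):
    """Compare a sclite row (all in percent) with the values its columns imply.

    Returns (corr_gap, err_gap) in percentage points, rounded to 0.1.
    """
    corr_gap = round(corr - (100.0 - sub - dele), 1)
    err_gap = round(err - (sub + dele + ins), 1)
    return corr_gap, err_gap
```

For the Sum/Avg row above, Sub + Del + Ins = 19.5 + 7.7 + 6.8 = 34.0 against a reported Err of 33.9, so row_gaps returns (0.0, -0.1): the difference is rounding, not a scoring problem.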

Plan
3/27- I'm feeling a little discouraged after hearing that nothing I did in the past couple of days affected the recompile. I think I will leave the recompile for a couple of days and come back to it. I worked on it for a couple of days last week, but I didn't affect the recompile on rome at all. I did learn a lot along the way about what I have to do to make this recompile successful. I need to focus on my group (the Avengers) for a couple of days and come back to the recompile toward the end of the week.

3/29- I think we need to create a software group poster for the URC, which I am planning to do with Danielle. Professor Jonas didn't talk about it much, but I have a poster due on Saturday, so I want to complete the software group one as well. We have a template that we will work from, and then we will add the information we think is necessary. This year the capstone class submitted an abstract to CCSCNE 2018 (http://ccscne.org) and we were accepted. We need to create a poster for this event as well.

4/1- Sent Jonas a couple of emails throughout the day when I was stuck. Waiting to hear back on what to do next. I asked the software group to run an experiment with Caesar's /mnt/main/local/bin decoder and with rome's /root/sphinx/sphinx3-0.8 build. Faruk tried, but said addExp wasn't working on rome, so I don't think the recompile worked correctly. I also need to do more work for my Avengers team; tomorrow I will run a 5hr experiment on unseen data to get caught up with everyone. I would like to put more time and energy into my Avengers team rather than being solely focused on fixing the recompile.

4/2- Professor Jonas replied to my email and said that we would have to fix the errors for the recompile to be successful. I will share all this with my group tomorrow and see what our plan will be. This might be a good place for next year's Software group to pick up from. The recompile might take too much time away from our bigger groups, where we should focus most of our time, but tomorrow I will talk to the software group and see what they think. I also need to run an experiment on unseen data to compare to the seen data. The seen data is easier, so I did that one first; now I need to do the unseen data. I will start with 5 hours and then 30 hours.

Concerns
3/27- Hoping to not lose motivation for this recompile assignment. There is a lot of work to be done to have this be successful. Hopefully my team and I will be able to come together and finish it before the semester is over.

3/29-Just making sure to complete everything I need to this week/weekend.

4/1- I just want to finish this compile/recompile on rome. I know the steps I need to take to finish it, but I want to double-check with Professor Jonas. I am also taking time away from my Avengers group, and I would like to put more time and focus on them.

4/2- I need to catch up with the rest of my Avengers team. I think some of them are done running a 5hr unseen data experiment, but I haven't completed it yet. I hope to do this tomorrow, and I want to start the 30hr unseen data experiment as well. I need to do some research into how to lower the WER. We were unable to recreate last year's team's 29% result, so we have no idea how they were able to get such a low word error rate.

=Week Ending April 9, 2018 (CCSCNE and Avengers work)=

Task
4/3- Now that I have run a 5hr seen experiment, I will run a 5hr unseen experiment to compare the results. After class we are having an Avengers meeting to figure out a strategy to lower the word error rate (WER) and pitch ideas. We have been talking about something, but we need to figure out a clear path for executing it.

4/4- Meet up with Dan R and Jaden to solidify our team strategy. Dan R had a great idea, and he would like to show Jaden and me what he is thinking so we can start coding on it and fix some of the files.

4/5- We are working on finishing up the CCSCNE poster for the overall class. Professor Jonas had us apply with an abstract for the regional CCSCNE event; we applied and were accepted. This poster is due on Friday, and we want to start on it so that we have time to edit it and submit it before it's due.

Results
4/3- Ran a 5hr unseen data experiment to compare its results to the seen experiment and to make sure I could run a successful unseen experiment. The instructions for an unseen experiment are a lot longer than for a seen experiment. I wanted to make sure everyone in the Avengers group was able to run a successful unseen data experiment, so that we are all on the same page. After a couple of errors, and with the help of my teammate (Tri), I was able to successfully run a 5hr unseen data experiment. Since our overall goal is focused on unseen data, our strategy will focus on improving unseen-data results rather than seen. As a group we came up with a team strategy that we all agreed on, and a plan for executing it. Since half the class is on one team and the other half is on the other, it is hard to write much because we cannot share too much. In addition, we submitted our URC poster in class. We worked on it last week, but during class we were able to finalize and submit it, so we don't have to worry about it later.
 * After going through the steps of an unseen experiment:
 * Go into the folder that contains the decode and scoring
 * Go into the etc folder
 * You will see a scoring.log file
 * Use the command tail -8 scoring.log and it will print the last 8 lines of the file (your results) in your terminal window
 * I discovered this command recently and I think it will be useful later on
 * To see the whole scoring.log in your terminal, I use the command nano [filename], replacing [filename] with the actual file name (e.g. nano scoring.log), and the whole file will be displayed in your terminal window.
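For pulling the summary out of scoring.log without reading the whole file, here is a hedged Python sketch: tail() mimics tail -n, and parse_sum_avg() reads the Sum/Avg row of the sclite table in the layout shown in this log (the column order Corr, Sub, Del, Ins, Err, S.Err is taken from those tables; the function names are hypothetical).

```python
from collections import deque


def tail(path, n=8):
    """Return the last n lines of a file, like `tail -n 8`."""
    with open(path) as f:
        return list(deque(f, maxlen=n))


def parse_sum_avg(lines):
    """Parse the sclite 'Sum/Avg' row into a dict, or return None if absent."""
    for line in lines:
        if "Sum/Avg" not in line:
            continue
        # e.g. "| Sum/Avg | 4173  60938 | 46.6   43.7    9.7   10.5   63.9   92.6 |"
        cells = line.strip().strip("|").split("|")
        sentences, words = (int(x) for x in cells[1].split())
        corr, sub, dele, ins, err, s_err = (float(x) for x in cells[2].split())
        return {
            "sentences": sentences, "words": words, "corr": corr,
            "sub": sub, "del": dele, "ins": ins,
            "wer": err, "sentence_err": s_err,
        }
    return None
```

Because parse_sum_avg accepts any list of lines, passing it the whole file (or a generous tail) avoids missing the row if the log has a few extra trailing lines.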

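 ,-----------------------------------------------------------------.
 |                            hyp.trans                            |
 |-----------------------------------------------------------------|
 | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
 |---------+-------------+-----------------------------------------|
 | Sum/Avg | 4173  60938 | 46.6   43.7    9.7   10.5   63.9   92.6 |
 |=================================================================|
 |  Mean   |  1.3   19.2 | 55.8   37.1    7.2   18.2   62.4   92.6 |
 |  S.D.   |  0.5   16.5 | 22.2   19.6    8.8   32.0   33.7   24.3 |
 | Median  |  1.0   15.0 | 50.0   38.9    4.3    7.7   62.5  100.0 |
 `-----------------------------------------------------------------'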

4/4- Talked with Dan R and Jaden about Dan's idea and the research he has been doing. Dan presented his idea and the facts and information on how to lower the WER. I cannot talk too much about this idea, but I think it would be doable to implement. We plan to reach out to other good programmers in our group and talk about how to implement this idea in the best and most efficient way. Dan did a great job showing us how he would go about lowering the WER; it is a little time-consuming, but with good programming we could do it. All I can say is that it involves a lot of trial and error until we are able to get the correct parameters and the correct set of outcomes.

4/5- Finished the CCSCNE poster. We sent the poster to Professor Jonas to make the final edits and adjustments. We worked on this poster with one representative from each group, to add information about all the groups.

Plan
4/3- In the next couple of days I want to look at some of the files that are produced while running a train and find potential files to edit or change in order to lower our WER. I also plan to meet with some of my team members tomorrow (Wednesday) to map out a path for executing our strategy. I believe the strategy we have is a great start, and it could hopefully lead us to discovering some other files as well.

4/4- My plan is to take what Dan has shown me and try to refine it in order to get a lower WER. At this point I can implement Dan's idea by hand, but I think that after a while, if I see a pattern, a script to automate the work would be helpful. For now, it will be best to work on smaller unseen data experiments and see if they change at all when we adjust some "stuff" (the best way to describe our idea).

4/5- I have been working on the CCSCNE poster for the whole day and I need to switch gears to running experiments with changed scripts and different values.

Concerns
4/3- Time management... There is a lot going on since it is getting closer to finals and graduation. I think I have been putting in a lot of work so far in capstone but I need to make sure I keep it going.

4/4- Although I understand why the class is split into two teams and why we are having a competition over who gets a lower WER, I think it is distracting. It is hard to figure out what information to post and what information to keep within the teams. I do not want to keep all my work private and leave my logs empty, but at the same time I do not want to give too much away to the other team. I think it would be best if we came together as a class at the end of the semester and combined our forces. It would help the overall project and research.

4/5- I need to get started with our strategy and run smaller experiments to implement our idea. Can't talk about it too much, but I have it on my local computer.

=Week Ending April 16, 2018 (Avengers work and important emails from Jonas)=

Task
4/10- Regrouping with the Avengers to figure out who will be on which sub-team, to split up the strategies we are looking at.

4/13- Received email from Jonas and Steve to not do any work tonight because they are looking into LDA.

4/15- Running 300hr experiments for the Avengers. Keeping track of steps and errors in a local copy.

Results
4/10- Can't write much because it is a team strategy.

4/13- Kept up on Discord while I could. Jonas sent us a long email explaining everything, its current state, and his findings. Below is the important information from the email:
 * First off, we have not run a successful LDA train and had not recreated last year's results on 300 hours. The 28.4% and 40.9% that had been reported during status meetings was merely a re-decode of the models that were built in 2017. Note that re-decoding models is simple and will give you the same answer. The difficult part is building models (training) and that is what no one had been able to do. So there was a bit of confusion of what training means by some. When you copy an old experiment with its models, that is not re-creating those models. Needing to run RunAll.pl in the scripts_pl folder successfully with the correct config parameter has to occur.


 * So I discovered what had happened in 2017. Miraculix was indeed the machine used to train LDA but you had to be logged on as root. Last year's group installed python2.7 (which had libraries that LDA needed) under /root/usr/local and adjusted root's PATH variable to look in there before looking in /usr/local.
 * Kind of bad way of doing it as that's like installing Word in C:\Windows instead of C:\Program Files. So if you logged on as yourself you'd see python2.6 and as root you'd see python 2.7. Of course, root accounts are not all alike so if you tried to run Miraculix's root on a user directory on /mnt/main (belonging to Caesar since that's where our main file store lives) you'd have permission problems. So you had to log back into Caesar and change permissions on your training directory giving other (basically everyone) read & write permissions. Anything created by Miraculix's root would show up elsewhere as "nfsnobody" as the owner. If you wanted to change it you'd have to change its owner to yourself or give "group/other" read & write permission.
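The root-only python2.7 behavior follows from how PATH lookup works: the shell runs the first matching executable in the first PATH directory that has one. shutil.which only reports that first hit, so here is a hedged Python sketch (find_all_on_path is a hypothetical name) that lists every match in lookup order, which makes it easy to compare what root and a normal user would actually run.

```python
import os


def find_all_on_path(program, path=None):
    """Every executable named `program` on PATH, in the order the shell tries them.

    The first entry is what actually runs; later ones are shadowed.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", program)  # empty entry = cwd
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits
```

Running find_all_on_path("python") once as root and once as yourself on Miraculix would show whether a python under the /root/usr/local install comes ahead of the one in /usr/local.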


 * I also discovered that run_decode_lda.pl does not do anything different other than appending "mllt_" to the -hmm argument. So, the -lda and -ldadim flags that the decoder has are still not being explored. If you type on the command line sphinx3_decode you will see it spit out 50+ flags that you can set. This is to ask everyone not to use run_decode.pl or run_decode_lda.pl anymore. That script was a convenience that I wrote 4 years ago so that students could quickly run a decode.


 * Unlike training, which runs a ton of perl scripts and python code and java code and executable code and combines them in some iterative fashion (i.e. RunAll.pl is a complex script that does lots and lots of things), decoding is a single executable that you can run on the command line (as I said, decoding is simple and training is hard and it's the latter we as researchers care more about). However, if you just run a plain old vanilla decode then you are not utilizing all of the things you may have created in a training job so by running the decode by hand you can now maybe add -lda and -ldadim and look to other flags as well.


 * For instance, I could run a decode in experiment 50 and grab language models from experiment 48 and acoustic models from 49 and maybe a dictionary from 38. Using the scripts forces you to use the same experiment throughout. And of course you can then integrate some of those other terrific arguments that those 50+ flags may give you. Makes sense? Decoding on the command line is simple just type the command and add the flags, i.e.:
 * sphinx3_decode -lm /mnt/main/Exp/0305/012/LM/tmp.arpa -dict ...
 * where ... is the path for the dictionary and more arguments needed for filler dictionary (-fdict), acoustic model (-hmm), wave files to decode (-ctl, -cepdir and -cepext). Then you'll see you can add more flags.
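Jonas's point is that a decode is a single command whose inputs can come from different experiments. A hedged Python sketch that assembles such a sphinx3_decode command from the flags named in the email (-lm, -dict, -fdict, -hmm, -ctl, -cepdir, -cepext, plus the optional -lda and -ldadim). The function names are hypothetical, every path other than the -lm example from the email is a placeholder you would fill in, and the sketch does not check what values sphinx3_decode accepts (typing sphinx3_decode on its own lists its flags).

```python
import shlex


def decode_command(lm, dictionary, filler_dict, hmm, ctl, cepdir, cepext,
                   lda=None, ldadim=None, extra=None):
    """Build a sphinx3_decode argument list; inputs may come from any experiment."""
    args = {
        "-lm": lm,                # language model
        "-dict": dictionary,      # pronunciation dictionary
        "-fdict": filler_dict,    # filler dictionary
        "-hmm": hmm,              # acoustic model directory
        "-ctl": ctl,              # list of utterances to decode
        "-cepdir": cepdir,        # directory holding the feature files
        "-cepext": cepext,        # feature file extension
    }
    if lda is not None:
        args["-lda"] = lda
    if ldadim is not None:
        args["-ldadim"] = str(ldadim)
    args.update(extra or {})      # any of the other decoder flags
    cmd = ["sphinx3_decode"]
    for flag, value in args.items():
        cmd += [flag, value]
    return cmd


def as_shell(cmd):
    """Quote the command so it can be pasted into a terminal."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Passing "/mnt/main/Exp/0305/012/LM/tmp.arpa" as lm and, say, an acoustic model directory from a different experiment as hmm reproduces the mix-and-match decode Jonas describes; subprocess.run(cmd, check=True) would then execute it.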


 * Current state with regard to LDA training. First off, I also discovered that in scripts_pl we have not only RunAll.pl but RunAll_CDMLLT.pl. If you dug into LDA training you know that it does LDA first and then generates MLLT from it and in fact the (perhaps flawed) run_decode_lda.pl script uses those models by appending "mllt_" to the model path. I ran it successfully as experiment 0304/042 and am presently decoding (it's a 5 hour train only). So we will see what I get. I did run two 5 hour LDA trains, one as root on Miraculix and one as myself on Caesar. They gave different results: Miraculix 27.2% (0304/040) and Caesar 26.8% (0304/042). So that was a bit concerning. Investigating further, it looks like when running on Miraculix as root, it is not using /mnt/main/local but a local copy under /usr/local, whereas on Caesar the /usr/local is linked to /mnt/main/local. This bears some investigation so please, someone look at how the local copy of Sphinx training differs on Miraculix from our main install on Caesar. Yes we are doing better using Caesar based LDA but still, I would have expected the results to be identical since we haven't recompiled any software since 2012. So this points to a possible error and could mean that my hotfix installation of python2.7 on Caesar is flawed.
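Jonas asks someone to find how Miraculix's local /usr/local copy differs from the main install. A hedged Python sketch: hash every regular file under each tree, then compare the two maps. It assumes both trees can be read from one machine (for example /usr/local on Miraculix and /mnt/main/local, if that share is mounted there); symlinks are not followed and unreadable files are skipped, so treat the output as a starting point. The function names are hypothetical.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large binaries are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                digests[os.path.relpath(full, root)] = file_sha256(full)
            except OSError:
                continue  # unreadable file: skip rather than abort
    return digests


def compare_digests(a, b):
    """Return (only_in_a, only_in_b, different) as sorted lists of paths."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    different = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, different
```

compare_digests(tree_digest("/usr/local"), tree_digest("/mnt/main/local")) would list files present on only one side and files whose contents differ, which narrows down where the 27.2% vs 26.8% gap could come from.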


 * So yes, I did get a hotfix installation of python2.7 on Caesar, and that should mean it now works on all the drones. Further caveats here, as unfortunately that is not the case (more trouble). Currently it runs on Caesar and Majestix out of the box. It will also run on Miraculix and Traubadix if you re-create the link to /mnt/main/local from /usr/local (right now each of those has its own local /usr/local directory with, presumably, copies of Sphinx).
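A small Python sketch for checking, on any drone, whether /usr/local actually resolves into the shared /mnt/main/local install described above, and where sphinx3_decode really lives after symlinks are followed. The helper name resolves_under is hypothetical; it only reads paths and changes nothing.

```python
import os
import shutil


def resolves_under(path, root):
    """Return True if `path`, after following every symlink, lies inside `root`."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


if __name__ == "__main__":
    # On Caesar this should be True; on a drone with its own local copy it would be False.
    print("/usr/local is shared:", resolves_under("/usr/local", "/mnt/main/local"))
    decoder = shutil.which("sphinx3_decode")  # PATH lookup, like the shell's `which`
    if decoder:
        print(decoder, "->", os.path.realpath(decoder))
```

Comparing the printed real paths across drones is a quick way to see which machines are running a private copy of Sphinx rather than the shared one.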


 * So the other machines (Asterix, Obelix, Idefix, and Rome) are unable to run LDA because they seem to have a 32 bit installation of RedHat. This is concerning since all drones have 16GB of memory installed, so this suggests those 32 bit machines are only seeing 4GB. This ought to be fixed. I ask that the Systems group look into cloning and fixing at least Asterix, Obelix and Idefix. One caveat: please be sure to understand whether any of these machines are special (i.e. someone last year installed something unique) and if so, pull that disk, mark it with a note and put it aside (we pulled an extra 73GB disk out yesterday that can be used in its place for a clone--Camden, did you check its contents to make sure it was empty?). You can perhaps clone it from Majestix since that seems to work (make sure that Majestix isn't one of those unique installations, though).
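The 4GB figure is simply the size of a 32-bit address space. As a quick check (assuming a plain 32-bit kernel without PAE, which the notes do not state either way):

```latex
2^{32}\ \text{bytes} = 4\,294\,967\,296\ \text{bytes} = 4 \times 2^{30}\ \text{bytes} = 4\ \text{GiB} < 16\ \text{GB installed}
```

So under that assumption most of the 16GB in each of those drones would go unused.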


 * Also, when cloning, make sure users and groups are set up properly. I had to fix groups, as all users were given 500 or 501 for their group ids on Traubadix, Rome and, I think, Majestix, but they should be 1001 (i.e. that is what the group for cis790 is). If you aren't consistent, weird file attributes/group permissions start showing up in /mnt/main (they did and I fixed some of them).
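A Python sketch for spotting the group-id drift described above, assuming 1001 is the cis790 gid (as stated) and that you only want a report, not an automatic fix. The function name paths_with_wrong_gid is hypothetical. It uses os.walk and os.lstat, so symlinks are not followed.

```python
import os


def paths_with_wrong_gid(root, expected_gid=1001):
    """List directories and files under `root` whose group id is not `expected_gid`.

    1001 is the cis790 group id from the notes. Symlinks are not followed.
    """
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself plus every regular entry inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if os.lstat(path).st_gid != expected_gid:
                mismatched.append(path)
    return mismatched
```

For example, paths_with_wrong_gid("/mnt/main/Exp") lists everything under that tree still carrying a 500/501-style group, so it can be reviewed before anyone runs chgrp on it.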


 * Rome is the bigger question. We are compiling Sphinx on it. We have backup running. We have CVS installed. We can use it as a wireless gateway. It is also running 32 bit RedHat. Cloning would mean undoing all that work. Can we somehow upgrade it to 64 bit while keeping the installed software base? Perhaps this isn't something critical to fix but only to investigate, so I can have someone do it over the summer when things have settled down. In any case, that means the backup, CVS, and compiling tasks need to be well documented so they can be repeated if someone re-installs RedHat (64 bit) on it over the summer. (BTW, how much memory does Rome have? It's not on our Hardware page.)


 * Idea for next year's group to look at:
 * continue last year's research, move it forward, and explore some new features (e.g. decoder flags like -lda and -ldadim, or the new RunAll_CDMLLT.pl method of training)


 * A couple of updates. RunAll_CDMLLT.pl, compared to RunAll.pl on the same data set, seems to do slightly better on a very small 5 hour train:
 * 5 hour train with 5 hour test-on-train (i.e. seen(test.trans)) decode
 * 0304/041
 * uses RunAll.pl with LDA turned on (run on Caesar)
 * WER: 26.8%
 * 0304/042
 * uses RunAll_CDMLLT.pl with LDA turned on (run on Majestix)
 * WER: 26.0%
 * Neither decoding experiment used the -lda or -ldadim flags of the decoder, but they did use the "mllt" models (i.e. what run_decode_lda.pl does).
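The WER figures above come from the project's scoring step (the notes don't name the tool). As a reminder of what the number means, here is a sketch of the standard definition, WER = (substitutions + deletions + insertions) / reference words, computed with a word-level edit distance. It is illustrative only, not the scorer the class uses.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of 6 reference words -> about 0.167
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

By the same measure, going from 26.8% to 26.0% is a 0.8 point absolute drop, or roughly a 3% relative reduction (0.8 / 26.8 ≈ 0.030), on a very small train.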


 * We refer to "seen data" as test on train.trans;
 * "unseen data" is dev.trans and eval.trans. Don't forget this!


 * A bit of confusion on running the decoder by hand. If you look at the bottom of run_decode.pl (I hope people actually look at the scripts they run), you will see that the decode.log file is created by redirecting the output of the decoder into it (via >). It's a tried and true Unix/Linux method to capture the output of a running program. So when you run the decoder by hand you'd do:
 * sphinx3_decode blah blah blah >& my_log_file.log &
 * In the bash shell the usual form is reversed (i.e. &> in bash). BTW, you use bash when logged on as root and csh/tcsh when logged on as yourself. I mostly use csh/tcsh, so any examples I give are csh (note that tcsh is just an extension of csh).
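If you ever drive the decoder from a Python helper script instead of the shell, the same "send everything to a log and don't wait" behavior looks roughly like the sketch below. The function name is hypothetical; subprocess.Popen with stderr=subprocess.STDOUT is the standard-library way to merge error output into the same file, as >& and &> do.

```python
import subprocess


def run_in_background_with_log(cmd, log_path):
    """Start `cmd` without waiting for it, sending stdout and stderr to log_path.

    Roughly the Python counterpart of `cmd >& log_path &` in csh/tcsh
    (`cmd &> log_path &` in bash). Hypothetical helper for illustration.
    """
    with open(log_path, "w") as log:
        # stderr=STDOUT merges error output into the same file.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    # The child keeps its own copy of the log file descriptor,
    # so closing ours here does not stop the logging.
    return proc
```

Pass it the sphinx3_decode argument list and a name like my_log_file.log; call proc.wait() if you later want to block until the decode finishes.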


 * Also note that you cannot just stick the -lda and -ldadim flags into the list of flags, as that will not work (a few of you tried that already). Each of these flags may or may not need additional information (like a number or a file -- maybe the LDA dimension flag wants an integer, e.g. -ldadim 4 -- but I don't know what it wants, I'm only using it as an example). So you need to investigate by checking CMU's documentation on what they mean, how they are used, and whether they are needed for the LDA models you built. It seems that one way to use LDA models is to use the MLLT transformation model files created by an LDA train as the acoustic models during decode (i.e. via -hmm), and that is what the run_decode_lda.pl script was doing (but that was all).


 * Use the "which" command in Unix/Linux to see where a program is coming from (i.e. type which sphinx3_decode and you'll see it's in /usr/local/bin).

4/15- The experiment took a while to run; I might have messed up, but I will check tomorrow. Another email from Jonas:

4/16- Kept a local copy with detailed instructions. Long story short, I created the 300hr seen data on Obelix, and this morning it seemed to have ended. I let my group know and tried again, then got this error:
 * sed: can't read etc/009_train.trans: No such file or directory
 * sh: tmp.dic: Permission denied
 * Could not open temp file!
 * Done!
 * Generating filler dictionary...
 * cp: cannot create regular file `etc/009.filler': No such file or directory
 * sh: line 2: etc/009.filler: No such file or directory
 * Done!
 * Generating phones list...
 * cp: cannot create regular file `etc/.': No such file or directory
 * sh: line 3: etc/009.phone: No such file or directory
 * sort: open failed: etc/009.phone: No such file or directory
 * Done!
 * Preparation complete!

From Steve: "You don't have an etc/009_train.trans, which should have been created when you executed makeTrain.pl." Therefore, I am trying to run makeTrain.pl again on 300hr/seen.
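A sketch of a pre-flight check that would have flagged this before the generation step printed its string of errors. It only checks the two paths implied by the output above: the etc/ directory (the cp errors say it could not be written into) and etc/009_train.trans (which Steve says makeTrain.pl should create). The function name is hypothetical, and the list of prerequisites is an assumption, not a full list of what the scripts need.

```python
from pathlib import Path


def missing_prerequisites(exp_dir, exp_id):
    """Return the expected inputs that are missing before dictionary/filler/phone generation.

    Only the paths implied by the error output are checked (an assumption, not a full list).
    """
    exp = Path(exp_dir)
    required = [
        exp / "etc",                            # cp/sh errors show etc/ could not be written into
        exp / "etc" / f"{exp_id}_train.trans",  # should have been created by makeTrain.pl
    ]
    return [str(p) for p in required if not p.exists()]
```

Running missing_prerequisites(".", "009") from the experiment directory mirrors the relative etc/ paths in the error messages; an empty list means it is worth continuing.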


 * Continue running 300hr and 145hr experiments.

Saw an error on the 145hr seen data on Obelix:
 * MODULE: 20 Training Context Independent models
 * Phase 1: Cleaning up directories:
 * accumulator...logs...cannot remove directory for /mnt/main/Exp/0310/013/logdir/20.ci_hmm: Directory not empty at /mnt/main/Exp/0310/013/scripts_pl/20.ci_hmm/slave_convg.pl line 89
 * qmanager...cannot remove directory for /mnt/main/Exp/0310/013/qmanager: Directory not empty at /mnt/main/Exp/0310/013/scripts_pl/20.ci_hmm/slave_convg.pl line 92
 * models...
 * Phase 2: Flat initialize
 * Phase 3: Forward-Backward
 * Training failed in iteration 1
 * Something failed: (/mnt/main/Exp/0310/013/scripts_pl/20.ci_hmm/slave_convg.pl)
 * Training failed in iteration 1
 * Something failed: (/mnt/main/Exp/0310/013/scripts_pl/20.ci_hmm/slave_convg.pl)

This is under Exp/0310/013
 * I will let it run for about 20 mins then I will start a new 145hr exp.

Plan
4/10- Excited to implement our work and research and to see the results.

4/13- Look to run Exp for group, and take local notes in case anything goes wrong.

4/15- Looking forward to seeing what happens tomorrow morning regarding experiments. I need to run some more but I need to check with group and see which experiments to run.

4/16- Hoping to get one of these experiments to be successful. The most current 300hr is still running, but I think the 145hr got an error.

Concerns
4/10- Not much, I think the Avengers are in great shape. We have strong teammates who are working hard to implement our strategy.

4/13- There is a lot to be done now that we are coming into the last month of the class. Hopefully we are able to help Professor Jonas with the questions he has and also improve the WER with our research. Looking forward to our results and comparing them to the baseline.

4/15- Hoping to get everything done that is needed for Professor Jonas, Avengers, and Software group. A lot to juggle.

4/16- Just hoping to get a 300hr seen experiment working (one is running!!)

=Week Ending April 23, 2018 (Avenger's strategy)=

Task
4/18- Ran experiments; kept local notes.

4/19- Exp didn't work, wrote down errors, trying again, local notes.

4/20- Attending CCSCNE with Camden and Hannah.

4/21- Local notes

Results
4/18- Local notes

4/19- Local notes

4/20- Local notes

4/21- Local notes

Plan
4/18- Work on strategy and run experiments

4/19- Working on strategy

4/20- Attending CCSCNE with Camden and Hannah.

4/21- will add later

Concerns
4/18- I can't write too much in the logs because everything I am doing is my team's strategy. I am keeping local notes, but it is hard to share them with the rest of the class. My team (the Avengers) knows I have been keeping up with our tasks and fulfilling my team duties.

4/19- Hoping to implement our strategy to get the best overall results!

4/20- No concerns. The CCSCNE went well, but we didn't win. It was very interesting seeing all the projects and research done by other students. I read over some posters before it started, and the top three winners had great research.

4/21- No major concerns, but we are still implementing our strategy on the data to improve it. I know that getting a lower WER is great, but I feel as though our strategy is even better. If we were given more time we would be able to achieve a great WER, but it is coming down to the last 2-3 weeks of the course.

=Week Ending April 30, 2018 (Strategy and report write up plans)=

Task
4/25- Did experiment for group, local notes.

4/27- Read others' logs and tried to catch up on all the experiments (LDA experiments) that have been done.

4/28- Looking at the reports that are due over the next two weeks. There are Team (Guardians, Avengers) and Group (Systems, Software, Modeling, Data, Experiment) reports to be done.

4/30- Have a group meeting with the Avengers to complete the Team Report draft. We have a draft due today and then the final report is due next week.

Results
4/25- Did experiment for group, local notes.

4/27- Gained more background/knowledge on what my classmates have been up to and the experiments being conducted.

4/28- After talking with Camden about the project he sent around an overall message about the reports that are due.
 * There are two reports we need to get done
 * 1. The Final Team Report (such as the Rebel or Empire Report from last year, 2-4 pages)
 * 2. The Final Class Report, where the 5 groups write their sections as we did with the Class Proposal (an update on what we actually did versus what we proposed we would do).
 * The draft of the Final Class Report is due May 8th, and whichever Team loses has to finish the draft and submit the completed version sometime between May 8th and, I believe, the 14th. We need to ask Jonas about this because I have May 14th as the last day for this report.
 * (The 5 Groups) Group Leaders - please look to have your sections completed by May 4th (Friday) so we can condense them into one document as a draft by May 8th.
 * *Note: we are trying to get the reports done before they are due so we don't have to cram them in at the last minute.

4/30- Using this link[] we created a draft for the Avengers. We had an almost 2 hour meeting to create this draft so it would be ready for class. Tomorrow before class I will print it out and give Professor Jonas a copy. Hopefully Professor Jonas is able to read through our Team Report draft and let us know how it sounds. As I have stated before, our strategy is great because Dan R thought outside the box, and after talking with Professor Jonas we were able to implement it correctly. Our strategy is a little complicated for people who do not know speech or the background of how the strategy was created.

Plan
4/25- Did experiment for group, local notes.

4/27- I need to dive deeper and get in contact with the Avengers regarding our strategy and where we stand. Dan R is working on a final strategy for us to implement and use to get final results for the team. He has been working very hard, and I can't wait to see the results after we implement our strategy!

4/28- Looking to start the Final Team Report for the Avengers. Professor Jonas has created a template for us on the syllabus site; it is called Results Report Example.doc.

4/30- I have to meet with the Software group in order to finish our part of the Class Report that is due this Friday. Camden, one of my classmates, suggested that we have our sub group sections (i.e. Software, Systems, Data, Experiment, Modeling) done by Friday. This gives Camden and Hannah the weekend to proofread the document and make any final revisions. I think this is great because we are not waiting until the last minute to complete the Class Report. At the end, the "losing" team will have to finalize the Class Report to be submitted on May 14th. Even though the losing team has to finalize the Class Report, everyone's grade will depend on it. Therefore completing the sub group sections by this Friday (May 4th) gives us plenty of time before it's actually due.

Concerns
4/25- Did experiment for group, local notes.

4/27- Camden sent around some notes regarding the Team Final Reports and Decode, and I am a little confused about the real-time decode we need to complete. I have not looked into this more even though I should. I understand what real-time decode means, but I am unsure how to implement it and which variables we need to change to run a real-time decode.

4/28- Just hoping to have enough time to wrap up the final experiments that we are running. 300hr experiments take a while to run (about 10-12 days), so we are not sure we will have all the 300hr experiments that we need.

4/30- Still a little confused about how to produce the real-time results that we, as the Avengers, need for the final results we submit. I think part of the Avengers will have a meeting with Jonas to ask about our strategy and a minor problem we are having. After our strategy is fixed we will need to start the train and decode for 300hr seen and unseen. Since unseen is the only result that Professor Jonas wants, we need to make sure that our strategy will lower the WER. Hoping for the best when running a 300hr experiment train and decode! 300hr experiments take a lot of time and energy, so I hope we are able to complete it correctly.

=Week Ending May 7, 2018 (Final report/final results)=

Task
5/1- We had class today, attended a decode session with Jonas and then discussed further plans for the Avengers.

Results
5/1- We gave Professor Jonas our Team Report Draft to look over and make any edits he thinks we need. The Software group and I had a meeting as well to complete our portion for the Final Class Report. This report is an update on the proposal that we completed in the beginning of the semester. As the software group we were able to regroup and write down what we have completed since our proposal and what needs more work to be completed.
 * For the results portion of the Final Class Report we have written:
 * Sphinx3 Decode
 * We started by looking at the file where the decode starts, which is s3_decode.c. This file alone sent the group down a rabbit hole of code. With the guidance of Professor Jonas we were able to go through quite a few C files, and toward the end we were able to start looking at how the decoder deconstructs data. We were also able to see how the code works and the ideas behind how voice recognition in CMU Sphinx works and how it operates using all of its variables.


 * Sphinx3 Recompile
 * After multiple attempts to get the recompile working, we were only partially successful. There were some caveats that were not described in the documentation on foss.unh.edu for compiling. The first was that we need to compile sphinxbase into Sphinx3, as Sphinx3 depends on sphinxbase to compile successfully. The second obstacle was specifying the directory to which both Sphinx3 and sphinxbase are compiled. We were successful in recompiling; however, we are not able to run the new decoder because it cannot find a library file. This issue has come up before and should be resolvable without recompiling. After completing the necessary steps to recompile, we documented the steps we took for next year's group to continue. To fully finish the recompile of Sphinx3 on Rome, we need to fix the library file issue, check for other issues, and then create two experiments to test the regular decode process and the new recompiled process on Rome.


 * CVS Version Control
 * Version control began with the installation of and experimentation with Revision Control System (RCS), a command-line based version control system that tracks changes on a file-by-file basis. This turned out not to be the best solution, as we wanted the ability for multiple users to create, modify, and update a repository, not simply to track individual file versions. With this in mind, it was suggested that we use Concurrent Versions System (CVS). It is a command-line based version control system that can track entire packages and record changes committed by each user. After installation, we confirmed that CVS is what we want to use for future refactoring.


 * Documentation
 * All of the documentation can be found in the group logs and in our group wiki logs, which are located here []

Plan
5/1- For the Avengers, I think we are finalizing our strategy to implement it on 300hr seen/unseen data. I will be in touch with my group members to see who is doing what and whether they need help on the experiments.

Concerns
5/1- Hoping we have enough time to implement and conduct a 300hr experiment to get our baseline and unseen results. Can't wait to see the final results and the improvement of the WER.