Speech:Spring 2018 Danielle LeBoeuf Log



Week Ending February 5th, 2018
 * Task: 1/30 We met together as a group. We created a group on Slack and Discord and made sure we could all communicate. We made comments in our #important-stuff log. Prof. Jonas pointed us in the right direction after class. He suggested that we look around the Sphinx decoder and learn what everything does. We need to research it thoroughly. (/mnt/main/root/sphinx3 is the path for the decoder.)

1/31 Josh went through 70 directories, created a text file for the Sphinx Base tree and a Sphinx3 tree, and sent it to our group. Lamia, Faruk, and I talked to Jonas, and he suggested that we split up the work and dig into the individual code.

2/1 Lamia and I got together with Stephen, Dan, and Jaden. We all worked together in trying to get a clear direction. We passed around ideas. I also read some logs. Lamia and I started looking through .c files in sphinx. We are very certain that the programs are written in C++. We shared this information with our group.

2/4 Read other people's logs and kept up with our group's messages, as well as the whole class's group messages on Discord, over the past few days.

 * Results: 1/30 After class we added some people from the other groups to our Discord.

1/31 Meeting with Jonas, Lamia, and Faruk (and messaging Josh on Discord) helped us see what tasks needed to be done this week.

2/1 Lamia and I started looking through the .c files and got a feel for what we will be working with. We are pretty sure it is coded in C++.

2/4 Read other people's logs and kept up with our group's messages, as well as the whole class's group messages on Discord, over the past few days.

 * Plan: 1/30 We need to figure out how the decoder works and make it more efficient. Jonas is challenging us to see if we can compile it before the end of the semester.

1/31 For the next few days we are all going to explore the directories and files and start digging and documenting. On Tuesday we are going to talk to Jonas, and he is going to help us split up the work, since there are so many files. Jonas wants us to research code we do not know and write down why it does what it does, whether it is important, and so on. I will do some within the next few days and compare notes with my group and Jonas to see whether our formatting and information match each other's. Hopefully next week we will all be going in the same direction and be able to take off from there.

2/1 Lamia and I will continue looking through the files and labeling if they are important or not.

2/4 Read other people's logs and kept up with our group's messages, as well as the whole class's group messages on Discord, over the past few days.

 * Concerns: 1/30 My concern is that we need to be going in a certain direction, together as a group. Hopefully using the tools we established today will help ensure this happens.

1/31 My concern (and the rest of the group's, I believe) is the number of files we all need to dive into. Some are simple while others are complex. We are not sure if we can decode all of it in this time.

2/1 My concern is that we can't break down the files as thoroughly as I would like.

2/4 Read other people's logs and kept up with our group's messages, as well as the whole class's group messages on Discord, over the past few days.

Week Ending February 12, 2018

 * Task: 2/6 My group followed the instructions at https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script to run our first experiment. Wesley was sick, but Lamia, Faruk, and Josh all worked together on getting it going. The class collectively messaged each other on Discord, and with the help of other groups we all got on the same page. I learned a lot from others' input. We decided that for our proposal we are going to split the main files within our group and work on them with the help of partners and others in our group.

2/8 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on.

2/10 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on. Lamia came up with our part of the proposal and I edited it and sent it to Camden, who is putting together the class proposal. Tomorrow night my group and I are going to videochat to put the final components of our piece of the proposal together and edit it.

2/11 Tonight we had an hour and a half video chat meeting. We discussed our part of the proposal. We all also updated each other on everything that was going on. Camden was the one who decided to put together our entire proposal. We set a deadline of first drafts by 6pm. We decided on how we think this semester should be going and what our final goal is going to be, since we do not have anything to reference from previous years. Our group came up with our portion of the proposal and sent it in. Camden then put together the finalized version to be passed in. I did not end up running parts 2 or 3 of the experiment, because after reading everyone's discord conversations and logs it seemed like everyone was having the same issue. I am hoping that we can all discuss this on Tuesday in person and get these issues resolved.

 * Results: 2/6 Lamia, Faruk, Josh, Dan B, and some other classmates all seem to be getting the same results from this experiment. Our experiments are all currently running, but we are seeing the same run times and reaching the same places within the experiment. When Lamia's and my runs ended, they stopped at module 50 and reported a timeout. However, with the help of Dan B, we compared our .html files and they all seemed to match up, even though his and Josh's ended at module 99. We did make note of that. Here are the exact instructions we followed:

1. ssh into Caesar. (We ran our experiment on Caesar and it worked, but it might be good to reconsider which server you use. On this day some of the other servers were temporarily down.)

2. Navigate to /mnt/main/Exp/(your group exp folder, usually created by Jonas)/(your individual exp folder). For example, mine was /mnt/main/Exp/0303/005 (0303 was created by Jonas; 005 was created by me last week following Jonas' instructions).

3. We followed the instructions in the link above (https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script).

4. cd into your group experiment folder, then cd into your personal experiment folder.

5. Run the command makeTrain.pl switchboard 30hr/train. (Jonas told us after we ran this that you can also run a 10hr train [makeTrain.pl switchboard 10hr/train]. It won't take as long.)

**Note: You do not need to run makeTrain.pl -t switchboard 30hr/test as it says in the instructions. We did this and it messed some things up, so Lamia and I had to delete files and start all over.

6. Run the (feats) command genFeats.pl

7. Run the (train) command nohup scripts_pl/RunAll.pl & (the "&" runs the program in the background, and nohup lets it keep running on the server even if your ssh connection breaks).

Some important things to consider:

-Filezilla is a great tool to use for this project. You can download it here: https://filezilla-project.org/download.php

-Lamia and I both received the error: Connection to 132.177.189.63 port 22: Broken pipe. We both left class while our experiment was running, so the connection was lost and this error occurred. However, the two of us compared our results with the .html file from Dan (who ran all the way to module 99), and they were the same.

-The .html is an important piece of this experiment and it is easiest to read through Filezilla. This is how we made sure that Lamia's and my files were not corrupted by the errors stated above.

2/8 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on.

2/10 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on. Lamia came up with our part of the proposal and I edited it and sent it to Camden, who is putting together the class proposal. Tomorrow night my group and I are going to videochat to put the final components of our piece of the proposal together and edit it.

2/11 Tonight we had an hour and a half video chat meeting. We discussed our part of the proposal. We all also updated each other on everything that was going on. Camden was the one who decided to put together our entire proposal. We set a deadline of first drafts by 6pm. We decided on how we think this semester should be going and what our final goal is going to be, since we do not have anything to reference from previous years. Our group came up with our portion of the proposal and sent it in. Camden then put together the finalized version to be passed in. I did not end up running parts 2 or 3 of the experiment, because after reading everyone's discord conversations and logs it seemed like everyone was having the same issue. I am hoping that we can all discuss this on Tuesday in person and get these issues resolved.

 * Plan: 2/6 We are going to build our proposal around the concept of working through the code in chunks, so that the functions in the CMU Sphinx 3 main files are fully explained. Jonas suggested that we look at last year's main files and see what they discovered. We are also going to research documentation on the Sphinx3 code and see what's out there. I also think we should compare code from other voice software. We will keep up with our ideas as the week goes on. Jonas also showed us the room we can use to set up a visual map of the code and how it works, since we will be decoding thousands of lines of code and we want a visual representation of what the code is actually doing.

2/8 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on.

2/10 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on. Lamia came up with our part of the proposal and I edited it and sent it to Camden, who is putting together the class proposal. Tomorrow night my group and I are going to videochat to put the final components of our piece of the proposal together and edit it.

2/11 Tonight we had an hour and a half video chat meeting. We discussed our part of the proposal. We all also updated each other on everything that was going on. Camden was the one who decided to put together our entire proposal. We set a deadline of first drafts by 6pm. We decided on how we think this semester should be going and what our final goal is going to be, since we do not have anything to reference from previous years. Our group came up with our portion of the proposal and sent it in. Camden then put together the finalized version to be passed in. I did not end up running parts 2 or 3 of the experiment, because after reading everyone's discord conversations and logs it seemed like everyone was having the same issue. I am hoping that we can all discuss this on Tuesday in person and get these issues resolved.
 * Concerns: 2/6 One concern is that the modelling group is falling a little behind. We ran the first experiment and now we are waiting for the next two. Since this is the third week of class, it is understandable that people are still getting used to their new schedules and workloads. Our group looks to be in pretty good shape; however, I hope that Wesley can get up to speed with what we have done, since he was sick yesterday (I believe he can). As this project goes on, it is getting more important to stay at the same pace as the other teams, which is easy since we are all on Discord and message each other every day.

2/8 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on.

2/10 Went through logs. Also kept up with everyone on discord. I read through each separate group's discord server to see what everyone was working on. Lamia came up with our part of the proposal and I edited it and sent it to Camden, who is putting together the class proposal. Tomorrow night my group and I are going to videochat to put the final components of our piece of the proposal together and edit it.

2/11 Tonight we had an hour and a half video chat meeting. We discussed our part of the proposal. We all also updated each other on everything that was going on. Camden was the one who decided to put together our entire proposal. We set a deadline of first drafts by 6pm. We decided on how we think this semester should be going and what our final goal is going to be, since we do not have anything to reference from previous years. Our group came up with our portion of the proposal and sent it in. Camden then put together the finalized version to be passed in. I did not end up running parts 2 or 3 of the experiment, because after reading everyone's discord conversations and logs it seemed like everyone was having the same issue. I am hoping that we can all discuss this on Tuesday in person and get these issues resolved.

Week Ending February 19, 2018

 * Task:

2/13 Today before class, Lamia, Dan B, Rose, Hannah, and Stephen all met and collaborated on running a 30 hour train and decoding the trained data. Last week I ran the train and this week Stephen assisted me in creating the LM and decoding the 30 hour train. I followed the instructions on these two links:


 * Create an LM: https://foss.unh.edu/projects/index.php/Speech:Create_LM
 * Run decode trained data: https://foss.unh.edu/projects/index.php/Speech:Run_Decode_Trained_Data

DO NOT COPY AND PASTE COMMANDS INTO COMMAND LINE

Create LM

1) " mkdir LM " in YOUR base experiment folder (mine is 005)

2) " cd LM " into the new directory

3) Copy over the transcript used from the corpus directory: " cp -i /mnt/main/corpus/switchboard/30hr/train/trans/train.trans trans_unedited "
 * NOTE I ran a 30 hour train, so my input is "30hr". This input depends on the length of your train.

Prepare/Execute the script that will build the language model

1) Prepare the transcript: " /mnt/main/corpus/switchboard/dist/transcripts/ICSI_Transcriptions/trans/icsi/ParseTranscript.perl trans_unedited trans_parsed "

2) Create the actual language model using this script: " cp -i /mnt/main/scripts/user/lm_create.pl . "

3) Execute the script: " ./lm_create.pl trans_parsed "

Setup decode directory and Run the decode

1) " cd etc " in your base experiment directory

2) After reading the wiki, the best way I did it was running: " awk '{print $1}' /mnt/main/corpus/switchboard/30hr/test/trans/train.trans >> /mnt/main/Exp/0303/005/etc/005_decode.fileids "


 * NOTE: Again, the " 30hr " depends on how long you ran your train for. " 0303" is our number for sp18 and " 005 " is my base experiment directory.

3) Next I ran: " nohup run_decode.pl 0303/005 0303/005 1000 & "


 * See above note

4) I then ran the command: " parseDecode.pl decode.log hyp.trans "

5) Lastly, I executed: " sclite -r _train.trans -h hyp.trans -i swb >> scoring.log "

After all of that, I then got an output, seen under the "results" portion.
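For reference, the awk command in step 2 of the decode setup keeps only the first whitespace-separated field of every transcript line, and >> appends the result to the fileids file. A rough Python equivalent is sketched below. The helper name first_fields is hypothetical and the sample lines are invented for illustration; they are not taken from train.trans.

```python
from typing import Iterable, List


def first_fields(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-separated field of each line.

    A line with no fields yields an empty string, which matches
    awk '{print $1}' printing an empty line for it.
    """
    result = []
    for line in lines:
        parts = line.split()
        result.append(parts[0] if parts else "")
    return result


# Invented sample lines, for illustration only.
sample = ["utt_a first transcript line", "utt_b another line", ""]
print(first_fields(sample))
```

Because the shell command uses >>, running it a second time appends the same IDs again, so it is worth checking whether the fileids file already exists before rerunning it.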

2/14 Today I got started on decoding the code. Since I was the first one to start, I found some challenges that needed to be addressed. I asked my group to all choose a color font, so that we can all distinguish who wrote what. My color is purple. As I was breaking down the code, I noticed I was typing words that I wasn't sure the meaning of. So I decided that we needed a group dictionary, so that we could be more thorough with our decoding. Hopefully with these improvements our decoding can be more in depth. I found most of my decoding documentation on this website: http://www.speech.cs.cmu.edu/sphinx/doc/doxygen/sphinx3/cmdln__macro_8h.html#a24

2/15 The task today was editing and improving our portion of the proposal. My group and I video chatted online for about an hour and a half and beefed up the software section. Since we did not have last year's to go off of, it was important that we set the boundaries and standards for next year's group. Since our task for this semester is pretty much to pick and choose the important files, it was challenging to put exact dates on when we will be done decoding the files.

2/16 Read through logs and discord. Edited more of the group code found on our group wiki. Also edited a bit of our section of the proposal.


 * Results: 2/13 Here are the results of my 30 hour train

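 ,-----------------------------------------------------------------.
 |                            hyp.trans                            |
 |-----------------------------------------------------------------|
 | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
 |=================================================================|
 | Sum/Avg |  123   1872 | 60.7   28.2   11.1    8.5   47.8   97.6 |
 |=================================================================|
 |  Mean   |  1.4   21.8 | 63.8   26.9    9.3   18.3   54.4   98.4 |
 |  S.D.   |  0.6   16.2 | 19.4   16.5   11.7   31.3   32.8    8.3 |
 | Median  |  1.0   18.5 | 64.3   27.2    6.7    5.6   49.6  100.0 |
 `-----------------------------------------------------------------'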
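The Sum/Avg row can be checked by hand. Below is a minimal sketch, assuming the six percentage columns follow sclite's usual order (Corr, Sub, Del, Ins, Err, S.Err). Under that assumption, the error rate should equal substitutions plus deletions plus insertions, and correct plus substituted plus deleted words should account for every reference word.

```python
# Sum/Avg row from the scoring output; the column meanings are an
# assumption based on sclite's usual order (Corr, Sub, Del, Ins, Err, S.Err).
sentences, words = 123, 1872
corr, sub, dele, ins, err, s_err = 60.7, 28.2, 11.1, 8.5, 47.8, 97.6

# Word error rate is substitutions + deletions + insertions.
wer = round(sub + dele + ins, 1)

# Every reference word is either correct, substituted, or deleted.
ref_total = round(corr + sub + dele, 1)

# Word accuracy also penalizes insertions, so it is 100 - WER.
accuracy = round(100.0 - wer, 1)

print(f"WER = {wer}% (reported Err = {err}%)")
print(f"Corr + Sub + Del = {ref_total}%")
print(f"Word accuracy = {accuracy}%")
```

The computed WER matches the reported Err column, so the 30 hour train decoded with a word error rate of 47.8% over 1872 reference words.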

2/14 The results of my decoding can be found in our group wiki in purple.

2/15 The results of the proposal can be found in our group document. We are expecting multiple edits tonight and tomorrow (by us), plus the final edit by Camden on Sunday.

2/16 Read through logs and discord. Edited more of the group code found on our group wiki. Also edited a bit of our section of the proposal.


 * Plan: 2/13 Today in class we talked with Jonas and he told us we need to rewrite our proposal. We are meeting Thursday night to fix our portion of the proposal. He also said that he wants us to edit more "main" files than we had originally intended, so we put a main file on our group wiki so that we all can decode it and get bigger chunks done. This will help ensure that we get the most amount of code done for the semester. We will also initial it so we can see what portions everyone is doing.

2/14 The plan is that others will add on to what I have started. Others may have helpful feedback so that I can improve my documentation. Hopefully we can finish this first code up so we can move on to the main files. We are also meeting on video call tomorrow night, where we are discussing the proposal and improving our overall plan for the semester.

2/15 The plan is to have our portion of the proposal done by Saturday, so that Sunday can be reserved for final edits and any questions Camden has for us. We also need to finish decoding the code that we have by Tuesday. Tomorrow I am going to keep going on my section that I started 2/14.

2/16 Read through logs and discord. Edited more of the group code found on our group wiki. Also edited a bit of our section of the proposal.
 * Concerns: 2/13 A little concerned about fixing our proposal. We were pretty confident in what we had, so I just want to get it finished at this point. I also want to get it out of the way so that our group can focus on getting our decoding going. I am also concerned with the amount of decoding we have to get done now.

2/14 I don't have too many concerns at this point. Just worried about finishing up the proposal and hoping that it comes together shortly so that we can focus our energy into decoding main files.

2/15 I am a little concerned about Jonas' expectations. Sometimes we feel like we're doing exactly what he wants, and then other times we feel like we're going in the wrong direction. Since there is nothing to go off of from last year, it is difficult to know what is expected of us. If our portion of the proposal is to his liking, then we know the path we need to go on for the rest of the Spring 2018 semester.

2/16 Read through logs and discord. Edited more of the group code found on our group wiki. Also edited a bit of our section of the proposal.

Week Ending February 26, 2018
 * Task: 2/20 Had class with Jonas. We discussed our logs with him. As a group, we edited our portions of the proposal. That night, certain students received an email from Jonas about editing the proposal (after class I kept up with Discord). There was a lot of chaos and confusion. The email was sent around to all groups. My group did a video call for two hours and fixed what needed to be done. We kept up with the class group chat as we edited and communicated with the other groups about edits.

2/21 I woke up to a bunch of messages on Discord. We have a private "leaders" group and everyone was chatting in it. Prof. Jonas sent out an email and we were all trying to update the proposal one more time. I took the initiative and made more edits to the software group's portion. I also edited the glossary portion and suggested ways for other groups to improve their sections. After doing that, I kept up with everyone on Discord throughout the day in case any other problems arose. At 9 o'clock that night, the group leaders and I came to the conclusion that the proposal was finally finished.

2/23 Read logs. Asterix will be down today because there will be a 300hr train running.

2/26 Read logs. The software group doesn't have too much going on this week. For the past few days there has at some point been a 300hr train running on each server. I have been keeping up with all of the chatter and talk on discord as well as reading others logs, especially Wesley's. We are choosing which version control system to use, and are leaning toward RCS.


 * Results: 2/20 We believe that our portion of the proposal was completed and looked great.

2/21 We believe that this final version of the proposal is finished. We believe that the professor will approve and we will receive a good grade.

2/23 Read logs. Asterix will be down today because there will be a 300hr train running.

2/26 Reading up on RCS with http://jodypaul.com/SWE/RCSTutorial/RCSTutorial.html


 * Plan: 2/20 Wait for the other groups to finish their portions and make sure the whole proposal looks up to par.

2/21 The plan is to follow our timeline on the completed proposal.

2/23 Read logs. Asterix will be down today because there will be a 300hr train running.

2/26 To meet with the group tomorrow and talk about our next move with the code.
 * Concerns: 2/20 Concerns are about completing a proposal that Jonas will like and that we get a good grade.

2/21 Now that we feel confident about the proposal, we have no concerns at this time.

2/23 Read logs. Asterix will be down today because there will be a 300hr train running.

2/26 Decoding the code and making sure that we get it evaluated.

Week Ending March 5, 2018

 * Task: 3/1 Met with our group and Jonas on Google Hangouts. He went over some of the files he wants us to look at, as well as other tasks he wants us to do. He also explained how some of the files operate. He told us that experiments were run on Brutus, which has since been shut down. He went through the directories and showed us the information in the directories and files. Our task is to explore more files and directories to see what is in there. He wants us to find tiny_train.doc (under sphinx train doc), which shows a useful map of the training. This file is very useful because you can see the complexity of the software. The "make" file is important: it compiles a program made up of many source files and organizes which files do what. I will look into this. /mnt/main/src/programs is the place to start.

3/3 Read logs.

3/4 Read logs.


 * Results: 3/1 We now have a direction from Jonas. He named some important files that we are going to need to look at.

3/3 Read logs.

3/4 Read logs.


 * Plan: 3/1 The plan is to start looking at the files that Jonas pointed out to us.

3/3 Read logs.

3/4 Read logs.


 * Concerns: 3/1 I don't have any serious concerns. Mostly I just want to make sure we get the important files decoded, now that Jonas has finally gone into more depth with our group about what he wants from us.

3/3 Read logs.

3/4 Read logs.

Week Ending March 12, 2018
 * Task: 3/14 Read logs. Kept up on Discord. We had a software team meeting on Monday night and discussed what we need to do as a group. We also discussed the email that Jonas sent out. My task is to keep decoding the main files, along with Josh and Faruk. We were also split into teams. I am on team Guardians. We discussed on Discord what we will be doing. We decided on three servers we would like to work with: majestix, miraculix, and automatix. According to Jonas' email to our team, their current states are:
   * majestix
     * /usr/local is an actual directory, not a link to /mnt/main/local
     * /usr/local/bin has a partial copy of the speech binaries, so you may have some issues with training (not 100% sure)
   * miraculix
     * /usr/local is an actual directory, not a link to /mnt/main/local
     * /usr/local/bin has a full copy of the speech binaries
   * automatix
     * not presently configured or running, since it will be a clone of asterix
     * /usr/local will be a link to /mnt/main/local
     * this means it will have a full copy
     * this means this machine is the most dangerous if modified. (Jonas only wants the systems group working on building this; we are not to install any software without going through them first.)

3/15 Read logs.

3/19 Read logs.

 * Results: 3/14 We have discussed this as a group and decided these are the best options. We will be meeting as a group after spring break, on Tuesday at 11, to get everyone on the same page.

3/15 Read logs.

3/19 Read logs.

 * Plan: 3/14 We will meet next week to discuss our plan. For now, we are talking on Discord and updating each other on projects.

3/15 Read logs.

3/19 Read logs.

 * Concerns: 3/14 My concern is how I will be an asset now that we are split up into teams. We have some strong people on my team, and I am excited about that. I would really like to get involved and stay relevant throughout the next 8 weeks or so. I have asked to meet up with Dan on Monday after spring break to talk in person, and I am hoping we can get some others in on this as well.

3/15 Read logs.

3/19 Read logs.

Week Ending March 26, 2018

 * Task: 3/20 Met with our team today before class. We are team Guardians. We elected Camden as our leader. The meeting lasted about two hours. Afterwards we all talked about what we have been doing in our groups. I showed Rose some files from the software group that I believe are important:

 * /mnt/main/root/tools/SphinxTrain-1.0/doc <-- Within this directory, there is a document called "tinydoc.txt". This is a map of the training process.
 * /mnt/main/root/sphinx3/src/libs3decoder/libam <-- Within this directory, there is a file named hmm.c. Wesley will be decoding this.
 * /mnt/main/root/sphinx3/src/libs3decoder/libsearch <-- Within this directory, Josh has decoded kb.c and kbcore.c.

Faruk, Josh, and I had a code review today. Faruk and I are to know about the following: utt_res_set_uttfile(ur, uttfile), utt_res_set_lmname(ur, lmname), utt_res_set_regmatname(ur, regmatname), utt_res_set_cb2mllrname(ur, cb2mllrname), ctl_read_entry(fp, uttfile, &sf, &ef, uttid), and ctllmfile.

3/21 The following is used in the corpus.h file.

 * -utt_res_t is a structure that stores utterance-based resources. Assume that most resources are string pointers; the strings themselves are pre-allocated somewhere.

The following are defined as char* as a public attribute in utt_res_t:
 * -uttfile- Utterance filename to be processed (in its entirety).
 * -lmname- Language model filename for this utterance. Have seen pointers to this.
 * -regmatname- The regression matrix file name for this utterance.
 * -cb2mllrname- The code book to regression matrix file name for this utterance.
 * -fsgname- FSG file name for this utterance. For one utterance, one could only use either LM or fsg.
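To make the structure concrete, here is a minimal Python sketch that models utt_res_t and two of its setters. It is only an illustration: the real code is C in corpus.h, the Python class is a stand-in whose field names mirror the attributes listed above, and the setter behavior (store the given name in the matching field, as in ur->uttfile=name) is taken from the notes in this log. The file names in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UttRes:
    """Python stand-in for the C struct utt_res_t (every field is a name)."""
    uttfile: Optional[str] = None      # utterance file to process
    lmname: Optional[str] = None       # language model for this utterance
    regmatname: Optional[str] = None   # regression matrix file
    cb2mllrname: Optional[str] = None  # codebook-to-regression-matrix file
    fsgname: Optional[str] = None      # FSG file (used instead of an LM)


def utt_res_set_uttfile(ur: UttRes, name: str) -> None:
    # Mirrors ur->uttfile=name: store the name, do not open the file.
    ur.uttfile = name


def utt_res_set_lmname(ur: UttRes, name: str) -> None:
    # Mirrors ur->lmname=name.
    ur.lmname = name


# Hypothetical file names, for illustration only.
ur = UttRes()
utt_res_set_uttfile(ur, "example_utt.mfc")
utt_res_set_lmname(ur, "example.lm")
print(ur)
```

The point of the sketch is that the setters only record which files belong to an utterance; nothing is read or loaded at that moment.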


 utt_res_set_uttfile(ur, uttfile)  Based on utt_res_t, I think it is safe to assume that utt_res_set_uttfile(ur, uttfile) works on a structure that stores an utterance-based resource: it takes a pointer to that structure (ur) and the uttfile name (a string) and stores the name in the structure. In this case, ur->uttfile=name. Definition at line 129 of file corpus.h.

 utt_res_set_lmname(ur, lmname)  Based on utt_res_t, I think it is safe to assume that utt_res_set_lmname(ur, lmname) works on the same structure: it takes a pointer to the structure (ur) and the lmname (a string) and stores the name in the structure. In this case, ur->lmname=name. I saw lmname[1024] in a few lines of code, so lmname is possibly a character array. Definition at line 130 of file corpus.h.

 utt_res_set_regmatname(ur, regmatname)  Based on utt_res_t, I think it is safe to assume that utt_res_set_regmatname(ur, regmatname) works on the same structure: it takes a pointer to the structure (ur) and the regmatname (a string) and stores the name in the structure. In this case, ur->regmatname=name. Definition at line 135 of file corpus.h.

 utt_res_set_cb2mllrname(ur, cb2mllrname)  Based on utt_res_t, I think it is safe to assume that utt_res_set_cb2mllrname(ur, cb2mllrname) works on the same structure: it takes a pointer to the structure (ur) and the cb2mllrname (a string) and stores the name in the structure. In this case, ur->cb2mllrname=name. Definition at line 137 of file corpus.h.

The above were all found in the corpus.c file and the main_decode.c file that we went over with Jonas yesterday. However, in my research, I have found the following, which follows the same pattern:

 utt_res_set_fsgname(ur, fsgname)  Based on utt_res_t, I think it is safe to assume that utt_res_set_fsgname(ur, fsgname) works on the same structure: it takes a pointer to the structure (ur) and the fsgname (a string) and stores the name in the structure. In this case, ur->fsgname=name. Definition at line 132 of file corpus.h.

 ctl_read_entry  Read another entry from a S3 format "control file" and parse its various fields. Blank lines and lines beginning with a hash-character (#) are omitted. Control file entry format: uttfile(usually cepstrum file) [startframe endframe [uttid]] Any error in control file entry format is FATAL. Return value: 0 if successful, -1 if no more entries left. Defined at line 400 on corpus.c.


 * Parameters:
 * fp 	In: an input file pointer
 * uttfile Out: (Cep)file containing utterance data
 * sf 	Out: Start frame in uttfile; 0 if omitted
 * ef 	Out: End frame in uttfile; -1 (signifying until EOF) if omitted
 * uttid Out: Utterance ID (generated from uttfile/sf/ef if omitted)
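The entry format described above can be sketched in a few lines of Python. This is not the Sphinx3 implementation (that is C, in corpus.c). parse_ctl_entry is a hypothetical helper, the sample entries are invented, and the way the default utterance ID is built from uttfile/sf/ef is an assumption, since the notes only say that it is generated from those three values.

```python
from typing import Optional, Tuple


def parse_ctl_entry(line: str) -> Optional[Tuple[str, int, int, str]]:
    """Parse one control-file line: uttfile [startframe endframe [uttid]].

    Returns None for blank lines and lines starting with '#', which are
    skipped. Raises ValueError on a malformed entry, standing in for the
    FATAL error described above.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None

    fields = stripped.split()
    if len(fields) not in (1, 3, 4):
        raise ValueError(f"bad control file entry: {line!r}")

    uttfile = fields[0]
    # Defaults when the frame range is omitted: start at 0, -1 means until EOF.
    sf, ef = 0, -1
    if len(fields) >= 3:
        sf, ef = int(fields[1]), int(fields[2])

    if len(fields) == 4:
        uttid = fields[3]
    elif len(fields) == 3:
        uttid = f"{uttfile}_{sf}_{ef}"  # assumed naming scheme
    else:
        uttid = uttfile                 # assumed naming scheme
    return uttfile, sf, ef, uttid


# Invented entries, for illustration only.
print(parse_ctl_entry("sw02001 0 250 sw02001-A"))
print(parse_ctl_entry("sw02001"))
print(parse_ctl_entry("# comment"))
```

Returning None for skipped lines is a choice made for this sketch; according to the notes above, the real function returns 0 on success and -1 when no entries are left.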

 ctllmfile  A parameter for S3DECODER_EXPORT ptmr_t ctl_process. In: Control file that specify the lm used for the corresponding utterance. It can be specified optionally. If it isn't, then NULL could be used.

 S3DECODER_EXPORT ptmr_t ctl_process  Process the given control file (or stdin if NULL): Skip the first nskip entries, and process the next count entries by calling the given function (*func) for each entry. Any error in reading the control file is FATAL. This is a function where ctllmfile is used as a parameter.
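The skip-then-process behavior described above can be sketched as follows. This is a simplified Python model, not the C function: ctl_process_sketch and its arguments are hypothetical, entries are passed in as a list of lines instead of a file pointer, and each entry is reduced to its first field before the callback is invoked.

```python
from typing import Callable, Iterable, List


def ctl_process_sketch(lines: Iterable[str], nskip: int, count: int,
                       func: Callable[[str], None]) -> int:
    """Skip the first nskip entries, then call func on the next count entries.

    Blank lines and '#' comments do not count as entries.
    Returns the number of entries actually processed.
    """
    processed = 0
    seen = 0
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        seen += 1
        if seen <= nskip:
            continue          # still inside the skipped prefix
        if processed >= count:
            break             # requested number of entries handled
        func(entry.split()[0])
        processed += 1
    return processed


# Invented control-file lines, for illustration only.
handled: List[str] = []
n = ctl_process_sketch(["# header", "utt1", "utt2", "", "utt3", "utt4"],
                       nskip=1, count=2, func=handled.append)
print(n, handled)
```

Passing the per-utterance work in as func is what lets the same loop drive different operations over the same control file.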

Helpful links: http://cca.nuigroup.com/docs/0.6/structutt__res__t.html#af072d2e3527498245a7fca97eaf79271 http://www.speech.cs.cmu.edu/sphinx/doc/doxygen/sphinx3/corpus_8h.html#a2 http://www.speech.cs.cmu.edu/sphinx/doc/doxygen/sphinx3/corpus_8h.html#a11

3/23 Camden has assigned us some work to do before our meeting tomorrow night. Wesley, Arias, and I have been tasked with researching real-world examples of how to make the dictionary run faster. I don't want to say too much on here in case the competition is reading my logs. I will be documenting my research in a separate Word document to share with my group. I will post what I find at a later date.

3/24 We had our meeting and regrouped. We shared our information and talked about what the next steps are. Arias shared his document with me about what we are researching. We will compare notes at tomorrow's meeting. As I said above, I will post my research once we feel comfortable sharing it with the other team. I want to collaborate more with Wesley and Arias and get us on the same page.


 * Results: 3/20 We made a team decision to meet every Saturday. We will be having our first meeting this Saturday. We snaked through a bunch of code with Jonas. The results are to know about the five functions and one file.

3/21 Written above is all the information I have found about what Jonas wanted us to decode. It is based on the code review from yesterday. We ended at this chunk of code, which is where I got all the structs from:

 ptmr_start(&tm);
 if (func) {
     utt_res_set_uttfile(ur, uttfile);
     if (ctllmfile)
         utt_res_set_lmname(ur, lmname);
     if (ctlmllrfile) {
         utt_res_set_regmatname(ur, regmatname);
         utt_res_set_cb2mllrname(ur, cb2mllrname);
     }
     (func) (kb, ur, sf, ef, uttid);
 }
 ctl_read_entry(fp, uttfile, &sf, &ef, uttid)

3/23 I have created a useful Word document containing information on how to improve run times. I have shared it with my group and will post it here at a later date.

3/24 The results of the meeting are to keep working on optimization.


 * Plan: 3/20 We will be working on the 5 functions and 1 file.

3/21 I have decoded all of the structs. Hopefully the way I did it is how Jonas wants it. Faruk will hopefully have more insight as the week goes on. My next task is to define ctl_read_entry(fp, uttfile, &sf, &ef, uttid), and ctllmfile.

3/23 The plan is to go over my findings tomorrow during the meeting. I have found a lot of information, however I am not sure how to implement it. I am hoping that we can discuss what I found and figure out how to apply it to our project. If we can't it will definitely be useful for people to know and for future capstone students.

3/24 The plan is to meet on Tuesday and talk about more of our findings. Camden has sent out an email pertaining to all our tasks.
 * Concerns: 3/20 I am concerned with how all of that connects together.

3/21 My concern, as always, is "is this how Jonas wants it?". I am hoping that this is what he is looking for.

3/23 My concern is that the hours I spent researching will not be going toward anything productive. Though it is useful, I want to be able to use it to our advantage.

3/24 My concern is that the information that I have found is not applicable to the project. I really would like what I found to be useful and have us utilize it.

Week Ending April 2, 2013

 * Task: 3/27 We had our code review with Jonas. Faruk ran it this time. I am to run it next time. The work that I did last week was somewhat used, I was hoping he would ask more questions about what I had researched. Last week we had to look through the Corpus.c file and some specific pieces of code. This week we have to continue doing that. We have to start in the Corpus.c file and look at how the decode works. He wants us to know about path2basename, utt_decode, and kb_setmllr.

3/28 Read logs. Kept up on discord. We were sent an email from Camden about our assignments this week. I am working with Wesley and Arias. We are to examine the HTML log output from a Train (001.html) and consult with Rose about it, and start narrowing in on what if any improvements can be done to optimize.

3/30 My task is to be in charge of running the decode in class on Tuesday. I am preparing for it by winding through all the files and trying to find out everything I can about path2basename, utt_decode, and kb_setmllr.

4/2 Ran train under /mnt/main/Exp/0306/006 on Idefix.


 * Results: 3/27 Overall, Jonas wants us to keep looking through that code and see what those functions do and snake our way through. I am hoping that my code review goes well and he sees what I have done.

3/28 Read logs. Kept up on discord. We were sent an email from Camden about our assignments this week. I am working with Wesley and Arias. We are to examine the HTML log output from a Train (001.html) and consult with Rose about it, and start narrowing in on what if any improvements can be done to optimize.

3/30


 * Helpful link to decoder tutorial https://www.ee.iitb.ac.in/student/~daplab/resources/Decoder_Tutorial_DAPLAB.pdf
 * Helpful link for documentation https://cmusphinx.github.io/doc/sphinxbase/index.html


 * grep -r 'whatever' .   (this is how I am looking for certain variables in all files)


 * path2basename- I found this defined in the path:
 * /src/libsphinxbase/util/filename.c >   path2basename(const char *path, char *base)


 * I found path2basename defined in a few places online. I am a little confused. Here is what I found with the sources:


 * In this link:  https://cmusphinx.github.io/doc/sphinxbase/filename_8h_source.html   I found this:


 * /sphinx3/src/libutil/filename.c
 * path2basename returns the last part of the path, without modifying anything in memory.
 * -Takes the path as a const char * and, per the prototype below, returns a const char * (so this version is not a void function).
 * -Defined in filename.c at line 53 >   const char *path2basename(const char *path);


 * In this link:  http://www.speech.cs.cmu.edu/sphinx/doc/doxygen/sphinxbase/filename_8h.html#d8a88d52ec0af498bc126a90871b2efe    I found this:


 * path2basename strips off leading path components from the given path and copies the base into base.
 * -Caller must have allocated base.
 * -Defined in filename.c at line 75 >

 path2basename(const char *path, char *base)
 {
     int32 i, l;

     l = strlen(path);
 #ifdef WIN32
     for (i = l - 1; (i >= 0) && !(path[i] == '/' || path[i] == '\\'); --i);
 #else
     for (i = l - 1; (i >= 0) && !(path[i] == '/'); --i);
 #endif
     strcpy(base, path + i + 1);
 }
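
To check my reading of the loop above, here is a minimal Python sketch of the same backward scan. It is not the sphinxbase code: the win32 flag is a stand-in I introduced for the WIN32 preprocessor branch, and the function returns the base name instead of copying it into a caller-allocated buffer.

```python
def path2basename(path, win32=False):
    """Return everything after the last path separator.

    Scans backward from the end of the string, as the C loop does.
    With win32=True a backslash also counts as a separator.
    """
    separators = ("/", "\\") if win32 else ("/",)
    i = len(path) - 1
    while i >= 0 and path[i] not in separators:
        i -= 1
    # i is now the index of the last separator, or -1 if there is none.
    return path[i + 1:]
```

A path with no separator comes back unchanged, and a path ending in a separator gives an empty base, which matches what strcpy(base, path + i + 1) would copy.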


 * Using grep, this is what I found:


 * The path2basename has been found in the following paths:
 * /test/unit/test_util/test_filename.c
 * /src/libsphinxbase/util/filename.c
 * /include/filename.h
 * /sphinx3/src/libs3decoder/libcommon/misc.c
 * /sphinx3/src/libs3decoder/libcommon/corpus.c
 * /test/sphinx3/src/libs3decoder/libcommon/.libs/corpus.o
 * /test/sphinx3/src/libs3decoder/libcommon/.libs/libcommon.a
 * /test/sphinx3/src/libs3decoder/libcommon/.libs/misc.o
 * /test/sphinx3/src/libs3decoder/libcommon/corpus.o
 * /test/sphinx3/src/libs3decoder/libcommon/misc.o


 * Note that in all these files, the following variables are passed in:
 * Testname, testout, ctlspec, base, uttfile


 * Variable / Path / Definition

(Ctlspec is the complete utterance spec in the input control file, and uttid is the last component of Ctlspec)
 * *path / /src/libsphinxbase/feat/feat.c / char *path;
 * *base / too many instances, however I did find /sphinx3/src/programs/main_ep.c / char base_fn[1024];
 * Testname / sphinxbase-0.6.1/test/unit/test_cmdln/_test_parse_goodargs.test / testname=`basename $0 .test`
 * Testout / sphinxbase-0.6.1/test/unit/test_util/test_filename.c / char testout[32];
 * Ctlspec / test/sphinx3/include/misc.h / char *ctlspec
 * Uttfile / /sphinx3/src/programs/main_align.c / build_output_uttfile(char *buf, char *dir, char *uttid, char *ctlspec)

utt_decode-
 * The only instance that I could find where utt_decode was defined is in the path libs3decoder/libAPI/utt.c.
 * I also found in path /sphinx3/include/#utt.h#:       void utt_decode (void *data,  /**< A kb */
 * And in path /test/sphinx3/src/libs3decoder/libAPI/utt.c:  utt_decode(void *data, utt_res_t * ur, int32 sf, int32 ef, char *uttid)
 * Description in utt_decode: Convert input file to cepstra if waveform input is selected


 * file:///C:/Users/Danielle/Downloads/SunyiHu-PhDThesis-2011.pdf on page 328
 * Defined in utt.c. Computes the feature vectors of the input speech and perform recognition.
 * Inputs:
 * 1) The global structure kb_t
 * 2) The utterance resource structure utt_res_t
 * 3) Start frame for decoding
 * 4) End frame for decoding
 * 5) The ID of utterance
 * Procedure:
 * 1) Read the input speech cepstral file and build feature vectors for the entire utterance, by calling function feat_s2mfc2feat, defined in feat.c
 * 2) Record the total number of frames in the input speech utterance if the feature vectors are built successfully, otherwise report an error
 * 3) Initialise the recognition of the utterance, by calling function utt_begin, defined in utt.c
 * 4) Recognise a block of incoming feature vectors, by calling function utt_decode_block, defined in utt.c
 * 5) Finish the recognition of the utterance, by calling function utt_end, defined in utt.c
 * 6) Update the total number of frames processed, defined in the statistics structure stat_t.
 * Within utt_decode as well as my grep search and research online, I found utt_decode_block. This seems to be a very important function and is a main component of the utt_decode.

 void utt_decode_block(float ***block_feat,  /* Incoming block of featurevecs */
                       int32 no_frm,         /* No. of vecs in cepblock */
                       int32 * curfrm,       /* Utterance level index of
                                                frames decoded so far */
                       kb_t * kb             /* kb structure with all model
                                                and decoder info */
     )


 * utt_decode_block: This function decodes a block of incoming feature vectors. Feature vectors have to be computed by the calling routine. The utterance level index of the last feature vector decoded (before the current block) must be passed. The current status of the decode is stored in the kb structure that is passed in.


 * kb_setmllr- Sets MLLR
 * Found in kb.h
 * void kb_setmllr(const char * mllrname, const char * cb2mllrname, kb_t* kb)
 * Parameters:
 * mllrname 	In: The name of the mllr model
 * cb2mllrname 	In: The filename of the MLLR class map


 * Some interesting places where I found this function with comments:
 * /sphinx3/src/libs3decoder/libAPI/utt.c:       kb_setmllr(ur->regmatname, ur->cb2mllrname, kb);
 * /sphinx3/src/programs/main_continuous.c:       kb_setmllr(ur->regmatname, ur->cb2mllrname, kb);
 * /sphinx3/src/programs/main_livepretend.c:       kb_setmllr(ur->regmatname, ur->cb2mllrname, kb);
 * /sphinx3/src/libs3decoder/libsearch/kb.c: * 2, Moved most of the code in kb_setmllr to adaptor.c
 * /sphinx3/include/kb.h:void kb_setmllr(const char* mllrname, /**< In: The name of the mllr model */
 * /sphinx3/src/libs3decoder/libAPI/utt.c


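 * From what I am seeing, main_continuous.c, main_livepretend.c, and utt.c only call kb_setmllr (the lines above are call sites, not definitions). The prototype is in kb.h, and the comment in kb.c suggests the definition is there, with most of its code moved to adaptor.c.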

4/2
 .-----------------------------------------------------------------.
 |                            hyp.trans                            |
 |-----------------------------------------------------------------|
 | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
 |=================================================================|
 | Sum/Avg | 4172  60215 | 73.1   19.1    7.8    7.4   34.4   87.5 |
 |=================================================================|
 |  Mean   |  1.3   19.1 | 76.0   18.3    5.8   15.4   39.4   87.9 |
 |  S.D.   |  0.5   16.5 | 18.1   15.3    7.7   29.1   33.0   30.1 |
 | Median  |  1.0   15.0 | 76.2   16.7    2.4    4.2   33.3  100.0 |
 `-----------------------------------------------------------------'
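
As a sanity check on the Sum/Avg row, the Err column in an sclite summary is the sum of the substitution, deletion, and insertion percentages, and Corr + Sub + Del should account for every reference word. A small Python sketch using only the numbers from the table above (the dictionary keys are my own labels):

```python
# Sum/Avg row from the scoring table, all values in percent.
row = {"corr": 73.1, "sub": 19.1, "del": 7.8, "ins": 7.4, "err": 34.4}

# Word error rate: substitutions + deletions + insertions.
recomputed_err = round(row["sub"] + row["del"] + row["ins"], 1)

# Every reference word is either correct, substituted, or deleted.
reference_total = round(row["corr"] + row["sub"] + row["del"], 1)

print("recomputed Err:", recomputed_err)    # 34.3
print("reported Err:  ", row["err"])        # 34.4
print("Corr+Sub+Del:  ", reference_total)   # 100.0
```

The recomputed value differs from the reported 34.4 by 0.1 because each column in the table is rounded separately.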


 * Plan: 3/27 The plan is for me to research those functions and see where they go. I really would like to create a map to where everything is. If I can do that this week I will be happy. This could depend on where the code takes me. It might not be located in any other file, however if it is I am going to generate a list of where the code appears.

3/28 Read logs. Kept up on discord. We were sent an email from Camden about our assignments this week. I am working with Wesley and Arias. We are to examine the HTML log output from a Train (001.html) and consult with Rose about it, and start narrowing in on what if any improvements can be done to optimize.

3/30 The plan is to hope Jonas likes this. I also hope that this is what he wants. We will be meeting as a team tomorrow. I am hoping to get specific tasks so that I can work more with my team. Arias and I are going to meet afterward and work on our portion of the team project.

4/2 Now that I have run my first train with my team, Guardians, I will start running more trains with different data manipulated to get a more accurate percentage.
 * Concerns: 3/27 My concern is always the same. Is what I am doing what Jonas wants and is it going to move us in a positive direction?

3/28 Read logs. Kept up on discord. We were sent an email from Camden about our assignments this week. I am working with Wesley and Arias. We are to examine the HTML log output from a Train (001.html) and consult with Rose about it, and start narrowing in on what if any improvements can be done to optimize.

3/30 Concern is that I have not done enough research to know what Jonas wants me to know during the code review.

4/2 Dan has helped me a lot today and I am more confident in helping my team.

Week Ending April 9, 2013
 * Task:

4/3 I was unfortunately sick today and did not go to class nor do my code review. I heard Jonas was late to class and left early and we did not do the code review at all. I kept up with discord, emails, and talked to my group members.

4/5 I went to run a 5hr train for my group. I hit a few hiccups. I was not allowed access to 0309 even by root. I also was getting weird errors. Camden helped get me through it. I also realized I edited the facets at the incorrect time. I did run a 5hr train, however I reached a weird point when running nohup scripts_pl/RunAll.pl &. I changed $CFG_STATESPERHMM=3 to $CFG_STATESPERHMM = 8. The results are recorded below.

4/7 Ran another train to see if my previous train had failed because of the changed facet. It is under 0309/011.

4/9 Read logs, kept up with discord and kept up with my groups. Camden sent out a plan for us this week. I will be running 30hr trains combining different facets that will give us a lower word count.


 * Results:

4/3 I was unfortunately sick today and did not go to class nor do my code review. I heard Jonas was late to class and left early and we did not do the code review at all. I kept up with discord, emails, and talked to my group members.

4/5
 * MODULE: 50 Training Context dependent models (2018-04-05 13:36)
 * Phase 1: Cleaning up directories: accumulator... logs... qmanager... completed
 * Phase 2: Copy CI to CD initialize
 * init_mixw Log File completed
 * Phase 3: Forward-Backward
 * Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 2)
 * Baum welch starting for 1 Gaussian(s), iteration: 1 (2 of 2)
 * bw Log File
 * bw Log File
 * FATAL_ERROR: "main.c", line 1054: initialization failed
 * FATAL_ERROR: "main.c", line 1054: initialization failed
 * FAILED
 * Failed to start bw
 * FAILED
 * Failed to start bw
 * Only 0 parts of 2 of Baum Welch were successfully completed
 * Parts 1 2 failed to run!
 * Training failed in iteration 1

All in all, I think that changing this facet caused the train to fail at module 50.

4/7 After running the nohup script, the train built successfully and got to module 99. From this I conclude that changing T14 is what makes it fail at module 50. Currently 0309/011 only has up to the nohup script and that is it.

4/9 Read logs, kept up with discord and kept up with my groups. Camden sent out a plan for us this week. I will be running 30hr trains combining different facets that will give us a lower word count.


 * Plan:

4/3 I was unfortunately sick today and did not go to class nor do my code review. I heard Jonas was late to class and left early and we did not do the code review at all. I kept up with discord, emails, and talked to my group members.

4/5 I am going to troubleshoot and make sure that it failed because of what I changed and not just some weird error.

4/7 I am going to assist in running combination 5hr trains.

4/9 Read logs, kept up with discord and kept up with my groups. Camden sent out a plan for us this week. I will be running 30hr trains combining different facets that will give us a lower word count.


 * Concerns:

4/3 I was unfortunately sick today and did not go to class nor do my code review. I heard Jonas was late to class and left early and we did not do the code review at all. I kept up with discord, emails, and talked to my group members.

4/5 I am very curious as to why it failed when running the nohup command. I am concerned it does not have anything to do with what I changed and that it is a weird error.

4/7 I am hoping that the facets we have changed combined will give us good results.

4/9 No concerns right now, just about running the correct combo of trains.

Week Ending April 16, 2013

 * Task: 4/10 Wesley took me into the server room and showed me how to mount and unmount main and install software. The commands can be found under the software group wiki. We also had our code review with Jonas.

4/11 Though we have moved on from 5hr trains, I wanted to test why T14 failed for me but was successful for Rose. I ran a 5hr train that changed T14=3 to T14=7. Rose was successful with T14=5. Rose had originally said that T14 can take 3, 5, or 7, which is why I decided to try T14=7. It did fail again at module 50 but with a different error, which others had gotten as well.

4/12 I ran two 30hr trains today, one on Idefix and one on Majestix. Miraculix is being worked on at the moment by Jonas and Steve. I worked with Camden a bit to choose the best 30hr trains. They are running at this moment, will check back tomorrow morning when I get up.

4/13 I started the decode today on 0309/40 on Majestix. Seems to be going well. The train seemed to have an error on 0309/039, so I deleted it and started over and the decode is now running on that on Idefix. The results should be done by tonight or tomorrow.

4/15 The results of my trains and decodes can be found under experiments 0309/039 and 0309/40. We are deciding what results we want to do our 300hr trains on.

4/16 I looked at the code that Jonas asked us to last week. It was quite challenging trying to find information on the code he suggested. What I found is in the results portion below.
 * Results: 4/10 I learned how to install software on the servers and learned a little more about how the software on the servers work and the server room in general.

4/11
 * MODULE: 50 Training Context dependent models
 * Phase 1: Cleaning up directories:
 * accumulator...logs...qmanager...
 * Phase 2: Copy CI to CD initialize
 * Phase 3: Forward-Backward
 * Training failed in iteration 1
 * Something failed: (/mnt/main/Exp/0309/033/scripts_pl/50.cd_hmm_tied/slave_convg.pl)

I messaged Arias and he said that that error is because it is gaussian. He also said that for Sphinx3, the number of states for each HMM in the acoustic model is usually 3 or 5.

4/12 Will update when I get the results. The changes I made to the facets are recorded in our spreadsheet. On both trains, T10, T11, and T13 were changed, while one train has D2 edited and the other has T23 edited.

4/13 When I ran /0309/039, there was an error, which is why I started over and it went well the second time. Will post results when they come.

4/15 The results were not too great, but I still think they are useful in deciding what 300hr trains to run.

4/16


 * HMM_compute_lv1(void *srch_struct)
 * srch_fsg.c:    /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,
 * srch_word_switch_tree.c:srch_WST_hmm_compute_lv1(void *srch)
 * srch_word_switch_tree.c:       /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,
 * srch.c:           assert(s->funcs->hmm_compute_lv1 == NULL);
 * srch.c:           if (s->funcs->hmm_compute_lv1 == NULL)
 * srch.c:                   ("Search one frame implementation is not specified but srch_hmm_compute_lv1 is not specified\n");
 * srch.c:           /* This should be part of hmm_compute_lv1 */
 * srch_allphone.c:       /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,
 * srch_flat_fwd.c:       /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,
 * srch_do_nothing.c:     /* hmm_compute_lv1 */           NULL,
 * srch_debug.c:srch_debug_hmm_compute_lv1(void *srch)
 * srch_debug.c:  /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,
 * srch_time_switch_tree.c:srch_TST_hmm_compute_lv1(void *srch)
 * srch_time_switch_tree.c:       /* hmm_compute_lv1 */           srch_debug_hmm_compute_lv1,


 * srch_utt_begin (srch->funcs->utt_begin(srch))


 * int32 srch_utt_begin(srch_t* srch); <-- found in srch.h


 int32
 srch_utt_begin(srch_t * srch)
 {
     int32 i;
     if (srch->funcs->utt_begin == NULL) {
         E_ERROR
             ("srch->funcs->utt_begin is NULL. Please make sure it is set.\n");
         return SRCH_FAILURE;
     }
 }


 * srch.c: * Disabled support of FSG. Added comments for srch_utt_begin and srch_utt_end.
 * srch.c:srch_utt_begin(srch_t * srch)
 * srch_utt_decode_blk
 * Decode one block of speech and provide the implementation of the default search abstraction
 * Parameters:
 * srch 	In: a search structure
 * block_feat 	In: a pointer of a two dimensional array
 * block_nfeatvec 	In: Number of feature vector
 * curfrm 	In/Out: a pointer of the current frame index


 * Plan: 4/10 For team guardians I am going to be running some more trains and want to go a little deeper into T14. I am also going to research the code for the code review next week. Jonas wants us to start with:


 * ../src/libs3decoder/libsearch
 * Look into HMM_compute_lv1
 * Look in srch.c at srch_utt_begin (srch->funcs->utt_begin(srch))
 * Look at the large else block in srch_utt_decode_blk

4/11 Plan is to come up with trains that will give the best results. (Being very vague in case the Avengers are reading).

4/12 The plan is to have good results with these trains and that these trains have useful information on moving on to the next step.

4/13 Analyze the results and see if they are helpful.

4/15 This week we are deciding which results are best to run our 300hr experiments on.

4/16 What Jonas wants us to do for next week's code review!


 * Concerns:

4/10 We are getting into the last weeks of capstone and I want to make sure I do as much work as I can and really help next year's capstone.

4/11 No concerns at the moment.

4/12 My concern now is making sure these trains run correctly and are successful so our next steps give us the best results.

4/13 Concerns are how the decode is running and hopefully the results are positive and can help us.

4/15 Not too concerned right now.

4/16 I really did not find much this week on the code he wanted us to research, so hopefully he accepts what little I have.

Week Ending April 23, 2013

 * Task: 4/17 Faruk, Josh and I had our code review with Jonas. It was a bumpy start, but once we got going we discussed HMM_compute_lv1 and the fact that it is not really used anywhere. It was always set to NULL and never really called. After that we met with Guardians, and Wesley and I were designated a new task. I am going to put it in a Word document and post it at a later time so the Avengers do not see it.

4/19 Wesley and I have a task of researching a specific equation and seeing if when it is manipulated it will increase our results. I have been working with Dan and Wesley today on some of my hypotheses. I have put it in a word document to prevent the other team from looking. I will be posting it in discord for my team to see and will also be posting it at the end of the semester.

4/20 Running trains and such... I will be running a series of 5hr lda trains back to back, however I can only do one at a time and only on majestix, so it will take a few days.

4/22 Trains on trains on trains..... Lots of LDA trains.

4/23 My trains are not going that well. The variable I tried manipulating did not work when I tried to replicate it. I am going to try another approach. So more trains it is, just on another track.

4/24 After running about 7 5hour trains, I have got two to match by manipulating rand(s). Hooray!!!!

/mnt/main/root/sphinx3/src/libs3decoder/libsearch
 * Results: 4/17 Here is the code that we have to review for next week:

 srch_TST_compute_heuristic(void *srch, int32 win_efv)
 {
     srch_t *s;
     srch_TST_graph_t *tstg;
     pl_t *pl;
     ascr_t *ascr;
     mdef_t *mdef;

     s = (srch_t *) srch;
     tstg = (srch_TST_graph_t *) s->grh->graph_struct;
     mdef = kbcore_mdef(s->kbc);
     ascr = s->ascr;
     pl = s->pl;

     if (pl->pheurtype != 0)
         pl_computePhnHeur(mdef, ascr, pl, pl->pheurtype, s->cache_win_strt,
                           win_efv);
     return SRCH_SUCCESS;
 }

4/19 It seems that what I have found is useful, however we cannot find the link between the files that we needed. This is good and bad news. The good news is that because mllt.py is referenced by Perl scripts and not binaries, there is no need to recompile. The bad news is that this information does not tell us much on its own. I will be starting 5 hour LDA trains.

4/20 Will post results of each train as they come!

4/22 I will compare all of the trains once all 10 are done. I am running decode #4. My trains manipulate the random variable.

4/23 My 0309/061 and my 0309/069 do not match and they should. Therefore I did not manipulate the correct variable, so I am going to try something else...

4/24 0309/070 and 0309/073 are a match in results!


 * Plan: 4/17 Faruk, Josh and I are going to have a Zoom meeting about our findings on Sunday. We will discuss the code and compare notes. Wesley and I are going to be focused on our task this week, which will involve running approximately ten 5-hour trains.

4/19 Plan is to change the variable 10 times to see what we can get. We are just going to hope it works since I could not find a range of numbers!

4/20 Keep running trains until they are all done.

4/22 Trains on trains on trains..... Lots of LDA trains.

4/23 I am going to manipulate another variable in the equation to try to control the randomness.

4/24 Now that I have made results match, I am going to try to see if I can get good results....


 * Concerns: 4/17 No concerns about the code review. A little concerned on finding some information on me and Wesley's task, but I think we will be able to.

4/19 Well, I am kind of winging this, so I hope one of these numbers works!

4/20 The length of time this is going to take...

4/22 Concerns are the semester is ending and I want to keep running trains.

4/23 CAN I MANIPULATE THE VARIABLE?!

4/24 I am running out of time now that the semester is coming to an end, so I probably won't get to the work I want to on this.

Week Ending April 30, 2013

 * Task: 4/24 We had the code review today. Faruk led the way. Next week it is up to Lamia and Wesley to run it. I talked to Jonas about the randomization I have been working on for LDA trains. He gave me some suggestions. I went home and tried them. They failed twice. I emailed him about it, and while I was waiting for a response I ran another train (076) to manipulate the randomization.

4/25 Still trying to work on the LDA randomization. I tried Jonas' suggestions and they failed each time. I had a total of 6 attempts today and they all had an error, 5/6 had mdef errors. Jonas told me to add this code to the mllt.py file (0309/079/python/sphinx/mllt.py):

 if A == None:
     # Initialize it with a random positive-definite matrix of
     # the same shape as the covariances
     my_seed = random
     print "MLLT SEED: " + my_seed
     seed(my_seed);
     s = self.cov[0].shape
     d = -1
     while d < 0:
         A = eye(s[0]) + 0.1 * random(s)
         d = det(A)

4/26 I will be running a series of trains to keep trying to find the range of the randomization value. Yesterday I ran 7 different trains and decode attempts (I kept the successful ones, I documented the errors in the experiment logs). I kept getting mdef errors when adding code to the mllt.py file, even when it was a simple print statement. Not sure why... So today instead of trying to determine the value by printing it, I am just going to keep running trains and decodes with setting the numbers to constants. When I say that, I mean by taking this equation: A = eye(s[0]) + 0.1 * random(s) in the mllt.py file and changing it to A = eye(s[0]) + 0.1 * (some number).
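
For future students, here is a minimal sketch of what I was trying to achieve: a starting matrix that is random but repeatable because the seed is fixed and printed. It is not the mllt.py code. It uses only the Python 3 standard library so it runs on its own, and the names det, seeded_init, and constant_init are ones I made up for illustration. One thing worth noting about the snippet recorded on 4/25: random is called as a function there, so my_seed = random stores the function itself, and adding it to a string raises a TypeError in Python. Whether that is related to the mdef errors I saw is not something I have confirmed.

```python
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result


def seeded_init(n, seed_value, scale=0.1):
    """Identity + scale * uniform[0, 1) noise, redrawn while det < 0."""
    rng = random.Random(seed_value)  # private generator with a fixed seed
    print("MLLT SEED: " + str(seed_value))
    d = -1.0
    while d < 0:
        A = [[(1.0 if i == j else 0.0) + scale * rng.random()
              for j in range(n)] for i in range(n)]
        d = det(A)
    return A


def constant_init(n, value, scale=0.1):
    """Same shape, with the random draw replaced by one constant.

    Assumes the constant is added to every entry, which is what NumPy
    broadcasting does for eye(n) + 0.1 * value.
    """
    return [[(1.0 if i == j else 0.0) + scale * value
             for j in range(n)] for i in range(n)]
```

Calling seeded_init(3, 1234) twice gives the same matrix both times, which is the property needed for two trains to match. constant_init(2, 0.5) shows what replacing random(s) with .5 amounts to under the broadcasting assumption: 1.05 on the diagonal and 0.05 everywhere else.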

4/27 So, these past few days have been frustrating. I am manipulating LDA randomization to get steady results, however the numbers I have been picking are not doing much to the results. I have been picking larger numbers, 150-10000, and they have not done anything significant. Today I am going to pick lower numbers instead and see if that does anything.

4/30 This weekend (4/28 and 4/29) I worked with Camden and Dan on running 300hr experiments. Friday night Brian ran some. Saturday, Camden had me run a 300hr experiment trying to replicate 072 to confirm. At this moment it is still running. He also wanted me to run an LDA train with 072 parameters with LDA, D2 & D3 = 1e-80, and the randomization set to .5. Those can be seen under 0309/089 and 0309/090. Sunday we realized that running LDA on Majestix does not give consistent results. I had to restart the train over because it did not fill everything under model_parameters. I restarted it last night.


 * Results: 4/24 So far I have run 9 trains: 1 baseline (0309/058), 4 manipulating .1 (0309/061, 066, 067, 069) and 4 manipulating rand(s) (0309/070, 073, 075, 076). The 4 I have run that manipulate rand(s) have given me similar results with small variations. I am going to try bigger numbers to see if I can get more of a variation.

4/25 Well, after many attempts, they all had an mdef failure when decoding. The trains all ran but the decode failed instantly with that error. My struggle can be read here: https://foss.unh.edu/projects/index.php/Speech:Exps_0309_078

4/26 The results are not as I had hoped. In experiment 0309/082, I set the number very high to see if it would do anything. The results were pretty much the same, differing by about .1 at most.

4/27 The results I got were great! I had a WER of 25%. This was when I changed the randomization from rand(s) to .5. The results can be seen in 0309/085.

4/30 Both 300hr trains are still running.


 * Plan: 4/24 My work for this semester is going to be ending on 4/27 so I can start my portion of the final report. I am waiting for Jonas to give me his notes so I can hopefully have a better idea of the randomization factor before capstone ends. In three days I hope to have a better idea of the randomization factor.

4/25 I have one more thing I am going to try to implement. All I want is the random number to print. I added a print statement and it failed. I added another one and hopefully it works this time. I just want the random number to print!!!!

4/26 I am now going to set the number extremely low and see if that does anything.

4/27 I let Camden know about my results. We will be running 300hr trains and decodes this weekend.

4/30 We are coming to a close and are starting final reports. Camden has got a good chunk of it done, and I sent him my portion of the Guardians final report two days ago.


 * Concerns: 4/24 Having a better idea of what it does so I can explain it to the students next year.

4/25 Printing the random number. PLEASE.

4/26 None really cause this semester is ending.

4/27 TIME.

4/30 TIME.

Week Ending May 7, 2013

 * Task:

5/1 Had class today. Wesley and Lamia did the code review. I am currently waiting for my 300hr to be done training/decoding for Guardians. Had software group meeting, we created our part of the final proposal.

5/2 Met with Dan and Steve. Talked LDA for a few hours, ran some trains, under 0311. 2 of my decodes finished!

5/3 My third decode finished, and the results are great. They can be found under 0309/090. I did expect them to be that great. That is not good news though because my unseen decode results for that experiment (0309/094) were so bad.

5/4 Our team has finalized which experiment to go ahead and use for the final 300hr. My decode finished, got very weird error. Will update when I find out more. It is 0309/089. It had finished on 5/2. I did everything correctly, and I don't want to call it an error, because everything was done the right way, but the scoring.log was:
 * sclite: 2.3 TK Version 1.3
 * Begin alignment of Ref File: '089_train.trans' and Hyp File: 'hyp.trans'
 * sclite: 2.3 TK Version 1.3
 * Begin alignment of Ref File: '089_train.trans' and Hyp File: 'hyp.trans'
 * sclite: 2.3 TK Version 1.3
 * Begin alignment of Ref File: '089_train.trans' and Hyp File: 'hyp.trans'
 * sclite: 2.3 TK Version 1.3
 * Begin alignment of Ref File: '089_train.trans' and Hyp File: 'hyp.trans'

I reran it and the results came in today and that was also the only thing in the scoring.log.


 * Results: 5/1 Waiting for my 300hr decodes to be done.

5/2 The results of my decodes are found under 0309/093 and 0309/094. 0309/093 results are great, 0309/094 results are not good.

5/3 The results of my seen decode (0309/090) were great results but my unseen (0309/094) were really bad. :(

5/4 The results of my weird experiment are found at 0309/089.


 * Plan: 5/1 Wait for my 300hr decodes.

5/2 Waiting for 2/4 more decodes to finish.

5/3 One more decode to finish. Hoping it finishes tonight or tomorrow!

5/4 Running that last 300hr and finalizing everything.


 * Concerns: 5/1 My 300hr decodes need to hurry!

5/2 Well, my unseen LDA decode was really really bad. I guess it is not the best way for us to go...

5/3 Why was LDA unseen so bad?!

5/4 Why did I get that error for 0309/089?