Speech:Spring 2018 Joshua Young Log



Week Ending February 4th, 2018
 * Task:

1/30 - Talked with Prof. Jonas more about what we as a group are trying to do. Create a communication channel (Discord?) to get that going. Get familiar with the layout of sphinx in terms of file structure and file types. Need to look into the project and get more info on the sphinx decoder to hopefully gain a grasp of what is going on in the decoder.

1/31 - Today I wanted to go through the file structure to see if there was anything that stood out. I am mainly looking in the sphinx folder, however there is a sphinx-base folder as well with seemingly the exact same structure. Going to see if I can spot any major differences between the two locations. Send the group a tree of both locations to get a better visual of what we are looking at.

2/2 - Looked at log files, and checked discord

2/4 - Looked at updated logs

 * Results:

1/30 - After talking with Prof. Jonas I feel a bit better about the project. He gave a bit more direction for our part. Set up a discord channel, and the group decided that a voice chat on Sundays would be a good way to regroup. Discord allows us to have multiple text and voice channels, so it is an all-in-one communication solution and makes it easier to drop new messages anytime. Started looking into the file structure of sphinx. I think it may take us a few days, if not longer, to comb through for the more important files that should be looked at before others. Did not end up looking into the sphinx decoder documentation today.

1/31 - Started today out by getting a directory tree of sphinx3 and sphinx-base. After comparing the two, the file structure is identical, so unless there are changes in individual files, I think it would be better to focus on the sphinx3 folder.

2/2 - Saw that Wesley found Main files via discord in /mnt/main/root/sphinx3/src/programs

 * Plan:

1/30 - Talk to Prof. Jonas about details of project. Log into Caesar and start looking around to see what I can find. See if anything strikes me as important so I can take a look at it this week.

1/31 - Do a deeper dive of files in the sphinx3 directory. Send the group a tree file of the folder for a bit of a visual aid.

2/2 - Look at logs and discord

 * Concerns:

1/30 - Still a lot of unknowns. We are the first group to do this and there are no notes from previous years for the group. The assignment does seem a bit open ended, which could lead to some confusion and may slow us down a bit.

1/31 - Not sure why there is a sphinx-base and a sphinx3 folder in the same location with seemingly the same files in them. If they are the same, one should possibly be removed. I get keeping a backup, but if that is the case then it should be in a different location.

Week Ending February 11, 2018
 * Task:

2/6 - Today I want to at least start an experiment, and as a group we are going to start figuring out how we are going to go through the files. On the experiment side I want to get a train to complete.

2/7 - My goal for today is to look for the two files I feel we should look at first.

2/10 - I want to try and finish running an experiment. On Tuesday I only completed the train. So today I want to create the Language Model and run the decode.

2/11 - Today we are going to have a group meeting via discord. The two main points we are going to iron out are:
 * Finishing the group proposal
 * Choosing the two files the group will start with

 * Results:

2/6 - I successfully ran a train. It did take a while once I got it going, however it did finish. As far as I can tell it generated the files needed. I assume this will show when I try and finish the experiment. The documentation on running the train was a bit confusing to follow. I included the commands and steps I used below. (Steps 1 and 2 were already completed at this point.)

Step 3 - Create the Directory Structure. Make sure you are in your sub-experiment directory. Ex: "/mnt/main/Exp/0303/007"
 makeTrain.pl switchboard 30hr/train

The doc says to skip step 4 when creating your first experiment.

Step 5 - Generate Feats Data. Make sure you are in your sub-experiment directory. Ex: "/mnt/main/Exp/0303/007"
 genFeats.pl -t

Step 6 - Run the Train. Use the top command to see if anyone else is using the server you are logged into, and if so, see if there is a different server with no one using it.
 nohup scripts_pl/RunAll.pl &
After running this you just have to wait for the script to finish, and if all goes well the train should be done.

2/7 - Did not have time today to look too deeply into the files. Took a look at logs and checked in with discord.

2/10 - SUCCESS! Got a result after running a decode. This took me a while due to the documentation being a bit light. One suggestion is to add a file tree so you can see what everything should look like. Don't want to overstep my bounds by changing it, but it may be good to have. What took me a while was needing the LM folder in the right place. After looking at vatali's logs from last year, I figured out my issue, which was having my experiment folder structure incorrect. I had 0303/007/001/LM, and I needed 0303/007/LM, with the bulk of the files in the 007 folder. Once I did that it seemed to work. The only errors I saw in my decode.log were "NOT A WORD" errors, which sounds like there were spelling mistakes in the transcript.

2/11 -

 * Plan:

2/6 - I am going to follow the documentation written on the wiki for running a train.

Found at: Run a Train

For the group, we are going to look at the best way to get a handle on the sphinx decoder. There are a lot of files, and for me at this point it is hard to see how they all (or just some) tie together. Our best idea at the moment is for all of us to look through the files, each choose which files we feel are the most important, and explain why. We have decided to meet via discord on Sundays, so we are going to talk about the files we found this Sunday. During this we will hopefully find a starting point.

2/7 - Don't have a concrete plan for this task. I am just going to start looking at the files and see how / if they link together. Hopefully get a grasp on what is going on in the files.

2/10 - I am going to go through steps 2 and 3 on the wiki for running an experiment, and if all goes well complete my first experiment.

Steps are found at this link.

2/11 -

 * Concerns:

2/6 - The documentation is not the best, and I am concerned that the documentation for the last two steps will be of similar quality.

2/7 - No Concerns

2/10 - The quality of the wiki, at least for running an experiment, seems a bit low, but Prof. Jonas did warn us about that.

2/11 -

Week Ending February 18, 2018
 * Task:

2/13 - Take a look into main_decode.c, which is where the MAIN function is. Figure out how to split it up into sections for everyone. Set up a way to document the notes on it. Helped some people that were having trouble with their first experiment so that everyone can get done with them.

2/16 - Looked at logs and discord

2/17 - Look at my section of main_decode.c to get a better understanding of it, and look at the group notes on it for the other sections.

2/18 - Have meeting on discord. Talk about main_decode.c, and look at our part of the proposal to add anything that we think it needs to look better.

 * Results:

2/13 - Danielle added the code to our group wiki page so we can start adding notes to it. I believe that everyone that needed help is all set now. It seemed to be mostly typing errors, so I don't think anyone has any major problems with experiments; they just needed another set of eyes.

2/16 - Looked at logs

2/17 - Danielle and Wesley worked on part of the file, so I started to look into the section between the code they looked at. It was mostly macro calls, with some args being set toward the end. They seemed to be mostly set to NULL, possibly to be used later / in other files.

2/18 - We added edits per Hanna, however it seemed very repetitive. I guess it looks like the other groups? Changed some of the details as well. Finished up notes on the file. Lamia has an idea to write a quick summary about the file at the bottom, which we all agreed with.

 * Plan:

2/13 - Sit with a group before class and work out any issues others are having with their experiments. Talk with group to see how we all want to tackle this first file.

2/16 - Have the file open in atom (for ease of reading). Use the documentation page Danielle found for the macros, then just look through the code for the rest.

2/17 - look at logs and discord

2/18 - Connect to discord with everyone. First start with the proposal to iron out any other things we want to change, then talk about main_decode.c to try and get everyone on the same page with it.

 * Concerns:

2/13 - We had been looking into the wrong items, but I think we are on the right track now.

2/16 - none

2/17 - none

2/18 - Having some push back regarding the proposal. We changed it to be in line, but I don't feel it is the right way to do it.

Week Ending February 25, 2018
 * Task:

2/20 - Fix the proposal. After not great comments from Prof. Jonas the class decided to rework the proposal. After guidance from Prof. Jonas we started to re-format the proposal in line with the structure of the 2015 proposal. Due to some issues, we have a deadline extension, so our group will work our part out tonight to be set to go.

2/23 - Looked at logs and discord

2/24 - Looked at logs and discord

2/25 - Start looking into corpus.c to get a handle on what is going on. I don't plan on finishing it today. Going to try a new format for note taking, as I felt the way we did main_decode was a bit messy.

 * Results:

2/20 - As a group we reformatted our proposal section, then waited for more edits to come in to make it flow with the whole document. It seemed like multiple people held off editing until very close to the deadline, so an extension was asked for and given. At this point our group got on discord to discuss our part and to try and get the wording to match other groups that we were also talking with on discord. We changed our timeline to be task based instead of date based. We also added start and estimated completion dates along with who owns each particular task. We also changed our objectives section by adding sections for each overall objective. This cleaned it up and made it in line with the rest of the document. We also reworded the overview to match the document.

2/23 - Looked at logs and discord

2/24 - Looked at logs and discord

2/25 - Got part way through the file before stopping. I found that corpus.h's documentation online has some notes on what each function does. One function wasn't there, but it was easy enough to figure out what it did. The new structure is laid out below:

 corpus_t* corpus_load_tailid(const char *file,
                              int32 (*validate)(char *str),
                              int32 (*dup_resolve)(char *s1, char *s2))

 Parameters:
   file - Input file name; the file must be seekable and rewindable

 Notes: Similar to corpus_load_headid, but the ID is at the END of each line, in parentheses.
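
To make the tail-ID layout concrete, here is a minimal C sketch that splits one line of the form "text (id)" into its two parts. The helper name split_tail_id and its buffer-based interface are my own, for illustration only; this is not how corpus.c is written, just the line layout that the note describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Sphinx): split a line of the form
 * "some text (id)" into its text and its trailing ID.
 * Returns 0 on success, -1 if there is no trailing "(id)" or a buffer
 * is too small. */
int split_tail_id(const char *line, char *text, size_t text_sz,
                  char *id, size_t id_sz)
{
    size_t len = strlen(line);

    /* Ignore trailing whitespace, including the newline. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r' ||
                       line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;

    /* The ID must be the last thing on the line, closed by ')'. */
    if (len == 0 || line[len - 1] != ')')
        return -1;

    /* Scan backward for the '(' that opens the ID. */
    size_t open = len - 1;
    while (open > 0 && line[open] != '(')
        open--;
    if (line[open] != '(')
        return -1;

    /* Copy the characters between '(' and ')'. */
    size_t id_len = (len - 1) - (open + 1);
    if (id_len == 0 || id_len >= id_sz)
        return -1;
    memcpy(id, line + open + 1, id_len);
    id[id_len] = '\0';

    /* Everything before '(' is the text; trim spaces before the ID. */
    size_t text_len = open;
    while (text_len > 0 &&
           (line[text_len - 1] == ' ' || line[text_len - 1] == '\t'))
        text_len--;
    if (text_len >= text_sz)
        return -1;
    memcpy(text, line, text_len);
    text[text_len] = '\0';

    return 0;
}
```

Given the line "hello world (utt_001)", the helper fills text with "hello world" and id with "utt_001". A head-ID file would need the mirror image of this, with the ID read from the front of the line instead.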

 * Plan:

2/20 - Work on the proposal as a class while we are all in the building. Due to completion concerns we also got on discord to work out the document and get our section to match the rest of it.

2/23 - Looked at logs and discord

2/24 - Looked at logs and discord

2/25 - Start by copying all the function names and the parameters that go along with them to a text file, and separate them to start writing notes. I want to have a section for parameters, and one for general notes on each function.

 * Concerns:

2/20 - It seemed people waited to work on their sections, then all worked on them at the same time. I feel that there needs to be better communication between groups to avoid issues like this.

2/23 - No concerns

2/24 - No concerns

2/25 - No concerns

Week Ending March 4, 2018
 * Task:

2/27 - Finish notating corpus.c. Get a status for majestix from the systems group so that we can start working with it. Sign up for the URC.

3/1 - Today we are having a meeting with Professor Jonas via google hangouts. We have planned code reviews for the past couple weeks, but the class meetings have been running long so we haven't been able to meet in person. Not 100% sure what I am looking for with this meeting, however I hope that it will give us a bit more direction.

3/2 - Looked at logs

3/5 - Posting notes for corpus.c until it can be added to the group wiki

 * Results:

2/27 - I was able to complete my notes for corpus.c. I will add them to the wiki in the next couple days. The group is registered for the URC. Systems said majestix is set to go for us. Wesley and Danielle will be looking into RCS, while Lamia and I look into GCC and the possible updates that need to be done. We want to try and get both those tasks done in the next couple weeks.

3/1 - I would say that the meeting went well. (And green screens are fun). See below for my notes from the meeting.
 * How are Sphinx3 and Sphinx base related?
 * Seems there are two copies of sphinx base... Do we need both or can we remove one?
 * Look at the Test Folder and see what it was used / is used for.
 * Delete Directory is at mnt/main/root/DELETE
 * Seems that there is a lot of redundancy all over the place
 * Find tiny_train doc
 * Take a look at the Make file/files to see if we can get a handle on the files we should be looking at.
 * Did a C pointer crash course.

 // Java
 Money cash;
 cash = new Money();
 cash.amount;

 // C (no "new"; declaring the struct variable is enough)
 Money cash;
 cash.amount;
 doStuff(&cash);
 ...
 void doStuff(Money *c) {
     (*c).amount;
     // Line above and below do the same thing
     c->amount;
 }
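
A small sketch of why the & matters when calling doStuff, assuming a Money struct with a single int amount field as in the notes. The two function names below are made up for illustration. Passing the struct itself hands the function a copy, while passing its address lets the function change the caller's struct.

```c
/* Assumed layout: the notes only show that Money has an amount field. */
typedef struct {
    int amount;
} Money;

/* Receives a copy of the struct: the change is lost on return. */
void add_by_value(Money m, int n)
{
    m.amount += n;
}

/* Receives the address of the struct: the caller sees the change. */
void add_by_pointer(Money *m, int n)
{
    m->amount += n; /* same as (*m).amount += n */
}
```

Starting from a Money with amount 10, calling add_by_value(cash, 5) leaves cash.amount at 10, while add_by_pointer(&cash, 5) raises it to 15.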

3/2 - Looked at logs

 * Plan:

2/27 - Continue noting like shown above. It looks cleaner and makes it easier to follow in my opinion. Check in with the systems groups. Lamia is signing the group up for the URC.

3/1 - I called Professor Jonas and he did not answer, however shortly after I received an email with a join link. Perhaps to speed up everyone joining.

3/2 - Looked at logs

3/5 - Notes

 #if _CORPUS_TEST_
 main(int32 argc, char *argv[])
 {
     corpus_t *ch, *ct;
     char id[4096], *str;

     if (argc != 3)
         E_FATAL("Usage: %s headid-corpusfile tailid-corpusfile\n", argv[0]);

     ch = corpus_load_headid(argv[1], NULL, NULL);
     ct = corpus_load_tailid(argv[2], NULL, NULL);

     for (;;) {
         printf("> ");
         scanf("%s", id);

         str = corpus_lookup(ch, id);
         if (str == NULL)
             printf("%s Not found in 1\n", id);
         else
             printf("%s(1): %s\n", id, str);

         str = corpus_lookup(ct, id);
         if (str == NULL)
             printf("%s Not found in 2\n", id);
         else
             printf("%s(2): %s\n", id, str);
     }
 }
 #endif

Notes: Entry point for the corpus.c file

This is the main function for corpus.c. I think it's interesting that while it does use some of the functions in the file, not all are used internally.

 * Concerns:

2/27 - None right now.

3/1 - I think that what we have been doing is missing the mark of some of the tasks Professor Jonas would like us to do. I think this meeting, and being able to "sit down" with Professor Jonas as a group helped clear some of the details of those tasks up.

3/5 - None at the moment

Week Ending March 11, 2018
 * Task:

3/6 - Today we finally had our first code review with Professor Jonas. He showed us what he was looking for in our searching. He cared less about commenting each and every detail, and more about how files connected and what parts are actually important, which makes sense.

3/8 - Read Logs and looked at discord

3/10 - Read Logs and looked at discord

3/12 - Today we were assigned into teams for the remainder of the semester. We need communications set up, and Professor Jonas asked each group what drones they wanted and why.

 * Results:

3/6 - I felt that the code review with Professor Jonas was very informative. Instead of the guess and check method we were using to see if we were doing it how Professor Jonas wanted, he showed us how he wanted it done. It seems straightforward: skim through the more important files to find what we feel are the important parts, find those parts in the cluster that is the sphinx decoder, then document the connection and the use, and figure out if it is actually useful to know what each part does. Professor Jonas wants us to start with sphinx3_decode, beginning from the unlimit line (line 155), then work our way down the file to find important pieces.

3/8 - Read Logs and looked at discord

3/10 - Read Logs and looked at discord

3/12 - So I am a part of the Avengers.

The group includes:
 * Jaden
 * Lamia
 * Tri
 * Steven
 * Daniel R
 * Faruk
 * Isaac
 * Yashna
 * Me

We have set up a discord channel to talk privately, away from the other team, and are in the process of figuring out what drones we want to use. The Software group also had a quick meeting on our group task to get everything sorted out.

 * Plan:

3/6 - Pay close attention to the code review Professor Jonas did, because we will be expected to do the same for the rest of the code reviews.

3/8 - Read Logs and looked at discord

3/10 - Read Logs and looked at discord

3/12 - Talk with the group about our plans, and what drones we want to use. Meet with the Software group to organize what everyone will be doing / starting with
 * Concerns:

3/6 - I wish we could have had that code review session weeks ago. We now know what Professor Jonas wants from us, and I feel that the previous time spent looking at code was kind of a waste.

3/8 - None

3/10 - None

3/12 - Avengers is best team

Week Ending March 25, 2018
 * Task:

3/20 - Have second code review with Professor Jonas, where hopefully we can make some headway into this project. We will also be meeting as a team for the first time. We would like to iron out some sort of strategy to improve the score we are getting.

3/23 - Read Logs, Looked at Discord

3/24 - Read Logs, Looked at Discord

3/25 - Look through Sphinx 3 code using 'kb_init' as the starting point. See how far down in the code I can make it today.

 * Results:

3/20 - Professor Jonas would like us to lead the code reviews a bit more, which I can agree with. This meeting was mostly Professor Jonas going through the code while we watched. For this week I will look at the code starting with kb_init and going from there.

Met with team for the first time in person. We have a few ideas floating around. We want to make sure everyone can successfully train on unseen data, so Steve wrote up some notes for the team and we are all going to attempt to finish that by next week, so that we can do productive things with our time.

3/25 - Started looking through the code. Found the following:


 * kb_init is defined in the file kb.c
 * It returns a type of kb_t
 * kb_t is defined in ../include/kb.h, and is an unnamed struct
 * It stores "Core models, defined as acoustic and language models, dictionary, pronunciation models, front-ends, filler-penalties, approximate acoustic models such as sub vector quantification map and Gaussian selector"
 * kbcore_t (used in kb_t) shows up in kbcore.c
 * cmd_ln_t shows in both kbcore_t and kb_t

 * Plan:

3/20 - After the update meeting, we are going to go with Professor Jonas for our code review. Then meet with the Avengers team to see what we want to move forward with.

3/25 - Start looking at code at kb_init. Find where that lives, then just start looking through and seeing what may be going on.

 * Concerns:

3/20 - No concerns today

3/23 - None

3/24 - None

3/25 - Hope that what I did will be useful during the code review.

Week Ending April 1, 2018
 * Task:

3/27 - Have code review with Professor Jonas, Faruk, and Danielle. Meet with team again to talk about how we will be winning this year.

3/30 - Looked at logs, and discord

3/31 - Looked at logs, and discord

4/1 - Take a look at code using utt_decode as the starting point. See what can be found with that.

 * Results:

3/27 - Professor Jonas explained a lot of the code and again would like us to lead the reviews for the following week. We ran a bit over in time, which made us late for our group meeting, however when we got there they seemed to have some good ideas on getting an edge on the score. Professor Jonas had the idea of meeting as a group beforehand so everyone is on the same page.

3/30 -

3/31 -

4/1 - utt_decode shows up in ../libs3decoder/libAPI/utt.c. It has no return value and seems to use a file called mfc to generate vectors for sound.

It looks like it uses a function called feat_s2mfc2feat. This shows up in ../sphinxbase-0.6.1/src/libsphinxbase/feat/feat.c. It returns an int32. Need to continue looking down this path. Found functions that commonly start with srch_utt_..., and they all show up in ../libs3decoder/libsearch/srch.c. Possibly what to look at next week.

 * Plan:

3/27 - Meet with Professor Jonas for the code review, then meet with the team to discuss plans moving forward.

3/30 - look at logs, and discord

3/31 - look at logs, and discord

4/1 - look at code starting with utt_decode

 * Concerns:

3/27 - A bit annoying that the work I did for last week was very quickly counted as irrelevant.

3/30 - none

3/31 - none

4/1 - Hope this is more important than last week

Week Ending April 8, 2018
 * Task:

4/3 - Have another code review with Professor Jonas. Have a team meeting for everyone to give updates and to finalize our plans for winning.

4/6 - Looked at logs

4/7 - Looked at code, starting with utt_decode:
 * utt_decode is found under ../libs3decoder/libAPI/utt.c
   * Has no return value
   * Takes an mfc file to generate vectors
 * total_frame = feat_s2mfc2feat
 * feat_s2mfc2feat
   * sphinxbase-0.6.1/src/libsphinxbase/feat/feat.c
   * Returns int32
 * ../libs3decoder/libsearch/srch.c
   * srch_utt_end
     * mainly writing log stuff

4/8 - Looked at log

 * Results:

4/3 - I feel that we did look at some good information in regards to the code base, however we do seem to get on tangents.

4/7 - Have decent notes this time, hopefully we can stick to them

 * Plan:

4/3 - After group updates, meet with Professor Jonas for code review, then meet with team to talk about plan.

4/7 - Looked at code. Didn't find anything super cool
 * Concerns:

4/3 - I feel that it is difficult to figure out what he wants to look at. The past two weeks I have gone through and taken notes on aspects of the code. Then during the review I start at the beginning of the notes. Professor Jonas will then find something he wants to take a look at that I did not take notes on, then seem disappointed that I don't know about that section. At the end he then says that he wants us to lead the code review more, however if we don't stick to what I or the group looked at there is no way to do that effectively.

4/6 - No issues

4/7 - None

4/8 - no issues

Week Ending April 15, 2018
Forgot to log stuff and honestly forgot what I did. I'm just happy to be here.
 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 22, 2018
 * Task:

4/17 - Code review with Professor Jonas and meet with group to better subdivide tasks.

4/20 - Run 30hr Exp to compare baseline results with other nonsense that I can't talk about.

4/21 - Rerun 30hr Exp because I'm an idiot

4/22 - Look into issue that Steve and Dan are having regarding using LDA for running Exps.

 * Results:

4/17 - Got an idea of what I need to work on this week. Talked with Steve about looking into the error they are running into regarding LDA.

4/20 - Exp failed. It can't find a file at a location, even though the file exists at that location. Need to look into it tomorrow.

4/21 - I am an idiot and forgot to change the senone count in the cfg file. After rerunning, Exp completed successfully.

4/22 - Decode.log ==> FATAL_ERROR: bio.c, line 89: missing *END_COMMENT* marker
 * ERROR shows in bio.c (go figure)
 * ERROR is in bcomment_read
 * bio.c resides in sphinxbase/libbase/util/
 * Last INFO line before error in decode.log is: Reading Feature Space Transform From: /mnt/main/Exp/0309/052/python/sphinx/lda.py
 * Above INFO line hides in kbcore.c // s3_am_init(...)
 * No INFO or ERROR lines below this show in the log, so I need to dig into this function to find the root.
 * Back trace:
   * bio.c [bcomment_read] ==> bio.c [bio_readhdr] ==> lda.c [feat_read_lda] ==> kbcore.c [s3_am_init]
 * Error is due to the string: "*end_comment*\n". It needs to be in the file that is passed in. When it's not, the error is encountered.
 * The file that it needs to be in is the file passed in with the "-lda" flag. Currently lda.py is being passed in and I feel that is not what needs to go there.
 * Based on the internet it is looking for a ".lda" file. There is one in the Exp, however it does not have that string in it so it still fails.
 * Not much documentation I can find on how to get this going. I guess we can keep throwing shit at the wall to see what works.
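
One way to test a candidate -lda file before running a full decode is to scan it for that marker line. The sketch below assumes, going only by the notes above, that the reader is satisfied once it sees a line equal to "*end_comment*\n". has_end_comment is a hypothetical helper, not a sphinxbase function, and the real checks in bio.c may be stricter than this.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from sphinxbase): returns 1 if some line of
 * the stream is exactly "*end_comment*\n", 0 if the stream ends first. */
int has_end_comment(FILE *fp)
{
    char line[4096];

    /* Read line by line until the marker or end of file. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strcmp(line, "*end_comment*\n") == 0)
            return 1;
    }
    return 0;
}
```

Opening the file given to the -lda flag with fopen and passing it to has_end_comment would show quickly whether a file such as lda.py or the .lda file in the Exp could ever get past this particular error, without waiting on a decode.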

 * Plan:

4/17 - Have code review and talk with team

4/20 - Run baseline Exp to compare to stuff

4/21 - Rerun aforementioned baseline because I am an idiot

4/22 - Look into Steve's issue, starting with bio.c and kbcore.c

 * Concerns:

4/17 - None

4/20 - None

4/21 - I need to pay attention

4/22 - None

Week Ending April 29, 2018
 * Task:

4/24 - Have code review with the entire software group today. Meet with team to talk about any updates and information we may have to get a good score.

4/27 - Look into the file that contains the code that adds the missing line from last week, to see if I can see what file it is changing.

4/28 - Read Logs

4/30 - Join team meeting to go over team final report

 * Results:

4/27 - Found that the code is pointing to the file given with the -lda flag, so back to the drawing board

 * Plan:

4/24 - Have code review and meet with team

4/27 - Start with the main_align.c file, which is where the code that adds the needed line is, and back trace it to see what file it is changing.

4/28 - Read logs and look at discord

4/29 - Connect on discord and talk about final report

 * Concerns:

4/24 - none

4/27 - none

4/28 - none

4/29 - none

Week Ending May 6, 2018

 * Task:


 * Results:


 * Plan:


 * Concerns: