Speech:Spring 2018 Arias Talari Log



Task
2/1 | Read up on CMU Sphinx and familiarize myself with how it works, check out how LDC's Switchboard corpus was created, review prior semester logs, and finalize the group meeting time.

2/2 | Set up Pulse Secure on primary workstation, continue working within a Unix environment, create sub-experiment directory within, and look into dictionaries and how they pertain to the Data Group's responsibilities.

2/3 | Document the Caesar file structure and continue reading logs.

2/5 | Run a train and download FileZilla.

Results
2/1 | CMU Sphinx consists of speech recognition development libraries and tools that can be used for commercial applications, such as an automated telephone recognizer, or for research purposes. The most recent version is Sphinx4, written in Java, along with PocketSphinx, a version for embedded systems written in C. I tried calling the CMU Communicator, but neither number worked, so I checked out the .wav file instead.

Learned that the LDC's (Linguistic Data Consortium) Switchboard corpus was created by recording roughly 2,400 two-sided phone conversations (approximately 260 hours of speech) involving a total of 543 people from all over the US. An automated operator facilitated the process: it responded to the caller, dialed other participants, and brought up roughly 70 topics to discuss while recording the resulting speech into two separate channels until the conversation ended. Constraints ensured that no two people conversed together more than once and that no individual spoke more than once on the same topic.

Still reading through prior semesters' logs (primarily the Data Group's), and it looks like Wednesday afternoon will be our weekly group meeting time.

2/2 | No issues setting up the VPN on my primary workstation. Continuing coursework on Udemy and have created my sub-experiment directory within. The more time I spend in a Unix environment the better, as a lot is coming back to me from a prior class at another school. I see the four dictionary files in  although I'm not sure what I can do with them currently.

2/3 | Documenting the file structure helped visualize the system. Still reading older logs which will become more valuable once I begin running trains and decodes.

2/5 | Met up with Dan Rubin and Rose after class today and ironed out how to run a train. I followed the steps from https://foss.unh.edu/projects/index.php/Speech:Run_Train_Setup_Script to run a train on Obelix and created a new directory for the sub-experiment. I ran a 10hr train first, thinking it would take that long (I wanted it finished before class), then recreated the sub-experiment in  and ran a 30hr train. A  error occurred after entering  , and it was suggested that I run it on another server. The same process was followed on Idefix, and while I got further, another error occurred:

The process was conducted again in a new sub-directory , which was successful as far as I can tell. FileZilla was used to view  within. All phases passed, although  had a   result.

Plan
2/1 | Read more and explore Caesar as well as begin work on a Udemy course to improve my Unix knowledge. Look into how an acoustic model works along with anything else that comes up.

2/2 | Continue reading older Data Group logs and possibly attempt to reach out to a prior student to gain more insight. Spend more time looking into how we can improve the dictionary.

2/3 | Run a train and decode to familiarize myself with the process.

2/5 | Run the decode and prepare for another week of questions.

Concerns
2/1 | There is a good amount to absorb but once gaps are filled and more reading is completed it should fall into place.

2/2 | Hoping to iron out exactly what is expected of the Data Group as my in-class notes from 1/30 left much to be desired. Also hoping to avoid any scheduling conflicts.

2/3 | Nothing outstanding at the moment.

2/5 | None knocking upstairs right now but tomorrow should remedy that.

Task
2/6 | Help Tri and Isaac get up to speed on running a train and troubleshoot any issues that arise. Navigate the Unix environment to find some .wav file locations and open a sample or two. Work on the concerns specific to the Data group brought up by Jonas during class.

2/8 | Meet up with the group to iron out proposal details, look into why .sph files (NIST SPHERE) do not open with VLC, continue with the steps on the Wiki to create the language model and run the decode, and deal with any other issues that might arise. More SPHERE information here.

2/9 | Pinpoint a focus for our proposal rough draft by continuing to read into prior semester data group proposals. Decide on whether to continue existing work or develop our own.

2/11 | Complete the Data group entry for the rough draft proposal and continue reading logs. This will be a light day as I have other work.

Results
2/6 | Most servers were down, so we could not run trains before class as planned; this was remedied after class. Tri and Isaac are now caught up, so we can complete the next two steps either tomorrow or during our meeting on Thursday.

We navigated to  and used VLC Media Player to open a random .wav file to make sure it played properly (which it did). This was a basic process, but it was helpful for everyone to see how to interact with files within the server and how to access them on the local machine.

2/8 | Our group met up, discussed some details of the rough draft proposal, and looked at past proposals to ignite the brain. The .sph files ended up playing fine, as Tri got them working on his machine in VLC without any additional audio codecs.

A couple of issues (and tips formed from them) arose when completing the Create the Language Model and Run the Decode (trained data) processes. The first issue was under the Please Note section on the Create the Language Model page, which states "The Base Experiment directory is specific to each experiment, and refers to /mnt/main/Exp/". This cannot be done as-is, since there would need to be multiple  directories for other students. Instead, I created the directory as , where   is my sub-experiment number.

The Run the Decode (trained data) page (note I ran the decode on trained data, not unseen data) has the same copypasta: "The Base Experiment directory is specific to each experiment, and refers to /mnt/main/Exp/", so take note when working on this. Under "Setup the Decode Directory and Run the Decode", first, you can find the  directory within your train directory. When you get to the next step ("We call the subset _decode.fileids"), I chose the third option and edited it so that it pointed to the  directory within my train folder (I needed to add an additional path to the one listed in the instructions) and edited the hours to match my train hours (30hr). My end result was. Tri, Rose, and I all had the default senone count of 1000. The next step has you input a command that again should be edited to include the additional sub-experiment folder. What the instructions listed:, and what I used:. Executing that line will create a  within your   directory, which I recommend opening in a text editor such as Sublime or Notepad++.

The process continued with the scoring portion, where the  error did arise as the instructions suggested. I proceeded and entered the final line, which used the SCLite software installed on the server:. This resulted in a  error, which most likely stems from a bad pointer, though it could also be a memory or permission issue. Another result was an empty  file, which I believe was intended to house hypothesized transcript data; the   created by the final line was also empty. Fixing the pointer issue or reinstalling SCLite might resolve this.

2/9 | Continuing and building upon existing work will likely be the route we take, and if something new arises we can handle that as well (or change our focus completely). Spring 2016's Data group discussed an alternate method of scoring with SCLite that would let our decodes provide more accurate results through a more efficient method of sorting through transcripts and audio files to remove bad and/or duplicate data. We could build on that and/or research a way to manage the annotations within the transcripts by altering Perl scripts that use regular expressions to automate word deletion.

2/11 | I completed the rough draft proposal entry for our group on Jaden's template, although the Plan outline is likely very tentative until I can receive feedback from the rest of the group on responsibilities. Continuing to learn more from previous semesters.

Plan
2/6 | Read more into acoustic models, language modeling, neural networks (specifically Recurrent Neural Networks technique to improve language modeling), and complete Create the Language Model and Run the Decode steps. Complete the rough draft proposal before the weekend and look into previous groups' proposals to facilitate the brainstorming process.

2/8 | Work with my group to complete the proposal draft before the 2/11 6PM deadline, read more into SCLite software, alignment, scoring, and other concepts important to modeling results. Wrap my head around concepts important to improving the language model data set and whatever else emerges.

2/9 | Flesh out our focus for the proposal preferably by the end of tomorrow and work on executing the goals outlined within.

2/11 | Use another method suggested in Discord to run the decode with success. Once complete, ensure the rest of the team has successfully run a train, created a language model, and run the decode without errors.

Concerns
2/6 | Nothing at the moment, although there will likely be new concerns if someone from our group needs to move to the Experiment group.

2/8 | Make sure the proposal is definitive, define responsibilities and goals, try to determine why the last line on Run the Decode didn't result in success, and look into adjusting things if our group shifts next week.

2/9 | I will continue to reach out to my team on a relatively frequent basis and encourage every team member to keep an open channel of communication with their peers.

2/11 | Decide how responsibilities will be divided over the semester and encourage more consistent communication within teams.

Task
2/13 | Complete (this time with correct instructions) running a train, creating a language model, and running the decode so I can focus on other things. Work with Dan and Jaden to get up to speed on my responsibilities for the Experiment group and how I can better facilitate the execution of team goals.

2/15 | Continue learning Perl and utilize whatever other resources to expedite the learning process to better understand how the Perl scripts function. Post successful experiment results on my experiment page within the Wiki. This will be a light day as I have other work.

2/16 | This week's log entries are likely a little repetitive but it is what it is. Will work more on the proposal and continue to look into the addExp.pl code to see what we can do to improve it.

2/19 | Continue working with Perl, discuss code with Dan, and look more into the CreateExp.pl script, although right now the focus will be addExp.pl.

Results
2/13 | Following the Wiki instructions worked, but I needed to remove everything within my  directory, start fresh, and execute everything within the   directory rather than running the language-model creation and the decode within individual directories created per train (ex: ,  , etc). The decode yielded positive results with a successful scoring.log file (trimmed for brevity):

SYSTEM SUMMARY PERCENTAGES by SPEAKER

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg |   37    717 | 70.3   21.6    8.1    6.0   35.7   89.2 |
|=================================================================|
|  Mean   |  1.2   23.9 | 73.7   20.6    5.7   13.6   39.9   86.7 |
|  S.D.   |  0.4   22.4 | 22.5   19.0    9.7   23.8   26.5   34.6 |
|  Median |  1.0   16.0 | 74.2   19.5    1.1    3.3   36.0  100.0 |
`-----------------------------------------------------------------'

Some initial goals: revise the addExp.pl script so that it runs without skipping a line (Line 81); implement new code so the script pulls the Active Directory username of the logged-in user and adds it to the Experiment page on the Wiki (ex: Active Directory Username: ajt2019); and research ways to auto-increment (Line 274) to the next available sub-experiment number rather than relying on manual entry.

2/15 | I set up my home workstation with Hyper-V (Windows 10 virtual machine/switch software) to run Ubuntu for education purposes while I work on my Perl course. I might switch to VMware Workstation Player due to the input lag within Hyper-V, which is likely because Microsoft provides poor video support for Linux systems. I also updated my experiment page with results from the successful train/LM/decode process completed earlier this week.

2/16 | Almost finalized our end of the proposal and will iron out whatever details remain tomorrow. Lines 274-297 in the addExp.pl script house the portion of code from the function "createWikiSubExp" (Line 234) that is responsible for the manual input of the next available sub-experiment number by the user. Our goal is to replace or revise this portion so the code pulls the list of existing sub-experiment numbers under the current experiment number and increments past them, removing the need for manual user input.
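As a rough sketch of the idea (my own illustration; in the real script the existing numbers would be pulled from the Wiki's XML list for the current experiment, and the variable names here are placeholders):

```perl
#!/usr/bin/perl
# Given the sub-experiment numbers that already exist, compute the next
# available one so the user never has to type it in.
use strict;
use warnings;
use List::Util qw(max);

my @existing = ('001', '002', '005');                 # placeholder data
my $next = @existing ? max(map { int($_) } @existing) + 1 : 1;
my $subexp = sprintf("%03d", $next);                  # zero-padded, 001-style
print "Next available sub-experiment: $subexp\n";     # here: 006
```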

2/19 | Perl course is a little slow going as the meat of the course is toward the end, which I will reach soon. Open code discussion with Dan is helpful although we need to speak to Jonas or the Systems group on implementing a test environment to run our modified scripts.

Plan
2/13 | Revise rough draft proposal based on Jonas's in-class suggestions (review Spring '14 and '15 to start) and Camden's ideas on Discord. Add my train/LM/decode instruction process to the Experiment group page, and utilize Udemy to learn Perl so that I can better understand the scripts and how to manipulate them.

2/15 | Put in at least 15 hours learning Perl between now and Sunday and work on the rough draft tomorrow once Jaden does her piece today.

2/16 | Finalize the rough draft proposal v2 tomorrow and continue working on my Perl skills. The Systems group added a sharing/storage directory within  so we can share scripts between group members for review and testing purposes.

2/19 | Meet up with Dan before class tomorrow to discuss the code within the addExp.pl script. Our team will also iron out responsibilities for the week, which is likely to be work on one or more scripts to better understand them so we can begin the improvements; wrap up questions on unclear lines of code.

Concerns
2/13 | Joining the Experiment group means shifting gears and learning new concepts but I expected this and am prepared for it.

2/15 | Nothing at the moment, although I wish I had more time to complete additional supplementary work.

2/16 | Time is the fire in which we burn.

2/19 | Nothing at the moment, although I hope this revision of our proposal is worthy of a grade this time around.

Task
2/20 | Work on the rough draft proposal v3, run a train so I can begin the decode process on unseen test data, and read more into Perl commands such as  and the diamond operator.

2/22 | Complete the decode process on unseen test data and continue my Perl course on Udemy. Today is rough so this will be a light entry.

2/25 | The online Perl course I'm taking is going a little too slow, so I think the best bet is to cut ahead to prioritize information and research official and unofficial Perl documentation and forums.

2/26 | I meant to post this earlier but life happens. Test the addExp script by removing the  variable and try a different input by replacing the diamond operator with standard input.

Results
2/20 | Currently ironing out as much on the Task Timeline section as I can tonight. I will be following the unseen test data decode process outlined by Dan Beitel in one of his sub-experiments. This process differs slightly from the trained test data variant and I am executing this for the purpose of understanding how they differ (in process and result).

There needs to be more reading done on this, but so far I gather that the  operator acts on a filehandle to read in the next line. The default filehandle is  (standard input, i.e. the keyboard), so the operator alone will return a line from. The diamond operator  returns a string up to and including the line terminator, which differs from system to system. The  function removes the trailing newline character from the end of a string. I am focusing on these two because I haven't yet reached the point in my Udemy course where these operators/functions are covered, and they are important for manipulating the code within the scripts, first and foremost addExp.pl.
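A minimal, self-contained example of the two together (the prompt text is my own):

```perl
#!/usr/bin/perl
# Read one line from standard input and strip its trailing newline.
use strict;
use warnings;

print "Enter your username: ";
my $user = <STDIN>;   # returns the line including the "\n" terminator
chomp($user);         # remove the trailing newline
print "Hello, $user\n";
```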

2/22 | Rough draft proposal v3 was completed by the updated deadline last night. I think it came out great so hopefully we get a solid grade for our contributions.

I followed the decode process laid out on the Wiki for Unseen Test Data but used Dan Beitel's instructions to test his notes. Everything ran properly and yielded a result (trimmed for brevity):

,-----------------------------------------------------------------.
|                            hyp.trans                            |
|-----------------------------------------------------------------|
| SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
|=================================================================|
| Sum/Avg |    9    155 | 76.1   16.8    7.1    5.2   29.0   88.9 |
|=================================================================|
|  Mean   |  1.3   22.1 | 79.7   14.2    6.1   11.8   32.2   85.7 |
|  S.D.   |  0.5   17.2 | 17.1   12.0    6.2   24.3   21.4   37.8 |
|  Median |  1.0   29.0 | 76.2   11.9    6.3    3.1   34.5  100.0 |
`-----------------------------------------------------------------'

Sub-experiment 033 (Train and Language Model)

Sub-experiment 043 (Unseen Test Data Decode) includes steps and full result.

2/25 | I'd love to have had more time today to do this, but now that I have a system set up I should be able to run through 80+ lines a day from here on. I have a running "translation" Word document for the addExp script which I will post as an update.

Note: I am learning and cannot currently claim 100% accuracy:

L40-42:  requires the listed library modules to be included. The double colon is the separator between the name of a package, module, or class and a member of that package, module, or class.


 * is a class implementing a web user agent;  objects can be used to dispatch web requests. In normal use, the application creates an   object and configures it with values for timeouts, proxies, name, etc. It then creates an instance of   for the request that needs to be performed. This request is then passed to one of the request methods of the , which dispatches it using the relevant protocol and returns a   object.


 * is a class encapsulating HTTP style requests, consisting of a request line, some headers, and a content body. Note that the LWP library uses HTTP style requests even for non-HTTP protocols. Instances of this class are usually passed to the request method of an  object.


 * is a class for objects that represent a "cookie jar" -- that is, a database of all the HTTP cookies that a given  object knows about. Cookies are a general mechanism which server side connections can use to both store and retrieve information on the client side of the connection.
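A minimal sketch of how these three classes are typically wired together (my own illustration with a placeholder URL and values, not code taken from addExp.pl):

```perl
#!/usr/bin/perl
# Typical LWP::UserAgent / HTTP::Request / HTTP::Cookies wiring.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

my $cookie_jar = HTTP::Cookies->new();   # in-memory cookie database
my $ua = LWP::UserAgent->new();
$ua->timeout(10);                        # give up on a request after 10 seconds
$ua->cookie_jar($cookie_jar);            # remember cookies across requests

my $req = HTTP::Request->new(GET => 'http://example.com/');
my $res = $ua->request($req);            # dispatches; returns an HTTP::Response
print $res->status_line, "\n";
```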

L46: Perl automatically provides an array called  that holds all the values from the command line. is the number of passed arguments to the script. If the script is run without either flag, it prints the quoted usage message and exits without executing the rest of the script.
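A small sketch of that kind of check (my own illustration, not the script's exact code; note that in Perl, $#ARGV holds the last index of @ARGV, which is -1 when no arguments are passed):

```perl
#!/usr/bin/perl
# Print a usage message and exit when no flag is supplied.
use strict;
use warnings;

if ($#ARGV < 0) {              # -1 means @ARGV is empty
    print "Usage: addExp.pl [-r|-s]\n";
    exit 1;
}
my $flag = $ARGV[0];           # e.g. "-r" or "-s"
print "Running with flag: $flag\n";
```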

L52:  is the first argument passed to the program. This will set the variable  to either   or   and invoke the   or   function.

L149-150:  and   functions.

L56-57: Using  restricts the scope of the variable (lexical scoping), making the code more readable and less error-prone. The cookie object is created using the arrow operator  and   and assigned to the   lexical variable. It looks like whatever was inside  is then removed with.

L60: HTTP request local variable is created.

L63-65: A new user agent object is assigned to the newly created user agent  lexical variable. The  command is used to run a command with a time limit; it starts the command and kills it if it is still running after the set duration (seconds by default). The  method is a proxy attribute of the   module and is set up when requests should be passed via a proxy server. The right side of the arrow operator  is a method name, or a simple scalar variable containing either the method name or a subroutine reference; the left side must be either an object (a blessed reference) or a class name (that is, a package name).

L68:  object calls the method   with the value of   passed as an argument.

L71-79: Local variables are assigned empty strings for later use.

L81: The diamond operator, when empty and not in a while loop, reads from standard input   (the keyboard by default). This looks like a temp variable that is created but never referenced again. I'm not sure why it exists, and it is likely the first item to remove before testing.

L85-89: You can change the  variable if you need to change the directory, but for our purposes we keep AD. The script prints a prompt to enter a username, concatenated with the value of variable , followed by a /. is then assigned the value of the  standard input (keyboard). is used to remove the newline left over from the input record. The newly created  variable is declared with the value of. This creates the ajt2019 username (or whoever you are).

L91: Prints a string concatenated with the value of  concatenated with a / concatenated with the value of   concatenated with a new line.

L93-98: Prints a prompt to enter a password. A system command is used to disable the  so the password field doesn't reflect your unmasked. A lexical variable  is created to store the. The  functionality is then restored. The password is  and the script then prints a newline.
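The same pattern in a self-contained sketch (Unix-only; assumes a terminal, and the variable names are mine):

```perl
#!/usr/bin/perl
# Prompt for a password without echoing the keystrokes to the screen.
use strict;
use warnings;

print "Enter password: ";
system("stty -echo");     # turn off terminal echo so input stays hidden
my $password = <STDIN>;
chomp($password);
system("stty echo");      # restore echo for later prompts
print "\n";               # the hidden Enter keypress didn't print one
```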

I also had a chance to use the Unix commands  to help Dan Beitel search for a file that was referred to but missing in a couple of Steve's decodes.

2/26 | Tested by Jaden: replacing the  operator with   and removing the   variable resulted in no Line 81 error. A small victory. I still plan on figuring out why this worked, as the diamond operator is supposed to function as  by default (unless in a while loop).
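One plausible explanation (an assumption on my part, not verified against addExp.pl): the empty diamond operator <> is not a plain alias for <STDIN>. It first iterates over any filenames left in @ARGV and only falls back to the keyboard when @ARGV is empty, so in a script invoked with a -r or -s flag still sitting in @ARGV, <> behaves very differently from <STDIN>. A self-contained sketch of that difference:

```perl
#!/usr/bin/perl
# Demonstrates that an empty <> reads from files named in @ARGV,
# not from the keyboard, when @ARGV is non-empty.
use strict;
use warnings;

# Hypothetical setup: write a throwaway file and point @ARGV at it.
my $tmp = '/tmp/diamond_demo.txt';
open(my $fh, '>', $tmp) or die "Cannot write $tmp: $!";
print $fh "line from file\n";
close($fh);

@ARGV = ($tmp);
my $line = <>;          # reads from the file, NOT from <STDIN>
chomp($line);
print "Got: $line\n";
unlink $tmp;            # clean up
```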

Plan
2/20 | Complete the unseen test data decode tomorrow after the train is completed, complete whatever remains of the rough draft proposal v3, and continue my course on Perl; I'm encroaching on the meat of the course so hopefully this will help close gaps in understanding how the scripts function.

2/22 | Put in at least five hours into the Perl course tomorrow because the goal is to be finished with the course by next class. A lot of work needs to be done with these scripts so it's imperative to complete the course in order to understand how the scripts function, what needs to be modified, and how it can be accomplished. Also look into running the createExp script from start to finish to analyze its functionality, although right now addExp is the priority.

2/25 | Keep going through the addExp code to understand how everything functions. Once this is completed I'll be able to understand the other scripts much easier, or at least that's what I anticipate. Run the createExp script and document its execution if time permits.

2/26 | Work on implementing the auto-increment function for the sub-experiment creation to avoid manual entry. This will likely be the most arduous task as it requires a lot of research into how the existing script works while understanding how to implement something completely new. It will be a solid step in producing a quality script as no one wants to enter things on their own.

Concerns
2/20 | Communication was very poor today in regard to the rough draft proposal v3 revision. It would be great if all class-related correspondence is sent to all students rather than a select few, although I can see how this is likely yet another exercise in how we work within a team.

2/22 | Nothing outstanding at this time. We'll see what comes up tomorrow.

2/25 | Nothing but time, although learning Perl is coming along pretty well.

2/26 | Figuring out how to begin implementing the auto-increment. I'll likely continue my code "translation" doc to better understand how things function, and then research further online and on Udemy to learn how to implement the change.

Task
2/27 | We all communicate on a daily basis, but we figured it would be nice to meet either in person or online. A date has been set for our virtual meeting, where we will collaborate on implementing the auto-increment feature for sub-experiment creation with. See what Jonas says about our progress and create a new root experiment for script-testing purposes.

3/2 | Review the last couple years of Experiment group logs and the  script to see when and why the auto-increment feature was removed.

3/3 | Continue with the Perl Udemy course. This is a light day as I had other assignments and ended up working.

3/4 | Work on the  script and collaborate with Dan Beitel.

Results
2/27 | We'll be meeting Friday night for our online meeting to discuss further  details. Jonas approved of our added feature to prefix the username to the sub-experiment name on the Wiki experiment page. He did request that we move it to the sub-experiment description, so it now shows:

Author: Arias Talari (UserID: ajt2019)

There was confusion on how to properly add an experiment which resulted in a decent amount of wasted time. To reiterate:

 * addExp.pl -s is used to add a sub-experiment to the Wiki under a root experiment
 * addExp.pl -r is used to add a root experiment to the Wiki and will then ask to add a sub-experiment

DO NOT:
 * Cancel the process mid-run or it will only create a partial Wiki entry
 * Use  to create an experiment directory before you run
 * Why?  will auto-increment to the next available experiment number by pulling data from an XML list. If you create a directory before running , and someone else runs   and is assigned the next available directory number (the one you hastily created), the person who followed proper procedure will have a Wiki entry but will be unable to create a matching directory within.

Jonas ended up resolving the issue by creating and moving our Wiki entry to  which led me to create the directory we will be using for Perl script testing.

3/2 | If another team member creates a root directory with default permissions, remember to log in as root with  and add group write permission to the directory with. Everyone else will then be able to create a sub-experiment directory.
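For example (a sketch using a placeholder path; on the server you would first become root):

```shell
# Give the group write permission on a shared experiment directory so
# teammates can create sub-experiment directories inside it.
mkdir -p /tmp/exp_demo            # stand-in for the real experiment dir
chmod 755 /tmp/exp_demo           # typical default: group has no write bit
chmod g+w /tmp/exp_demo           # add group write permission
ls -ld /tmp/exp_demo              # should now show drwxrwxr-x
```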

I took a look at the script on the Wiki and revised its stated location from   to. Lines 107-146 contain the code where the sub-experiment number is auto-incremented, so now we need to understand what it does and why it wasn't kept. It likely broke something, or it simply wasn't completed, since the root experiment option still auto-increments. I went through the past three years of Experiment group logs and could not find anything about when or why the auto-increment feature was removed when  and   merged into.

3/3 | Not too much to say right now aside from Jaden killing it this past week. I also appreciate the document she sent me and Dan explaining what she did to add the  sub-experiment auto-increment feature, which helps for those of us who are not so strong in programming (for now).

3/4 | First off, I do not recommend using the  entry on the Wiki until we complete the script and revise what's there. Documentation has been pretty garbagio for this script so far; for example, the Wiki entry states the script can be run anywhere, while the code comments state the need to be in the sub-experiment folder (the comments are correct). The revisions were done in what I thought was going to be a junk directory, so they are not on the Wiki, but I will recreate the same thing sometime tomorrow and create a Wiki entry.

The purpose of  is to use one script to run a train, create the language model, run the decode, or all three in sequence. The current script was not functional; it is labeled v1, but the code differs from what is on the Wiki. I worked on the  and   subroutines with success and noted:


 * Some variable names were inconsistent ( and  , Lines 137 & 146, for example); there were missing semicolons, dated or incorrect paths (Line 153), and incorrect syntax (Lines 137 & 146 again); and the script used a module that was never called.

This script is designed to be run within your sub-experiment directory. Everything else will be automated outside of running, creating your directories, and   into your sub-experiment directory. The line numbers noted are from the original script, as ours is not yet completed; what was noted above was corrected, along with:


 * The  module provides a Perl function that returns the absolute path of the current working directory. Its result was assigned to a variable  and used to read in the current directory throughout the script's processes. Using this fixed an issue in the   subroutine by reading in the proper directory and then creating the Language Model directory within it.
 * The script wouldn't compile initially; the primary issue was that  (Line 55) should have been  , as the way they had it would not work with only one argument. The argument vector index variable starts at -1 when there are no arguments, so 0 corresponds to the first argument, and the check needs to begin at that point.
 * Revised the  subroutine by removing the existing standard input and instead adding   within the printed string (Line 92). This keeps it consistent with our design, where you only need to manually   into a directory once.
 * The  command execution was commented out (Line 124) due to the need to   before running a train on a server. I added a prompt asking  ; depending on the user input, it executes the   command for you.
 * This was the first time I ran a train and it fed the results to an output file  rather than on the screen.

For now the  and   subroutines work properly.
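A sketch of the Cwd pattern described above (my own illustration; the LM directory name is a placeholder):

```perl
#!/usr/bin/perl
# Read the current (sub-experiment) directory once, then build paths
# from it, e.g. to create the Language Model directory inside it.
use strict;
use warnings;
use Cwd qw(getcwd);

my $dir = getcwd();                  # absolute path of the working directory
my $lm_dir = "$dir/LM";              # hypothetical LM directory name
mkdir $lm_dir unless -d $lm_dir;     # create it if it doesn't already exist
print "Language Model directory: $lm_dir\n";
```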

Plan
2/27 | Look into the  and   scripts for existing auto-incrementing code. Even though this was suggested, I already know we'll need to implement the auto-increment function for  the same way that   is set up. Learning how to set up and get an XML list for each sub-experiment within each experiment should be pretty interesting.

3/2 | Spend most of the day tomorrow working on the Perl course and understand more about GET, POST, and XML so we can get a lead on how to implement this feature.

3/3 | Continue learning Perl and assist Dan Beitel with fixing the  script.

3/4 | Work on the  subroutine, ensure the   flag works as intended, and clean up the code formatting and whatever else comes up. Also, we will document changes within the code comments. Once this script is buttoned up we can begin work on, although we do have plans to implement the ability to work with unseen test data when running.

Concerns
2/27 | Read documentation. If that isn't clear, ask questions. Take your time when trying new things. "Slow is smooth. Smooth is fast."

3/2 | The lack of documentation on the removal of the auto-increment is pretty frustrating, but I can just figure it out myself after spending a lot more time learning Perl.

3/3 | Nothing at the moment.

3/4 | Nothing outstanding. We have yet to tackle the  script so we'll see what awaits us.

Task
3/7 | Begin debugging  subroutine and create a new sub-experiment for a fresh run of ,  , and. Will run on Obelix.

3/10 | Work on Udemy course and read up on issues pertaining to the  script. This is a light day as I have a lot of other work.

3/11 | Work on the Udemy course and read more into pulling a PID from an executed command. Another light day, but I'll be able to put a full work day into this tomorrow.

3/12 | Reassign responsibilities based on Jonas's email earlier today and connect with the other Guardians team members.

Results
3/7 |    and   both worked as intended; however,   resulted in a segfault (0308/011):

sh: line 1: 27879 Segmentation fault     (core dumped) sclite -r 011_train.trans -h hyp.trans -i swb >> scoring.log

The decode.log noted an issue:

-hmm /mnt/main/Exp//011/model_parameters/011.cd_cont_1000 \
-lm /mnt/main/Exp//011/LM/tmp.arpa \
-dict /mnt/main/Exp//011/etc/011.dic \
-fdict /mnt/main/Exp//011/etc/011.filler \
-ctl /mnt/main/Exp//011/etc/011_decode.fileids \
-cepdir /mnt/main/Exp//011/feat \

The base experiment directory is omitted from all the paths. This issue was corrected by adding  under the   subroutine. This gets the "short" base directory number (0308 for example) and the sub-experiment number (011 for example) along with the slash separator (example: 0308/011). However, I am still getting the  error so I will make another experiment and run through train and LM manually and then run the   to test if the   subroutine is the culprit or not.
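A hedged sketch of that fix (the subroutine name is hypothetical; the path layout is assumed from the decode log above): pull the short base directory number and sub-experiment number out of the full experiment path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the "0308/011" style identifier from an experiment path.
# The /mnt/main/Exp/<base>/<sub> layout is assumed from the decode
# log; the real subroutine may derive the ID differently.
sub short_exp_id {
    my ($path) = @_;
    if ($path =~ m{/(\d{4})/(\d{3})/?$}) {
        return "$1/$2";     # e.g. "0308/011"
    }
    return undef;           # path doesn't match the expected layout
}

print short_exp_id("/mnt/main/Exp/0308/011"), "\n";   # prints 0308/011
```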

I ran another experiment on Obelix (0308/013), manually running the train and language model processes, and then used the  to see if any new issues arise and to establish a successful baseline for isolating the cause of the   when running   after   and.

did not completely run and has an issue reading in  , setting its value to   instead of. Lines 177 & 179 (my copy  in  ) change the path to the parent   and then to  , but that doesn't seem to be happening given the value of  , and there's no error output. I need to look more into what's going on with the  assignment.

I ended up running the decode manually (this time I took Jonas's advice and allowed the decode to finish) until we can debug , but I again received a   error.

3/10 | The most persistent issue is still those same lines I mentioned (the first two  lines under the   subroutine). I can't get the path to change from the path set by   to the parent directory. Once that works I'll be able to  into. I began using the  and   pragmas, as it is strongly recommended never to use global variables in Perl - that habit is likely behind the   path assignment issue, along with my need to keep learning Perl. A Perl pragma is a module that influences some aspect of compile-time or run-time behavior. is used to restrict unsafe constructs, and  provides many run-time warnings that can indicate a problem or potential problem; it is an improved version of the   flag but is not recommended in production so the end user isn't impeded by error messages.
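A minimal sketch of the two pragmas in use (the variable name and path here are made up for illustration):

```perl
#!/usr/bin/perl
# 'strict' forbids undeclared (package-global) variables; 'warnings'
# flags suspect constructs at run time.
use strict;
use warnings;

# With 'strict' in effect, every variable must be declared with 'my',
# which scopes it to the enclosing block instead of the whole script.
my $exp_dir = "/mnt/main/Exp/0308/011";   # hypothetical path

sub show_dir {
    # $exp_dir is visible here, but a typo like $expdir would be a
    # compile-time error instead of silently creating a new global.
    print "Experiment directory: $exp_dir\n";
}

show_dir();
```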

3/11 | I haven't resolved the PID grab yet, and noticed that when you execute a command in the background, as we do in  and , the PID you get back is that of the shell spawning the command, not of the command itself. This realization opened up other issues - or maybe I just don't yet grasp what I need to do to accomplish this. Hopefully it becomes clearer tomorrow, along with the local variable issue.

3/12 | Jaden and I will be finishing  while Dan works on. I'll work on the PID grab functionality while Jaden resolves the directory path issue (this is a tentative plan). Whoever finishes their portion first can either help finish or move on to  to assist Dan. A new Discord server was created and all Guardians members were invited to join.

Plan
3/7 | Have another team member run a full experiment manually to try to establish a successful baseline; I will do the same. We will likely need to remove the ability to run each  command consecutively via   if we don't have time to implement a way to run a command only after the prior process has completed. For instance,  is launched right after  , but the train wouldn't have finished running even for a 5hr, which likely played a role in causing the error; the user needs some kind of output stating when the train has completed and a prompt to then run. The same issue exists for :   needs to finish processing before running   and finally.

3/10 | Put some solid time into the Udemy course with a focus on  and variable scope to solve the   assignment issue(s). Continue researching how to pull a PID (process ID) after running a command so we can implement a screen prompt once  and   complete processing. This is important in order to have the automation I'm hoping to achieve for this script.

3/11 | I plan on finishing the Udemy course but will skip around a bit to prioritize what I need to learn. I'll iron out any issues I have with implementing local variables, which will take care of the directory path errors, and then focus on adding code to pull the PID which is an imperative in order to add the process completion prompt.

3/12 | Other things arose that impeded my ability to follow through with yesterday's plan, so I intend to grind out whatever needs to be done over the rest of the break. Clean up the  Wiki page and finish the scripts, then update the Wiki pages for   and.

Concerns
3/7 | Fix the  script with a focus on the   subroutine. We're hoping to have this finished within a week so we can change our focus to  and any other needs that arise once we're split into two teams.

3/10 | I'm also hoping we can finish  soon so we can begin work on , which as far as I know is also pretty broken.

3/11 | Nothing but time, although the pace picks up once things begin to click.

3/12 | Hoping to finish  and   before the next class.

Task
3/21 | The Guardians team met yesterday on campus and those of us who could not make the physical meeting were listening and speaking over Discord. We discussed initial ideas and worked out scheduling.

3/24 | This entry is a day or so late, but the task is to be prepared for the Guardians meeting later tonight. I made sure Skype for Business 2015 was properly set up on my laptop and will mean-mug Dan Beitel later when I test it before the meeting. Read more into automatic speech recognition and get familiar with the many new acronyms.

3/25 | Light log, light week of logs. A lot going on so I apologize for what seems like a lackluster log entry week. Work on  script as well as research ways to improve the language model.

3/26 | Read last year's logs and spend some time researching. Vague, yes.

Results
3/21 | We decided on Saturday as a tentative meeting date and outlined some team goals. I'm not sure what we should state here in our public logs as it's supposed to be a competition, so I will omit details for now and include them later if it's approved. Overall we laid out overarching team goals, secondary goals, future goals, and expectations for the next meeting.

3/24 | Reviewed a video on automatic speech recognition (again) from Microsoft Research and read material other team members posted on Discord. I also conducted independent research into how automatic speech recognition dictionaries work along with methods to optimize and improve them. Vague I know but I'll need to speak with the group on how we relay possibly "sensitive" information in our public logs.

3/25 | Playing catch-up a little due to an unforeseen life event which has enveloped my attention for the past 1.5 weeks, so while I have been working on capstone I don't have much to say at the moment. Hopefully a new week and a clear head will help me move forward.

3/26 | Read last year's logs and spent some time researching.

Plan
3/21 | Work on the  script and touch base with the Experiment team on their status now that we've returned from break. I will also use a couple resources suggested in our team Discord server to help fill gaps in understanding the complete train/LM/decode process.

3/24 | Continue researching ways to improve the WER and follow through with team assignments once they arise. I'll hopefully have time to work on  before Tuesday and work more with Perl as I'm sure the team will end up using it soon.

3/25 | Review 2017 team documentation to help stimulate ideas into improving the WER. Work on improving the language model and dig into the  script.

3/26 | Finish  and run tests on trains.

Concerns
3/21 | Nothing at the moment although, again, I'm hoping to complete these scripts so we can focus more on team goals.

3/24 | More of the same but nothing outstanding.

3/25 | Nothing but time.

3/26 | More of the same.

Task
3/29 | Research ways to complete. The primary task is to implement a way to grab the process ID of  and   so I can track the processes, and upon completion print a success statement to the user so they know it has completed and is safe to proceed (for scoring after the decode).

3/31 | Run a baseline experiment on trained data to compare against future experiments. Continue to work on existing bugs within  and brainstorm new features once the existing issues are corrected.

4/1 | Run tests on things we discussed from the Guardians meeting. Research ways to make things better. This is vague and will continue to be so until the end, most likely.

4/2 | Continue working on ways to improve the WER and run numerous __________ with __________. Find out how to enable __________ to improve __________.

Results
3/29 | Judging from multiple sources, I can use the following methods to track the process ID of the above commands and then print a string to  once they are finished processing:


 * Print the message from the background process by using something like:


 * Try using the  Perl function rather than  . If there is only one process involved (which is the case here), then   will wait until the command is finished and return its output to.
 * I'm not really looking for the output, just a string stating the command has finished processing. I need to check whether the  and   scripts print a string on successful completion.


 * Try using the  or   CPAN modules.


 * The ampersand is actually processed by the shell, not by Perl. Perl executes the string you give it as a shell script and the shell creates the  process, which does not wait for it to terminate (see ,  , and   system calls).


 * It might be helpful for the purpose of this script to remove the ampersand to keep the process running in the foreground. I'll experiment with this.
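To make the ampersand behavior concrete, here is a hedged sketch ('sleep 1' stands in for the long-running decode command): with the trailing &, system() returns as soon as the shell does; without it, system() blocks until the command finishes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # sub-second timing

# Backgrounded: the shell forks 'sleep' and exits immediately,
# so system() returns right away.
my $t0 = time;
system("sleep 1 &");
my $bg_elapsed = time - $t0;

# Foreground: system() blocks until the command has finished.
$t0 = time;
system("sleep 1");
my $fg_elapsed = time - $t0;

printf "background returned after %.2fs, foreground after %.2fs\n",
       $bg_elapsed, $fg_elapsed;
```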

3/31 | The experiment was successful (0308/031). Ran on Majestix with the default settings:
 * 1000 senone count
 * 8 density
 * No LDA
 * No MLLT

|--------------------------------------------------------------------|
|         |  # Snt  # Wrd |  Corr    Sub    Del    Ins    Err  S.Err |
|---------+---------------+-----------------------------------------|
| Sum/Avg |   4172  60215 |  73.1   19.1    7.8    7.4   34.4   87.5 |
| Mean    |    1.3   19.1 |  76.0   18.3    5.8   15.4   39.4   87.9 |
| S.D.    |    0.5   16.5 |  18.1   15.3    7.7   29.1   33.0   30.1 |
| Median  |    1.0   15.0 |  76.2   16.7    2.4    4.2   33.3  100.0 |
`--------------------------------------------------------------------'

Deliberating on the future of. The group decided to remove the  capability, so I will focus more on fixing , which seems to have the bulk of the issues (or does it?). I switched to the  module, which corrected the path issue, but the decode still results in a segmentation fault. When I run the  portion of  , it merely hangs for a second, continues to the   line, and then executes the scoring, which obviously results in a segmentation fault since I never saw the   process initialize. No idea what the issue is, as I ran everything manually last night without any errors - clearly the script is behaving in an unbefitting manner.

I ensured all variable assignments were correct and the code mirrors that of, which seems to run correctly:



For the  line in the   subroutine:



This should behave the same as  but it does not - it seems to skip running   and go straight to the other lines which always results in a segmentation fault.

Other than that issue, , when used with  , seems to redirect all screen output to nohup.out, which is fine and expected when using   and   to run a background process. However, if I keep the ampersand then  will simply execute the rest of the code rather than wait for the process to complete - an existing issue which always caused a segmentation fault. I tried to find a workaround for  (which writes to nohup.out) and added   to the script to also show the output on the screen, but it didn't work. I understand doing this might defeat the purpose of , but I still liked the use of   while retaining the screen output for the user. No resolution yet - but then again the  subroutine is still shitting all over the place and I'm running outta wipes.
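One alternative sketch for keeping the on-screen output without nohup (here 'echo' stands in for the decode command and decode.log is a scratch name): read the command through a pipe, echoing each line to both the screen and a log file. Reading the pipe to EOF also blocks until the command completes, which is exactly the wait we need before scoring.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'echo' is a stand-in for the decode invocation; decode.log is a
# placeholder log file name.
open my $log, '>', 'decode.log' or die "can't open log: $!";
open my $cmd, '-|', 'echo decoding utterance 1'
    or die "can't start command: $!";

while (my $line = <$cmd>) {
    print $line;        # to the screen
    print $log $line;   # to the log file
}
close $cmd;             # returns only once the command has exited
close $log;
print "decode finished; output also saved to decode.log\n";
```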

It would still be ideal to keep these two scripts as background processes, but I couldn't figure out how to do that quickly enough - I would likely need something other than , because an   with   runs it in the background and executes the rest of the code. If that happens the data will be incorrect/incomplete and result in a segmentation fault. I could likely remedy this with  : obtain the PID of each spawned command and use   on it. Once it returns, the process with that PID has finished, and I can print the success string. I'll look into this after fixing the  subroutine.
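A minimal fork/waitpid sketch of that completion prompt ('sleep 1' is a placeholder for the real decode command):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fork() gives the parent the actual child PID, unlike `command &`,
# where the returned PID belongs to the spawning shell instead.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: run the long command, replacing this process.
    exec("sleep", "1") or die "exec failed: $!";
}

# Parent: block until that specific PID exits, then notify the user.
waitpid($pid, 0);
my $exit_status = $? >> 8;
print "Process $pid finished (exit status $exit_status); safe to run scoring.\n";
```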

4/1 | We are tracking results from experiments where the outcome is expected but possibly not desired - yet still one we are looking for. I'm a little stuck on  but hope to get it moving soon. I am not yet familiar enough with Perl (or programming in general) to structure the code needed to achieve what I described yesterday - I know what I want to do, and what I think I need to do, but don't yet have the understanding to execute it.

4/2 | Continuing to receive results based on our team objective - more varied results can be expected soon which will help us with larger experiments. This week will help us achieve more solid results in future experiments, so the expectation is to have more consistent results based on the _______ we change in future experiments.

Plan
3/29 | Test what I researched and ensure everything runs properly. The next step is to fix the directory path issue but I will likely simplify the script for brevity, although that would negate the use of the  ability.

3/31 | I need to fix the issue(s) with  before anything. Once that's completed I can look into allowing the scripts to run in the background again and using  to manage that aspect. Once that's all set I want to implement a progress % for  and   as you see when running   as part of running a train. I would also like to remove the need to reiterate the corpus size in  and.

4/1 | Keep testing things we discussed from the Guardians meeting and from ongoing Discord discussion. Resolve issues with, hopefully with some assistance from Prof. Jonas.

4/2 | Discuss future group goals (likely clean up the Wiki and/or create some instructions) and begin more varied testing within the team aspect. A previous semester had an issue with enabling a certain feature that I don't believe was used so that is one of the focuses.

Concerns
3/29 | Nothing really. I found a few solid resources for Perl that I will continue to use in the future.

3/31 | Responsibilities are likely to ramp up within the team aspect so I need to figure out what I'm going to do with  soon. I also hope to have some kind of result for Tuesday's class.

4/1 | HE HAS RISEN

4/2 | Not much yet, things are progressing.

Task
4/6 | Note: Facet names will be replaced with an identifier from our team result log, which will be uploaded at the end of the semester.

Attempt to run a successful train using the T6 facet. It didn't seem to be used in past semesters so I'm hoping there is a positive result in smaller trains.

4/7 | Test the T12 facet with T13 defaults on Miraculix. It is one of the last facets not yet tested.

4/8 | Test T13 at 4 times the base and again at 8 times the base.

4/9 | Test D1 with a +3 increase, D2 +20, and D3 +40.

Results
4/6 | Tested the T6 facet with default settings. The first failure occurred at Module 4 due to the need to enable facet T19 (T20 with default settings). It failed again at Module 3, stating a need to have a specific file linked or copied to. That was resolved with Camden's help, but the error persisted:.

4/7 | Increased T12 to 4 from the default and used T13 defaults. Failure occurred at Module 20 stating an issue with another script:.

4/8 | In progress. Will update once I have the results.

7:34PM check-in: train still running on second experiment. Will not have a result until tomorrow so I will update then.

4/9 | The two experiments from yesterday had some results which seemed erroneous (0309/018, 0309/019) so they will likely be re-run. I am currently decoding and will have the results later tonight.


 * Experiments conducted for the week (and weekend before):

0308  031  5hr   Majestix   03/30/18
0308  035  5hr   Miraculix  04/01/18
0308  036  5hr   Majestix   04/01/18
0308  037  5hr   Miraculix  04/01/18
0308  038  5hr   Idefix     04/01/18
0308  041  5hr   Miraculix  04/02/18
0308  042  5hr   Majestix   04/02/18
0309  008  5hr   Miraculix  04/06/18
0309  013  5hr   Miraculix  04/07/18
0309  025  5hr   Majestix   04/09/18
0309  026  5hr   Majestix   04/09/18
0309  027  5hr   Majestix   04/09/18
0309  029  5hr   Majestix   04/09/18
0309  018  30hr  Miraculix  04/08/18
0309  019  30hr  Miraculix  04/08/18

Plan
4/6 | Continue testing facets and try to get T6 working. Reach out to other groups to see if they need any assistance with their group work.

4/7 | A new week of testing begins tomorrow which will introduce longer tests with more varied facets. I also hope the amount of testing that occurs ramps up.

4/8 | Continue testing facets with a larger train and record results. This will likely be the plan until the end of capstone.

4/9 | We now have a focus so we will all run experiments based off of specific facets which will hopefully yield a low WER. Time will tell!

Concerns
4/6 | Nothing currently.

4/7 | There are a couple facets that we have not had success with which I feel are important for the results we're looking for. I'm hoping this week can unveil more into how to enable those.

4/8 | Won't know until we start recording results for the new series of tests.

4/9 | Nothing outstanding.

Task
4/10 | Establish a plan to organize the execution of the impending experiments so they are conducted as efficiently as possible. Research the software/procedures necessary to enable the ability to use LDA.

4/14 | Likely the last few 30hr experiments before we solidify the adjusted facets used for the impending 300hr experiments. Increasing T11 to 4000, T13 to 64, and reducing T23 to 0.001.

4/15 | The 30hr experiments I ran (and one of Rose's) seemed to have an issue somewhere between the language model and decode (Camden). Running another baseline 5hr to see if there's any issue, but I believe the comparison would be best if another team member recreated one of my experiments.

4/16 | Run a baseline 300hr experiment and then run the decode on unseen data. This is one of three 300hr experiments we are using before implementing different configurations.

Results
4/10 | It was helpful to have most of class time to work as a team. We ironed out a plan for moving forward with facets we have been attempting to implement, which will be vital in 300hr experiments and for obtaining the end result we're hoping for.

4/14 | Currently running a 30hr train and will update when I have a result.

Update: Results are being compared with other experiments that output some similar results. The WER is very low but that may or may not be due to something erroneous. Will be investigating.

4/15 | I spoke with Ol' Beardy Beitel and it's likely the seemingly erroneous result is due to the configuration and not any outside factors. We were discussing ratios a few days back in the channel, so I will need to look up how to adjust facets so the sentence and word counts are similar to those of the rest of the 30hr experiments. Not necessarily a result, but you get what I give!

4/16 | There's nothing to report yet as it'll take well into tomorrow for everything to finish. The train and language model processes are the same, although different steps are needed to run a decode on unseen data.

Plan
4/10 | Conduct more focused experiments based off prior successes with lesser trains. The ball be a rollin'.

4/14 | Dan Beitel has been cranking out the LDA experiments, so once he is settled I'll likely run a couple 30hr experiments with priority facets and add LDA.

4/15 | Try to solve this potential issue and then look for other facets to test at 30hr before 300hr experiments begin (impending).

4/16 | Some of us have separate 300hr experiments running at the moment, and once they are completed we will proceed with implementing the successful facets from the 5hr and 30hr experiments.

Concerns
4/10 | Tramuta in lazzi lo spasmo ed il pianto, in una smorfia il singhiozzo e 'l dolor, Ah! ("Turn your anguish and tears into jest, your sobbing and pain into a grimace, Ah!")

4/14 | Nothing outstanding at the moment. The hope is to have LDA running smoothly and have an idea of what we plan to run for 300hr experiments before we pull the trigger.

4/15 | Iron out, button up, and whisk away whatever issues pertain to the 30hr experiments I ran. If you need any more metaphors you can reach me at ajt2019@wildcats.unh.edu.

4/16 | Nutsink at ze moment, yah. We'll see what occurs when we begin introducing modifications to the configuration.

Task
4/18 | Review what needs to be run for 300hr experiments so we can analyze results to use for our final experiment.

4/20 | Run a 300hr experiment with previously used configurations which yielded a solid WER.

4/22 | Run a 300hr experiment with previously used configurations which yielded a solid WER. Same as my prior entry.

4/23 | Run a 300hr experiment with previously used configurations which yielded a solid WER and ensure my first LDA experiment was a success.

Results
4/18 | Nothing I can really say here at the moment, but we have a few baseline experiments completed and will be moving forward with implementing previously-used facets for more focused experiments.

4/20 | Realized a 300hr LDA baseline experiment hadn't been run yet, so that's in progress right now before I begin modifying other configuration parameters. Results will be posted sometime late tomorrow or early Sunday.

4/22 | The same as my prior entry. I needed to restart the experiment mid-train last night as I neglected to modify one of the configuration parameters. There was one train still running on Idefix and Dan Beitel needed Majestix to be free today.

4/23 | I dun goofed with some input when manually running the decode for LDA. I corrected that (see Dan Beitel's instructions here) and have rerun the decode as it resulted in a segmentation fault when I tried scoring it. I am also running another 300hr train and increasing T11 to 8000, T13 to 64, and decreasing T23 to 0.001. This configuration yielded a solid WER in 30hr trains so I'm hoping it has potential to do well for 300hr.

Plan
4/18 | Begin implementing configurations used in the 5hr and 30hr experiments for our final few 300hr experiments. Thankfully we have LDA functioning, thanks mostly to Dan Beitel and Steve.

4/20 | Begin 300hr experiments with other configurations once the LDA baseline is completed.

4/22 | The same as my prior entry. I needed to restart the experiment mid-train last night as I neglected to modify one of the configuration parameters. One train was still running on Idefix and Majestix needed to be open for Beetle.

4/23 | We have a couple experiments still running so once those are completed we will have more of an idea as to what the focus will be for our final run.

Concerns
4/18 | Nothing outstanding at the moment.

4/20 | Nothing yet. ∩　　　　　　　＼＼　　　　　　　／　 ） ⊂＼＿／￣￣￣　 ／　＼＿／ ° ͜ʖ ° （ ）　　 　／⌒＼　　／　 ＿＿＿／⌒＼⊃　（　 ／　　＼＼     U

4/22 | The timely execution of tasks so we may achieve a better result than the other team within the remaining time allotted. Nothing serious though.

4/23 | We're setting a good pace experiment-wise and have a schedule to complete the final report, so nothing outstanding at the moment.

Task
4/27 | Check in.

4/28 | Work on final report.

4/29 | Finalize content I have to add for the final report summary.

4/30 | Finish the 300hr and 30hr unseen decodes from yesterday.

Results
4/27 | Just checked in.

4/28 | Worked on final report.

4/29 | Finalized what I had to add to the final report summary.

4/30 | Both 300hr and 30hr unseen decode experiments were scored. The 300hr result looks promising!

Plan
4/27 | Work on the final report summary tomorrow.

4/28 | Continue working on final report.

4/29 | See what needs to be done for polishing purposes, last minute experiments, other miscellaneous efforts, etc.

4/30 | Wait on final experiment results and work on the class final report.

Concerns
4/27 | Just a pop in today but am digging in over the weekend (ง ͠° ͟ل͜ ͡°)ง

4/28 | None.

4/29 | Eager to see our impending end result.

4/30 | Nothing outstanding - just waiting on final experiment results.

Task
5/4 | Add content to the final report and revise a couple Wiki entries.

Results
5/4 | Worked on the final report and revised the  and   Wiki entries.

Plan
5/4 | Continue working on the final report (if necessary) and see if anything else on the Wiki needs revision.

Concerns
5/4 | Nothing at the moment.