Speech:Spring 2015 CharaNo1

Team Logs

*[Bruins]
*[[Speech:Spring_2015_PatsXLIX| Patriots]]

Report

*[https://foss.unh.edu/projects/images/b/bb/Results_Report_Draft_Final.pdf Final Bruins Results]

This page is from the winning team of the Spring 2015 semester.

Team Member Logs

Competition E-mail Stream

3-11-2015

Russ

  Go Bruins! If you're getting this then you are part of the Bruins team in the capstone class.

Sam

  Here is our team's secret group page: http://foss.unh.edu/projects/index.php/Speech:Spring_2015_CharaNo1
  Don't tell the other team about it!

Stephen

  I have gone ahead and set up a new master experiment in the Wiki so that our team members may run experiments. Our team experiment number is 0269, and it can be found on the experiments page
  ( http://foss.unh.edu/projects/index.php/Speech:Exps#List_of_Speech_Experiments ), as well as in /mnt/main/Exp on Caesar. I have set up a child experiment numbered 001, and was thinking that each team
  member can set one up in similar fashion to run individual trains. Is there anything else we should do to prepare for the time being? The sooner we get started and have everyone up to speed, the better.

3-12-2015

Sam

  Jonas just sent me some information and told me to share it with you. First off, he does not want our team to have one parent experiment directory where we all have subdirectories. He wants us to create
  new experiments and child experiments that test specific tuning values. Also, he does not want us to waste any time teaching people how to train and decode. If you do not currently know how to train
  and decode, then go follow the tutorial and ask me any questions you have.

3-25-2015

Sam

  I was doing some research and I do not think you should use 128 as a density value.
  On http://cmusphinx.sourceforge.net/wiki/tutorialam the only acceptable densities listed are
  2, 4, 8, 16, 32, and 64. It's either not a supported value or the benefits of using it are not worth the time it takes.

Ben

  Hi Sam,
  Thanks for the heads-up. I do see what you mean about it not being an explicitly-noted value. In light of that, I think our time might be better spent on the dictionary failure's effect on word error
  rate. We should at least stick to 64, in light of their improved results last year using 64 over 32.
  Have you gotten a result (as in, number of transcript -> dictionary "not-present" warnings) there yet for your 250h train, or is it still running?
  I'll let you know when we have something for the 125h train and we can compare. Then we'll know if it's worth pursuing any further.
  Note, partners are, according to my notes:
  125h        256h
  Ben         Stephen
  Mohammed    Russ
  Chris       Zach
  Morgan      Kayla
  Adam        Kenneth

3-29-2015

Stephen

  Just wanted to note that I am having an issue running the RunAll script today. I set everything up for the new experiment and adjusted the density value, but I get the following error after the initial    
  warnings:
  Something failed: (/mnt/main/Exp/0274/001/scripts_pl/00.verify/verify_all.pl)
  Not sure what this may be related to.  Experiment in question is 0274/001.
  Ben, when you set up yours as well, just make a directory in there for the 125hr you will be running called 002.  You can add the details to what I already have on the Wiki experiment and group page. 

Zach

   Here's the full output from Stephen's experiment: http://puu.sh/gUGLt/16ddf6bd09.txt
   I'll snoop around and see if I can figure anything out. I don't think the warnings immediately before the error were there when I ran mine, but I don't think that would throw a fatal error.
   Semi-related: I also had an error in Phase 7 last night, but mine ended up being due to my phonelist being empty. I copied the phonelist from another experiment and it seems to be running fine, but   
   this may indicate a few issues with the setup process on 256hr trains. It'd be nice to have the rest of the 256hr group chime in on this with any results, positive or negative.

Kayla

  Hey guys,
  I am part of the 256 hr group and I will probably get to running that train tomorrow night. I have a weird question, it's not super important, but when running a sub-experiment I tried to run the 
  createWiki_Sub_Experiment.pl script and it said 'Permission denied'. Just curious if anyone knows why that is happening or if I am missing something? 



3-30-2015

Zach

  Russ, the previous semester used 64 density and 8000 senones for their best result, so we should be doing that as well.
  Kayla, I just tried running createWiki_Sub_Experiment.pl and got the same error, though it looks like the permissions are configured correctly. No idea what's going on there.

Morgan

  I just ran the script a few minutes ago and it seemed to work fine. I'll look into the error further tomorrow when I try running a train. Any other information you could provide about the error, such as
  where you were running it from (which server, directory of the script), where the error occurred, or some of the information you tried putting in, would be awesome. If anyone runs into any more trouble
  with either of the wiki scripts be sure to let me know. I wrote them, so if there are any bugs I would love to know and I'd be happy to help fix them.

Kenneth

  Email Melissa on the systems team. I had a permissions issue creating dirs last week. There were apparently some issues with the server switchover. She fixed the problem.
  In addition, I've had script issues running a decode since last week. I'm on my 5th attempt without any luck.

Sam

  Hey guys,
  So for some reason there has been a whole list of issues this week. Some Perl scripts do not run, others cannot even be found when they clearly exist, and some trains are dying halfway through. The only
  ideas for what could be causing these problems are that something happened during the switchover to the new Caesar or that the other group changed something. I know my train failed somewhere around 12
  hours in, so I am not very happy at this point. I can give you suggestions on how to continue past any step you are stuck on with some hackish methods if you email outside of this group email.

Ben

  Hi Zach,  
  Not sure if you're aware of the recent changes on the UNH.edu domain network, but things work a bit differently now.
  You should be able to get access to Caesar over the web VPN: 
  1) Log in to: https://t.co/Pj4YKBY7XR
  2) SSH to Caesar
  That should work and fix whatever problem you're having by passing all your traffic to UNH over an SSLVPN tunnel.
  Now, I'm also encountering some other issues, with web VPN traffic in the browser not quite working right, so if you experience that issue as well, there are some additional steps.

Russ

  I have had luck with ssh'ing into cisunix.unh.edu... Your password for that is your current Blackboard password... Then ssh into caesar.unh.edu and use your Caesar password and you should be good.
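  As a sketch of Russ's two-hop path (the username placeholder is hypothetical):

    ssh yourusername@cisunix.unh.edu   # Blackboard password
    ssh caesar.unh.edu                 # Caesar password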

3-31-2015

Kayla

  Yeah, there is some information about it on our secret Bruins page, where they talk about senones. The more senones a model has, the more precisely it discriminates the sounds.
  In our meeting Sam also said the following for the values:
  Density: 16, 32, 64
  Senone: 3,500 - 20,000; but typically in the range of 8,000 - 10,000
  As your data increases, your senone count increases. To be honest though, I do not remember what the density refers to. I have tried researching this as well, and I wish there was some document that
  defined these clearly.

4-1-2015

Sam

  Choosing Senone Values
  The more senones a model has, the more precisely it discriminates the sounds. But on the other hand, if you have too many senones, the model will not be generic enough to recognize unseen speech. That
  means the WER will be higher on unseen data. That's why it is important not to overtrain the models. If there are too many unseen senones, warnings like the one below will be generated in the norm log
  at stage 50:
  ERROR: "gauden.c", line 1700: Variance (mgau= 948, feat= 0, density=3,
  component=38) is less then 0. Most probably the number of senones is too
  high for such a small training database. Use smaller $CFG_N_TIED_STATES.
  Training using LDA/MLLT
  http://cmusphinx.sourceforge.net/wiki/ldamllt
  Can possibly improve WER by 25%.
  Suggested settings:
  $CFG_LDA_MLLT = 'yes';
  $CFG_LDA_DIMENSION = 32;

4-3-2015

Morgan

  Kayla and I ran 5 hr trains earlier this week with the default settings noted in the wiki. Is there a guide or wiki page about changing the settings for trains and what each of the settings does?

Sam

  Good job Russ on getting the baseline for the 256 hour train. It is 61.5%. Now, that's not very good, but we have something to base our results off of. It should also be noted that your decode is still
  running since it takes 256 hours to run (Russ, kill the decode process to free up server space, and look in decode.log; there are some errors). Jonas suggested to me that we use subsets of about 5 hours
  to decode on. So we could use the first_5hr to decode the 256hr, or we could grep about 5 hours out of xxx_train.trans, save it to decode_train.trans, and use that during our decode. What would you
  guys prefer to do?
  Also, I have not looked at the 125hr_3170 train, but Dakota tells me it is still only 5 hours of data. So I suggest no one runs a 125hr train and we all concentrate on 256hr. If you have some ideas you
  want to try out and you do not want to waste the time on 256hr, then try them on 5hr. If your 5hr results come in lower than the 5hr baseline of 41%, then apply the changes to the 256hr. It should also
  be noted that changing one parameter at a time will not lower the baseline a significant amount. Keep stacking your changes up as you run more and more trains, and share results with everyone so we can
  all possibly apply your changes to our own trains.

Russ

  I just killed my decode process, so that's set. I personally don't understand how using the first_5hr will help with the 256. Do we run different configurations on the first_5hr, then, if we find a
  successful set of parameters that lowers the baseline, apply it to the 256 and see if it changes that as well?

Kenneth

  This is a little off subject, but has anyone thought of archiving these e-mails on the wiki?
  For record-keeping purposes this email chain is far more useful for future semesters than the blogs or experiment dirs. I'd rather read the emails because they relay the group's process and train of
  thought better than the individual ramblings posted on the wiki currently.
  If we're successful in lowering the results, future students will be able to understand our process and mimic it, as opposed to reading one experiment log and attempting to correlate that to the entire
  group's process.

Stephen

  Looks like a good idea to just scrap the 125hr for now and focus on the 256. Good idea about maintaining an archive of our useful correspondence. 

Russ

  So, since we are focusing in on the 256, and using 5hr to test our ideas, should we continue with the partner strategy?

Kenneth

  I still like the buddy system because it promotes collaboration and makes you accountable on a consistent basis. The buddy system had more to do with bouncing ideas off of one person instead of the
  entirety of the group.
  Two can work together, with one partner running and decoding a five hour train while the second runs and decodes the 256.
  We can scrap it, but I think it's already been beneficial and will continue to be.

4-5-2015

Morgan

  I looked at some of our group's logs and emails. 64 density appears to be what a lot of people have used, and from what I understand moving up to 128 density is the next step. So I think we should
  try to run trains with 128 density and 8,000 senones until we figure out more on our own. I also did some research on Caesar; sphinx_train.cfg appears to be where some train settings are located. In our
  case we would change $CFG_FINAL_NUM_DENSITIES to 128 and $CFG_N_TIED_STATES to 8000 in sphinx_train.cfg to give our train a density of 128 and a senone value of 8000. I'm still new to trains so I don't
  know if these settings will cause the train to fail or not, but it's a start.

4-6-2015

Sam

  There's no tutorial explaining this, mostly because many of the parameters and their functions are unknown. I can give you an overview of where to find the files to modify and some of the parameters I
  know. I will try to get back to you early tomorrow with how to do this.
  The file you need to tune parameters is sphinx_train.cfg. This will be located in your experiment's etc folder after you run prepareExperiment.pl. Once you are in etc, do "nano sphinx_train.cfg"; this
  will allow you to edit the file. The senone parameter is called $CFG_N_TIED_STATES, which currently defaults to 1000, and density is called $CFG_FINAL_NUM_DENSITIES. Be sure you're setting the density
  value in the "elsif ($CFG_HMM_TYPE eq '.cont.')" part of the conditional statement. Another variable that should be changed is $CFG_CONVERGENCE_RATIO. I recommend that gets set to 0.004. Everything else
  is up to you guys to research and figure out.
  Another file you can nano into is feat.params. There are a lot of parameters in there, so research them. upperf may be a good one to start with.
  Let me know if this wasn't clear or if you need more assistance.
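  As a rough sketch of the edits Sam describes (the experiment path is hypothetical; the values are the ones suggested in this thread, not verified defaults):

    cd /mnt/main/Exp/0269/001/etc      # hypothetical experiment directory
    nano sphinx_train.cfg
    # inside the elsif ($CFG_HMM_TYPE eq '.cont.') branch, set:
    #   $CFG_N_TIED_STATES = 8000;        # senones (defaults to 1000)
    #   $CFG_FINAL_NUM_DENSITIES = 64;    # density
    #   $CFG_CONVERGENCE_RATIO = 0.004;   # Sam's recommendation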

Sam

  This is the email Jonas just sent me and Zach. People are still using the whole 256 hours to decode, so you can no longer do that.
  First I created a /mnt/main/scripts/user/History/run_decode and stuck all the other versions (1-5) into that. The current version (not named run_decode6.pl but simply called run_decode.pl) has a singular
  change. It now looks for <task>_decode.fileids instead of <task>_train.fileids. The latter existed from training, but the former does not exist and needs to be created first. I also updated the "Run a
  Decode" wiki page to reflect that change. Right now, if you simply take, say, the first 1,000 audio file ids from <task>_train.fileids to create <task>_decode.fileids, then you are left with about an
  hour of decode. The command for this would simply be:
  head -1000 001_train.fileids > 001_decode.fileids
  But run_decode.pl will no longer work if it doesn't see <task>_decode.fileids.
  Each group should probably come up with a bigger set to decode on (between 2-5 hours; last year they did 5 hours, I believe) and a better way than just taking the first n utterances, perhaps some random
  sample. This is what the test/ directory in the corpus should eventually be about, but for now it's not critical, as long as people stop using <task>_train.fileids to decode on all 250,000+ wave files,
  as that could take weeks.
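  A minimal sketch of both options (file names follow the examples above; shuf being available on Caesar is an assumption):

    head -1000 001_train.fileids > 001_decode.fileids    # first ~1 hour of utterances
    shuf -n 1000 001_train.fileids > 001_decode.fileids  # or a random 1,000-utterance sample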

Sam

  For now we can just stick with the first n utterances. A way to map utterance counts to hours would be to go to the first_5hr/train/audio/utt directory and count the number of files. I just did an
  ls -l | wc -l on that directory and got 4660 files. So you could then do head -4660 001_train.fileids > 001_decode.fileids to get 5 hours to decode on. You could also drop that as low as 2000 or so if
  you wanted to decode on 2 hours.

4-10-2015

Zach

  The error rate on my train is about 43.5% after 7,000 files decoded.
  Do we have to have a fully decoded set at the end? Because at this rate of 1,000 files per ~7 hours, that's around 67 days of straight decoding.

Russ

  Quick question for everyone... has anyone tried the drone machines for trains so far? I am on miraculix, and it literally took about 3.5 hours to do a genTrans. Like, I haven't even run the train yet...
  I am also the only person on the drone, and my processes are the only ones running... should I stay on miraculix as per Jonas' request, or say fuck it and do it on Caesar?

Sam

  I don't use the drones, but I know if we flood Caesar the processes will be just as slow. Use Caesar if you want, but check how many processes are running on it before you do. If there are too many running 
  then I would use a drone. genTrans should only take like an hour tops I would say.
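  As a rough sketch of that check before starting (counting RunAll-style Perl jobs is only a loose proxy for running trains, and the grep pattern is an assumption):

    uptime                              # load averages on the machine
    ps aux | grep '[R]unAll' | wc -l    # how many train scripts are currently running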

Zach

  I actually just started a 1,000 file decode on Traubadix to test exactly that. I'll see how that turns out.
  My plan at the moment is to run a few different 5hr decodes at once with different language weights decoding on the same train and I'll send out an update as soon as I have some results from that.
  I think the first 5k is an okay sample size to at least see what direction it's trending in.
  The way I'll probably start doing it is to run 2k in the morning so I'll have some results to look at before I go to bed that night, and then run another 1k overnight to view in the morning. Then, repeat 
  until I have 5k total.

Stephen

  Sounds like a good strategy. A question here... would there be any major benefits if we were to have a dedicated, powerful machine completely at our disposal, strictly for experimental purposes? I was
  thinking about it, and I have like $100 in credits with a virtual server provider that I use for my side projects. Theoretically, we could spin up an 8 core, 16GB RAM server for a week or two if we were
  to gain a significant advantage from doing so. That said, we would also have to load and configure Sphinx and our required experiment files onto the system, so it might be more pain than it is worth.
  Let me know.

Kenneth

  We were going to set up a machine on a USB drive to test software upgrades but stopped after a week. I think I had planned on spending two weeks to do so, because from reading past Tools group logs,
  previous semesters had awful experiences trying to do the same thing. This was planned to be a from-scratch install. Jonas suggested it the first week of class, then told us not to do it because it
  would take too long.

Kayla

  I would agree with Ken. Not to mention we would need to find out whether whatever OS is on that server is compatible with all the software tools, because if it isn't, then we would have a problem and
  would also need to try to implement a wrapper.

4-10-2015

Mohamed

  For me, this depends on many factors. What platform are you using to spin up the VM? AWS, Digital Ocean, Heroku, or Broad River? Is it easy enough to configure an out-of-the-box Red Hat machine?
  If not, are we OK with using other Linux distributions? Ubuntu, for example: spinning up an Ubuntu machine in AWS is really easy. Copying the data to this machine will take time, but we don't need all
  of the data. My only concern is that we use cisunix as our gateway, which is really going to slow us down. However, if this is really something we need, I can talk to Jonas about giving us another port
  and see how this works.
  Tools group, if you think any of the tools we use on Caesar now will be hard to install on a new server, please let us know.

4-10-2015

Stephen

  Platform is Digital Ocean... the following Linux distros are available:
  Ubuntu, CentOS, Debian, Fedora, CoreOS, FreeBSD
  I don't believe they support Red Hat at the moment.
  As far as configuration goes, when you spin up a "droplet" the machine
  is just ready to go, and fully configured.  The main thing would be
  getting Sphinx and whatever other tools and files we may need up and
  running.  If it is feasible, I am glad to use my resources to help
  give us an edge here so just let me know.  If using AWS is easier and
  someone has available credit on there, that sounds good as well.
  Regarding the decode of my current 256hr train - Zach or Sam if you
  get word from Jonas on his new test set going live please let us know.
  I will just wait for that data to become available, as that is what he
  is ultimately going to be judging our results on.

4-12-2015

Kayla

  As Ken had mentioned, last Spring the Tools group worked with installations of software tools on a VM and it took up a good chunk of time trying to get everything working and configured properly. If the  
  general decision is that you guys want to try to use the server that Stephen had in mind, then we can definitely give it a try. But I am just saying that it might not be smooth sailing and if we run into 
  any issues or bumps along the way, then that could be pretty time consuming and take time away from running trains and such. If we for some reason have trouble fixing any potential software issues we may 
  run into, and then the semester ends, I would hate to have people relying on this potential new server and then have it be too late. What does everyone think? 

4-13-2015

Kayla

  So I attempted to run a 125 hour train last night, and I got the following response in the first step in training:
  sed: can't read etc/002_train.trans: No such file or directory.
  Processing 0 words against dictionary...
  Added 0 files to add.txt
  There was also another error message about there being a missing .wav file, but I can't remember what the exact error said. This isn't normal, right? I didn't finish the process because I wasn't sure if it 
  would give me an accurate result after reading that.

Morgan

  I was trying to run some 5hr trains to test some settings, and I keep receiving errors I have never seen before. The errors always occur during the decode step. Jonas didn't change the process for 5hrs,
  did he? Can anyone else run 5hr trains? The information I got from decode.log:

Sam

  Give me the experiment number you're working on and I will check when I'm home from work.

Stephen

  Just wanted to give an update on my 256hr train results.  Final
  baseline with the parameters that I changed yielded a 49.3% result.
  Sam has advised me to run another decode with a different mixture
  weight,  as it produced a dramatic effect on Zach's experiment.
  Will keep you posted on new findings as they come in! 

4-13-2015

Sam

  While I was looking into the problem you were having with 002_train.trans not existing when you run prepareExperiment.pl, I discovered that some things have changed on the server. The
  directory /mnt/main/scripts/user/scripts_pl is no longer there. This is problematic since we use scripts that live in that directory. I found this directory was moved
  to /mnt/main/scripts/train/scripts_pl. I assume that the prepareExperiment.pl script has not been changed to account for this move. I will try to fix this later when I have time, but for now it
  seems like trains are not going to work again. If you have run trains in the last day and didn't pay attention to prepareExperiment running, then odds are your decode will fail. Hang tight until this is
  verified and fixed.

Sam

  My decode is not finished, but I wanted to see what the current WER was. I'm sitting right around 40% right now. This will most likely go up, but hopefully not by that much.
  Update: my decode has been running for 19 hours, and to my surprise my WER has actually decreased since the first mid-decode score I ran on it. My current WER is 39.5%. Sub 40, woo. The decode has a few
  more hours to go, so it is possible that this will get slightly lower.

4-15-2015

Kenneth

  We discussed this in our group meeting today, but because a large portion of the group was missing, I'm sending out this email to inform and reach out.
  As a group we've been pretty successful in performance scores, but those top scores are coming from a few of the members. I myself feel like I'm spinning my wheels running useless trains that will not
  benefit the group, because I would not be able to replicate such low scores. At the same time I want to contribute to the group in a useful way.
  So I've suggested we assign tasks to group members who choose to take on alternate work. You can still run trains, but you'll be able to provide effort that will be useful. If you choose to help out
  with these jobs, the entire group will have something to grade you on at the end of the semester. If you choose not to take on any alternate jobs and you aren't communicating with the group in this
  email or otherwise, we will have nothing to grade you on.
  A few examples of work would be: create a log of our decode scores for all the trains the group runs during the week (this would be used in our final document for the team challenge), or create the
  document in which we turn in our final scores for the challenges.
  Two students in class have already chosen to take on these assignments. Kayla will be creating a document with instructions on how to edit the train files. The document will highlight what configuration
  parameters can be changed and the range in which they can be modified. Morgan has taken on cataloging, organizing, and publishing our chain email on the wiki. If you are interested in one of the other
  tasks, email me and I'll let you know what is available. If you feel like you're successfully running trains and providing competitive results, ignore this email.

4-16-2015

Sam

  Just went through all the Patriots decode scores and their best is around 41-42%. The majority of their trains are 5 hours with the base result of 43%, so that tells us most of them do not understand
  tuning very well. Their best 5 hour train was 19% and ours is sub 15%. We are looking good, so let's keep trucking along. We should continuously monitor their work as we get close to the end of this
  competition.

Kenneth

  A few tasks have been handed out in regard to the last email. Ben is going to compile a document of successful, unsuccessful, and indifferent parameter changes to the experiments. It should also list
  configurations that have become standard to every experiment we run. This is useful in two ways: Jonas said the competition would not be based merely on numbers, so when we turn in our results it would
  be good to have a documented trial-and-error strategy as opposed to just a result. Ben is also possibly going to document the errors and issues we've had, along with any fixes found.
  Chris is going to create a spreadsheet of our results by week for all the experiments the group initiates. It should list experiment folders, outcomes, the server each was run on, and a few other
  things. This is useful for our results, as we can show we were consistent in both results and weekly improvements. We'll be able to show that our best result wasn't just a fluke but came from consistent
  improvements.
  Morgan is going to organize the chain email for posterity. This will probably be the most useful tool for future semesters, along with definitive proof we worked as a group to win the competition. Kayla
  is going to be creating a how-to and a what-not-to-do for editing the experiments. I am going to compile all this information into our semester's end results.
  If you, like me, are still having trouble lowering the baseline, you should contact Ben and Kayla later in the week, as their work should help you improve your own experiments.
  And lastly, it could be helpful if someone who's not busy running trains and is looking for additional tasks could run decodes on the opposing team's experiments. I would focus on the usual suspects
  (Dakota) or anyone else you feel could threaten our scores. It would be good to know the exact number we need to beat, as opposed to gossip we're getting through the grapevine.

Sam

  I decided to run a quick command to see the logins from the last 7 days. Below is the list of all user login ids, where they connected from, the dates they signed in and off, and the duration of each
  session.
  If you're interested, the command I used was:

    last | while read line; do
      date=$(date -d "$(echo $line | awk '{ print $5" "$6" "$7 }')" +%s)
      [[ $date -ge $(date -d "Apr 09 00:00" +%s) && $date -le $(date -d "Apr 16 00:00" +%s) ]] && echo $line
    done

  You can change the dates to the range you want to collect this information for.

4-17-2015

Chris

  I am going to be working on the results spreadsheet this weekend.
  I will be looking at the Wiki experiment logs and experiment directories to gather information, but it would be helpful if you could send me some information on the trains you ran as well.
  Experiment info to send:

  Directory / Author
  Size tested (125hr, 256hr, etc.)
  Date tested
  Results, or whether the experiment failed
  Server (Caesar, Obelix, etc.)

Russ

  0276 is the dir I work in.
  001, 003, and 004 are mine.
  I believe 001 is a 5hr, and the other two are 256 hr.
  I don't know when I tested them, but an ls -all will tell you that.
  001 and 003 are done; I am still currently decoding and working with 004.
  All three of them are on Caesar.

Stephen

  I utilize 0274/002 for my current experiment, which is a 256hr. Decode is being run a second time on the test set Jonas gave us in hope of an improved result (previous 49.3%). I have been using miraculix   
  for my decoding jobs. 


4-18-2015

Kayla

  I have run:
  5hr train run on 4/6-4/7, directory 0265/007, result was 19.6% and it was tested on Caesar
  125 hour train run on 4/12, in directory 0270/002, tested on Caesar but did not finish because of errors
  I am also working on a 125 hour that I started the other night, but I need to finish up the decode so I can update you when that is finished also!

4-19-2015

Stephen

  Update on the 256hr train's second decode with language weight increased to 20... exact same result as the previous decode. I was anticipating some sort of effect here, but it still came out as exactly
  49.3%. Just to be sure, I deleted the related files and tried the process again, with the same score. Not sure why it had no impact here... just to be clear, the exact setting was:
  $DEC_CFG_LANGUAGEWEIGHT = "20";
  Regarding class this Wednesday... it appears that it falls during the same time slot as the URC? Are we actually having an official class, or has anyone heard what the plan is?
  Zach or Sam, when you have a chance please let me know what you might want me to test for the next train based on what you have so far, thanks!

Zach

  Stephen, I was having the same issue in my train. I actually deleted my decode config file completely and the decode still seemed to run fine. This leads me to believe it may somehow still be referencing  
  the original decode settings. Has anyone had success running a second decode on a train with different settings? 

4-20-2015

Sam

  I sent this to Chris the other day regarding how to identify the length of a train.
  If you do an "ls feat | wc -l" in an experiment directory, it will tell you how many utterance files are in the experiment. Use the numbers below to decide what length experiment it is.
  256hr - 250330 files
  125hr - 139984 files
  5hr - fewer than 125hr
  I didn't look at how many files were in the 5hr, but if it's less than the 125hr then it's safe to assume that it is 5 hours.
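  As a sketch, the same check as a small script (thresholds taken from the counts above; run from inside the experiment directory):

    n=$(ls feat | wc -l)
    if   [ "$n" -ge 250330 ]; then echo "256hr"
    elif [ "$n" -ge 139984 ]; then echo "125hr"
    else echo "5hr"; fi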

4-21-2015

Zach

  So, Sam just confirmed that you cannot decode a single train multiple times with different settings. I'm going to run some tests and work toward figuring out why that is, but it looks like the results we   
  have now are likely the last results we're going to get. We're still beating the other team in WER, so I guess that's a good thing. 

4-22-2015

Chris

  This is what I have for the results log so far. Let me know if you find errors or any experiments are missing. 

Kenneth

  That's great Chris. It was a good idea to separate the failed experiments. We can now show our overall performance as opposed to our best single result. 

Sam

  Did some spying on the other team. I have found that they are definitely trying to attack the language model. The only change I see currently in their lm_create.pl file is this line: "system(
  $folder."wfreq2vocab <tmp.wfreq> tmp.vocab -records 1000000" );" They added "-records 1000000". I do not actually know what this does as of now, but I thought it was worth noting. This was found in
  Garrett's experiment 0275/002. He is currently running the decode for it, so I will keep monitoring it.

Zach

  Spoke to Jonas and he confirmed that they're messing around with the LM and dictionary.
  Garrett's decode configuration still has the default language weight of 10, which means they probably don't realize how valuable that is. A higher language weight becomes more effective as the Language 
  Model becomes more reliable, which means any changes they make to improve their Language Model will be amplified if we do the same. I unfortunately haven't been able to calibrate the language weight, but 
  we're pretty sure 25 produces a significantly better result than 10 in the current system. If anything, being on the right side of the "valley" produces far better results than being on the left side so 
  this new information should only widen the gap and should do so pretty significantly.
  If anyone has free time tonight, they should do some research on how language models are created and structured and hopefully develop some context to the changes the other team made. I heard them talking 
  about the dictionary as well, so it may be worth spending a bit of time looking at that, but their plans with the dictionary sounded much less concrete.

Stephen

  Since they are deleting their results once they harvest them, I wonder if we might be able to write a script or something that monitors ongoing decode jobs and, when they finish, copies the pertinent
  files to a private directory so that we may review the results before they get trashed.
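  A minimal sketch of such a watcher (every path and the polling interval are hypothetical; it waits for a finished scoring.log and snapshots the experiment directory):

    mkdir -p ~/snapshots
    while true; do
      for d in /mnt/main/Exp/0275/*/ ; do                       # hypothetical target experiments
        [ -f "${d}scoring.log" ] && cp -rn "$d" ~/snapshots/    # -n: keep earlier copies intact
      done
      sleep 600   # poll every 10 minutes
    done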

Sam

  Stephen and everyone else,
  That will be awfully tricky. However, it would be pretty easy to tell if they are replacing their decode.log files with fake ones. If you do a "ls -lu decode.log" it will tell you the last time the file
  was accessed. So as long as you do not go into that file or do anything to it other than that command, it should show the last time run_decode.pl wrote to it. If you do the same command on
  run_decode.pl, the times should be pretty similar. This can also be done when looking at the scoring.log and hip.trans files. These are generally created within the same minute or so, so you can check
  if the times match up.
  Another trick that can be used is to compare the corpus size they used (ls -l feat | wc -l) with their scoring.log. If the scoring.log is extremely short and the corpus is 125-256hr, then odds are they
  replaced it with a shorter train's scoring.log.
  If you want to see if they have been modifying their sphinx_decode.cfg, sphinx_train.cfg, feat.params, or any LM files, just do an "ls -lu <filename>" to see when each was last touched. This way you can
  stay up to date on their most recent changes.
  That is just some hacky shit I've been doing, among other "ls" tricks, to investigate what they have been doing. Anyone else have any spying methods?

4-23-2015

Zach

  There was an issue identified earlier in the semester that had something to do with a mismatch between the words contained in some combination of the transcript, dictionary, and language model. Does
  anyone remember exactly what the issue was, the magnitude of it, and whether or not it ever got resolved?
  The way Forrest was talking made it sound like the other group is editing something outside of the system to be inserted immediately before they run their last train. I know for a fact they're working
  with the dictionary and language model, so it would make sense if they are manually adding missing words to something. Normally, when I ask Forrest about what the other group is doing, he gets a smug
  grin on his face and talks about how much he wishes he could tell me, but when I pitched this to him, his face went blank and he changed the subject.

Sam

  All missing words should be fixed now. The reason there were missing words before was that the wrong dictionary was being used. Jonas and I fixed it so that it would pull from the right dictionary. If
  we want, we can make our own dictionary. We just need to grab every word out of the 125 or 256 transcript file and write it to a file. We then need to grab all those words out of the master dictionary
  so that we get the phonetic spelling of them.
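  A rough sketch of that extraction (file names are hypothetical, and it assumes the transcript's utterance ids and markers have already been stripped so only words remain):

    tr ' ' '\n' < 256hr_train.trans | sort -u > words.txt    # every unique word in the transcript
    # keep only the master-dictionary entries whose headword is in words.txt
    awk 'NR==FNR { want[$1]=1; next } $1 in want' words.txt master.dic > custom.dic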


4-25-2015

Kayla

  Heyy, 
  So I was tasked with creating the wiki page that contains the setting configurations and I'm just curious if I should wait until after the competition between our groups? In case there's anything that  
  would give away our tactics.. Or is it general knowledge and it shouldn't matter and I can create it this weekend? Just wanted to know what you guys thought. 

4-27-2015

Kayla

  Hey everyone, what was/is the lowest word error rate % that we have gotten so far? Also, I just had a question about decoding. I have so far run 2 decodes on my 125 hour experiment, but I am not sure if
  I was doing it correctly.
  Each time you go to decode, once a run_decode.pl script has finished running, do you just go back into etc and enter the same command?
  head -1000 001_train.fileids > 001_decode.fileids  (obviously changing the parameters)
  Because that is what I did the second time around decoding, but the fact that it is the head command, and I don't know much about Linux commands, threw me off. I know that it means the first however
  many files, but I didn't know if that meant I was decoding over the same first hours of the 125 hour train more than once, if that makes sense?

Zach

  I'm not sure what our lowest error rate is. Sam would have to answer that.
  For your second question: you are re-decoding the first 1,000 files. To get the next 1,000, you do:
  head -2000 001_train.fileids | tail -1000 > 001_decode.fileids
  The number that goes after 'head' is the position of the last file you want to decode. The number that goes after 'tail' is the total number you want to decode. So, 'head -2000' takes everything up
  through the 2,000th file, and 'tail -1000' keeps that file and the 999 before it, i.e., files 1,001-2,000. Also, note that there are 5,074 files in total, so if you are decoding in sets of 1,000, your
  last decode should actually cover 1,074 files rather than 1,000. Does that make sense?
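  For the final window Zach mentions, the command would be (a sketch, using the same file names):

    head -5074 001_train.fileids | tail -1074 > 001_decode.fileids   # files 4,001-5,074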

Sam

  The lowest WER is still 38.5%, with a 4x real time factor. We need to get the real time factor to 2x, so I expect the WER to increase by a few percent. This is okay because the other team will also feel
  the same effects when they adjust for the real time factor.

4-28-2015

Kayla

  Okay, because on my 125 hour train I still would have decoding to do... but I did it with head -7000 I believe, and so far the result is 32.9 WER. Do you think it would be worth it to continue decoding
  to see if it goes up or down?

Sam

  How long has it been running? 32.9 WER is very good, and is considerably less than last year's results, I believe. Keep decoding it, I say.

Ben

  Greetings gents,
  Per Kenneth's request a week-and-a-half ago (or so), I'm putting together a list of affected properties from all of your logs and this email chain, which should go a long way towards addressing this
  report's desired "list of parameters" section.
  You can help. Can you each send me a short list of all the values you changed throughout the project? I can dig for your results myself, so there's no need for you to spend a lot of time being
  detailed, but it might be helpful to section them off into "things that kinda worked" and "things that didn't" if you can manage.
  I will compile it with what I have already into a comprehensive list of parameter changes. I might need some help from Sam, Stephen, Zach, or others to complete the full description section, but I can
  get it 90% of the way there, since it's right in line with the documentation I'm already making.
  While I'm at it, do you want me to take a shot at the intro and stuff too? I haven't seen any progress on this, so I'm not sure if somebody has started. Let me know; I'll be in tomorrow with my
  documentation so far.
  PS: Made some changes to the results spreadsheet for readability. Added graphs. Have a look. Let me know what you think.

Kayla

  Wait, I'm confused. Ken, maybe you can help with this... but how is that different from the page that I am creating? Because I was also tasked with listing what the configurations and settings were.
  Just want to be clear we aren't doing the same thing, Ben!


Kenneth

  Ben's is supposed to be an overall strategy the group has been using. Yours is supposed to be instructional on how to change settings during config and decode. Ben's is for the results of the
  competition. Yours is for future semesters, on how to config, not what to config, plus any errors or bugs. Sorry for the confusion.

Adam

  Attached are the results of my 125 hour. I don't know where to post this... I haven't figured out our private Bruins page. Settings I used: Language weight -- 30, Density -- 64, Senone -- 8000. When
  setting up the decode, I only did head -1000. Someone please let me know if you'd like me to run something different.

Kayla

  So I took your advice and am running a second decode on the 125 hour since you said it was a good WER. I believe I ran the first decode using the first 7000 audio files, just using head -7000 (etc..). So, 
  for the second decode set, I tried to follow Zach's instructions that he had sent me about how to pick up where I left off and just want to make sure it is correct: 
  head -8000 | tail -1000 (etc...) for the next 1000 files?

4-29-2015

Ben

  Hi again all, 
  Here's a fancier-looking updated spreadsheet. I worked with Adam a bit on this as well as helped him through his decode process. I added a projection line which will predict one week past our final entry.  
  I'd also like to eventually separate our data into 5, 125, and 256 sections (rather than short (5) and long (256)).
  I removed some anomalously high data from the successful experiments run late in the process, mostly just to make us look good by highlighting our progress rather than our regressions. Interestingly our 
  progress on 256 hour trains is pretty close to exactly linear, making our "consistent improvement" angle very strong should we choose to go that route.
  Let me know what you think. When Chris is finished compiling results, I could add them here. Or, as mentioned, Kenneth could. Either way.

5-1-2015

Adam

  Just finished my 125 hour decode, this time on all 5566 files and not just the first 1,000. Scored a 35.5. When I only ran the first 1,000 I scored a 34.5. Everything else stayed the same for both
  experiments.
  LW: 30
  Density: 64
  Senone: 8000

5-3-2015

Stephen

  Hey Kenneth,
  If you want my 256hr train results and parameters, let me know. The results were far worse than the shorter trains, I believe 49.3%. Still, if they are worth mentioning in the paper I can send the info
  over. If it makes sense to include this info just let me know, take care!

Kayla

  Hey Ken and everyone, I scored my 125 hour train in total and the result for WER was an even 34%. I don't know how this ranks among the rest of the scored trains, so I can give you the parameters I changed 
  if you would like Ken.

Zach

  So everyone knows what's going on: the other team is most likely submitting 0275/007 as their final result. This is a 256hr train that scores at around 32.5%. They used 128 density in this train,
  though, which is useless for real-world applications. The plan is to mention this in our report, and the judges should give us the win, both because their error rate without 128 density is a fair bit
  higher than ours and because we evidently did more research than they did. We also have more backing for our other changes than they do, as they just changed random values to get all of their results.
  We are going to be submitting our lowest 256hr train (0271/003), which has an error rate of 38% or something like that. We don't have a real-time factor for this yet, so I'm running a decode on it on
  majestix right now, and I'm going to start decoding the first hour and a half or so on traubadix just to ensure we at least have a decent sample size for our RT.