Speech:Spring 2014 Mitchell Dezak Log



Week Ending February 4th, 2014

 * Task:

Review past class logs.

 * Results:

2/1- Looked at past logs to see what other groups did in the first week. Mostly quick, simple stuff.

 * Plan:

Get a basic understanding of what our group is supposed to accomplish and get on board with everyone.

 * Concerns:

Nothing so far.

Week Ending February 11, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending February 18, 2014

 * Task:

Find the genTrans scripts, make sure that they are up to date, and see whether they could use any improvements. I will also read about them on the genTrans page on the wiki. I want to know more about the data backup and the speech corpus.

 * Results:

2/16- Read up on the speech corpus and genTrans. Last year's logs said that some of the scripts might be missing data. It looks as though the missing data was added to that script later on. Perl was used to do this, which I'm not too familiar with, so depending on what I find I might have to add some missing data using Perl as well. Looking at what was done, it doesn't look too complicated, but it is still something that I am not familiar with.

2/17- Looking to download the genTrans file to my computer to review it. I know this was done last year and some issues were run into, so hopefully those were resolved.

 * Plan:

Access and make any updates that are necessary to the genTrans. It looks like getting familiar with Perl might have to happen sooner rather than later.

 * Concerns:

Week Ending February 25, 2014
2/20- Read some logs and wrote a couple tasks for the week below.

2/23- Read up on Perl, starting to understand it better. Analyzed some of the code as best I could.

2/24- Kept analyzing the genTrans, thinking of changes to be made.

2/25- Logged in, read logs


 * Task:
 * Get a basic understanding of Perl
 * Once I get a basic understanding of Perl, start to work out the genTrans

 * Results:

2/23- So far I am noticing that Perl isn't much different from other programming languages, which doesn't surprise me. That doesn't mean that I am able to make out all of the genTrans, but it is progress. Luckily, the genTrans scripts have comments explaining the code throughout. Some of them are a bit vague, so parts of the code are still not easy to understand.

For example:

$message =~ s/sw[0-9]*[A-B]-ms98-a-[0-9]* [0-9]*.[0-9]* [0-9]*.[0-9]* //; # remove everything before the message

That is from genTrans6.pl. The code seems pretty confusing at first glance, and the comment doesn't tell me very much. There is more code like this to look into.
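
To make that substitution less confusing, here is a minimal shell sketch of the same idea using sed -E instead of Perl. The sample line is made up to fit the pattern (an utterance ID followed by a start time and an end time). It is not copied from the corpus, and strip_prefix is a hypothetical helper name, not part of genTrans. Unlike the original, the sketch escapes the dots so that they only match a literal decimal point.

```shell
# Hypothetical helper: strip the utterance ID and the two timestamps,
# leaving only the spoken words (an illustration, not part of genTrans).
strip_prefix() {
    sed -E 's/^sw[0-9]*[A-B]-ms98-a-[0-9]* [0-9]*\.[0-9]* [0-9]*\.[0-9]* //'
}

# Made-up sample line in the shape the pattern expects.
printf '%s\n' 'sw2001A-ms98-a-0001 0.000000 0.977625 hi um yeah' | strip_prefix
# prints: hi um yeah
```

As I read it, sw[0-9]*[A-B] matches the conversation number and speaker side, -ms98-a-[0-9]* matches the utterance number, and the two [0-9]*\.[0-9]* groups match the start and end times. Everything up to and including the space after the second time is replaced with nothing.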

I have also noticed that some of the code in the genTrans scripts has been commented out. A lot of it is in genTrans5 and genTrans6.

from genTrans5:

#$message =~ s/\/.*?]//g;
#$message =~ s/\[noise\]//g; #remove [noise]
#$message =~ s/\[laughter\]//g; #remove [laughter]
#$message =~ s/\[vocalized-noise\]//g; #remove [vocalized-noise]
#$message =~ s/\[laughter-//g; #remove [laughter-
#$message =~ s/\[.*?\///g; #remove [ /
$message =~ s/(\si\-\s)/ i /g; #replace i- with i
$message =~ s/<.*?>//g; # remove < >
#$message =~ s/-\[//g; #remove -[
#$message =~ s/\]-//g; #remove ]-
#$message =~ s/\]//g; #remove ]
#$message =~ s/\[//g; #remove [

There is probably a reason behind it (maybe something that I have forgotten about), but I thought it was worth noting.

2/24- After noticing that the genTrans5 and 6 files had a lot of lines commented out, I went back and looked at genTrans4. It has the same substitutions, but they are not commented out. I found out on the page http://openitware.org/projects/index.php/Speech:GenTrans that they were commented out because those substitutions cause data to be lost. Therefore, that script did not work.

 * Plan:

I want to understand Perl so that I can understand what I am looking at in the genTrans scripts and then work through everything in those files. I don't want to jump in too quickly without knowing what I am doing; otherwise I could end up wasting a lot of time. Trial and error doesn't hurt, but I'd like to reduce error as much as possible.

 * Concerns:

Nothing at the moment.

Week Ending March 4, 2014
2/27- Logged in, read logs

3/2- Discussed with Jared and John over email what we need to work on as a group and divided the tasks among us.

3/3- Worked on Poster Proposal

3/4- Redid Poster Proposal


 * Task:

I have the Poster Proposal to work on. I'm not entirely sure what I'm going to do for it, but I plan on getting it started so that we can finalize it as a group.


 * Results:

3/3- I worked on the Poster Proposal in PowerPoint, because that seems to be the easiest way to lay out the contents of a poster on a computer. Finding graphics to use seems to be the hardest part, since we are essentially working from a command line most of the time.

3/4- I scrapped what I had before and wrote the abstract for the Poster proposal.


 * Plan:


 * Concerns:

I'm not entirely sure what the Poster Proposal is about.

Week Ending March 18, 2014
3/15- logged in


 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending March 25, 2014
3/20- Logged in. Read up on running experiments.

3/23- wrote in tasks and plans for the remaining week.

3/25- Attempted experiment


 * Task:

Learn about running experiments before running experiments.

Run my first experiment this week.


 * Results:

3/23- Read through the model building steps to get a rundown again of running an experiment. David walked me through running one during Wednesday's class. I have also gone through some logs that describe experiments that were done. I'm going to start my first experiment tomorrow.

3/25- I tried running experiments 0238 and 0239. 0238 used a senone count of 1000 and came up with the error: BEGIN failed--compilation aborted at RunAll.pl line 48. I then tried experiment 0239 with a senone count of 3000 and got the same error. Not exactly sure how to fix this. I went through the wiki pages again and haven't found anything that says I should be doing anything different, so I must be missing something.


 * Plan:

3/23- I will attempt to run my first experiment tomorrow. Results to follow.


 * Concerns:

Week Ending April 1, 2014
3/30- Worked on experiment, read logs.


 * Task:

3/30- Finish up my experiment for the week. I also need to look into how to install the SciPy stack on Fedora 19.


 * Results:

3/30- Went back to look at the experiment (0238) that I was working on. It was a 5hr train, so I had left it running in the background. When I came back to it I was not sure where it had left off, so I am going to talk to David about where to go from here.

3/31- Researched installing the SciPy stack on Fedora 19. I found the command "sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose" on the SciPy website. This command is specifically for Fedora. I checked Colby Johnson's log, since he asked the Data Group to look into this, and it appears that the command failed for him. He has the error output in his log.


 * Plan:

For the week I hope to finish my experiment and make progress on the SciPy stack.


 * Concerns:

Not exactly sure where to go with my experiment at the moment. The good news is that it has worked so far. I think it just needs the language model and decode.

Week Ending April 8, 2014
4/5- logged in

4/6- logged in. Wrote in tasks for the week

4/7- logged in. Running Experiment

4/8- logged in. 4/8 below


 * Task:

4/6- Since John got SciPy working on Fedora, I no longer have to worry about that. Colby told me to start by running a train on Rome.

4/8- Colby sent me an email about the dictionaries taking up too much disk space during the experiment process. David, Pauline, and I have been assigned the task of creating smaller dictionaries and symbolically linking them, along with the wav files and transcripts, since those are taking up space as well. We need to resolve this issue as soon as we can so that we can run experiments without having to worry about how much disk space is available.
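
A minimal sketch of what the symbolic linking could look like. This is only my guess at the approach, not what we have actually set up: link_to_shared is a hypothetical helper, and both paths are supplied by the caller because the real locations of the shared dictionaries, wav files, and transcripts have not been decided here. Note that it deletes the local copy before creating the link.

```shell
# Hypothetical helper: replace a per-experiment copy with a symbolic link
# to one shared copy, so the data only takes up disk space once.
link_to_shared() {
    shared=$1       # the single shared copy (file or directory)
    local_copy=$2   # where the experiment expects to find it
    # Refuse to do anything if the shared copy does not exist.
    [ -e "$shared" ] || { echo "missing: $shared" >&2; return 1; }
    rm -rf -- "$local_copy"
    ln -s "$shared" "$local_copy"
}
```

After this, ls -l on the local path should show an arrow pointing at the shared copy, and du should report almost no space used for it.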


 * Results:

4/7- Did a lot of reading today. Logs, articles, anything I could find. Also, I'm currently in the process of running a mini/train on Rome. Will post the results when it is finished.


 * Plan:

My plan this week is to run a train and decode on Rome.


 * Concerns:

I haven't run a train without the help of others yet, but I think I will be alright.

Week Ending April 15, 2014
4/12- Logged in. Began deleting wav files from past experiments. Found how much space they take up and which ones need to be deleted first.

4/13- Logged in.

4/14- Logged in. Deleted most of the wav files

4/15- Logged in. Showed amount of space available on disk


 * Task:

4/12- To free up some space, I will need to begin deleting wav files from past experiments. The wav files from last year's experiments (0001-0134) are fine to delete, since they are not really of use to us anymore and are just taking up space. I will start with the bigger ones and then work my way to the smaller ones.

4/14- Delete all of the wav files in experiments 0001 through 0134, or at least the ones that I have not deleted already.


 * Results:

4/12- To find how much space the wav files take up in each experiment, I ran the command du -sk */wav in the /mnt/main/Exp directory. The -s option prints one total per wav directory and the -k option reports sizes in 1 KB blocks. You could simply run du */wav in the same directory, but the sizes are not guaranteed to be in 1 KB blocks and the output will also list anything nested inside the wav directories. There should not be anything nested in them. However, the wav directory of Exp 0019 has extra items nested inside of it for some reason. Not sure how that happened, but the files that are in there already exist in the Exp/0019 directory, so they are either duplicates or they came from somewhere else.

After running du -sk */wav I got the results:

23596	0001/wav
4	0005/wav
4	0006/wav
17664	0007/wav
17664	0008/wav
17664	0009/wav
17664	0010/wav
17688	0011/wav
112888	0012/wav
9296	0013/wav
8908	0014/wav
111972	0015/wav
112464	0016/wav
1131664	0017/wav
17688	0018/wav
59984	0019/wav
225140	0020/wav
4	0021/wav
17688	0022/wav
202536	0023/wav
112888	0024/wav
112888	0025/wav
112464	0026/wav
112464	0027/wav
112464	0028/wav
17688	0029/wav
4	0030/wav
17688	0031/wav
112888	0033/wav
112464	0034/wav
225304	0035/wav
225268	0036/wav
17688	0037/wav
112888	0038/wav
112888	0039/wav
112892	0040/wav
112888	0041/wav
112464	0042/wav
112888	0043/wav
112888	0044/wav
112464	0045/wav
225304	0048/wav
112464	0049/wav
112888	0050/wav
112464	0051/wav
112464	0052/wav
112464	0053/wav
112464	0054/wav
112464	0055/wav
112464	0056/wav
112464	0057/wav
112888	0058/wav
112888	0059/wav
112888	0060/wav
112464	0061/wav
4	0062/wav
112888	0063/wav
112464	0064/wav
112888	0065/wav
112888	0066/wav
112888	0067/wav
112888	0068/wav
4	0069/wav
112888	0070/wav
112464	0071/wav
112464	0072/wav
112888	0073/wav
672028	0074/wav
66592	0075/wav
701968	0076/wav
701968	0077/wav
701968	0078/wav
66592	0080/wav
672028	0081/wav
672028	0082/wav
672028	0083/wav
701968	0084/wav
216	0085/wav
701968	0086/wav
66592	0087/wav
701968	0088/wav
672028	0089/wav
66592	0090/wav
66592	0091/wav
77716	0092/wav
77716	0093/wav
66592	0094/wav
112888	0095/wav
66592	0096/wav
358732	0097/wav
672028	0098/wav
343012	0099/wav
34032	0100/wav
672032	0101/wav
672032	0102/wav
34032	0103/wav
66592	0104/wav
672048	0105/wav
672032	0106/wav
672028	0107/wav
66592	0108/wav
66592	0109/wav
66592	0110/wav
112888	0111/wav
66592	0112/wav
66592	0113/wav
343012	0114/wav
34032	0115/wav
701968	0116/wav
77716	0117/wav
66600	0118/wav
672096	0119/wav
66592	0120/wav
672024	0121/wav
66616	0122/wav
672008	0123/wav
66560	0124/wav
672024	0125/wav
20	0126/wav
1373936	0127/wav
66560	0128/wav
1373928	0129/wav
4	0131/wav
4	0132/wav
672008	0133/wav
672012	0135/wav
668568	0136/wav
66356	0137/wav
66560	0138/wav
66560	0139/wav
672008	0140/wav
672008	0141/wav
672008	0142/wav
112892	0143/wav
1124560	0144/wav
0	0145/wav
4	0146/wav
0	0147/wav
701968	0148/wav
701968	0149/wav
0	0150/wav
4	0151/wav
57328	0152/wav
57328	0153/wav
8976	0154/wav
8976	0155/wav
358736	0156/wav
701968	0157/wav
0	0158/wav
4	0159/wav
4	0160/wav
358736	0161/wav
400	0162/wav
358736	0163/wav
1373972	0164/wav
358748	0165/wav
404	0166/wav
358740	0167/wav
701972	0168/wav
358736	0169/wav
701972	0170/wav
701972	0171/wav
701968	0172/wav
701968	0173/wav
701972	0174/wav
701972	0175/wav
701972	0176/wav
57328	0177/wav
1373936	0178/wav
702024	0180/wav
1373968	0181/wav
701968	0182/wav
66592	0183/wav
0	0184/wav
4	0185/wav
4	0188/wav
358736	0189/wav
57328	0190/wav
0	0191/wav
3452	0192/wav
77716	0193/wav
0	0194/wav
77720	0195/wav
77720	0196/wav
77720	0197/wav
672012	0198/wav
701968	0201/wav
701968	0203/wav
10299096	0205/wav
66600	0206/wav
420272	0207/wav
701972	0209/wav
4	0211/wav
701972	0212/wav
57332	0213/wav
17688	0214/wav
358740	0215/wav
496972	0216/wav
17688	0217/wav
701972	0218/wav
4	0219/wav
17688	0220/wav
112892	0221/wav
701972	0222/wav
112892	0223/wav
112888	0224/wav
4	0225/wav
112892	0226/wav
112892	0227/wav
4	0228/wav
4	0229/wav
701976	0230/wav
4	0231/wav
4	0232/wav
400	0233/wav
4	0234/wav
4	0235/wav
4	0236/wav
112888	0237/wav
4	0239/wav
113004	0240/wav
112888	0241/wav
672032	0242/wav
112892	0243/wav
112888	0246/wav
672032	0247/wav
112888	0248/wav
112888	0249/wav
112888	0250/wav

On the left is the amount of space, in kilobytes (1K blocks), taken up by each experiment's wav directory. Some are enormous, while some are just 4 or 0, which aren't even worth touching. The listing covers the wav directories of experiments up to 0250. It appears that some have already been deleted and some never had wav files to begin with.
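
Since I want to delete the bigger ones first, sorting the du output by size saves scanning the list by eye. A minimal sketch, assuming the standard sort and head utilities are available on caesar. rank_by_size is a hypothetical helper name, and the three sample lines are taken from the results above.

```shell
# Sort "size<TAB>path" lines (the format du -sk prints) from largest to smallest.
rank_by_size() {
    sort -rn -k1,1
}

# Three lines from the du results, to show the ordering.
printf '23596\t0001/wav\n4\t0005/wav\n1131664\t0017/wav\n' | rank_by_size

# On the server this would be run as:
#   cd /mnt/main/Exp && du -sk */wav | rank_by_size | head -n 20
```

The sample prints 0017/wav first, then 0001/wav, then 0005/wav.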

caesar:/mnt/main/Exp # df
Filesystem      1K-blocks       Used  Available  Use%  Mounted on
/dev/sda2        20641788    6470800   13122364   34%  /
devtmpfs          1035460        168    1035292    1%  /dev
tmpfs             1035668        332    1035336    1%  /dev/shm
/dev/sda3        47838744     344784   45063876    1%  /home
/dev/sdb1       458311888  366289540   68741404   85%  /mnt/main

caesar:/mnt/main/Exp # df -h
Filesystem      Size   Used  Avail  Use%  Mounted on
/dev/sda2        20G   6.2G    13G   34%  /
devtmpfs       1012M   168K  1012M    1%  /dev
tmpfs          1012M   332K  1012M    1%  /dev/shm
/dev/sda3        46G   337M    43G    1%  /home
/dev/sdb1       438G   350G    66G   85%  /mnt/main

It appears that /mnt/main is 85% full. The first command, df, shows how much space is used and available in 1K blocks. The df -h command makes the output human readable by scaling the numbers to units such as M and G.
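
To check where the 85% comes from, here is a small sketch of the arithmetic. It assumes df computes Use% as used divided by (used + available), rounded up, which is my understanding of how GNU df works. use_percent is a hypothetical helper name.

```shell
# Use% as I understand df to compute it: used / (used + available), rounded up.
use_percent() {
    used=$1
    avail=$2
    total=$((used + avail))
    echo $(( (used * 100 + total - 1) / total ))
}

use_percent 366289540 68741404   # /mnt/main from the df output above
# prints: 85
```

Used plus Available (435030944) is less than the 1K-blocks column (458311888). The gap is most likely space the filesystem holds in reserve, so the percentage is not simply Used divided by the total size.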

4/14- To go in and delete the wav files I have to log in on caesar as root. I then go into the /mnt/main/Exp directory. From there I navigate back and forth from Experiment to Experiment getting rid of the wav files. It is very repetitive.

From the /mnt/main/Exp directory I cd into the experiment I need to:

caesar:/mnt/main/Exp # cd 0086

I then created a directory inside of that experiment so that I could move the wav directory into it for deletion. I could simply delete the wav directory itself, but a simple typo could lead to deleting more than I intend (the rm -rf command is not forgiving). To create a directory I used mkdir. I can name it almost anything as long as the name is unique within this directory, so that I am not deleting anything else. Calling it DELETE, or in this case DELTE, will do just fine.

caesar:/mnt/main/Exp/0086 # mkdir DELTE

Now when I list everything in the directory, the new directory, DELTE, shows up.

caesar:/mnt/main/Exp/0086 # ls
0086.html bwaccumdir  etc   logdir              python      trees
add.txt   DECODE      feat  model_architecture  qmanager    wav
bin       DELTE       LM    model_parameters    scripts_pl

I need to move wav into the DELTE directory so that I can delete it cleanly. To do this I used the command mv wav DELTE. This simply moves wav into the DELTE directory. It is not yet deleted, it is just in a different directory.

caesar:/mnt/main/Exp/0086 # mv wav DELTE

When I cd into DELTE, you can see that the wav directory has been moved there.

caesar:/mnt/main/Exp/0086 # cd DELTE
caesar:/mnt/main/Exp/0086/DELTE # ls
wav

Now all I have left is to delete the DELTE directory. I used the command rm -rf DELTE in the directory where DELTE is located. This command deletes the DELTE directory and everything inside it without asking if I am sure.

caesar:/mnt/main/Exp/0086 # rm -rf DELTE

The wav directory is gone from Experiment 0086.

caesar:/mnt/main/Exp/0086 # ls
0086.html bwaccumdir  feat    model_architecture  qmanager
add.txt   DECODE      LM      model_parameters    scripts_pl
bin       etc         logdir  python              trees

After that I cd back into the experiments directory and move onto the next one and do it again.
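
Since the steps are the same for every experiment, they could be wrapped in a loop. This is a sketch of how I might automate it, not what I actually ran. remove_wav_dirs is a hypothetical helper that only reports what it would remove unless --delete is given, so a typo in the range does less damage.

```shell
# Hypothetical helper: remove the wav directory from each experiment in a range.
# Usage: remove_wav_dirs BASE FIRST LAST [--delete]
# FIRST and LAST are plain numbers without leading zeros (1, not 0001).
remove_wav_dirs() {
    base=$1
    n=$2
    last=$3
    mode=${4:-}
    while [ "$n" -le "$last" ]; do
        # Experiment directories are named with four digits, e.g. 0086.
        target="$base/$(printf '%04d' "$n")/wav"
        if [ -d "$target" ]; then
            if [ "$mode" = "--delete" ]; then
                rm -rf -- "$target"
                echo "removed $target"
            else
                echo "would remove $target"
            fi
        fi
        n=$((n + 1))
    done
}

# Dry run first, then the real thing:
#   remove_wav_dirs /mnt/main/Exp 1 134
#   remove_wav_dirs /mnt/main/Exp 1 134 --delete
```

Experiments in the range that have no wav directory are skipped silently.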

4/15- Deleting the wav files cleared a significant amount of space. When I looked last night, /mnt/main was actually at 80%, but now it is at 81%.

caesar main/Exp> df
Filesystem      1K-blocks       Used  Available  Use%  Mounted on
/dev/sda2        20641788    6464144   13129020   33%  /
devtmpfs          1035460        168    1035292    1%  /dev
tmpfs             1035668        332    1035336    1%  /dev/shm
/dev/sda3        47838744     344784   45063876    1%  /home
/dev/sdb1       458311888  351872128   83158816   81%  /mnt/main

caesar main/Exp> df -h
Filesystem      Size   Used  Avail  Use%  Mounted on
/dev/sda2        20G   6.2G    13G   33%  /
devtmpfs       1012M   168K  1012M    1%  /dev
tmpfs          1012M   332K  1012M    1%  /dev/shm
/dev/sda3        46G   337M    43G    1%  /home
/dev/sdb1       438G   336G    80G   81%  /mnt/main
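
To put a number on the space recovered, the two Used values for /mnt/main can be subtracted. A small sketch using the figures from the 4/12 and 4/15 df output. freed_kb is a hypothetical helper name.

```shell
# Difference between two "Used" values from df (both in 1K blocks).
freed_kb() {
    echo $(( $1 - $2 ))
}

before=366289540   # Used on /mnt/main on 4/12
after=351872128    # Used on /mnt/main on 4/15
kb=$(freed_kb "$before" "$after")
echo "$kb KB freed, about $((kb / 1024)) MB"
# prints: 14417412 KB freed, about 14079 MB
```

That is a little under 14 GB, which matches the drop from 350G to 336G in the df -h output.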


 * Plan:

My plan for this week is to clear all the wav files from experiments 0001 through 0134, then help David generate and move new files into the info directory of each corpus.
 * Concerns:

Obviously, deleting something that I am not supposed to. I am not that concerned, though.

Week Ending April 22, 2014
4/19- Logged in


 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 29, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 6, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns: