Speech:Spring 2013 Brian Drouin Log



Week Ending February 5, 2013

 * Tasks:

1. Continue researching and reading about the system and hardware

2. Start learning Linux/Unix and explore the current system

3. Research findings from previous semesters regarding backups and potential solutions

4. Begin researching other backup options

2/3/13 into 2/4 9:00pm to 1:00pm - I spent a few hours researching and exploring the current system and its hardware. I also spent some time reading about Unix, reviewing the hardware information documented by previous semesters, and researching backup solutions.

2/4/13 4:00pm to 7:30pm - I spent this time gathering hardware and networking information on caesar. I found a number of useful commands by searching the wiki and the internet. I logged the session and will pull out the useful commands and information to share with the team on 2/6.

2/5/13 10:00am to 12:30pm - Reviewed my logged session output from 2/4 and put together the list of commands below to share with the group.

2/5/13 9:00pm to 10:00pm - Mapped the network and put together a brief outline, listed below under "Network Information". I'm seeing an unknown device in caesar's ARP cache: millie1.miller1.unh.edu (132.177.189.40). I'll ask about this device during class on 2/6.

- Update: found the device in question while researching the network bridge setup article; 132.177.189.40 is the name server.

2/6/13 8:00am to 9:45am - Continued to read logs and information from previous semesters relating to backups. Will discuss during class to determine what needs to be backed up and how much space is required to support a backup.


 * Results:

Here is a list of helpful commands for general use and for gathering information about the file system, hardware, OS, and network.

man - display the manual (help) page for a command

Hardware and System


 * hwinfo - extended hardware information
 * hwinfo --short --wlan - short summary of wireless LAN hardware
 * hwinfo --short --gfxcard - short summary of the graphics card
 * lspci - list PCI devices
 * ethtool - Ethernet card settings
 * free -m - memory usage in MB
 * ps - process status
 * top - list processes and their resource usage
 * who - usernames of whoever is currently logged in
 * uname -r - check kernel version
 * ls -l /dev/disk/by-id - list disks by their persistent IDs
 * df -h - disk space usage per mounted file system (human-readable sizes)
 * dmidecode -t memory - memory configuration, including type and bank location (requires root)

Networking


 * arp - show the ARP cache (IP-to-MAC mappings of hosts seen on the network)
 * netstat - networking info
 * netstat | head - summary network information
 * netstat -rn - show routes
 * nslookup - query name servers
 * ping - ICMP echo request
 * ip - show or configure network interfaces, addresses, and routes
 * route - routing information
 * ip route - find your gateway
 * cat /etc/resolv.conf - find your DNS servers
 * ethtool - configure Ethernet settings

File System


 * ls -a - list all files, including hidden files
 * dir - list directory contents
 * vdir - verbose directory info
 * fdisk -l - list all disks and partitions
 * cfdisk - partition table manipulation
 * fsck - file system consistency check and repair <<< only run if you suspect problems with the drives, and only on an unmounted file system
 * mkfs - build (format) a file system on a partition
 * mount - mount a file system; with no arguments, lists mounted file systems
 * cat /proc/mounts - lists mounted file systems
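The session logging from 2/4 could also be scripted. Below is a minimal bash sketch, assuming a hypothetical helper named `collect_info`: it runs a chosen subset of the commands above, writes each one's output under a header into a single log file, and notes (rather than fails on) any command that isn't installed. Some commands, such as dmidecode, only give full output when run as root.

```shell
#!/usr/bin/env bash
# collect_info LOGFILE CMD...
# Hypothetical helper: runs each command string and appends its output to
# LOGFILE under a "=== command ===" header. A command whose program is not
# installed is noted and skipped instead of aborting the whole run.
collect_info() {
    local log="$1"
    shift
    : > "$log"                              # start with an empty log
    local cmd prog
    for cmd in "$@"; do
        prog="${cmd%% *}"                   # first word = program name
        echo "=== $cmd ===" >> "$log"
        if command -v "$prog" > /dev/null 2>&1; then
            # eval lets entries such as "netstat | head" keep their pipes
            eval "$cmd" >> "$log" 2>&1
        else
            echo "(skipped: $prog not installed)" >> "$log"
        fi
        echo >> "$log"
    done
}

# Example on caesar:
# collect_info caesar-info.log "uname -r" "free -m" "df -h" "lspci" \
#     "cat /proc/mounts" "ip route" "cat /etc/resolv.conf"
```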

Network Information

Caesar (Server)        192.168.10.1


 * Gateway             132.177.188.1  at 00:18:19:f3:42:70 [ether] on eth0
 * Asterix (Client)    192.168.10.2   at 00:0f:1f:69:6f:65 [ether] on eth1
 * Obelix (Client)     192.168.10.3   at 00:0f:1f:03:e6:83 [ether] on eth1
 * Miraculix (Client)  192.168.10.4   at 00:0f:1f:03:e6:43 [ether] on eth1
 * Traubadix (Client)  192.168.10.5   at 00:0f:1f:69:6a:f3 [ether] on eth1
 * Majestix (Client)   192.168.10.6   at 00:0f:1f:03:e6:73 [ether] on eth1
 * Idefix (Client)     192.168.10.7   at 00:0f:1f:03:e1:5d [ether] on eth1
 * Automatix (Client)  192.168.10.8   at 00:0f:1f:03:e1:5b [ether] on eth1
 * Methusalix (Client) 192.168.10.9   at 00:0d:56:fd:77:1d [ether] on eth1
 * Verleihnix (Client) 192.168.10.10  at 00:0b:db:94:c9:98 [ether] on eth1

millie1.miller1.unh.edu 132.177.189.40 at 00:50:56:a6:24:a6 [ether] on eth0 - name server
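To spot unexpected entries like millie1 more quickly, the ARP cache can be compared against a list of expected addresses. This is a minimal bash sketch under two assumptions: the cache is read in the `arp -a` line format shown in the comment, and the (hypothetical) file of known IPs holds one address per line.

```shell
#!/usr/bin/env bash
# unknown_arp_hosts KNOWN_IPS_FILE < arp_output
# Reads "arp -a" style lines, e.g.
#   millie1.miller1.unh.edu (132.177.189.40) at 00:50:56:a6:24:a6 [ether] on eth0
# and prints "IP MAC IFACE HOSTNAME" for every entry whose IP is not listed
# in KNOWN_IPS_FILE.
unknown_arp_hosts() {
    awk -v known="$1" '
        BEGIN {
            # Load the expected IP addresses (first field of each line)
            while ((getline line < known) > 0) {
                split(line, f, " ")
                if (f[1] != "") ok[f[1]] = 1
            }
            close(known)
        }
        NF >= 4 {
            ip = $2
            gsub(/[()]/, "", ip)            # "(132.177.189.40)" -> "132.177.189.40"
            if (!(ip in ok)) print ip, $4, $NF, $1
        }
    '
}

# Example on caesar:
# arp -a | unknown_arp_hosts known_ips.txt
```

Anything it prints is not necessarily a problem (millie1 turned out to be the name server); it only means the address is not on the expected list yet.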


 * Plan:

I plan to discuss my findings with the group on 2/6 and to continue to research backup options. I need more information on what needs to be backed up.


 * Concerns:

No major concerns at this time

Week Ending February 12, 2013

 * Task:


 * Continue to investigate backup options discussed by previous semesters (Dropbox, Google Sites) and evaluate them


 * Research Clonezilla. We also discovered that openSUSE offers a backup service


 * Draft a proposal for backups based on the best option


 * Results:

2/10/13 10:00pm to 11:15pm - I read about Clonezilla. It seems like a viable option, depending on the availability of external hard drive space. We could acquire some external hard drives, or we could clone the servers onto some of the other hardware in the server room. Before performing this backup, we need to figure out exactly how much space is needed and where we will get the disk space.

http://clonezilla.org/

2/11/13 various times throughout the day - Exchanged emails with the group regarding the plan for the rest of the semester. I need more clarification on how to accomplish a full system backup with Clonezilla. I logged into the system to find out exactly how much HDD space is being used in the current configuration. We will also need to estimate how much HDD space will be needed once the system is running, including trains, models, and decodes.
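One way to total the space currently in use is to add up the "Used" column of `df`. This bash sketch (the function name `used_gib` is hypothetical) reads `df -P -k` output, where `-P` keeps each file system on one line and `-k` reports 1 KiB blocks, and skips RAM-backed file systems. Clonezilla typically saves only the used blocks of supported file systems, so the total is a rough first estimate of the image size before compression, not an exact figure.

```shell
#!/usr/bin/env bash
# used_gib < df_output
# Sums the "Used" column (3rd field, 1 KiB blocks) of "df -P -k" output,
# skipping the header line and RAM-backed file systems, and prints the
# total in GiB with one decimal place.
used_gib() {
    awk '
        NR == 1 { next }                          # header line
        $1 ~ /^(tmpfs|devtmpfs|udev)$/ { next }   # nothing on disk to image
        { used += $3 }
        END { printf "%.1f\n", used / 1048576 }   # KiB -> GiB
    '
}

# Example on caesar (-l limits df to local file systems, so NFS mounts
# are not counted):
# df -P -k -l | used_gib
```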

2/12/13 12:00pm to 1:00pm - Began writing a proposal for using Clonezilla to image the servers. One limitation discovered is that the partition to be cloned must be unmounted. I need to find a way to unmount it for the backup and then remount it on the clients, or find another option.


 * Tyler and I have been discussing this option. We can set up a scheduled maintenance shutdown of the system. After a shutdown, the clients unmount from Caesar, which would give us the opportunity to back up.
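Because Clonezilla cannot image a mounted file system, a quick check before the maintenance window is whether a device still appears in /proc/mounts (one of the file system commands listed earlier). A minimal bash sketch using a hypothetical `is_mounted` helper; the optional second argument exists only so the check can be tried against a saved copy of the mounts file.

```shell
#!/usr/bin/env bash
# is_mounted SOURCE [MOUNTS_FILE]
# Returns 0 if SOURCE, or one of its numbered partitions (/dev/sda ->
# /dev/sda1, /dev/sda2, ...), is listed as a mounted source in MOUNTS_FILE
# (default: /proc/mounts). An NFS source is matched exactly.
is_mounted() {
    local dev="$1" mounts="${2:-/proc/mounts}"
    awk -v dev="$dev" '
        # Field 1 of /proc/mounts is the mounted source
        $1 == dev || index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Example before imaging a disk:
# if is_mounted /dev/sda; then
#     echo "/dev/sda is still mounted; unmount it before cloning" >&2
# fi
```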

2/12/13 1:00pm to 2:45pm - Drafted a project proposal for the hardware team and emailed it to the group. Each member can add to it based on the work they've done so far. We will review this as a group on 2/13.


 * Plan:


 * Concerns:

Week Ending February 19, 2013

 * Task:

1. Draft a proposal for the clonezilla backup solution


 * Results:

2/13/13 - 1:30pm to 3:00pm - Worked with the team in an attempt to restore caesar. We replaced the failed hard drive and attempted to bring the server back online.

2/16/13 - 1:00pm to 1:30pm - Read emails explaining how Eric was able to restore caesar.

2/18/13 - 12:00pm to 2:00pm - Started drafting the proposal for the Clonezilla image backup solution. It will be posted when completed, prior to the class/team meeting on 2/20. I also updated our group's section of the project proposal to cover the drive failure on Caesar.

2/19/13 - 6:00pm to 8:00pm - completed drafting the proposal.


 * Plan:

Proposal for a Clonezilla backup solution

Summary

The UNHM CIS790 capstone class is using a system of ten interconnected servers to run speech recognition software on an open source Linux operating system (OpenSUSE 12.2). The main server, a Dell PowerEdge 2650 with the hostname Caesar, serves as the system's NFS repository, with all other clients (Dell PowerEdge 1750s) in the stack sharing Caesar's 500GB of hard drive space in a mounted configuration. This system will become increasingly prone to hardware failures as the equipment ages. With an estimated 200 hours of transcripts, plus all of the data derived from producing language models, it's essential to back up not only the data but the entire system through imaging.

In a traditional file backup, only specified data is copied to external storage. Imaging software, however, creates a backup of an entire drive, capturing everything in the production system. Imaging the system will greatly reduce restoration time, since an image restores the running configuration as well as the programs that are currently installed.

Benefits

1. A complete backup of the entire system configuration

2. Reduced downtime in the event of a failure

3. Easily performed by system technicians

4. A free solution (provided that external HDD space can be acquired)

Scope

This backup project will also serve as the foundation for future backup solutions and will show future classes how to orchestrate a backup using the methodology outlined here.

Methodology

As stated on the Clonezilla site, the software cannot back up a mounted system, so we will have to shut down in order to perform the backup. The backup will occur during a scheduled maintenance period in which the system can be powered down.

Confirm that you have the correct path name before executing any commands! Failure to do so could cause data loss or leave your GNU/Linux system unable to boot. Note that /dev/sdd is a device path name, while /dev/sdd1 is a partition path name.

1. Download the Clonezilla Live zip file.

2. If you already have a FAT16 or FAT32 partition on your USB flash drive, skip to the next step (3). Otherwise, prepare a partition of at least 200 MB formatted with either a FAT16 or FAT32 file system. If the USB flash drive or USB hard drive does not have any partition, you can use a partitioning tool (e.g. gparted, parted, fdisk, cfdisk or sfdisk) to create a partition with a size of 200 MB or more.

Here we assume your USB flash drive or USB hard drive is /dev/sdd on your GNU/Linux system (you have to confirm your device name, since it's _NOT_ always /dev/sdd), so the partition table looks like this:

    # fdisk -l /dev/sdd

    Disk /dev/sdd: 12.8 GB, 12884901888 bytes
    15 heads, 63 sectors/track, 26630 cylinders
    Units = cylinders of 945 * 512 = 483840 bytes
    Disk identifier: 0x000c2aa7

       Device Boot      Start         End      Blocks   Id  System
    /dev/sdd1   *           1       26630    12582643+   b  W95 FAT32

Then format the partition as FAT with a command such as "mkfs.vfat -F 32 /dev/sdd1"

WARNING! Executing the mkfs.vfat command on the wrong partition or device could leave your GNU/Linux system unable to boot. Be sure to confirm the command before you run it.

    # mkfs.vfat -F 32 /dev/sdd1
    mkfs.vfat 2.11 (12 Mar 2005)

3. Insert your USB flash drive or USB hard drive into the USB port on your Linux machine and wait a few seconds. Next, run the command "dmesg" to find the device name of the USB flash drive or USB hard drive. Let's say, for example, that you find it is /dev/sdd1. In this example, we assume /dev/sdd1 has a FAT file system and is automatically mounted in the directory /media/usb/. If it's not automatically mounted, mount it manually with commands such as "mkdir -p /media/usb; mount /dev/sdd1 /media/usb/".

4. Unzip all the files and copy them onto your USB flash drive or USB hard drive. You can do this with a command such as "unzip clonezilla-live-*.zip -d /media/usb/". Keep the directory structure; for example, the file "GPL" should be in the top directory of the USB flash drive or USB hard drive (e.g. /media/usb/GPL).

5. To make your USB flash drive bootable, first change the working directory, e.g. "cd /media/usb/utils/linux", then run "bash makeboot.sh /dev/sdd1" (replace /dev/sdd1 with your USB flash drive's device name) and follow the prompts. WARNING! Executing makeboot.sh with the wrong device name could leave your GNU/Linux system unable to boot. Be sure to confirm the command before you run it.

NOTE: There is a known problem if you run makeboot.sh on Debian Etch, since the program utils/linux/syslinux does not work properly there. Make sure you run it on a newer GNU/Linux distribution, such as Debian Lenny, Ubuntu 8.04, or Fedora 9.

TIP: If your USB flash drive or USB hard drive is not able to boot, check the following:

 * Ensure that your USB flash drive contains at least one FAT partition.
 * Ensure that the partition is marked as "bootable" in the partition table.
 * Ensure that the partition starts on a cylinder boundary. For the first partition this is usually sector 63.
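The first two checks in the tip can be read straight from `fdisk -l` output, like the /dev/sdd example above. A bash sketch with a hypothetical `check_usb_partition` helper; it assumes the classic fdisk column layout shown above (Device, Boot, Start, End, Blocks, Id, System) and does not check the cylinder-boundary condition.

```shell
#!/usr/bin/env bash
# check_usb_partition PARTITION < fdisk_l_output
# Finds PARTITION (e.g. /dev/sdd1) in "fdisk -l" output and reports whether
# it carries the boot flag ("*") and a FAT type Id
# (6 = FAT16, b/c = W95 FAT32, e = W95 FAT16 LBA).
# Exit status: 0 = bootable FAT, 1 = some check failed, 2 = not found.
check_usb_partition() {
    awk -v part="$1" '
        $1 == part {
            seen = 1
            boot = ($2 == "*")
            id = boot ? $6 : $5        # the "*" column shifts the fields by one
            fat = (id ~ /^(6|b|c|e)$/)
        }
        END {
            if (!seen) { print "not found"; exit 2 }
            b = boot ? "bootable" : "NOT bootable"
            f = fat ? "FAT" : "not FAT (Id " id ")"
            print b, f
            exit !(boot && fat)
        }
    '
}

# Example:
# fdisk -l /dev/sdd | check_usb_partition /dev/sdd1
```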

Risk Analysis

This is a low-risk project, as it doesn't change any of the system's configuration. The only risk is the downtime required while the backup runs.

Project Coordination

The systems team will schedule a maintenance window and will ensure that the class is aware of the outage before the maintenance event.


 * Concerns:

Emailed the team with the proposal for review. If the team approves it, it will be sent to Prof. Jonas for final approval.

Week Ending February 26, 2013

 * Task:


 * Results:

2/20/13 - 1:00pm - 3:30pm - worked with the group in an attempt to restore SSH to caesar.

2/20/13 - 3:30pm - 4:00pm - signed up for the URC

2/22/13 - 12:30pm - 1:30pm - added the proposal to the Systems group page (Pending review) - added a brief timeline for the backup solution to the proposal page.

2/25/13 - 12:30pm - 1:30pm - Read through Mike's proposal for the Fedora OS upgrade and suggested improvements by email. I suggested adding a comparison chart outlining the differences between our current version of openSUSE and the latest Fedora release, and perhaps other Linux distributions as well.

2/25/13 - 11:00pm - 12:00am - read through logs

2/26/13 - 8:30pm to 10:00pm - read through the project proposal page as well as logs from the system group.


 * Plan:

Last week we were unable to attempt an image of one of the batch servers due to the SSH issue. I would like to try a backup this week during our class meeting.
 * Concerns:

Week Ending March 5, 2013

 * Task:


 * Results:

3/3/13 10:30pm - 12:00am - Researched using Clonezilla to back up RAID configurations. The information so far is inconclusive; I will research more.

3/4/13 11:30am - 1:00pm - Read through the logs of the system group and continued the research started on 3/3.


 * Plan:


 * Concerns:

Week Ending March 12, 2013

 * Task:

1. Create some diagrams of the server stack in Visio

2. Create a process for the Clonezilla backup


 * Results:

3/6/13 - 2:00pm to 4:00pm - worked with the group to install fedora on a spare drive used in caesar's chassis.


 * Plan:


 * Concerns:

Week Ending March 26, 2013

 * Task:

3/17/13 10pm - 11pm - started reading through logs from the Modeling Group. Still need to catch up on the work that they've done in preparation for my own modeling.

3/18/13 5pm-9pm - Started working on a Visio diagram showing the rack and servers for this project.

3/23/13 10:30pm to 12:00am - Read through the newly revised document that Eric sent out to familiarize myself with how to set up experiments. I'll run through the process on the system on 3/24 and will contact Eric or the group if I run into any issues.

3/25/13 5:00pm to 6:30pm - Ran a clonezilla backup on my local machine and created a procedure for backing up the servers.

3/25/13 8:00pm to 10:30pm - Completed the rack diagram and adjusted it based on feedback from Tyler.

3/26/13 7:00pm to 9:30pm - Attempted to create a train based on the instructions given, but ran into an issue generating the feats data. I'm sure it's something I did incorrectly, but I can't figure out where my mistake is. Emailed Eric asking him to look at this on 3/27 during class. See experiment 0038 for the error.

Clonezilla Backup Procedure:

Requirements:
 * CD or USB with clonezilla installed
 * External hard drive for storing the image


 * Download the Clonezilla ISO from http://clonezilla.org/
 * Use a program such as Nero to burn it to a disc
 * Insert the disc
 * Boot the server and go into the BIOS
 * Change the boot settings to boot from the disc
 * Press Enter to continue
 * Choose your language
 * Select “don’t touch key map” and hit Enter
 * Start Clonezilla
 * Select "device-image" (work with disks or partitions using images)
 * Select the local device option (the external USB HDD)
 * Hit Enter on the default “Do you want to use a USB drive as a clonezilla repository”
 * Select your hard drive
 * Choose the directory in which you want to save the image
 * Choose to save the entire disk
 * Name the image
 * Select the disk that you’re backing up (the server’s OS disk) and hit Enter to continue
 * Answer yes when asked if you’re sure you want to continue
 * After the backup has completed, you’ll be prompted to either reboot or shut down
 * Select reboot (and boot into Windows); your disc will be ejected and your image will be saved on the external drive

Restore Procedure:


 * Insert the Clonezilla disc
 * Boot the server and go into the BIOS
 * Change the boot settings to boot from the disc
 * Press Enter to continue
 * Choose your language
 * Select “don’t touch key map” and hit Enter
 * Start Clonezilla
 * Select "device-image" (work with disks or partitions using images)
 * Choose the local hard drive on which the image is stored
 * Find the directory that contains the image
 * Use Beginner mode
 * Instead of saving the entire disk, choose “restore disk”
 * Choose the image file to restore
 * Select the drive that you want to restore
 * You’ll be asked twice whether you’re sure. Answer yes to both prompts.




 * Results:


 * Plan:


 * Concerns:

Week Ending April 2, 2013
Complete an experiment
 * Task:


 * Results:

Stuck on the "Generate Feats data" section of training. I'll consult the group on 3/27/13 to see if anyone has experienced this and knows how to resolve it.

3/30/13 - 10:00am to 11:00am - Reviewed Eric's suggestions and his debugging of the experiment that failed earlier this week (0038). I'll attempt to work through it again either today or on 4/1/13. I also added the rack diagram to the Hardware page in the Information section, and added the list of helpful and more detailed Unix commands to the Unix Notes page in the Information section.

4/1/13 - 1:00pm to 2:30pm - Attempted to finish experiment 0038 and got through running the first train but was interrupted. I hope to get the full experiment done between 6:30pm and 10:00pm. Eric found a stray colon in the sphinx_train.cfg file that I must have added while editing in vi. After removing it, I was able to successfully complete the gen feats step. I have a list of words that produced warnings and will work on getting those added to the dictionary or removed from the transcript.

4/2/13 - 2:00pm to 3:00pm - Updated the list of words that threw warnings during the train with their phones from the CMU site.

4/2/13 - 8:00pm to 9:00pm - Attempted to update the dictionary with the list of words, but am now getting many phone errors, which seem to be related to missing stress indicators. Working with Eric to get an updated working dictionary. After adding the new dictionary from the mini/train corpus to the experiment, the train is now running without issue. After this completes, I'll create the language model and start the decode. Scoring to be completed on 4/3. 10:00pm - Created the language model and began the decode.

Results

    caesar Exp/0038> /mnt/main/scripts/train/scripts_pl/make_feats.pl -ctl /mnt/mExp/0038/etc/0038_train.fileids
    Configuration (e.g. etc/sphinx_train.cfg) not defined
    Compilation failed in require at /mnt/main/scripts/train/scripts_pl/make_feat line 43.
    BEGIN failed--compilation aborted at /mnt/main/scripts/train/scripts_pl/make_s.pl line 43.

Words that errored during the initial train.

FEDERALDES

DUCTWORK

COGNIZITIVE

CHOWPHERD

ALBRIDGE

SOUTHBEND

VOCALIZED

MOOSEWOOD

VOCALIZED

DADGUM

EXPERIENCEWISE

VOCALIZED

CANSEGO

HOPELY

VOCALIZED

VOCALIZED

STORLY

VOCALIZED

KID'LL

REINJURING

NFL

PE

PE

VOCALIZED

VOCALIZED

UNDERGRADS

GTE

IBM

VOCALIZED

VOCALIZED

VOCALIZED

MARYLANDER

MARYLANDER

MARYLANDER

PLANOITE
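A list like the one above can also be produced before training by comparing every transcript word against the dictionary. The bash sketch below assumes the transcript format "WORD WORD ... (utterance-id)" and a dictionary whose first field on each line is the word; `oov_words` is a hypothetical name. Filler words that live only in the filler dictionary would also be reported, since only the main dictionary is read.

```shell
#!/usr/bin/env bash
# oov_words DICT TRANS
# Prints each transcript word that has no entry in DICT, sorted and
# de-duplicated. Utterance ids in parentheses and sentence markers are skipped.
oov_words() {
    awk '
        FILENAME == ARGV[1] { dict[$1] = 1; next }   # dictionary: word is field 1
        {
            for (i = 1; i <= NF; i++) {
                w = $i
                if (w ~ /^\(.*\)$/) continue          # e.g. (sw2005a-ms98-a-0052)
                if (w == "<s>" || w == "</s>" || w == "<sil>") continue
                if (!(w in dict)) print w
            }
        }
    ' "$1" "$2" | LC_ALL=C sort -u
}

# Example:
# oov_words etc/0038.dic etc/0038_train.trans
```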

The warning below occurred after incorrect additions to the dictionary. The CMU site wasn't able to look up many of the words that threw warnings during the initial train and directed me to another site, "LOGIOS". When I added some of these words to the dictionary, either the phones or the lexical stresses were incorrect.

WARNING: This phone (UW2) occurs in the phonelist (/mnt/main/Exp/0038/etc/0038.phone), but not in any word in the transcription (/mnt/main/Exp/0038/etc/0038_train.trans)

We then added a new working dictionary from the mini/train corpus, which ran successfully after we corrected two words in it. I was then able to run a successful train, create the language model, and run the decode.
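A warning like the UW2 one above means the phone list and the dictionary disagree. This bash sketch (hypothetical name `phone_mismatches`) lists phones that appear in the phone list but in no dictionary pronunciation, and vice versa. It is a simplified check: SphinxTrain compares against the words actually used in the transcript, and filler phones such as SIL will show as unused unless the filler dictionary's entries are included in the file passed as DICT.

```shell
#!/usr/bin/env bash
# phone_mismatches PHONELIST DICT
# PHONELIST: one phone per line. DICT: "WORD PH PH ..." per line.
# Prints "unused: P" for listed phones never used in DICT and
# "unlisted: P" for phones used in DICT but missing from PHONELIST.
phone_mismatches() {
    awk '
        FILENAME == ARGV[1] { listed[$1] = 1; next }
        { for (i = 2; i <= NF; i++) used[$i] = 1 }   # fields after the word
        END {
            for (p in listed) if (!(p in used)) print "unused: " p
            for (p in used) if (!(p in listed)) print "unlisted: " p
        }
    ' "$1" "$2" | LC_ALL=C sort
}

# Example:
# phone_mismatches etc/0038.phone etc/0038.dic
```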


 * Plan:

I have a much better understanding of what to do, but I have not yet been able to complete the experiment. I hope to have it completed and scored on 4/3.
 * Concerns:

Week Ending April 9, 2013
4/3/13 - 1:30 - Investigating how to resolve the errors that I received when scoring on 4/2. Following the steps to remove the redundant entries in 0038_train.trans.
 * Task:

Missing: (sw2005a-ms98-a-0052) (sw2020b-ms98-a-0018) (sw2022a-ms98-a-0005) (sw2028a-ms98-a-0049) (sw2234a-ms98-a-0007) (sw2245a-ms98-a-0166)

I was able to complete the experiment by running the command uniq hyp.trans >> hyp.trans.uniq to remove the duplicates. I then ran sclite -r 0038_train.trans -h hyp.trans.uniq -i swb >> scoring.log to score the new hyp.trans.uniq file, which threw one duplicate entry error: "sw2245a-ms98-a-0166". I removed the duplicate entry in vi and ran the scoring again, which was successful. The experiment page 0038 has been updated with the results.
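`uniq` only removes duplicate lines that are adjacent and identical, which may be why one duplicate survived the first pass. The sketch below keys on the utterance id instead, assuming each line ends with "(utterance-id)" as in the missing list above; the helper names are hypothetical. It keeps the first hypothesis seen for each id, and a companion function lists ids present in the reference transcript but absent from the hypotheses.

```shell
#!/usr/bin/env bash
# utt_ids FILE: prints the id inside the trailing "(...)" of each line.
utt_ids() {
    sed -n 's/.*(\([^()]*\))[[:space:]]*$/\1/p' "$1"
}

# dedupe_hyp HYP: keeps only the first line seen for each utterance id,
# whether or not the duplicates are adjacent or identical.
dedupe_hyp() {
    awk '
        NF == 0 { next }                   # skip blank lines
        !($NF in seen) { seen[$NF] = 1; print }   # last field = "(utterance-id)"
    ' "$1"
}

# missing_ids REF HYP: ids in the reference transcript with no hypothesis.
missing_ids() {
    LC_ALL=C comm -23 <(utt_ids "$1" | LC_ALL=C sort -u) \
                      <(utt_ids "$2" | LC_ALL=C sort -u)
}
```

For example, `dedupe_hyp hyp.trans > hyp.trans.uniq` before running sclite as above. Writing with `>` instead of `>>` means rerunning the command replaces the file rather than appending a second copy.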

4/7/13 11:30pm to 12:30am - set up the experiment directories for both Exp 0074 and 0075 by running the setup_SphinxTrain.pl script. I then edited the sphinx_train.cfg files in both experiment directories in preparation for the train to be started on 4/8.
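Hand-editing in vi is how the stray colon got into 0038's configuration, so scripting the edits is one option when the same change goes into several experiment directories. A bash sketch with a hypothetical `set_cfg_value` helper; it assumes each setting sits on its own line in the form `$NAME = value;` with spaces around the `=`, and the variable names and values in the examples are illustrative only.

```shell
#!/usr/bin/env bash
# set_cfg_value FILE NAME VALUE
# Rewrites the line "$NAME = ...;" in FILE as "$NAME = VALUE;". VALUE is
# inserted verbatim, so string values must include their own quotes.
# Returns 1 (and leaves FILE untouched) if no such line exists.
set_cfg_value() {
    local file="$1" name="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    if awk -v name="$name" -v value="$value" '
        $1 == "$" name && $2 == "=" { print "$" name " = " value ";"; found = 1; next }
        { print }
        END { exit !found }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"     # overwrite in place, keeping the file permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (illustrative value):
# for d in 0074 0075; do
#     set_cfg_value "/mnt/main/Exp/$d/etc/sphinx_train.cfg" CFG_N_TIED_STATES 3000
# done
```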

4/8/13 5:15pm to 6:00pm - Found the pronunciations for 10 words to be used in the add.txt file after today's train runs. Emailed Eric the results.

4/9/13 12:00 to 1:00pm - Read through some logs. Began putting together information useful for the URC systems group poster


 * Results:


 * Plan:


 * Concerns:

Week Ending April 16, 2013
4/13/13 - Although this post wasn't updated on 4/13 (due to a 14hr shift at work), I read through the assignments/goals for this week and caught up on email communication between the group.

4/14/13 - Read through the group's logs to find out where the group was on this week's tasks. Only two tasks were left; Exp 0080 through Exp 0082 were completed. I created the language model for Exp 0083. All that's left is to run the decode and score it. Updated the group log in Google Docs and on the group page on the wiki. Responded to emails from the systems group regarding the URC poster.

4/15/13 - modified the sphinx_train.cfg file and generated the transcript file for 0083.

4/16/13 - read through the group assignment sheet to catch up on the current state of 0083.


 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 23, 2013
4/17/13 - Created bullet points for the Systems group URC poster.

4/19/13 - Reviewed the URC poster. No other suggestions for improvement.


 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 30, 2013
4/24/13 - Went over this week's goals with the group

4/27/13 - Read through logs and assignment sheet

4/29/13 - Started the decode on Exp0094. I'll score the experiment and create the 0094 wiki article. After decoding and scoring we ended up with a word error rate of around 57%. The experiment is complete and the wiki article has been created. Further analysis is needed to determine why the error rate was this high.

SYSTEM SUMMARY PERCENTAGES by SPEAKER

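    ,-----------------------------------------------------------------.
    |                            hyp.trans                            |
    |-----------------------------------------------------------------|
    | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
    |---------+-------------+-----------------------------------------|
    |=================================================================|
    | Sum/Avg |  437   6474 | 56.6   35.4    8.0   14.1   57.6   99.1 |
    |=================================================================|
    |  Mean   | 36.4  539.5 | 57.0   35.4    7.7   15.1   58.1   99.3 |
    |  S.D.   |  8.3  143.2 |  7.1    5.9    2.7    5.4    7.7    1.6 |
    | Median  | 32.5  546.5 | 58.0   36.0    6.5   15.7   60.0  100.0 |
    `-----------------------------------------------------------------'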

Successful Completion
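As a worked check on the Sum/Avg row of the 0094 table (using sclite's standard definitions, where Sub, Del and Ins are substitutions, deletions and insertions as percentages of the N = 6474 reference words), the error rate is the sum of the three error types, and every reference word is either correct, substituted, or deleted:

```latex
\mathrm{Err} = \frac{S + D + I}{N} \times 100 \approx 35.4 + 8.0 + 14.1 = 57.5
\qquad
\mathrm{Corr} + \mathrm{Sub} + \mathrm{Del} = 56.6 + 35.4 + 8.0 = 100.0
```

The printed Err of 57.6 differs from 57.5 only because each column is rounded separately. At roughly 57.6% of 6474 words, that is about 3,700 word errors. Because insertions are not bounded by N, this error rate can in principle exceed 100%.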

4/29/13 - Worked with Eric to create a new Exp (0096) in which new sphinx models were used. The decode is running overnight. I'll score and post the results on 4/30.

4/30/13 - The decode on Exp 0096 completed overnight. I scored it this morning and we ended up with an average word error rate of 72.0%. The Exp0096 wiki article has been created and the results have been posted there.

SYSTEM SUMMARY PERCENTAGES by SPEAKER

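    ,-----------------------------------------------------------------.
    |                            hyp.trans                            |
    |-----------------------------------------------------------------|
    | SPKR    | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
    |---------+-------------+-----------------------------------------|
    |=================================================================|
    | Sum/Avg |  437   6474 | 37.1   48.3   14.6    9.2   72.0   99.8 |
    |=================================================================|
    |  Mean   | 36.4  539.5 | 37.7   48.2   14.0    9.6   71.9   99.7 |
    |  S.D.   |  8.3  143.2 |  7.1    4.0    5.5    3.4    6.7    1.1 |
    | Median  | 32.5  546.5 | 37.4   47.8   13.3    9.4   71.9  100.0 |
    `-----------------------------------------------------------------'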

Successful Completion


 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 7, 2013
5/3/13 - Read through this week's tasks and goals. Read through some logs. I may be able to work on some tasks on Exp0097 today and will email Charlie Haynes regarding the final report to see if there's anything I can do to help with that.

5/4/13 - Set up the experiment directory for Exp 0098, modified the sphinx_train.cfg file, ran genTrans, and copied over the dictionary from 0089 and the filler dictionary. Commented on the assignment sheet asking the group to check 0097.dict before moving forward. I'm not sure whether copying over 0089.dict and renaming it 0097.dict was correct.

5/6/13 - Ran genTrans6.pl on the last_5hr/Train corpus for 0099 and copied the dictionary from 0089 and the filler dictionary into 0099/etc. Grep turned up a few closing brackets in 0099_train.trans, which I sent to Eric. I generated the phones and feats and held off until these were corrected. Eric corrected genTrans6.pl and I was able to continue. I then ran into another issue: I had missed adding "SIL" to 0099.phones. Once that was corrected, I was able to complete the train and create the experiment wiki page. There is no decode / score for this experiment.
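Both problems in this entry can be caught with grep before generating phones and feats. A bash sketch with a hypothetical `check_trans_and_phones` helper: it prints transcript lines that still contain square or curly brackets (possibly markup that genTrans did not strip) and confirms that SIL appears on a line of its own in the phone list.

```shell
#!/usr/bin/env bash
# check_trans_and_phones TRANS PHONES
# 1. Prints (with line numbers) any transcript lines containing [ ] { or }.
# 2. Reports if SIL is not a whole line in the phone list.
# Returns 0 only if both checks pass.
check_trans_and_phones() {
    local trans="$1" phones="$2" status=0
    if grep -n '[][{}]' "$trans"; then
        status=1
    fi
    if ! grep -qx 'SIL' "$phones"; then
        echo "SIL missing from $phones"
        status=1
    fi
    return "$status"
}

# Example:
# check_trans_and_phones etc/0099_train.trans etc/0099.phones
```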

5/6/13 - Began working on Exp 0100. Setting up the directory and modifying sphinx_train.cfg had already been completed. I generated the transcript, set up 0100 with the dictionary and filler dictionary, and generated the phones and feats. Left off here due to other obligations.

5/7/13 - Read through logs and the group assignment sheet. Created the experiment page for 0100. The group will update the page with the score once completed.


 * Task:


 * Results:


 * Plan:


 * Concerns: