Speech:Spring 2017 Systems Group




Tasks

Week Ending February 21st, 2017

SSH keygen guide

Start out by logging into caesar with your Active Directory username and password. Once you have done that, type ssh-keygen into the terminal.

[jhc1@caesar ~]$ ssh-keygen

Press enter at the follow-up prompts to accept the default values.

Enter file in which to save the key (/mnt/main/home/sp17/jhc1/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Take note of the locations where the keys are stored:

Your identification has been saved in /mnt/main/home/sp17/jhc1/.ssh/id_rsa.
Your public key has been saved in /mnt/main/home/sp17/jhc1/.ssh/id_rsa.pub.

At this point, we need to create the symbolic link:

[jhc1@caesar ~]$ cd .ssh
[jhc1@caesar ~/.ssh]$ ln -s id_rsa.pub authorized_keys
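Because home directories live under the NFS-shared /mnt/main, this link should let you SSH between caesar and the drones without a password (assuming the drones mount the same home directories). A quick sanity check, using rome as an example drone:

[jhc1@caesar ~/.ssh]$ ssh rome hostname

If this prints "rome" without prompting for a password, the key is working. If you are still prompted, check that ~/.ssh and authorized_keys are not group- or world-writable, since sshd ignores keys with loose permissions.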

Week Ending February 28th, 2017

Step 1: OS Installation

To begin, acquire a RedHat installation disk from the disc case located on the server rack.

If you are attempting to install Red Hat from a CD on one of the PowerEdge 1750 servers, YOU WILL NEED AN OPTICAL DRIVE!

1. To install Red Hat, boot from the CD. When prompted, select "Install or upgrade an existing system" and hit enter.

2. On the following screen, you will be prompted to test the media (you can skip this). Then hit enter.

3. On the following splash screen, click next.

4. Select the installation language (English). Click next.

5. Select your keyboard layout (U.S. English). Click next.

6. On the following screen, select "Basic Storage Devices". Click next.

7. Enter the hostname (e.g., caesar, miraculix, obelix). Click next.

8. Select the time zone (Eastern Time). Click next.

9. Enter the root password. This will be the machine's local root password; ask Professor Jonas for it. Click next.

10. For the installation type, select "Use All Space". Click next.

11. On the following screen, since we would like a nice UI, select Desktop. Then go down and select the "customize now" radio button. Click next.

12. On the customize installation screen, select "base system", then check the following options: "compatibility libraries", "Legacy UNIX compatibility", and "Network Tools". Optional: if this server will host NFS, also select "server" in the left-hand menu and check "NFS file server". Click next.

13. Congratulations! You've successfully installed Redhat! Click Reboot when the installation process finishes.

Step 2: Network setup

Each server has two NICs. Caesar (the main server) uses one to connect to the Internet and one for the local drone network (192.168.10.0/24). The drones also use both NICs: the primary sits on the drone network, and the secondary sits on a different subnet (172.16.0.0/24) connected to Rome (the drones reach the Internet via Rome).


Host File Configuration

First, let's configure the host file. To configure the host file, navigate to /etc/ and open the file named hosts.

Add the IP address and corresponding host name for each machine.

192.168.10.1    caesar caesar
192.168.10.2    asterix asterix
192.168.10.3    obelix obelix
192.168.10.4    miraculix miraculix
192.168.10.6    majestix majestix
192.168.10.7    idefix idefix
192.168.10.9    methusalix methusalix
192.168.10.11   rome rome 
192.168.10.12   brutus brutus

For more information on configuring hosts, check out this link

(Optional) Change a hostname after installation

To change a hostname after installation, open the network file in the vi (or nano) editor:

vi /etc/sysconfig/network

In the editor, set the hostname

NETWORKING=yes
HOSTNAME=brutus
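The setting in /etc/sysconfig/network takes effect at the next boot. To apply the new name to the running system immediately as well, the standard hostname command works:

[root]# hostname brutus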


NIC setup

1. Open a terminal. This can be done by right-clicking on the desktop and selecting "Open in Terminal".

2. Enter
ifconfig -a
to view all network cards.

3. Copy the MAC address of the NIC you wish to use, e.g. 00:21:70:XX:XX:XX. You will need this to configure the Ethernet adapter.

4. Navigate to /etc/sysconfig/network-scripts/. In here you should find a file named ifcfg-ethX, where X represents the Ethernet number relative to the OS.

Note: If you installed Red Hat on one server (the one with a CD drive) and then moved the hard drive to a different server, the X values will increase (e.g., to eth3 and eth4) because the OS has recorded the MAC addresses of the original NICs; it will not create a new config file for the new hardware.

Solution: Rename the file to match the new Ethernet X value, e.g. ifcfg-eth3.

5. Configure the network card. The example below sets up Ethernet adapter 4 for obelix.


DEVICE=eth4
IPADDR=192.168.10.3
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
HWADDR=00:0F:1F:03:E6:84
TYPE=Ethernet
UUID=50e385fd-3457-4668-9eb5-6befce9b54ee
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static

It is important that you set NM_CONTROLLED=no. This tells NetworkManager not to manage the adapter, so the settings in this file are used as-is.

For the static IPs, please refer to the host file above.

For more information about configuring NICs, check out this link
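After saving the config file, restart the network service and confirm the adapter came up with the expected address (eth4 here matches the obelix example above):

[root]# service network restart
[root]# ifconfig eth4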

Step 3: Configure DNS on the Drones

To configure name resolution on a drone, add the following to /etc/resolv.conf:

search caesar
 
nameserver 132.177.128.99
nameserver 132.177.128.56
nameserver 132.177.102.30

When completed, restart the network service:

service network restart

The drone's network settings should be good to go!
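To confirm that routing and name resolution both work from a drone, a simple check (assuming outbound ICMP is allowed through Rome):

[root]# ping -c 3 unh.edu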

Step 4: Mount Caesar NFS on Drones

Since we don't want to install Sphinx on every drone, Caesar is set up to host /mnt/main over NFS to share resources and save disk space.

For this, the NFS utilities are required. If they were not installed during the initial installation, install them with this command:

yum install nfs-utils

Next, create a /mnt/main directory on the drone to mount Caesar's share on:

mkdir /mnt/main

Then try mounting Caesar's share using the following command:

mount -t nfs caesar:/mnt/main /mnt/main
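If the mount fails, showmount (included in nfs-utils) can list what Caesar is actually exporting:

[root]# showmount -e caesar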

Since we don't want to mount Caesar's network share manually every time the system reboots, add the following line to /etc/fstab:

caesar:/mnt/main        /mnt/main               nfs     defaults        0 0

When you're done, save the file.
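To test the fstab entry without rebooting, you can unmount the share and let mount -a (which mounts everything listed in fstab) bring it back:

[root]# umount /mnt/main
[root]# mount -a
[root]# df -h /mnt/main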

Next, add a cron job to automatically mount the file-system periodically just in case the connection between the drone and Caesar is lost. Open the terminal as root user and enter the following:

crontab -e


Once the editor opens, enter the following line, which runs mount -a at midnight, 6 AM, noon, and 6 PM every day:

0 0,6,12,18 * * * mount -a

Week Ending March 7, 2017

Registering Red Hat Linux on Cinnabar

How to (re)register your system against the Red Hat Satellite server cinnabar.unh.edu

1) If your system is already registered through RHN Classic, you want to use the migration script that is available - it not only registers your system against cinnabar, it will also remove the system from the RHN Classic registration system, which will make it easier for us to keep track of the systems that still need to be migrated.

2) You need to ensure that the following packages are installed. This is required for either a system being re-registered from RHN Classic to Subscription Management, or for a new system being registered for the first time.

a) If you are still registered via the RHN Classic system, then you can install these RPM packages via the normal mechanism.

# yum -y install subscription-manager subscription-manager-gui subscription-manager-migration

b) OR, if you have just installed a system, then you will need to hunt down the same RPM packages on the ISO, and install them using rpm:

# rpm -ivh subscription-manager*.rpm

3) For either newly created systems or systems being re-registered, you need to install this package/certificate so the subscription-manager utility (or the migration script, which uses it) will talk to the cinnabar Satellite server, rather than directly back to Red Hat.

# wget http://cinnabar.unh.edu/pub/katello-ca-consumer-cinnabar.unh.edu-1.0-1.noarch.rpm
# yum -y --nogpgcheck install katello-ca-consumer-cinnabar.unh.edu-1.0-1.noarch.rpm

OR, you can try doing it directly:

# rpm -Uvh http://cinnabar.unh.edu/pub/katello-ca-consumer-cinnabar.unh.edu-1.0-1.noarch.rpm

The instructions on the cinnabar GUI ("Register Host") that cover the same step refer to katello-ca-consumer-latest.noarch.rpm, but that is the same file as the one in my example.

They will be the same files for the foreseeable future. I prefer to use the specifically named package that actually has the Satellite hostname on it. Your choice, though.

4) IMPORTANT! Remove yum repo metadata left over from other sources

# yum clean all
# rm -rf /var/cache/yum/*

Otherwise the next time you do a yum update after re-registering with cinnabar, you might get errors about the repomd file having a bad crc, having the wrong timestamp, etc.

This is probably only really important on systems being re-registered/re-linked to the cinnabar Satellite after having been on RHN Classic, but it won't hurt.

5) If you are registering a new system that has never been registered with Red Hat RHN Classic, skip ahead to step #7.

6) What follows is a (redacted) session from when I ran the Red Hat migration tool, which takes your system out of the RHN Classic network and then plugs it into the new Subscription Management system.

The tool is provided in the subscription-manager-migration RPM you installed in the first step.

EXAMPLE MIGRATION SESSION: ---

   # rhn-migrate-classic-to-rhsm --org="<Our SADA cinnabar Org name>" --activation-key="<One of our SADA cinnabar Activation keys>"
   Legacy username: unh_enterprisegroup	<<<<< Our old org account from Red Hat RHN Classic - use the one I gave you
   Legacy password: 				<<<<< the password for that account
   Retrieving existing legacy subscription information...
   +-----------------------------------------------------+
   System is currently subscribed to these legacy channels:
   +-----------------------------------------------------+
   rhel-x86_64-server-6
   rhel-x86_64-server-optional-6
   +-----------------------------------------------------+
   Installing product certificates for these legacy channels:
   +-----------------------------------------------------+
   rhel-x86_64-server-6
   rhel-x86_64-server-optional-6
   Product certificates installed successfully to /etc/pki/product.
   Preparing to unregister system from legacy server...
   System successfully unregistered from legacy server.
   Attempting to register system to destination server...
   The system has been registered with ID: a606ec07-f600-45e3-ba34-a33dbecf1f77 
   Installed Product Current Status:
   Product Name: Red Hat Enterprise Linux Server
   Status:       Subscribed
   System 'cs01.unh.edu' successfully registered.
   #

--- The Legacy account/password information is what allows the script to remove/deregister your system from RHN Classic.

NEW: After running the migration script, you should ensure that:

a) All copies of /etc/sysconfig/rhn/systemid have been removed. They are not part of the Satellite/Subscription Management mechanism, and can confuse things.

b) In /etc/yum/pluginconf.d/rhnplugin.conf, ensure that "enabled = 0". This also is part of the old RHN Classic system, and needs to be disabled.

Your system should now show up on the cinnabar Satellite server, but you are not completely done. Go to step #8.

7) If you are registering a system that has never been registered, or has been previously registered to the Red Hat Subscription Management system (but NOT the RHN Classic system):

This session was from re-registering a system that was already in Subscription Management.

# subscription-manager register --force --org="<Your cinnabar Org Name>" --activationkey="<Your selected cinnabar activation key>"

The system with UUID b7424cc1-5d83-4c14-b6b4-ef2ce3dc2585 has been unregistered
The system has been registered with ID: a78d0cfb-c87c-4d81-ade5-0d510faeaab3

Installed Product Current Status:
Product Name: Red Hat Enterprise Linux Server
Status:       Subscribed

If registering a system for the first time, then you do not need the --force option.

8) OK, now make sure you have the RH Common repo enabled so you can install the katello-agent package:

# subscription-manager repos --enable=rhel-6-server-rh-common-rpms

The "rhel-6" part of that will obviously need to change depending on whether this system is running RHEL5, 6, or 7. If you want to see what repos are available to you at this point:

# subscription-manager repos --list

This utility is actually pretty friendly.


9) Install the katello-agent package. It should automatically set up the goferd daemon, enable it in /etc/init.d, and also start it up immediately. Not that it would hurt to make sure by doing a "# /etc/init.d/goferd restart" afterwards.

Before you do this, you may want to re-re-flush the yum cache (again):

# yum clean all
# rm -rf /var/cache/yum/*

Then install the katello agent/goferd.

# yum -y install katello-agent
# chkconfig --list goferd

goferd 0:off 1:off 2:on 3:on 4:on 5:on 6:off

10) Unfortunately, when the katello-agent package is installed, it doesn't automagically know how to link itself to the certificates for cinnabar/Satellite that you installed in step 3. Until you fix that, goferd will probably flood /var/log/messages with certificate failure errors.

Edit this config file that goferd uses:

   /etc/gofer/plugins/katelloplugin.conf

And either comment out or delete the original line, and change the cacert config option so it points goferd to the correct certificate, katello-server-ca.pem.

   #cacert=/etc/rhsm/ca/candlepin-local.pem
   cacert=/etc/rhsm/ca/katello-server-ca.pem

Now restart goferd, and it should no longer generate those certificate errors. In fact, it should be fairly silent, with the last message from it in /var/log/messages being to the effect that it is successfully connecting to the AMQP service on cinnabar/Satellite.
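For reference, the restart uses the same init script that the katello-agent package installed:

# /etc/init.d/goferd restart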


In addition, you will probably need to execute the following command - in some/most cases, the goferd daemon will still not connect to cinnabar without it:

# katello-package-upload

11) You are done. Your system should show up under your organization on the cinnabar Satellite server as registered, and because you've installed the katello-agent/goferd mechanism, your system can now report back its RPM package status, as well as other information.

Activating License Key

Under root, use the following command:

subscription-manager register --org="UNHM" --activationkey="***license key goes here***"
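To confirm the registration succeeded, subscription-manager can report the system's identity and overall subscription status:

subscription-manager identity
subscription-manager status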

Week Ending March 14, 2017

Physical Connections/Connectivity

Below is a table that describes the connections to the physical interfaces on Caesar and the other drones. Use this guide as a point of reference in case any cables need to be replaced or swapped. Machines are ordered from top down as they are situated on the rack.

Name       | Primary Interface IP | Secondary Interface IP | Drives | Drive Specs | eth0 Connectivity               | eth1 Connectivity | Power Supply 1 | Power Supply 2
Rome       | 192.168.10.11        | 172.16.0.11            | 1      | 73GB 10k    | Enterasys port 7                | Enterasys port 32 | Plugged in     | Unplugged
Brutus     | 192.168.10.12        | N/A                    | -      | -           | Enterasys port 8                | N/A               | Plugged in     | -
Asterix    | 192.168.10.2         | 172.16.0.2             | 1      | 73GB 15k    | Enterasys port 2                | Enterasys port 31 | Plugged in     | N/A
Miraculix  | 192.168.10.4         | 172.16.0.4             | 1      | 73GB 15k    | Enterasys port 6                | Enterasys port 30 | Plugged in     | N/A
Obelix     | 192.168.10.3         | 172.16.0.3             | 1      | 73GB 15k    | Enterasys port 3                | Enterasys port 29 | Plugged in     | N/A
Majestix   | 192.168.10.6         | 172.16.0.6             | 2      | 73GB 15k    | Enterasys port 4                | Enterasys port 27 | Plugged in     | N/A
Idefix     | 192.168.10.7         | 172.16.0.7             | 1      | 73GB 10k    | Enterasys port 5                | Enterasys port 26 | Plugged in     | N/A
Methusalix | 192.168.10.9         | N/A                    | -      | -           | Enterasys port 9                | -                 | Plugged in     | N/A
Caesar     | 192.168.10.1         | N/A                    | 8      | -           | Port 37-N 1 on main patch panel | Enterasys port 1  | Plugged in     | Unplugged

(A dash marks a value that was not recorded.)

Week Ending April 25, 2017

Installation of Torque 6.0.2

http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/0-intro/torquewelcome.htm%3FTocPath%3DWelcome%7C_____0

All the documentation for Torque is at the link above. We installed Torque directly from GitHub with no issues. If you clone from GitHub, be sure to install Git first (yum install git) and make sure these packages are installed:

gcc
gcc-c++
a POSIX-compatible version of make
libtool 1.5.22 or later
boost-devel 1.36.0 or later

Make sure that you have all the necessary packages installed. We ran into an issue trying to get gcc to install; a yum update (which took about twenty minutes) resolved it, and after that gcc installed fine.

[root]# yum install libtool openssl-devel libxml2-devel boost-devel gcc gcc-c++

Open Necessary Ports

On the Torque Server Host:

[root]# iptables-save > /tmp/iptables.mod
[root]# vi /tmp/iptables.mod
				
# Add the following line immediately *before* the line matching
# "-A INPUT -j REJECT --reject-with icmp-host-prohibited"

-A INPUT -p tcp --dport 15001 -j ACCEPT
		
[root]# iptables-restore < /tmp/iptables.mod				
[root]# service iptables save

On the Torque MOM Hosts (compute nodes):

[root]# iptables-save > /tmp/iptables.mod
[root]# vi /tmp/iptables.mod
	
# Add the following lines immediately *before* the line matching
# "-A INPUT -j REJECT --reject-with icmp-host-prohibited"

-A INPUT -p tcp --dport 15002:15003 -j ACCEPT
				
[root]# iptables-restore < /tmp/iptables.mod
[root]# service iptables save

Verify the hostname

On the Torque Server Host, confirm your host (with the correct IP address) is in your /etc/hosts file. To verify that the hostname resolves correctly, make sure that hostname and hostname -f report the correct name for the host.

When getting the tarball source distribution, don't use the wget command provided in the Adaptive Computing documentation; Andrew and Julian discovered that they had moved their server. Instead use:

[root]# yum install wget
[root]# wget http://wpfilebase.s3.amazonaws.com/torque/torque-6.0.2-1469811694_d9a3483.tar.gz -O torque-6.0.2.tar.gz
[root]# tar -xzvf torque-6.0.2.tar.gz
[root]# cd torque-6.0.2/

[root]# ./configure
[root]# make
[root]# make install

Verify that the /var/spool/torque/server_name file exists and contains the correct name of the server.

[root]# echo <torque_server_hostname> > /var/spool/torque/server_name

Configure the trqauthd daemon to start automatically at system boot.

[root]# cp contrib/init.d/trqauthd /etc/init.d/
[root]# chkconfig --add trqauthd
[root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
[root]# ldconfig
[root]# service trqauthd start

By default, Torque installs all binary files to /usr/local/bin and /usr/local/sbin. Make sure the path environment variable includes these directories for both the installation user and the root user.

[root]# export PATH=/usr/local/bin/:/usr/local/sbin/:$PATH
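The export above only lasts for the current shell session. One way to make it persistent for all users (our own convention, not from the Torque docs) is to drop it into /etc/profile.d:

[root]# echo 'export PATH=/usr/local/bin:/usr/local/sbin:$PATH' > /etc/profile.d/torque.sh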

Initialize serverdb by executing the torque.setup script.

[root]# ./torque.setup root

Add nodes to the /var/spool/torque/server_priv/nodes file. See Specifying Compute Nodes (http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/Content/topics/torque/1-installConfig/specifyComputeNodes.htm) for information on syntax and options for specifying compute nodes.
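As a sketch, a nodes file for this cluster might look like the following. The hostnames come from the connectivity table above; the np (processor slot) values are assumptions and should match each machine's actual CPU count (e.g. majestix reports ncpus=8 in the pbsnodes output further down):

majestix np=8
idefix np=8
obelix np=8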

Configure pbs_server to start automatically at system boot, and then start the daemon.

[root]# cp contrib/init.d/pbs_server /etc/init.d
[root]# chkconfig --add pbs_server
[root]# service pbs_server restart

Install Torque MOMs

On the Torque Server Host, do the following:

Create the self-extracting packages that are copied and executed on your nodes.

[root]# make packages
Building ./torque-package-clients-linux-x86_64.sh ...
Building ./torque-package-mom-linux-x86_64.sh ...
Building ./torque-package-server-linux-x86_64.sh ...
Building ./torque-package-gui-linux-x86_64.sh ...
Building ./torque-package-devel-linux-x86_64.sh ...
Done.

The package files are self-extracting packages that can be copied and executed on your production machines. Use --help for options.

Copy the self-extracting packages to each Torque MOM Host.

[root]# scp torque-package-mom-linux-x86_64.sh <mom-node>:
[root]# scp torque-package-clients-linux-x86_64.sh <mom-node>:

Copy the pbs_mom startup script to each Torque MOM Host.

[root]# scp contrib/init.d/pbs_mom <mom-node>:/etc/init.d

On each Torque MOM Host, do the following: Install the self-extracting packages and run ldconfig.

[root]# ssh root@<mom-node>
[root]# ./torque-package-mom-linux-x86_64.sh --install
[root]# ./torque-package-clients-linux-x86_64.sh --install
[root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
[root]# ldconfig

Configure pbs_mom to start at system boot, and then start the daemon.

[root]# chkconfig --add pbs_mom
[root]# service pbs_mom start

When configuring the nodes, if you encounter an error such as:

Starting TORQUE Mom: /usr/local/sbin/pbs_mom: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory [FAILED]

Try installing or updating the package that provides the missing library:

yum install libxml2

NOTE: If you run into the same error again with a different library, adjust the command accordingly. If you are unsure which package supplies a given library, yum provides '*/libxml2.so.2' (substituting the missing file's name) will identify it.

Verify all queues are properly configured:

[acg12@rome ~]$ qstat -q

View additional server configuration:

[acg12@rome ~]$ qmgr -c 'p s'
#
# Create queues and set their attributes
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = kmn
set server managers = user1@kmn
set server operators = user1@kmn
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 0

Verify all nodes are correctly reporting

[acg12@rome ~]$ pbsnodes -a
majestix
     state = free
     power_state = Running
     np = 1
     ntype = cluster
     status = 
rectime=1493331031,macaddr=00:19:b9:e7:51:7a,cpuclock=UserSpace:2333MHz,varattr=,jobs=,state=free,netload=225079103,gres=,loadave=0.22,ncpus=8,physmem=16333236kb,availmem=22841656kb,totmem=23435696kb,idletime=99001,nusers=1,nsessions=1,sessions=9932,uname=Linux majestix 2.6.32-642.15.1.el6.x86_64 #1 SMP Mon Feb 20 02:26:38 EST 2017 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

Submit a basic job - DO NOT RUN AS ROOT:

[acg12@rome ~]$ echo "sleep 30" | qsub
6.rome

Verify jobs display:

[acg12@rome ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6.rome                     STDIN            acg12           00:00:00 C batch

At this point, the job should be in the Q state and will not run because a scheduler is not running yet. Torque can use its native scheduler by running pbs_sched or an advanced scheduler (such as Moab Workload Manager). See Integrating schedulers for details on setting up an advanced scheduler.
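If you just want queued jobs to start running, the Torque source tree ships an init script for its native FIFO scheduler, which can be installed the same way as pbs_server above (a sketch; we ultimately used Maui instead, as described next):

[root]# cp contrib/init.d/pbs_sched /etc/init.d
[root]# chkconfig --add pbs_sched
[root]# service pbs_sched start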

This is where we decided to use Maui as the scheduler. Here is the link to the guide: http://docs.adaptivecomputing.com/maui/mauistart.php. We installed version 3.3.1, which was the latest version at the time of installation.


From here, you will need to edit the Sphinx config files to work with Torque:

http://messe2media.com/files/sphinx_train.cfg

Some other possibly helpful links:

http://www.speech.cs.cmu.edu/sphinx/tutorial.html
http://cmusphinx.sourceforge.net/wiki/tutorialam
https://sourceforge.net/p/cmusphinx/discussion/help/thread/8fff1c03/
https://upcommons.upc.edu/bitstream/handle/2099.1/7336/ThesisReport.pdf
http://www.voxforge.org/home/forums/message-boards/acoustic-model-discussions/problem-in-training-stage-using-sohinx/7