Speech:Spring 2019 Software Group Torque

From Openitware

Project Logs

Project Member Logs


  • Rome (host only)
  • Astronomix (pbs_mom)
  • Obelodalix (pbs_mom)


Torque Introduction

  • Torque is an open-source resource manager that provides control over batch jobs and distributed compute nodes. With it, one can set up a home or small-office Linux cluster and queue jobs. In this course, Torque is used to queue experiments on the cluster, which drastically reduces the time the experiments take. A cluster consists of one head node and many compute nodes. The head node runs the pbs_server daemon and a scheduler daemon; the compute nodes run the pbs_mom daemon.
  • While Torque has a built-in scheduler, pbs_sched, it is typically used solely as a resource manager, with a separate scheduler making requests to it. Resource managers provide the low-level functionality to start, hold, cancel, and monitor jobs. Without these capabilities, a scheduler alone cannot control jobs.
  • While Torque is flexible enough to handle scheduling a conference room, it is primarily used in batch systems. A batch system is a collection of computers and other resources (networks, storage systems, license servers, and so forth) that operates under the notion that the whole is greater than the sum of its parts. Some batch systems consist of just a handful of machines running single-processor jobs, minimally managed by the users themselves. Others have thousands of machines executing users' jobs simultaneously while tracking software licenses and access to hardware equipment and storage systems.
  • Pooling resources in a batch system typically reduces the technical administration of resources while offering a uniform view to users. Once configured properly, a batch system abstracts away many of the details of running and managing jobs, allowing higher resource utilization. For example, users typically need to specify only the minimal constraints of a job and do not need to know the names of the individual machines on which it runs. With this uniform, abstracted view, batch systems can execute thousands of jobs simultaneously.
  • A batch system comprises four components: (1) Master Node, (2) Submit/Interactive Nodes, (3) Compute Nodes, and (4) Resources.
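
The low-level operations named above (start, hold, cancel, monitor) correspond to Torque's client commands. The following is a minimal sketch of that mapping, not a prescribed workflow. The job script name run_experiment.sh and the job ID 42.rome are made-up placeholders, and the resource request is only an illustration of "minimal constraints". The function just prints the commands, so it can be read or run on a machine without Torque installed.

```shell
#!/bin/sh
# Print the Torque client command for each stage of a job's life.
# Arguments (both are hypothetical placeholders):
#   $1  path to a job script
#   $2  job ID as reported by qsub
torque_lifecycle_commands() {
    script="$1"
    job_id="$2"
    # Start: submit with minimal constraints (one core on one node, 10 minutes).
    echo "qsub -l nodes=1:ppn=1,walltime=00:10:00 $script"
    # Monitor: show the job's state in the queue.
    echo "qstat $job_id"
    # Hold: keep a queued job from being started.
    echo "qhold $job_id"
    # Release: lift the hold again.
    echo "qrls $job_id"
    # Cancel: remove the job.
    echo "qdel $job_id"
}

torque_lifecycle_commands run_experiment.sh 42.rome
```

Note that the request names only a node count, a core count, and a time limit; no machine name appears, which is the abstraction the paragraph above describes.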

Steps To Install Torque

1. Log into Rome as root. (This is needed because only root has access to the Torque files.)

2. Run ls to find the torque-6.0.2 directory, then cd into it.

3. Run ls to see the list of files.

4. Use scp torque-package-mom-linux-i686.sh <server name>:mom.sh to transfer the MOM installer.

5. Use scp torque-package-clients-linux-i686.sh <server name>:client.sh to transfer the client installer.

6. Use scp contrib/init.d/pbs_mom <server name>:/etc/init.d to transfer the pbs_mom init script.

7. ssh to the server as root.

8. Run ./mom.sh --install

9. Run ./client.sh --install

10. Run echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf

11. Run ldconfig so that the new library path takes effect.

12. Run chkconfig --add pbs_mom

13. Run service pbs_mom start

14. Open ports 15002 and 15003.
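
The steps above can be collected into one script run from the torque-6.0.2 directory on Rome. This is a sketch under several assumptions, not the procedure the group actually used: it assumes root can ssh to the compute node, that the package files have the names given in the steps, and that iptables is the firewall (the steps do not say how the ports are opened, so that rule is a guess). The node name compute-node is a placeholder. DRY_RUN defaults to 1, so the commands are only printed until it is set to 0.

```shell
#!/bin/sh
# Sketch: install pbs_mom and the client tools on one compute node.
# Run as root from the torque-6.0.2 directory on the head node.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_mom_on() {
    node="$1"   # hostname of the compute node (placeholder)
    # Copy the two installers and the init script to the node.
    run scp torque-package-mom-linux-i686.sh "$node:mom.sh"
    run scp torque-package-clients-linux-i686.sh "$node:client.sh"
    run scp contrib/init.d/pbs_mom "$node:/etc/init.d"
    # Run both installers on the node.
    run ssh "$node" './mom.sh --install && ./client.sh --install'
    # Register /usr/local/lib; ldconfig refreshes the linker cache.
    run ssh "$node" 'echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf && ldconfig'
    # Register pbs_mom as a service and start it.
    run ssh "$node" 'chkconfig --add pbs_mom && service pbs_mom start'
    # Open the two ports (assumption: iptables is the firewall in use).
    run ssh "$node" 'iptables -I INPUT -p tcp --dport 15002:15003 -j ACCEPT'
}

install_mom_on "${1:-compute-node}"
```

Reviewing the printed commands first is a cheap safeguard, since every one of them runs as root on another machine.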

Torque Requirements

Supported Operating Systems

  • CentOS 6.x, 7.x
  • RHEL 6.x, 7.x
  • Scientific Linux 6.x, 7.x
  • SUSE Linux Enterprise Server 11, 12

Torque Architecture

  • A Torque cluster consists of one head node and many compute nodes. The head node runs the pbs_server daemon and the compute nodes run the pbs_mom daemon. Client commands for submitting and managing jobs can be installed on any host, including hosts that run neither pbs_server nor pbs_mom.
  • The head node also runs a scheduler daemon. The scheduler interacts with pbs_server to make local policy decisions for resource usage and to allocate nodes to jobs. The Torque source distribution provides a simple FIFO scheduler and code for constructing more advanced schedulers. Most Torque users choose a packaged, advanced scheduler such as Maui or Moab.
  • Users submit jobs to pbs_server with the qsub command. When pbs_server receives a new job, it informs the scheduler. When the scheduler finds nodes for the job, it sends pbs_server instructions to run the job, along with the node list. pbs_server then sends the job to the first node in the list and instructs it to launch the job. This node is designated the execution host and is called Mother Superior. The other nodes in a job are called sister MOMs.
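
The job flow described above can be observed from inside a job. The sketch below writes a small job script and submits it only if qsub is available. The file name hello_torque.sh, the job name, and the two-node request are illustrative choices, not values from this project. The #PBS directives and the PBS_JOBID and PBS_NODEFILE variables are standard Torque features.

```shell
#!/bin/sh
# Write a small Torque job script (hypothetical file name) to the current directory.
cat > hello_torque.sh <<'EOF'
#!/bin/sh
#PBS -N hello_torque
#PBS -l nodes=2:ppn=1,walltime=00:05:00
#PBS -j oe

# This script runs on the execution host (Mother Superior),
# the first node in the list that pbs_server received from the scheduler.
echo "Job $PBS_JOBID started on $(hostname)"
echo "Nodes allocated to this job:"
cat "$PBS_NODEFILE"
EOF

# Submit through pbs_server if the client commands are installed on this host.
if command -v qsub >/dev/null 2>&1; then
    qsub hello_torque.sh
else
    echo "qsub not found; hello_torque.sh was written but not submitted"
fi
```

If the job runs, the hostname it prints should match the first entry of the node file, and the remaining entries are the sister MOMs.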

Torque References