Queuing System Guide - Slurm

Overview

In an HPC cluster, users' tasks to be run on the compute nodes are controlled by a batch queuing system. On Talon 3, we have chosen the Slurm Workload Manager (Slurm).

Queuing systems manage job requests (shell scripts, generally referred to as jobs) submitted by users. In other words, to get your computations done by the cluster, you must submit a job request to a specific batch queue. The scheduler assigns your job to a compute node in the order determined by the policy on that queue and the availability of an idle compute node. Currently, Talon 3 has several policies in place to help guarantee fair resource utilization.

Partitions

On Talon 3, the main partition is named public. It is limited to 672 CPUs (24 compute nodes).

There are other private partitions for users who need more computing resources. Please contact hpc-admin@unt.edu for more information and to request access to these partitions.

QOS (Quality of Service)

The following QOS levels are available under the 'public' partition:

debug
  For running quick test computations for debugging purposes.
  Limits: 2 hours and 2 compute nodes
  Exclusive jobs allowed

general
  The default QOS for submitting jobs that take 72 hours or less.
  Time limit: 72 hours
  Limit: 616 CPUs

large
  For large jobs.
  Time limit: 3 weeks
  Limit: 22 compute nodes
  Exclusive jobs allowed

unlimited
  Limits: two jobs
  Exclusive jobs allowed
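
For example, a short test run can request the debug QOS with a directive such as "#SBATCH --qos debug" in the job script, or directly on the sbatch command line (the script name below is only a placeholder, and the 2-hour / 2-node debug limits above still apply):

$ sbatch -p public --qos=debug -N 1 -t 02:00:00 test.job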

SLURM Commands

The following table lists frequently used Slurm commands and their UGE equivalents:

Slurm Command        Description                          UGE Equivalent
sbatch script.job    submit a job                         qsub script.job
squeue [job_id]      display job status (by job)          qstat [job_id]
squeue -u EUID       display the status of a user's jobs  qstat -u
squeue               display queue summary status         qstat -g c
scancel              delete a job (in any state)          qdel
scontrol update      modify a pending job                 qalter
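
A typical workflow combines these commands as follows (the script name and job ID shown are only placeholders):

$ sbatch script.job          # submit the job; Slurm replies with the assigned job ID
$ squeue -u $USER            # check the status of your jobs
$ scontrol show job 12345    # show detailed information about a specific job
$ scancel 12345              # cancel the job if it is no longer needed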

Job State

When using squeue, the following job states are possible.

State  Full State Name  Description
R      RUNNING          The job currently has an allocation.
CA     CANCELLED        The job was explicitly cancelled by the user or a system administrator. The job may or may not have been initiated.
CD     COMPLETED        The job has terminated all processes on all nodes.
CF     CONFIGURING      The job has been allocated resources, but is waiting for them to become ready for use (e.g., booting).
CG     COMPLETING       The job is in the process of completing. Some processes on some nodes may still be active.
F      FAILED           The job terminated with a non-zero exit code or other failure condition.
NF     NODE_FAIL        The job terminated due to a failure of one or more allocated nodes.
PD     PENDING          The job is awaiting resource allocation.
PR     PREEMPTED        The job terminated due to preemption.
S      SUSPENDED        The job has an allocation, but execution has been suspended.
TO     TIMEOUT          The job terminated upon reaching its time limit.
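
squeue can also filter on these state codes with the -t (--states) option; for example, to look at only your pending or running jobs:

$ squeue -u $USER -t PD      # show only your pending jobs
$ squeue -u $USER -t R,CG    # show your running and completing jobs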

The following table lists common Slurm environment variables, with their UGE equivalents; for a complete list, see the sbatch manpage:

SLURM Variable           Description                                            UGE Variable
SLURM_SUBMIT_DIR         current working directory of the submitting client     SGE_O_WORKDIR
SLURM_JOB_ID             unique identifier assigned when the job was submitted  JOB_ID
SLURM_NTASKS             number of CPUs in use by a parallel job                NSLOTS
SLURM_NNODES             number of hosts in use by a parallel job               NHOSTS
SLURM_ARRAY_TASK_ID      index number of the current array job task             SGE_TASK_ID
SLURM_JOB_CPUS_PER_NODE  number of CPU cores per node                           -
SLURM_JOB_NAME           name of the job                                        -
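
These variables are defined inside the job's environment, so a job script can use them directly. A minimal sketch (the program and output file names are placeholders):

cd $SLURM_SUBMIT_DIR                                   # start in the directory the job was submitted from
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) on $SLURM_NNODES node(s)"
mpirun -np $SLURM_NTASKS ./a.out > outfile.$SLURM_JOB_ID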

Job Submission Tips

At the top of your job script, begin each option line with the special directive #SBATCH, followed by an sbatch option. Alternatively, these options can be passed on the command line to sbatch or srun. Commonly used options are listed below; a combined example follows the list.

  • #SBATCH -p public

    Defines the partition used to execute this job. The main partition on Talon 3 is 'public'.

  • #SBATCH -J job_name

    Defines the job name.

  • #SBATCH -o JOB.o%j

    Defines the output file name.

  • #SBATCH -e JOB.e%j

    Defines the error file name.

  • #SBATCH --qos general

    Defines the QOS under which the job will be executed (debug, general, and large are the available options).

  • #SBATCH -t 80:00:00

    Sets the wall-time limit for the job in hh:mm:ss.

  • #SBATCH -n 84

    Defines the total number of MPI tasks.

  • #SBATCH -N 3

    Defines the number of compute nodes requested.

  • #SBATCH --ntasks-per-node 28

    Defines the number of tasks per node.

  • #SBATCH -C c6320

    Requests the c6320 compute nodes. (R420, R720, and R730 compute nodes can also be requested.)

  • #SBATCH --mail-user=user@unt.edu

    Sets up email notification.

  • #SBATCH --mail-type=begin

    Email user when job begins.

  • #SBATCH --mail-type=end

    Email user when job finishes.
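
Putting several of these options together, the top of a job script might look like the following sketch (the job name, email address, and resource counts are placeholders); note that sbatch also accepts a combined list such as --mail-type=BEGIN,END:

#!/bin/bash
#SBATCH -J job_name
#SBATCH -o JOB.o%j
#SBATCH -e JOB.e%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 3
#SBATCH -n 84
#SBATCH --ntasks-per-node 28
#SBATCH -t 72:00:00
#SBATCH --mail-user=user@unt.edu
#SBATCH --mail-type=BEGIN,END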

Basic Information about Slurm:

The Slurm Workload Manager (formerly known as the Simple Linux Utility for Resource Management, or SLURM) is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending jobs. Slurm is the workload manager on about 60% of the TOP500 supercomputers, including Tianhe-2, which was the world's fastest computer until 2016. Slurm uses a best-fit algorithm based on Hilbert curve scheduling or fat-tree network topology in order to optimize locality of task assignments on parallel computers.

Slurm Tutorials and Commands:

A Quick-Start Guide for those unfamiliar with Slurm can be found here: https://slurm.schedmd.com/quickstart.html

Slurm tutorial videos can be found here for additional information: https://slurm.schedmd.com/tutorials.html

Sample Slurm Job Script:

Simple serial job script example

#!/bin/bash
######################################
# Example of a SLURM job script for Talon3
# Job Name: Sample_Job
# Number of cores: 1
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
 
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 12:00:00
 
### Loading modules
module load intel/PS2017
 
./a.out > outfile.out

 

Simple parallel MPI job script example

#!/bin/bash
######################################
# Example of a SLURM job script for Talon3
# Job Name: Sample_Job
# Number of cores: 28
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
 
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -t 12:00:00
 
### Loading modules
module load intel/PS2017
 
### Use mpirun to run parallel jobs
mpirun ./a.out > outfile.out

 

Large MPI job script example

#!/bin/bash
######################################
# Example of a SLURM job script for Talon3
# Job Name: Sample_Job
# Number of cores: 112
# Number of nodes: 4
# QOS: general
# Run time: 12 hrs
######################################
 
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 4
#SBATCH -n 112
#SBATCH -t 12:00:00
 
### Loading modules
module load intel/PS2017
 
## Use mpirun for MPI jobs
mpirun ./a.out > outfile.out

 

OpenMP job script example

#!/bin/bash
######################################
# Example of a SLURM job script for Talon3
# Job Name: Sample_Job
# Number of MPI tasks: 1
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
 
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 28
#SBATCH -t 12:00:00
 
### Loading modules
module load intel/PS2017
 
### Set the number of OpenMP threads to match the cores requested with -c
export OMP_NUM_THREADS=28
 
./a.out > outfile.out

 

CUDA parallel GPU job script example

#!/bin/bash
######################################
# Example of a SLURM job script for Talon3
# Job Name: Sample_Job
# Number of devices(GPUs): 2
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################

#SBATCH -J Sample_Job
#SBATCH --ntasks=1
#SBATCH --qos=general
#SBATCH -p public
#SBATCH --gres=gpu:2
#SBATCH -t 12:00:00

### execute code
./a.out -numdevices=2

 

Submit the job:
$ sbatch slurm-job.sh

Interactive Jobs

Interactive job sessions can be used on Talon 3 if you need to compile or test software. An example command for starting an interactive session is shown below:

srun -p public --qos general -N 1  --pty bash

This launches an interactive job session and opens a bash shell on a compute node. From there, you can execute software and shell commands that would otherwise not be allowed on the Talon login nodes.
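
The usual sbatch resource options can be added to srun as well; for example, a two-hour interactive session under the debug QOS using all 28 cores of one node (the resource amounts here are illustrative):

srun -p public --qos debug -N 1 -n 28 -t 2:00:00 --pty bash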

List jobs:
$ squeue -u $USER

Output:
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
    106 public slurm-jo  rstober   R   0:04      1 atom01

Get job details:
$ scontrol show job 106

Kill a job (users can kill their own jobs; root can kill any job):
$ scancel $JOB_ID
where $JOB_ID is the ID number of the job you want to kill

Hold a job:
$ scontrol hold 139

Release a job:
$ scontrol release 139
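
The scontrol update command listed earlier (the qalter equivalent) can modify attributes of a pending job; for example, lowering the time limit of job 139 (the values shown are illustrative):

$ scontrol update JobId=139 TimeLimit=24:00:00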