Queuing System Guide - Slurm
Overview
In an HPC cluster, user tasks to be run on the compute nodes are controlled by a batch queuing system. On Talon 3, we have chosen the Slurm Workload Manager (Slurm).
Queuing systems manage job requests (shell scripts, generally referred to as jobs) submitted by users. In other words, to get your computations done on the cluster, you must submit a job request to a specific batch queue. The scheduler assigns your job to a compute node in the order determined by the policy of that queue and the availability of an idle compute node. Talon 3 currently has several policies in place to help guarantee fair resource utilization.
Partitions
On Talon 3, the main partition is named 'public'. It is limited to 672 CPUs (24 compute nodes).
There are other private partitions for users that need more computing resources. Please contact hpc-admin@unt.edu for more info and to request these partitions.
QOS (Quality of Service)
There are four QOS levels under the 'public' partition:
Name | Description |
---|---|
debug | For running quick test computations for debugging purposes. Time limit: 2 hours. Limit: 2 compute nodes. Exclusive jobs allowed. |
general | The default QOS, for jobs that take 72 hours or less. Time limit: 72 hours. Limit: 616 CPUs. |
large | For large jobs. Time limit: 3 weeks. Limit: 22 compute nodes. Exclusive jobs allowed. |
unlimited | Limits: 2 jobs. Exclusive jobs allowed. |
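A QOS is selected with the --qos option, either in the job script or on the sbatch command line. For example, to submit a quick test under the debug QOS (the script name here is illustrative):
$ sbatch --qos debug test.job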
SLURM Commands
The following table lists frequently used Slurm commands and their UGE equivalents:
Slurm Command | Description | UGE Equiv. |
---|---|---|
sbatch script.job | submit a job | qsub script.job |
squeue [job_id] | display job status (by job) | qstat [job_id] |
squeue -u EUID | display the status of a user's jobs | qstat -u |
squeue | display queue summary status | qstat -g c |
scancel | delete a job in its current state | qdel |
scontrol update | modify a pending job | qalter |
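For example, a pending job's time limit can be changed with scontrol update (the job ID below is illustrative):
$ scontrol update JobId=106 TimeLimit=24:00:00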
Job State
State | Full State Name | Description |
---|---|---|
R | RUNNING | The job currently has an allocation. |
CA | CANCELED | The job was explicitly canceled by the user or system administrator. The job may or may not have been initiated. |
CD | COMPLETED | The job has terminated all processes on all nodes. |
CF | CONFIGURING | The job has been allocated resources, but is waiting for them to become ready for use (e.g., booting). |
CG | COMPLETING | The job is in the process of completing. Some processes on some nodes may still be active. |
F | FAILED | The job terminated with non-zero exit code or other failure condition. |
NF | NODE_FAIL | The job terminated due to failure of one or more allocated nodes. |
PD | PENDING | The job is awaiting resource allocation. |
PR | PREEMPTED | The job terminated due to preemption. |
S | SUSPENDED | The job has an allocation, but execution has been suspended. |
TO | TIMEOUT | The job terminated upon reaching its time limit. |
The following table lists common SLURM variables, with their UGE equivalents; for a complete list, see the sbatch manpage:
SLURM Variable | Description | UGE Variable |
---|---|---|
SLURM_SUBMIT_DIR | current working directory of the submitting client | SGE_O_WORKDIR |
SLURM_JOB_ID | unique identifier assigned when the job was submitted | JOB_ID |
SLURM_NTASKS | number of CPUs in use by a parallel job | NSLOTS |
SLURM_NNODES | number of hosts in use by a parallel job | NHOSTS |
SLURM_ARRAY_TASK_ID | index number of the current array job task | SGE_TASK_ID |
SLURM_JOB_CPUS_PER_NODE | number of CPU cores per node | |
SLURM_JOB_NAME | name of the job | |
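As a quick illustration, the short sketch below (job and file names are illustrative) echoes several of these variables from inside a job script:
#!/bin/bash
#SBATCH -J var_demo
#SBATCH -o var_demo.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 00:05:00
echo "Submitted from: $SLURM_SUBMIT_DIR"
echo "Job ID: $SLURM_JOB_ID"
echo "Job name: $SLURM_JOB_NAME"
echo "Tasks: $SLURM_NTASKS on $SLURM_NNODES node(s)"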
Job Submission Tips
At the top of your job script, begin with the special directive #SBATCH, which marks a line as an sbatch option. Alternatively, these options can be passed on the command line to sbatch or srun.
- #SBATCH -p public
  Defines the partition used to execute this job. The main partition on Talon 3 is 'public'.
- #SBATCH -J job_name
  Defines the job name.
- #SBATCH -o JOB.o%j
  Defines the output file name (%j expands to the job ID).
- #SBATCH -e JOB.e%j
  Defines the error file name.
- #SBATCH --qos general
  Defines the QOS under which the job will be executed (debug, general, and large are the standard options; see the QOS table above).
- #SBATCH -t 80:00:00
  Sets the wall-time limit for the job in hh:mm:ss.
- #SBATCH -n 84
  Defines the total number of MPI tasks.
- #SBATCH -N 3
  Defines the number of compute nodes requested.
- #SBATCH --ntasks-per-node 28
  Defines the number of tasks per node.
- #SBATCH -C c6320
  Requests the c6320 compute nodes. (R420, R720, and R730 compute nodes can also be requested.)
- #SBATCH --mail-user=user@unt.edu
  Sets up email notification.
- #SBATCH --mail-type=begin
  Emails the user when the job begins.
- #SBATCH --mail-type=end
  Emails the user when the job finishes.
Sample Slurm Job Script:
Simple serial job script example
#!/bin/bash
######################################
# Example of a SLURM job script for Talon 3
# Job Name: Sample_Job
# Number of cores: 1
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 12:00:00
### Loading modules
module load intel/PS2017
### Run the executable
./a.out > outfile.out
Simple parallel MPI job script example
#!/bin/bash
######################################
# Example of a SLURM job script for Talon 3
# Job Name: Sample_Job
# Number of cores: 28
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -t 12:00:00
### Loading modules
module load intel/PS2017
### Use mpirun to run parallel jobs
mpirun ./a.out > outfile.out
Large MPI job script example
#!/bin/bash
######################################
# Example of a SLURM job script for Talon 3
# Job Name: Sample_Job
# Number of cores: 112
# Number of nodes: 4
# QOS: general
# Run time: 12 hrs
######################################
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 4
#SBATCH -n 112
#SBATCH -t 12:00:00
### Loading modules
module load intel/PS2017
### Use mpirun for MPI jobs
mpirun ./a.out > outfile.out
OpenMP job script example
#!/bin/bash
######################################
# Example of a SLURM job script for Talon 3
# Job Name: Sample_Job
# Number of MPI tasks: 1
# Number of threads: 28
# Number of nodes: 1
# QOS: general
# Run time: 12 hrs
######################################
#SBATCH -J Sample_Job
#SBATCH -o Sample_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --cpus-per-task 28
#SBATCH -t 12:00:00
### Loading modules
module load intel/PS2017
### Set the number of OpenMP threads
export OMP_NUM_THREADS=28
./a.out > outfile.out
CUDA parallel GPU job script example
#!/bin/bash
#SBATCH -J Sample_Job
### execute code
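The stub above omits the resource requests. Below is a fuller sketch: the --gres=gpu:1 request uses Slurm's generic-resource syntax, and the GPU count and module name are assumptions, so check with hpc-admin@unt.edu for the GPU resources actually available on Talon 3:
#!/bin/bash
#SBATCH -J Sample_GPU_Job
#SBATCH -o Sample_GPU_job.o%j
#SBATCH -p public
#SBATCH --qos general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 12:00:00
### Request one GPU (generic-resource syntax; available GPU types on Talon 3 are an assumption)
#SBATCH --gres=gpu:1
### Load the CUDA toolkit (module name is illustrative)
module load cuda
### Run the CUDA executable
./a.out > outfile.out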
Submit the job:
$ sbatch slurm-job.sh
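sbatch replies with the assigned job ID, for example:
Submitted batch job 106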
Interactive Jobs
Interactive job sessions can be used on Talon 3 if you need to compile or test software. An example command for starting an interactive session is shown below:
srun -p public --qos general -N 1 --pty bash
This launches an interactive job session and opens a bash shell on a compute node. From there, you can execute software and shell commands that would otherwise not be allowed on the Talon 3 login nodes.
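The same resource options used in job scripts apply here; for example, an interactive session with 4 tasks on one node under the debug QOS might look like:
srun -p public --qos debug -N 1 -n 4 --pty bash
Type exit to end the session and release the compute node.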
List jobs:
$ squeue -u $USER
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
106 public slurm-jo rstober R 0:04 1 atom01
Get job details:
$ scontrol show job 106
Kill a job (users can kill their own jobs; root can kill any job):
$ scancel $JOB_ID
where $JOB_ID is the ID number of the job you want to kill.
Hold a job:
$ scontrol hold 139
Release a job:
$ scontrol release 139