T3 NUO, Part 1.6, Slurm Workload Manager

Talon 3

Slurm Workload Manager is the job scheduler on Talon 3; it queues your jobs and allocates compute nodes to run them.

Talon 3 Configuration

  • One Partition (queue)
    • ‘compute’
  • Maximum number of nodes: 21
  • Maximum number of CPU cores: 588
  • Quality of Service (QoS) levels, requested per job (see the sketch after this list)
    • Debug
      • Time limit: 2 hours
      • Limit of 2 nodes
      • High priority
    • General
      • Default QoS
      • Time limit: 72 hours
      • Limit of 560 CPU cores
    • Large
      • Allows exclusive jobs
      • No time limit
      • Limit of 20 nodes
      • Low priority
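
As a quick illustration, a job script selects the single ‘compute’ partition and one of the QoS levels above with #SBATCH directives. This is only a sketch: the lowercase QoS names (debug, general, large) are assumptions about how the levels are spelled on Talon 3.

    #!/bin/bash
    #SBATCH --job-name=qos_demo     # arbitrary job name
    #SBATCH --partition=compute     # the single Talon 3 partition
    #SBATCH --qos=debug             # assumed QoS name; debug allows up to 2 nodes for 2 hours
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00         # stays under the 2-hour debug limit

    srun hostname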

Slurm on Talon 3 — Commands

Common Slurm Commands

  • Submitting a job: sbatch (see the example session after this list)
    • sbatch test.job : Submit a job using a job script (test.job)
  • Display job status: squeue
    • squeue -u [EUID] : Display the status of a user's jobs
    • squeue -j [jobID] : Display the status of a specific job
  • Cancel a queued job: scancel
    • scancel [jobID] : Cancel the job with the given job ID number
  • Modify a pending job or partition settings: scontrol
    • scontrol update PartitionName=debug MaxTime=60:00 MaxNodes=2
  • See job accounting information on active/completed jobs: sacct
    • sacct --format=jobid,elapsed,ntasks,state
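
A short example session tying the commands above together; the EUID abc1234 and the job ID 123456 are placeholders.

    sbatch test.job                              # submit; Slurm replies "Submitted batch job 123456"
    squeue -u abc1234                            # list all of this user's jobs
    squeue -j 123456                             # check one job by its ID
    scancel 123456                               # cancel that job if it is no longer needed
    sacct --format=jobid,elapsed,ntasks,state    # accounting summary for recent jobs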

10 Example Slurm Job Scripts