Lens Batch Script Examples

This page lists several example batch scripts that can be used to run various types of jobs on lens’ compute resources.

Basic MPI

The following example requests 2 nodes for 1 hour and 30 minutes. It then runs a 32-task MPI job on the allocated cores (one task per core).

  #!/bin/csh
  # File name: mpi-ex.pbs
  #PBS -A XXXYYY
  #PBS -N mpi-ex
  #PBS -j oe
  #PBS -l walltime=1:30:00,nodes=2:ppn=16
  #PBS -l gres=widow2%widow3

  cd /tmp/work/$USER
   
  mpirun -np 32 a.out

And to invoke the above script from the command line, one would use:

  $ qsub mpi-ex.pbs
    43098.mgmt2
  $ showq | grep 43098
    43098   userid    Running   32   00:00:44   Sat Feb 29 06:18:56
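
The task count passed to mpirun does not have to be hard-coded. The scheduler's node file holds one entry per allocated core, so counting its lines gives the number of tasks for a one-task-per-core run. The following is a minimal sketch in POSIX sh rather than csh; `count_tasks` is a hypothetical helper name, and a temporary file with made-up node names stands in for `$PBS_NODEFILE` so the snippet can be tried outside a batch job.

```shell
#!/bin/sh
# count_tasks: hypothetical helper; prints the number of lines in a
# node file, i.e. the number of allocated cores (one line per core).
count_tasks() {
  wc -l < "$1" | tr -d ' '
}

# Stand-in for $PBS_NODEFILE: 2 nodes x 16 cores (node names are made up).
nodefile=$(mktemp)
for node in nodeA nodeB; do
  i=0
  while [ "$i" -lt 16 ]; do
    echo "$node"
    i=$((i + 1))
  done
done > "$nodefile"

np=$(count_tasks "$nodefile")
rm -f "$nodefile"

# Inside a batch job this would be: count_tasks "$PBS_NODEFILE"
echo "mpirun -np $np a.out"
```

With the stand-in file above, the sketch prints `mpirun -np 32 a.out`, matching the nodes=2:ppn=16 request.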

Basic MPI on Partial Nodes

Users do not have to utilize all of the cores allocated to their batch job. Through mpirun options, users have the ability to run on all or only some of a node’s cores.

One reason for utilizing only a portion of each allocated node’s cores is memory access. Spreading the tasks out such that not all of a node’s cores are used allows each task to access more of the node’s memory.
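
As a rough illustration of the memory argument, the share of a node's memory available to each task is the node's memory divided by the number of tasks placed on that node. The sketch below is in POSIX sh; `mem_per_task_mb` is a hypothetical helper, and the 64 GB node size is an assumed value chosen only for the arithmetic, not a statement about the system's hardware.

```shell
#!/bin/sh
# mem_per_task_mb: hypothetical helper; integer MB of node memory per task
# when tasks share a node evenly (ignores memory used by the OS).
mem_per_task_mb() {
  node_mem_mb=$1
  tasks_per_node=$2
  echo $((node_mem_mb / tasks_per_node))
}

NODE_MEM_MB=65536   # assumed value for illustration only

echo "16 tasks per node: $(mem_per_task_mb "$NODE_MEM_MB" 16) MB per task"
echo " 2 tasks per node: $(mem_per_task_mb "$NODE_MEM_MB" 2) MB per task"
```

Under that assumption, dropping from 16 tasks per node to 2 raises each task's share from 4096 MB to 32768 MB.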

The following example requests 2 nodes for 2 hours. It then runs a 4-task MPI job using 2 of each allocated node's 16 cores.

  #!/bin/csh
  #PBS -A XXXYYY
  #PBS -N mpi-partial-node
  #PBS -j oe
  #PBS -l walltime=2:00:00,nodes=2:ppn=16
  #PBS -l gres=widow1

  cd /tmp/work/$USER
   
  mpirun -np 4 -bynode a.out

And to invoke the above script from the command line, one would use:

  $ qsub mpi-partial-node-ex.pbs
    234567.mgmt2
  $ showq | grep 234567
    234567   userid   Running   32   00:00:44   Fri Feb 28 03:11:23

Multiple Simultaneous Jobs in a Single Batch Script

The following example requests 4 nodes for 2 hours. It then runs 2 MPI jobs simultaneously, each on its own pair of nodes with 2 tasks (one task per node).

  #!/bin/csh
  #PBS -A XXXYYY
  #PBS -N mpi-partial-node
  #PBS -j oe
  #PBS -l walltime=2:00:00,nodes=4:ppn=16
  #PBS -l gres=widow2

  cd /tmp/work/$USER

  ##########
  # Store allocated nodes in $NODES
  #  - read node names from the file created by the
  #     scheduler, which is pointed to by $PBS_NODEFILE
  #  - use 'uniq' to keep one entry per node; the node file
  #     contains an entry for each core allocated on the node
  set NODES = `sort $PBS_NODEFILE | uniq`
  set x = 1

  ##########
  # Loop through the allocated nodes
  #  - grab a pair of node names ($host1, $host2);
  #     this assumes an even number of allocated nodes
  #  - run a.out on the pair (-host $host1,$host2)
  #  - start two tasks per mpirun process (-np 2),
  #     one task per node (-bynode)
  #  - place each mpirun in the background (&) so the
  #     mpiruns run simultaneously
  #  See 'man mpirun' or 'mpirun -h' for more information
  #   on the options used
  while ($x <= ${#NODES})
    set host1 = $NODES[$x]
    @ x++
    set host2 = $NODES[$x]
    @ x++

    echo "$host1 : $host2"
    time mpirun -np 2 -bynode -host $host1,$host2 a.out &
  end

  ##########
  # Wait on the backgrounded mpiruns
  #  - without the 'wait' the batch script
  #     will exit, leaving the backgrounded mpiruns
  #     behind to be killed by the batch job's
  #     post-processing clean-up
  wait

And to invoke the above script from the command line, one would use:

  $ qsub multi-job-ex.pbs
    234568.mgmt2
  $ showq | grep 234568
    234568   userid   Running   64   00:00:44   Fri Oct 07 18:14:56
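
The node-pairing step of the script above can be tried without a scheduler by turning it into a dry run. The following is a minimal sketch in POSIX sh rather than csh; `pair_nodes` is a hypothetical helper, the node names are made up, and a temporary file stands in for `$PBS_NODEFILE`. It only prints the mpirun command lines and launches nothing.

```shell
#!/bin/sh
# pair_nodes: hypothetical helper; reads a node file and prints one
# mpirun command line per pair of unique nodes (dry run only).
pair_nodes() {
  # Load the unique node names into the positional parameters $1, $2, ...
  set -- $(sort -u "$1")
  # Consume two names at a time; a leftover unpaired node is skipped.
  while [ "$#" -ge 2 ]; do
    echo "mpirun -np 2 -bynode -host $1,$2 a.out"
    shift 2
  done
}

# Stand-in for $PBS_NODEFILE: 4 nodes, repeated as they would be per core
# (shortened to 2 entries per node to keep the example small).
nodefile=$(mktemp)
for node in nodeA nodeA nodeB nodeB nodeC nodeC nodeD nodeD; do
  echo "$node"
done > "$nodefile"

cmds=$(pair_nodes "$nodefile")
rm -f "$nodefile"

echo "$cmds"
```

Printing the command lines first is a cheap way to confirm that each mpirun receives a distinct pair of hosts before the job is submitted.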

Important Considerations for Simultaneous Jobs in a Single Script

  • The mpirun instances must be backgrounded
    The & symbol at the end of the mpirun line in the example above places each mpirun in the background, allowing them to run simultaneously. Without backgrounding, the mpiruns run serially, each waiting for the previous one to complete before starting.
  • The batch script must wait for backgrounded processes
    The wait command prevents the batch script from returning until every backgrounded mpirun completes. Without the wait, the script returns as soon as the mpiruns have been started, which ends the batch job and kills the backgrounded mpirun processes.
  • The OS will not automatically spread the mpiruns over the allocated nodes
    The OS on the head node where mpirun is executed is not aware of the nodes allocated by the batch system. If you simply run two mpirun processes and place each in the background, both will run on the same node, likely competing for the same cores. To spread the mpirun processes over the allocated nodes, you must tell each mpirun process which nodes to run on.
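
The first two points can be observed with ordinary shell commands, no MPI required. In the POSIX sh sketch below, `fake_job` is a hypothetical stand-in for a long-running mpirun: it sleeps for a second and then writes a marker file. Both jobs are backgrounded, and the script counts the marker files only after `wait` returns.

```shell
#!/bin/sh
# fake_job: hypothetical stand-in for a long-running mpirun; it sleeps,
# then records that it finished by writing a marker file.
fake_job() {
  sleep 1
  echo "done" > "$1"
}

outdir=$(mktemp -d)

fake_job "$outdir/job1" &   # '&' backgrounds the job so the next starts at once
fake_job "$outdir/job2" &

wait                        # block until every backgrounded job has exited

finished=$(ls "$outdir" | wc -l | tr -d ' ')
echo "$finished of 2 jobs finished before the script exited"
rm -rf "$outdir"
```

Removing the `wait` line lets the script reach the count before either job has written its marker file, which mirrors a batch script returning while its mpiruns are still running.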