Job Execution
Once resources have been allocated through PBS, users may run commands serially on the allocation's head node or in parallel across all nodes in the allocated resource pool.
Serial
Batch Script
The executable portion of batch scripts is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.
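As a sketch of such a script (the account name, job name, and core count here are hypothetical; the work-space path follows the error-message example later in this section), a minimal batch script might look like:

```shell
#!/bin/bash
#PBS -A XXXYYY                    # hypothetical project account
#PBS -N myjob                     # job name
#PBS -j oe                        # join stdout and stderr
#PBS -l size=8,walltime=1:00:00   # PBS size option requests compute cores

cd /tmp/work/$USER                # move into the Lustre work space first
aprun -n8 ./a.out                 # launch the executable on 8 cores
```

The shell named on the first line interprets the executable portion; the `#PBS` lines are directives read by the scheduler, not executed by the shell.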
Batch Interactive
While running in interactive mode, the submitting user’s default shell will be used.
Parallel
By default, commands will be executed on the job's associated service node. The aprun command is used to execute a job on one or more compute nodes. Jaguar's layout should be kept in mind when running a job using aprun: Jaguar's current layout consists of four cores (CPUs) per compute node. The PBS size option requests compute cores.
aprun accepts the following common options:

| Option | Description |
|---|---|
| -D | Debug option that can be used to view the layout |
| -N | Number of cores per socket |
| -n | Total number of cores |
| -m | Memory required per task (maximum 2,000 MB per core; requesting 2,100 MB will allocate two cores for the task) |

Note: If you do not specify the number of tasks to aprun, the system will default to 1.
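For instance, the options above can be combined in a single launch (the executable name and values here are illustrative, not prescribed by the system):

```shell
# 8 total tasks, 4 tasks per socket, 1,500 MB of memory per task
aprun -n 8 -N 4 -m 1500 ./a.out
```

Since 1,500 MB is under the 2,000 MB per-core maximum, each task occupies a single core; a -m value above 2,000 MB would cause each task to be allocated two cores.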
Notice:

- Compute nodes can see only the Lustre work space. The NFS-mounted home, project, and software directories are not accessible to the compute nodes.
- Executables must be executed from within the Lustre work space.
- Batch jobs can be submitted from the home or work space. If submitted from a user's home area, the user should cd into the Lustre work space directory prior to running the executable through aprun. If this is not done, an error similar to the following may be returned:
  aprun: [NID 94]Exec /tmp/work/userid/a.out failed: chdir /autofs/na1_home/userid No such file or directory
- Input must reside in the Lustre work space.
- Output must also be sent to the Lustre file system.
OpenMP
CNL supports threaded programming within a socket. Threads cannot span across sockets. To run a code with 256 MPI tasks and four threads per task, you would use the following:
export OMP_NUM_THREADS=4
aprun -n256 -N1 ./a.out
NOTE: csh/tcsh users should replace the first line with setenv OMP_NUM_THREADS 4
This aprun command specifies 256 tasks (-n256) and asks for one task per socket (-N1). The OMP_NUM_THREADS environment variable tells the system to spawn four threads per MPI task.
NOTE: To use threads under PGI, the -mp=nonuma option must be added to the compile line.
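The core accounting behind this hybrid example can be sketched with shell arithmetic (the four-cores-per-socket figure is Jaguar's current layout, as noted above):

```shell
# With -n256 -N1 and OMP_NUM_THREADS=4, each MPI task occupies a full
# four-core socket, so the PBS size request must cover every core used.
TASKS=256
CORES_PER_SOCKET=4
sockets=$TASKS                              # -N1: one task per socket
total_cores=$((sockets * CORES_PER_SOCKET)) # cores to request via size
echo "request size=$total_cores"
```

Running this prints the number of cores the PBS size option would need to request for the 256-task, four-thread job above.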
Single-Core and Multi-Core Modes
Unlike with yod and Catamount, there are no specific flags for single-core and multi-core modes under aprun. Instead, these modes must be controlled with combinations of aprun options. For example, to launch a.out on 1,024 cores, four cores per socket, use the following:
aprun -n 1024 a.out
To launch a.out on 1,024 cores, one core per socket, use the following:
aprun -n 1024 -N1 a.out
Task Layout
The default MPI task layout is sequential.
For example,
aprun -n8 a.out
will run the MPI executable a.out on a total of eight cores (four cores on each of two compute nodes). The MPI tasks will be allocated in the following sequential fashion:
| Compute Node 0 | | | | Compute Node 1 | | | |
|---|---|---|---|---|---|---|---|
| core 0 | core 1 | core 2 | core 3 | core 0 | core 1 | core 2 | core 3 |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
The layout order can be changed using the MPICH_RANK_REORDER_METHOD environment variable. See man intro_mpi for more information. The task layout can be displayed by setting the MPICH_RANK_REORDER_DISPLAY environment variable to 1.
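The default sequential layout shown in the table can be reproduced with a short shell loop; the node and core indices here are computed from the rank (assuming Jaguar's four cores per compute node), not queried from the system:

```shell
# Map MPI ranks to (node, core) under the default sequential layout,
# assuming four cores per compute node as on Jaguar.
CORES_PER_NODE=4
NTASKS=8
rank=0
while [ "$rank" -lt "$NTASKS" ]; do
  node=$((rank / CORES_PER_NODE))   # fill each node before the next
  core=$((rank % CORES_PER_NODE))   # core index within the node
  echo "rank $rank -> node $node, core $core"
  rank=$((rank + 1))
done
```

For eight tasks this prints ranks 0-3 on node 0 and ranks 4-7 on node 1, matching the table above.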