PBS on Phoenix


Introduction

The Portable Batch System (PBS) is the batch-job scheduler for Phoenix. It also allocates CPUs for interactive parallel jobs. This document explains how to get started with the batch facilities of PBS.

Queues

Different users may have access to different queues, and different queues may have different job limits or may target different nodes.

Use the "qstat -q" command to see the current list of queues.

$ qstat -q
  server: phoenix1f1.ccs.ornl.gov

Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
x_interactive      --      --       --     --      0     0   --   E R
batch              --      --    12:00:00  --      0     0   --   E R
sys                --      --       --     --      0     0   --   E R
special            --      --    24:00:00  --      0     0   --   E R
interactive        --      --    02:00:00  --      0     0   --   E R
long               --      --    12:00:00  --      2     1   --   E R
short              --      --    06:00:00  --      1     1   --   E R
standby            --      --       --     --      0     0   --   E R
immediate          --      --    24:00:00  --      0     0   --   E R
                                               ----- -----
                                                   3     2
The "batch" queue is the default queue for jobs submitted as PBS scripts: if you don't specify a queue, your job goes to batch.

You can submit jobs only to the routing queues, which are as follows:

  • batch - default
  • interactive - for interactive batch work
  • special - for use in special situations only; requires authorization
  • sys - for use by system personnel
The "interactive" queue is used only by adding the "-I" option to the submittal command. The other queues (execution queues) are for controlling job throughput.

Interactive jobs

To run an interactive job, use "qsub -I". For example, to run an interactive job with 1 process, you might use
  qsub -I -lwalltime=1:00:00,mppe=1,mem=4Gb
  cd $SYSTEM_USERDIR
  aprun -n 1 test_example
The first line starts a new shell in your home directory. You then need to "cd" to wherever you want to run and use aprun to run your code, just as you would in a batch script.

aprun is the Cray method of starting an application on application nodes. aprun has a few options that the user must know about:

Frequently Used aprun Parameters

Parameter Format
    Definition

-n
    Specifies the number of MSPs or SSPs.
-c core=unlimited
    Changes the default core size limit to unlimited. Default is 0. Only do this if running in $SYSTEM_USERDIR.
-d
    Specifies the number of threads per process.

Job command files

To run a batch job under PBS, you first need to write a job command file. PBS command files have two components: PBS keyword statements and shell commands. The PBS keyword statements are preceded by "#PBS", making them appear as comments to a shell. The shell commands follow the last "#PBS" keyword statement and represent the executable content of the batch job. Any "#PBS" lines that follow executable statements are treated as comments only.

Note that a job may not run if your start-up files (e.g., .cshrc, .login, or .profile) contain commands that attempt to set terminal characteristics. Any such command sequence should be skipped by testing for the environment variable PBS_ENVIRONMENT. Commands in your start-up files should also not generate output when run under PBS, so anything that writes to stdout should be skipped for a PBS job as well. Both can be done as shown in the following sample .login:
  ...  
  setenv MANPATH /usr/man:/usr/local/man:$MANPATH 
  if ( !  $?PBS_ENVIRONMENT ) then 
    do terminal settings here 
    run command with output here 
  endif 
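
For users whose login shell is sh, ksh, or bash, the same guard might be written in .profile along the following lines. This is a minimal sketch, not a tested Phoenix configuration: the function name setup_login and the echo command are illustrative placeholders for your own terminal settings and output-producing commands.

```shell
# Sketch of a .profile guard for Bourne-style shells (sh, ksh, bash).
# setup_login is a hypothetical wrapper; in a real .profile the body
# could be placed directly in the file.
setup_login() {
    MANPATH=/usr/man:/usr/local/man:${MANPATH:-}
    export MANPATH
    # PBS_ENVIRONMENT is set inside PBS jobs, so an empty value
    # indicates an ordinary login.
    if [ -z "${PBS_ENVIRONMENT:-}" ]; then
        # Terminal settings and commands that write to stdout go here.
        echo "interactive login"
    fi
}

setup_login
```

Inside a PBS job the "if" branch is skipped, so nothing is written to stdout and no terminal settings are attempted.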
If the user's login shell is csh the following message may appear in the standard output of a job:
  Warning: no access to tty, thus no job control in this shell
This message is produced by many csh versions when the shell determines that its input is not a terminal. Short of modifying csh, there is no way to eliminate the message. Fortunately, it is just an informative message and has no effect on the job.

Below you will find an example of a command file, specifying some typical PBS keywords.

  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=4Gb,mppe=1

  cd $SYSTEM_USERDIR
  aprun  -n 1 ./test
Line 1 names the job. Line 2 joins stdout and stderr into a single file named "<batch_script_name>.o$PBS_JOBID". With "oe", as in the example above, standard error is added to standard output; with "eo", standard output is added to standard error. Line 3 specifies the queue to which the job is submitted; the default is batch. Line 4 specifies important resource limits, such as walltime, memory, and the number of CPUs (MSPs); SSPs can also be requested on this line. See the MPI jobs section below for more information.

Frequently Used QSUB Parameters

Parameter Format
    Definition

#PBS -A acct
    Causes the job time to be charged to "acct".
#PBS -a date_time
    Declares the time after which the job is eligible for execution.
#PBS -q batch
    The '-q' parameter directs the job to a specified queue, in this case, the 'batch' or default queue.
#PBS -j {eo,oe}
    Causes the standard error and standard output to be combined in one file.
      • eo - standard output is added to standard error
      • oe - standard error is added to standard output
#PBS -l <resource>
    Resources
      • mem - memory, default is 4 GB
      • mppe - specifies the maximum number of MSPs used in a job. This can be used to reserve MSPs for MSP and SSP codes. This is independent of mppssp. Default is 0.
      • mppssp - specifies the maximum number of SSPs used in a job. This can only be used for SSP codes. This is independent of mppe. Default is 0. (It is NOT threads per process.)
      • walltime - wall clock time
#PBS -m {a,b,e}
    Causes mail to be sent to the user when:
      • a - the job aborts
      • b - the job begins running
      • e - the job ends running
#PBS -N name
    Sets the job name to "name" instead of the name of the script file.
#PBS -o name
    Sets the standard output file to "name" instead of script_file_name.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
#PBS -e name
    Sets the standard error file to "name" instead of script_file_name.e$PBS_JOBID.
#PBS -S <shell>
    Sets the shell to use. Make sure the full path to the shell is correct.
#PBS -V
    Declares that all environment variables are to be exported to the batch job.
#PBS -W
    Used to set job dependencies between two or more jobs.

A useful environment variable is PBS_O_WORKDIR. When your batch job starts, PBS sets it to the directory from which the job was submitted. By default, a PBS batch job starts in your home directory.
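
A common pattern is to change to the submission directory at the top of the job script. The sketch below assumes a Bourne-style job shell; the helper name job_dir is hypothetical, and the fallback to the current directory is there only so the snippet can also be tried outside PBS.

```shell
# job_dir: print the directory the job was submitted from.
# Falls back to the current directory when PBS_O_WORKDIR is unset,
# i.e. when the script is run by hand rather than by PBS.
job_dir() {
    printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

# Leave the home directory and work where the job was submitted.
cd "$(job_dir)" || exit 1
echo "running in $(job_dir)"
```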


MPI jobs

Here is an example command file for a parallel MPI job.
  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=16Gb,mppe=4

  cd $SYSTEM_USERDIR
  aprun  -n 4 ./test
This job requires up to an hour of runtime, 16 GB of memory, and 4 MSPs. This batch script would be used to run an executable with 4 MPI processes, on the assumption that the executable multistreams and thus uses the 4 SSPs available per MSP.

If the executable "test" were an SSP code, then this batch job would use only 4 SSPs of the 16 that were reserved (as 4 MSPs). This is perfectly acceptable, but note that you will be charged for the use of 4 MSPs or, equivalently, 16 SSPs.

Note that if your executable is an SSP code, you could use a command file that looks like the following:

  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=4Gb,mppssp=4

  cd $SYSTEM_USERDIR
  aprun  -n 4 ./test
This gets 4 SSPs (likely from one MSP, but this is not guaranteed). Since your code is an SSP executable, "aprun -n 4" knows to use 4 SSPs rather than 4 MSPs.

Important: Most users will probably want to specify only one of mppe or mppssp, not both. Specifying both mppe and mppssp indicates that you want mppe+mppssp resources. The mppe resource can be used to run MSP or SSP codes, so it will probably be used most.
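
The accounting above can be checked with a little shell arithmetic. This sketch assumes 4 SSPs per MSP, as described earlier in this section, and reads "mppe+mppssp" as MSPs plus additional SSPs; the function name ssps_reserved is hypothetical and is not a PBS command.

```shell
# Assumption taken from the text: one MSP comprises 4 SSPs.
SSPS_PER_MSP=4

# ssps_reserved MPPE MPPSSP: print the total number of SSPs reserved
# by a request for MPPE MSPs plus MPPSSP individual SSPs.
ssps_reserved() {
    echo $(( $1 * SSPS_PER_MSP + $2 ))
}

ssps_reserved 4 0   # mppe=4 only: prints 16
ssps_reserved 0 4   # mppssp=4 only: prints 4
ssps_reserved 4 4   # both: prints 20
```

The last case illustrates why requesting both resources is rarely what you want: the request covers 4 full MSPs and 4 further SSPs.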


Environment Variables

All PBS-provided environment variable names start with the characters PBS_. Some are then followed by a capital O (PBS_O_), indicating that the variable comes from the job's originating environment (i.e., the user's). The following short list shows some of the more useful variables, with typical values.

  • PBS_O_HOME=/spin/home/<username>
  • PBS_O_LOGNAME=<username>
  • PBS_O_SHELL=/bin/ksh
  • PBS_O_HOST=phoenix1f1.ccs.ornl.gov
  • PBS_O_WORKDIR=<directory from where you submitted the job>
  • PBS_O_QUEUE=batch
  • PBS_O_TZ=EST5EDT
  • PBS_JOBNAME=INTERACTIVE
  • PBS_JOBID=149.phoenix1f1.ccs.ornl.gov
  • PBS_QUEUE=batch
  • PBS_ENVIRONMENT=PBS_INTERACTIVE
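
To see which of these variables are actually set in a given job, you could print them from within the job script or an interactive session. A minimal sketch; the function name list_pbs_vars is hypothetical.

```shell
# list_pbs_vars: print every environment variable whose name starts
# with PBS_, sorted by name. Outside a PBS job this normally prints
# nothing; "|| true" keeps an empty result from counting as an error.
list_pbs_vars() {
    env | { grep '^PBS_' || true; } | sort
}

list_pbs_vars
```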

Submitting jobs

Use "qsub" to submit a job command file for batch execution. The job shell will NOT inherit the working directory from where you submitted the job, so you might want to use PBS_O_WORKDIR to reference the directory from where the job was submitted. Also, unless you use full path names, the standard output and standard error files will be saved in this same directory.
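
Putting these pieces together, a submission could look like the sketch below. The file name test.pbs and the helper write_job are hypothetical, and the resource values are copied from the earlier example; the qsub call is skipped when PBS is not installed so that the snippet can be tried elsewhere.

```shell
# write_job FILE: write a small job command file to FILE.
# The quoted 'EOF' keeps $PBS_O_WORKDIR literal so that it is
# expanded when the job runs, not when the file is written.
write_job() {
    cat > "$1" <<'EOF'
#PBS -N test
#PBS -j oe
#PBS -q batch
#PBS -l walltime=1:00:00,mem=4Gb,mppe=1

cd $PBS_O_WORKDIR
aprun -n 1 ./test
EOF
}

write_job test.pbs

# Submit only where qsub is available.
if command -v qsub >/dev/null 2>&1; then
    qsub test.pbs
fi
```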

If you forget to supply a "walltime" limit, your job will get the default limit, regardless of queue.


Job status

Use "qstat -a" to check the status of submitted jobs.

phoenix1f1.ccs.ornl.gov: ORNL/CCS
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
149.phoenix1f1. user1    short    STDIN         813  --   1    1gb 02:00 R 01:10
150.phoenix1f1. user2    batch    job1          860  --   4    4gb 12:00 R 00:54
151.phoenix1f1. user3    batch    test          898  --  16    4gb 12:00 R 00:32
152.phoenix1f1. user4    batch    runit         914  --   4    1gb 08:00 R 00:01
153.phoenix1f1. user1    sys      job           --   --   4    4gb 12:00 Q   -- 
154.phoenix1f1. user5    batch    n16           --   --  16   64gb 24:00 W   --
The first column is the id of each job (which has been truncated) and the second column is the owner. The "S" column gives the status of each job. Here are some common status values.
E Job is exiting after having run
H Held
Q Queued, eligible to run
R Running
S Job is suspended
T Job is being moved to new location
W Waiting for its execution time
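
For a quick summary of a long listing, the status column can be tallied with awk. The sketch below assumes the column layout shown above, where the status is the tenth whitespace-separated field, and that job names contain no spaces; the function name count_states is hypothetical.

```shell
# count_states: read "qstat -a" output on stdin and print one
# "STATUS COUNT" line per status value, sorted by status.
# Only lines that start with a numeric job id are counted.
count_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$10]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample lines copied from the listing above. On Phoenix you would
# pipe the real command instead:  qstat -a | count_states
count_states <<'EOF'
149.phoenix1f1. user1    short    STDIN         813  --   1    1gb 02:00 R 01:10
150.phoenix1f1. user2    batch    job1          860  --   4    4gb 12:00 R 00:54
153.phoenix1f1. user1    sys      job           --   --   4    4gb 12:00 Q   --
154.phoenix1f1. user5    batch    n16           --   --  16   64gb 24:00 W   --
EOF
```

For the four sample lines this prints "Q 1", "R 2", and "W 1", each on its own line.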

Stopping jobs

You can use "qdel" with a job id to cancel that job. The command removes waiting jobs and aborts running jobs.
$ qdel 12816

You can also keep a job from running without removing it from PBS by using "qhold" with a list of job ids. You can then use "qrls" with the same job ids to release the held jobs and allow them to run.

One can also change the order in which two of the user's jobs are processed using the "qorder" command.
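
Because qstat truncates job ids in its listing, it can help to extract just the numeric part before passing ids to qdel, qhold, or qrls. A minimal sketch, assuming the "qstat -a" layout shown in the Job status section; the function name job_ids_for is hypothetical.

```shell
# job_ids_for USER: read "qstat -a" output on stdin and print the
# numeric job ids owned by USER, one per line.
job_ids_for() {
    awk -v user="$1" '$1 ~ /^[0-9]+\./ && $2 == user {
        split($1, part, ".")
        print part[1]
    }'
}

# Sample lines copied from the Job status listing.
job_ids_for user1 <<'EOF'
149.phoenix1f1. user1    short    STDIN         813  --   1    1gb 02:00 R 01:10
150.phoenix1f1. user2    batch    job1          860  --   4    4gb 12:00 R 00:54
153.phoenix1f1. user1    sys      job           --   --   4    4gb 12:00 Q   --
EOF
```

For the sample lines this prints 149 and 153. On Phoenix the output of the real command would be piped in instead (qstat -a | job_ids_for user1); review the resulting list before handing it to qdel.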


Documentation

Phoenix has "man" pages for each of the PBS commands as well as a PBS man page.

URL http://www.ccs.ornl.gov/Phoenix/PBS.html
Updated: Monday, 14-Feb-2005 14:10:12 EST
consult@ccs.ornl.gov