

PBS on Ram





Introduction

An eight-CPU "login" partition on Ram supports login services, code development, and operating system services. Because all operating system calls use the login CPUs, it is important to keep them from becoming overloaded; otherwise the entire system will slow down. For this reason, all production work should be done through the batch system. Resource-intensive sequential processes and parallel jobs must be run through the batch system, not on the login CPU set. If the login CPU set gets overloaded with user processes, we may be forced to halt processes that use more than their fair share.

On Ram, the Portable Batch System (PBS) is used to schedule batch jobs and to allocate nodes for interactive use. This document provides information for getting started with the batch facilities of PBS.


Queues

Different users may have access to different queues, and different queues may have different job limits or may target different nodes.

Use the "qstat -q" command to see the current list of queues.

$ qstat -q
  server: ram1.ccs.ornl.gov

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
standby            --      --       --     --    0   0 --   E R
batch              --      --    12:00:00  --    0   0 --   E R
short              --      --    06:00:00  --    1   0 --   E R
long               --      --    12:00:00  --    0   0 --   E R
interactive        --      --    02:00:00  --    1   0 --   E R
special            --      --    24:00:00  --    0   0 --   E R
immediate          --      --    24:00:00  --    0   0 --   E R
sys                --      --       --     --    0   0 --   E R
                                               --- ---
                                                 2   0
The "batch" queue is the default queue for jobs submitted as PBS scripts.

Jobs can be submitted only to the routing queues, which are typically the following:

  • batch - default
  • interactive - for short, software-development jobs
  • special - for use in special situations only, needing authorization
  • sys - for use by system administrators
The "interactive" queue is used only when the "-I" option is added to the submittal command. Unlike on Cheetah, you cannot start an interactive job on Ram directly. The remaining queues, the execution queues, are for controlling job throughput.
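The walltime limit of a queue can be read out of the "qstat -q" listing with a short filter. The following is a minimal sketch, not an NCCS-provided tool: the function name "queue_walltime" is hypothetical, and it assumes the column layout shown in the listing above, where the walltime limit is the fourth whitespace-separated field.

```shell
# Print the walltime limit of one queue.
# $1 is the queue name; standard input is the output of "qstat -q".
# Assumes the limit is the 4th field, as in the listing above.
queue_walltime() {
    awk -v q="$1" '$1 == q { print $4 }'
}

# Possible use on Ram:
#   qstat -q | queue_walltime batch
```

A queue with no limit prints "--", exactly as it appears in the listing.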

Interactive jobs

To run an interactive job, you must use "qsub -I". For example, to run an interactive job with 8 processes, you might use
  qsub -I -lwalltime=15:00,ncpus=8
  cd $SYSTEM_USERDIR
  mpirun -np 8 test_example
  exit
The first line starts up a new shell in your home directory, not your previous working directory. The second line moves to the scratch area, and the third starts a parallel application.

"mpirun" is the SGI method of starting a parallel executable on the application nodes of Ram.

NOTE: Please remember to "exit" the interactive shell when finished.


Job command files

To run a batch job under PBS, you first need to write a job command file. PBS command files have two components, PBS keyword statements and shell commands. The PBS keyword statements are preceded by "#PBS", making them appear as comments to a shell. The shell commands follow the last "#PBS" keyword statement and represent the executable content of the batch job.

Note that your job may not run if your shell start-up files (e.g., ".cshrc", ".login", or ".profile") contain commands that attempt to set terminal characteristics. Any such command sequences within these files should be skipped by testing for the environment variable "PBS_ENVIRONMENT". Commands in your start-up files should also not generate output when run under PBS. These problems can be avoided as shown in the following sample ".login":

  ...  
  setenv MANPATH /usr/man:/usr/local/man:$MANPATH 
  if ( !  $?PBS_ENVIRONMENT ) then 
    #do terminal settings here 
    #run command with output here 
  endif 
If your login shell is "csh" the following message may appear in the standard output of a job:
  Warning: no access to tty, thus no job control in this shell
This message is produced by many "csh" versions when the shell determines that its input is not a terminal. Short of modifying "csh", we know of no way to eliminate the message. Fortunately, it is just an informative message and has no other effect on the job.
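The sample ".login" above is written for "csh". For users of "sh", "ksh", or similar shells, the same guard could be placed in ".profile". This is a sketch under that assumption; the helper name "in_pbs_job" is hypothetical and not part of PBS.

```shell
# Hypothetical ~/.profile fragment for sh, ksh, or bash users.
MANPATH=/usr/man:/usr/local/man:${MANPATH:-}
export MANPATH

# Succeeds when the shell was started by PBS, which sets PBS_ENVIRONMENT.
in_pbs_job() {
    [ -n "${PBS_ENVIRONMENT:-}" ]
}

if ! in_pbs_job; then
    # Ordinary login only: do terminal settings here,
    # and run commands that produce output here.
    :
fi
```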

Below you will find an example of a command file, specifying some typical PBS keywords.

  #!/bin/csh 
  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=4Gb,ncpus=1

  cd $SYSTEM_USERDIR
  mpirun  -np 1 ./test
Line 1 specifies the shell. Line 2 shows how to name the job. Line 3 shows how to join stdout and stderr into a file named "<batch_script_name>.o$PBS_JOBID". Note that "oe" adds standard error to standard output; with "eo", standard output is added to standard error. Line 4 specifies the queue the job will be submitted to ("batch" is also the default). Line 5 specifies important resource limits, like walltime, memory, and number of CPUs. See the MPI jobs section below for more information.
Frequently Used "qsub" Parameters
Parameter Format    Definition
#PBS -A acct
Causes the job time to be charged to "acct".
#PBS -a time
Declares the time after which the job is eligible for execution.
#PBS -q queue
Directs the job to a specified queue, where "batch" is the default.
#PBS -j {eo|oe}
Causes the standard error and standard output to be combined in one file.
  • eo - standard output is added to standard error
  • oe - standard error is added to standard output
#PBS -l resources
Resources (separated by commas, with no spaces between):
  • ncpus=n - Maximum number of parallel processes.
  • walltime=hh:mm:ss - Total wall-clock time.
  • mem=ngb - Aggregate memory used by the job, in gigabytes. If you ask for more than 8 GB/processor, PBS will allocate more processors to your job than you requested.
#PBS -m {a|b|e}
Causes mail to be sent to the user when:
  • a - The job aborts.
  • b - The job begins running.
  • e - The job ends.
#PBS -N name
Sets the job name to "name" instead of the name of the script file.
#PBS -o name
Sets the standard output file to "name" instead of "script.o$PBS_JOBID". "$PBS_JOBID" is an environment variable created by PBS that contains the job identifier.
#PBS -e name
Sets the standard error file to "name" instead of "script.e$PBS_JOBID".
#PBS -S shell
Sets the shell to use. Make sure the full path to the shell is correct.
#PBS -V 
Declares that all environment variables are to be exported to the batch job.
#PBS -W 
Used to set job dependencies between two or more jobs.
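
As an illustration of the "-W" option, the sketch below builds (but does not run) a "qsub" command line that makes one job wait until another has finished successfully. It uses the standard PBS "depend=afterok:jobid" attribute; see "man qsub" on Ram for the dependency types actually supported. The function name and the script name "post.pbs" are hypothetical.

```shell
# Print a qsub command that submits "script" so that it becomes eligible
# to run only after job "parent_jobid" has ended without error.
# This only prints the command; it does not submit anything.
dependent_qsub_command() {
    parent_jobid=$1
    script=$2
    echo "qsub -W depend=afterok:$parent_jobid $script"
}

# Example with a job ID in the form reported by qsub:
dependent_qsub_command 149.ram1.ccs.ornl.gov post.pbs
```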

A useful environment variable is "PBS_O_WORKDIR". When your batch job starts, PBS sets it to the directory from which the job was submitted. By default, a PBS batch job starts in your home directory.


MPI jobs

Here is an example command file for a parallel MPI job.
  #!/bin/ksh 
  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=8gb,ncpus=4

  cd $SYSTEM_USERDIR
  mpirun  -np 4 ./test
This job requests an hour of runtime, 8 GB of memory, and 4 parallel processes.
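
When several similar MPI jobs are needed, a command file like the one above could be generated rather than edited by hand. The following is only a sketch: the function name "make_job_script" and its argument order are hypothetical, and it simply reproduces the layout of the example above with the job name, resource limits, and executable filled in.

```shell
# Write an MPI job command file to standard output.
# Arguments: job name, walltime (hh:mm:ss), memory (e.g. 8gb),
#            number of CPUs, executable.
make_job_script() {
    name=$1
    walltime=$2
    mem=$3
    ncpus=$4
    exe=$5
    cat <<EOF
#!/bin/ksh
#PBS -N $name
#PBS -j oe
#PBS -q batch
#PBS -l walltime=$walltime,mem=$mem,ncpus=$ncpus

cd \$SYSTEM_USERDIR
mpirun -np $ncpus $exe
EOF
}

# Possible use: write the file, then submit it with qsub.
#   make_job_script test 1:00:00 8gb 4 ./test > test.pbs
```

The number of MPI processes is taken from the same argument as "ncpus", so the two cannot drift apart.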


OpenMP jobs

Here is an example command file for an OpenMP job.
  #!/bin/ksh 
  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=8gb,ncpus=4

  cd $SYSTEM_USERDIR
  export OMP_NUM_THREADS=4
  ./test
Here is an example command file combining OpenMP and MPI.
  #!/bin/ksh 
  #PBS -N test
  #PBS -j oe
  #PBS -q batch
  #PBS -l walltime=1:00:00,mem=8gb,ncpus=20

  cd $SYSTEM_USERDIR
  export OMP_NUM_THREADS=4
  export MPI_DSM_DISTRIBUTE=YES
  mpirun -np 5 ./test
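
In the combined example above, the 20 requested CPUs match 5 MPI processes with 4 OpenMP threads each. Assuming one CPU per thread is the intent, the "ncpus" value can be computed as a quick check; the function name "hybrid_ncpus" is hypothetical.

```shell
# CPUs needed by a hybrid job, assuming one CPU per OpenMP thread.
# Arguments: number of MPI processes, OpenMP threads per process.
hybrid_ncpus() {
    echo $(( $1 * $2 ))
}

# The example above: 5 MPI processes with 4 threads each.
hybrid_ncpus 5 4
```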


Memory requirements

Ram has 2 TB of memory, or 8 GB per processor. The current implementation on Linux does not keep a job's memory contained within its CPU set or keep other jobs' memory out. PBS does, however, automatically increase the job's value of "ncpus" so that at least one processor is reserved per 8 GB of memory requested.

Note that if you run more processes than the number of CPUs you requested from PBS, your processes will still be confined to the job's CPU set. They will share those CPUs with each other, so only your own performance is affected.
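
The one-processor-per-8-GB rule can be estimated before submitting. This is a sketch of the arithmetic only: it assumes memory is given in whole gigabytes and that PBS rounds up to the next whole processor, which the text above does not state explicitly. The function name "effective_ncpus" is hypothetical.

```shell
# Estimate the ncpus value a job ends up with.
# Arguments: requested ncpus, requested memory in whole GB.
# Assumes at least one CPU is reserved per 8 GB, rounded up.
effective_ncpus() {
    ncpus=$1
    mem_gb=$2
    needed=$(( (mem_gb + 7) / 8 ))
    if [ "$needed" -gt "$ncpus" ]; then
        echo "$needed"
    else
        echo "$ncpus"
    fi
}

# A request of ncpus=4,mem=64gb would need 64 / 8 = 8 CPUs.
effective_ncpus 4 64
```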


Environment Variables

All PBS-provided environment variable names start with the characters "PBS_". Some are then followed by a capital "O" ("PBS_O_"), indicating that the variable is from the job's originating, submission environment. The following short example lists some useful variables, along with typical values.

  • PBS_O_HOME=/spin/home/$USER
  • PBS_O_LOGNAME=$USER
  • PBS_O_SHELL=$shell
  • PBS_O_HOST=ram1.ccs.ornl.gov
  • PBS_O_WORKDIR=submission directory
  • PBS_O_QUEUE=batch
  • PBS_O_TZ=EST5EDT
  • PBS_JOBNAME=INTERACTIVE
  • PBS_JOBID=149.ram1.ccs.ornl.gov
  • PBS_QUEUE=batch
  • PBS_ENVIRONMENT=PBS_INTERACTIVE

Submitting jobs

Use "qsub" to submit a job command file for batch execution. The job shell will NOT inherit the working directory from where you submitted the job, so you might want to "cd $PBS_O_WORKDIR" at the beginning of your script.
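
One way to write that "cd" so the same script can also be tried outside PBS is sketched below. The function name "enter_submit_dir" is hypothetical; the fallback to the current directory when "PBS_O_WORKDIR" is unset is an assumption made for testing convenience, not PBS behavior.

```shell
# Change to the directory the job was submitted from.
# PBS sets PBS_O_WORKDIR; outside PBS the current directory is kept.
enter_submit_dir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}

enter_submit_dir
```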

Unless you use full path names, the standard output and standard error files will be saved in the submission directory.

If you forget to supply a "walltime" limit, your job will get the default limit.


Job status

Use "qstat -a" to check the status of submitted jobs.

ram1.ccs.ornl.gov: ORNL/CCS
                                                      Req'd  Req'd   Elap
Job ID    Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
149.ram1. user1    short    STDIN         813  --   1    1gb 02:00 R 01:10
150.ram1. user2    batch    job1          860  --   4    4gb 12:00 R 00:54
151.ram1. user3    batch    test          898  --  16    4gb 12:00 R 00:32
152.ram1. user4    batch    runit         914  --   4    1gb 08:00 R 00:01
153.ram1. user1    sys      job           --   --   4    4gb 12:00 Q   -- 
154.ram1. user5    batch    n16           --   --  16   64gb 24:00 W   --
The first column is the ID of each job (truncated in the example), and the second column is the owner. The "S" column gives the status of each job. Here are some common status values:
E Exiting after having run
H Held
Q Queued, eligible to run
R Running
S Suspended
T In transition, moving to a different queue
W Waiting for time specified by "qsub -a"
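
The status letters can also be handled in a script. The sketch below translates a status letter using the table above and counts jobs per status from "qstat -a" output. Both function names are hypothetical, and the second function assumes the status letter is the tenth whitespace-separated field, as in the listing above.

```shell
# Translate a one-letter job status from the "S" column.
describe_state() {
    case "$1" in
        E) echo "Exiting after having run" ;;
        H) echo "Held" ;;
        Q) echo "Queued, eligible to run" ;;
        R) echo "Running" ;;
        S) echo "Suspended" ;;
        T) echo "In transition, moving to a different queue" ;;
        W) echo "Waiting for time specified by qsub -a" ;;
        *) echo "Unknown state: $1"; return 1 ;;
    esac
}

# Count jobs per status. Reads "qstat -a" output on standard input and
# only looks at lines that begin with a numeric job ID.
summarize_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$10]++ } END { for (s in n) print s, n[s] }' | sort
}

# Possible use on Ram:
#   qstat -a | summarize_states
```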

Stopping jobs

You can use "qdel" with a job ID to cancel that job. The command removes waiting jobs and aborts running jobs.
$ qdel 12816

You can use "qhold jobID" to hold jobs, and "qrls jobID" to release held jobs so that they can run.

You can also change the order of your jobs using the "qorder" command.

See the "man" page for each command for more details.


Documentation

Ram has "man" pages for each of the PBS commands, along with "man pbs" for an overview and "man pbs_resources" describing valid options for "qsub -l".


URL http://www.ccs.ornl.gov/Ram/PBS.html
Updated: Friday, 29-Apr-2005 09:06:49 EDT
consult@ccs.ornl.gov