
Running Jobs on Jacquard

All parallel jobs on Jacquard must be run through the batch system. The batch software is PBS Pro.

Note: Do not set group (or world) write permission for your home directory. If you do, PBS cannot run your jobs.

Interactive jobs

You can run parallel jobs from the command line by starting an interactive batch job. Use the following command:

jacquard% qsub -I -q interactive -A reponame -l nodes=N:ppn=[1|2]

N is the desired number of nodes and ppn should be specified as either 1 or 2 processors per node. The interactive queue limits listed below apply. If reponame is not specified, charges are applied against the user's default repo.
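For example, to request two nodes with two processors each (using the reponame placeholder as above):

jacquard% qsub -I -q interactive -A reponame -l nodes=2:ppn=2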

The preceding command will start a new shell, from which you can launch jobs with the mpirun or mpiexec command (see below). The directory the job was submitted from is available in the environment variable $PBS_O_WORKDIR.
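Once the interactive shell starts, a typical session changes back to the submission directory and launches the code from there; for example, a four-task run (a.out is a placeholder executable name):

cd $PBS_O_WORKDIR
mpirun -np 4 ./a.out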

You can also use mpirun directly from the command line:

jacquard% mpirun -np number_of_tasks executable

In this case a wrapper script issues the appropriate qsub command on your behalf. Limited availability of nodes for interactive use may at times cause interactive jobs to stall or time out.

Batch jobs

A batch script - a text file with PBS directives and job commands - is required to submit jobs. PBS directive lines, which tell the batch system how to run your job, begin with #PBS. A minimal script for Jacquard will be very similar to the following example:

#PBS -l nodes=8:ppn=2,walltime=00:30:00
#PBS -N jobname 
#PBS -o job.out
#PBS -e job.err
#PBS -A repo
#PBS -q debug 
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out

or replace the last two lines as follows to use mpiexec

cd $PBS_O_WORKDIR
mpiexec -n 16 ./a.out

repo is to be replaced by the repository you want to charge the job against.

Currently mpirun does not propagate environment variables to all the tasks in the parallel job. Using mpiexec to launch the job is one way to accomplish this. You can also mpirun a shell script which defines the variables you need.
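One possible sketch of such a wrapper script (wrapper.sh, MY_VAR, and a.out are placeholder names): define the variables the code needs, then exec the real executable.

#!/bin/sh
# wrapper.sh: set the environment the code needs, then run it
export MY_VAR=value
exec ./a.out

You would then launch it as mpirun -np 16 ./wrapper.sh in place of mpirun -np 16 ./a.out.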

mpiexec supports many useful features that NERSC staff are working to see implemented in the standard mpirun. In order to accommodate the needs of as many users as possible, for now we provide both launchers. See below for more information about these job launchers.

Notes:

  • If your login shell is bash and your parallel batch job returns the error message
    accept: Resource temporarily unavailable
    done.
    you must specify another shell for PBS using the -S directive, such as:
    #PBS -S /usr/bin/ksh
  • If you must use bash, do not use the -S directive; instead, use the following as the first line of your batch script (a short sketch follows this list):
    #!/bin/bash
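A minimal bash batch script along these lines (a.out is a placeholder executable) might look like:

#!/bin/bash
#PBS -l nodes=2:ppn=2,walltime=00:10:00
#PBS -q debug
cd $PBS_O_WORKDIR
mpirun -np 4 ./a.out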

Jobs that read or write large files should be executed in the $SCRATCH file system. In the sample script above, the line cd $PBS_O_WORKDIR changes the current working directory to the directory from which the script was submitted. The easiest way to run a job using $SCRATCH is to submit the job from a $SCRATCH directory. You may also cd to your $SCRATCH directory in place of cd $PBS_O_WORKDIR.
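For example, the last two lines of the sample script above could be changed to run from a directory under $SCRATCH (myrun is a placeholder directory name):

cd $SCRATCH/myrun
mpirun -np 16 ./a.out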

Common PBS Options/Directives
Option        Default                  Description
-A repo       Your default repo        Charge this job to repo.
-e filename   <script_name>.e<job_id>  Write STDERR to filename.
-o filename   <script_name>.o<job_id>  Write STDOUT to filename.
-j [eo|oe]    Do not merge             Merge STDOUT and STDERR: eo merges into
                                       standard error, oe into standard output.
-m [a|b|e|n]  a                        E-mail notification options:
                                       a = send mail when the job is aborted by the system
                                       b = send mail when the job begins
                                       e = send mail when the job ends
                                       n = do not send mail
                                       Options a, b, and e may be combined.
-N job_name   Job script name          Job name: up to 15 printable, non-whitespace
                                       characters.
-q queue      batch                    See Batch queues below.
-S shell      Login shell              Specify shell as the scripting language to use.
-V            Do not import            Export the current environment variables into the
                                       batch job environment.

All options may be specified either (1) as qsub command-line options or (2) as directives in the batch script of the form #PBS option.
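For example, the following two forms are equivalent ways of selecting the debug queue and naming the job (batchscript and myjob are placeholder names):

jacquard% qsub -q debug -N myjob batchscript

or, in batchscript itself:

#PBS -q debug
#PBS -N myjob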

Parallel job launch commands

The standard MPI job launch command for MPICH/MVAPICH programs is called mpirun. mpirun uses SSH to execute non-interactive remote commands on the compute nodes, and therefore does not propagate environment variables into the parallel compute environment.

jacquard% mpirun -np number_of_tasks executable

An alternative job launch program installed by NERSC is mpiexec. This program has the advantage over mpirun that environment variables are propagated from the batch script environment into the parallel run environment.

jacquard% mpiexec -n number_of_tasks executable

The mpiexec launcher talks directly to PBS services rather than using remote shells.

You are particularly encouraged to launch your job with mpiexec if you are using a large number of nodes. See below for more information.

Note:

The mpirun launcher can be run directly from the command line, but the mpiexec launcher can only be used from within a PBS batch script.

Account (repo) charging

Jobs are charged against your default repository unless otherwise specified. (See Accounts and Charging on Jacquard for more information.)

The NIM web interface is used to view and change your default repo.

You can specify the repo to be charged in your PBS script with this directive:

#PBS -A  repo_name

or, use the -A reponame option to qsub.

Interactive and debug jobs are charged at the regular priority rate.

Batch queues

There are four submit queues on Jacquard. The submit queues route your job to the correct execution queue based on its requirements. The lower the relative priority number of a queue, the higher the actual scheduling priority of its jobs: other things being equal, a job in a queue with relative priority n will be scheduled ahead of one in a queue with relative priority n+1.

Submit Queue  Exec Queue    Nodes    Max Wallclock  Max Jobs per User (1)  Relative Priority
interactive   interactive   1-16     30 mins        1                      1
debug         debug         1-32     30 mins        1                      2
batch         batch16       1-16     48 hours                              7
              batch32       17-32    24 hours                              5
              batch64       33-64    12 hours                              4
              batch128 (2)  65-128   6 hours                               6
              batch256 (3)  129-256  6 hours                               3
low           low           1-64     12 hours                              8

Notes:

(1) There is a maximum of 4 running jobs per user over the whole system.
(2) Only one batch128 job is allowed to run at a time.
(3) The batch256 queue will usually have a run limit of zero. NERSC staff will monitor this queue and make special arrangements to run jobs of this size. See below for more information about running large jobs on Jacquard.

Running large (129+ node) jobs

Users running large jobs on Jacquard, particularly jobs using 129 nodes or more, are encouraged to launch them with mpiexec instead of mpirun.

The reason for this recommendation is that mpirun is actually a shell script, and beyond a certain node count jobs launched with it may run into shell command-line length limitations.

STDOUT, STDERR buffering

PBS stages standard output and standard error to temporary files that are not written into a user's disk space until the job has completed. You can redirect STDOUT and STDERR from the command line into a file that is visible to you during the run, but this scheme may not work in all situations. NERSC is investigating ways to make this redirection more reliable.
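One common approach, which as noted may not work in all situations, is to redirect the launcher's output inside the batch script (run.log is a placeholder file name):

mpirun -np 16 ./a.out > run.log 2>&1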

STDIN, STDOUT redirection

If your code requires you to redirect stdin or stdout on the command line, you may wish to try putting the executable and its redirections in quotes within the mpirun command:

jacquard% mpirun -np number_of_tasks "executable <inputfile >outputfile"

Sample batch script

This is a sample Jacquard PBS batch script that runs a 64-node, 128-processor job with a 5-hour wall-clock limit. The executable is called hello; the job's standard output goes into a file named hello.out and any standard error into a file named hello.err.

#PBS -l nodes=64:ppn=2,walltime=05:00:00
#PBS -N hello
#PBS -o hello.out
#PBS -e hello.err
#PBS -q batch
#PBS -A repo
#PBS -V

cd $PBS_O_WORKDIR
mpirun -np 128 ./hello

repo is the repository name against which to charge the job. The nodes keyword gives the number of nodes on which the job will run, and ppn the number of processors on each node, which on Jacquard can only be 1 or 2.

If a queue is not specified in the script, the job will run on the batch queue.

The "./" before the name of the executable is required when invoking mpirun, even if "." is in $PATH.

Submitting a job

To submit a job for execution, type

jacquard% qsub batchscript

where batchscript is the name of the batch script. The output of the qsub command will include the jobid. Users should record this information, as it is very useful in debugging job failures.
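The job ID will look something like the following (the number and server name will differ for your job; this form matches the job identifier shown in the output footer example below):

21451.jacin03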

Deleting a job

To delete a previously submitted job, type

jacquard% qdel jobid

where jobid is the job's identification, produced by the qsub command.

Job monitoring

Job progress can be monitored on the web, or on Jacquard with the PBS command qstat or the NERSC-provided command qs.

The NERSC qs command gives queue status information tailored to Jacquard. Output from qs is a terminal formatted summary of running and queued jobs.

jacquard% qs
  JOBID ST      USER        NAME   NDS       REQ      USED            SUBMIT
  57839  R     user5  calcoastNH    16  00:30:00  00:04:30    Aug 1 16:42:51
  57676  R     user1      fspack    64  05:00:00  03:30:52    Aug 1 10:23:32
  57666  R     user1      fspack    32  06:00:00  03:01:15    Aug 1 09:45:23
  57677  R     user1      fspack    32  05:00:00  01:49:11    Aug 1 10:24:58
  57801  R     user6           b     1  06:00:00  01:10:49    Aug 1 15:30:03
  57803  R     user6           c     1  06:00:00  01:10:48    Aug 1 15:31:05
  57809  R     user6           d     1  06:00:00  01:02:44    Aug 1 15:45:04
  57814  R     user6           e     1  06:00:00  00:54:48    Aug 1 15:52:50
  57824  R     user4      test77     4  06:00:00  00:40:43    Aug 1 16:08:05
  57836  R     user1      fspack    32  06:00:00  00:12:40    Aug 1 16:35:17
  57591  H     user7  tftr79100_    64  06:00:00         -   Jul 31 19:02:54
  57675  H     user3     prot_10    64  05:00:00         -    Aug 1 10:21:50
  57697  H     user7  jt6032841_    64  05:58:00         -    Aug 1 13:24:39
  57815  H     user8   a_to_cen_    64  03:00:00         -    Aug 1 15:55:02
  57584  H     user7  jt6032844_    64  06:00:00         -   Jul 31 18:35:55
  57838  H     user1      fspack    64  06:00:00         -    Aug 1 16:35:25
  57680  H     user1      fspack    32  06:00:00         -    Aug 1 10:30:20
  57817  H     user8      x0_39_    64  04:00:00         -    Aug 1 15:56:14
  57678  H     user3     prot17_    64  05:00:00         -    Aug 1 10:26:25
  57792  H     user2        J128    64  06:00:00         -    Aug 1 15:15:38
  57818  H     user8    hole_38_    64  04:00:00         -    Aug 1 15:56:57
  57816  H     user8      a0_43_    64  04:00:00         -    Aug 1 15:55:47
  57698  H     user7  tftr79128_    64  05:58:00         -    Aug 1 13:27:21
  57679  H     user1      fspack    32  06:00:00         -    Aug 1 10:30:04

The qs script includes -u username and -w options that allow decreasing or increasing the amount of information reported.
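For example, to list only your own jobs (user1 is a placeholder username):

jacquard% qs -u user1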

Output

Your standard output file will contain a system-provided header before your actual job output and a system-provided footer after your output.

The header will look like this:

PBS Leader node is jaccn150

Job setup time:  Thu Mar 3 12:13:05 PST 2005

Setting up security

Job startup at  Thu Mar 3 12:13:05 PST 2005

-------------------------------------------------------------------

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

The last two lines do not indicate an error.

The system-provided footer will follow your job output, and it will look like this:

----------------------------------------------------------------
Job testbar1/21451.jacin03 completed Thu Mar  3 12:13:05 PST 2005
Submitted by mstewart/mstewart using null
Job Limits: ncpus=2,neednodes=1:ppn=2,walltime=00:02:00
Job Resources used: cpupercent=0,cput=00:00:00,mem=1488kb,ncpus=2,
	vmem=6000kb,walltime=00:00:00
Nodes used: jaccn150 

Killing any leftover processes...

Job completed.

Using script variables

The suggested mpirun job launch script does not propagate variables from the script environment into the parallel run environment, except for the LD_LIBRARY_PATH variable. This means that codes needing variables defined only in the script will fail at runtime. This includes the effects of any module commands that appear only in the script.

If your job needs the environment variables defined in the batch script environment, try using mpiexec ( see above) instead of mpirun.
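For example, a batch script fragment along these lines (MY_INPUT and a.out are placeholder names) makes the variable visible to all tasks when the job is launched with mpiexec:

cd $PBS_O_WORKDIR
export MY_INPUT=input.dat
mpiexec -n 16 ./a.out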

VAPI_RETRY_EXC_ERR error

You may run into this error, particularly when running large node count jobs on Jacquard:

Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=81

You can eliminate the problem by setting the following environment variables in your batch script before the mpirun/mpiexec job launch command.

bash login shell:

export VIADEV_DEFAULT_RETRY_COUNT=7
export VIADEV_DEFAULT_TIME_OUT=21

csh/tcsh login shell:

setenv VIADEV_DEFAULT_RETRY_COUNT 7
setenv VIADEV_DEFAULT_TIME_OUT 21
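For example, in a bash-style batch script these settings would go immediately before the launch command (a.out is a placeholder executable):

export VIADEV_DEFAULT_RETRY_COUNT=7
export VIADEV_DEFAULT_TIME_OUT=21
mpirun -np 128 ./a.out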

For assistance or to report problems, contact consult@nersc.gov.

