Monitoring Job Status

PBS and Moab provide multiple tools to view queue, system, and job statuses. Below are the most common and useful of these tools.

qstat

Use qstat -a to check the status of submitted jobs.

> qstat -a 

nid00004: ORNL/CCS
                                                   Req'd  Req'd   Elap
Job ID  Username Queue  Jobname  SessID NDS  Tasks Memory Time  S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668    user1   batch   job2     21909   1  24000   --   08:00 R 02:28
29894    user2   batch   run128    --     1    128   --   02:30 Q   --
29895    user3   batch   STDIN    15921   1   8192   --   01:00 R 00:10
29896    user2   batch   jobL     21988   1   4096   --   01:00 R 00:09
29897    user4   debug   STDIN    22367   1   2048   --   00:30 R 00:06
29898    user1   batch   job1     25188   1      4   --   01:10 C 00:00
>

The qstat output shows the following:

Job ID
The first column gives the PBS-assigned job ID.
Username
The second column gives the submitting user’s user ID.
Queue
The third column gives the queue into which the job has been submitted.
Jobname
The fourth column gives the PBS job name, which is set with the PBS -N option in the batch script. If the -N option is not used, PBS uses the name of the batch script.
SessID
The fifth column gives the associated session ID.
NDS
The sixth column gives the PBS node count. This value is not accurate; it will always be 1.
Tasks
The seventh column gives the number of cores requested by the job’s -size option.
Req’d Memory
The eighth column gives the job’s requested memory.
Req’d Time
The ninth column gives the job’s requested wall time.
S
The tenth column gives the job’s current status. See the status listings below.
Elap Time
The eleventh column gives the time the job has spent in a running state. If the job has never been in a running state, the field will be empty (shown as -- in the output above).

Status value   Meaning
E              Exiting after having run
H              Held
Q              Queued; eligible to run
R              Running
S              Suspended
T              Being moved to new location
W              Waiting for its execution time
C              Recently completed (within the last 5 minutes)
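
When qstat output needs to be processed by a script, the columns and status codes described above can be handled along the following lines. This is a minimal sketch, not part of PBS: the names parse_qstat_rows, FIELDS, and STATUS_MEANINGS are hypothetical, and the parser assumes the 11-column layout of the sample output shown earlier. Output on other systems may differ (for example, job IDs may carry a host suffix).

```python
# Sketch: parse the job rows of `qstat -a` output laid out like the sample above.
# Assumption: every job row has exactly 11 whitespace-separated fields.

STATUS_MEANINGS = {
    "E": "Exiting after having run",
    "H": "Held",
    "Q": "Queued; eligible to run",
    "R": "Running",
    "S": "Suspended",
    "T": "Being moved to new location",
    "W": "Waiting for its execution time",
    "C": "Recently completed (within the last 5 minutes)",
}

# Hypothetical field names, one per column, in the order described in the text.
FIELDS = ["job_id", "username", "queue", "jobname", "sess_id", "nds",
          "tasks", "memory", "req_time", "status", "elapsed"]


def parse_qstat_rows(text):
    """Return one dict per job row; header and separator lines are skipped."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines with 11 fields whose first field starts with a digit.
        if len(parts) != len(FIELDS) or not parts[0][:1].isdigit():
            continue
        job = dict(zip(FIELDS, parts))
        job["status_meaning"] = STATUS_MEANINGS.get(job["status"], "Unknown")
        jobs.append(job)
    return jobs


SAMPLE = """\
Job ID  Username Queue  Jobname  SessID NDS  Tasks Memory Time  S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668    user1   batch   job2     21909   1  24000   --   08:00 R 02:28
29894    user2   batch   run128    --     1    128   --   02:30 Q   --
"""

for job in parse_qstat_rows(SAMPLE):
    print(job["job_id"], job["status"], job["status_meaning"])
```

Splitting on whitespace works here only because none of the sample fields contain spaces; a job name with embedded spaces would need a fixed-width parser instead.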

showq

The Moab utility showq can be used to view a more detailed description of the queue. The utility groups the jobs in the queue into the following states:

Active
These jobs are currently running.
Eligible
These jobs are currently queued awaiting resources. A user is allowed two jobs in the eligible state.
Blocked
These jobs are currently queued but are not eligible to run. Common reasons are that the job is on hold or that the owning user already has two jobs in the eligible state.
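
The relationship between the eligible and blocked states can be illustrated with a small model. This is only a sketch of the two-eligible-jobs rule stated above, not Moab's actual scheduling logic: the function classify_queued and the tuple layout are hypothetical, and the model assumes queued jobs are considered in list order, whereas the real ordering depends on site scheduling policy.

```python
# Sketch of the "two eligible jobs per user" rule described above.
# Assumption: jobs are considered in list order; Moab itself orders by priority.

MAX_ELIGIBLE_PER_USER = 2  # limit stated in the text


def classify_queued(jobs):
    """Split queued jobs into eligible and blocked job IDs.

    jobs: list of (job_id, user, on_hold) tuples for jobs that are not running.
    """
    eligible, blocked = [], []
    eligible_count = {}
    for job_id, user, on_hold in jobs:
        if on_hold or eligible_count.get(user, 0) >= MAX_ELIGIBLE_PER_USER:
            # Held jobs, and jobs beyond the per-user limit, are blocked.
            blocked.append(job_id)
        else:
            eligible.append(job_id)
            eligible_count[user] = eligible_count.get(user, 0) + 1
    return eligible, blocked


queued = [
    (734, "user1", False),
    (735, "user1", False),
    (736, "user1", False),  # third queued job for user1
    (737, "user2", True),   # job on hold
]
eligible, blocked = classify_queued(queued)
print("Eligible:", eligible)
print("Blocked:", blocked)
```

In this model, job 736 is blocked only because user1 already has two eligible jobs; it would become eligible once one of those jobs starts running.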

checkjob

The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is queued in a blocked state, the following command shows why it is blocked:

> checkjob 736

The return may contain a line similar to the following:

BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 for user  (Req: 1 InUse: 2)

This line indicates that the job is in the blocked state because the owning user has reached the limit of two jobs in the eligible state.
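
If checkjob output is captured by a script, the limit and usage figures can be pulled out of a BlockMsg line with a regular expression. The sketch below is modeled only on the single sample line above; the function parse_blockmsg and its result keys are hypothetical, and other BlockMsg wordings will simply not match.

```python
import re

# Pattern modeled on the sample BlockMsg line only; it is not a general
# parser for checkjob output.
BLOCKMSG_RE = re.compile(
    r"job (?P<job>\S+) violates idle HARD MAXJOB limit of (?P<limit>\d+)"
    r".*\(Req: (?P<req>\d+)\s+InUse: (?P<in_use>\d+)\)"
)


def parse_blockmsg(line):
    """Return the job ID, limit, requested and in-use counts, or None."""
    match = BLOCKMSG_RE.search(line)
    if match is None:
        return None
    return {
        "job": match.group("job"),
        "limit": int(match.group("limit")),
        "requested": int(match.group("req")),
        "in_use": int(match.group("in_use")),
    }


line = ("BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 "
        "for user  (Req: 1 InUse: 2)")
info = parse_blockmsg(line)
print(info)
```

Here in_use equals limit, which matches the explanation above: the user already holds the maximum number of eligible jobs, so the additional requested job is blocked.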

xtshowcabs

The utility xtshowcabs can be used to see which jobs are currently running and, more importantly, where on the system they are running.