Monitoring Job Status
PBS and Moab provide multiple tools to view queue, system, and job statuses. Below are the most common and useful of these tools.
qstat
Use qstat -a to check the status of submitted jobs.
> qstat -a
nid00004: ORNL/CCS
                                                   Req'd  Req'd   Elap
Job ID  Username Queue  Jobname  SessID NDS  Tasks Memory Time  S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668   user1    batch  job2     21909  1    24000 --     08:00 R 02:28
29894   user2    batch  run128   --     1    128   --     02:30 Q --
29895   user3    batch  STDIN    15921  1    8192  --     01:00 R 00:10
29896   user2    batch  jobL     21988  1    4096  --     01:00 R 00:09
29897   user4    debug  STDIN    22367  1    2048  --     00:30 R 00:06
29898   user1    batch  job1     25188  1    4     --     01:10 C 00:00
>
The qstat output shows the following:
- Job ID
- The first column gives the PBS assigned job ID.
- Username
- The second column gives the submitting user’s user ID.
- Queue
- The third column gives the queue into which the job has been submitted.
- Jobname
- The fourth column gives the PBS job name. This is set by the PBS -N option in the PBS batch script. If the -N option is not used, PBS will use the name of the batch script.
- SessID
- The fifth column gives the associated session ID.
- NDS
- The sixth column gives the PBS node count. This value is not accurate; it will be one.
- Tasks
- The seventh column gives the number of cores requested by the job’s -size option.
- Req’d Memory
- The eighth column gives the job’s requested memory.
- Req’d Time
- The ninth column gives the job’s requested wall time.
- S
- The tenth column gives the job’s current status. See the status listings below.
- Elap Time
- The eleventh column gives the time the job has spent in a running state. If the job is not running and has not previously run, the field will be empty (shown as -- in the sample output above).
Status value | Meaning |
---|---|
E | Exiting after having run |
H | Held |
Q | Queued; eligible to run |
R | Running |
S | Suspended |
T | Being moved to new location |
W | Waiting for its execution time |
C | Recently completed (within the last 5 minutes) |
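
The column layout and status codes above can also be consumed by a script. The following is a minimal Python sketch, assuming the output matches the sample shown earlier: eleven whitespace-separated fields per job row and no spaces inside job names. The names parse_qstat, count_by_status, FIELDS, and STATUS_MEANING are hypothetical helpers invented for this example and are not part of PBS. The sample text is embedded so the sketch runs without access to a batch system.

```python
from collections import Counter

# Column keys in the order described above (names chosen for this sketch only).
FIELDS = ["job_id", "username", "queue", "jobname", "sess_id",
          "nds", "tasks", "memory", "req_time", "status", "elap_time"]

# Status codes copied from the status table above.
STATUS_MEANING = {
    "E": "Exiting after having run",
    "H": "Held",
    "Q": "Queued; eligible to run",
    "R": "Running",
    "S": "Suspended",
    "T": "Being moved to new location",
    "W": "Waiting for its execution time",
    "C": "Recently completed",
}

def parse_qstat(text):
    """Return one dict per job row found in `qstat -a` output text."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: job rows have exactly 11 fields and a job ID whose
        # leading part (before any '.') is numeric. This skips the header,
        # the dashed separator, and prompt lines.
        if len(parts) == len(FIELDS) and parts[0].split(".")[0].isdigit():
            jobs.append(dict(zip(FIELDS, parts)))
    return jobs

def count_by_status(jobs):
    """Count jobs per status code."""
    return Counter(job["status"] for job in jobs)

# Sample rows taken from the qstat -a output shown earlier.
SAMPLE = """\
Job ID  Username Queue  Jobname  SessID NDS  Tasks Memory Time  S Time
------- -------- ------ -------- ------ ---- ----- ------ ----- - -----
29668   user1    batch  job2     21909  1    24000 --     08:00 R 02:28
29894   user2    batch  run128   --     1    128   --     02:30 Q --
29895   user3    batch  STDIN    15921  1    8192  --     01:00 R 00:10
29896   user2    batch  jobL     21988  1    4096  --     01:00 R 00:09
29897   user4    debug  STDIN    22367  1    2048  --     00:30 R 00:06
29898   user1    batch  job1     25188  1    4     --     01:10 C 00:00
"""

if __name__ == "__main__":
    jobs = parse_qstat(SAMPLE)
    for code, n in sorted(count_by_status(jobs).items()):
        print(code, n, STATUS_MEANING.get(code, "unknown"))
```

On a real system the same functions could be fed the captured text of qstat -a instead of SAMPLE; the field-count check would need adjusting if the site's output uses a different layout.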
showq
The Moab utility showq can be used to view a more detailed description of the queue. The utility will display the queue in the following states:
- Active
- These jobs are currently running.
- Eligible
- These jobs are currently queued awaiting resources. A user is allowed two jobs in the eligible state.
- Blocked
- These jobs are currently queued but are not eligible to run. Common reasons are that the job is on hold or that the owning user already has two jobs in the eligible state.
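
To make the relationship between the three states concrete, here is a simplified Python model of the rule described above. It is an illustration only, not Moab's scheduling logic: the state labels running, queued, and held, the function classify, and the constant ELIGIBLE_LIMIT are all invented for this sketch, and real classification depends on the full site policy.

```python
ELIGIBLE_LIMIT = 2  # per-user limit on eligible jobs, as stated above

def classify(jobs):
    """Map each job ID to 'active', 'eligible', or 'blocked'.

    `jobs` is a list of (job_id, user, state) tuples in submission order,
    where state is 'running', 'queued', or 'held' (labels invented here).
    """
    eligible_per_user = {}
    result = {}
    for job_id, user, state in jobs:
        if state == "running":
            result[job_id] = "active"
        elif state == "queued" and eligible_per_user.get(user, 0) < ELIGIBLE_LIMIT:
            # The user still has room under the eligible-job limit.
            eligible_per_user[user] = eligible_per_user.get(user, 0) + 1
            result[job_id] = "eligible"
        else:
            # Held jobs, and queued jobs beyond the user's limit, are blocked.
            result[job_id] = "blocked"
    return result

if __name__ == "__main__":
    example = [
        (101, "user1", "running"),
        (102, "user1", "queued"),
        (103, "user1", "queued"),
        (104, "user1", "queued"),  # third queued job for user1
        (105, "user2", "held"),
    ]
    for job_id, state in classify(example).items():
        print(job_id, state)
```

In this model, a user's third queued job is reported as blocked until one of the two eligible jobs starts running.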
checkjob
The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is currently in the queue in a blocked state, the following can be used to view why the job is blocked:
> checkjob 736
The return may contain a line similar to the following:
BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 for user(Req: 1 InUse: 2)
This line indicates the job is in the blocked state because the owning user has reached the limit of two jobs in the eligible state.
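
When checking many jobs, the BlockMsg line can be extracted programmatically. The following Python sketch assumes the line has the same shape as the example above; the regular expression and the function parse_block_msg are hypothetical and were written against that single example, so other checkjob messages may not match.

```python
import re

# Pattern written against the single BlockMsg example shown above.
# Assumption: any text between "user" and "(" (such as a username) is ignored.
BLOCKMSG_RE = re.compile(
    r"BlockMsg:\s+job\s+(?P<job>\S+)\s+violates\s+(?P<policy>.+?)\s+"
    r"limit\s+of\s+(?P<limit>\d+)\s+for\s+user[^(]*"
    r"\(Req:\s*(?P<req>\d+)\s+InUse:\s*(?P<in_use>\d+)\)"
)

def parse_block_msg(text):
    """Return the fields of the first BlockMsg line in `text`, or None."""
    match = BLOCKMSG_RE.search(text)
    if match is None:
        return None
    return {
        "job": match.group("job"),
        "policy": match.group("policy"),
        "limit": int(match.group("limit")),
        "req": int(match.group("req")),
        "in_use": int(match.group("in_use")),
    }

if __name__ == "__main__":
    line = ("BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 "
            "for user(Req: 1 InUse: 2)")
    info = parse_block_msg(line)
    if info is not None and info["in_use"] >= info["limit"]:
        print("job", info["job"], "is blocked by the", info["policy"], "limit")
```

Here in_use equal to limit corresponds to the situation described above, where the user already has two jobs in the eligible state.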
xtshowcabs
The utility xtshowcabs can be used to see what jobs are currently running and, more importantly, where.