Interactive Batch Jobs

Batch scripts are useful for submitting a group of commands, letting them run through the queue, and then viewing the results. It is also often useful to run a job interactively. However, users are not allowed to access compute nodes or run yod directly from a login session on Jaguar. Instead, users must use a batch-interactive PBS job, which is requested with the -I option to qsub.

Interactive Batch Example

For interactive batch jobs, PBS options are passed through qsub on the command line.

qsub -I -A XXXYYY -q debug -V -lsize=16,walltime=30:00

This request will:

-I
Start an interactive session
-A XXXYYY
Charge to the XXXYYY project
-q debug
Run in the debug queue
-V
Import the submitting user’s environment
-lsize=16,walltime=30:00
Request 16 cores for 30 minutes

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, you will have an interactive prompt on the head node of your allocated resource. From here commands may be executed directly instead of through a batch script.
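As a minimal sketch, the options described above can be assembled by a small shell function so that the project, size, and walltime are easy to change. The function name interactive_qsub_cmd is hypothetical and not part of PBS; it only prints the command line and does not submit anything.

```shell
# Hypothetical helper: print (not run) the qsub command for an interactive job.
# Arguments: project ID, size, walltime, and an optional queue (default: debug).
interactive_qsub_cmd() {
  project="$1"
  size="$2"
  walltime="$3"
  queue="${4:-debug}"
  printf 'qsub -I -A %s -q %s -V -lsize=%s,walltime=%s\n' \
    "$project" "$queue" "$size" "$walltime"
}

# Reproduce the command line from the example above.
interactive_qsub_cmd XXXYYY 16 30:00
```

Printing the command first makes it easy to check the request before typing it on a login node.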

Notice:
Compute nodes can see only the Lustre work space.

The NFS-mounted home, project, and software directories are not accessible to the compute nodes.

  • Executables must be executed from within the Lustre work space.
  • Batch jobs can be submitted from the home or work space. If submitted from a user’s home area, the user should cd into the Lustre work space directory prior to running the executable through aprun. An error similar to the following may be returned if this is not done:
            aprun: [NID 94]Exec /tmp/work/userid/a.out failed: chdir /autofs/na1_home/userid
            No such file or directory
  • Input must reside in the Lustre work space.
  • Output must also be sent to the Lustre file system.
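The chdir error above can be avoided by checking the current directory before launching. The following is a sketch only: it assumes the Lustre work space is rooted at /tmp/work, as the path in the error message suggests, and the names WORK_ROOT and in_work_space are hypothetical.

```shell
# Assumption: the Lustre work space lives under /tmp/work, as in the error
# message above. Set WORK_ROOT beforehand if your work space is elsewhere.
WORK_ROOT="${WORK_ROOT:-/tmp/work}"

# Return success if the given directory (default: the current directory)
# is the work space root or lies beneath it.
in_work_space() {
  dir="${1:-$PWD}"
  case "$dir" in
    "$WORK_ROOT"|"$WORK_ROOT"/*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_work_space; then
  echo "OK to launch from $PWD"
else
  echo "Not in $WORK_ROOT; cd there before running aprun" >&2
fi
```

The check compares path strings only, so it does not confirm that the directory actually resides on Lustre.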

Using Interactive Batch to Debug

A common use of interactive batch is debugging. Below are points that may be useful while interactively debugging a code through PBS.

Quick Turnaround

The tips below may be used to help a job run quickly rather than sit in the queue.

Choosing a Job Size

You can use the showbf command (for “show back fill”) to see the resource limits within which your job would be immediately backfilled (and thus started) by the scheduler. For example, the snapshot below shows that a job requesting up to 16 compute nodes would run immediately.

$ showbf
Partition     Tasks  Nodes   StartOffset      Duration       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL              16     16      00:00:00      INFINITY  12:43:19_03/30

The following command would then take advantage of this window for an interactive session:

qsub -q debug -I -lsize=8

See showbf --help for additional options. For more information, see the online user guide for the Moab Workload Manager.
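Reading the backfill window can also be scripted. The sketch below assumes the column layout shown in the snapshot above (Nodes in the third column, StartOffset in the fourth); actual showbf output may differ between Moab versions. The function name immediate_nodes is hypothetical, and how a node count maps to a -lsize value is left to the reader.

```shell
# Read showbf-style text on stdin and print the largest Nodes value among
# rows whose StartOffset is 00:00:00 (that is, windows available now).
# Header and separator rows are skipped because their third field is not
# a number. Prints 0 if no such row is found.
immediate_nodes() {
  awk '
    $3 ~ /^[0-9]+$/ && $4 == "00:00:00" && $3 + 0 > max { max = $3 + 0 }
    END { print max + 0 }
  '
}

# Demonstration using the snapshot above as sample input.
immediate_nodes <<'EOF'
Partition     Tasks  Nodes   StartOffset      Duration       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL               16      16      00:00:00      INFINITY  12:43:19_03/30
EOF
```

On a system where showbf is installed, the same function could be used as showbf | immediate_nodes.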

TotalView

While debugging, it may be useful to run the TotalView debugger.

The syntax to use the TotalView debugger within your interactive batch job is as follows:

totalview aprun -a -n 16 a.out

This can only be done inside a qsub -I -V ... interactive job. The -V imports your environment, and thus X11 forwarding (if you are using it) will work.

Once the window comes up, press g (for “go”) or select “go” from the menu; TotalView will then start running aprun. When aprun reaches the point of spawning the processes for your code, a window should come up asking whether you want to continue or stop. If you want to set breakpoints, have TotalView stop; otherwise, continue. (This behavior is very similar to that on IBMs using poe.)
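Because the TotalView window depends on X11 forwarding being carried into the job, it can help to confirm that a display is set before launching the debugger. This is a minimal sketch; the function name have_x11_display is hypothetical, and a non-empty DISPLAY does not guarantee that the forwarded connection actually works.

```shell
# Return success if the DISPLAY environment variable is set and non-empty.
have_x11_display() {
  [ -n "${DISPLAY:-}" ]
}

if have_x11_display; then
  echo "DISPLAY is set to $DISPLAY"
else
  echo "DISPLAY is not set; check X11 forwarding and the -V option to qsub" >&2
fi
```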