SGE Batch System

SGE is the batch system used at PDSF.

At present, all nodes (with the exception of some special-purpose nodes) are subject to a wallclock limit equivalent to 1 day. On the fastest nodes (3GHz) the limit is 1 day; on slower nodes it is extended by the processor speed ratio (e.g. 3 days on a 1GHz node). An announcement will be sent out when this changes.
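The scaling rule can be worked through with a little arithmetic. The sketch below assumes the limit scales exactly with the ratio to the 3GHz nodes, as the 1GHz example suggests; `wallclock_limit_hours` is a hypothetical helper written for illustration, not an SGE or PDSF command.

```shell
#!/bin/bash
# Sketch: estimate the wallclock limit on a node from its clock speed,
# assuming the limit scales exactly with the ratio to the fastest (3GHz) nodes.
# wallclock_limit_hours is a hypothetical helper, not an SGE or PDSF command.
wallclock_limit_hours() {
    local node_ghz="$1"
    # 24 hours on a 3GHz node, multiplied by (3 / node speed in GHz).
    awk -v ghz="$node_ghz" 'BEGIN { printf "%.1f\n", 24 * 3 / ghz }'
}

wallclock_limit_hours 3   # fastest nodes: 1 day, printed in hours
wallclock_limit_hours 1   # 1GHz node: 3 days, printed in hours
```

The actual limit enforced on a given node may differ, so treat the result as an estimate only.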

If you are planning to submit thousands of short jobs (<10 minutes) concurrently, consider using job arrays where possible to reduce SGE accounting overhead. There is an entry in the FAQ that describes how to use SGE job arrays.
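As a rough illustration of the idea (not taken from the FAQ), a job array lets one submission cover many tasks. SGE's `qsub -t` option requests a range of task indices, and each task sees its own index in the `SGE_TASK_ID` environment variable. The script name `array_task.sh`, the helper `input_for_task`, and the input file naming scheme below are assumptions made for this sketch.

```shell
#!/bin/bash
# array_task.sh: hypothetical job script for an SGE job array.
# Submit once for 1000 tasks instead of submitting 1000 separate jobs:
#   qsub -V -t 1-1000 array_task.sh
# SGE sets SGE_TASK_ID to the index of the current task (1..1000 here).

# Hypothetical naming scheme: one input file per task index.
input_for_task() {
    echo "input_$1.dat"
}

# Fall back to task 1 so the script can also be tried outside the batch system.
task_id="${SGE_TASK_ID:-1}"

echo "task ${task_id}: would process $(input_for_task "${task_id}")"
```

Each task then picks its own input from the index, so the per-job accounting overhead is paid once for the whole array rather than once per short job.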

NOTE: Full SGE documentation, manuals, and how-tos are available from the Sun site, but the short table below should help you get started.

| Action | How to do it in SGE | Comment |
|---|---|---|
| submit a job | qsub -V script | In SGE you always have to submit a script, not an executable. If you want your job to inherit all the environment variables of the submitting shell, you have to request it with the -V option. Note: your job will not inherit your LD_LIBRARY_PATH (even if you specify -V). |
| submit a job with a dvio requirement | qsub -hard -l dv<#>io=1 script | Replace <#> with the number associated with your diskvault. |
| submit a job to the debug queue | qsub -q debug.q script | The debug queue has only a few nodes and a one-hour time limit. |
| submit a job that depends on other jobs | qsub -hold_jid [job_ID\|job_name] script | SGE only checks whether [job_ID\|job_name] has finished before letting your job start, and multiple job IDs/job names can only be combined with "AND". |
| get e-mail from your job upon completion | No e-mail by default; add the -m option of qsub to request e-mail. | See the man pages for details. |
| check on your job (running or pending) | qstat -u user_name | If you skip the -u option, you'll get all scheduled and running jobs. |
| | qstat_lite | "Lite" version, easier on the batch system, but updated only every 5 minutes. Recommended if you can live with this 5-minute uncertainty. |
| | qstat_long | Regular qstat truncates job names to 10 characters. If you need the full name, use qstat_long. |
| check on your finished job | qacct -o user_name -j | If you skip the -o option (note: -o here, not -u as above), you'll get a summary of all the jobs run by all users during the last accounting period. Don't forget the -j option; without it, you'll just get your own grand total. |
| kill a job | qdel job_ID | If qdel doesn't work, try qdel -f job_ID. |
| kill all your jobs | qdel -u user_name | You can also specify -u all. Run by an administrator, this would kill all running and pending jobs for everybody; since you do not have sufficient privileges, it will kill only your own. It is nevertheless bad practice and a dangerous habit. |
| start an interactive session on a batch node | qsh | Note that batch system commands like qstat are not available on the batch nodes. |
| start an interactive session on a specific batch node | qsh -l h=pc<#> -now n | Use a lowercase l (the letter L) in the command. Replace <#> with the node of your choice. "-now n" means that you are willing to wait if the chosen node is not immediately available. |
| select a job to run first | qalter -js NN job_ID | NN is some positive number. In SGE you control the relative priority of your jobs by adjusting their job shares. A larger job share results in a higher priority. Also see the FAQ. |
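Several of the commands in the table can be combined into a small workflow. The sketch below chains two jobs with -hold_jid so that the second starts only after the first has finished, then lists your jobs. The script names `step1.sh` and `step2.sh`, the job names given with qsub's -N option, and the `run` dry-run wrapper are assumptions made for illustration; by default the wrapper only prints the commands so the sketch can be tried on a machine without SGE.

```shell
#!/bin/bash
# Sketch: chain two jobs so the second starts only after the first finishes.
# step1.sh and step2.sh are hypothetical job scripts.

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Name the first job so the second one can refer to it by job name.
run qsub -V -N step1 step1.sh

# Hold the second job until the job named step1 has finished.
run qsub -V -N step2 -hold_jid step1 step2.sh

# List your own pending and running jobs.
run qstat -u "${USER:-user_name}"
```

Run it with DRY_RUN=0 on a submit node to issue the commands for real. Since -hold_jid only waits for the earlier job to finish, step2.sh should itself check that the output of step1.sh is usable.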

