SGE is the batch system used at PDSF.
Currently all nodes (except some special-purpose nodes) are subject to
a wallclock limit equivalent to 1 day.
On the fastest nodes (3GHz) the limit is 1 day; on slower nodes it is
extended by the processor speed ratio (e.g. 3 days on a 1GHz node).
An announcement will be sent out when this changes.
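As a rough illustration of the scaling rule above, the sketch below estimates the wallclock limit for a node of a given clock speed. The function name wallclock_limit_hours and the choice of MHz as the input unit are assumptions made for this example; the batch system's actual normalization may differ in detail.

```shell
#!/bin/sh
# Estimate the wallclock limit (in hours) for a node of a given clock speed,
# assuming a 24-hour limit on the 3000 MHz reference nodes described above.
# wallclock_limit_hours is a hypothetical helper, not a PDSF or SGE command.
wallclock_limit_hours() {
    node_mhz=$1
    ref_mhz=3000     # fastest nodes (3GHz)
    ref_hours=24     # 1 day on the fastest nodes
    # Integer arithmetic: scale the limit by (reference speed / node speed).
    echo $(( ref_hours * ref_mhz / node_mhz ))
}

wallclock_limit_hours 3000   # 3GHz node
wallclock_limit_hours 1000   # 1GHz node
```

For a 1GHz node this gives 72 hours, which matches the 3-day example above.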
If you plan to submit thousands of short jobs (<10 minutes) concurrently, consider using
job arrays where possible to reduce SGE accounting overhead. An entry in the FAQ
describes how to use SGE job arrays.
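The FAQ entry remains the authoritative description. As a minimal sketch of the idea, an array job script might look like the one below. It relies on the standard SGE -t option and the SGE_TASK_ID variable that SGE sets for each task. The input file naming scheme (input_NNNN.dat) and the helper input_for_task are hypothetical.

```shell
#!/bin/sh
# Hypothetical array job script: one task per input file.
# Run tasks numbered 1..1000 as a single array job:
#$ -t 1-1000
#$ -V

# SGE sets SGE_TASK_ID for each task of an array job. Outside an array job
# it is unset (or "undefined"), so fall back to 1 for a trial run by hand.
task_id=${SGE_TASK_ID:-1}
[ "$task_id" = "undefined" ] && task_id=1

# Map a task number to an input file name (naming scheme is an assumption).
input_for_task() {
    printf 'input_%04d.dat\n' "$1"
}

infile=$(input_for_task "$task_id")
echo "task $task_id would process $infile"
```

Submitted once with qsub, a script like this replaces a thousand separate submissions, so the batch system handles one job entry instead of a thousand.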
Action |
How to do it in SGE |
Comment |
submit a job |
qsub -V script |
In SGE you always have to submit a script, not an executable.
If you want your job to inherit all the environment variables of the submitting shell,
you have to request it with the -V option. Note: your job will not inherit your
LD_LIBRARY_PATH (even if you specify -V). |
submit a job with a dvio requirement |
qsub -hard -l dv<#>io=1 script |
Replace <#> with the number associated with your diskvault. |
submit a job to the debug queue |
qsub -q debug.q script |
The debug queue has only a few nodes and a one-hour time limit. |
submit a job that depends on other jobs |
qsub -hold_jid [job_ID|job_name] script |
SGE only checks whether [job_ID|job_name] has finished before letting
your job run, and it only lets you "AND" job IDs/job names (your job waits for all of them).
|
get e-mail from your job upon completion |
No e-mail is sent by default; add the -m option to qsub to request it |
See the man pages for details. |
check on your job (running or pending) |
qstat -u user_name |
If you skip the -u option, you'll get all scheduled and running jobs. |
qstat_lite |
A "lite" version that is easier on the batch system but is updated only every 5 minutes.
Recommended if you can live with this 5-minute uncertainty.
|
qstat_long |
Regular qstat truncates job names to 10 characters. If you need the full name, use qstat_long.
|
check on your finished job |
qacct -o user_name -j |
If you skip the -o option (note that it is -o here, not -u as above), you'll get a summary of all the jobs run by all users during the last accounting period. Don't forget the -j option; without it, you'll just get your own grand total. |
kill a job |
qdel job_ID |
If qdel doesn't work, try qdel -f job_ID |
kill all your jobs |
qdel -u user_name |
You can do -u all. If I did it, it would kill all running and pending jobs for everybody, but since you do not have enough privilege, it will kill only your own. Still, it's a bad practice and a dangerous habit. |
start an interactive session on a batch node |
qsh |
Note that batch system commands like qstat are not available on the batch nodes. |
start an interactive session on a specific batch node |
qsh -l h=pc<#> -now n |
Note that the option above is a lowercase l (letter L). Replace <#> with a node of your choice;
"-now n" means that you are willing to wait if the node of choice is
not immediately available. |
Select a job to run first |
qalter -js NN job_ID
NN is some positive number |
In SGE you control the relative priority of your
jobs by adjusting their job shares. A larger
job share results in a higher priority. |
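To illustrate the -hold_jid row of the table, here is a minimal sketch of a helper that submits a script so that it waits for several earlier jobs. The function submit_after and the script and job names in the usage comment are hypothetical. The comma-separated list is the standard way to pass more than one job to -hold_jid, which gives the "AND" behaviour described above.

```shell
#!/bin/sh
# submit_after SCRIPT DEP [DEP...]
# Submit SCRIPT so that it waits until every DEP (job ID or job name)
# has finished. submit_after is a hypothetical helper, not an SGE command.
submit_after() {
    if [ $# -lt 2 ]; then
        echo "usage: submit_after script dep [dep...]" >&2
        return 1
    fi
    script=$1
    shift
    # Join the remaining arguments with commas: dep1,dep2,...
    deps=$(IFS=,; echo "$*")
    qsub -hold_jid "$deps" "$script"
}

# Example (names are placeholders):
#   submit_after merge.sh step1 step2
# runs: qsub -hold_jid step1,step2 merge.sh
```

The helper only wraps the qsub call shown in the table, so any other qsub options you need (for example -V) would have to be added to the qsub line inside it.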