NERSC: Powering Scientific Discovery Since 1974

Submitting Jobs

Submitting your job

If you are submitting your job on Genepool or Phoebe, you do NOT need to source any batch settings. The batch environment has been loaded into your path by default.

If you are submitting a job from an external submit host you need to source the appropriate settings.sh file for Genepool or Phoebe.

source /opt/uge/genepool/uge/genepool/common/settings.sh
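On an external submit host it can help to confirm that the settings file was actually found and that qsub is on your path before submitting anything. The following is a minimal sketch of such a check; the variable names settings and batch_ready are made up for illustration, and the path is the Genepool one given above.

```shell
# Path copied from the text above (Genepool settings file).
settings=/opt/uge/genepool/uge/genepool/common/settings.sh

if [ -r "$settings" ]; then
    # "." is the portable spelling of "source".
    . "$settings"
else
    echo "settings file not found: $settings" >&2
fi

# Record whether qsub can now be found on the PATH.
if command -v qsub >/dev/null 2>&1; then
    batch_ready=yes
else
    batch_ready=no
fi
echo "qsub available: $batch_ready"
```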

qsub commands and options

UGE (Univa Grid Engine) is the batch system used for Genepool/Phoebe. 

Action | How to do it | Comment
Submit a job | qsub script | In UGE you must submit a script, not an executable. If your job needs to inherit all the environment variables of the submitting shell, you have to request it with the -V option. Note: your job will not inherit your LD_LIBRARY_PATH, even if you specify -V.
Submit a job to the high priority queue | qsub -l high.c script | The high.c complex is for small, fast-turnaround jobs.
Submit a job that depends on other jobs | qsub -hold_jid [job_ID|job_name] script | UGE only checks whether the jobs named by [job_ID|job_name] have finished before it starts your job, and multiple job IDs/job names can only be combined with AND.
Submit a job to a different project | qsub -P [project] script | By default your job runs under the project corresponding to your primary unix group. If UGE says you do not have access to the project you specify, you will need to file a ticket to get added to it.
Get e-mail from your job upon completion | add the -m option to qsub | No e-mail is sent by default. See the man pages for details.
Set the virtual memory limit | add "-l h_vmem=2G" | The default virtual memory limit is 5GB, and your job will crash if it hits the limit. Note that this is a consumable resource: especially when the cluster is full, the more memory you request, the harder it is for UGE to schedule your job, so do not set the limit higher than you need.
Execute the job in the current directory | add "-cwd" | Executes the job in the working directory from which the qsub command is issued. If this option is not present, the job executes in the user's home directory.
Job rescheduling | add "-r {y|n}" | If y, the job should be rescheduled in case of a node crash. If the job is rescheduled, the environment variable RESTARTED is set.
Combine stdout and stderr output in one file | add "-j {y|n}" | Default is 'n'.
Request a resource reservation for the job | add "-R {y|n}" | Default is 'n'.
Environment variable specification | add "-V" | Specifies that all environment variables active within the qsub utility be exported to the context of the job.
Redefine environment variables | add "-v <variable>[=value][,...]" | Defines or redefines the environment variable(s) to be exported to the execution context of the job.
Specify validation level | add "-w {e|w|n|p|v}" | e[rror], w[arning], n[one], p[oke], v[erify]. Default is 'none'.
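Several of the options above can also be placed inside the job script itself, on lines beginning with "#$" (the default Grid Engine prefix for embedded options), instead of being typed on the qsub command line. The following is a minimal sketch of such a script. The file name my_job.sh and the helper run_state are hypothetical, and the check assumes that RESTARTED has the value 1 when the job has been rescheduled; the text above only says that the variable is set.

```shell
#!/bin/bash
# Hypothetical job script (my_job.sh). Lines starting with "#$" are read
# by qsub as if the options had been given on the command line.
#$ -cwd
#$ -j y
#$ -l h_vmem=2G
#$ -r y

# Report whether this run is a rescheduled one.
# Assumption: RESTARTED is 1 after a reschedule and 0 or unset otherwise.
run_state() {
    if [ "${RESTARTED:-0}" = "1" ]; then
        echo "restarted"
    else
        echo "first run"
    fi
}

echo "Job state: $(run_state); working directory: $(pwd)"
```

With the options embedded this way, the script would be submitted with a plain "qsub my_job.sh". Options given on the command line can still be added for a single submission.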

Consumable Resources

All of the following resources are requested with the -l flag of qsub. Consumable resources are the mechanism UGE uses to make sure that the use of fixed resources, such as memory, is planned for properly. Once all of the memory on the cluster has been allocated to existing jobs, jobs in the queue are delayed until more memory is freed up. The queue the job is submitted to and the execution time requested are additional factors in determining when a job is executed. It is important to understand how much memory and execution time your job needs: the more time and memory you request, the longer the job is likely to wait before it starts.

Action | How to do it | Comment
Queue | add "-l <queue-type>.c" | Default is normal.c; other options: bg.c, debug.c, short.c, timelogic.c.
Memory usage | add "-l ram.c=10G" | Default is 5GB per slot. Will be deprecated in favor of h_vmem.
Memory usage | add "-l h_vmem=2G" | Specifies the maximum amount of memory that all of the job's processes are allowed to use. The job will be killed if it attempts to use more memory than specified here.
Memory usage | add "-l s_vmem=3G" | Same as h_vmem, but sends a USR1 signal to the job.
Set runtime to N hours, hard limit | add "-l h_rt=N:00:00" | Specifies a hard limit on the execution time (in hours) for the job. The default is 12 hours. The job will be killed if this time is exceeded. The shorter the requested time, the more likely it is that the job will run sooner through the backfilling mechanism.
Set runtime to N hours, soft limit | add "-l s_rt=N:00:00" | Specifies a soft limit on the execution time (in hours) for the job. A USR1 signal is sent to the job when the limit is reached. The job script can trap the signal to log necessary information.
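The soft runtime limit is only useful if the job script reacts to the USR1 signal. The following is a minimal sketch of a job script that traps the signal and stops cleanly between work steps. The limit values are purely illustrative and are written in hours:minutes:seconds form, the handler name on_soft_limit and the flag soft_limit_hit are made up, and the loop stands in for a real workload.

```shell
#!/bin/bash
# Illustrative limits: soft limit at 1 hour, hard limit 5 minutes later.
#$ -l s_rt=1:00:00
#$ -l h_rt=1:05:00
#$ -cwd

soft_limit_hit=0

# Handler run when USR1 is delivered at the soft limit.
on_soft_limit() {
    soft_limit_hit=1
    echo "soft runtime limit reached at $(date); saving state" >&2
}
trap on_soft_limit USR1

# Placeholder workload (hypothetical): check the flag between steps so the
# job can stop in an orderly way before the hard limit kills it.
for step in 1 2 3; do
    if [ "$soft_limit_hit" -eq 1 ]; then
        echo "stopping before step $step" >&2
        break
    fi
    echo "running step $step"
done
```

Note that the shell runs a trap handler only after the current foreground command returns, so a single very long command delays the handler. Splitting the work into shorter steps, as in the loop above, keeps the reaction time short.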