MySGE
Overview
- MySGE allows users to create a private Sun GridEngine cluster on large parallel systems like Hopper or Franklin. Once the cluster is started, users can submit serial jobs, array jobs, and other throughput-oriented workloads to the personal SGE scheduler. The jobs then run within the user's private cluster.
How it works
When the user executes vpc_start, a job is submitted to the standard system scheduler (Moab). The user can specify the requested time and number of cores using the normal Moab syntax (e.g. -l mppwidth=240,mppnppn=24,walltime=1:00:00). When the system job is scheduled, MySGE launches an SGE scheduler and uses a single aprun command to start the SGE execution daemons on the allocated compute nodes. The user can then source a setup file created by MySGE to configure the shell environment. Once this is done, the user can use the usual SGE queue commands to submit jobs to the personal SGE scheduler. The user can stop the private cluster by running vpc_stop.
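The resource request in the example above can be assembled from a node count. The following is a minimal sketch, not part of MySGE: the helper name vpc_resources is hypothetical, and it assumes whole nodes are requested with the same cores-per-node value that is passed as mppnppn (24 in the example).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MySGE): build a Moab resource
# string from a node count, cores per node, and a walltime.
vpc_resources() {
    nodes=$1
    cores_per_node=$2   # 24 matches the mppnppn value used in the text
    walltime=$3
    # mppwidth is the total core count: nodes times cores per node.
    width=$((nodes * cores_per_node))
    printf '%s\n' "-l mppwidth=${width},mppnppn=${cores_per_node},walltime=${walltime}"
}

# Ten 24-core nodes for one hour reproduces the example request.
vpc_resources 10 24 1:00:00
```

The printed string could then be appended to the vpc_start command line, assuming vpc_start forwards batch options to Moab as described above.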
Instructions
- Load the mysge module
module load mysge
- Initialize mysge for your account (you do this only once). This takes roughly a minute to complete, so don't interrupt it with Ctrl-C unless it hangs for several minutes.
mysge_init
(The defaults should be fine, so just press Enter.)
- Source the VPC setup file. You will need to do this at each login, or add it to your shell startup (dot) file. Note that qstat will now apply to your VPC, not the normal system batch system.
. ~/.vpc.sh
or, for csh/tcsh users:
source ~/.vpc.csh
- Start the VPC. Use the debug queue for quicker testing. The default size is 240 cores. Use the normal batch options to request a different number (e.g. -l mppwidth=480).
vpc_start -q debug
- Wait for the VPC to start. You can use vpc_status to monitor the request.
canon@hopper06:~> vpc_status
canon@hopper06:~> vpc_status
265639.sdb canon debug MySGE -- -- -- -- 00:30 Q --
canon@hopper06:~> vpc_status
265639.sdb canon debug MySGE 23058 -- -- -- 00:30 R 00:00
- Submit jobs to your VPC.
canon@hopper06:~> qsub ./job.q
canon@hopper06:~> qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
36 0.00000 job.q canon qw 04/08/2011 11:46:24 1
- To shut down the VPC, run vpc_stop.
vpc_stop
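The contents of job.q are not shown in the steps above. As an illustration only, a job script for an SGE array job might look like the sketch below. The job name, the task range, and the input file naming scheme are all assumptions made for this example; the "#$" directives and the SGE_TASK_ID variable are standard Grid Engine features.

```shell
#!/bin/sh
#$ -N sketch_array
#$ -cwd
#$ -j y
#$ -t 1-10

# Map a task index to an input file name (hypothetical naming scheme).
input_for_task() {
    printf 'input_%03d.dat\n' "$1"
}

# SGE sets SGE_TASK_ID for each array task. Fall back to 1 when the
# script is run by hand or submitted as a non-array job, where the
# variable is unset or holds the string "undefined".
task_id=${SGE_TASK_ID:-1}
if [ "$task_id" = "undefined" ]; then
    task_id=1
fi

# Placeholder for the real work: report which file this task would handle.
echo "task $task_id would process $(input_for_task "$task_id")"
```

Submitted with qsub, each of the ten tasks would run as a separate serial job inside the VPC, which is the kind of throughput workload MySGE is meant for.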
Important Considerations
Please be aware that while the virtual private cluster (VPC) is running, the user is charged for all of the allocated cores, regardless of whether a MySGE job is running on them. This is because the cores are dedicated to the MySGE cluster for as long as the VPC is running and cannot be used by other users on the system. Once the user is finished with the cluster, they should run vpc_stop to stop the cluster and return the cores to the standard scheduler.
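To get a rough sense of what an idle VPC holds, the allocation can be expressed in raw core-hours. This is only a sketch: the helper name vpc_core_hours is hypothetical, and it deliberately ignores any machine or queue charge factors the site may apply, so it should not be read as the actual accounting formula.

```shell
#!/bin/sh
# Hypothetical helper: raw core-hours held by a VPC of a given size
# for a given number of minutes. Site-specific charge factors are
# not modeled, and integer division truncates partial core-hours.
vpc_core_hours() {
    cores=$1
    minutes=$2
    echo $(( cores * minutes / 60 ))
}

# The default 240-core VPC held for 30 minutes.
vpc_core_hours 240 30
```

Under these assumptions, the default 240-core VPC held for 30 minutes accounts for 120 raw core-hours whether or not any jobs ran in it.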