SGE 6 is significantly different from SGE 5.3. With SGE 6 come cluster-queues and host-groups — SGE 5.3's queues still exist and are now called queue-instances.
In SGE 5.3 a queue lived on a particular machine/node. In SGE 6, by contrast, a cluster-queue can be thought of in two ways: as a front-end to a cluster of nodes, to which jobs are submitted for execution on one of those nodes; and as a type/class whose instances live on particular machines/nodes.
With SGE 6, unless you are actually dealing with MPI, PVM, etc., there is no need to handle parallel-environments at all. Good.
To aid the setup of cluster-queues there are host-groups, each of which is simply a group (list) of machines/nodes.
...for a user to be able to submit a bunch (e.g., a dozen) of single-processor jobs to a "queue" which is a front-end to a number of machines/nodes, each with, say, 1 or 2 CPUs; for, say, 3 or 4 jobs to run at once on each node (think hyperthreading, multicore), so that 3 × nodes (or 4 × nodes) jobs run at once in total; and for the remaining jobs to wait their turn.
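The arithmetic behind this goal can be sketched as follows. This is only an illustration, not part of SGE: the function name `schedule_counts` and the example figures (a dozen jobs, 3 nodes, 3 jobs at once per node) are assumptions chosen to match the scenario described above.

```python
def schedule_counts(num_jobs, num_nodes, slots_per_node):
    """Return (running, waiting) for single-processor jobs sent to one cluster-queue.

    Hypothetical helper for illustration only. It ignores load thresholds
    and per-user limits, either of which can reduce the number of jobs
    that actually run at once.
    """
    capacity = num_nodes * slots_per_node   # one single-processor job per slot
    running = min(num_jobs, capacity)
    waiting = num_jobs - running
    return running, waiting


if __name__ == "__main__":
    # A dozen jobs, 3 nodes, 3 jobs at once on each node.
    running, waiting = schedule_counts(12, 3, 3)
    print(f"running={running} waiting={waiting}")
```

With these figures nine jobs run at once and the remaining three wait their turn.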
N.B. Ignore parallel-environments a la SGE 5.3; use a cluster-queue and instances of it.
    qname                 simonh.q
    hostlist              @allhosts           ## instances of this queue will
                                              ## exist on these hosts
    seq_no                0
    load_thresholds       np_load_avg=1.75
    suspend_thresholds    NONE
    nsuspend              1
    suspend_interval      00:05:00
    priority              0
    min_cpu_interval      00:05:00
    processors            1
    qtype                 BATCH INTERACTIVE   ##
    ckpt_list             NONE
    pe_list               NONE
    rerun                 FALSE
    slots                 1                   ## number of jobs expect to run in
                                              ## an instance (i.e., on a compute
                                              ## node) usually the number of CPUs
                                              ## or cores (on a node)
    tmpdir                /tmp
    shell                 /bin/bash
    .
    .
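A listing of this shape (one attribute per line, name first, then value, with `##` starting an annotation) is easy to read into a dictionary for checking. The sketch below is an illustration under that assumption; `parse_queue_config` is a hypothetical helper, not an SGE tool, and the sample text is a subset of the configuration above.

```python
def parse_queue_config(text):
    """Parse 'name value' lines into a dict, dropping '##' annotations.

    Hypothetical helper. Lines that hold only an annotation (continuation
    notes) and bare '.' elision lines are skipped.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.split("##", 1)[0].strip()   # discard the annotation part
        if not line or line == ".":
            continue
        parts = line.split(None, 1)            # attribute name, then the rest
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


SAMPLE = """\
qname                 simonh.q
hostlist              @allhosts      ## instances of this queue will
                                     ## exist on these hosts
qtype                 BATCH INTERACTIVE
pe_list               NONE
slots                 1
"""

if __name__ == "__main__":
    cfg = parse_queue_config(SAMPLE)
    print(cfg["qname"], cfg["hostlist"], cfg["slots"])
```

Values are kept as strings on purpose, since attributes such as `qtype` hold several words.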
    algorithm                     default
    schedule_interval             0:0:5
    maxujobs                      10          ## IMPORTANT! Max num
                                              ## of total jobs per user
                                              ## across all queues
    queue_sort_method             load
    job_load_adjustments          np_load_avg=0.50
    load_adjustment_decay_time    0:7:30
    load_formula                  np_load_avg
    schedd_job_info               true
    flush_submit_sec              1           ## default is 0
    flush_finish_sec              1           ## default is 0
    params                        none
    reprioritize_interval         0:1:0
    halftime                      168
    usage_weight_list             cpu=1.000000,mem=0.000000,io=0.000000
    .
    .
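To see why `maxujobs` matters, the sketch below works out how many of one user's jobs can run at once when the per-user limit is applied on top of the free slots, and converts the `h:m:s` interval values to seconds. Both function names are hypothetical, and treating `maxujobs` of 0 as "no per-user limit" is an assumption about the scheduler's convention rather than something stated in these notes.

```python
def hms_to_seconds(value):
    """Convert an 'h:m:s' string such as '0:7:30' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def running_jobs_for_user(submitted, free_slots, maxujobs):
    """Upper bound on one user's simultaneously running jobs.

    Hypothetical helper. maxujobs == 0 is treated as 'no per-user limit'
    (an assumption); load thresholds are ignored.
    """
    limit = min(submitted, free_slots)
    if maxujobs > 0:
        limit = min(limit, maxujobs)
    return limit


if __name__ == "__main__":
    # 12 jobs submitted, 16 free slots, maxujobs 10 as in the listing above.
    print(running_jobs_for_user(12, 16, 10))
    print(hms_to_seconds("0:0:5"), hms_to_seconds("0:7:30"))
```

With the values from the listing, a user who submits a dozen jobs has at most ten running at any moment, however many slots are free.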
Configure, monitor, and modify the state of queues using a GUI. The most significant windows:
Show or modify the configuration of the given queue.
    -mq <queuename>     modify a queue config
    -sq <queuename>     show queue config
    -sql                list all queues
    -ssconf             show scheduler config
Modify the state of the given queue, e.g., disable it, clear an error state...
-cq <queuename> clear error state (E) of given queue (instance)
It seems that the scheduler that comes with SGE is quite limited:
Use either qmon -> Scheduler Configuration or
    qconf | grep -i schedul
       [-k{m|s}]           shutdown master|scheduling daemon
       [-msconf]           modify scheduler configuration
       [-Msconf fname]     modify scheduler configuration from file
       [-sss]              show scheduler state
       [-ssconf]           show scheduler configuration
       [-tsm]              trigger scheduler monitoring
Have a look in
$SGE_ROOT/default/spool/qmaster/messages
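A minimal sketch for looking at the end of that file, assuming `SGE_ROOT` is set in the environment and the file is readable by the current user. The helper names `messages_path` and `last_lines` are hypothetical, and nothing is assumed about the layout of the individual lines.

```python
import os
from collections import deque


def messages_path(sge_root=None):
    """Build the qmaster messages path from SGE_ROOT (or an explicit root)."""
    root = sge_root if sge_root is not None else os.environ.get("SGE_ROOT", "")
    return os.path.join(root, "default", "spool", "qmaster", "messages")


def last_lines(path, count=20):
    """Return the last `count` lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    path = messages_path()
    if os.path.exists(path):
        for line in last_lines(path):
            print(line)
    else:
        print("no messages file at", path)
```

The most recent entries are at the end of the file, so the tail is usually the place to start when a queue has gone into an error state.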