Stuff

UoM::RCS::Talby::Danzek::SGE



Page Group

How can a user influence job priority?

 -- deadline jobs
 -- posix priority
 -- resource reservation
 -- advance reservation
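
A rough sketch of how the four mechanisms above map onto qsub options, as I understand them from the qsub man page: -dl for deadline jobs, -p for POSIX priority, -R y for resource reservation, -ar for an advance reservation created beforehand with qrsub. The helper names (qsub_priority_args, qsub_command) are made up for illustration, and site policy may restrict these options (e.g. deadline jobs normally require membership of the deadlineusers list, and ordinary users can usually only lower -p).

```python
from typing import List, Optional


def qsub_priority_args(
    deadline: Optional[str] = None,
    posix_priority: Optional[int] = None,
    reserve: bool = False,
    advance_reservation_id: Optional[str] = None,
) -> List[str]:
    """Build qsub options for the four priority mechanisms (hypothetical helper)."""
    args: List[str] = []
    if deadline is not None:
        # Deadline job: -dl takes a date_time such as [[CC]YY]MMDDhhmm[.SS]
        args += ["-dl", deadline]
    if posix_priority is not None:
        # POSIX priority: qsub -p accepts -1023..1024
        if not -1023 <= posix_priority <= 1024:
            raise ValueError("priority must be between -1023 and 1024")
        args += ["-p", str(posix_priority)]
    if reserve:
        # Resource reservation: ask the scheduler to reserve slots for this job
        args += ["-R", "y"]
    if advance_reservation_id is not None:
        # Advance reservation: run inside a reservation made earlier with qrsub
        args += ["-ar", advance_reservation_id]
    return args


def qsub_command(script: str, **kwargs) -> List[str]:
    """Return the full argv for submitting `script` (does not run anything)."""
    return ["qsub"] + qsub_priority_args(**kwargs) + [script]


if __name__ == "__main__":
    # Lower our own priority and request a reservation for a big parallel job.
    print(" ".join(qsub_command("job.sh", posix_priority=-100, reserve=True)))
```

The functions only build the argument list, so they can be tried without a cluster; pass the result to subprocess.run on a submit host to actually submit.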

Bugs/Features

Troubleshooting

Job Scheduling







SGE Notes

 -- parallel environment settings: job_is_first_task, control_slaves...

 -- loose integration

 -- tight integration
     -- Open MPI
     -- HP-MPI
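
With loose integration the PE start script typically turns $PE_HOSTFILE into a machinefile for the MPI launcher; with tight integration (control_slaves TRUE) the MPI library starts its remote tasks through SGE itself. A minimal sketch of the loose-integration conversion follows, assuming the usual hostfile layout of "host slots queue processor-range" per line. The function names and the sample hostnames are hypothetical.

```python
from typing import List, Tuple


def parse_pe_hostfile(text: str) -> List[Tuple[str, int]]:
    """Parse the contents of $PE_HOSTFILE into (host, slots) pairs.

    Assumed line layout: <host> <slots> <queue> <processor range>
    """
    hosts: List[Tuple[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def to_machinefile(hosts: List[Tuple[str, int]]) -> str:
    """Repeat each hostname once per granted slot, one name per line."""
    return "\n".join(host for host, slots in hosts for _ in range(slots))


if __name__ == "__main__":
    sample = (
        "node01 4 all.q@node01 UNDEFINED\n"
        "node02 2 all.q@node02 UNDEFINED\n"
    )
    print(to_machinefile(parse_pe_hostfile(sample)))
```

Inside a real job the text would come from open(os.environ["PE_HOSTFILE"]).read().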

Fair shares:

1. http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/


2. http://wikis.sun.com/display/gridengine62u3/How+to+Create+Project-Based+Share-Tree+Scheduling+With+QMON
   http://wikis.sun.com/display/gridengine62u3/Configuring+the+Share-Based+Policy#ConfiguringtheShare-BasedPolicy-ConfiguringtheShareTreePolicyWithQMON
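
To get a feel for what a share tree means: each node's shares are relative to its siblings, so a leaf's long-term target fraction is the product of those ratios down the path. The sketch below computes only that static target. It ignores the usage history and decay that the real share-based policy applies, and the tree, project and user names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """One share-tree node: its shares relative to siblings, plus children."""
    shares: int
    children: Dict[str, "Node"] = field(default_factory=dict)


def target_fractions(node: Node, prefix: str = "", fraction: float = 1.0) -> Dict[str, float]:
    """Return {leaf path: target fraction of the cluster} for a share tree."""
    if not node.children:
        return {prefix: fraction}
    total = sum(child.shares for child in node.children.values())
    if total <= 0:
        raise ValueError(f"children of '{prefix or 'Root'}' have no shares")
    result: Dict[str, float] = {}
    for name, child in node.children.items():
        path = f"{prefix}/{name}" if prefix else name
        # A child's fraction is its parent's fraction scaled by its sibling ratio.
        result.update(target_fractions(child, path, fraction * child.shares / total))
    return result


if __name__ == "__main__":
    # Hypothetical tree: two projects at 3:1, users splitting their project.
    root = Node(1, {
        "projA": Node(300, {"alice": Node(100), "bob": Node(100)}),
        "projB": Node(100, {"carol": Node(50)}),
    })
    for path, frac in target_fractions(root).items():
        print(f"{path}: {frac:.3f}")
```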






-----------------


http://gridengine.sunsource.net/news/SGE62u5-announce.html

 -- includes topology-aware scheduling (core binding)


-----------------

http://wiki.gridengine.info/wiki/index.php/Main_Page

-----------------

Grid Engine Portal

 -- http://gridengine.sunsource.net/gep/GEP_Intro.html


Users authenticate to a portal interface from anywhere on the internet via a browser and can then:

    * Securely access and execute applications via a transparent interface to Grid Engine
    * Monitor the status of jobs running in Grid Engine
    * Securely upload input files to the Portal Server with the click of a button
    * Securely download output files to a local workstation with the click of a button
    * View X-windows based applications using VNC

Administrators can also remotely access the portal and perform administrative functions such as:

    * Registering applications for use with the GEP in a matter of minutes
    * Quickly building HTML interfaces to applications using templates that prompt users for input
    * Monitoring Grid Engine usage and statistics




----------------

 -- Globus vs LSF: is there a third way via SGE's SDM?
 -- how would multiclustering work?
     -- requires common filesystems
     -- requires a standard software stack
         -- so unlikely to work in practice


 -- SGE is open source under the SISSL (Sun Industry Standards Source License), not the GPL
 -- howtos http://gridengine.sunsource.net/howto/howto.html
 -- DRMAA API
 -- ARCo accounting and reporting (MySQL or Oracle)

---------------

SDM

http://blogs.sun.com/templedf/entry/service_domain_manager

supports

 -- cloud bursting
 -- powering down idle and underutilized machines
 -- not a metascheduler: it moves compute nodes from one cluster to another

http://wikis.sun.com/display/GridEngine/Using+SDM+With+the+Sun+Grid+Engine+Adapter


----------------

SGE on Campus

 -- redqueen
 -- mace01
 -- man2
 -- usto oran (MACE)
 -- pacemaker (MHS)
 -- templar (FLS)
 -- agent (FLS)  
 -- epsilon (EPS)
 -- Brian Blower's cluster (MHS)
 -- terra (Duncan Irving, Earth Sciences)
 


-----------------


Topology Aware Scheduling

http://blogs.sun.com/templedf/entry/topology_aware_scheduling
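
As far as I understand it, hosts report their layout as a topology string in which S starts a socket, C is a core and T is a hardware thread (so "SCCCCSCCCC" would be two quad-core sockets). A small sketch for reading such a string; the function name is made up and the string format is my assumption from the 6.2u5 material, so check it against what qhost reports on a real host.

```python
from typing import List


def cores_per_socket(topology: str) -> List[int]:
    """Count cores under each socket in a topology string such as 'SCCSCC'."""
    counts: List[int] = []
    for ch in topology:
        if ch == "S":
            counts.append(0)  # a new socket starts here
        elif ch == "C":
            if not counts:
                raise ValueError("core listed before any socket")
            counts[-1] += 1
        elif ch == "T":
            continue  # hardware threads do not add cores
        else:
            raise ValueError(f"unexpected character {ch!r} in topology string")
    return counts


if __name__ == "__main__":
    layout = cores_per_socket("SCCCCSCCCC")
    print(f"{len(layout)} sockets, {sum(layout)} cores: {layout}")
```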


------------------


Checkpointing


http://gridengine.sunsource.net/howto/checkpointing.html
 -- integrates with Condor libraries/compiler

https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html
 -- BLCR (Berkeley Lab Checkpoint/Restart)