LSF Deployment
A partial LSF listing of running
and pending production jobs on NCAR's bluevista cluster system.
Platform Computing's Load Sharing Facility (LSF) provides dynamic
resource management and batch job scheduling services across
heterogeneous systems. LSF spares users from learning a different
batch job submission system for each computer by presenting a
common user interface for submitting and managing production jobs
across all CISL-managed supercomputer clusters deployed since 2004.
Two years ago, CISL undertook a business process re-engineering
effort to further our priority of deploying an efficient,
interoperable computing environment that facilitates collaborations.
This re-engineering effort specifically targeted three areas:
production batch job scheduling, resource management, and system
accounting. As a result, our legacy batch scheduling and accounting
systems, which had been developed in-house, were replaced with LSF, a
commercial off-the-shelf product. LSF was chosen because:
- It provides a platform-independent, common user interface.
- It supports consolidated and centralized system accounting.
- It is easily extensible and reconfigurable.
This effort supports the NCAR strategic priority of "Developing
and providing advanced services and tools."
In FY 2006, CISL staff completed the transition to LSF. It is now
the common batch job scheduling system on all of the CISL-managed
supercomputers that have been deployed since 2004. LSF has also enabled
CISL to combine the batch system accounting data from all of the
supercomputer clusters into a common database. In FY 2007, CISL staff
will begin a data-mining project against this consolidated accounting
database to support further scheduling optimizations and improved
resource allocations.
This project is made possible through NSF Core funds, including CSL
funding.