
Computing center operations and infrastructure

Computer system installations

[Figure: bluesky computer room cabling under floor]

FY2003 saw the completion of not one but two significant installations of IBM equipment. The first was the culmination of all the planning for the second-phase ARCS system, "bluesky." Well over six months of planning went into the final installation, which resulted in a smooth commissioning of the system in early FY2003.

In the summer of FY2003, plans were developed to increase the computing capabilities of bluesky. Ultimately, plans were put in place to add 14 compute nodes. This addition increased the stress on the computing infrastructure, because each node generates an additional 42,000 BTU/hr of heat. For the first time, SCD gained the ability to quantitatively measure the increase in electrical consumption based on the workload in the system. The following chart shows the kVA consumption of the system during a floating-point-intensive benchmark run. The three peaks in the chart coincide with progressively larger runs of the same benchmark.

[Figure: computer room power consumption based on workload]
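
To put the added heat load in perspective, the per-node figure can be scaled to the full expansion. The sketch below is illustrative only: the node count and the 42,000 BTU/hr figure come from the text, while the conversion factors (watts per BTU/hr and BTU/hr per ton of refrigeration) are standard engineering values rather than numbers from this report. Under those assumptions, 14 nodes add about 588,000 BTU/hr, which is roughly 172 kW of heat or 49 tons of cooling.

```python
# Figures taken from the text: 14 added nodes, 42,000 BTU/hr per node.
ADDED_NODES = 14
BTU_PER_HR_PER_NODE = 42_000

# Standard conversion factors (assumptions, not values from the report).
WATTS_PER_BTU_PER_HR = 0.29307107
BTU_PER_HR_PER_TON = 12_000  # one ton of refrigeration


def added_heat_load(nodes=ADDED_NODES, btu_per_node=BTU_PER_HR_PER_NODE):
    """Return the extra heat load in BTU/hr, kW, and tons of cooling."""
    btu_per_hr = nodes * btu_per_node
    return {
        "btu_per_hr": btu_per_hr,
        "kilowatts": btu_per_hr * WATTS_PER_BTU_PER_HR / 1000,
        "tons_of_cooling": btu_per_hr / BTU_PER_HR_PER_TON,
    }


if __name__ == "__main__":
    load = added_heat_load()
    print(f"{load['btu_per_hr']:,} BTU/hr")
    print(f"{load['kilowatts']:.1f} kW")
    print(f"{load['tons_of_cooling']:.1f} tons of cooling")
```

This counts only the heat from the added nodes; it says nothing about the existing load or about how much chilled water capacity was actually installed.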

To meet the significant demands of the additional IBM equipment, OIS managed several upgrades to the infrastructure. These projects included the addition of more chilled water capacity, replacement of compressor-based air-conditioning units, and upgrades to the electrical distribution. These projects were completed with designs that also enhance reliability and eliminate single points of failure where possible.

Beyond the large installation of the IBM equipment, numerous server and network equipment changes in the data center required OIS participation. This work included installing, commissioning, upgrading, and decommissioning equipment.

Standby power progress

[Figure: backup power generator]

A number of delays continued to plague the installation of standby generator systems. Late in FY2003, the construction portion of the project was awarded, with that work proceeding early in FY2004. One of two 1.2-MW generators is shown in the accompanying photo. At the end of FY2003, both are poised for installation. The need for backup generation was demonstrated again in FY2003, when two electrical outages resulted in lost productivity on the supercomputing systems.

Equipment tracking, maintenance, and licensing

Throughout FY2003, OIS purchased computing facility equipment, ensuring that all UCAR policies were followed and documented. Hardware maintenance contracts and software licenses are required for the operation of all systems. Providing hardware and software support requires interfacing with UCAR/NCAR personnel about their requirements, researching equipment prices, and contacting vendors.

Other OIS responsibilities include tagging, tracking, and disposing of assets. Working closely with the UCAR Property Office, OIS tracks over 200 fixed assets (equipment and systems costing over $5,000). The Computer Production Group (CPG) staff continue to tag the equipment and enter its information into a database. The equipment is then delivered either to its usage location or to one of the systems groups for configuration and deployment. This process has greatly increased the accuracy and detail of the asset information contained within the database.
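
As a rough illustration of the fixed-asset rule described above, the sketch below models an asset record and filters an inventory by the $5,000 threshold. Only the threshold comes from the text. The `Asset` fields, the sample records, and the `fixed_assets` helper are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass

# From the text: fixed assets are equipment and systems costing over $5,000.
FIXED_ASSET_THRESHOLD_USD = 5_000


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record; field names are illustrative only."""
    tag: str
    description: str
    cost_usd: float
    location: str

    @property
    def is_fixed_asset(self) -> bool:
        # "Over $5,000" is read as strictly greater than the threshold.
        return self.cost_usd > FIXED_ASSET_THRESHOLD_USD


def fixed_assets(inventory):
    """Return only the records that qualify as fixed assets."""
    return [asset for asset in inventory if asset.is_fixed_asset]


if __name__ == "__main__":
    # Made-up sample records for demonstration.
    inventory = [
        Asset("T-0001", "compute node", 250_000.00, "computer room"),
        Asset("T-0002", "network switch", 5_000.00, "computer room"),
        Asset("T-0003", "workstation", 2_400.00, "office"),
    ]
    for asset in fixed_assets(inventory):
        print(asset.tag, asset.description)
```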

Computing center operations

During FY2003 the conversion of the 3490E media was completed. This reduced the total mounts on the manual tape drives by 80 percent. Further, this allowed the elimination of temporary staff positions that were utilized primarily for this purpose.

Our 24 x 7 Operations staff has recently gone through some transformations. In accordance with NCAR/UCAR's initiative for developing human capital, several changes were made. To create an environment where operators can work alongside other staff in SCD, the schedule was transitioned to a rotating format. The rotating format gives all operators an opportunity to work on projects, enhance their training, and interact equitably with the rest of SCD.

OIS continues to manage the SCD portion of the business continuity plan. The plan contains vital information needed for recovering critical functions after a catastrophic event. The plan is revised and tested twice per year as part of the preventive maintenance shutdowns.

Application infrastructure developments

The SCD Portal is a web-based entry point to SCD computing resources. The portal was released to all users in September 2003. Many groups contributed to the successful release of the portal. The Infrastructure Applications Group (IAG) completed the development work, and the Computer Production Group (CPG) and Technical Consulting both contributed to testing. CPG in particular executed one of the more complete and thorough test plans for any web-based application IAG has released. The use of eight operators as testers allowed full coverage with two browser combinations and four operating systems.

[Figure: SCD portal user interface]
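
The coverage arithmetic behind the portal test plan (two browsers times four operating systems gives eight combinations, one per operator) can be sketched as a simple test matrix. The report gives only the counts, so the browser, operating system, and operator labels below are hypothetical placeholders, and `build_test_matrix` is an illustrative helper rather than anything CPG actually used.

```python
from itertools import product

# Hypothetical labels: the text gives only the counts
# (two browsers, four operating systems, eight operators).
BROWSERS = ["browser_a", "browser_b"]
OPERATING_SYSTEMS = ["os_1", "os_2", "os_3", "os_4"]
OPERATORS = [f"operator_{i}" for i in range(1, 9)]


def build_test_matrix(browsers, systems, testers):
    """Assign each browser/OS combination to exactly one tester."""
    combos = list(product(browsers, systems))
    if len(testers) < len(combos):
        raise ValueError("not enough testers for full coverage")
    # zip pairs testers with combinations in order; extra testers stay unassigned.
    return dict(zip(testers, combos))


if __name__ == "__main__":
    matrix = build_test_matrix(BROWSERS, OPERATING_SYSTEMS, OPERATORS)
    for tester, (browser, system) in matrix.items():
        print(f"{tester}: {browser} on {system}")
```

With eight testers and eight combinations, every pairing is exercised once and no operator has to cover more than one environment.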

The initial prototype of METIS, a workflow system that provides lightweight, web-based software infrastructure for distributed workflow-based collaboration, was completed in FY2003. SCD contributed the design, development, and implementation of the execution engine portion of the system in collaboration with CU Boulder and USRA. An architectural diagram of the execution engine is shown in the figure below. The system is now being used by a number of digital libraries, and its effectiveness is being evaluated.

[Figure: METIS execution engine diagram]

IAG continues to support the Remedy-based problem-tracking system within the division. IAG also continues to support the Room Reservation System and the SKIL database for tracking SCD publications, community service activities, education and outreach activities, and visitors and collaborators.