Reintroduction of liquid cooling for high performance computing
Piping, hoses, connectors, valves, temperature sensors, and a heat-exchange system are needed to capture the waste heat from blueice. Chilled water flows from the insulated white piping to the heat exchanger, and warm water is returned from there to the large-scale chilled water system. The manifold directs cool water to the rear doors of the computer cabinets on the left. This cooling technology adds to the complexity of the facilities needed to operate increasingly powerful supercomputers. CISL is gaining valuable experience with this technology that will be critical for managing future supercomputing facilities.
With the installation of the IBM POWER5+ computer, the technology of cooling supercomputer-class systems has come full circle. Just under a decade ago, the last liquid-cooled Cray system was decommissioned and removed from the Mesa Lab computing facility. During FY2007, CISL staff and members of UCAR Physical Plant Services planned and installed the water-cooling infrastructure for blueice, the first phase of the ICESS procurement.
The blueice system uses IBM's "coolblue" technology, which captures the hot air expelled from the back of the computer and directs it across a large coil built into the rear door of each computer cabinet. The coil captures nearly 60% of the waste heat before it enters the computer-room air and transfers it into the chilled water system. Water cooling is significantly more efficient than air cooling and is identified as a state-of-the-art practice in the EPA's recently published guidelines for data centers. There is some irony in this guideline, given the Mesa Lab's long history of liquid-cooled systems.
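To make the "nearly 60%" figure concrete, the short Python sketch here splits one cabinet's heat load between the rear-door coil and the room air, then applies the standard sensible-heat relation ṁ = Q / (c_p ΔT) to estimate the water flow the coil would need. The 30 kW cabinet load and the 5 K water temperature rise are hypothetical illustration values, not blueice measurements, and the function names are invented for this example.

```python
"""Rough heat budget for a rear-door heat exchanger.

Sketch only: the 30 kW cabinet load and 5 K water temperature rise are
hypothetical values, not blueice measurements; 0.60 reflects the
"nearly 60%" capture fraction quoted in the text.
"""

WATER_CP_J_PER_KG_K = 4186.0  # specific heat of liquid water, J/(kg*K)


def split_heat_load(cabinet_kw: float, capture_fraction: float) -> tuple[float, float]:
    """Return (kW carried away by the door coil, kW left in the room air)."""
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    to_water_kw = cabinet_kw * capture_fraction
    return to_water_kw, cabinet_kw - to_water_kw


def water_flow_kg_per_s(heat_kw: float, delta_t_k: float) -> float:
    """Water mass flow that absorbs heat_kw with a rise of delta_t_k: m = Q / (cp * dT)."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_kw * 1000.0 / (WATER_CP_J_PER_KG_K * delta_t_k)


if __name__ == "__main__":
    to_water, to_air = split_heat_load(30.0, 0.60)  # hypothetical 30 kW cabinet
    flow = water_flow_kg_per_s(to_water, 5.0)       # hypothetical 5 K water rise
    print(f"to water: {to_water:.1f} kW, to room air: {to_air:.1f} kW")
    print(f"required flow: {flow:.2f} kg/s")
```

With these hypothetical inputs, about 18 kW of the cabinet's heat goes to the chilled water and 12 kW stays in the room air, and the coil needs roughly 0.86 kg/s of water (about 0.86 L/s).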
Although the system is much more efficient, water cooling presents certain challenges. Because water now runs much closer to the computers, installations must meet the highest quality standards, and the risk of leaks must be rigorously controlled. CISL has been very successful in using in-house expertise to perform installations to this standard. The technology also introduces one additional risk precisely because it is more efficient: because heat is captured directly, staff have less time to react to a failure of the cooling system. Chillers, pumps, and other mechanical devices take some time to restart after even momentary power anomalies. This issue becomes even more critical as we prepare to install the IBM POWER6 system for the second phase of the ICESS procurement, which uses direct methods of heat transfer and further reduces the time available to respond to a cooling failure. Planning is already underway to add thermal storage to the Mesa Lab systems.
The concept of thermal storage is analogous to an uninterruptible power supply (UPS) for electrical systems. For mechanical systems, a reservoir of chilled water, ice, or another storage medium is kept in the system and works just like the batteries in a UPS, absorbing the heat load until chillers and pumps are back online.
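Because thermal storage plays the role of UPS batteries, the same sizing rule applies: ride-through time equals stored energy divided by load. The Python sketch here illustrates that rule under loudly hypothetical assumptions (a 20,000 kg tank, a usable 6 K temperature rise, and a 500 kW heat load); none of these are Mesa Lab design figures, and the function names are invented for illustration. It also compares ice, which absorbs heat as it melts through the latent heat of fusion (about 334 kJ/kg).

```python
"""Sizing thermal storage the way one sizes UPS batteries: runtime = energy / load.

Sketch only: tank mass, usable temperature rise, and heat load are
hypothetical values, not Mesa Lab design figures.
"""

WATER_CP_J_PER_KG_K = 4186.0     # specific heat of liquid water, J/(kg*K)
ICE_LATENT_J_PER_KG = 334_000.0  # latent heat of fusion of ice at 0 °C, J/kg


def chilled_water_energy_j(mass_kg: float, usable_delta_t_k: float) -> float:
    """Sensible heat a chilled-water tank absorbs before it is too warm to use."""
    return mass_kg * WATER_CP_J_PER_KG_K * usable_delta_t_k


def ice_energy_j(mass_kg: float) -> float:
    """Latent heat absorbed by melting ice (ignores the meltwater's sensible heat)."""
    return mass_kg * ICE_LATENT_J_PER_KG


def ride_through_s(stored_energy_j: float, heat_load_w: float) -> float:
    """Seconds the store can carry the heat load while chillers restart."""
    if heat_load_w <= 0:
        raise ValueError("heat_load_w must be positive")
    return stored_energy_j / heat_load_w


if __name__ == "__main__":
    load_w = 500_000.0                              # hypothetical 500 kW heat load
    water = chilled_water_energy_j(20_000.0, 6.0)   # hypothetical tank, 6 K usable rise
    ice = ice_energy_j(20_000.0)                    # same mass stored as ice
    print(f"chilled water: {ride_through_s(water, load_w) / 60:.1f} min")
    print(f"ice:           {ride_through_s(ice, load_w) / 60:.1f} min")
```

Under these assumptions the chilled-water tank rides through about 16.7 minutes, while the same mass of ice lasts about 222.7 minutes, which illustrates why ice is attractive when space for storage is limited.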
This advancement in computing facility technology fulfills NCAR's strategic goal to "Provide robust, accessible, and innovative information services and tools," and the related strategic priority of "Enhancing capability and capacity of NCAR supercomputing." This work is supported by NSF Core funding.
