Experimental Computing Systems: Blue Gene/L
Mach profiles of separated flow over a sphere (top) and dynamic stall of a NACA-0012 airfoil (bottom). The objective of this investigation is to assess the fidelity of a newly developed variational multiscale method for large eddy simulation of separated turbulent flows on unstructured moving meshes. These images were produced from flow simulations done on frost, the IBM Blue Gene/L system at NCAR, and they demonstrate the power of this experimental system. (Image courtesy of C. Farhat and A. Rajasekharan.)
In March 2005, NCAR became one of the first sites in the world to receive an IBM Blue Gene/L (BG/L) supercomputing system. The system, named frost, consists of a single BG/L rack (2,048 compute processors, 64 I/O processors, 5.73 TFLOPS peak) and ranked as the 61st fastest computer in the world in the 25th Top500 List (released in June 2005). Frost is an experimental system supporting 12 researchers from NCAR, the University of Colorado at Boulder, and the University of Colorado at Denver who are investigating and addressing the technical obstacles to achieving practical petascale computing in geoscience, aerospace engineering, and mathematical applications. The opportunity to experiment with systems like BG/L is essential for NCAR to maintain its ability to provide capability and capacity supercomputing to the community. Moreover, low-power systems like BG/L (only 25 kW for 5.73 TFLOPS) offer the promise of significantly reducing the strain on the NCAR Mesa Lab's computing facility.
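As a rough illustration of the low-power claim, the figures quoted above can be turned into a power-efficiency number and a per-processor peak. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes the 25 kW and 5.73 TFLOPS values refer to the same single-rack configuration and that peak performance is spread evenly over the 2,048 compute processors.

```python
# Figures quoted in the text for frost (a single BG/L rack).
PEAK_TFLOPS = 5.73          # peak performance, TFLOPS
POWER_KW = 25.0             # power draw, kW
COMPUTE_PROCESSORS = 2048   # compute processors in the rack


def mflops_per_watt(peak_tflops: float, power_kw: float) -> float:
    """Peak MFLOPS delivered per watt (hypothetical helper)."""
    return (peak_tflops * 1e6) / (power_kw * 1e3)


def gflops_per_processor(peak_tflops: float, processors: int) -> float:
    """Peak GFLOPS per compute processor, assuming an even split."""
    return (peak_tflops * 1e3) / processors


if __name__ == "__main__":
    print(f"{mflops_per_watt(PEAK_TFLOPS, POWER_KW):.1f} MFLOPS/W")
    print(f"{gflops_per_processor(PEAK_TFLOPS, COMPUTE_PROCESSORS):.2f} GFLOPS/processor")
```

Under these assumptions the rack works out to roughly 229 peak MFLOPS per watt and about 2.8 peak GFLOPS per compute processor.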
To consolidate experimental system research in CISL, the Research Systems Evaluation Team (ReSET) was formed in late 2005. The mission of ReSET is to administer and evaluate strategically selected experimental systems for the Laboratory so as to gain the maximum knowledge of, and impact from, emerging technologies. ReSET is housed in CISL's Computational Science Section (since renamed the Computer Science Section), but collaborates with staff members from other sections and groups across CISL to accomplish its mission. In mid-November 2005, frost became the first experimental system managed by ReSET.
During FY 2006, members of ReSET worked with the frost user community to significantly increase both the number and breadth of applications capable of running on frost. One example of the impact of this effort is the scaling of POP to 28,972 processors on a BG/L system at IBM's T.J. Watson Research Center. Though frost is an experimental system, ReSET's success in expanding the user and application code base has produced system usage levels that are similar to those seen on the production supercomputing systems managed by CISL. In addition to providing user support, the team continues to work through the Blue Gene consortium and SP-XXL to improve the BG/L system software stack and influence development of the software stack for the follow-on system, Blue Gene/P. One example of this effort is the collaboration with Argonne National Laboratory to further develop Cobalt, the queuing system currently being used on frost, by incorporating alternate scheduling strategies.
In FY 2007, frost will move outside the UCAR security perimeter and become a TeraGrid resource. This presents numerous cyberinfrastructure integration challenges (e.g., how to integrate Cobalt, or another scheduler, into the TeraGrid’s Coordinated TeraGrid Software and Services software suite) and opportunities (e.g., the ability to provide the newly acquired Lustre storage system as a tightly integrated resource to frost’s user community). An additional challenge is that frost will enter its third year of service in March 2007, and planning for a successor system is needed.
Here is the frost wiki.
This work is made possible through NSF MRI Grants CNS-0421498, CNS-0420873, and CNS-0420985 and through the IBM Shared University Research (SUR) program with the University of Colorado. NSF Core funding also supports this system.