CISL 2007 Annual Report

Improving scalability of CCSM components

 
 
POP simulation rates

Simulation rates of the 0.1-degree Parallel Ocean Program (POP) benchmark on the Earth Simulator (ES), Cray RedStorm XT3 (RS/XT3) and XT4 (RS/XT4), and Blue Gene Watson (BGW) supercomputers. The plot shows that several code modifications can double the POP benchmark's simulation rate compared to the best performance achieved in its standard form. The RS and BGW systems can sustain a simulation rate greater than five simulated years per wall-clock day, which is fast enough to complete long-timeframe climate simulations in a practical amount of time.

 

The recent re-emergence of highly parallel systems has placed a premium on code scalability. In particular, NSF's plan to deploy a sustained petascale system in the 2010 timeframe has accelerated our need to significantly increase the number of processors on which application codes can run efficiently. In its standard configuration, the Community Climate System Model (CCSM), one of NCAR's flagship applications, uses only approximately 200 processors. The scalability of CCSM must therefore be increased significantly if it is to make effective use of compute cycles on the upcoming petascale systems, which will have 100,000 to 500,000 processors. Currently, the systems providing the largest-scale parallelism are the IBM Blue Gene and Cray XT4. These prototype petascale systems provide the opportunity to examine and improve the scalability of CCSM component models on large processor counts in preparation for the upcoming NSF petascale system.

We initially examined the ability of the Blue Gene/L system to perform high-resolution ocean modeling using the Parallel Ocean Program (POP). We used the POP 0.1-degree benchmark to gauge the impact of several code modifications designed to increase the simulation rate. The modifications are: (a) redesigning POP's data structures, and (b) adding a partitioning algorithm based on space-filling curves. Both modifications reduce POP's compute resource requirements, which allows for more efficient utilization of systems with large processor counts and limited memory bandwidth.
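The idea behind space-filling-curve partitioning can be sketched in a few lines. The example below is only an illustration and is not taken from the POP source: it orders the blocks of a square grid along a Hilbert curve and cuts the curve into contiguous, nearly equal pieces, one per processor. The function names, the power-of-two grid size, and the optional `is_active` mask (used here to skip blocks that need no computation, such as all-land blocks) are assumptions made for this sketch.

```python
from collections import Counter


def hilbert_index(n, x, y):
    """Position of block (x, y) along a Hilbert curve covering an
    n-by-n grid of blocks. Assumes n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_blocks(n, nprocs, is_active=None):
    """Assign each active block to a processor rank.

    Blocks are sorted by Hilbert index and the sorted list is cut into
    nprocs contiguous pieces whose sizes differ by at most one block.
    is_active is a hypothetical mask; None means every block is active.
    """
    blocks = [(x, y) for y in range(n) for x in range(n)
              if is_active is None or is_active(x, y)]
    blocks.sort(key=lambda b: hilbert_index(n, b[0], b[1]))
    total = len(blocks)
    return {b: i * nprocs // total for i, b in enumerate(blocks)}


if __name__ == "__main__":
    # Hypothetical 8x8 block grid on 4 processors, with a made-up mask
    # that deactivates the blocks in one corner.
    owner = partition_blocks(8, 4, is_active=lambda x, y: x + y >= 3)
    print(sorted(Counter(owner.values()).items()))
```

Because consecutive blocks on a Hilbert curve are always neighbors in the grid, each processor receives a spatially compact set of blocks, which tends to keep communication between neighboring processors low while the work is evenly divided.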

The figure shows the simulation rates for the POP 0.1-degree benchmark using the original and modified configurations on the Earth Simulator, IBM Blue Gene/L, and Cray RedStorm supercomputers. Note that the combination of the 1D-data-structure-based solver and space-filling curve partitioning increases the simulation rate on 30,000 IBM Blue Gene/L processors from 4.0 to 7.9 simulated years per wall-clock day. Our techniques also improved the simulation rate on 7,600 Cray RedStorm processors from 6.3 to 8.1 simulated years per wall-clock day. Based on our success with POP, we have used space-filling curves to increase the simulation rate of the Community Ice CodE (CICE) at 0.1 degrees by 33%.
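To put these rates in perspective, the short calculation below converts them into speedup factors and into the wall-clock time needed for a simulation of a given length. The rates are the ones quoted above; the 100-year simulation length is a hypothetical value chosen only for illustration.

```python
# Simulation rates quoted in the text, in simulated years per wall-clock day:
# (original configuration, modified configuration)
RATES = {
    "IBM Blue Gene/L, 30,000 processors": (4.0, 7.9),
    "Cray RedStorm, 7,600 processors": (6.3, 8.1),
}


def speedup(before, after):
    """Ratio of the modified simulation rate to the original rate."""
    return after / before


def wall_clock_days(simulated_years, rate):
    """Wall-clock days needed to simulate the given number of years."""
    return simulated_years / rate


if __name__ == "__main__":
    years = 100  # hypothetical simulation length, not from the report
    for system, (before, after) in RATES.items():
        print(f"{system}: speedup {speedup(before, after):.2f}x, "
              f"{years} simulated years in "
              f"{wall_clock_days(years, before):.1f} -> "
              f"{wall_clock_days(years, after):.1f} wall-clock days")
```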

In FY2007, the success with POP and CICE spurred the development of a sequential CCSM coupler, a significant simplification of the current design. The creation of the sequential coupler, improvements in the memory footprint of the Community Land Model (CLM) at large processor counts, and improvements in the scalability of the Community Atmosphere Model (CAM) by our Department of Energy collaborators have enabled the creation of a sequential CCSM capable of executing in low-memory environments. A development version of CCSM, based on the sequential coupler, currently runs on Blue Gene at production resolutions.

In FY2008, development will proceed on an ultra-high-resolution configuration that is suitable for execution on 10,000 to 30,000 processors.

This work advances NCAR's strategic priority of "Conducting research in computer science, applied mathematics, statistics, and numerical methods." It is supported by NSF Core funding, as well as by Department of Energy CCPP program grants DE-FC03-97ER62402 and DE-PS02-07ER07-06.