CISL Annual Report

Improving Scalability of CCSM Components

  Simulation rate improvements
  A plot of the simulation rates of the 0.1-degree Parallel Ocean Program (POP) benchmark on the Earth Simulator (ES), Cray RedStorm (RS), and Blue Gene Watson (BGW) supercomputers. The plot shows that several code modifications can nearly double the POP benchmark's simulation rate compared to the best performance achieved in its standard form. The RS and BGW systems can sustain a simulation rate greater than five simulated years per wall-clock day; this rate will allow long-timeframe climate simulations to be completed within usefully short times.

The recent re-emergence of highly parallel systems has placed a premium on code scalability. In particular, NSF's plan to deploy a sustained petascale system in the 2010 timeframe has accelerated our need to significantly increase the number of processors on which application codes can run efficiently. In its standard configuration, one of NCAR's flagship applications, the Community Climate System Model (CCSM), uses only about 200 processors. Clearly, the scalability of CCSM must be significantly increased to make effective use of compute cycles on the upcoming petascale systems with 100,000 to 500,000 processors. Currently, the systems providing the largest-scale parallelism are the IBM Blue Gene and Cray XT3 systems. These prototype petascale systems provide the opportunity to examine and improve the scalability of CCSM component models on large processor counts in preparation for the upcoming NSF petascale system.

We initially examined the ability of the Blue Gene/L system to perform high-resolution ocean modeling using the Parallel Ocean Program (POP). We used the POP 0.1-degree benchmark to gauge the impact of several code modifications designed to increase the simulation rate. The code modifications are: a) redesigning the data structures in the POP solver to use 1-dimensional (1D) data structures in place of the existing 2-dimensional (2D) data structures, and b) adding a partitioning algorithm based on space-filling curves. Both modifications reduce POP's compute resource requirements, which allows for more efficient use of systems with large processor counts and limited memory bandwidth.
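The two modifications can be illustrated with a minimal Python sketch. It is not the POP implementation, which this report does not reproduce. It assumes a Hilbert curve as the space-filling curve, a square grid of blocks whose side is a power of two, and that blocks or points containing only land are dropped. None of these details are stated above, and the names hilbert_index, partition_blocks, and pack_ocean_points are hypothetical.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve on an n x n grid.

    Assumes n is a power of two. Cells that are consecutive along the
    curve are always neighbors in the grid.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_blocks(ocean_mask, n_procs):
    """Assign ocean blocks to processors as contiguous curve segments.

    ocean_mask[i][j] is truthy if block (i, j) contains ocean (assumption:
    all-land blocks are skipped). Returns one list of blocks per processor,
    with sizes differing by at most one.
    """
    n = len(ocean_mask)
    blocks = sorted(
        ((i, j) for i in range(n) for j in range(n) if ocean_mask[i][j]),
        key=lambda b: hilbert_index(n, b[0], b[1]),
    )
    base, extra = divmod(len(blocks), n_procs)
    parts, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        parts.append(blocks[start:start + size])
        start += size
    return parts


def pack_ocean_points(field, mask):
    """Pack a 2D field into a 1D list that holds only the ocean points.

    Also returns the (i, j) index of each packed value so results can be
    scattered back to the 2D grid.
    """
    index = [(i, j) for i, row in enumerate(mask)
             for j, wet in enumerate(row) if wet]
    values = [field[i][j] for i, j in index]
    return values, index


if __name__ == "__main__":
    # Toy 4 x 4 block grid; 0 marks an all-land block (illustrative data).
    mask = [[1, 1, 0, 0],
            [1, 1, 1, 0],
            [0, 1, 1, 1],
            [0, 0, 1, 1]]
    for rank, blocks in enumerate(partition_blocks(mask, 3)):
        print(rank, blocks)
```

Cutting the curve into contiguous segments gives each processor a nearly equal number of blocks, and because consecutive cells on the curve are grid neighbors, the blocks within a segment tend to lie close together. The packed 1D layout stores and operates on only the points that are kept.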

The figure shows the simulation rates for the POP 0.1-degree benchmark using the original and modified configurations on the Earth Simulator, IBM Blue Gene/L, and Cray RedStorm supercomputers. The combination of the 1D-data-structure-based solver and space-filling curve partitioning increases the simulation rate on 30,000 IBM Blue Gene/L processors from 4.0 to 7.9 simulated years per wall-clock day. Our techniques also improved the simulation rate on 7,600 Cray RedStorm processors from 6.3 to 8.1 simulated years per wall-clock day. These results suggest that it may be possible to increase the scalability of the entire CCSM through similar code modifications. Work is currently underway to examine and improve the scalability of both the Community Ice (CICE) model and the Community Land Model (CLM). Our future plans include examining the scalability of the Community Atmosphere Model (CAM) and the CCSM Flux-Coupler (CPL).
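To put these rates in perspective, the short Python sketch below computes the speedup on each system and the wall-clock time a run would take at each rate. The 100-year run length is an illustrative assumption, not a figure from this report, and the function names are hypothetical.

```python
def speedup(rate_before, rate_after):
    """Ratio of the modified to the original simulation rate."""
    return rate_after / rate_before


def wall_clock_days(simulated_years, rate):
    """Wall-clock days needed at a rate in simulated years per day."""
    return simulated_years / rate


# Rates quoted above, in simulated years per wall-clock day: (original, modified).
RATES = {
    "IBM Blue Gene/L, 30,000 processors": (4.0, 7.9),
    "Cray RedStorm, 7,600 processors": (6.3, 8.1),
}

if __name__ == "__main__":
    run_years = 100  # hypothetical run length, for illustration only
    for system, (before, after) in RATES.items():
        print(
            f"{system}: speedup {speedup(before, after):.2f}x, "
            f"{wall_clock_days(run_years, before):.1f} -> "
            f"{wall_clock_days(run_years, after):.1f} wall-clock days "
            f"for {run_years} simulated years"
        )
```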

This work advances NCAR's strategic priority of "Conducting research in computer science, applied mathematics, statistics, and numerical methods." It is supported through the NSF cooperative Grant NSF01, and through the Department of Energy CCPP program grant DE-FC03-97ER62402.