Improving Scalability of CCSM Components
[Figure: Simulation rates of the 0.1-degree Parallel Ocean Program (POP) benchmark on the Earth Simulator (ES), Cray RedStorm (RS), and Blue Gene Watson (BGW) supercomputers. The plot shows that several code modifications can double the POP benchmark's simulation rate relative to the best performance achieved in its standard form. The RS and BGW systems sustain a simulation rate greater than five simulated years per wall-clock day, a rate that allows long-timeframe climate simulations to be completed within usefully short times.]
The recent re-emergence of highly parallel systems has placed a
premium on code scalability. In particular, NSF's plans to deploy a
sustained-petascale system in the 2010 timeframe have accelerated our
need to significantly increase the number of processors on which
application codes can run efficiently. In its standard configuration,
one of NCAR's flagship applications, the Community Climate System
Model (CCSM), utilizes only approximately 200 processors. Clearly,
the scalability of the CCSM model must be significantly increased to
successfully utilize compute cycles on the upcoming 100,000- to
500,000-processor petascale systems. Currently, the systems providing
the largest-scale parallelism are the IBM Blue Gene and Cray XT3.
These prototype petascale systems provide the opportunity to examine and
improve the scalability of CCSM component models on large processor counts
in preparation for the upcoming NSF petascale system.
We initially examined the ability of the Blue Gene/L system to perform
high-resolution ocean modeling using the Parallel Ocean Program (POP).
We used the POP 0.1-degree benchmark to gauge the impact of several code
modifications designed to increase the simulation rate. The modifications
involve (a) redesigning the data structures in the POP solver to use
1-dimensional (1D) rather than the existing 2-dimensional (2D)
data structures, and (b) adding a partitioning algorithm based on
space-filling curves. Both modifications reduce POP's compute resource
requirements, allowing more efficient utilization of systems with
large processor counts and limited memory bandwidth; both techniques
are sketched below.
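To illustrate the first modification, here is a minimal sketch of the general idea behind the 1D data structures, written in Python/NumPy rather than POP's actual Fortran; the array names and grid sizes are illustrative assumptions, not POP's. Packing a 2D grid that includes land points into a 1D array of active ocean points means a solver sweep stores and touches only ocean cells, reducing memory footprint and memory traffic.

```python
# Minimal sketch (not POP's actual code): pack a 2D ocean grid with a
# land mask into a 1D array holding only ocean points. Land points are
# never stored, so each solver sweep moves less data through memory.
import numpy as np

nx, ny = 8, 6
rng = np.random.default_rng(0)
field2d = rng.random((ny, nx))            # full 2D field, land included
is_ocean = rng.random((ny, nx)) > 0.3     # True where a cell is ocean

# Build the gather index once; it maps each ocean point to its 2D cell.
ocean_idx = np.flatnonzero(is_ocean.ravel())

# 1D "compressed" storage: ocean points only, contiguous in memory.
field1d = field2d.ravel()[ocean_idx]

# A solver sweep now loops over len(ocean_idx) points instead of nx*ny.
field1d *= 2.0                            # stand-in for a solver update

# Scatter back to 2D only when the full grid is needed (e.g., output).
out2d = np.zeros(nx * ny)
out2d[ocean_idx] = field1d
out2d = out2d.reshape(ny, nx)
```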
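The second modification can be sketched in the same spirit. The snippet below uses a Morton (Z-order) curve for brevity; it is an illustrative stand-in, and POP's actual choice of curve and load-balancing details may differ. Land-only blocks are dropped before the remaining blocks are dealt out in contiguous runs along the curve, so no processor is wasted on land and each processor receives a spatially compact set of blocks.

```python
# Minimal sketch of space-filling-curve partitioning (Morton/Z-order
# here for brevity). Blocks are sorted along the curve and assigned to
# processors in contiguous runs.
def morton_key(i: int, j: int, bits: int = 16) -> int:
    """Interleave the bits of block coordinates (i, j) into a Morton key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b) | ((j >> b) & 1) << (2 * b + 1)
    return key

def partition(blocks, nprocs):
    """Assign (i, j) block coordinates to processors along the curve."""
    ordered = sorted(blocks, key=lambda ij: morton_key(*ij))
    chunk = -(-len(ordered) // nprocs)    # ceiling division
    return {p: ordered[p * chunk:(p + 1) * chunk] for p in range(nprocs)}

# Example: a 4x4 grid of blocks, two all-land blocks removed up front.
blocks = [(i, j) for i in range(4) for j in range(4)
          if (i, j) not in {(0, 0), (3, 3)}]
print(partition(blocks, 4))
```

Because consecutive blocks along a space-filling curve are spatially close, each processor's subdomain tends to have a small perimeter, which in turn tends to reduce the halo-exchange communication volume.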
The figure shows the simulation rates for the POP 0.1-degree benchmark
using the original and modified configurations on the Earth Simulator,
IBM Blue Gene/L, and Cray RedStorm supercomputers. Note that the
combination of the 1D-data-structure-based solver and space-filling curve
partitioning increases the simulation rate on 30,000 IBM Blue Gene/L
processors from 4.0 to 7.9 simulated years per wall-clock day. Our
techniques also improved the simulation rate on 7,600 Cray RedStorm
processors from 6.3 to 8.1 simulated years per wall-clock day. These
results suggest that similar code modifications may increase the
scalability of the entire CCSM model. Work is currently underway to
examine and improve the scalability of both the Community Ice CodE
(CICE) and the Community Land Model (CLM). Our future plans include
examining the scalability of the Community Atmosphere Model (CAM) and
the CCSM flux coupler (CPL).
This work advances NCAR's strategic priority of "Conducting research
in computer science, applied mathematics, statistics, and numerical
methods." It is supported through NSF Cooperative Grant NSF01 and
through the Department of Energy CCPP program grant DE-FC03-97ER62402.