1998 ASR Home
Back
SCD ASR Index
Next
SCD Home

Computational science

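The SCD Computational Science Section (CSS) provides state-of-the-art expertise in computing that is beneficial to the atmospheric and oceanic sciences communities. Our efforts are focused on research, collaborative development projects, technology tracking, software libraries, and education and outreach, as described below.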

The results of our work are shared through collaborations, publications, software development, and seminars.

CSS conducts research in areas such as numerical analysis, computational fluid dynamics, parallel communication algorithms for massively parallel architectures, and numerical solution of partial differential equations.

The section engages in collaborative research and development projects with groups within NCAR and at other institutions and agencies; these projects benefit the broader NCAR community. Currently, CSS is working with NCAR's Climate and Global Dynamics (CGD) Division, with support from the Department of Energy's (DOE) Computer Hardware, Advanced Mathematics, and Model Physics Project (CHAMMP) initiative, to develop a state-of-the-art coupled climate model framework that runs efficiently on distributed-memory parallel computers. These projects are described below.

CSS is involved in technology tracking (performance monitoring and benchmarking studies) of both hardware and software. This work helps ensure the efficient use of future computing resources and is critical in selecting the computers best suited to the future production computing needs of NCAR and the university community. On the software side, staff play an active role in the Fortran standards effort and in evaluating programming languages, programming environments, and programming paradigms.

In addition, we develop software packages that put our research results and our numerical and computational expertise to use. These libraries of scientific software are used by researchers in the atmospheric and related sciences.

Finally, CSS staff are active in education, outreach, and knowledge transfer. We organize workshops, give guest lectures at universities, host postdocs and summer visitors, and give seminars and talks at conferences.

Research

An efficient spectral transform method for solving the shallow water equations on the sphere

William F. Spotz, Mark A. Taylor, and Paul N. Swarztrauber have revisited a fast pseudospectral method for the spherical shallow water equations, developed by Philip Merilees in 1973 and recently revived by Bengt Fornberg. In these methods, the required spatial derivatives are computed by formally differentiating one-dimensional Fourier series approximations to both scalar and vector functions on the surface of the sphere. Filters must be used to alleviate prohibitive time-step restrictions and to maintain stability on the non-isotropic latitude-longitude grid. Merilees' original filter eventually proved unusable because it was unstable for longer runs. In this research, alternatives to Merilees' filter have been examined. In particular, a spherical harmonic filter has been used that consists of a harmonic analysis followed directly by a synthesis. The resulting stability and accuracy are identical to those of the traditional spectral transform method. Fewer Legendre transforms are required, since they are confined to the filter and are not used to compute spatial derivatives. In principle, this approach can also be viewed as a fast spectral method, since fast harmonic filters exist in the literature. Alternative fast Fourier filters are also being examined, with the intent of reproducing the accuracy and stability provided by the harmonic filter.
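A minimal sketch of the basic pseudospectral step, differentiating a truncated Fourier series along one latitude circle, is given below. It assumes uniformly spaced longitudes on [0, 2π) and uses a naive O(n²) DFT so that it is self-contained; an operational code would use an FFT. The helper names (`dft`, `idft`, `fourier_derivative`) are hypothetical, and neither latitude derivatives nor the filters discussed above are shown.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2)); stands in for an FFT."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2.0j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft(); returns complex samples."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2.0j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fourier_derivative(samples):
    """d/d(lambda) of a periodic function sampled at lambda_t = 2*pi*t/n."""
    n = len(samples)
    coeffs = dft(samples)
    for k in range(n):
        m = k if k <= n // 2 else k - n      # signed zonal wavenumber
        if n % 2 == 0 and k == n // 2:
            m = 0                            # drop the unpaired Nyquist mode
        coeffs[k] *= 1j * m                  # differentiation multiplies mode m by i*m
    return [v.real for v in idft(coeffs)]


# Usage: f = sin(3*lambda) + cos(lambda); the exact derivative is 3cos(3*lambda) - sin(lambda).
n = 16
lams = [2.0 * math.pi * t / n for t in range(n)]
df = fourier_derivative([math.sin(3 * x) + math.cos(x) for x in lams])
max_error = max(abs(d - (3 * math.cos(3 * x) - math.sin(x))) for d, x in zip(df, lams))
print(f"max error: {max_error:.1e}")
```

Because the test function contains only wavenumbers well below the grid's Nyquist limit, the derivative is exact up to rounding error, which is the spectral accuracy that makes the method attractive.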

Transposing arrays on multicomputers using de Bruijn sequences

Transposing an NxN array that is distributed row- or column-wise across P=N processors is a fundamental communication task that requires time-consuming interprocessor communication. It is the underlying communication task for the fast Fourier transform (FFT) of long sequences and multi-dimensional arrays. It is also the key communication task for certain weather and climate models.

This paper presents a parallel transposition algorithm for hypercube and mesh-connected multicomputers with programmable networks. The optimal scheduling of network transmissions is not unique and is known to be nontrivial. Here, scheduling is determined by a single de Bruijn sequence of N bits. The elements in each processor are first pre-ordered and then, in groups of log2 P adjacent elements, either transmitted or not transmitted, depending on the corresponding bit in the de Bruijn sequence. The algorithm is optimal both in overall time and in the time that any individual element spends in the network.
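A de Bruijn sequence of N = 2^n bits has the property that, read cyclically, every n-bit pattern appears in it exactly once. The sketch below generates a binary de Bruijn sequence with the standard recursive Lyndon-word (FKM) construction and checks that property. How the paper pre-orders elements and maps the bits to transmissions is not reproduced here; the function names are hypothetical.

```python
def de_bruijn_binary(n):
    """Binary de Bruijn sequence B(2, n): 2**n bits in which every n-bit pattern
    occurs exactly once as a cyclic window (FKM / Lyndon-word construction)."""
    a = [0] * (n + 1)
    seq = []

    def build(t, p):
        if t > n:
            if n % p == 0:               # append Lyndon words whose length divides n
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            build(t + 1, p)
            for bit in range(a[t - p] + 1, 2):
                a[t] = bit
                build(t + 1, t)

    build(1, 1)
    return seq


def cyclic_windows(bits, width):
    """All cyclic windows of `width` consecutive bits, as tuples."""
    n = len(bits)
    return [tuple(bits[(i + k) % n] for k in range(width)) for i in range(n)]


# Usage: P = N = 8 processors, log2(P) = 3, so the sequence has N = 8 bits.
seq = de_bruijn_binary(3)
print(seq)  # [0, 0, 0, 1, 0, 1, 1, 1]
assert len(set(cyclic_windows(seq, 3))) == 8
```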

The results are extended to other communication tasks, including shuffles, bit reversal, index reversal, and general index-digit permutations. The case P ≠ N and rectangular arrays with non-power-of-two dimensions are also discussed. Algorithms for mesh-connected multicomputers are developed by embedding the hypercube in the mesh. Optimal implementation of the algorithms requires certain architectural features that are not currently available in the marketplace. This paper has been accepted for publication in the Journal of Parallel and Distributed Computing.
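Transposition and bit reversal can both be described as index-digit permutations, that is, as permutations of the bits of each element's index. The sketch below, assuming N = 2^n and row-major indices r·N + c, expresses a transpose as swapping the row and column bit fields and bit reversal as reversing all 2n bits. It illustrates only the index mapping, not the communication schedule; the helper names are hypothetical.

```python
def permute_index_bits(index, perm):
    """Index-digit permutation: bit perm[k] of `index` becomes bit k of the result."""
    out = 0
    for k, src in enumerate(perm):
        out |= ((index >> src) & 1) << k
    return out


def transpose_perm(n):
    """For an N x N array (N = 2**n) indexed r*N + c, swap the row and column bit fields."""
    return [k + n for k in range(n)] + [k for k in range(n)]


def bit_reversal_perm(nbits):
    """Reverse the order of all index bits."""
    return [nbits - 1 - k for k in range(nbits)]


n = 2                      # N = 4, so each element index has 2n = 4 bits
N = 2 ** n
tp = transpose_perm(n)
# Element (r, c) moves to position (c, r): exactly an array transpose.
assert all(permute_index_bits(r * N + c, tp) == c * N + r for r in range(N) for c in range(N))
print(permute_index_bits(0b0001, bit_reversal_perm(2 * n)))  # 8 (0b1000)
```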

Spectral and high-order methods for atmospheric applications

Research is under way on methods of approximating the spatial derivatives required to solve the shallow water equations on the sphere that are faster and scale better than the spectral transform method currently in use. This has led to a revival of Merilees' pseudospectral method and Gilliland's fourth-order compact method. Both methods require filtering to avoid the restrictive CFL (time-step) condition associated with latitude-longitude grids and to improve the stability of explicit time-stepping methods. In the past, sufficiently robust filters could not be found, and the methods were abandoned.
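Why the latitude-longitude grid needs filtering can be seen from a back-of-the-envelope CFL estimate: the east-west grid spacing shrinks like cos(latitude), so the largest stable explicit time step near the poles is far smaller than at the equator. The sketch assumes a hypothetical 128-point longitude grid (2.8125 degrees), a 300 m/s fast gravity-wave speed, and a generic Courant limit c·dt/dx ≤ 1; the exact limit depends on the time-stepping scheme.

```python
import math

EARTH_RADIUS_M = 6.371e6


def zonal_spacing_m(lat_deg, nlon):
    """East-west distance between neighboring grid points on a latitude circle."""
    return EARTH_RADIUS_M * math.cos(math.radians(lat_deg)) * 2.0 * math.pi / nlon


def max_explicit_dt_s(lat_deg, nlon, wave_speed, courant=1.0):
    """Largest time step allowed by a CFL-type limit c * dt / dx <= courant."""
    return courant * zonal_spacing_m(lat_deg, nlon) / wave_speed


# Hypothetical setup: 128 longitudes and a 300 m/s gravity-wave speed.
nlon, c = 128, 300.0
dt_equator = max_explicit_dt_s(0.0, nlon, c)
dt_polar = max_explicit_dt_s(87.1875, nlon, c)   # one 2.8125-degree interval from the pole
print(f"equator: {dt_equator:.0f} s, near pole: {dt_polar:.0f} s, "
      f"ratio {dt_equator / dt_polar:.1f}")
```

Without a filter, the whole model would be forced to take the small polar time step, roughly twenty times shorter here, even though the equatorial resolution does not require it.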

We have found a robust and stable, albeit slow, filter that uses spherical harmonics. This filter has shown that any reasonably accurate method for computing spatial derivatives in spherical geometry suffices, as long as the filter is sufficiently robust. The problem thus reduces to either (1) finding a fast version of the spherical harmonic filter, or (2) sufficiently emulating the spherical harmonic filter's stability properties with a more traditional FFT-based filter. A fast Legendre transform has long proved elusive, but the fast spherical harmonic filter problem has opened new opportunities in this area. Similarly, projecting the spherical harmonic filter response onto FFT space has helped guide the design of new FFT filters, giving us two promising approaches to the fast filter problem.
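For comparison with the spherical harmonic filter, the simplest kind of traditional FFT-based polar filter truncates zonal wavenumbers at high latitudes. The sketch below is illustrative only: the cutoff rule m_max = max(1, floor((n/2)·cos φ)) and the helper names are assumptions, it is not one of the new filters described here, and a sharp spectral cutoff like this can produce Gibbs-type ringing on rough fields.

```python
import cmath
import math


def dft(values):
    """Naive DFT (O(n^2)); stands in for an FFT."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2.0j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft(); returns complex samples."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2.0j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def polar_fourier_filter(samples, lat_deg):
    """Zero zonal wavenumbers |m| > m_max(lat), where m_max shrinks like cos(lat) so the
    effective east-west resolution near the poles roughly matches that at the equator."""
    n = len(samples)
    m_max = max(1, int((n // 2) * math.cos(math.radians(lat_deg))))
    coeffs = dft(samples)
    for k in range(n):
        m = k if k <= n // 2 else k - n          # signed zonal wavenumber
        if abs(m) > m_max:
            coeffs[k] = 0.0                      # m and -m are removed together, so output stays real
    return [v.real for v in idft(coeffs)]


n = 32
lams = [2.0 * math.pi * t / n for t in range(n)]
field = [math.cos(x) + 0.5 * math.sin(10 * x) for x in lams]   # zonal wavenumbers 1 and 10
filtered = polar_fourier_filter(field, 80.0)   # m_max = 2 at 80 degrees, so only m = 1 survives
```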

References:
Spotz, Taylor, and Swarztrauber, Fast shallow-water equation solvers in latitude-longitude coordinates, J. Comput. Phys. 145(1), (1998) 432-444.

Spotz, Accuracy and performance of numerical wall boundary conditions for steady, 2D incompressible stream-function vorticity, accepted for publication in Int. J. for Numerical Methods in Fluids.

Spotz and Carey, Formulation and experiments with high-order compact schemes for nonuniform grids, Int. J. of Numerical Methods for Heat & Fluid Flow, 8(3) (1998) 288-303.

SEAM: Spectral Element Atmospheric Model

We have completed the development of SEAM, a spectral element atmospheric model. SEAM is a complete GCM dynamical core that uses a spectral element discretization in the horizontal direction (the surface of the sphere). In the vertical direction, SEAM uses a sigma-coordinate finite difference discretization taken from the NCAR CCM3 [Kiehl et al. NCAR/TN-420+STR, 1996]. Spectral elements are a type of finite element method in which a high-degree spectral method is used within each element. The method provides spectral accuracy while retaining both parallel efficiency and the geometric flexibility of unstructured finite element grids.
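Within each element, spectral element methods of this kind typically place nodes at Gauss-Lobatto-Legendre (GLL) points and use the same points for quadrature. The sketch below, assuming a single one-dimensional element on [-1, 1] and an illustrative degree n = 4 (not SEAM's actual configuration), computes the GLL nodes and weights and shows that the resulting mass matrix is diagonal, the property referred to under "Spectral elements on triangles". The helper names are hypothetical.

```python
import math


def legendre_pair(n, x):
    """Return (P_n(x), P_{n-1}(x)) from the three-term recurrence (n >= 1)."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p, p_prev


def gll_nodes_weights(n):
    """GLL nodes and weights for polynomial degree n (n + 1 points).
    Interior nodes are the roots of P_n'(x), found by Newton's method."""
    nodes = [-1.0]
    for i in range(1, n):
        x = -math.cos(math.pi * i / n)                   # Chebyshev-Gauss-Lobatto initial guess
        for _ in range(100):
            p, p_prev = legendre_pair(n, x)
            dp = n * (x * p - p_prev) / (x * x - 1.0)                # P_n'
            d2p = (2.0 * x * dp - n * (n + 1) * p) / (1.0 - x * x)   # P_n'' from Legendre's ODE
            step = dp / d2p
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
    nodes.append(1.0)
    weights = [2.0 / (n * (n + 1) * legendre_pair(n, x)[0] ** 2) for x in nodes]
    return nodes, weights


def lagrange(nodes, i, x):
    """Cardinal basis function: 1 at nodes[i], 0 at every other node."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value


n = 4
nodes, weights = gll_nodes_weights(n)
# Mass matrix M_ij = sum_k w_k l_i(x_k) l_j(x_k), evaluated with the same GLL points.
mass = [[sum(w * lagrange(nodes, i, x) * lagrange(nodes, j, x) for x, w in zip(nodes, weights))
         for j in range(n + 1)] for i in range(n + 1)]
print([mass[i][i] for i in range(n + 1)])   # diagonal entries equal the GLL weights
```

Because each cardinal function vanishes at every other node, collocating the quadrature with the nodes makes the mass matrix diagonal, so it can be inverted trivially; GLL quadrature remains exact for polynomials up to degree 2n - 1.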

Performance and model validation

In 1998, most of the work on SEAM model validation and on its performance on distributed shared memory (DSM) systems was completed. SEAM is well suited to modern DSMs such as the HP Exemplar and Silicon Graphics Cray Origin, achieving between 100 and 150 MFlops per processor on as many as 240 processors (tested on the 256-processor HP SPP-2000 at Caltech). The spectral accuracy of SEAM has also been demonstrated using two-dimensional shallow water test cases and the three-dimensional Held-Suarez dry dynamical core benchmark [Held and Suarez, Bull. Amer. Met. Soc. 1994]. This work has been described in:

Taylor, Loft, Tribbia, Performance of a spectral element atmospheric model (SEAM) on the HP Exemplar SPP-2000, NCAR/TN-439+EDD (1998).

Taylor, Tribbia, Iskandarani, The spectral element method for the shallow water equations on the sphere, J. Comput. Phys. 130 (1997) 92-108.

Haidvogel, Curchitser, Iskandarani, Hughes, Taylor, Global Modeling of the Ocean and Atmosphere Using the Spectral Element Method, Atmosphere-Ocean 35 (1997) 505-531.

The polar vortex

The parallel performance of SEAM coupled with the availability of NCAR's 64-processor HP Exemplar has allowed us to run the model at resolutions not possible with more conventional global atmospheric models. High-resolution runs of SEAM are currently being used to study several interesting atmospheric phenomena such as the polar vortex. This problem models the evolution of wintertime atmospheric flow in the polar stratosphere, i.e., at an altitude of 10 to 50 kilometers above sea level. The initial condition shows a cylindrical polar vortex visualized using the potential vorticity of the flow, which acts as a tracer. The sharp gradients of potential vorticity at the vortex edge isolate the polar air from the air at lower latitudes -- a condition favorable for wintertime polar ozone depletion. The simulation shows the effect of a pulsed excitation applied at the lower boundary. This produces a Rossby wave propagating upward along the vortex edge. The wave grows exponentially in amplitude as the density decreases with height, eventually tearing the vortex apart. Such vortex breakdown events are observed once or twice each winter in the Northern Hemisphere.

In collaboration with R. Saravanan (NCAR/CGD) and L. Polvani (Columbia University), we have been running very-high-resolution simulations of the polar vortex. With SEAM, we have been able to perform these calculations with 200 levels in the vertical and an average grid spacing of 70 km over the globe, amounting to over 22 million grid points. The 25-day simulation took 2.5 days on the 64-processor HP Exemplar. This run, together with many runs at lower resolutions, represents one of the few numerical resolution studies for three-dimensional atmospheric flows. The results are being used to develop a primitive equation test case for spherical geometry.
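As a rough consistency check on these figures, assuming the 22 million points are spread evenly over the 200 levels and over a spherical Earth of radius 6371 km, the implied average horizontal spacing comes out near the quoted 70 km, and the run advanced about 10 simulated days per wall-clock day. The function name is hypothetical; only numbers stated in the text are used.

```python
import math

EARTH_RADIUS_KM = 6371.0


def implied_spacing_km(total_points, levels):
    """Average horizontal spacing if total_points/levels points cover the sphere uniformly."""
    points_per_level = total_points / levels
    sphere_area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return math.sqrt(sphere_area_km2 / points_per_level)


spacing = implied_spacing_km(22e6, 200)      # figures quoted in the text
speed = 25.0 / 2.5                           # simulated days per wall-clock day
print(f"about {spacing:.0f} km average spacing; {speed:.0f} simulated days per day")
```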

In collaboration with John Clyne (SCD), a high-resolution three-dimensional visualization has been made from these simulations. This video has been shown at many conferences, and it has been added to the official NCAR Visualization Lab demonstration tape. The video will also be submitted to the 1998 American Physical Society's Gallery of Fluid Motion.

Decaying turbulence on the sphere

Simulating three-dimensional decaying turbulence on the sphere is another experiment that requires the kind of global high resolution that SEAM can provide. We are currently using SEAM to study this problem in collaboration with J. Tribbia (NCAR/CGD), R. Saravanan, and L. Polvani. This work is similar to the well-known three-dimensional quasi-geostrophic experiment of J. McWilliams, J. Weiss, and I. Yavneh ("Anisotropy and coherent vortex structures in planetary turbulence," Science 264, 410-413, 1994). However, with SEAM on a 64-processor DSM, we will be able to perform this turbulence simulation for the first time using the full primitive equations at very high resolution and in full spherical geometry.

Spectral elements on triangles

At present, the diagonal mass matrix spectral element method (as used in SEAM, described above) can only be used with conforming quadrilateral grids. This is because the method relies on a Gauss-like quadrature formula with the same number of quadrature points as the dimension of the functional space, and it is generally believed that such optimal quadrature formulas do not exist for the triangle. In this work we propose a more general derivation of the diagonal mass matrix spectral element method based on Fekete points. For quadrilateral elements, this new derivation is identical to the conventional spectral element method. It also allows many other types of elements, such as triangles, hexagons, and tetrahedra, while retaining the exponential convergence and the diagonal mass matrix of the original method. This work has been done in collaboration with Beth Wingate (Los Alamos National Laboratory) and Rachel Vincent (Rice University).

We first solve a Sturm-Liouville problem for the element of interest (such as the triangle or square). The resulting eigenfunctions are used to determine the correct functional space to be used in the method. Once the functional space is known, we use the Fekete criterion to compute near-optimal grids for these spaces that have the same number of points as the dimension of the functional space. This allows the construction of a well-behaved cardinal function basis that leads to a diagonal mass matrix.
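The Fekete criterion chooses, among all point sets with as many points as the dimension of the space, the set that maximizes the absolute value of the Vandermonde determinant. In one dimension this can be sketched directly: for polynomials of degree n on [-1, 1], |det V| reduces to the product of all pairwise point separations, the maximizer contains both endpoints, and the interior points can be found by gradient ascent on its logarithm, which is concave for ordered points. This is an illustrative sketch only, not the triangle computation of the cited work, and the function names are hypothetical; with five points it recovers the Gauss-Lobatto points 0, ±√(3/7), ±1, consistent with the result for the square described in the next paragraph.

```python
import math


def fekete_points_1d(n_points, steps=5000, lr=0.01):
    """Approximate Fekete points on [-1, 1] for polynomials of degree n_points - 1."""
    # Endpoints stay fixed at -1 and 1; interior points start equally spaced.
    x = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    for _ in range(steps):
        # Gradient of sum_{i<j} log|x_j - x_i| with respect to each interior point.
        grad = [sum(1.0 / (x[i] - x[j]) for j in range(n_points) if j != i)
                for i in range(1, n_points - 1)]
        for i, g in enumerate(grad, start=1):
            x[i] += lr * g
    return x


def abs_vandermonde_det(x):
    """|det V| for the monomial basis equals the product of pairwise separations."""
    return math.prod(abs(x[j] - x[i]) for i in range(len(x)) for j in range(i + 1, len(x)))


pts = fekete_points_1d(5)
equispaced = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(abs(pts[3] - math.sqrt(3.0 / 7.0)) < 1e-9)                   # True
print(abs_vandermonde_det(pts) > abs_vandermonde_det(equispaced))  # True
```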

The Sturm-Liouville problem for the square is not new, and it leads to the standard diamond truncation of polynomials. The Fekete points for the square with this truncation of polynomials are known to be the tensor product of Gauss-Lobatto points, making this method equivalent to the standard spectral element method. The Sturm-Liouville problem for the triangle suggests that the correct functional space is the triangular polynomial truncation. Unlike optimal Gauss-Lobatto integration points, Fekete points are also defined for the triangle with this functional space. Furthermore, theoretical and numerical evidence suggests that the Fekete points on the boundary are the one-dimensional Gauss-Lobatto points, so triangular elements built on Fekete points conform naturally with quadrilateral elements. Thus triangles and quadrilaterals can be combined in the same grid while retaining a diagonal mass matrix. The method extends naturally to other domains such as tetrahedra and hexagons.

This work has been described in:

R. E. Vincent, Computation of Fekete Points in a Triangular Domain, SOARS/NCAR report, 1998.

M. Taylor and B. Wingate, A generalized diagonal mass matrix spectral element method for non-quadrilateral elements, submitted Appl. Num. Math. 1998.

M. Taylor and B. Wingate, The Fekete collocation points for triangular spectral elements, submitted SIAM J. Numer. Anal. 1998.

B. Wingate and M. Taylor, The natural function space for triangular spectral elements, submitted SIAM J. Numer. Anal. 1998.

Optimum Parallel Algorithms and Architectures for the Harmonic Transform: The Vector Multiprocessor

The first massively parallel processors (MPPs) became available in the late 1980s, with considerable expectations raised by their impressive peak performance. However, it soon became evident that the sustained, or actual, performance of these machines could be significantly less than peak because of the time required for interprocessor and related communication. In an effort to improve performance, parallel communication algorithms were developed that demonstrated, in theory, that actual performance could be substantially improved, to the point of matching or even exceeding that of traditional multiprocessors. However, this was not achievable at the time because MPP architectures lacked the features needed to implement these algorithms optimally. That is, wallclock time could not match theoretical algorithmic performance without certain architectural features that were not available in the marketplace.

These features have been incorporated into a recently patented parallel computer called the Vector Multiprocessor (VMP). It is the product of an effort to determine the optimum algorithmic and architectural environment for weather/climate models. It is a novel, general-purpose, high-performance, scalable multiprocessor that evolved from the answers to such questions as: How do we minimize communication and maximize performance? Will communication ultimately dominate multiprocessor performance? Is there a "best" multiprocessor architecture?

In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit. Interprocessor communication tasks are performed to and from local vector registers in the same way that computational tasks are performed on a vector uniprocessor. The VMP brings to the multiprocessor system what vectorization brought to the single processor. Here we determine optimum multiprocessor performance for the key computational kernels used in spectral models of climate and global dynamics, and in doing so, define the VMP. This paper has been submitted to the International Journal of High Performance Computing.

Linux PC clusters

Significant price-performance advantages have been demonstrated for applications running on systems built entirely from commodity hardware components. These systems, called Beowulf clusters, have recently achieved sustained price-performance levels below $100 per sustained MFlops for certain problems.

At the same time, maturing numerical methods, such as spectral element methods, have made it possible for certain classes of problems in earth systems modeling to achieve greatly improved scaling and performance on parallel RISC systems.

In response to these developments, John Dennis and Dr. Rich Loft in CSS have been experimenting with Beowulf systems since March 1998 and have built a prototype Beowulf system at NCAR. The initial system consisted of two 2-processor PC nodes with 300-MHz Pentium II processors and 128 MBytes of memory per node, connected by a 100-Mbit Ethernet interconnect.

In July of this year, the prototype was expanded to contain four 2-processor nodes. Measured results on this prototype eight-processor Beowulf system include 580 MFlops for the Spectral Element Atmosphere Model (SEAM), and an estimated 500 MFlops sustained for a very-high-resolution T1023 spectral shallow water model. This demonstrates that price-performance levels of below $20/MFlops are achievable for at least some classes of atmospheric or ocean models. These results are encouraging.
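The report does not state the prototype's cost, so the arithmetic below only makes the $20/MFlops figure concrete: at 580 sustained MFlops on eight processors, that bound corresponds to a total system cost under about $11,600, with roughly 72.5 sustained MFlops per processor. The function names are hypothetical.

```python
def dollars_per_mflops(system_cost_usd, sustained_mflops):
    """Price-performance: dollars per sustained MFlops."""
    return system_cost_usd / sustained_mflops


def max_cost_for_target(target_dollars_per_mflops, sustained_mflops):
    """Largest system cost that still meets a $/MFlops target at the measured rate."""
    return target_dollars_per_mflops * sustained_mflops


sustained = 580.0    # SEAM on the eight-processor prototype, from the text
processors = 8
print(sustained / processors)                     # 72.5 MFlops per processor
print(max_cost_for_target(20.0, sustained))       # 11600.0
print(dollars_per_mflops(11600.0, sustained))     # 20.0
```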

Developing expertise with the hardware and software components, system administration practices, and tools required to deploy Beowulf systems tuned for earth systems modeling at aggressive price-performance levels is another goal of this project. Progress has been made in areas including the scalability of system administration solutions, system-wide security, network performance, net-booting of processing nodes, and system performance monitoring.

Technology tracking

The Computational Science Section (CSS) has responsibility for tracking the developments in high-performance computing technology for SCD and NCAR. CSS staff assess the capabilities of the latest systems (hardware and software) available from vendors and evaluate the performance of these systems on problems typically found in the atmospheric sciences community. These systems, from workstations to supercomputers, are evaluated primarily through hands-on work and benchmarks.

The NCAR benchmark suite was initially developed for use in the 1995 ACE open procurement. These benchmarks are characteristic of the applications run at NCAR and represent the NCAR computing workload.

Description of benchmarks

The NCAR benchmark codes are categorized as kernels and applications. The complete set is available through anonymous FTP from the CSS FTP site, ftp.ucar.edu/css/, and each code is described in its own README file.

The kernels are a set of relatively simple codes that measure important aspects of the system such as CPU speed, memory bandwidth, and the efficiency of intrinsic functions. In addition, there is a shallow water benchmark. The shallow water equations give a primitive but useful representation of the dynamics of the atmosphere, and the model has been used over the past decade to evaluate the high-end performance of computer systems. Finite difference approximations are used to solve this set of two-dimensional partial differential equations. There is a single-processor version (with Cray multitasking directives), a parallel PVM version, and a Fortran 90 version of the model. The benchmark is run over a range of grid sizes and decompositions and gives feedback on how to decompose the problem for good cache utilization. The kernel codes are as follows:
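The NCAR benchmark solves the nonlinear two-dimensional equations; the sketch below is only a one-dimensional, linearized illustration of the finite-difference idea, using a staggered periodic grid and forward-backward time stepping with hypothetical parameters. It is not the benchmark code. The check at the end shows that this discretization conserves total mass up to rounding error.

```python
import math


def step_linear_sw_1d(u, h, g, depth, dx, dt):
    """One forward-backward step of the 1D linearized shallow water equations
        du/dt = -g dh/dx,   dh/dt = -depth du/dx
    on a periodic staggered grid: h[i] at cell centers, u[i] on the face between cells i and i+1."""
    n = len(h)
    # Forward step for velocity using the current height gradient.
    u_new = [u[i] - g * dt * (h[(i + 1) % n] - h[i]) / dx for i in range(n)]
    # Backward step for height using the updated velocity divergence (u_new[-1] wraps around).
    h_new = [h[i] - depth * dt * (u_new[i] - u_new[i - 1]) / dx for i in range(n)]
    return u_new, h_new


# Hypothetical test problem: a Gaussian height bump on a 100-cell periodic channel.
n, dx, g, depth = 100, 1.0e4, 9.81, 1000.0
dt = 0.5 * dx / math.sqrt(g * depth)     # Courant number 0.5 for the gravity-wave speed
h = [math.exp(-((i - n / 2) / 5.0) ** 2) for i in range(n)]
u = [0.0] * n
mass0 = sum(h)
for _ in range(200):
    u, h = step_linear_sw_1d(u, h, g, depth, dx, dt)
print(f"mass change: {sum(h) - mass0:.1e}, max |h|: {max(abs(v) for v in h):.3f}")
```

The staggered arrangement makes the discrete divergence telescope around the periodic domain, which is why the total height changes only by rounding error.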

In addition, the NCAR benchmark suite includes codes that serve as more extensive tests and some full applications. There is the Spectral Element Atmospheric Model (SEAM), which is a dry dynamical core. Full applications include the Los Alamos-developed ocean model called POP, the NCAR/Penn State University mesoscale code MM5, and the NCAR Community Climate Model Version 3.2 (CCM3.2), which is a three-dimensional global climate model. This version of the CCM was released in 1997.

System benchmarking

U.S. high-performance computer manufacturers are predominantly building their products from commodity processors and offering systems with a distributed shared memory (DSM) architecture. Two such systems, one with 64 processors from HP and one with 128 processors from Silicon Graphics, are available to NCAR users. The HP was installed in spring 1997, and the Silicon Graphics system in spring 1998. Below we show how these two systems perform on some elements of the benchmark suite.


DSM Systems at NCAR

Vendor | System | Installed | Clock (MHz) | Processor peak (MFLOPS)
HP | Exemplar SPP-2000 | 5/97 | 180 | 720
Silicon Graphics | Cray Origin2000 | 6/98 | 195 | 390
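Dividing each processor's peak rate by its clock rate gives peak floating-point operations per cycle, and the same peak figure lets the SEAM rates quoted earlier (100 to 150 MFlops per processor) be expressed as a fraction of peak. This is simple arithmetic on numbers already in this report, assuming the Caltech SPP-2000 processors have the same 720 MFLOPS peak as NCAR's; no new measurements are implied.

```python
systems = {
    # name: (clock_mhz, peak_mflops_per_processor), from the DSM table
    "HP Exemplar SPP-2000": (180.0, 720.0),
    "Silicon Graphics Cray Origin2000": (195.0, 390.0),
}
for name, (clock, peak) in systems.items():
    print(f"{name}: {peak / clock:.0f} floating-point operations per clock at peak")

# SEAM's quoted 100-150 MFlops per processor on the HP SPP-2000, as a fraction of peak.
hp_peak = systems["HP Exemplar SPP-2000"][1]
low, high = 100.0 / hp_peak, 150.0 / hp_peak
print(f"SEAM sustained: {low:.0%} to {high:.0%} of HP peak")
```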

[Figure: SEAM benchmark performance]

[Figure: POP benchmark performance]

[Figure: CCM3.2 benchmark performance]

Software libraries

MUDPACK

National and international distribution continues. In 1997 and 1998, 55 people from foreign and U.S. institutions received software and consulting help. The entire package was rewritten during the last year. The new Fortran 90 version 4.0 is more efficient and amenable to parallelization than earlier versions. Preliminary tests to create a portable parallel version from 4.0 are underway. A software website has been established to automate dissemination of software, descriptive material, and source code for users who sign a UCAR licensing agreement. Earlier users of MUDPACK are being notified of the availability of the new version of MUDPACK on the web.

SPHEREPACK

An article, "SPHEREPACK 3.0: A Model Development Facility," has just been accepted for publication in Monthly Weather Review. During the last year, version 3.0 of SPHEREPACK was created from version 2.0 to correct problems with mixed types in argument passing. A SPHEREPACK website has been established for disseminating software information and source code. Fifty-one people from U.S. and foreign institutions have downloaded versions 2.0 and 3.0 of SPHEREPACK. The software has been incorporated into the CCM processor for manipulating and displaying scalar and vector data on the sphere in conjunction with NCAR Graphics.

REGRIDPACK

The original TLCPACK was converted to a Fortran-90-compatible version now called REGRIDPACK. This package "regrids" or transfers data between higher dimensional grids using vectorized linear or cubic interpolation. For portability, all codes have been tested on different platforms with both Fortran 77 and Fortran 90 compilers.
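REGRIDPACK's own routines and calling sequences are not reproduced here. This is a generic sketch of linear regridding with hypothetical function names, assuming monotonically increasing source coordinates and no extrapolation; the two-dimensional case is built by applying the one-dimensional step along each axis in turn. Cubic interpolation is not shown.

```python
import bisect


def regrid_linear_1d(x_src, y_src, x_dst):
    """Linearly interpolate values y_src, given on increasing coordinates x_src, onto x_dst.
    Points outside [x_src[0], x_src[-1]] are rejected rather than extrapolated."""
    out = []
    for x in x_dst:
        if not x_src[0] <= x <= x_src[-1]:
            raise ValueError(f"{x} lies outside the source grid")
        i = min(bisect.bisect_right(x_src, x) - 1, len(x_src) - 2)  # left end of bracketing interval
        t = (x - x_src[i]) / (x_src[i + 1] - x_src[i])
        out.append((1.0 - t) * y_src[i] + t * y_src[i + 1])
    return out


def regrid_linear_2d(x_src, y_src, z_src, x_dst, y_dst):
    """Bilinear regridding of z_src[j][i] = z(x_src[i], y_src[j]) by two passes of 1D interpolation."""
    rows = [regrid_linear_1d(x_src, row, x_dst) for row in z_src]   # along x, one source row at a time
    cols = [list(col) for col in zip(*rows)]                          # cols[i][j] at (x_dst[i], y_src[j])
    out_cols = [regrid_linear_1d(y_src, col, y_dst) for col in cols]  # along y
    return [list(row) for row in zip(*out_cols)]                      # result[j][i] at (x_dst[i], y_dst[j])


print(regrid_linear_1d([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 2.0]))  # [5.0, 20.0]
```

Applying one-dimensional passes axis by axis is what makes this approach extend naturally to three or four dimensions.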

A REGRIDPACK website describing the package and providing access to the software has been established.

Collaboration

CHAMMP Parallel/Distributed Coupled Modeling

CSS staff work on the collaborative Parallel Climate Model (PCM) project with Warren Washington et al. in CGD at NCAR, as well as with scientists at Los Alamos National Laboratory and the Naval Postgraduate School. This work is sponsored in part by the U.S. Department of Energy. The model consists of three components (CCM3.2, POP, and ICE) that can be run independently in standalone mode for verification, or together with the flux coupler (FCD) in coupled mode. The atmosphere model is run at a nominal 2.8-degree horizontal resolution and the ocean model at a nominal 2/3-degree horizontal resolution. The ice model is split into two patches over the north and south poles and has a resolution of 1/4 degree. The source code for each component resides in a shared CVS repository so that the CGD PCM group and the CSS group can develop the model concurrently using the same code base.

The initial work in 1996 involved the design and development of a coupled modeling framework for the T3D. After considering several alternatives, we determined that the best solution for the T3D was to incorporate all component models into a single executable. However, the code retains an architecture amenable to splitting the components back out into separate executables. As part of this effort, we developed a distributed-memory flux-coupling strategy for this coupled model.

In 1997, we developed an MPI version of PCM that runs efficiently on the latest generation of distributed shared memory (DSM) parallel computers, as well as on traditional massively parallel processors (MPPs).

Our goal was to develop a code that can readily exploit the three potential production platforms likely to be at our disposal during the next 12-24 months: the Cray T3E at NERSC, Silicon Graphics Cray Origin2000s at LANL and NCAR, and the HP Exemplar SPP-2000 X Class at NCAR. This has been accomplished. In addition, this coupled model will port, with only a modest effort, to other systems that support standard MPI.

The MPI version of CCM3.2 is now running and validated on all three platforms listed above. In addition, our performance measurements using PCM on the 64-PE Origin2000 at NCAR show that the model runs at approximately five wall-clock hours per simulated year of climate. The current resolutions of PCM are T42L18 for CCM3.2, 2/3 degree with 32 levels for POP, and 10 km for the ice model.

In terms of scalability, we have demonstrated that dPCM (PCM with atmospheric forcing rather than CCM3) scales well to 256 processors on the NERSC T3E. Currently, CCM3.2 T42L18 is limited to 64 PEs because of its latitudinal decomposition.
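T42 uses a Gaussian grid of 128 longitudes by 64 latitudes (360/128 ≈ 2.8 degrees, the nominal resolution quoted above), so a decomposition that assigns whole latitudes to processors can keep at most 64 PEs busy. The sketch below, with hypothetical helper names, shows that ceiling and how decomposing a second dimension raises it. Splitting over latitudes and the 18 vertical levels is only an assumption for illustration; the report does not say which dimensions the planned two-dimensional decomposition of CCM3.2 will use.

```python
def block_partition(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous blocks whose sizes differ by at most one."""
    if n_parts > n_items:
        raise ValueError(f"{n_parts} parts for {n_items} items would leave some processors idle")
    base, extra = divmod(n_items, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


NLAT, NLEV = 64, 18     # T42 Gaussian-grid latitudes; L18 vertical levels

# 1D latitudinal decomposition: at most one latitude per PE, so at most 64 PEs.
lat_blocks = block_partition(NLAT, 64)
try:
    block_partition(NLAT, 128)
except ValueError as err:
    print(err)

# Hypothetical 2D decomposition over (latitude, level): 32 x 6 = 192 PEs.
grid_2d = [(lats, levs) for lats in block_partition(NLAT, 32) for levs in block_partition(NLEV, 6)]
print(len(grid_2d), "PEs, each owning", len(grid_2d[0][0]), "latitudes x",
      len(grid_2d[0][1]), "levels")
```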

We are collaborating closely with Warren Washington and CGD staff to ensure that code improvements and efficiencies meet their science objectives.

Future work

Our planned work for the next year focuses first on validating a two-dimensional parallel decomposition of CCM3.2, which we will use to study scalability to hundreds of processors at T42L18. We will then pursue a two-dimensional decomposition of CCM3.6, the latest version of CCM3, and incorporate it into PCM. We will also modify the atmospheric part of the flux coupler so that it, too, scales beyond 64 processors. Finally, we will performance-tune all component models and investigate the scalability of the entire coupled model on the T3E and clustered DSM systems. Our performance target is one hour per simulated year of climate.

Other collaborations

CSS maintains active research and development collaborations with NCAR staff and members of the academic community. Some of our collaborators include:

Joe Tribbia (NCAR/CGD)
John Clyne (NCAR/SCD)
Lorenzo Polvani (Columbia University)
Beth Wingate (LANL)
Rachel Vincent (Rice University)

Education, outreach, and technology transfer

Mark Taylor participated in the SOARS program as a science mentor for SOARS protege Rachel Vincent. Steve Hammond was a writing mentor for SOARS protege Michelle Dunn.

CSS staff co-sponsored the Workshop on Climate, Ocean, and Weather Modeling Benchmarks at the NCAR Mesa Lab, September 22-23, 1998.

During FY1998, CSS staff gave 22 scientific and technical presentations.

During the first half of FY1998, CSS processed all outside requests for computing at NCAR. This included both the general Community machines and the Climate Simulation Laboratory systems. In that time frame, CSS approved 110 requests for 59,386 General Accounting Units (GAUs). In addition, we processed requests requiring panel action for use of the Climate Simulation Laboratory resources totalling 413,996 equivalent Cray C90 hours. The panel allocated 278,704 hours.

Streamlining and automating the process for handling user requests for SCD computing time

New websites were established and maintained for the SCD Advisory Panel, the CSL Advisory Panel, and for users wanting to apply for computing resources.

A link was established to an existing website so that users can access new-user documentation online. This eliminated printing costs, estimated at $11.00 per packet, as well as the work of the two SCD staff members who had handled and mailed the numerous hardcopy documents to new users.

The first electronic CSL Panel Book was made available for the April meeting of the CSL Advisory Panel.

It was determined that only electronic requests for computing time should be accepted. This procedure was approved by the SCD Advisory Panel and implemented.

The review process was changed: all hardcopy forms and letters were eliminated, and review requests, along with a copy of the computing request, were sent to potential reviewers. Turnaround time for reviewing requests decreased substantially.

The notification process for approving requests for computing time was reformed and is now handled electronically. In addition, the procedures for notifying users that their NSF grants were expiring or that their GAU usage had exceeded the approved amount were modified; these notices are now prepared automatically from reports generated from the Oracle database.

These efforts brought the planned goal of automating the entire allocations process into its final stages. Effective June 1, 1998, the task of handling computing allocations, together with the automation and streamlining project, was transferred from the Computational Science Section to the Operations and Information Support Section of SCD.

Technology transfer

CSS staff continue to interact with SRC Computers of Colorado Springs, Colorado and Tera Computer Corporation of Seattle, Washington. These are two relative newcomers to the US HPC market. CSS staff have visited both companies and provided benchmark codes to enable these startups to evaluate their progress relative to our applications.
