
The results of our work are shared through collaborations, publications, software development, and seminars.
CSS conducts research in areas such as numerical analysis, computational fluid dynamics, parallel communications algorithms for massively parallel architectures, and numerical solutions to partial differential equations.
The section engages in collaborative research and development projects with groups within NCAR and from other institutions and agencies. These projects benefit the broader NCAR community. Currently, CSS is involved in projects in conjunction with the Climate and Global Dynamics (CGD) Division of NCAR and with support from the Department of Energy's (DOE) Computer Hardware, Advanced Mathematics, and Model Physics Project (CHAMMP) initiative, to develop a state-of-the-art coupled climate model framework that will run efficiently and effectively on distributed memory parallel computers. These projects are described below.
CSS is involved in technology tracking (performance monitoring and benchmarking studies) of both hardware and software. This work evaluates and ensures the efficient use of future computing resources and is critical in selecting the most appropriate computers for the future production computing needs of NCAR and the university community. On the software side, staff play an active role in the Fortran standards effort as well as in evaluating programming languages, programming environments, and paradigms.
In addition, we develop software packages to make use of our research results and our numerical and computational expertise. These libraries of scientific software are used by researchers in the atmospheric and related sciences community.
Finally, CSS staff are active in the areas of education, outreach, and knowledge transfer. We organize workshops, guest-lecture at universities, host postdocs and summer visitors, and give seminars and talks at conferences.
This paper presents a parallel transposition algorithm for hypercube and mesh-connected multicomputers with programmable networks. The optimal scheduling of network transmissions is not unique and is known to be non-trivial. Here, scheduling is determined by a single de Bruijn sequence of N bits. The elements in each processor are first pre-ordered and then, in groups of log2(P) adjacent elements, either transmitted or not transmitted, depending on the corresponding bit in the de Bruijn sequence. The algorithm is optimal both in overall time and in the time that any individual element is in the network.
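For readers unfamiliar with de Bruijn sequences, the following is a minimal Python sketch that generates a binary de Bruijn sequence and checks its defining property (every n-bit pattern appears exactly once as a cyclic window). It uses the standard recursive Lyndon-word construction, which is an assumption made for illustration; it is not the scheduling algorithm of the paper, and the names `de_bruijn_binary` and `is_de_bruijn` are hypothetical.

```python
def de_bruijn_binary(n):
    """Return a binary de Bruijn sequence of order n as a list of 2**n bits."""
    a = [0] * (n + 1)   # work array; a[1..n] holds the current candidate word
    seq = []

    def db(t, p):
        if t > n:
            # Emit the word only when its period p divides n.
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            # Binary alphabet: the only larger symbol to try is 1.
            if a[t - p] == 0:
                a[t] = 1
                db(t + 1, t)

    db(1, 1)
    return seq


def is_de_bruijn(seq, n):
    """Check that every n-bit window of the cyclic sequence is distinct."""
    m = len(seq)
    windows = {tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)}
    return m == 2 ** n and len(windows) == m


if __name__ == "__main__":
    bits = de_bruijn_binary(4)
    print("".join(map(str, bits)), is_de_bruijn(bits, 4))
```

In a schedule of the kind described above, each bit of such a sequence would act as a transmit/hold flag for one group of elements; how the sequence length relates to N and P is specified in the paper, not here.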
The results are extended to other communication tasks including shuffles, bit reversal, index reversal, and general index-digit permutation. The case P not equal to N and rectangular arrays with non-power-of-two dimensions are also discussed. Algorithms for mesh-connected multicomputers are developed by embedding the hypercube in the mesh. The optimal implementation of the algorithms requires certain architectural features that are not currently available in the marketplace. This paper has been accepted for publication in the Journal of Parallel and Distributed Computing.
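The communication tasks listed above share one structure: each moves the element at index i to the index obtained by permuting the binary digits of i. The sketch below illustrates this on a single list in Python. It shows only the permutations themselves, not their parallel scheduling, and the function names are hypothetical.

```python
def permute_index(i, perm):
    """Build a new index whose bit d is bit perm[d] of i (bit 0 = least significant)."""
    j = 0
    for d, src in enumerate(perm):
        j |= ((i >> src) & 1) << d
    return j


def apply_index_digit_permutation(data, perm):
    """Move data[i] to position permute_index(i, perm); len(data) must be 2**len(perm)."""
    n = len(data)
    if n != 1 << len(perm):
        raise ValueError("length must equal 2**len(perm)")
    out = [None] * n
    for i, value in enumerate(data):
        out[permute_index(i, perm)] = value
    return out


def transpose_perm(k):
    """Transpose of a 2**k by 2**k row-major array: swap the row and column bit fields."""
    return [(d + k) % (2 * k) for d in range(2 * k)]


def bit_reversal_perm(nbits):
    """Reverse the order of all index bits."""
    return [nbits - 1 - d for d in range(nbits)]


def shuffle_perm(nbits):
    """Perfect shuffle: rotate the index bits left by one position."""
    return [(d - 1) % nbits for d in range(nbits)]


if __name__ == "__main__":
    flat = list(range(16))  # a 4 x 4 array stored row by row
    print(apply_index_digit_permutation(flat, transpose_perm(2)))
    print(apply_index_digit_permutation(list(range(8)), bit_reversal_perm(3)))
    print(apply_index_digit_permutation(list(range(8)), shuffle_perm(3)))
```

On a multicomputer, the high-order index bits typically select the processor, so any permutation that exchanges high and low bits implies interprocessor communication.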
We have found a robust and stable, albeit slow, filter that uses spherical harmonics. This filter has demonstrated that any reasonably accurate method used for spatial derivatives in spherical geometry is all that is needed, as long as the filter is sufficiently robust. Thus, the problem reduces to either (1) finding a fast version of the spherical harmonic filter, or (2) sufficiently emulating the spherical harmonic filter's stability properties with a more traditional FFT-based filter. Finding a fast Legendre transform has long proved elusive, but the fast spherical harmonic filter problem has provided new opportunities in this area. Similarly, projecting the spherical harmonic filter response onto FFT-space has helped guide us in designing new FFT filters, giving us two promising approaches to the fast filter problem.
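To make the "traditional FFT-based filter" concrete, here is a minimal Python sketch of a zonal Fourier filter applied to one latitude circle. It is not the filter developed in this work. The cutoff rule (retain zonal wavenumbers up to (n/2)·cos(latitude)) is an assumption chosen for illustration, the function names are hypothetical, and a naive O(n^2) discrete Fourier transform stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import math


def dft(x):
    """Naive forward discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def zonal_fourier_filter(row, lat_deg):
    """Zero all zonal wavenumbers above an assumed latitude-dependent cutoff.

    row     -- values at equally spaced longitudes on one latitude circle
    lat_deg -- latitude of that circle in degrees
    """
    n = len(row)
    cutoff = int((n // 2) * math.cos(math.radians(lat_deg)))
    spectrum = dft(row)
    for k in range(n):
        wavenumber = min(k, n - k)  # fold negative frequencies onto positive ones
        if wavenumber > cutoff:
            spectrum[k] = 0
    return [z.real for z in idft(spectrum)]


if __name__ == "__main__":
    n = 32
    row = [math.cos(2 * math.pi * j / n) + 0.5 * math.cos(2 * math.pi * 10 * j / n)
           for j in range(n)]
    smooth = zonal_fourier_filter(row, 80.0)  # near the pole: short waves removed
    print(max(abs(s - math.cos(2 * math.pi * j / n)) for j, s in enumerate(smooth)))
```

A filter of this kind acts on each latitude circle independently, whereas a spherical harmonic filter truncates in both directions at once.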
References:
Spotz, Taylor, and Swarztrauber, Fast shallow-water equation solvers in latitude-longitude coordinates, J. Comput. Phys. 145(1) (1998) 432-444.
Spotz, Accuracy and performance of numerical wall boundary conditions for steady, 2D incompressible stream-function vorticity, accepted for publication by Int. J. of Numerical Methods in Fluids.
Spotz and Carey, Formulation and experiments with high-order compact schemes for nonuniform grids, Int. J. of Numerical Methods for Heat & Fluid Flow, 8(3) (1998) 288-303.
Taylor, Loft, Tribbia, Performance of a spectral element atmospheric model (SEAM) on the HP Exemplar SPP-2000, NCAR/TN-439+EDD (1998).
Taylor, Tribbia, Iskandarani, The spectral element method for the shallow water equations on the sphere, J. Comput. Phys. 130 (1997) 92-108.
Haidvogel, Curchitser, Iskandarani, Hughes, Taylor, Global Modeling of the Ocean and Atmosphere Using the Spectral Element Method, Atmosphere-Ocean 35 (1997) 505-531.
We first solve a Sturm-Liouville problem for the element of interest (such as the triangle or square). The resulting eigenfunctions are used to determine the correct functional space to be used in the method. Once the functional space is known, we use the Fekete criterion to compute near-optimal grids for these spaces that have the same number of points as the dimension of the functional space. This allows the construction of a well-behaved cardinal function basis that leads to a diagonal mass matrix.
The Sturm-Liouville problem for the square is not new, and it leads to the standard diamond truncation of polynomials. The Fekete points for the square with this truncation of polynomials are known to be the tensor product of Gauss-Lobatto points, making this method equivalent to the standard spectral element method. The Sturm-Liouville problem for the triangle suggests the correct functional space is the triangular polynomial truncation. Unlike optimal Gauss-Lobatto integration points, Fekete points are also defined for the triangle and this functional space. Furthermore, theoretical and numerical evidence suggests that Fekete boundary points are the one-dimensional Gauss-Lobatto points, making Fekete point triangular elements and quadrilateral elements naturally conform. Thus triangles and quadrilaterals can be combined in the same grid while retaining a diagonal mass matrix. The method naturally extends to other domains such as tetrahedra and hexagons.
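The one-dimensional Gauss-Lobatto points mentioned above can be computed with a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' code: it uses Newton iteration on the identity (1 - x^2) P_N'(x) = N (P_{N-1}(x) - x P_N(x)), starting from Chebyshev-Gauss-Lobatto points, and then forms the tensor-product grid for the square. The function names are hypothetical.

```python
import math


def legendre_pair(n, x):
    """Return (P_{n-1}(x), P_n(x)) for n >= 1 via the three-term recurrence."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p_prev, p


def gauss_lobatto_points(n, tol=1e-14, max_iter=100):
    """Return the n + 1 Gauss-Lobatto-Legendre points on [-1, 1], ascending."""
    # Chebyshev-Gauss-Lobatto points serve as the initial guess.
    points = [-math.cos(math.pi * k / n) for k in range(n + 1)]
    for i, x in enumerate(points):
        for _ in range(max_iter):
            p_nm1, p_n = legendre_pair(n, x)
            # Newton step for f(x) = x P_n - P_{n-1}, with f'(x) = (n + 1) P_n.
            dx = (x * p_n - p_nm1) / ((n + 1) * p_n)
            x -= dx
            if abs(dx) < tol:
                break
        points[i] = x
    return points


def tensor_product_grid(points):
    """Grid on the square [-1, 1]^2 formed from the one-dimensional points."""
    return [(x, y) for y in points for x in points]


if __name__ == "__main__":
    pts = gauss_lobatto_points(4)
    print(pts)
    print(len(tensor_product_grid(pts)))
```

No comparably simple closed-form recipe is used here for the triangle; there the Fekete points must be found numerically by maximizing the determinant of the generalized Vandermonde matrix.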
This work has been described in:
R. E. Vincent, Computation of Fekete Points in a Triangular Domain, SOARS/NCAR report, 1998.
M. Taylor and B. Wingate, A generalized diagonal mass matrix spectral element method for non-quadrilateral elements, submitted Appl. Num. Math. 1998.
M. Taylor and B. Wingate, The Fekete collocation points for triangular spectral elements, submitted SIAM J. Numer. Anal. 1998.
B. Wingate and M. Taylor, The natural function space for triangular spectral elements, submitted SIAM J. Numer. Anal. 1998.
These features have been incorporated into a recently patented parallel computer called the Vector Multiprocessor (VMP). It is the product of an effort to determine the optimum algorithmic and architectural environment for weather/climate models. It is a novel, general-purpose, high-performance, scalable multiprocessor that evolved from the answers to such questions as: How do we minimize communication and maximize performance? Will communication ultimately dominate multiprocessor performance? Is there a "best" multiprocessor architecture?
In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit. Interprocessor communication tasks are performed to and from local vector registers in the same way that computational tasks are performed on a vector uniprocessor. The VMP brings to the multiprocessor system what vectorization brought to the single processor. Here we determine optimum multiprocessor performance for the key computational kernels used in spectral models of climate and global dynamics, and in doing so, define the VMP. This paper has been submitted to the International Journal of High Performance Computing.
At the same time, maturing numerical methods, such as spectral element methods, have made it possible for certain classes of problems in earth systems modeling to obtain greatly improved scaling and performance on parallel RISC systems.
In response to these developments, John Dennis and Dr. Rich Loft in CSS have been experimenting with Beowulf systems since March 1998. They have created a prototype Beowulf system at NCAR. The initial system consisted of two 2-processor PC nodes. The processor used is the 300-MHz Pentium II. Each node has 128 MBytes of memory. The interconnect is based on 100-Mbit Ethernet.
In July of this year, the prototype was expanded to contain four 2-processor nodes. Measured results on this prototype eight-processor Beowulf system include 580 MFlops for the Spectral Element Atmosphere Model (SEAM), and an estimated 500 MFlops sustained for a very-high-resolution T1023 spectral shallow water model. This demonstrates that price-performance levels of below $20/MFlops are achievable for at least some classes of atmospheric or ocean models. These results are encouraging.
Developing expertise with the hardware and software components, system administration practices, and tools required to deploy Beowulf systems tuned for earth systems modeling at aggressive price-performance levels is another goal of this project. Progress has been made in areas including the scalability of system administration solutions, system-wide security, network performance, net-booting of processing nodes, and system performance monitoring.
The NCAR benchmark suite was initially developed for use in the 1995 ACE open procurement. These benchmarks are characteristic of the applications run at NCAR and represent the NCAR computing workload.
The kernels are a set of relatively simple codes that measure important aspects of the system such as CPU speed, memory bandwidth, and efficiency of intrinsic functions. In addition, there is a shallow water benchmark. The shallow water equations give a primitive but useful representation of the dynamics of the atmosphere. The model has been used over the past decade to evaluate the high-end performance of computer systems. Finite difference approximations are used to solve the set of two-dimensional partial differential equations. There is a single-processor version (with Cray multitasking directives), a parallel PVM version, and a Fortran 90 version of the model. The benchmark varies over a variety of grid size decompositions and gives feedback on problem decomposition for good cache utilization. The kernel codes are as follows:
In addition, the NCAR benchmark suite includes codes that serve as more extensive tests and some full applications. There is the Spectral Element Atmospheric Model (SEAM), which is a dry dynamical core. Full applications include the Los Alamos-developed ocean model called POP, the NCAR/Penn State University mesoscale code MM5, and the NCAR Community Climate Model Version 3.2 (CCM3.2), which is a three-dimensional global climate model. This version of the CCM was released in 1997.
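As a rough illustration of the finite-difference approach used by the shallow water benchmark described earlier in this section, the following Python sketch advances the linearized one-dimensional shallow water equations on a periodic staggered grid. It is a deliberately simplified stand-in, not the benchmark code: the benchmark is two-dimensional, and the forward-backward time scheme, grid size, and physical constants used here are all assumptions.

```python
import math


def step(h, u, g, depth, dt, dx):
    """One forward-backward step of h_t + depth*u_x = 0, u_t + g*h_x = 0.

    h[i] is the height perturbation at cell center i; u[i] is the velocity
    at the right face of cell i. Indices wrap around (periodic domain).
    """
    n = len(h)
    h_new = [h[i] - depth * dt / dx * (u[i] - u[i - 1]) for i in range(n)]
    u_new = [u[i] - g * dt / dx * (h_new[(i + 1) % n] - h_new[i]) for i in range(n)]
    return h_new, u_new


def run(nx=64, nsteps=200, g=9.81, depth=1000.0, dx=1.0e4, cfl=0.5):
    """Integrate a Gaussian height bump; return final h, final u, initial mass."""
    wave_speed = math.sqrt(g * depth)
    dt = cfl * dx / wave_speed  # time step from the assumed Courant number
    h = [math.exp(-((i - nx / 2) / 4.0) ** 2) for i in range(nx)]
    u = [0.0] * nx
    mass0 = sum(h)
    for _ in range(nsteps):
        h, u = step(h, u, g, depth, dt, dx)
    return h, u, mass0


if __name__ == "__main__":
    h, u, mass0 = run()
    print("relative mass change:", abs(sum(h) - mass0) / mass0)
```

Because the height update is written in flux form on a periodic grid, the total of h is conserved up to rounding error, which makes a convenient correctness check when porting such a kernel between machines.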
DSM Systems at NCAR

| Vendor | System | Installed | Clock (MHz) | Processor Peak (MFLOPS) |
|---|---|---|---|---|
| HP | Exemplar SPP-2000 | 5/97 | 180 | 720 |
| Silicon Graphics Cray | Origin2000 | 6/98 | 195 | 390 |
A REGRIDPACK website describing the package and providing access to the software has been established.
The initial work in 1996 involved design and development of a coupled modeling framework suitable for coupled modeling on the T3D. After considering several alternatives, it was determined that the best solution for the T3D was to incorporate all component models into a single executable. However, the code retains an architecture amenable to splitting the components back out into separate executables. As part of this effort, we developed a distributed memory flux-coupling strategy for this coupled model.
In 1997, we developed an MPI version of PCM that runs efficiently on the latest generation of distributed shared memory (DSM) parallel computers, as well as on traditional massively parallel processors (MPPs).
Our goal was to develop a code that can readily exploit the three potential production platforms likely to be at our disposal during the next 12-24 months: the Cray T3E at NERSC, Silicon Graphics Cray Origin2000s at LANL and NCAR, and the HP Exemplar SPP-2000 X Class at NCAR. This has been accomplished. In addition, this coupled model will port, with only a modest effort, to other systems that support standard MPI.
The MPI version of CCM3.2 is now running and validated on all three platforms listed above. In addition, our performance measurements using PCM on the 64-PE Origin2000 at NCAR demonstrate that this model runs at approximately five hours per simulated year of climate. The current resolutions of PCM are CCM3.2 at T42L18, POP at 2/3 degree and 32 levels, and a 10-km ice model.
In terms of scalability, we have demonstrated that dPCM (PCM with atmospheric forcing rather than CCM3) scales well to 256 processors on the NERSC T3E. Currently, CCM3.2 T42L18 is limited to 64 PEs because of its latitudinal decomposition.
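The 64-PE limit follows directly from a one-dimensional decomposition over latitude. The Python sketch below assumes the standard T42 Gaussian grid of 64 latitude rows (128 longitudes by 64 latitudes) and a simple block assignment of rows to processors; the function name is hypothetical, and the actual CCM3.2 decomposition code may differ in detail.

```python
def decompose_latitudes(nlat, npes):
    """Assign nlat latitude rows to npes processors in contiguous blocks.

    Returns one range of row indices per processor; a range may be empty.
    """
    base, extra = divmod(nlat, npes)
    blocks = []
    start = 0
    for pe in range(npes):
        count = base + (1 if pe < extra else 0)  # spread the remainder evenly
        blocks.append(range(start, start + count))
        start += count
    return blocks


if __name__ == "__main__":
    nlat = 64  # latitude rows of a T42 Gaussian grid (assumed standard 128 x 64)
    for npes in (16, 64, 128):
        blocks = decompose_latitudes(nlat, npes)
        busy = sum(1 for b in blocks if len(b) > 0)
        print(npes, "PEs requested,", busy, "with work")
```

With 128 processors requested, only 64 receive a latitude row and the rest sit idle, so scaling beyond 64 PEs requires decomposing in a second dimension as well.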
We are collaborating closely with Warren Washington and CGD staff to ensure that code improvements and efficiencies meet their science objectives.
CSS staff co-sponsored the Workshop on Climate, Ocean, and Weather Modeling Benchmarks at the NCAR Mesa Lab, September 22-23, 1998.
During FY1998, CSS staff gave 22 scientific and technical presentations.
During the first half of FY1998, CSS processed all outside requests for computing at NCAR. This included both the general Community machines and the Climate Simulation Laboratory systems. In that time frame, CSS approved 110 requests for 59,386 General Accounting Units (GAUs). In addition, we processed requests requiring panel action for use of the Climate Simulation Laboratory resources totalling 413,996 equivalent Cray C90 hours. The panel allocated 278,704 hours.
A link to an existing website was established so that new users can access user documentation online. This eliminated printing costs, estimated at $11.00 per packet, as well as the handling of numerous hardcopy documents by the two SCD staff members who had been required to send hardcopy materials to new users.
The first electronic CSL Panel Book was made available for the April meeting of the CSL Advisory Panel.
It was determined that only electronic requests for computing time should be accepted. This procedure was approved by the SCD Advisory Panel and implemented.
The review process was changed. All hardcopy forms and letters were eliminated. Reviews along with a copy of the computing request were sent to potential reviewers. Turnaround time for reviewing requests decreased substantially.
The notification process for approving requests for computing time was reformed and is now handled electronically. In addition, the procedures for notifying users that NSF grants were expiring and/or that GAU usage had exceeded the approved amount were modified; these notices are now prepared automatically via reports generated from the Oracle database.
These efforts were leading to the final stages of the planned goal of automating the entire allocations process. The task of handling computing allocations as well as the automating and streamlining project was transferred from the Computational Science Section to the Operations and Information Support Section of SCD effective June 1, 1998.