
The SCD Computational Science Section (CSS) provides state-of-the-art expertise in computing that is beneficial to the atmospheric and related science communities. Our efforts are focused in the following areas:
- Research
- Technology tracking
- Software libraries
- Collaboration
- Education, outreach, and knowledge transfer
The results of our work are shared through direct collaborations, publications, software development, and seminars.
CSS conducts research in areas such as numerical analysis, computational fluid dynamics, parallel communication algorithms for massively parallel architectures, and numerical solutions to partial differential equations. A new area that we are starting to investigate is software frameworks and object-oriented techniques for climate modeling.
The section engages in collaborative research and development projects with groups within NCAR and from other institutions and agencies. These projects benefit the broader NCAR community. Currently, CSS is involved in projects in conjunction with the Climate and Global Dynamics (CGD) Division of NCAR and with support from the Department of Energy's (DOE) Climate Change and Prediction Program (CCPP), to develop a state-of-the-art coupled climate model framework that will run efficiently and effectively on distributed-memory parallel computers. A new project funded by the Next Generation Internet Program in the DOE Office of Science is aimed at developing a prototype "Earth System Grid." These projects are described below.
CSS is involved in technology tracking (performance monitoring and benchmarking studies) of both hardware and software. This work evaluates and ensures the efficient use of future computing resources and is critical in selecting the most appropriate computers for the future production computing needs of NCAR and the university community. On the software side, staff play an active role in the Fortran standards effort as well as evaluating programming languages, programming environments, and paradigms.
In addition, we develop software packages to make use of our research results and our numerical and computational expertise. These libraries of scientific software are used by researchers in the atmospheric and related sciences community.
Finally, CSS staff are active in the area of education, outreach, and knowledge transfer. We organize workshops, provide guest lectures at universities, host post docs and summer visitors, and give seminars and talks at conferences.
An efficient spectral transform method for solving the shallow water equations on the sphere
William F. Spotz and Paul N. Swarztrauber are developing a new, faster, more memory-efficient spectral model for the shallow water equations on the sphere. The model is based on a true double Fourier expansion of variables, meaning the transformation between spectral and physical space can be accomplished with fast Fourier transforms rather than the slower associated Legendre transforms. It is well known that this model, by itself, is unstable due to non-isotropic representation of waves near the poles. We solve this problem by projecting the prognostic variables onto the space of spherical harmonics at the end of every time step. For the shallow water equations, this reduces the number of associated Legendre transforms per time step from nine to six, and concentrates the transforms into a projection operator that can be further optimized. Work has concentrated on improving the efficiency of this projection (sometimes called a spherical filter in the literature) and implementing it in a double Fourier model.
The projection has been improved in two significant ways. First, the memory requirement has been reduced by a factor of N (where N is the number of latitude points) by introducing and using a complementary space of basis functions orthogonal to the traditional associated Legendre functions. Second, the operation count for the projection operator has been cut roughly in half by using an orthogonal complement representation of the projection matrix, which leads to a faster algorithm for half the zonal wave numbers. The projection is still an O(N^3) algorithm, but is twice as fast as doing a traditional forward transform followed by a backward transform.
The double Fourier model that makes use of the projection is remarkably like the traditional spectral transform method. It solves the vorticity-divergence form of the shallow water equations by advancing spectral coefficients forward in time. All of its communication on parallel computers is concentrated in nine transposes per time step of quantities between spectral coefficients decomposed by zonal wave number and physical grid quantities decomposed by latitude. Differences between the double Fourier and traditional spectral model include an equally spaced latitude grid with points at the poles; fast Fourier transforms in the meridional direction; simple tridiagonal solvers required for both time-stepping and the solution of elliptic equations required for velocity calculations and semi-implicit time-stepping; and the projection at the end of the time-step. We anticipate a significant savings in the memory used and time spent in Legendre transforms when all the components of the model have been completed, while maintaining the accuracy, stability, and parallel efficiency of the traditional spectral transform model.
Reference:
Paul N. Swarztrauber and William F. Spotz, "Generalized discrete spherical harmonic transforms," accepted by J. Comp. Phys.
Transposing arrays on multicomputers using de Bruijn sequences
Transposing an NxN array that is distributed row- or column-wise across P=N processors is a fundamental communication task that requires time-consuming interprocessor communication. It is the underlying communication task for matrix-vector multiplication and the fast Fourier transform (FFT) of long sequences and multi-dimensional arrays. It is also the key communication task for certain weather and climate models.
A parallel transposition algorithm is presented for multiprocessors with programmable network channels. The optimal scheduling of these channels is non-trivial and has been the subject of a number of papers. Here, scheduling is determined by a single de Bruijn sequence of N bits. The elements are first ordered in each processor and, in sequence, either transmitted or not transmitted, depending on the corresponding bit in the de Bruijn sequence. The de Bruijn sequence is the same for all processors and all communication cycles. The algorithm is optimal both in the overall time and the time that any individual element is in the network (span).
The results are extended to other communication tasks including shuffles, bit reversal, index reversal, and general index-digit permutation. The case P not equal to N and rectangular arrays with non-power-of-two dimensions are also discussed. The results are extended to mesh connected multiprocessors by embedding the hypercube in the mesh. The optimal implementation of these algorithms requires certain architectural features that are not currently available in the marketplace.
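As a small illustration of the object that drives the schedule, the sketch below constructs a binary de Bruijn sequence of N = 2^n bits using the standard Lyndon-word construction. It is not the scheduling algorithm itself, which is not reproduced here; the function names and the assumption that N is a power of two are ours.

```python
def de_bruijn_binary(n):
    """Return a binary de Bruijn sequence B(2, n) as a list of 2**n bits.

    Standard Lyndon-word construction; shown only to illustrate what a
    de Bruijn sequence of N = 2**n bits looks like.
    """
    k = 2                      # binary alphabet
    a = [0] * (k * n)          # work array for the recursion
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(bits, n):
    """Return every length-n window of bits, wrapping around the end."""
    size = len(bits)
    return [tuple(bits[(i + j) % size] for j in range(n)) for i in range(size)]


if __name__ == "__main__":
    seq = de_bruijn_binary(3)
    print("".join(str(b) for b in seq))
    # Defining property: every 3-bit pattern appears exactly once (cyclically).
    print(len(set(cyclic_windows(seq, 3))) == 2 ** 3)
```

The defining property checked at the end, that every n-bit pattern occurs exactly once around the cycle, is what allows one short sequence to serve all processors and all communication cycles.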
Work with MMM on time-stepping schemes and solvers
In collaboration with the numerical modeling group (Joe Klemp, Bill Skamarock, Jimy Dudhia) in the MMM division, we are developing fully 3D semi-implicit schemes and associated elliptic solvers for the dynamical core of the new Weather Research and Forecast (WRF) mesoscale model.
The WRF model prototype initially employs a split-explicit time-stepping scheme but will be expanded to support several different options including a semi-implicit, semi-Lagrangian scheme for use at NCEP (Jim Purser). The WRF model also supports two different vertical coordinates. The standard terrain-following coordinate employed in mesoscale and cloud scale models is height based (as opposed to pressure levels) and is due to Gal-Chen and Somerville (1975). The Gal-Chen coordinate is also used in the Clark-Hall and Smolarkiewicz models in MMM. A hydrostatic pressure or mass coordinate due to Rene Laprise (1992) is also implemented, and we are developing a novel semi-implicit scheme with Joe Klemp and Bill Skamarock. The Laprise coordinate is now used in the Canadian (GEM) and French (Aladin) models and is being considered by other modeling groups around the world such as HIRLAM, DWD, and Australia.
The Gal-Chen transformation leads directly to a non-orthogonal 3D coordinate system, and therefore a semi-implicit time discretization results in an elliptic problem containing cross-derivatives (Skamarock et al 1997, Thomas et al 1998, 1999). The resulting linear system is non-symmetric, thus requiring a generalized conjugate residual (GCR) or generalized minimal residual (GMRES) Krylov-type iterative solver. To accelerate solver convergence, a preconditioner based on a linearization of the model governing equations is being designed.
Currently, the model is linearized about a base state that varies with height, and so the resulting elliptic problem contains variable coefficients. We are designing a preconditioner based on an isothermal atmosphere (constant temperature) in hydrostatic equilibrium. This approach leads to a constant coefficient problem which is also separable and therefore amenable to solution by the O(N^2 log N) fast-transform techniques developed by Paul Swarztrauber or O(N^2) multigrid solvers. Despite the existence of efficient preconditioners, parallel scalability of the solver and data transposition for the FFT could be limiting factors for WRF as well as other NCAR models, so we are also investigating "scalable" iterative elliptic solvers that attempt to maintain a constant CPU time per degree of freedom as both the problem size and number of processors are increased.
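To make the solver terminology concrete, here is a minimal sketch of a right-preconditioned generalized conjugate residual (GCR) iteration applied to a small non-symmetric system. Everything specific in it is an assumption for illustration: the tridiagonal test matrix is hypothetical, and the Jacobi (diagonal) preconditioner is a simple stand-in for the isothermal-atmosphere preconditioner described above, not an implementation of it.

```python
def tridiag_matvec(lower, diag, upper, x):
    """Multiply a tridiagonal matrix (stored as three lists) by the vector x."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i] * x[i - 1]   # sub-diagonal entry A[i][i-1]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]   # super-diagonal entry A[i][i+1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gcr(apply_a, apply_minv, b, tol=1e-10, max_iter=200):
    """Right-preconditioned GCR; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                       # residual for the zero initial guess
    b_norm = dot(b, b) ** 0.5
    history = []                      # previous (p, A p, <A p, A p>) triples
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        p = apply_minv(r)             # preconditioned search direction
        q = apply_a(p)
        # Make A p orthogonal to all previous A p_i (modified Gram-Schmidt).
        for p_old, q_old, qq_old in history:
            beta = dot(q, q_old) / qq_old
            p = [pi - beta * po for pi, po in zip(p, p_old)]
            q = [qi - beta * qo for qi, qo in zip(q, q_old)]
        qq = dot(q, q)
        alpha = dot(r, q) / qq        # minimizes the residual norm along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        history.append((p, q, qq))
    return x, max_iter


# Hypothetical non-symmetric, diagonally dominant test problem.
N = 20
DIAG = [4.0 + 0.5 * i for i in range(N)]   # variable coefficients
LOWER = [-1.5] * N                          # LOWER[0] is unused
UPPER = [-0.5] * N                          # UPPER[N-1] is unused
X_TRUE = [float(i % 3) - 1.0 for i in range(N)]
B = tridiag_matvec(LOWER, DIAG, UPPER, X_TRUE)


def apply_a(v):
    return tridiag_matvec(LOWER, DIAG, UPPER, v)


def jacobi(v):
    """Stand-in preconditioner: divide by the matrix diagonal."""
    return [vi / di for vi, di in zip(v, DIAG)]


X, ITERATIONS = gcr(apply_a, jacobi, B)
MAX_ERROR = max(abs(a - b) for a, b in zip(X, X_TRUE))

if __name__ == "__main__":
    print(ITERATIONS, MAX_ERROR)
```

Because GCR only needs the action of the matrix and of the preconditioner, a fast-transform or multigrid solve for the constant-coefficient problem could be passed in place of `jacobi` without changing the iteration.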
References:
W. C. Skamarock, P. K. Smolarkiewicz and J. B. Klemp. Preconditioned conjugate-residual solvers for Helmholtz equations in nonhydrostatic models. Mon. Wea. Rev., vol. 125, 1997, pp. 587-599.
S. J. Thomas, C. Girard, R. Benoit, M. Desgagne, and P. Pellerin. A new adiabatic kernel for the MC2 model. Atmosphere-Ocean, vol. 36, no. 3, July 1998, pp. 241-270.
S. J. Thomas, C. Girard, G. Doms and U. Schaettler. Semi-implicit scheme for the DWD Lokal-Modell. Meteorology and Atmospheric Physics, Springer-Verlag, to appear, 1999.
R. Laprise. The Euler equations of motion with hydrostatic pressure as an independent variable. Mon. Wea. Rev., vol. 120, 1992, pp. 197-207.
T. Gal-Chen and R. J. C. Somerville. On the use of a coordinate transformation for the solution of the Navier-Stokes equations. J. Comp. Phys., vol. 17, 1975, pp. 209-228.
Optimum parallel algorithms and architectures for the harmonic transform: The vector multiprocessor
The first massively parallel processors (MPP) became available in the late 1980s, with considerable expectations brought about by their impressive peak performance. However, it soon became evident that the sustained or actual performance of these machines could be significantly less than peak because of the time required for interprocessor and related communication. In an effort to improve performance, parallel communication algorithms were developed, which, in theory, demonstrated that actual performance could be substantially improved, to the extent that it met or even exceeded that of traditional multiprocessors. However, this was not possible at that time because the MPP architectures did not have the features necessary to provide optimal implementation of these algorithms. That is, wallclock time could not match theoretical algorithmic performance without certain architectural features that were not available in the marketplace.
These features have been incorporated into a recently patented parallel computer called the Vector Multiprocessor (VMP). It is the product of an effort to determine the optimum algorithmic and architectural environment for weather/climate models. It is a novel, general-purpose, high-performance, scalable multiprocessor that evolved from the answers to such questions as: How do we minimize communication and maximize performance? Will communication ultimately dominate multiprocessor performance? Is there a "best" multiprocessor architecture?
In addition to the usual complement of logic and arithmetic units, each processor contains a programmable communication unit. Interprocessor communication tasks are performed to and from local vector registers in the same way that computational tasks are performed on a vector uniprocessor. The VMP brings to the multiprocessor system what vectorization brought to the single processor. Here we determine optimum multiprocessor performance for the key computational kernels used in spectral models of climate and global dynamics, and in doing so, define the VMP. This paper has been submitted to the International Journal of High Performance Computing.
Linux PC clusters
Recently, significant price-performance advantages have been demonstrated on applications running on systems constructed entirely of commodity hardware components. These systems, called Beowulf clusters, have delivered excellent sustained MFlops per dollar for an increasingly large set of applications.
At the same time, maturing numerical methods, such as spectral element methods, have made it possible for certain classes of problems in earth systems modeling to obtain greatly improved scaling and performance on parallel RISC systems.
In response to these advances, John Dennis and Dr. Rich Loft in CSS are carrying out ongoing development work with Beowulf systems. This work has included the procurement of two 16-processor systems. The initial system consisted of eight 2-processor PC nodes. The processor used is the 300-MHz Pentium II with 192 MB of memory. The interconnect is based on 100-Mbps Ethernet. A follow-on system purchased from Alta Technology consisted of eight 2-processor 450-MHz Pentium II nodes. Each node contains 512 MB of memory and is connected with a 1-Gbps Myrinet interconnect.
Preliminary results on the second 8-processor Beowulf system include 718 MFlops for CCM3.2, which represents a price-performance of $76/MFlop. In collaboration with Lorenzo Polvani of Columbia University and CGD scientist R. Saravanan, a scalable high-resolution spectral shallow water model (BOB) has been developed. This model running at T341 has achieved 1,361 MFlops on 16 processors, which represents a price-performance of $40/MFlop. A comparison of BOB's performance between the 450-MHz system using both the Myrinet and 100-Mbps fast Ethernet and the IBM SP system with WinterHawk 1 nodes can be seen in the figure below. It shows that the Myrinet-equipped Beowulf cluster runs at 64% of the performance of the IBM SP system on 16 processors. These results are very encouraging, especially considering the cost differential between the two systems.
![]()
In addition to benchmarking these systems, there has been considerable effort developing expertise with the software components, system administration practices, and tools required to deploy Beowulf systems. In addition to scalable system administration solutions, high-capacity disk storage and a queue system are critical. To address the disk storage issue, a collection of old disks from the HP supercomputer was recycled and converted to provide several large-capacity filesystems. A freely available queue system, PBS, was installed and configured to allow reliable and easy control of system resources.
The Computational Science Section (CSS) has responsibility for tracking the developments in high-performance computing technology for SCD and NCAR. CSS staff assess the capabilities of the latest systems (hardware and software) available from vendors and evaluate the performance of these systems on problems typically found in the atmospheric sciences community. These systems, from workstations to supercomputers, are evaluated primarily through hands-on work and benchmarks.
The NCAR benchmark suite was initially developed for use in the 1995 ACE open procurement. These benchmarks are characteristic of the applications run at NCAR and represent the NCAR computing workload.
Description of benchmarks
The NCAR benchmark codes are categorized as kernels and applications. The complete set is available through anonymous FTP, and descriptions of the benchmarks are given in README files for each code. The codes are published on the CSS FTP site, ftp.ucar.edu/css/.
The kernels are a set of relatively simple codes that measure important aspects of the system such as CPU speed, memory bandwidth, and efficiency of intrinsic functions. In addition, there is a shallow water benchmark. The shallow water equations give a primitive but useful representation of the dynamics of the atmosphere. The shallow water model has been used over the past decade to evaluate the high-end performance of computer systems. Finite difference approximations are used to solve the set of two-dimensional partial differential equations. The model has a single-processor version (with Cray multitasking directives), a parallel PVM version, and a Fortran 90 version. The benchmark varies over a variety of grid size decompositions and gives feedback on problem decomposition for good cache utilization. The kernel codes are as follows:
- ELEFUNT: elementary function test
- PARANOIA: arithmetic operation test
- IA: indirect addressing speed
- XPOSE: array transpose
- COPY: memory-to-memory test
- SHALLOW: shallow water code with varying grid decomposition
- RADABS: single-processor performance
In addition, the NCAR benchmark suite includes codes that serve as more extensive tests and some full applications. Full applications include the Los Alamos-developed ocean model called POP, the NCAR/Penn State University mesoscale code MM5, and the NCAR Community Climate Model Version 3.6 (CCM3.6), which is a three-dimensional global climate model. This version of the CCM was released in 1998.
System benchmarking
U.S. high-performance computer manufacturers are predominantly focusing on building their products from commodity processors and offering systems that are clusters of shared memory nodes. Each node has multiple processors and supports cache-coherent uniform memory access (ccUMA) or cache-coherent nonuniform memory access (ccNUMA). In August 1999, SCD took delivery of an IBM SP Winterhawk. The Winterhawk nodes each have two IBM Power3 processors. There are 128 compute nodes on this system as well as separate nodes dedicated to I/O and network support. Below we show how this system performs relative to some elements of the benchmark suite.
IBM System at NCAR

| Vendor | System | Installed | Clock (MHz) | Processor peak (MFLOPS) |
| --- | --- | --- | --- | --- |
| IBM | SP Winterhawk | 8/99 | 200 | 800 |
![]()
![]()
MUDPACK
The entire package was rewritten during the last two years, and wide distribution both nationally and internationally continues. The new Fortran 90 version 4.0 is more efficient and amenable to parallelization than earlier versions. Preliminary tests to create a portable parallel version from 4.0 are under way. A software website has been established to automate dissemination of software, descriptive material, and source code for users who sign a UCAR licensing agreement. Earlier users of MUDPACK are being notified of the availability of the new version of MUDPACK on the web.
SPHEREPACK
An article "SPHEREPACK 3.0: A Model Development Facility" has just been accepted for publication in Monthly Weather Review. During the last year, version 3.0 of SPHEREPACK was created from version 2.0 to correct problems with mixed types in argument passage. A SPHEREPACK website has been established for disseminating software information and source code. Fifty-one people from U.S. and foreign institutions have downloaded versions 2.0 and 3.0 of SPHEREPACK. The software has been incorporated into the CCM processor for manipulating and displaying scalar and vector data on the sphere in conjunction with NCAR Graphics.
REGRIDPACK
The original TLCPACK was converted to a Fortran-90-compatible version now called REGRIDPACK. This package "regrids" or transfers data between higher dimensional grids using vectorized linear or cubic interpolation. For portability, all codes have been tested on different platforms with both Fortran 77 and Fortran 90 compilers.
A REGRIDPACK website describing the package and providing access to the software has been established.
CHAMMP parallel/distributed coupled modeling
CSS staff work on the collaborative Parallel Climate Model (PCM) project with Warren Washington et al. in CGD at NCAR as well as with scientists at Los Alamos National Lab and the Naval Postgraduate School. This work is sponsored in part by the U.S. Department of Energy Climate Change and Prediction Program. The model consists of three components -- CCM3.2, POP, and CICE -- that can be run independently in standalone mode for verification, or together with the flux coupler (FCD) in coupled mode. The atmosphere model is run at a nominal 2.8-degree horizontal resolution. The ocean model is run at a nominal 2/3-degree horizontal resolution. The ice model is split into two patches over the north and south poles and has a resolution of 1/4 degree. The source code for each component resides in a shared CVS repository so that the CGD PCM group and the CSS group can develop the model concurrently using the same code base.
During 1998 and 1999, the directed effort within SCD has continued the collaboration with Warren Washington's coupled climate modeling research group. As the PCM has moved from development and testing into production, our focus has shifted from porting to the realization of longer-range goals of PCM portability, scalability, and performance.
The current implementation of CCM3.2 in PCM is a one-dimensional decomposition by latitude. At the operational resolution of T42L18, CCM is limited to a maximum of 64 processors. This in turn means that 64 processors is the maximum number that PCM can use. To eliminate this limitation, we undertook the task of developing a two-dimensional decomposition version of CCM3.2 capable of scaling beyond 64 processors.
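The processor limit can be illustrated by counting how many processors own at least one grid point under each decomposition. The sketch below assumes the standard T42 Gaussian grid of 64 latitudes by 128 longitudes (the text above gives only the 64-processor limit) and uses hypothetical helper names. It counts grid-point ownership only and ignores the spectral-space work and transposes that a real model must also distribute.

```python
N_LAT = 64    # latitudes on the assumed T42 Gaussian grid
N_LON = 128   # longitudes on the assumed T42 Gaussian grid


def block_sizes(n_items, n_parts):
    """Split n_items into n_parts contiguous blocks that differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


def idle_processors_1d(n_lat, n_procs):
    """Processors left with no latitudes in a decomposition by latitude only."""
    return sum(1 for size in block_sizes(n_lat, n_procs) if size == 0)


def idle_processors_2d(n_lat, n_lon, p_lat, p_lon):
    """Processors left with no grid points on a p_lat x p_lon processor mesh."""
    lat_blocks = block_sizes(n_lat, p_lat)
    lon_blocks = block_sizes(n_lon, p_lon)
    return sum(1 for a in lat_blocks for b in lon_blocks if a * b == 0)


if __name__ == "__main__":
    # One latitude per processor is the finest 1D split: 64 processors.
    print(idle_processors_1d(N_LAT, 64))
    # Doubling the processor count in 1D leaves half of them with no work.
    print(idle_processors_1d(N_LAT, 128))
    # A 16 x 8 mesh keeps all 128 processors busy (4 latitudes x 16 longitudes each).
    print(idle_processors_2d(N_LAT, N_LON, 16, 8))
```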
It seemed logical to take as a starting point the two-dimensional decomposition of CCM2 developed at ORNL and Argonne under the CHAMMP project. This model, called PCCM2, is different from CCM3.2 in three significant respects. First, CCM3.2 is a true couple-able model. CCM2/PCCM2 was designed to work with a statistical ocean, simulated by climatological sea surface temperatures. In contrast, CCM3.2 has split the physics/dynamics calculations into two large pieces, between which the coupling to the other climate system components occurs via a flux coupler. The resulting changes to the organization of the physics were dramatic. Second, the physics content in CCM3.2 has been dramatically modified from CCM2.
Finally, a Land Surface Model (LSM) was added in CCM3.2. It took about six months to reorganize PCCM2 dynamics and integrate this with CCM3.2 physics and the LSM to form PCCM3.2. PCCM3.2 has been tuned and benchmarked on a variety of platforms at this point. For example, on the Cray T3E we get 8 GFlops on 512 processors, and on the SGI/Cray Origin2000 we get 6 GFlops on 128 processors. We have also measured performance of this code on Compaq EV-6, IBM, and SGI equipment. PCCM3.2 model output has been partially validated for time scales on the order of months. Longer runs are needed to fully validate the model, but this requires a full restart capability that remains to be implemented.
The portability of PCM has been further enhanced during the last year. The system now builds and executes on SGI, T3E, Sun, HP, Compaq, and IBM equipment. We have verified that PCM compiles using the Portland Group compiler in a Linux environment, but we have not executed PCM on a Beowulf due to memory limitations. Finally, PCM has been made easier to use by restructuring the build environment.
Other work:
A performance model was developed based on message passing profile data collected for the PCCM3.2 model. This performance model was used to project performance for PCCM3.2 on hypothetical tera-scale computers to assess the importance of latency, bandwidth, and node performance on system performance. These results were used to present a talk at the 1998 Parallel Meteorological conference at ECMWF. The talk was entitled "Computational Considerations for Tera Scale Climate Modeling."
Another area where we contributed was our work with a group including Bill Dannevik (LLNL), Bob Malone (LANL), Dave Bader (PNL), Ian Foster (ANL), and John Drake (ORNL) to develop the ACPI requirements and implementation document. This was delivered to Tom Dunning in March and is a building block for DOE's SSI program.
The final area of work is supporting the exploratory high-resolution climate simulations. These computer simulations have been and will continue to be completed at LANL on the open ASCI blue machines. We set up the standard release of NCAR's community atmospheric model CCM3.6.6 to run at resolution T170L18 which involved interpolating lower resolution datasets, spinning up the atmosphere, and then tuning the code to reduce run time. We ran for about two months on a near-dedicated 128-processor Origin2000.
Future Work:
- Work with the NCAR CCM Core group to help develop an efficient and portable message-passing version of the atmosphere component. We will explore the possibility of using a hybrid shared-memory/message-passing paradigm that combines OpenMP and MPI.
- Investigate observed scaling problems with POP in the barotropic component. If necessary, improve scalability and load balance of the barotropic solver in the POP ocean model.
- Collaborate with LANL staff to develop hybrid shared-memory/message-passing paradigm for POP using both OpenMP and MPI.
- Continue high-resolution atmospheric model simulations with CCM3 to calibrate parameters and provide model output to Jim Hack, Akira Kasahara, and Dave Williamson at NCAR to study tropical cyclone formation.
Publications:
Drake, John B., Steven W. Hammond, Rodney James, and Patrick H. Worley, "Performance Tuning and Evaluation of a Parallel Community Climate Model", to appear in the proceedings of SC99, Portland, Oregon, November 1999.
Presentations:
Steve Hammond, Rodney James, and Rich Loft, "Computational Considerations for Tera Scale Climate Modeling." Presented at the European Center for Medium Range Weather Forecasting workshop on the use of Parallel Processors in Meteorology -- Toward TeraComputing, Reading, UK, November 16-20, 1998.
Steve Hammond, Rodney James, and Rich Loft, "Computational Considerations for Tera Scale Climate Modeling." Presented at the 6th Joint Japan/U.S. International Workshop on Next Generation Climate Models in the Advanced Computing Facilities, East West Center of the University of Hawaii, March 1-3.
Other collaborations
CSS maintains active research and development collaborations with NCAR staff and members of the academic community. Some of our collaborators include:
- Joe Tribbia (NCAR/CGD)
- John Clyne (NCAR/SCD)
- Lorenzo Polvani (Columbia University)
- Martin Herbordt (University of Houston)
Code migration
With the arrival of the new IBM system, it was a high priority for CSS to work with major code developers at NCAR to convert their models from the traditional Cray and SGI systems that have been the primary compute platform in SCD for the past decade, then get those models up and running on the new system. Here we briefly summarize our progress to date with that activity.
We worked with Bill Hall on the Clark/Hall anelastic cloud model. Early in 1999, we converted from CMIC to OpenMP to run on Origin and Compaq clusters. In addition, we successfully ported this model to the IBM SP (single processor) and are almost done with an OpenMP implementation so that it uses all the processors on a single node. We are investigating the development of an MPI/OpenMP hybrid version of this application.
We ported the DOE-supported coupled climate model PCM1.3 to the IBM. This work was completed and validated in July. Since then the code has been running in production with a sustained performance of 49 seconds per model day (64 PEs).
We collaborated with Jim Rosinski and Mariana Vertenstein of the climate modeling section in CGD to migrate CCM3.8 to the IBM. This was completed and validated in September. Currently this model has a sustained performance at resolution (T42L30) of:
4 PEs -- 385 sec/day
8 PEs -- 215 sec/day
16 PEs -- 122 sec/day
32 PEs -- 86 sec/day
64 PEs -- 63 sec/day
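The CCM3.8 timings above can be turned into speedup and parallel efficiency figures with a few lines of arithmetic. The sketch below uses the 4-PE run as the baseline, since no single-processor time is reported; the function name is ours.

```python
# Seconds per model day for CCM3.8 at T42L30, copied from the list above.
CCM38_SECONDS_PER_DAY = {4: 385.0, 8: 215.0, 16: 122.0, 32: 86.0, 64: 63.0}


def relative_scaling(timings):
    """Return {pes: (speedup, efficiency)} relative to the smallest PE count."""
    base_pes = min(timings)
    base_time = timings[base_pes]
    result = {}
    for pes in sorted(timings):
        speedup = base_time / timings[pes]
        ideal = pes / base_pes            # perfect scaling from the baseline
        result[pes] = (speedup, speedup / ideal)
    return result


if __name__ == "__main__":
    for pes, (speedup, efficiency) in relative_scaling(CCM38_SECONDS_PER_DAY).items():
        print(f"{pes:3d} PEs  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

Relative to 4 PEs, the efficiency falls from about 90% on 8 PEs to about 38% on 64 PEs.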
We ported the distributed-memory version of CCM3.6.6 to the IBM. This was completed, tuned, and validated in September. The sustained performance (at T42L18) for this code on 64 PEs is about 32 seconds per model day.
In addition, we ported the magnetohydrodynamics code (3DMHD) of Mark Rast in HAO to the IBM. 3DMHD is an all-Fortran code of approximately 5000 lines. The current version is a direct translation from CM-Fortran and features many 3D loop nests that are manifestly parallel, but offer little to no cache reuse. The initial version uses T3E SHMEM library calls (broadcast, min-to-all, max-to-all, sum-to-all, barrier, and put).
Finally, we have ongoing collaborations with staff in CGD to migrate the NCAR coupled model CSM to the IBM, and we are in contact with other CSL project PIs to assist them and provide advice on moving their codes to the new IBM system.
As part of this work, here are some general lessons learned:
- The MASS intrinsics library can enhance performance of codes such as CCM by as much as 20%.
- The switch is too slow to make effective use of O(100) or more processors at current climate resolutions; for example, 2/3 degree POP scales as follows:
16 PEs -- 25.5 sec/day
32 PEs -- 18.0 sec/day
64 PEs -- 9.9 sec/day
128 PEs -- 10.4 sec/day
256 PEs -- 12.5 sec/day
- Hybrid (OpenMP intra-node, MPI inter-node) parallelism should be most efficient if well implemented, primarily because it eliminates contention for the switch adapter among multiple MPI processes. This will be even more important when there are 4, 8, or 16 processors on a node.
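The POP timings quoted in the list above show where adding processors stops paying off. As a rough sketch (function names are ours), the code below computes the gain from each doubling of the processor count, where a gain below 1.0 means the run became slower.

```python
# Seconds per model day for 2/3-degree POP, copied from the list above.
POP_SECONDS_PER_DAY = {16: 25.5, 32: 18.0, 64: 9.9, 128: 10.4, 256: 12.5}


def doubling_gains(timings):
    """Return (pes_from, pes_to, gain) for each consecutive pair of PE counts.

    gain = time(pes_from) / time(pes_to); 2.0 would be perfect scaling for a
    doubling, and anything below 1.0 is a slowdown.
    """
    counts = sorted(timings)
    return [(a, b, timings[a] / timings[b]) for a, b in zip(counts, counts[1:])]


def fastest_pe_count(timings):
    """PE count with the shortest time per model day."""
    return min(timings, key=timings.get)


if __name__ == "__main__":
    for pes_from, pes_to, gain in doubling_gains(POP_SECONDS_PER_DAY):
        print(f"{pes_from:3d} -> {pes_to:3d} PEs: gain {gain:4.2f}")
    print("fastest:", fastest_pe_count(POP_SECONDS_PER_DAY), "PEs")
```

The measured time reaches its minimum at 64 PEs, and both later doublings make the run slower, which is consistent with the switch limitation noted above.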
Steve Hammond was a writing mentor for SOARS protege Michelle Dunn.
CSS staff co-sponsored the Workshop on Climate, Ocean, and Weather Modeling Benchmarks at the NCAR Mesa Lab, June 14-15, 1999. The goal of this effort is to develop a suite of numerical benchmarks for the Climate/Ocean/Weather modeling community. These benchmarks can then be used as standard metrics for the community to compare the performance of various computers.
During FY1999, CSS staff gave 23 scientific and technical presentations.
Technology transfer
CSS staff continue to interact with SRC Computers of Colorado Springs, Colorado and Compaq Computer Corporation. These are two relative newcomers to the U.S. HPC market. CSS staff have visited both companies and provided benchmark codes to enable these startups to evaluate their progress relative to our applications.