
Introduction to SCD's FY1998 Annual Scientific Report

The NCAR Scientific Computing Division (SCD) provides computing resources and expertise to university scientists who are grantees of the NSF Geosciences Directorate, to NCAR scientists, and to special projects -- national and international -- that are relevant to atmospheric and related science. SCD works closely with its Advisory Panel, the NCAR Director's Committee, and NSF to establish priorities and plans. The schematic of the NCAR computing facility below illustrates the computing resources that were available at the end of FY1998.

Community resources

SCD provides high performance computing facilities to support the NCAR scientific program and the supercomputing needs of the Community. The facilities include a substantial infrastructure of compute servers, networking, visualization, data communications, and user services to support the development and execution of large, long-running models and the manipulation, analysis, and archiving of extremely large datasets.

In FY1998, the equipment delineated as "Community Computing" in the functional diagram above was used by approximately 550 university scientists and approximately 500 NCAR researchers in support of the following areas of research:

Science on Community Computers

The total computing capacity available to the community was doubled in FY1998.

NCAR Climate Simulation Laboratory

NCAR, in collaboration with the NSF Atmospheric Sciences, Ocean Sciences, and Mathematical and Physical Sciences Divisions, operates a national, special-use, dedicated climate system modeling computing facility, the Climate Simulation Laboratory (CSL). The CSL is a multi-agency, United States Global Change Research Program (USGCRP) facility that provides high performance computing, data storage, and data analysis systems to support large, long-running simulations of the Earth's climate system.

The total computing capacity of the CSL was doubled in FY1998.

Mass Storage System (MSS)

A major SCD objective is to maintain an MSS that is commensurate with the compute power available to users and is capable of handling the increasing file sizes and data transfer rates required by users' applications. Significant upgrades to the MSS have been made over the past two years in anticipation of increased compute power and growing user needs. Specifically, new storage media and additional robotic tape-handling capacity have been integrated into the MSS. As a result, the amount of data SCD can archive has increased by at least a factor of four, and data transfer rates have increased significantly as well.

Major accomplishments during FY1998

Computing capacity was doubled

During FY1998, these changes in the SCD production supercomputer environment doubled the overall computing capacity:

Supercomputing system advances

Significant enhancements to the NCAR user environment in FY1998 included:

Progress in transitioning to and using highly parallel technology

In December 1997, after a lengthy evaluation of domestic supercomputer technology by SCD and select representatives from other NCAR divisions, it was decided to invest the FY1998 Climate Simulation Laboratory (CSL) computational budget in a 128-processor Silicon Graphics Cray Origin2000. The system (ute) was delivered to SCD on May 18, 1998, and by the end of FY1998 it was being used extensively. System utilization climbed from approximately 50% in July to more than 80% by the end of the fiscal year. This high level of utilization is exceptional and was unexpected for a DSM architecture; other centers have typically observed, and expressed concern, that DSM architectures cannot be expected to exceed about two-thirds utilization.

DataPark

In FY1998, SCD added 130 GB of local disk storage to the winterpark system. This additional local disk enabled special projects to access dedicated blocks of high-speed storage for long periods of time. The first two such projects were SCD's Data Support Section's Reanalysis Project and the joint Climate and Global Dynamics Division-SCD CSM Post-Processor Project.

Year 2000 compliance and testing

A preliminary UCAR Year 2000 Plan was drafted during the first quarter of FY1998 and submitted to the NSF for review. The UCAR plan has five phases corresponding to those identified by the GAO. This plan attempts to ensure that all UCAR-owned and/or managed systems are "Year 2000 compliant" well before 1 January 2000.

Visualization Lab

The confluence of harnessed commodity computational power, capacious storage, display technologies, and high-bandwidth networks constitutes the critical mass for the next generation of computing: an era of visual computing. During FY1998, the Visualization Lab was upgraded in several important areas. Most importantly, the 4-year-old flagship visual supercomputer was replaced by an 8-processor R10000 Silicon Graphics Onyx2 InfiniteReality system with at least 2 GB of physical memory. This system, together with high performance networking, has enabled forays into new and more difficult scientific and technical domains.

Networking infrastructure enhancements

Jeffco Network Infrastructure Completion (JEFNIC) project
Ethernet packet-switch re-engineering project
Router backbone re-engineering project

Network and computer security

Based on recommendations of the UCAR Computer Security Advisory Committee, SCD led a UCAR-wide project that implemented significant new gateway route filters to greatly improve network security for UCAR. The preparation and installation of these filters were timely because hacker probes of UCAR computer defenses had been increasing in the preceding weeks. These probes caused significant problems and consumed a considerable amount of staff time throughout UCAR. Most of the problems ceased after the security filters were installed.

End-of-life systems

In compliance with a recommendation from the most recent NSF five-year review, SCD established the End-of-Life Systems (EOLS) project in FY1997 to develop definitions and procedures for reviewing all existing supported hardware and software systems in SCD and determining whether support of each system is still justified. SCD's ability to adopt and support new technologies often depends not only on its resources, but also on its ability to retire older systems and services. As Charles Dickens of Stanford University aptly stated: "The ability to advance the leading edge of technology is constrained by the ability to prune the trailing edge."

A wide range of hardware and software was retired in FY1998.

Trouble Ticket System

SCD is developing a Trouble Ticket system that will give the different SCD sections a uniform way to track and resolve user problems and requests. The system will provide a knowledge base and a collective set of solutions to known problems, thereby increasing the efficiency of the division and its users.

By the end of FY1998, we had:

  1. Completed the system interface and demonstrated it to SCD staff
  2. Started friendly-user testing with SCD staff
  3. Demonstrated a prototype web interface

Requirements and options for future computing upgrades

U.S. atmospheric science modelers currently enjoy global leadership in several areas of research that depend on high performance computers. To maintain that leadership, they need computing capabilities comparable to those of their international peers. For example, a 1-km regional forecast using 4DVAR with a full-physics adjoint is feasible, but using it for time-critical (less than one hour) forecasting will probably require a machine that can sustain at least 50 GFLOPS. Another example is the recently developed NCAR global chemistry model MOZART: to complete 100-year climate simulations within a reasonable timeframe, this model needs a computer that can sustain 20 to 40 GFLOPS.

The situation is particularly acute in climate modeling and is exemplified by the computational requirements of the NCAR Climate System Model (CSM). Currently, the primary computer for executing the CSM is the Cray C90; it takes approximately 16 days to simulate 100 years of climate, with the model running 24 hours a day at 5 GFLOPS. Scientists routinely need to simulate several climate scenarios and perform at least four sensitivity studies for each scenario. Thus, a single study may involve 20 or more 100-year simulations, and each 100-year simulation produces hundreds of gigabytes of data that must be archived and analyzed. Within two years, the computational requirements of the CSM will quadruple due to a modest increase in resolution, enhancements in the model's semi-Lagrangian dynamics and prediction of cloud water, and the addition of a sulfate aerosol model. At that time, at least one machine with approximately 40-GFLOPS sustained capability will be required to execute a 100-year CSM simulation in a reasonable amount of time (approximately one calendar week). Within five years, a machine capable of sustaining a trillion floating-point operations per second (TeraFLOPS) or more will be needed to provide similar turnaround, due to significant increases in resolution requirements and the inclusion of a global atmospheric chemistry model such as MOZART, which is being developed by NCAR's Atmospheric Chemistry Division.
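
The turnaround requirement quoted above follows from simple scaling. The sketch below (an illustrative calculation in C, not part of the original report; the variable names are our own) multiplies the current C90 workload by the expected factor-of-four increase and solves for the sustained rate needed to finish one 100-year simulation in about a calendar week.

    /* Illustrative back-of-the-envelope calculation:
     * how fast must a machine be to run the enhanced CSM's
     * 100-year simulation in about one calendar week? */
    #include <stdio.h>

    int main(void)
    {
        const double current_rate_gflops = 5.0;  /* sustained rate on the Cray C90        */
        const double current_days        = 16.0; /* wall-clock days per 100-year run      */
        const double workload_factor     = 4.0;  /* expected increase within two years    */
        const double target_days         = 7.0;  /* desired turnaround: one calendar week */

        /* Total work for one 100-year run, in GFLOPS-days, after the workload increase. */
        double work = current_rate_gflops * current_days * workload_factor;

        /* Sustained rate required to finish that work in the target time (~46 GFLOPS,
         * consistent with the "approximately 40-GFLOPS" figure cited above). */
        double required_rate = work / target_days;

        printf("Required sustained rate: %.1f GFLOPS\n", required_rate);
        return 0;
    }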

A brief history of supercomputer architecture -- An exercise in speed versus economics

The desire of scientists to expand the set of solvable problems produces a constant need for more powerful supercomputers. This need led to the development of vector processing capability in the mid-1970s. Then, in the 1980s, several vector processors were incorporated into a single system to create Parallel Vector Processor (PVP) supercomputers. The processors in a PVP typically share a common memory that each processor can access uniformly in time; PVPs are thus members of a larger class of architecture known as Symmetric Multi-Processor (SMP) systems, diagrammed in Figure 1 below. PVPs continue to be among the most powerful supercomputers available. For example, a 1999-vintage vector processor will sustain up to 2-3 billion arithmetic operations per second (GFLOPS), and running many of them in parallel in a 1999-vintage PVP may sustain 50 to 250 GFLOPS.

By the mid-1980s, microprocessors offered attractive performance per unit-of-cost. This made possible Massively Parallel Processor (MPP) systems containing hundreds, even thousands, of microprocessors. Typically, the processors in an MPP do not share a common memory; rather, each has its own memory, and an inter-processor communication system (typically some form of message passing) moves data between processors.

By the mid-1990s, several large computer manufacturers had developed parallel systems that incorporate tens of low-cost microprocessors, in which each microprocessor does not have uniform access to the shared memory and has one or more levels of cache. To manage these processors with a single operating system, coherence of the data in the caches must be maintained; that is, if a datum is changed in the cache of one processor, it must be updated in every other cache in which it resides. Such systems are called cache-coherent Non-Uniform Memory Access (ccNUMA) systems (see Figure 2); they are widely used as UNIX servers -- a multi-billion-dollar market. ccNUMA systems are a subset of Distributed Shared Memory (DSM) systems, in which data consistency is maintained either by cache coherence in the hardware or by software implemented on top of message passing, thus preserving the programming ease of, and portability with, SMPs.
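
To make the "programming ease" point concrete, the fragment below is a minimal sketch (ours, not from the report) of the shared-memory style that both SMP and ccNUMA/DSM systems support; it assumes a C compiler that accepts OpenMP-style directives. Because every processor sees a single address space, a loop is parallelized with a directive alone, and the hardware (or the DSM software layer) keeps the caches coherent behind the scenes.

    /* Minimal sketch of shared-memory (SMP/DSM-style) programming.
     * Assumes an OpenMP-capable C compiler; without one, the pragma
     * is ignored and the program still runs serially. */
    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        long i;

        /* Each iteration is independent, so the directive lets the runtime
         * spread the loop across the processors of the node.  On a ccNUMA
         * machine the same code works unchanged: the hardware keeps the
         * per-processor caches coherent, preserving the SMP programming model. */
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            a[i] = 1.0;
            b[i] = 2.0;
            c[i] = a[i] + b[i];
        }

        printf("c[0] = %f\n", c[0]);
        return 0;
    }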

Today, the dominant trend in high speed computing architecture is to cluster dozens of shared memory systems of various types. For example, PVPs such as the Silicon Graphics SV1 and NEC SX can be clustered; also, DSM systems such as the HP SPP and Silicon Graphics Cray Origin2000 can be clustered. Each shared memory system in a cluster is called a "node" and may have from tens to hundreds of processors (Figure 3). Well-established multitasking techniques are used for programming within a node, and message passing is used for communication among nodes. Clusters make it theoretically possible to apply thousands of vector processors and/or tens of thousands of microprocessors in parallel to a single application. Such systems have the potential to sustain trillions of arithmetic operations per second (TeraFLOPS).
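
The clustered model described above combines the two styles. The sketch below is illustrative only (it assumes MPI and OpenMP implementations are available, and the names are ours): a shared-memory directive provides the multitasking within each node, and MPI message passing provides the communication among nodes.

    /* Sketch of the cluster programming model: OpenMP multitasking
     * within a shared-memory node, MPI message passing among nodes. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 100000

    int main(int argc, char **argv)
    {
        int rank, nnodes;
        long i;
        double local_sum = 0.0, global_sum = 0.0;
        static double x[N];

        MPI_Init(&argc, &argv);                 /* one MPI process per node */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nnodes);

        /* Work within the node: the loop is spread across the node's
         * processors through the shared memory. */
        #pragma omp parallel for reduction(+:local_sum)
        for (i = 0; i < N; i++) {
            x[i] = (double)(rank + 1);
            local_sum += x[i];
        }

        /* Communication among nodes: combine the per-node partial sums. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f over %d nodes\n", global_sum, nnodes);

        MPI_Finalize();
        return 0;
    }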

The options

During FY1997-1998, NCAR staff conducted a review of the plans of U.S. computer manufacturers. From that review it became evident that, to achieve the high levels of performance needed (40 GFLOPS or more), two or more parallel systems must be clustered. The review also revealed that several U.S. manufacturers will offer clusters of DSMs by FY1999-2000, and one will offer a cluster of PVPs.

In FY1998, the DSM that performed best on the NCAR CCM provided substantially better performance per unit-of-cost than U.S.-manufactured PVPs, so NCAR purchased a 128-processor DSM (Silicon Graphics Cray Origin2000) for the CSL. The machine became operational in July 1998 and is being heavily used to run the NCAR CSM and the DOE Parallel Coupled Model (PCM). It provides 2-3 GFLOPS of sustained performance per million dollars of cost, and its capacity exceeds that of the Cray C90.

The DOE ASCI project is developing technology for clustering Silicon Graphics Cray Origins, and the CSL will have the option to take advantage of that development if it proves successful. However, the next generation of U.S.-manufactured PVPs may provide a substantial increase in both performance and performance per unit-of-cost, and the manufacturer will support clusters of them. SCD is positioned to go in either direction. Demonstrated performance, performance per unit-of-cost, and ease of use will be key factors in choosing between the two options.

Organization of this review

SCD's FY1998 activities are reviewed in the following sections of this Annual Scientific Report:
