1998 ASR Home
Back
SCD ASR Index
Next
SCD Home

High availability of distributed systems

As technology changes, user expectations for that technology grow. What was once considered a "special case" is now commonplace. Users of NCAR's computer systems and networks demand greater reliability and stability than they did in the past. It is no longer acceptable for critical services such as electronic mail or the Domain Name System (DNS) to be interrupted, let alone unavailable for long periods. It is therefore vital to protect such services with more robust system configurations that can sustain hardware and software failures without disrupting service to users. A "high-availability" configuration ensures continued access to a critical service despite failures in either hardware or software.

The Scientific Computing Division (SCD) established a Machine Dependencies Committee, which reviewed system interdependencies at the Mesa Lab and identified possible single points of failure. The two services that most require continuous availability are DNS and user authentication, since all other systems in the NCAR environment depend on them. To ensure their continuous availability, the committee recommended a high-availability configuration (code-named the "Phoenix Project") to host these services. The Phoenix Project was placed into operation during FY1997 and was expanded to the Foothills Lab in the spring of 1998. DNS and user authentication are among the operation-critical services provided by the "phoenix" systems.
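As an illustration of the kind of dependency review described above, the following minimal sketch counts how many services rely, directly or indirectly, on each service. The dependency map and service names are hypothetical and are not taken from the committee's review; only the observation that other systems depend on DNS and user authentication comes from the text.

```python
from collections import defaultdict

# Hypothetical dependency map: service -> services it needs directly.
DEPENDS_ON = {
    "dns": [],
    "auth": ["dns"],
    "mail": ["dns", "auth"],
    "printing": ["dns", "auth"],
    "nis": ["dns"],
}


def transitive_dependencies(service, depends_on):
    """Return every service that `service` needs, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def dependent_counts(depends_on):
    """Count, for each service, how many other services rely on it."""
    counts = defaultdict(int)
    for service in depends_on:
        for dep in transitive_dependencies(service, depends_on):
            counts[dep] += 1
    return dict(counts)


if __name__ == "__main__":
    # Services with the most dependents are the first candidates for protection.
    ranked = sorted(dependent_counts(DEPENDS_ON).items(), key=lambda kv: -kv[1])
    for name, count in ranked:
        print(f"{name}: {count} dependent service(s)")
```

In this made-up map, the services with the most dependents are the ones whose failure would affect the most other systems, which is the sense in which they are single points of failure.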

The new UCAR gateway security server (see the Computer and network security report for more information), which monitors network activity for security violations and filters incoming Internet user access, was set up in a high-availability configuration in summer 1998. An additional "hot stand-by" server is planned for the gateway security server configuration in early FY1999.
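The general idea behind a hot stand-by can be sketched as a monitor that promotes the stand-by server when the primary stops reporting that it is healthy. This is only an illustrative sketch under stated assumptions: the class name, the heartbeat mechanism, and the timeout value are hypothetical and do not describe how the gateway security server is actually configured.

```python
import time


class FailoverMonitor:
    """Promote the standby when the primary misses heartbeats for too long.

    Hypothetical illustration; not the mechanism used by any specific product.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_heartbeat = clock()
        self.active = "primary"

    def record_heartbeat(self):
        """Called whenever the primary reports that it is healthy."""
        self.last_heartbeat = self.clock()

    def check(self):
        """Return the server that should be serving requests right now."""
        silent_for = self.clock() - self.last_heartbeat
        if self.active == "primary" and silent_for > self.timeout_seconds:
            # Fail over. This sketch never fails back automatically.
            self.active = "standby"
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(timeout_seconds=5.0)
    monitor.record_heartbeat()
    print(monitor.check())
```

The sketch deliberately stays on the stand-by once it has failed over, on the assumption that returning to the primary is a manual decision made after the fault is understood.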

The Distributed Systems Group (DSG) in the High Performance Systems section of SCD expanded the use of high-availability configurations within the SCD computing environment in FY1998. Using a commercial software product (QualixHA+), it developed a high-availability configuration called crestone that provides continuous Network Information Service (NIS), mail spooling, and printing services to SCD users. The servers that make up the crestone configuration will be upgraded to Sun Ultra 2 systems in early FY1999, providing an even higher level of system stability and reliability.

Other measures have also been used to increase the availability of services within SCD, such as adding RAID storage to numerous servers, using dual power supplies on servers, and mirroring system disks.

Further, the Office Systems Group (OSG) has set up its Wintel servers in a high-availability configuration to ensure uninterrupted service to Wintel clients. The servers run Microsoft NT server software, which supports high-availability operation in a production environment. The Microsoft fail-over software, unlike UNIX high-availability software, has been widely deployed throughout the computer industry and has proven itself to be extremely reliable.

SCD will continue to evaluate high-availability products to determine their suitability, efficacy, and cost-effectiveness in the SCD and NCAR computing environments. SCD will deploy those determined to be suitable.
