January 30, 2006

NCAR’s Mass Storage System gets bigger, faster

Transition from HiPPI to Fibre Channel technology increases MSS capacity and bandwidth

   NCAR's Mass Storage System
 
The MSS, located in the Computer Room at NCAR’s Mesa Laboratory, is one of the largest archives in the world dedicated to geoscience research. Some of the data it stores originate from field experiments and observations: international climate records from the past 100 years include data from weather stations, ships, planes, and satellites. The bulk of MSS data, however, is generated by global climate simulations and other Earth systems models that run on high-performance computers. As these computers become larger and faster, they generate exponentially growing volumes of output data to be archived. Even greater demands for archiving data result from the growing use of coupled atmosphere, ocean, and sea ice climate models.
   

Back in the 1960s, NCAR supercomputers performed at the then-astonishing peak speed of 1.3 megaflops (1.3 million floating-point operations per second). Today, NCAR’s IBM Cluster 1600, a system named bluesky, has a peak speed of 8.3 teraflops, an increase of more than six orders of magnitude.
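The size of that jump can be checked with a quick calculation. The sketch below is illustrative only: it uses the two peak speeds quoted above, and the function and constant names are hypothetical.

```python
import math

# Peak speeds quoted in the article, in floating-point operations per second.
PEAK_1960S_FLOPS = 1.3e6      # 1.3 megaflops
PEAK_BLUESKY_FLOPS = 8.3e12   # 8.3 teraflops


def speedup(old_flops: float, new_flops: float) -> tuple[float, float]:
    """Return (ratio, orders of magnitude) between two peak speeds."""
    ratio = new_flops / old_flops
    return ratio, math.log10(ratio)


ratio, orders = speedup(PEAK_1960S_FLOPS, PEAK_BLUESKY_FLOPS)
print(f"bluesky is about {ratio:,.0f} times faster ({orders:.1f} orders of magnitude)")
```

The ratio comes to roughly 6.4 million, which is a little under seven orders of magnitude.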

As NCAR supercomputers have evolved over the years, so has the repository for the data they use and generate: NCAR’s Mass Storage System (MSS).

In January 2006, total data holdings on the MSS exceeded 2.5 petabytes (the equivalent of 2.5 billion 500-page paperback novels), while the quantity of unique data files is increasing by about 35 terabytes each month. Each day, the MSS handles an average of 40,000 requests for data, transporting more than 3.8 terabytes of data to and from NCAR supercomputers.
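To put those daily figures in perspective, the following sketch derives a few averages from them. It assumes decimal units (1 terabyte = 10^12 bytes) and treats "more than 3.8 terabytes" as exactly 3.8, so the results are lower bounds; all names are hypothetical.

```python
# Figures quoted in the article. Decimal units are an assumption.
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

daily_traffic_bytes = 3.8 * TB     # data moved to and from the supercomputers per day
daily_requests = 40_000            # average requests per day
monthly_growth_bytes = 35 * TB     # growth in unique data per month


def average_request_mb(traffic_bytes: float, requests: int) -> float:
    """Average data moved per request, in megabytes."""
    return traffic_bytes / requests / MB


def sustained_rate_mbps(traffic_bytes: float, seconds: int) -> float:
    """Average transfer rate over the period, in megabits per second."""
    return traffic_bytes * 8 / seconds / 1e6


avg_request_mb = average_request_mb(daily_traffic_bytes, daily_requests)
avg_rate_mbps = sustained_rate_mbps(daily_traffic_bytes, SECONDS_PER_DAY)
yearly_growth_tb = monthly_growth_bytes * 12 / TB

print(f"average request size:   {avg_request_mb:.0f} MB")
print(f"average sustained rate: {avg_rate_mbps:.0f} Mbps")
print(f"growth per year:        {yearly_growth_tb:.0f} TB")
```

Under these assumptions, the average request moves about 95 megabytes, the round-the-clock average rate is about 350 Mbps, and unique holdings grow by about 420 terabytes a year. Real traffic is bursty, so peak rates would be considerably higher than the daily average.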

Ensuring that this enormous amount of information can be stored and accessed speedily, safely, and reliably by geoscientists around the world is the job of NCAR’s Scientific Computing Division (SCD), which designed the MSS in the mid-1980s and has been extending its capabilities ever since.

A significant upgrade

This month SCD made a significant upgrade to the MSS by completing the transition from High Performance Parallel Interface (HiPPI) technology to Fibre Channel technology.

“HiPPI is being retired after 13 years of faithful service,” says John Merrill, head of SCD’s Mass Storage Systems Group. “For several years we’ve been moving from HiPPI to Gigabit Ethernet and Fibre Channel as a means to access the MSS. Two weeks ago we decommissioned the last of the HiPPI devices and eliminated all remaining HiPPI data traffic. All MSS reads and writes are currently being carried through Fibre Channel.”

Data files can now be transferred three to four times faster than before, although users may not notice a sudden change because the transition has been gradual. In addition, the storage capacity of Fibre Channel-enabled media and tape drives is 3.3 times greater than that of the devices they replace.

These improvements will allow the MSS to expand yet further into the multi-petabyte range, while reducing the latency to access MSS files.

The HiPPI era

   SCD's John Merrill
 
John Merrill, head of SCD's Mass Storage Systems Group, bids adieu to the last HiPPI switch as it leaves the NCAR Computer Room. SCD decommissioned HiPPI technology at NCAR on January 12, 2006.
   

HiPPI, a popular technology in the late 1980s to mid-1990s, was an early high-speed Local Area Network (LAN) protocol. Designed for connecting supercomputers and storage devices, it offered near-gigabit data transfer rates at a time when Ethernet was still rated at 10 megabits per second (Mbps) and leading-edge OC-3 technology was rated at 155 Mbps.

“Back in 1993, we needed a high-speed connection to the MSS, and HiPPI was the only technology available,” says Merrill. “We’d been using HYPERchannel, which had a top data transfer rate of 50 megabits per second. HiPPI was faster and more flexible.” Although the actual transfer rate was limited by the speed of the various MSS devices, HiPPI could make multiple transfers simultaneously, providing an aggregate rate of up to 800 Mbps.

Merrill recalls getting HiPPI up and running on two small Sun servers in December 1992. The next challenge was to install it on shavano, a CRAY Y-MP. “We first moved data to and from shavano via HiPPI on March 24, 1993,” Merrill says. “After that, all our computers had HiPPI. We usually had 20 to 25 machines all using the HiPPI interface. Every time we got a new system, we had to put HiPPI on it before it could talk to the MSS. Each one had a different way of accessing the interface, so we always had to modify and test the software. It kept us busy.”

A single HiPPI connection initially made use of two heavy-duty cables, each containing 100 copper wires running in parallel. By the late 1990s, SCD had replaced most HiPPI connections with serial fiber-optic cables, although some equipment continued to use the older copper cables. Serial HiPPI had many advantages, including lower cost and increased reliability.

The data path between NCAR supercomputers and the MSS consisted of HiPPI cables from the supercomputers to a set of HiPPI adapters, and ESCON channels from the adapters to the tape and disk devices of the MSS. (ESCON, which stood for Enterprise System CONnection, was a fiber-optic protocol developed by IBM.) While revolutionary in their time, HiPPI and ESCON were losing support by the late 1990s. The cost of HiPPI technology was high and the number of vendors was small.

At the same time, Gigabit Ethernet and Fibre Channel technologies were on the rise. Gigabit Ethernet, a standard for hardware, communication, and cabling, is one of the most common methods of connecting computers in a LAN. Fibre Channel is especially suited for interconnecting storage controllers and drives. Both Gigabit Ethernet and Fibre Channel offer data transfer rates of 1 gigabit per second.
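For a rough sense of what these nominal rates mean, the sketch below compares idealized transfer times for a single file across the link technologies mentioned in this article. It is only an illustration: the 1-gigabyte file size is hypothetical, the HiPPI figure is the aggregate rate quoted above, and protocol overhead and device speed limits (which the article notes constrained actual rates) are ignored.

```python
# Nominal link rates quoted in the article, in megabits per second.
LINK_RATES_MBPS = {
    "Ethernet (early 1990s)": 10,
    "HYPERchannel": 50,
    "OC-3": 155,
    "HiPPI (aggregate)": 800,
    "Gigabit Ethernet / Fibre Channel": 1000,
}

# Hypothetical file size chosen for illustration: 1 GB in decimal units.
FILE_BYTES = 10**9


def ideal_transfer_seconds(file_bytes: int, rate_mbps: float) -> float:
    """Time to move a file at the full nominal rate, with no overhead."""
    return file_bytes * 8 / (rate_mbps * 1e6)


for name, rate in LINK_RATES_MBPS.items():
    seconds = ideal_transfer_seconds(FILE_BYTES, rate)
    print(f"{name:34s} {seconds:8.1f} s")
```

At full nominal rate, the file would take 160 seconds over HYPERchannel, 10 seconds over HiPPI, and 8 seconds over Gigabit Ethernet or Fibre Channel. Actual gains depend far more on the attached drives and software than on the link rate alone.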

By the early 2000s, supercomputer vendors were offering high-performance systems that were by default supplied with Gigabit Ethernet, while storage devices were increasingly using Fibre Channel rather than ESCON.

Accordingly, SCD developed a new, software-based Storage Manager to take advantage of these new technologies, superseding the HiPPI/ESCON data path. Gigabit Ethernet gradually took the place of HiPPI, and Fibre Channel supplanted ESCON. (In 2001, for example, SCD decommissioned the MSS’s ESCON-attached disk farm and installed a high-speed Fibre Channel disk cache. Soon after, SCD began swapping out ESCON tape drives with faster Fibre Channel drives.)

On January 12, 2006, SCD decommissioned the last of the ESCON tape drives and terminated the HiPPI/ESCON data path. It was the end of an era.

“We had our share of problems and difficulties with HiPPI, especially in the early years, but overall it served us well,” says Merrill. “It will remain an important part of the history of the NCAR Mass Storage System.”

The larger picture: MSS evolution

The move to Gigabit Ethernet and Fibre Channel is part of the larger, two-decade evolution of the MSS toward technologies that are faster and more reliable.

In the 1980s, the MSS consisted entirely of tapes that were mounted manually by operators. In November 1989, SCD acquired the first StorageTek Powderhorn data silo, which employed robotic arms to mount tapes at the blazing speed of 350 per hour. In 1995, an upgrade increased the speed to 450 mounts per hour.

Early MSS tapes held only 200 megabytes of data. Storage technology has since advanced to the point that today's same-sized tapes hold 200 gigabytes, a thousand-fold increase over the original cartridge capacity. MSS tape drives have also improved to accommodate higher storage densities and faster data transfers, while the number of data silos has increased to five.

The “brain” of the MSS, the computer that controls the entire storage facility, is called the Mass Storage Control Processor (MSCP). SCD has managed a steady succession of better, faster MSCPs; the current model is a high-speed, high-performance IBM z/890-320.

Currently SCD is at work on a new software implementation of the MSS metadata catalog, which will further increase bandwidth, accessibility, and reliability for data transfers.

As technology continues to evolve and computational output multiplies, SCD remains committed to providing optimum, cost-effective mass storage for the Earth sciences community—just as it has since 1978, when NCAR’s first rudimentary archival system contained less than 1 terabyte of data.—Lynda Lester


Photos: Lynda Lester, CISL/NCAR


The Scientific Computing Division (SCD) is part of the Computational and Information Systems Laboratory (CISL) of the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. NCAR is operated by the University Corporation for Atmospheric Research under the primary sponsorship of the National Science Foundation.