CISL Annual Report

Storage Area Network (SAN) Deployment for CDP and ESG

[Figure: SAN architecture]
The Storage Area Network (SAN) allows 60 TB of data to be shared among nodes over a high-speed Fibre Channel switch and network. Because the nodes share a single file system on the SAN, changes that a computational server (e.g., bison) makes to a dataset are immediately available to the other attached servers. The user community can access these datasets through the Community Data Portal (CDP) and the Data Support Section (DSS) server huron. In FY 2007, we plan to attach a TeraGrid server to the SAN, with a web interface that lists the datasets available for download.

During the summer of 2004, the Distributed Systems Group (DSG) became part of a joint SCD and University of Colorado (CU) effort to evaluate high-performance shared file systems. The effort was driven by the need to share common data among servers running diverse operating systems at speeds exceeding those of the Network File System (NFS). DSG was charged with setting up a testbed storage area network (SAN) on which the attached servers would share the Quantum/ADIC StorNext File System (SNFS). SNFS was chosen because it works well in a heterogeneous operating system environment such as SCD's and does not depend on any specific hardware vendor for components such as switches and storage units. CU would investigate more Linux-specific shared file systems, such as IBM's GPFS and Lustre.

There was a pressing need for the Community Data Portal (CDP) system and the main Data Support Section (DSS) server to share large datasets and provide them to the user community. Both servers ran the Sun Solaris operating system, so the initial test was designed to determine how Sun servers interacted with SNFS over a SAN. SCD's Mass Storage System Group supported the project by benchmarking file transfer rates on SNFS and comparing them with those of directly attached storage (DAS) units. Because SNFS performed comparably to the DAS systems, the decision was made to put a large shared file system into production between the CDP and DSS servers. During FY 2006, a DSS Sun server was added to the SAN to provide computational support for DSS datasets, and a dedicated Sun server was added to handle the TIGGE project, bringing the total number of servers sharing the SAN to four. Available storage was increased to more than 50 TB.

This project supports the NCAR strategic priority of "Developing and providing advanced services and tools."

For FY 2006, datasets on the SAN include:

  • ECMWF ERA-40 Reanalysis Data (ERA40)
  • NCEP North American Regional Reanalysis Data (NARR)
  • International Comprehensive Ocean-Atmosphere Data Set (ICOADS)
  • Carbon in the Mountains Experiment (CME), a collaboration between CGD, EOL, ACD, NASA, NOAA, and several universities
  • ACD model and visualization client code, including the MOZART (Model for OZone And Related chemical Tracers) and TUV (Tropospheric Ultraviolet and Visible radiation) models
  • HIAPER test flight data
  • University of Oklahoma Hurricane Isabel case study
  • Whole Atmosphere Community Climate Model (WACCM) data, a collaboration between ACD, HAO, and CGD
  • Weather Research and Forecasting (WRF) model forecast for Hurricane Katrina

Several activities are planned for the CDP/Earth System Grid (ESG) shared file system in FY 2007. During the first phase of integrating NCAR into the national TeraGrid, the ESG project was moved off the dataportal server and set up as a gateway server called datagrid. The datagrid server will be attached to the SAN as soon as it passes its security analysis and should be online in early FY 2007. The amount of data housed for ESG will grow to more than 80 TB by the end of the year.

The SAN is made possible by NSF Core funds, including CSL funding.