CISL Annual Report

SCD Research: Computational Science

  Experimental computing
  This diagram shows the newly created experimental environment, its components, and how it relates to the NCAR production environment. The objective of this experimental environment is to provide the infrastructure required to track, evaluate, and adopt new technologies. This capability supports NCAR's leadership in high performance computing and helps it influence the design of coming generations of computing and information systems.

SCD's Computational Science Section (renamed the CISL Computer Science Section at the end of FY 2006) is responsible for tracking and evaluating new computing technologies, making early adoption decisions, and performing systems research. Section members are actively pursuing research in the following areas:

  • High-performance computing
  • Grid computing
  • Experimental systems
  • Linux clusters
  • Experimental networking and evaluation of high-performance interconnects
  • System and network performance analysis
  • High-performance file systems and archival storage systems
  • Parallel algorithms and architectures
  • Model development

Research results from the past year include our successes in model development, high-performance computing, Grid computing, and experimental systems.

One area of strategic importance to the Section is Linux cluster research. In collaboration with the University of Colorado, we have evaluated various aspects of cluster design, including parallel file systems, diskless compute nodes, scalable management systems, and cluster interconnect solutions. This research has yielded a better understanding of how to deploy cost-effective computational clusters that can accommodate the NCAR scientific workload while minimizing ongoing management overhead and complexity.

One essential infrastructure component required by the high-performance computing community is archival storage, as evidenced by NCAR's Mass Storage System, which holds more than 3 PB. Magnetic tape is a well-established archival storage medium, but tape systems have limited read and write throughput, incur queue time for tape retrieval, and risk concentrating all of the important data in one place. To address these limitations, we are building a reliable, high-performance file system for archival storage using low-density parity-check (LDPC) codes. The advantage of moving to an LDPC scheme based on an open software infrastructure is that it allows us to leverage emerging storage solutions. To date, it has been shown that Tornado encoding schemes (a family of LDPC-style erasure codes) can be designed to be significantly more fault tolerant than either RAID or mirrored systems. It has also been shown that, by using cooperatively selected Tornado Code graphs to build a geographically distributed data stewarding system, one can obtain overall system fault tolerance exceeding that of its constituent storage sites or of site replication strategies.
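
To make the erasure-coding idea concrete, the following is a minimal sketch of how a sparse XOR parity graph can recover lost storage blocks. It is an illustration only, not the Section's implementation: the function names `encode` and `peel_decode`, the integer "blocks," and the three-check graph are all hypothetical, and real Tornado codes use much larger, cascaded, irregular graphs chosen for their fault-tolerance properties.

```python
from functools import reduce
from operator import xor


def encode(data, checks):
    """Return one XOR parity block per check.

    data   -- list of integer blocks (stand-ins for storage blocks)
    checks -- list of tuples; each tuple names the data indices
              that one parity block covers (the sparse graph)
    """
    return [reduce(xor, (data[i] for i in idxs)) for idxs in checks]


def peel_decode(data, parity, checks):
    """Recover erased blocks (marked None) by iterative peeling.

    A check whose parity block survived and that has exactly one
    erased data neighbour determines that neighbour by XOR. Blocks
    that cannot be resolved this way are left as None.
    """
    data = list(data)
    progress = True
    while progress and any(block is None for block in data):
        progress = False
        for idxs, p in zip(checks, parity):
            if p is None:
                continue  # this parity block was lost as well
            missing = [i for i in idxs if data[i] is None]
            if len(missing) == 1:
                value = p
                for i in idxs:
                    if data[i] is not None:
                        value ^= data[i]
                data[missing[0]] = value
                progress = True
    return data


# Hypothetical example: four data blocks protected by three parity checks.
blocks = [0x11, 0x22, 0x33, 0x44]
graph = [(0, 1, 2), (1, 2, 3), (0, 3)]
parity_blocks = encode(blocks, graph)

# Lose blocks 1 and 3; peeling resolves block 1 first, then block 3.
recovered = peel_decode([0x11, None, 0x33, None], parity_blocks, graph)
print(recovered == blocks)
```

Whether a given loss pattern is recoverable depends on the graph: with this toy graph, losing blocks 1 and 2 together leaves every check with either zero or two unknowns, so peeling stalls. Selecting graphs that tolerate many such patterns is the design problem the paragraph above refers to.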

These efforts support NCAR's strategic priorities of "Providing capability and capacity supercomputing to the community," "Developing and providing advanced services and tools," and "Creating an Earth system knowledge environment." These SCD research projects and programs are supported by NSF Core funding, with other support as indicated by the individual reports in this document.