CISL Annual Report banner  
   

TeraGrid Integration

  TeraGrid Resource Providers
  This image shows the geographic locations of current TeraGrid Resource Providers and the 10Gb/s network links that interconnect them. The TeraGrid is a virtual facility for scientific research that integrates computational, storage, information, and data analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago/Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research. As a TeraGrid Resource Provider, NCAR is committed to offering a highly distributed network of computational, data, and knowledge resources to multidisciplinary groups of researchers, students, educators, impact and assessment communities, and policy makers around the world.

The National Center for Atmospheric Research (NCAR) is dedicated to integration and collaboration across scientific projects, scientific organizations, and technology. A key accomplishment toward this goal came in FY 2006 when NCAR established a partnership with the NSF-supported TeraGrid facility and contributed to its success as a Resource Provider and catalyst for enhanced services to the Earth System science community. As a TeraGrid partner site, NCAR will offer increased access to specialized high-performance computing and data storage resources, climate data, and tools for data analysis and visualization. Access to these facilities will help Earth System scientists better understand complex phenomena such as global climate change, hurricanes and other severe storms, wildfires, air pollution, solar storms, and space weather. NCAR's Computational and Information Systems Laboratory (CISL) worked closely with TeraGrid and the NSF to establish the partnership. The opportunities afforded by this partnership enhance NCAR's ability to forge new collaborations for distributed modeling, access to geoscientific data, digital preservation and archiving, and the development and use of new Grid technologies.

Focused geoscience collaboration initiatives in converging modeling frameworks and metadata, knowledge and semantic systems, and analysis and visualization environments pave the way for the creation of a new generation of knowledge tools. CISL is playing a leadership role in the development of global, interoperable data systems including a strong contribution to the World Meteorological Organization (WMO) efforts. The confluence of these capabilities provides a foundation for delivering new, very large datasets to our community, including re-analyses, regional climate model results, and global weather ensembles. This effort supports NCAR's strategic priorities of "Engaging a broader and more diverse community" and "Developing and providing advanced services and tools."

In FY 2006, a 10Gb/s network connection to the TeraGrid was established along with a new, dedicated datagrid server for an Earth System Grid (ESG) Science Gateway. At the end of FY 2006, NCAR will begin serving as a conduit to terascale climate model data through the use of Grid computing technologies. During FY 2007, NCAR will begin offering an experimental storage cluster and provide access to select supercomputing resources.

NCAR's participation in the TeraGrid project is supported through NSF Core funds and UCAR Communications Pool indirect funds.

Detailed Project Description

NCAR currently delivers a substantial computing resource to the Earth System sciences community with a significant portion of this resource devoted to the university research community. NCAR produces, collects, and manages one of the world's premier collections of environmental and geoscience data. NCAR is funding a Cyberinfrastructure Strategic Initiative, one element of which is called the Community Data Portal (CDP). The CDP effort is aimed at providing a unified gateway to datasets and GIS databases from a wide range of scientific resources that are broadly useful to the research community. These datasets span observed and simulated data for climate, ocean, atmospheric reanalysis, biogeochemistry, carbon cycle, weather, space physics, chemistry, and many others. The CDP is aimed at developing hardware and software cyberinfrastructure and sustainable strategies for managing the breadth of scientific data being gathered and generated at NCAR and UCAR, such that these resources can start to become knowledge resources for the broad community.

NCAR's initial resource contribution to the TeraGrid is providing access to the DOE-funded Earth System Grid (ESG) project configured as a TeraGrid Science Gateway. The primary goal of ESG is to address the formidable challenges associated with enabling analysis of and knowledge development from global Earth System models. Through a combination of Grid technologies and emerging community technology, distributed federations of supercomputers and large-scale data and analysis servers provide a seamless and powerful environment that enables the next generation of climate research. ESG provides access to a 150-TB collection of climate and ocean model data including datasets for the Intergovernmental Panel on Climate Change. It also serves as a central point for community distribution of the Community Climate System Model (CCSM) itself, as well as other related datasets, analysis, and visualization tools.

The primary point of entry into this data and modeling collection is the ESG portal. This resource serves as a nexus for Data Grid services, a storage cache for other TeraGrid scientific projects, a conduit to NCAR's archival storage system, and a vehicle for providing a wealth of scientific environmental data to the TeraGrid community. Additional Data Grid services and geosciences-specific services and applications will be layered on the data holdings. ESG and other CDP datasets will be "published" on the TeraGrid, and access modalities will support models, applications, and web portals, adding up to a premier environmental data service for the TeraGrid and geosciences communities.

NCAR offers an abundant, exciting, and unique portfolio of science and technology projects that, we believe, constitute a primary resource contribution to the TeraGrid.

TeraGrid Background

TeraGrid is an NSF-funded national facility that integrates computational and data resources and security, accounting, documentation, and educational outreach services from resource provider partners (RPs) to serve the nation's science and engineering community. Common services and integration processes and components are provided by, or in some cases coordinated by, the Grid Infrastructure Group (GIG). The GIG is responsible for architecture, planning, managing, and enhancing the TeraGrid facility, providing a core set of services, and coordinating RP staff through distributed service teams ranging from user support to security to education, outreach, and training.

The objective of the RPs and GIG is to enable scientific discovery by providing access to the highest performance resources available, integrated as a coordinated system that supports use cases ranging from exploiting a single TeraGrid resource to combining resources in specialized workflow or cooperative computing modes. Resource integration and enhancement efforts are ranked through user input and evaluations of TeraGrid services as measured by operational, system, or service use metrics.

The set of long-term TeraGrid objectives toward providing cyberinfrastructure to national science and engineering researchers can be expressed in three interdependent sets of activities.

TeraGrid DEEP encompasses a set of initiatives aimed at fully exploiting the integrated capabilities of the TeraGrid facility to support scientific discovery that would not otherwise be possible. The GIG coordinates user support staff to provide both traditional user consulting support and a program called Advanced Support for TeraGrid Applications (ASTA). ASTA assigns user support staff to dedicate 25% of their time for 6-12 months assisting a science group to enable them to fully harness TeraGrid services and resources as an integrated facility.

TeraGrid WIDE recognizes that, traditionally, NSF's high-performance computing infrastructure has focused primarily on only a small fraction of the national science and engineering community. Thus, in addition to supporting a current and growing user community, the aim is to provide TeraGrid services to many more scientists and engineers over the coming years. Such scaling requires a new model for interacting with the community and for provisioning cyberinfrastructure: the creation of science gateways.

TeraGrid's broad-impact goals also extend to students and educators. TeraGrid's Education, Outreach, and Training (EOT) program is a coordinated effort to raise the awareness of the benefits of TeraGrid within research and education communities across all disciplines and all learning levels. The EOT team works closely with the science gateways to engage significantly larger numbers of scientists, educators, and students, with an emphasis on reaching out to under-represented groups.

TeraGrid OPEN involves the provision of a persistent, reliable national cyberinfrastructure. The TeraGrid facility is architected as a set of integrated services based on open standards wherever possible and embracing the heterogeneity represented by nearly 20 unique major resources operated by TeraGrid RPs. OPEN also describes the approach to presenting TeraGrid to NSF and the community as a truly extensible and adaptable facility.

Timeframe

NCAR's deployment of TeraGrid cyberinfrastructure will be a strategic and ongoing activity. The initial deployment will consist of the necessary 10Gb/s network fabric, a data server for accessing Earth System Grid (ESG) data, a 50-TB, RAID10 high performance storage system running the Lustre file system, and a 5.7-TFLOPS-peak, 2,048-processor IBM Blue Gene/L and its associated I/O subsystem. Together, these components will create, by the middle of FY 2007, a high performance TeraGrid node capable of providing both high performance computing and data services to TeraGrid users.
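The peak figure quoted for the Blue Gene/L can be checked with simple arithmetic. The following is a minimal sketch, not a vendor specification: the processor count comes from this report, while the 700 MHz clock rate and four floating-point operations per cycle per processor are assumptions about the Blue Gene/L hardware introduced here for illustration.

```python
# Back-of-the-envelope check of the Blue Gene/L peak figure quoted above.
PROCESSORS = 2048        # from the report
CLOCK_HZ = 700e6         # assumption: 700 MHz processor clock
FLOPS_PER_CYCLE = 4      # assumption: dual floating-point unit with fused multiply-add


def peak_tflops(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Return theoretical peak performance in TFLOPS."""
    return processors * clock_hz * flops_per_cycle / 1e12


if __name__ == "__main__":
    peak = peak_tflops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
    print(f"{peak:.1f} TFLOPS peak")
```

Under these assumptions the result rounds to the 5.7 TFLOPS stated above. Sustained application performance will be lower than this theoretical peak.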

Subsequent out-year upgrades of the TeraGrid infrastructure will be accomplished with CISL's research equipment budget. While modest, this investment should enable CISL and NCAR to continue deploying resources of a scale sufficient to develop Grid expertise and learn vital lessons about providing domain-specific Grid services to NCAR's scientific community. In many ways it is an open-ended project and its precise direction is difficult to predict. As potential collaborations and services emerge, CISL will adapt the NCAR TeraGrid node and the objectives of the project appropriately.

FY 2006 Accomplishments in Detail

The NCAR TeraGrid activity began in FY 2006 with a visit to NCAR by the TeraGrid's Grid Infrastructure Group (GIG) PI, Charlie Catlett, in September 2005. During this meeting with CISL staff, it was agreed that NCAR's joining the TeraGrid effort was in the interest of NCAR, the atmospheric science community it serves, and the TeraGrid itself. Over the next three months, NCAR developed a detailed scientific and technical justification for its joining the TeraGrid. Subsequently, a statement of work for NCAR was added to the FY 2006 TeraGrid project plan, TeraGrid points of contact were established at NCAR, and NCAR's TeraGrid plans began to unfold. These plans were first presented and discussed at the quarterly TeraGrid meeting in March 2006 in Chapel Hill, North Carolina.

An outcome of the March meeting was the creation of a TeraGrid Requirements Analysis Team (RAT), led by Tony Rimovsky. The responsibility of this team was to work with the NCAR points of contact to develop an integration plan for NCAR's TeraGrid resources. These plans were developed and refined over the ensuing four months. During this period, the first objective was to establish a 10Gb/s connection to the Denver TeraGrid hub. The equipment for this, procured at UCAR's expense, became operational in May 2006. The completion of the 10Gb/s network connection enabled NSF Director Dr. Arden Bement to announce NCAR's official addition to the TeraGrid at the first TeraGrid annual meeting in Indianapolis in June 2006. This announcement was critical: it provided both official recognition of the efforts already underway and the necessary momentum.

The second milestone in NCAR's TeraGrid deployment was to establish the DOE Office of Science-funded Earth System Grid (ESG) as a TeraGrid "Science Gateway." TeraGrid Science Gateways signal a paradigm shift from traditional high performance computing use. Gateways enable entire communities of users associated with a common scientific goal to use national resources through a common interface that helps simplify scientists' use of technology. The ESG project provides climate data federation services between supercomputing centers, for example between the data holdings at NCAR and Oak Ridge National Laboratory (ORNL), and hosts a total of over 140 TB of model data across the entire ESG system, most of it from the most recent Intergovernmental Panel on Climate Change assessment campaign. In addition to data, ESG hosts climate model source code and related tools for data exploration and visualization. The ESG therefore represents an important CI service for the climate research community.

If large-scale climate computations are to be performed on the TeraGrid, it is critically important that ESG's data federation services be extended to the TeraGrid's 10Gb/s network fabric. Therefore, during the summer of 2006, NCAR's dedicated TeraGrid data server, a Sun V890 system, was procured and deployed, again at UCAR expense. Once this task was completed, the migration of ESG software to the Solaris 10, Globus Toolkit 4.x environment began. This porting exercise was completed in September 2006. Establishing the TeraGrid data server as a secure exposed host that simultaneously meets both TeraGrid and UCAR security policy requirements proved to be a challenge, requiring a number of workarounds. The most critical points were finding a proxy method of enabling access to the NCAR Mass Storage System and providing one-time password authentication for the data server outside the UCAR security perimeter. An outstanding security challenge is providing NCAR's data server with access to the NCAR ADIC storage area network. Nevertheless, NCAR was in a position at the end of FY 2006 to cut over ESG operations to the TeraGrid data server in October 2006. The start of ESG data transfers over the 10Gb/s network between NCAR and ORNL is gated by the readiness of ORNL CI and staff to perform these tests.

In addition to deploying the Earth System Grid project as a Science Gateway on the TeraGrid, NCAR has been steadily working to procure and deploy the 10Gb/s networking, supercomputing, and data storage systems needed to create a TeraGrid HPC environment. The first step of this process was to move one IBM 720 storage server from behind the UCAR security perimeter and use it to perform GPFS-WAN testing with the San Diego Supercomputer Center (SDSC). The GPFS-WAN file system provides multiple data flow streams and adaptable TCP window sizing for tuning performance. These highly specialized wide-area file systems are intended for users and groups that need common storage space for results from large multi-site runs, for those with large data libraries, or both. The central goal of these file systems is the development of a next-generation file system that can serve clusters with tens of thousands of nodes with petabytes of storage and move hundreds of gigabytes per second with state-of-the-art security and management infrastructure.

In August 2006, NCAR first successfully mounted the SDSC GPFS-WAN file system on the IBM 720. Performance tuning of the network pathway is under way and will be completed once the 10Gb/s infrastructure connecting NCAR's HPC components is delivered.
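The role of TCP window sizing in tuning a wide-area file system can be illustrated with the bandwidth-delay product, the amount of data that must be in flight to keep a link full. The following is a minimal sketch under stated assumptions: the 10Gb/s link speed comes from this report, while the 30 ms round-trip time and the stream count are hypothetical values chosen for illustration, not measurements of the NCAR to SDSC path.

```python
def bandwidth_delay_product_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link full for one round trip."""
    return link_bits_per_s * rtt_s / 8


LINK_BPS = 10e9   # 10Gb/s TeraGrid link, from the report
RTT_S = 0.030     # hypothetical round-trip time; measure the real path instead
STREAMS = 8       # hypothetical number of parallel data streams

# Total window needed if a single stream had to fill the link by itself.
window = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
print(f"window for a single stream: {window / 1e6:.1f} MB")

# With parallel streams, each stream needs only its share of that window.
print(f"window per stream with {STREAMS} streams: {window / STREAMS / 1e6:.2f} MB")
```

The sketch shows why multiple data flow streams and adjustable windows matter on long paths: a default-sized TCP window is far smaller than the data needed in flight on a fast, high-latency link.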

During FY 2006, NCAR also successfully procured a Lustre storage system from Aspen Systems, consisting of 100 TB of RAID5 disk storage space mirrored for redundancy in such a way as to produce 50 TB of highly reliable RAID10 storage. The cluster consists of 12 Object Storage Targets (OSTs) and two metadata servers. This configuration will allow the cluster to reliably serve data over the TeraGrid while saturating the 10Gb/s link. The data storage cluster was purchased using the CISL Director's reserve funds.
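The capacity and throughput figures for the storage cluster can be related with a short calculation. The following is a rough sketch, assuming that data is striped evenly across all 12 OSTs and ignoring protocol overhead; both are simplifying assumptions introduced here, and the function names are illustrative only.

```python
RAW_TB = 100        # RAID5 capacity before mirroring, from the report
MIRROR_COPIES = 2   # each block is stored twice when mirrored
OST_COUNT = 12      # Object Storage Targets, from the report
LINK_GBPS = 10      # TeraGrid link speed in Gb/s, from the report


def usable_tb(raw_tb: float, copies: int) -> float:
    """Usable capacity once every block is stored `copies` times."""
    return raw_tb / copies


def per_ost_mb_per_s(link_gbps: float, ost_count: int) -> float:
    """Average rate each OST must sustain to fill the link.

    Uses decimal units (1 Gb = 1000 Mb) and assumes even striping
    with no protocol overhead.
    """
    link_mb_per_s = link_gbps * 1000 / 8
    return link_mb_per_s / ost_count


if __name__ == "__main__":
    print(f"usable capacity: {usable_tb(RAW_TB, MIRROR_COPIES):.0f} TB")
    print(f"rate per OST to fill the link: {per_ost_mb_per_s(LINK_GBPS, OST_COUNT):.0f} MB/s")
```

Under these assumptions, each OST needs to sustain on the order of 100 MB/s for the cluster as a whole to saturate the 10Gb/s link.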

A procurement was also held to acquire a 10Gb/s switch to interconnect the TeraGrid network, the data storage cluster, and the Blue Gene/L system. At the end of FY 2006, NCAR selected a Force10 E1200 switch for this purpose. The E1200 switch was procured using CISL research equipment funds.

Becoming a TeraGrid participant also enabled NCAR and SDSC to sign an important agreement to provide geographical replication and storage for each other's critical datasets. The agreement formalized a partnership to ensure the reliable, long-term preservation of scientific data vital to the missions of both institutions. Some of the first NCAR data to be stored at SDSC are portions of NCAR's Research Data Archive, which is managed and curated by CISL. The Research Data Archive contains precious historic records and data from satellites and field experiments, as well as output from global climate-simulation models, mesoscale weather models, and other Earth science models. The collaboration with SDSC is one of NCAR's first tangible uses of the 10Gb/s network connection afforded by the TeraGrid. The data replication effort is a pilot project of the Chronopolis Consortium, a partnership that includes NCAR, SDSC, the University of Maryland, and the University of California Library System. Chronopolis aims to organize, preserve, and make accessible the increasing number of digital holdings that represent vital intellectual assets—many of which, like NCAR's Research Data Archive, are irreplaceable.

Becoming a Resource Provider on the TeraGrid has been a cross-cutting activity across CISL, enlisting contributions from every section. Understanding and implementing local solutions that adhere to the policies and procedures of both the TeraGrid and UCAR, on such topics as security and accounting, has been particularly challenging.

Project Plan Evaluation Measures

In FY 2007, NCAR intends to complete the deployment of the cyberinfrastructure described in the previous section. In particular, the high performance Lustre data storage system and the 10Gb/s E1200 switch will be deployed in October through November 2006. An allocation policy for this storage resource will have to be developed. To that end, NCAR is sending two staff to attend a data workshop at SDSC, scheduled for November 28 through December 1, 2006.

The deployment of the IBM Blue Gene/L supercomputer system will involve several steps. First, the system must be moved outside the UCAR security perimeter. Next, the CTSS software stack must be installed and tested on the system. These two steps, which will render the system ready for preproduction testing, will be complete by December 31, 2006. Finally, the Blue Gene/L system must be allocated as a resource. This will occur in the first quarter of 2007. By April 2007, the Blue Gene/L system should be in full production on the TeraGrid.

Several partnerships are expected to develop in FY 2007. The ESG team expects to develop data federation capabilities with the Climate Center at Purdue University. Installation of the Storage Resource Broker (SRB) on NCAR's TeraGrid data server will allow the cross archiving of data between SDSC and NCAR to begin as described in the MOU. NCAR will establish partnerships with ORNL, PSC, and Indiana University for conducting Lustre-WAN testing with these TeraGrid resource providers. Finally, the possibility of integrating the SDSC Blue Gene/L resource with the NCAR Blue Gene/L will be explored. Cross mounting file systems using GPFS-WAN may facilitate this integration as well as the migration of users back and forth between the two systems.

Throughout the TeraGrid project, CISL has kept careful track of the staff resources required and consumed. Over FY 2006, CISL measured charges to the TeraGrid activity that equate to an annual burn rate of $374,000, i.e., approximately two fully loaded Software Engineer-3s, spread across several individuals in the organization. In FY 2006, CISL hired a half-time TeraGrid security officer and added one full-time junior software engineer to administer the systems needed for this project.

It is expected that, as more equipment is deployed in FY 2007, the load on NCAR staff will increase. To that end, CISL has made strategic adjustments to free salary and is prepared to hire another half-time software engineer/administrator to support the deployed system.

Impact

The impacts of this project are open-ended and difficult to predict. NCAR's access to and integration with TeraGrid resources will help ensure continuity of the NSF's cyberinfrastructure plans, particularly between the Office of Cyberinfrastructure (OCI) and the Geoscience Directorate. The connection itself is expected to increase the ability of NCAR scientists and geoscientists to collaborate using TeraGrid resources. The resulting collaborations will likely center around data exchanges at first, but will inevitably expand into other aspects of scientific workflows, such as the sharing or coscheduling of HPC resources.