CISL Annual Report

Workshop on High Performance Computing for Geosciences Research

  Facility architecture
  We propose a plan to fulfill the cyberinfrastructure needs of geoscience research. The diagram shows a distributed network of domain-specific science and technology centers linked to a central research hub facility that will supply computing and data resources beyond the reach of the partner centers. We call this architecture the geoscience collaboratory. The feasibility of this design has been demonstrated by the deployment of High Performance Computing (HPC) Grid technology in projects such as the NSF-funded TeraGrid and the eScience program in the United Kingdom. The proposed collaboratory will enable geo-specific HPC, data, and observation systems in the public and private sectors to collaborate more effectively on the most challenging problems in the geosciences.

NCAR has a reputation as, and a core vision of being, an integrator, innovator, and community builder. NCAR has been contributing to efforts to define both the science and cyberinfrastructure requirements for the geosciences. These findings are outlined in the two volumes of Establishing a Petascale Collaboratory for the Geosciences: Scientific Frontiers and Technical and Budgetary Prospectus. NCAR is working to further develop the concepts and implement the vision outlined in these documents.

NCAR organized two NSF-funded workshops to encourage a common goal among the geoscience disciplines: to envision and develop the cyberinfrastructure necessary to support their full range of scientific objectives well into the future. These exploratory efforts support NCAR's strategic priorities of "Engaging a broader and more diverse community" and "Developing and providing advanced services and tools," and NCAR's strategic goal of "Improving understanding of the atmosphere, the Earth system, and the Sun."

"High Performance Computing in the Geosciences" was the second NCAR-organized workshop held in FY 2006 to address the challenges of preparing geoscience applications to run on petascale computing systems and beyond. The first workshop, "Geoscience Application Requirements for Petascale Systems" (GARPA), was held June 1-2, 2006.

The High Performance Computing in the Geosciences workshop was organized and hosted by NCAR on September 25-27, 2006, in Boulder, Colorado. It attracted 125 participants, including university researchers with HPC research projects as well as representatives from other national laboratories, supercomputing facilities, and the NSF. Nine prominent NSF staff attended the workshop, including Margaret Leinen, NSF Assistant Director for the Geosciences; Jarvis Moyers, NSF Division Director of Atmospheric Sciences; Art Goldstein, NSF Acting Division Director of Earth Sciences; and Eric Itsweire, Physical Oceanography Program Director in the NSF Division of Ocean Sciences (OCE).

Workshop Day 1

NCAR Director Tim Killeen welcomed participants to the workshop. Presentations by Dr. Steve Meacham (NSF Office of Cyberinfrastructure [OCI] Program Officer) and Dr. Margaret Leinen (NSF Assistant Director for the Geosciences) followed; together they defined the overall NSF cyberinfrastructure strategy and provided context for the discussions that followed.

Dr. Meacham covered the OCI Portfolio (data, data analysis, and visualization; HPC; virtual organizations; and learning and workforce development), the HPC spectrum (Track 1, Track 2, Track 3, etc.), and the enabling technologies for the TeraGrid.

Dr. Leinen described her view of the unique cyberinfrastructure requirements of the geosciences:

  • The geosciences are a mature, well-organized domain
  • There are special features related to geospatial data and observing systems
  • A need exists for a domain-specific center with a peak performance of about 0.5 PFLOPS
  • Support services are an essential component of HPC
  • Preparing petascale codes for geoscience is a challenge

Dr. Leinen also posed several questions to the participants:

  • What capabilities do you need for the next 10-20 years?
  • What are the pros and cons of having a discipline-specific center—what is unique about your problems that demand one?
  • What could you do with a discipline-oriented half-petascale platform by 2010?
  • What balance should we strike between building larger platforms and providing user services?

These NSF presentations were followed by three thought-provoking talks from Dr. Dale Haidvogel of the Institute of Marine and Coastal Sciences representing coastal oceanography, Dr. Thomas Jordan of the Southern California Earthquake Center (SCEC), and Dr. Robert Wilhelmson, Chief Science Officer at the National Center for Supercomputing Applications (NCSA). Each showed important scientific applications that could use petascale hardware. There were notable commonalities in their talks:

  1. They desire more complete descriptions of the systems they study. This is driving:
    • More complex (expensive) phenomenology:
      • Ecosystem models
      • Chemical reaction networks
    • Disciplinary models are moving toward interdisciplinary ones
    • This is leading to coupled models
  2. They desire end-to-end prediction systems. This leads to the need for:
    • Efficient data assimilation/inverse methods
    • Workflow management systems
    • On-demand computing (as needed during natural disasters)
    • Tools for geospatial data manipulation
  3. They want to capture multiscale phenomena. This is driving:
    • The push for higher resolution
    • The inclusion of adaptive mesh refinement techniques and the move to unstructured grids
    • The development of sub-grid-scale parameterizations
  4. They want more model coverage. This drives:
    • Bigger regions
    • More parameter values studied
    • Larger ensembles

It was clear from these presentations that all three researchers were facing the same challenges.

Dr. Haidvogel discussed these challenges from the perspective of petascale coastal oceanography. He noted that much of the progress called for in the 2002 Ocean Information Technology Infrastructure (OITI) report, Information Technology Infrastructure Plan to Advance Ocean Sciences, has been achieved over the last five years. In particular, circulation models, observing networks, and assimilation methods are in place or soon will be. The challenges Dr. Haidvogel cited for coastal oceanography over the next five years include:

  • Multivariate data assimilation
  • Increasing ecosystem complexity
  • Developing subgrid-scale parameterizations
  • Improving coastal observing networks
  • Building robust models on unstructured grids
  • Solving technical issues with grid nesting, etc.

Dr. Jordan showed how SCEC is using frameworks and grid-based technology to solve earthquake problems. He also described the software development path for moving from the TeraShake model to PetaShake. The petascale drivers he cited for computational seismology include the need to:

  • Raise the maximum resolved frequency from below 0.5 Hz to above 3 Hz for engineering studies
  • Study large regions, e.g., 800 km x 400 km x 100 km
  • Address geological complications, such as:
    • Surface topography
    • Non-planar faults
    • Nonlinear wave propagation
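The jump from below 0.5 Hz to above 3 Hz is a petascale driver because cost grows much faster than the frequency itself. The sketch below illustrates that arithmetic under assumptions that are ours, not from the talk: an explicit 3-D wave-propagation code, a fixed number of grid points per minimum wavelength (so grid spacing shrinks as 1/f), a CFL-limited time step (which also shrinks as 1/f), and a fixed domain size and simulated duration. The function name `relative_cost` is hypothetical.

```python
def relative_cost(f_old_hz: float, f_new_hz: float) -> dict:
    """Rough scaling of an explicit 3-D wave-propagation run when the
    maximum resolved frequency rises from f_old_hz to f_new_hz.

    Assumptions (illustrative only): fixed grid points per minimum
    wavelength, CFL-limited time step, fixed domain and duration.
    """
    ratio = f_new_hz / f_old_hz
    grid_points = ratio ** 3  # each of three spatial dimensions refined by `ratio`
    time_steps = ratio        # CFL condition: time step shrinks with grid spacing
    return {
        "grid_points": grid_points,
        "time_steps": time_steps,
        "total_work": grid_points * time_steps,  # scales as ratio**4
        "memory": grid_points,                   # model state stored per grid point
    }


if __name__ == "__main__":
    estimate = relative_cost(0.5, 3.0)
    for name, factor in estimate.items():
        print(f"{name:>12}: x{factor:,.0f}")
```

With the 0.5 Hz and 3 Hz endpoints the frequency ratio is 6, so under these assumptions a run needs roughly 216 times as many grid points and about 1,296 times the total work, before any enlargement of the study region.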

Organizationally, he commented on the need for vertical integration of the national petascale initiatives and for a mix of capability, capacity, and data-intensive visualization systems.

Dr. Wilhelmson observed that many students still spend 80% of their time on low-level work and only 20% on science, and he advocated building tools to reverse that ratio. Achieving this might be as significant as deploying a petascale system. Turning to grid technology and virtual organizations, he pointed out lessons learned from the George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) grid, namely:

  • Communication is the key
  • View it as a partnership
  • Keep moving forward
  • Users must take responsibility
  • Build on previous work

Dr. Tim Palmer of the European Centre for Medium-Range Weather Forecasts (ECMWF) gave a well-received dinner talk titled "Petaflop Computing and Reliable Climate Prediction—a European Perspective." Dr. Palmer argued that if no problem currently facing society is bigger than climate change, then quantifying the threat of climate change using the best available resources, including HPC resources, must surely be of the highest priority.

Workshop Days 2-3

At the start of the second day, Drs. Frank Bryan and Richard Loft of NCAR presented background from the two-volume report Establishing a Petascale Collaboratory for the Geosciences. Volume 1, Scientific Frontiers, established some of the science drivers for petascale geoscience computing, while Volume 2, Technical and Budgetary Prospectus, developed a straw-man cost model and feasibility plan for a geoscience collaboratory. Dr. Loft emphasized that the views expressed in the report had evolved significantly to incorporate NSF's cyberinfrastructure strategic plan. In particular, the updated concept now includes:

  • A distributed model for a geocollaboratory that leverages Grid technology.

Breakout sessions were conducted after these presentations. The morning sessions organized participants by discipline. The afternoon sessions were cross-disciplinary.

The questions considered in the morning included:

  • How can a geoscience high performance computing enterprise be best structured to meet the needs of your discipline?

  • What will be required to make such an enterprise function effectively?

  • What are the characteristics of a computing resource allocation model that will best meet the needs of your discipline?

  • What types of service and support will your discipline require, and what types of applications will need to be supported?

The breakout sessions in the afternoon addressed:

  • How can the community, by developing geoscience cyberinfrastructure, best facilitate collaboration in education, outreach, training, and workforce development?

  • How can the economy of scale for a geoscience high performance computing enterprise best be demonstrated?

  • What is the best model for establishing a fair relationship between available resources, required science, and allocation of computing?

  • How can a geoscience HPC enterprise be designed to provide the optimal capability for addressing emerging opportunities and for supporting "hero" computing requirements?

After each round of breakout sessions, the breakout leaders summarized the deliberations in a plenary session.

Workshop Findings and Recommendations

The community offered several important recommendations and insights on how to build a geocollaboratory, covering both what they need from it and what they can contribute.

  1. There was general consensus that, discipline by discipline, the community has problems suitable for the petascale and needs large-scale computing resources.

  2. Participants repeatedly stressed the need for tools for data analysis and visualization.

  3. It was widely felt that the community needs to develop several compelling science problems that require the collaboration of the entire community. Two candidates were suggested:

    • Earth's Past—Paleoclimate integrates not just ocean, atmosphere, and ice, but also the geological record. By adding plate tectonics and the Sun's variability (stellar evolution) as components needed to understand the Earth's past, this could constitute a major grand challenge. Events like snowball Earth, Paleocene-Eocene Thermal Maximum (PETM), the Cretaceous-Tertiary (KT) event, and the ice ages inform our understanding of the Earth/Sun system's potential for variability.

    • Water Cycle—Understanding the Earth's water cycle is critical for human survival and brings hydrologists and climate experts together in a grand challenge mission.

  4. Several participants expressed concern that the science drivers described in the Establishing a Petascale Collaboratory for the Geosciences documents were not inclusive enough and needed to be updated. Some also commented that the workshop's outside organizing panel was not broad enough to represent the full spectrum of geoscience subdisciplines.

  5. Participants commonly expressed the view that the collaboratory must provide additional application support for petascale computing.

  6. The majority of participants believed that the human dimensions of collaboratory formation (e.g., providing suitable education and outreach opportunities, engaging social scientists and community planners in forming and using collaboratory resources, and ensuring appropriate governance and communication within the collaboratory) will be as important to the endeavor as the provision of hardware and software resources.

Workshop Deliverables

A consensus was reached on several points for moving the Geocollaboratory formation process forward:

  • That a new steering committee be formed to write the Geocollaboratory Implementation Plan

  • That a workshop summary document be written and released by mid-November 2006

  • That a Town Hall Meeting be held during the December 2006 American Geophysical Union (AGU) Fall Meeting to provide an opportunity for further discussion of the Geocollaboratory concept with the community

  • That a series of pilot projects be identified to demonstrate the utility of forming the Geocollaboratory

  • That workshops be proposed and conducted to build community consensus on the Geocollaboratory science drivers

Impact of the Workshop

The workshop brought together a critical mass of geoscientists and focused them on the issue of cyberinfrastructure needs for the geosciences. The clear message coming out of the workshop is that the geoscience community needs and is ready for better cyberinfrastructure that enables collaboration and provides access to next-generation HPC and data storage resources.

Beyond that, the geoscience community made it clear that common tools for creating end-to-end modeling and forecast systems are critical to their scientific progress. An organizational structure that facilitates the development of such tools would be welcome.

As a consensus-building and community-building exercise, the workshop was a successful first step.

The deliverables listed above are relatively low in cost and feasible, and they will build momentum for the community moving forward.