Climate Data Compression Research
This image compares the
temperature field output from a high resolution POP ocean simulation
(left) with a version of the data that has been lossily compressed by
a factor of 20 to 1 (right). CISL's nascent Climate Data Compression
Research project is investigating wavelet-based lossy data compression
techniques applied to geosciences data. Lossy compression techniques
may become essential to exploring, managing, and distributing
ever-growing scientific data sets.
The Climate Data Compression Research project is a new effort, begun
in FY 2006. Building on the success of CISL's VAPOR work, which employs
wavelet-based progressive data access to permit the exploration of
terascale data sets, CISL began investigating the application of
wavelet-based lossy data compression techniques to climate model
simulation outputs. The methods employed are similar to those now widely
used in the compression of digital media. The goals of this nascent work
are to:
- Determine whether, and to what degree, scientific data sets can
tolerate information loss
- Investigate a variety of compression methods and determine which
may be most appropriate for geosciences data
- If these methods prove successful, develop user tools for data compression.
The exponential growth in transistor density predicted by
Moore's law has led to ever-increasing computer processing power and
has enabled computational scientists to numerically simulate physical
phenomena at unprecedented scales, thereby generating extraordinary
amounts of data. For example, the recent IPCC work yielded over 100
terabytes of climate model output. While microprocessor performance
continues to double roughly every 18 months, other computing
technologies are improving at much more modest rates. In particular,
storage and networking bandwidths have lagged behind. As a result,
storing, analyzing, managing, and sharing large simulation data sets
is becoming increasingly difficult, hampering scientific
productivity. Lossy signal compression techniques, such as those
ubiquitously used for digital media and now being investigated by CISL,
may provide relief for researchers drowning in a deluge of data.
While extending the integer-based lossy compression techniques of
digital media to floating-point scientific data is relatively
straightforward, the scale of the data and the desire to preserve
essential data properties, such as smooth derivatives, introduce many
subtle challenges. In FY 2006, CISL has been working with domain scientists
to identify promising wavelet decompositions for maintaining essential
data properties while yielding high compression rates. Progress has also
been made in developing computationally efficient algorithms for handling
very large data. CISL is currently collaborating with three CGD groups,
each with unique needs. CGD's Frank Bryan is providing POP ocean
simulation data with very high spatial resolutions (e.g. 3600 x 2400).
Grant Branstator's atmospheric data, on the other hand, has
hundreds of thousands of time steps but low spatial resolution. Finally,
CISL is working with Earth System Grid (ESG) staff to deliver compressed CCSM
data over the web via the ESG. All of these efforts are works in progress.
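The trade-off described in this paragraph, between a high compression rate and preserved data properties such as smooth derivatives, can be illustrated with a small sketch. It is not CISL's method: it assumes the Haar wavelet (the simplest one, chosen only for brevity), a synthetic one-dimensional "temperature" profile, and hypothetical helper names (haar_forward, haar_inverse, keep_largest, relative_l2_error). It keeps the largest 1 in 20 wavelet coefficients, reconstructs the signal, and compares the error in the values with the error in the first derivative.

```python
import numpy as np


def haar_forward(signal, levels):
    """Multi-level orthonormal Haar transform of a 1-D array.

    The array length must be divisible by 2**levels.
    """
    coeffs = np.asarray(signal, dtype=np.float64).copy()
    n = coeffs.size
    for _ in range(levels):
        pairs = coeffs[:n].reshape(-1, 2)
        averages = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        details = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        coeffs[: n // 2] = averages      # coarse approximation
        coeffs[n // 2 : n] = details     # detail coefficients for this level
        n //= 2
    return coeffs


def haar_inverse(coeffs, levels):
    """Invert haar_forward."""
    out = np.asarray(coeffs, dtype=np.float64).copy()
    n = out.size >> levels               # length of the coarsest approximation
    for _ in range(levels):
        averages = out[:n].copy()
        details = out[n : 2 * n].copy()
        out[0 : 2 * n : 2] = (averages + details) / np.sqrt(2.0)
        out[1 : 2 * n : 2] = (averages - details) / np.sqrt(2.0)
        n *= 2
    return out


def keep_largest(coeffs, ratio):
    """Zero all but the largest-magnitude 1/ratio of the coefficients."""
    keep = max(1, coeffs.size // ratio)
    order = np.argsort(np.abs(coeffs))
    kept = np.zeros_like(coeffs)
    kept[order[-keep:]] = coeffs[order[-keep:]]
    return kept


def relative_l2_error(reference, approx):
    """L2 norm of the difference, relative to the L2 norm of the reference."""
    return np.linalg.norm(approx - reference) / np.linalg.norm(reference)


# Synthetic smooth profile (an assumption; not real model output).
x = np.linspace(0.0, 1.0, 4096)
field = 15.0 + 10.0 * np.sin(2 * np.pi * x) + 2.0 * np.sin(2 * np.pi * 9 * x)

levels = 8
coeffs = haar_forward(field, levels)
compressed = keep_largest(coeffs, ratio=20)   # roughly 20:1 in coefficient count
restored = haar_inverse(compressed, levels)

field_error = relative_l2_error(field, restored)
slope_error = relative_l2_error(np.gradient(field, x), np.gradient(restored, x))

print(f"relative error in values:      {field_error:.4f}")
print(f"relative error in derivatives: {slope_error:.4f}")
```

In this toy setting the reconstructed values stay close to the original while the derivative is degraded far more, because a truncated Haar expansion is piecewise constant. This is one reason the choice of wavelet decomposition matters when derivatives must remain smooth. The 20:1 figure here counts retained coefficients only; a real codec would also need to encode their positions and quantize their values.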
Should these compression strategies prove viable, work in FY 2007 will
focus on developing end-user tools. We will also broaden our base of
scientific collaborators to further refine and validate this novel
approach to tackling large scientific data sets.
This research supports NCAR's strategic priorities of "Developing
and providing advanced services and tools" and "Conducting research in
computer science, applied mathematics, statistics, and numerical methods."
It is made possible by NSF Core funding.