Research Data Archive
These charts show the service and growth metrics for the Research Data Archive (RDA) during FY2006 and FY2007. a) The number of unique users, broken down by access pathway: the NCAR MSS, publicly available web servers, one-time special requests prepared for individual users, and TIGGE. b) The amount of data delivered to users, by access pathway. c) The amount of data in the MSS archive, showing annual growth. d) The amount of data on public servers (a high-demand subset of the RDA archive), showing annual growth. Charts a) and b) indicate the RDA's great significance to the community, and charts c) and d) show the annual progress toward building more valued content into the RDA.
The Research Data Archive (RDA) is a key part of the NCAR strategic priority of "Creating an Earth system knowledge environment" because it provides an information resource through a large collection of datasets that support scientific studies in climate, weather, earth systems modeling, and increasingly other related geosciences. The RDA activities can be viewed from two different perspectives: user data services and archive content development.
In FY2007, about 5,400 unique users received 140 TB of data through four primary access pathways: the NCAR MSS, public servers on the web, one-time special requests prepared for individuals, and the TIGGE archive (see charts a and b). TIGGE is the THORPEX Interactive Grand Global Ensemble, where THORPEX stands for THe Observing system Research and Predictability Experiment. The web pathway serves the largest user group (4,700), whereas the MSS pathway delivers the largest amount of data (83 TB). This suggests that the RDA is widely known as a world data resource, and that working directly on CISL computational resources is the more effective way to access large reference datasets.
A simple measure of data content development is archive growth. The RDA more than doubled in FY2007, from 74 to 159 TB (see chart c). TIGGE is part of the RDA, but is shown separately because it alone added 66 TB. Although the TIGGE addition overshadows the 19 TB growth in the rest of the RDA, that growth is itself large compared with the 6 TB growth in FY2006 (chart c). The most-demanded datasets from the RDA are available online through publicly available web servers. The online holdings currently total 19 TB (chart d). Again, TIGGE is shown separately, and its online volume does not change between FY2006 and FY2007 because it is a capped, rolling 3-week archive holding only the most current data. Older data are copied to the MSS and managed so that they are directly available to NCAR computational facilities and, upon request as a special order, to the broader public.
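The growth figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the variable names are illustrative and are not part of any RDA tooling.

```python
# Archive sizes and growth figures quoted in the text, in TB.
# All variable names here are hypothetical, chosen for illustration.
archive_fy2006_tb = 74
archive_fy2007_tb = 159
tigge_growth_tb = 66          # TIGGE contribution in FY2007
other_growth_fy2007_tb = 19   # growth in the rest of the RDA in FY2007
other_growth_fy2006_tb = 6    # growth in FY2006

# Total FY2007 growth, and a consistency check against its two components.
total_growth_tb = archive_fy2007_tb - archive_fy2006_tb
components_match = total_growth_tb == tigge_growth_tb + other_growth_fy2007_tb

# Relative measures derived from the quoted figures.
growth_factor = archive_fy2007_tb / archive_fy2006_tb
tigge_share = tigge_growth_tb / total_growth_tb
non_tigge_speedup = other_growth_fy2007_tb / other_growth_fy2006_tb

print(f"Total FY2007 growth: {total_growth_tb} TB")
print(f"Components sum to total: {components_match}")
print(f"Growth factor: {growth_factor:.2f}")
print(f"TIGGE share of growth: {tigge_share:.0%}")
print(f"Non-TIGGE growth relative to FY2006: {non_tigge_speedup:.1f}x")
```

The check confirms that the 66 TB and 19 TB components account for the full 85 TB increase, and that non-TIGGE growth roughly tripled relative to FY2006.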
As a whole, the RDA is constantly changing: curation extends and adds to existing datasets, and stewardship improves the documentation, creates systematic organization, applies data quality assurance and verification, and develops access for users. Many routine tasks and background infrastructure developments are necessary to maintain the RDA. Some major activities for FY2008 will be:
Continued enhancement of the TIGGE archive. Ensemble weather forecast data from five more NWP centers will be added to the archive, bringing the total to 10 centers and the data rate to 300 GB/day, delivered as several million 2D gridded fields. Significantly more web and software engineering is required to improve user access through the TIGGE portal hosted by the Community Data Portal. Users need the capability to acquire multi-center temporal-spatial-parameter data subsets on selected uniform horizontal grids.
Continued frequent updates for NCEP model outputs and observational data that are available online. The global final analysis model output and supporting conventional observations account for over 33 TB downloaded annually from the RDA web server. These data are particularly popular for research that requires initial conditions to run regional models, e.g. WRF, for many locations around the globe.
Full development and deployment of services for the Japanese 25-Year Reanalysis (JRA-25). We have reached an extensive agreement with the JMA to partner with them in the distribution of all JRA-25 data products. All data are at NCAR, and significant effort is required to organize, document, and build user access interfaces.
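To put the projected TIGGE data rate in context, the sketch below converts 300 GB/day into an annual volume and into the size of a 3-week rolling window. It rests on assumptions not stated in the text: decimal units (1 TB = 1000 GB), a rate sustained for a full 365-day year, and a rolling online window that stays at 3 weeks in FY2008. The function and variable names are hypothetical.

```python
# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

# Figures taken from the text.
tigge_rate_gb_per_day = 300   # projected rate with 10 centers
rolling_window_days = 21      # 3-week rolling online archive


def volume_tb(rate_gb_per_day, days):
    """Return the data volume in TB accumulated at a constant daily rate."""
    return rate_gb_per_day * days / GB_PER_TB


# Size of the online window if it stays at 3 weeks (an assumption).
online_window_tb = volume_tb(tigge_rate_gb_per_day, rolling_window_days)

# Volume accumulated over a year if the rate is sustained (an assumption).
annual_tb = volume_tb(tigge_rate_gb_per_day, 365)

print(f"3-week rolling window: {online_window_tb:.1f} TB")
print(f"Sustained annual volume: {annual_tb:.1f} TB")
```

Under these assumptions, a year at the projected rate would add more than the 66 TB that TIGGE contributed in FY2007.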
World-class data resources are built through collaborations. A few projects underway include:
Integrated Surface Dataset (ISD): NOAA/NCDC and CISL have compared and assessed each other's surface data holdings and are collaborating to create a "best" merged collection.
CISL is hosting, in the RDA, the first international Observing System Simulation Experiment (OSSE) validation dataset. This NCAR outlet will primarily serve the university community; other U.S. and international agencies will serve their own communities.
The RDA will receive and maintain a copy of the 20th Century Global Reanalysis being run by NOAA/ESRL at NERSC. Early demonstration computations have shown this reanalysis method capable of defining most major storms and weather events back into the 1800s, which is much earlier than any previous reanalysis.
Steady progress will continue for the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), a longstanding collaboration (20+ years) between NOAA and NCAR/CISL. A new release was completed late in FY2007, and another release is planned for FY2008.
The RDA maintenance and development within CISL is supported entirely by NSF Core funding.
