SCD FY97 Annual Scientific Report

Research data

The Data Support Section (DSS) maintains a large, organized archive of computer-accessible research data that is made available to scientists around the world. The archive represents an irreplaceable store of observed data and analyses and is used for major national and international atmospheric and oceanic research projects.

There are now over 490 distinct datasets in the archive, ranging in size from less than 1 MB to over 1 TB. The total volume of data in the DSS archive was 2.4 terabytes (TB) in August 1990 and 8.5 TB in October 1997. We have been adding large amounts of reanalysis data and other analyses. The growth in stored data over time has been as follows:


Data stored for Data Support and total mass store

            Data Support Section      Total mass store
Date        Bit files  Volume (TB)    Bit files  Volume (TB)    DSS % of mass store
13 Aug 90      61,335        2.437           --       14.430    16.9
4 Aug 91       65,518        2.689      715,000       19.400    13.9
3 Aug 92       80,538        3.085    1,060,000       27.270    11.3
Aug 93        103,314        4.072    1,351,271       36.280    11.2
15 Sep 94     119,703        4.751    1,849,466       47.423    10.0
14 Feb 95     123,877        5.085    1,966,990       52.456     9.7
24 Jan 96     137,680        5.950    2,486,471       67.590     8.8
28 Aug 96     143,340        6.770    2,888,639       78.964     8.6
28 Feb 97     151,509        7.513    3,289,224       91.399     8.2
17 Oct 97     159,945        8.482    4,046,678      110.359     7.7
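
The percentage column can be checked directly from the two volume columns. The short Python sketch below recomputes the DSS share for the first and last rows of the table. The function name dss_share_percent is only an illustrative choice and is not part of any DSS software.

```python
def dss_share_percent(dss_tb: float, total_tb: float) -> float:
    """Return the DSS volume as a percentage of the total mass store volume."""
    return 100.0 * dss_tb / total_tb


# (date, DSS volume in TB, total mass store volume in TB), copied from the table
rows = [
    ("13 Aug 90", 2.437, 14.430),
    ("17 Oct 97", 8.482, 110.359),
]

# Round to one decimal place, matching the precision used in the table.
shares = {date: round(dss_share_percent(dss, total), 1) for date, dss, total in rows}
for date, share in shares.items():
    print(f"{date}: DSS holds {share}% of the mass store")
```

The DSS archive grew roughly 3.5-fold over this period, yet its share of the mass store fell because the total mass store grew faster.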

The DSS staff provides assistance and expertise in using the archive and helps researchers locate data appropriate to their needs. Users may obtain copies of data by network access or on various tape media, or they may use data directly from the NCAR MSS. DSS staff also assist scientists by providing data access programs (to read and unpack data), other software for data manipulation, and dataset documentation. At a later point we will present more information about the use of the DSS archives.

Data requests handled during October 1996 - September 1997:

DSS staff handled many requests for information about data, data processing tools, and online access programs. Staff handled 319 requests for data to be sent offsite. These requests required data from 389 datasets. Data were selected from 22,495 archive volumes (holding 2,030 GB), and 1,841 GB were shipped to users. Users received the data on 10 round tapes, 102 cartridges, and 620 Exabyte tapes. In addition, at least 115 users received data by electronic transfer. We shipped 34 copies of the National Meteorological Center (NMC) gridpoint Compact Disk-Read Only Memory (CD-ROM). The University of Washington sold additional copies of these CD-ROMs.

The following table shows a time history of our data requests from August 1991 through September 1997. The amount of data that we send to users has increased from about 150 GB per year to over 1,800 GB. The use of half-inch tapes has dropped sharply, while Exabyte (8-mm) tapes are popular: they give us a method to send large datasets at low cost, and the tape drives cost as little as $700. Many other users access the data directly on the NCAR computers. We help them obtain our access software and related tools. They then run their own programs, and we do not count those accesses as data requests.


Data sent from NCAR DSS
This shows the handling of user requests at DSS

                                          1992          1993          1994          1995          1996          1997
                                   (8/91-7/92)   (8/92-7/93)   (9/93-8/94)  (10/94-9/95)   (9/95-8/96)  (10/96-9/97)
Requests handled                           400           417           441           399           328           319
Data from datasets (#)                     475           497            --            --           477           389
Read GB for user select                    230           242           354           520           915         2,032
On MSS files (read for users)            6,212         5,099         7,116         8,268         9,192        22,495
Select data to send (GB)                   150           154           258           382           750         1,841
 a. 1/2-inch tapes sent                    727           333           262            92             7            10
 b. 3480 cartridges sent                   280           112           280           268           280           102
 c. 8-mm tapes sent                        103           117           260           240           336           620
 d. PC floppy disks sent                   147           102            85            49            42             0
FTP transfers                               16            80            89           190           120           115
CD-ROMs sold (1946-on analyses)             15            29            35            11            35            34
Reanalysis CD-ROMs                          --            --            --            --            --           755
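
One way to read the table is that the number of requests has fallen slightly while each request has become much larger. The Python sketch below makes this explicit by dividing the volume sent by the number of requests handled in each reporting year. The variable names are illustrative, and the figures are copied from the "Requests handled" and "Select data to send (GB)" rows.

```python
# (reporting year, requests handled, GB sent), copied from the table
history = [
    ("1992", 400, 150),
    ("1993", 417, 154),
    ("1994", 441, 258),
    ("1995", 399, 382),
    ("1996", 328, 750),
    ("1997", 319, 1841),
]

# Average volume shipped per request in each reporting year.
gb_per_request = {year: sent_gb / requests for year, requests, sent_gb in history}
for year, value in gb_per_request.items():
    print(f"{year}: {value:.2f} GB per request")
```

By this measure the average request grew from under 0.4 GB in the 1992 reporting year to almost 6 GB in 1997.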

Major accomplishments completed by DSS in FY97

NCEP/NCAR global atmospheric reanalysis project

This huge project will result in 50 years of global analyses at six-hour intervals. The production of analyses started in June 1994. Twenty-two years were finished by September 1996, and forty years were finished by October 1997. A long paper on the project appeared in the Bulletin of the American Meteorological Society in March 1996, with a CD-ROM included; the Bulletin is mailed to about 13,000 people worldwide.

Ocean data

To support general research and reanalysis, the Comprehensive Ocean Atmosphere Data Set (COADS) has been updated. The 1950-1970 and 1990-1995 periods were reprocessed and supplemented with additional data. Surface winds estimated from the NSCAT and ERS2 satellite sensors have also been made available, as have near-real time and long-term SST analyses.

Maintain the archives and help users

We now have about 500 datasets. During the year we updated many of them and added some new ones. Some of the incoming data flows are very large, so maintaining the archive is a substantial task.

Global analysis and observations from NCEP

The normal and advanced analysis archives from NCEP operations were routinely updated. The advanced archives started in 1990. Forecasts are included. In addition, we archive all of the global observations from NCEP. Data also come from ECMWF. Most of these sets are now current through September 1997.

Data exchange

Under the U.S.-Russian bilateral agreement, the data exchange has been active. Jenne and two other U.S. Principal Investigators visited Russia in August 1997 to draw up plans for another year of work.

A productive data exchange has started with the Chinese Academy of Sciences (Institute of Atmospheric Physics, IAP, Beijing). Jenne visited Beijing in September 1996. A considerable amount of data had been exchanged by December 1996. Documents are available.

NCAR and NOAA also reached an exchange agreement with the State Oceanic Administration of the Chinese National Oceanographic Data Center in Tianjin, China. We will furnish a land surface station archive in exchange for the digitization of 1.8 million ship observations from U.S. logbooks.

More information on the huge NCEP/NCAR global atmospheric reanalysis project

The NCEP/NCAR project will produce 50 years of global reanalyses, with output every six hours. The project started in 1991, building on many earlier years of data gathering, model development, and related experience. The task of NCAR (through the DSS) is to provide the observations. There are thousands of surface and upper-air observations (temperature, wind, and cloud data from balloons, aircraft, and satellites) that are being used. The operational production of analyses started in June 1994.

Benefits of reanalysis and staff time used

Reanalysis has been a very big project for us at NCAR (the same can be said for NCEP). There are many benefits from the project: a very good set of analyses, other output such as precipitation and radiation, and datasets of observations that have much better quality (and are easier to use). The observations can be used for future reanalyses and for all sorts of other research. The work has been helpful to preserve national data treasures.

This project has a heavy impact on our time at NCAR. In late 1995, we had to speed up the work on all of the older datasets. We have been using 5 or 6 FTE of effort on reanalysis from September 1995 to October 1997. This is a big drain on our small group, but it is a great project.

Some key dates in the NCEP/NCAR 50-year reanalysis project

Use of NCEP/NCAR reanalysis data

The main archive for reanalysis data is at NCAR, but it is also at three other locations (Goddard, NOAA/ERL, and NOAA/NCDC). Summarized here are the data accesses from the NCAR MSS, and data distributed on tape and CD-ROM.


Reanalysis data usage

Method                  Amount of use
On computers at NCAR    ~3500 GB/year
Sent on tapes           1931 GB by Oct. 1997
Sent on CD-ROMs         755 CD; 500 GB by Oct. 1997

Data first became available on the NCAR MSS in late 1994. The annual summaries for 1995, 1996, and the first half of 1997 are shown for the NCAR and University user communities. The number of unique users, numbers of MSS files, and gigabytes accessed on a read transfer are shown. We expect the dramatic growth trends shown to continue as more research activities begin on this archive.


Reanalysis data usage on MSS

        NCAR                     University               Total
Year    Users   Files  GBytes    Users   Files  GBytes    Users   Files  GBytes
1995        4   1,417     292        9     569     123       13   1,986     416
1996       14   3,753     811       45   8,516   1,869       59  12,269   2,679
1997(1)    14   1,287     264       48   7,795   1,707       64   9,476   2,063

(1) January - June 1997 only

Sending reanalysis data by tape. The first order was in December 1994. The cumulative orders are given below.


Reanalysis data sent by tape

Date         Number of orders   Cumulative data volume (GB)
1 Jan 96                   13                            85
1 Jan 97                   68                         1,187
13 Oct 97                 117                         1,931
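
Because the figures in this table are cumulative, the activity within each interval is the difference between consecutive rows. The Python sketch below shows that calculation. The variable names are illustrative, and the values are copied from the table.

```python
# (date, cumulative orders, cumulative GB), copied from the table
cumulative = [
    ("1 Jan 96", 13, 85),
    ("1 Jan 97", 68, 1187),
    ("13 Oct 97", 117, 1931),
]

# Difference each row against the previous one to get per-interval activity.
intervals = [
    (prev[0], cur[0], cur[1] - prev[1], cur[2] - prev[2])
    for prev, cur in zip(cumulative, cumulative[1:])
]
for start, end, orders, gigabytes in intervals:
    print(f"{start} to {end}: {orders} orders, {gigabytes} GB")
```

Read this way, calendar year 1996 accounted for 55 orders and 1,102 GB, and the period from January to mid-October 1997 accounted for a further 49 orders and 744 GB.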

CD-ROM sales through October 13, 1997. One CD-ROM is prepared with a subset of data for each year of reanalysis. Cumulative sales of the NCEP/NCAR reanalysis CD-ROMs (one per year of data) are given below:


    Reanalysis CD-ROM sales

    Date         Unique CDs   Orders   CD-ROMs sold
    21 Apr 97             8       14             81
    27 May 97            10       31            203
    23 Jul 97            10       58            387
    12 Aug 97            10       72            502
    13 Oct 97            12      106            755

    Plans for 1998-1999 for reanalysis

    Atmospheric and oceanic dataset advances

    The task to update many datasets, and add new sets

    NCAR has a very large archive (now about 480 datasets). Many people do not realize that the task of updating many datasets takes a lot of time. We have to bring in the data, run inventories to check for problems that can be fixed, and update the information about the various archives.

    During the period September 1996 through August 1997, 15 new datasets were added to the DSS archive. New sets include the 2.5 x 2.5-degree ECMWF 15-year reanalysis, data from recent exchanges with China, the UK Marine Data Bank, additional archives from the TOGA COARE project, and the Arctic Sea Ice Climatology. Updates: Over 100 different datasets were updated in the past year. Six sets were updated several times each month, and eight were updated monthly.

    Plans for 1997-98: The work to prepare updates will have to continue for each of these years. In fact, there may be a few more datasets that will need updates.

    The Comprehensive Ocean-Atmosphere Data Set (COADS)

    COADS is recognized worldwide as the most extensive set of surface marine data over the past 142 years. This dataset is the result of a cooperative effort, begun in the early 1980s, between NOAA's National Climatic Data Center and Climate Diagnostics Center (of ERL), and NCAR. NCAR contributes to the COADS project in three ways. Computers at NCAR are used for nearly all the data processing. The Mass Storage System serves as the permanent archive for all the data. The Data Support Section is responsible for a majority of the data and documentation distribution, and for data access consultation.

    COADS Release 1 (April 1985) contained global marine data for the 1854-1979 period. Interim extensions to Release 1 added data for 1980 through 1991. Recent accomplishments have further extended the time series and made improvements to Release 1 and the interim extensions. Release 1a adds data to the time series for years 1980-1995. Release 1b has upgraded and replaced data for the 1950-1979 period. Along with adding observations to COADS, we have made data processing improvements that increase the overall quality of the data, e.g. upgraded the format to include more data fields, fixed some known data errors, and changed processing rules to achieve an improved mixture (from the many data sources) of observations.

    The next major development for COADS will be a reprocessing of the 1854-1949 time period. This phase will include a project that will merge the U.K. Marine Data Bank with COADS as well as include newly digitized data for this early period. In parallel, we will continue to develop our online documentation, accessible via FTP and the WWW, so that the COADS data users can conveniently keep informed about our achievements.

    Data support for oceanographic research

    Requests for oceanographic research data at NCAR and throughout the UCAR membership are given a high priority. Requests are filled in several ways. Data are collected from outside NCAR, documented, verified with access programs, and placed on the MSS for use on all local systems. Data products developed at NCAR, but outside of SCD, are migrated to the MSS, documented and placed under control and maintenance of the DSS. These products are then similarly available to all local systems at NCAR. Numerous data requests are also received from people not having NCAR computing accounts. In these cases, data, documentation, and access software are furnished to the user by network or magnetic media transfers.

    A few current achievements and forthcoming projects are briefly described to illustrate the scope of these activities.

    By responding to data service needs like these and many other smaller activities, the DSS supports a wide variety of research throughout the national and international community.

    Data for the very high atmosphere, 70-1000 km

    Since as early as 1966, NSF has funded five special radar sites to probe the upper atmosphere between about 80 and 1000 km. The main database from several radars is from 1981 to the present. Variables such as temperature, electron density, and ion velocity are measured. In 1984, NSF began a special program involving the DSS and NCAR's HAO to establish and maintain a database at NCAR so that data from several radars can be easily used together.

    Under the CEDAR program, this effort has expanded to include related ground-based measurements and model output. Examples include Fabry-Perot interferometer observations; Light Detection and Ranging (LIDAR) observations by the University of Illinois; Thermospheric/Ionospheric General Circulation Model (TIGCM) output from Ray Roble (HAO); Assimilative Mapping of Ionospheric Electrodynamics (AMIE) model output from Art Richmond (HAO); and Global Scale Winds Model output from Maura Hagan (HAO). Mesosphere-troposphere radar, medium-frequency radar, and high-frequency radar data have been added. Most recently, data have been added from the Japanese MU (Middle and Upper Atmosphere) incoherent scatter radar.

    A minicomputer is maintained at NCAR for access to this database. Batch and interactive software and documentation have been written and installed. Internet access is maintained at two levels: Documentation and data inventories may be obtained via anonymous FTP or web page, but a login is required to obtain data. Current plans are to add an interactive data selection capability to the web interface.

    We assist data contributors by designing new record layouts, providing conversion software and verifying results. DSS staff will periodically archive new data contributions, fill data and software requests, consult with users, and prepare and distribute an annual catalog.

    J.M. Holt (MIT Haystack Observatory) and B.A. Emery. CEDAR Database Committee Report. NCAR/TN-427+PPR, June 1996.

