
Research data support and services

The Data Support Section (DSS) maintains a large, organized archive of computer-accessible research data that is made available to scientists around the world. The archive represents an irreplaceable store of observed data and analyses and is used for major national and international atmospheric and oceanic research projects. DSS offers data, processing capability, and services that are not available from other groups. The group began work in 1965 and has been carrying out large projects and building the data archives ever since.

There are now over 550 distinct datasets in the archive, ranging in size from less than 1 MB to over 1 TB. The total volume of data in the DSS archive was 2.4 terabytes in August 1990 and 22.5 terabytes in September 2003.

Data stored for Data Support, and total mass store

              Data Support Section     Total NCAR Mass Store
Date          Bit files   Volume       Bit files    Volume       DSS/MSS
13 Aug 1990   61,335      2.437 TB     ---          14.430 TB    16.9%
3 Aug 1992    80,538      3.085 TB     1,060,000    27.270 TB    11.3%
15 Sep 1994   119,703     4.751 TB     1,849,466    47.423 TB    10.0%
28 Aug 1996   143,340     6.770 TB     2,888,639    78.964 TB    8.6%
17 Oct 1997   159,945     8.482 TB     4,046,678    110.359 TB   7.69%
2 Sep 1998    167,073     10.032 TB    5,038,611    147.439 TB   6.80%
7 Sep 1999    185,608     11.942 TB    6,737,448    206.885 TB   5.77%
25 Aug 2000   192,404     13.875 TB    8,187,688    267.796 TB   5.18%
6 Sep 2001    210,224     17.475 TB    10,781,364   370.706 TB   4.71%
1 Nov 2002    225,257     20.543 TB    14,996,451   568.519 TB   3.61%
18 Sep 2003   236,881     22.453 TB    20,167,740   869.241 TB   2.58%

Note: The total data on the mass store passed 300 TB on January 2, 2001. Over the next 8.1 months, it added another 70.7 TB.
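
The DSS/MSS column is the DSS volume expressed as a percentage of the total mass store volume. As a minimal sketch (the variable and function names are illustrative and not part of any DSS software), the ratio can be recomputed from a few rows of the table:

```python
# Volumes in TB copied from the table: (DSS archive, total NCAR mass store).
volumes_tb = {
    "13 Aug 1990": (2.437, 14.430),
    "7 Sep 1999": (11.942, 206.885),
    "18 Sep 2003": (22.453, 869.241),
}


def dss_share_percent(dss_tb, mss_tb):
    """Return the DSS volume as a percentage of the total mass store volume."""
    return 100.0 * dss_tb / mss_tb


# Percentage share for each date, in the same order as the rows above.
shares = {
    date: dss_share_percent(dss_tb, mss_tb)
    for date, (dss_tb, mss_tb) in volumes_tb.items()
}

for date, share in shares.items():
    print(f"{date}: {share:.2f}%")
```

The falling percentage reflects the mass store as a whole growing faster than the DSS archive, not a shrinking DSS archive.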

The DSS staff provides assistance and expertise in using the DSS archive and helps researchers locate data appropriate to their needs. Users may obtain copies of data by network access or on various tape media, or they may use DSS data directly from the NCAR mass store. DSS staff also assist scientists by providing data access programs (to read and unpack data), other software for data manipulation, and dataset documentation. DSS has 10 staff members: 8.2 FTE of technical staff work on projects in meteorology and oceanography, 0.8 FTE of technical staff works on data for the very high atmosphere (CEDAR), and 1 FTE handles administrative work.

A summary of main accomplishments by Data Support during FY2003

  • Data development work to upgrade reanalysis observations and to prepare for climate variability and other research has continued. There are now seven major categories of observations, each with many component datasets. The NCEP/NCAR 50-year reanalysis (1948 - on) used Version 1 of the observations. Version 3 of the observations was completed in June 2003.

  • DSS continues, as it always has, to update numerous research data products on a regular basis. Data transfers using magnetic tape are becoming less frequent as scheduled network transfers replace them. The advantage is that network transfers can be highly automated, so new MSS files and online files quickly become available to users. The disadvantage is that network transfers need to be carefully monitored to ensure dataset integrity. DSS continues to work on monitoring systems that include receipt reports and data delivery reconciliation.

  • Read old tapes: More than 2,500 7-track and 9-track tape reels were processed to form MSS files. This effort rescued many old source data collections and removed the need for SCD to maintain significant reel tape capability. DSS maintains small-scale reel tape capability for future projects.

  • The DSS data and information server was upgraded with new hardware, online data, and documentation. This affords faster service, more easily accessible data, and better metadata information for the users. This is an ongoing effort that is monitored with statistical summaries of data movement to the network and individual dataset usage.

  • Software was written to read existing DSS metadata, built up since the mid-1980s, and to write new metadata in a standard format used by the CDP. This makes the DSS archives uniformly discoverable along with other data UCAR-wide.

  • The Document Project: Historically, DSS has written considerable hardcopy dataset documentation and collected many data reports. A project that will preserve these metadata is using scanning technology to create digital page images. Roy Jenne has been gathering many smaller documents into bundles of papers and writing more. The library now holds 305 documents and more than 18K image pages. This effort is ongoing. More scanning needs to be done, and overview guides that will aid the users have been written.

  • Data services have been provided to over 1,500 unique users during the past year. About 820 of the users take data directly from the MSS, obtain data prepared upon request, or receive data via CD-ROMs, ftp, and tapes. The remaining users (about 700) are identified by unique IP addresses that access the online services and explicitly download files containing research data of substantial size. There are many more web hits for metadata, documents, etc.

  • Involvement in reanalysis with both preparation of observations and product distribution stands as a key activity for DSS. The Version 3 global surface and upper-air observations for 1957 - 78, plus VTPR and TOVS (1972 - on) satellite data were provided to ECMWF for the ERA-40 project. Some Version 3 surface hourly data was also specially prepared during Mar - Aug 2002 for NCEP Regional Reanalysis (NRR), which is a 25-year reanalysis (1979 - on) covering North and Central America at high resolution. The world 3-hour surface data (7,500 stations) was given better location data by June 2003. Outputs from these reanalyses are downloaded frequently and provide users with one of the very best datasets for studying long-term climate variability and trends.

  • The DSS marine data collaborative project with Russia (NSF funded) has proceeded on schedule. Marine surface observations from about 150 Russian research vessel cruises have been digitized, delivered to DSS, and verified by format translation into the I-COADS standard format.

  • This year progress on I-COADS has been focused on data source development and a new underlying ASCII format. Through these efforts, I-COADS will be more complete, and accessing and incorporating international contributions will be easier. Next year, I-COADS will be extended to year 2002 and will be available in the new ASCII format.

  • The widely used World Ocean Atlas and World Ocean Database 2001 have been added to the archive along with other key data resources, e.g. 1999 - 2003 NCEP analysis and QSCAT blended dataset, the SST analyses from NCEP and the UK Hadley Centre, and the MICOM model output. These collections will be extended, and new ones will be added next year.

Information about our main projects

  1. More data development work for reanalysis and climate research

    Version 3 of all of the reanalysis observations for 1948 - on (now 55.8 years) was completed June 2003. This completes a major goal, but it does not complete enough work to satisfy all of the present research goals. For example, we do not have enough surface observations for South Africa for years before 1967. During 2000 - 2003, people in the U.S. have been planning for a new long reanalysis. It will need all of the present observations plus some upgrades as noted. One part of the new national plans is to also do some reanalysis work for 1900 - 1947. There are projects in NOAA and NCAR to increase the amount of digital data for those early years. The new work for the COADS marine dataset will also help meet these needs. Several groups (including NCAR) have been gathering information about what data could be prepared.

  2. The seven sets of observations for reanalysis and climate research

    We have tried to assemble all the observations we could for 1946 - on, and we have some data for earlier years. Most of the data are for 1948-2003. These datasets provide a significant benefit to the research community. The main types of observations for reanalysis include:

    • Rawinsondes (balloons for temp, RH, wind aloft)
    • Pibals (wind aloft)
    • Aircraft reports (wind, temperature)
    • Satellite cloud winds
    • Satellite soundings (temp aloft and trace gases). The first of these data started in April 1969
    • Sfc 3-hr synop (7500 stns, temp, pressure, wind, weather, etc.)
    • COADS ocean sfc (ships, buoys, temp, wind, pressure, SST, etc.)


  3. The Reanalysis paper of March 1996 is the most cited paper in geosciences. Some good news about the NCEP/NCAR Reanalysis:

    • Eugenia Kalnay was the lead author of a long paper, "The NCEP/NCAR 40-Year Reanalysis Project," published in the March 1996 Bull. AMS. In July 2003, Eugenia was informed that this paper was the most highly cited paper of all of the papers published in the geosciences in the past decade, with 1,795 citations. In July 2003 the total citations had increased to 2,064. The reanalysis has been a very useful project.

    • The reanalysis project officially started in February 1991. Analyses for the first 50 years were produced from June 1994 to July 1998 on a 1.0-Gflop (real) computer that could do 30 days of global analyses each day (resolution: T62, 208 km, 28 levels).

    • NCEP updates this reanalysis each month. As of October 2003, global data is available for each six hours during 1948 - September 2003 (55.7 years).

    Since 1998, our NCAR crew has been adding more observations, reducing error tolerances, and refining the good product created during the eight years when the time pressures were very severe. We now have many of the world's meteorological observations taken from 1948 on. Progress on documents and backups is also good, but that work is not finished.

  4. Obtain ERA-40 reanalysis data from ECMWF

    The previous reanalysis project at ECMWF was ERA-15. It used T106 resolution (125 km). From June 1994 to September 1996, ECMWF produced the ERA-15 reanalysis for 1979 - 1993 (15 years). The total volume from ERA-15 in the NCAR archives is about 418 GB.

    ECMWF started planning for a long reanalysis (ERA-40) in about 1996. NCAR provided nearly all of the observations for the early years (1957 - 78) and part of the observations for later years. NCEP converted the observations to Bufr and passed them to ECMWF, where they were used to analyze the years 1957 - 2002 (45 years). Production used three separate parallel datastreams. A problem in surface observations for early years was discovered and fixed in June - July 2001. Production for the years 1957 - 1978 restarted about September 2001. Production was completed in April 2003. The resolution was T159 (83 km) and 60 levels.

    The data flow to NCAR

    NCAR expects to obtain about 15 to 35 TB from ERA-40. The data may start to arrive about November 2003.

  5. Mesoscale model data

    NCAR has archives of mesoscale model data from NCEP with data from October 1971 - on. The early data had a resolution of 190 km, which is not a very good resolution by 2003 standards. For example, the global NCEP/NCAR reanalysis (production started in June 1994) had a resolution of 28 levels and T62 (208 km). This global resolution is almost as good as the mesoscale model of 1971.

    NCAR has been the model data center for mesoscale model data since 1994 and has received some NOAA funds to pay for part of this work. NCAR receives data from three main mesoscale models, (1) the Eta model data of NCEP, (2) the Maps model from NOAA FSL in Boulder, and (3) the Gem model in Canada.

    The archives from Eta start in May 1995, those from Maps start in August 1996, and the data from Canada start in April 1997. The models have been run at a resolution of about 32 km and interpolated to an NWS grid with a resolution of 40 km. The table below shows the amount of mesoscale data at NCAR, and the amount sent to users:

    Mesoscale model data at NCAR

    We show the total data in the mesoscale model archive for North America on four successive years. The cumulative data sent to users is also given. These data have a resolution of 30 to 40 km.

    Date           Data in archive (GB)   Cumulative data sent to users
    Jan 2000       510                    450 GB
    Feb 20, 2001   737                    748 GB
    Apr 30, 2002   1,049                  2,860 GB

    The numbers are from Chi-Fan Shih, NCAR.


  6. The NCEP Regional Reanalysis (NRR) of North America

    This NCEP mesoscale reanalysis of North America has been under development for several years. It will analyze this region at a resolution of 32 km for 1979 - 2003 (25 years). In June 2003, a fast operational phase started with three or four parallel datastreams. This was mostly completed in September 2003. About 12 TB will come to NCAR from this project, and the first data may arrive at NCAR about November 2003. It will arrive on STK tapes that each hold 200 GB native.

  7. Progress in preparing all of the world observations for reanalysis and climate research

    Version 1 of all of the observations was used for the NCEP/NCAR 50-year reanalysis. Version 1 of observations for the last block of years (1948 - 1957) was ready by March 1998. There was still much work that could be done to add more data and decrease the errors in station locations. Summary of progress:

    • Start on Version 1 of observations in February 1991 and finish March 1998.
    • Version 2 ready and all sent to ECMWF by February 1999.
    • Version 3 was ready by June 2003. Most of Version 3 was ready in time for use in the ECMWF ERA-40 reanalysis. The years 1957 - 1978 are completely Version 3 in ERA-40. Because of a problem in the observations, ECMWF obtained the observations for all of these years again in July 2001. Production for the years 1957 - 1978 restarted in September 2001.
    • At NCAR, we have had to slow down the development work, but some datasets for post-Version 3 are being prepared.


  8. Project to analyze a huge New York snowstorm in December 1947.

    The director of NCEP has a project to reanalyze a big snowstorm in New York City in December 1947. In March 2003, NCEP asked if DSS could help them get more observations to analyze December 1947 and make forecasts. They had tried an analysis and forecast with the observations that we gave them earlier, but it did not work. They were missing parts of three of our datasets: U.S. raobs for 1946 - 1947 and two sets of pibals (wind aloft) for 1918 - 1947. We also obtained Canadian surface observations for 1947 - 1973 in January 2003, processed them, and passed them to NCEP about April 2003. In April 2003, NCDC Asheville sent us a new set of U.S. hourly surface observations for 137 U.S. stations for 1928 - 1948. We have done a lot of quality control work on it and will pass it to NCEP about November 2003. About July 2003, NCEP used the new supply of data to make analyses and forecasts for December 1947. They were very happy to get good forecasts 24 to 36 hours in advance. They will report on this at a special AMS symposium about January 2004.

  9. Work with NCDC, Asheville (NOAA) on hourly U.S. observations, 1928 - 1948.

    NCDC has key-entered this old U.S. data from forms for 127 cities. We have done a lot of quality control work on it and fixed problems.

  10. Progress on the I-COADS dataset of surface marine observations

    NOAA/NCDC and NOAA/CDC have been cooperating with NCAR since the early 1980s on the U.S. COADS project. Since the beginning, COADS has benefited from cooperation worldwide, and to reflect this status has been renamed International Comprehensive Ocean-Atmosphere Data Set (I-COADS). Over time, I-COADS has been improved with additional observations -- especially for periods with little data -- through the discovery and corrections of data problems and through enhancements to the metadata. Currently, I-COADS is available for 1794 through 1997, and an overlapping extension (1997-2002) is due for completion in late 2003.

    International collaborative efforts, data exchanges, and the U.S.-based Climate Data Modernization Project (CDMP by NCDC) are all contributing previously unavailable data to I-COADS. The Russian R/V digitization project is now in its third year and has produced 1.9 million observations from the global ocean for 1937 to 2000. A new data exchange agreement between the U.S. and China is nearly finalized and will result in digitizing 500,000 - 750,000 observations for 1850-1868 from the German Maury Logbooks. CDMP is contributing by providing digital images of logbook pages and media suitable for digitizing in China. These and other international data sources that are available to the project will be added to I-COADS when the appropriate time periods are reprocessed.

    A new format for marine surface records will become an integral part of I-COADS in the coming years. The International Marine Meteorological ASCII (IMMA) format has been under development by the I-COADS project for several years and is expected to receive acceptance by the WMO in the future. This format is suitable for all marine surface data: old historical records, modern keyed ship logs, buoy data, and GTS-transmitted data. The official archive copy of I-COADS and the forthcoming real-time I-COADS (to be provided by NCDC) will be in the IMMA format.

  11. Status of oceanographic data at DSS

    DSS has a broad and expanding selection of oceanographic data. The more than 70 datasets support climate research, model initialization and verification, and development of the I-COADS, which is a blend of many data archives covering more than a century. All the I-COADS products are available, but beyond those some of the newer and most popular collections are:

    • Blended global wind stress from QSCAT and NCEP Analysis for 1999-2003
    • SST global analyses from NCEP and the Hadley Centre (UK) for the 1800s - 2003
    • Miami Isopycnic Coordinate Ocean Model (MICOM) results
    • World Ocean Atlas (WOA) and World Ocean Database (WOD) from NODC, release 2001

    All the oceanographic datasets are organized into categories and described in a title list that is easy to review and is available online on the DSS home web page. The short title list provides convenient links to more detailed metadata and, in many cases, to the data itself. For needs not served by this approach, all the data are available to users who have access to the SCD MSS, and DSS staff can make the data available to other users on request. These services are clearly shown on the DSS information web pages.

  12. Datasets for the high atmosphere 70 - 1000 km; CEDAR

    The archives of CEDAR data at NCAR for the high atmosphere include data from about 100 instruments. Data from about five of the large incoherent scatter radars are included. There are also new types of data, such as lidar data. This archive work at NCAR was started in 1984. The data have been on a Sun server at HAO, but will probably be moved to a new fast PC server running Red Hat Linux. This is a computing division (SCD) project with HAO. One DSS staff member spends 0.8 FTE on CEDAR work. One online document, RJ0160, 45 pages, has more information about these data.

  13. The document project; More progress in preparing documents online

    The document project started in April 1999, and the production of scanned documents started in March 2000. The goals of this project are to (1) gather many documents that describe datasets, observing programs, and related information (many documents were only 1 to 10 pages long and needed to be assembled into larger "books"), (2) write more documents, and (3) scan them for online use.

    In October 2003 we had completed 305 documents with 18,233 pages. The progress on scanning documents has been as follows:

    Date         Documents        Total pages   Comment
    3/31/2000    --               206
    10/5/2000    Up thru RJ0062   4,474         Oct 2000
    6/28/2001    Up thru RJ0117   8,401         Jun 2001
    7/26/2002    Up thru RJ0220   14,414        Jul 2002
    3/06/2003    Up thru RJ0274   17,000
    6/12/2003    Up thru RJ0287   17,521        Jun 2003
    10/16/2003   Up thru RJ0305   18,233

    These documents will be a good resource to go along with the datasets and the projects. There is still much work to do, but many key subject areas have been covered.

    How to find the scanned documents

    The documents are listed on the web at Scanned Documents. We have prepared a guide to the documents (RJ0297, 72 p) where there is a little more information about the texts, and where similar subjects are grouped together.

    Subjects of some of the documents:

    • Coverage plots of observations for reanalysis, 1948 - on
    • Information about the seven main types of observations for reanalysis, and about the component sets of data
    • Data in various countries (includes data not at NCAR), 1,600 pages
    • Observing projects (GATE, FGGE, etc.), 350 pages
    • Selected science subjects, 815 pages
    • Papers about satellite data, ~1,200 pages
    • Data systems and data strategy, ~900 pages
    • Energy issues, ~700 pages
    • Papers about computing hardware and history, ~1,700 pages

  14. Work on the DSS web server

    Work started to extend the capabilities of the DSS web server in November 1999. More metadata was added, the layout was improved, and various new data inventories were added. One good milestone of improvements was reached in July 2000 and another in February 2001. Then this work had to slow down because other projects were being delayed too much.

    In 2003, the DSS data and information server was upgraded with new hardware, online data, and documentation. This affords faster service, more easily accessible data, and better metadata information for the users. This is an ongoing effort that is monitored with statistical summaries of data movement to the network and individual dataset usage.

  15. Give access to much data; Help many data users

    Much of our time in DSS is needed to build the archive and extend it. We are also an operational unit that works with many users and provides a large amount of data and consulting help. The statistics below summarize our delivery of data.

    DSS supported about 880 main users each year during 1997 - 2001

    The following table gives the number of data requests for data from DSS. It counts the main requests for data, not the much higher number of web server accesses where people can take data and information. The column for CD-ROMs shows the number of requests and the total number of CD-ROMs sent. One CD-ROM order is often for 10 CDs or more. For data use on NCAR computers, we count the number of unique users each year, not each time that data is used.

    Year   Requests: Data   Requests for CD-ROMs   Unique users on   Total
           on tape, etc.    (CDs sent)             NCAR computers    requests
    1995   376              11 (11?)               391               778
    1996   347              ? (35)                 399               781
    1997   329              146 (1150)             414               889
    1998   331              164 (1893)             383               878
    1999   308              141 (2401)             429               878
    2000   367              124 (2524)             422               913
    2001   436              47 (1170)              400               883
    2002   250              172 (1612)             388               810

    Sent 10,750 CD-ROMs in six years

    A CD-ROM has been made with data for each year of reanalysis (1950 - 2002, 53 years). We did not do 1948 - 1949. During 1997 through 2002, NCAR distributed 10,750 copies of these CD-ROMs (which held 7,095 GB of data). These have gone to 47 countries (48% to U.S., 9.7% to Japan, 5.0% to Canada, 2.5% to the U.K., 2.2% to India, etc.). Nearly all of these CD-ROMs have data from the NCEP/NCAR 50-year reanalysis project (now 55.7 years of data).
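
    The CD-ROM figures can be cross-checked against the request table. The short sketch below (names are illustrative; the rows are copied from the table for 1997 - 2002) sums the CD-ROMs sent, checks that each yearly total equals the sum of its three request columns, and derives the average volume per CD-ROM implied by the 7,095 GB figure:

```python
# year: (requests for data on tape etc., CD-ROM requests, CD-ROMs sent,
#        unique users on NCAR computers, total requests)
requests = {
    1997: (329, 146, 1150, 414, 889),
    1998: (331, 164, 1893, 383, 878),
    1999: (308, 141, 2401, 429, 878),
    2000: (367, 124, 2524, 422, 913),
    2001: (436, 47, 1170, 400, 883),
    2002: (250, 172, 1612, 388, 810),
}

# Total number of CD-ROMs sent over the six years.
cds_sent = sum(row[2] for row in requests.values())

# Years whose stated total differs from tape + CD-ROM + NCAR-computer requests.
mismatched_years = [
    year
    for year, (tape, cd_orders, _sent, ncar_users, total) in requests.items()
    if tape + cd_orders + ncar_users != total
]

# Average data volume per CD-ROM implied by the 7,095 GB quoted in the text.
gb_per_cd = 7095 / cds_sent

print(cds_sent, mismatched_years, round(gb_per_cd, 2))
```

    The sum reproduces the 10,750 CD-ROMs quoted above, and the implied average is about 0.66 GB per CD-ROM.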

    The total amount of DSS data used each year

    The total use of DSS data has increased from 1.23 TB in 1990, to 6.2 TB in 1999, and to 23.6 TB in 2002 (Figure 1 and Appendix 1). During 1997 - 2002, about 3.3 to 4.0 TB have been sent away from NCAR. The data sent away often goes to university departments and to other archives. This data is probably used at least 2 or 3 times.

    The access to our DSS data has grown about 20 times since 1990. Fortunately, the technology that we can use to deliver data has also improved a lot.

    How much DSS data is used by NCAR users?

    These users mainly use data directly from the mass store, onto NCAR computers. About 60% of the use of DSS data on the NCAR computers was by universities and 40% by NCAR users. This means that NCAR people used about 1.3 TB of DSS data in 1994 and 8 TB in 2000 (see Appendix 1). The NCAR use is actually greater than this because some groups move portions of the DSS archives to their own storage area, and then use that data.

  16. Over half of data used has been from reanalysis projects

    The output from the reanalysis projects has been heavily used and has been a big benefit for world research. Users obtained 6.5 TB of reanalysis data from NCAR Data Support (DSS) in 1997, which increased to 10.8 TB in 1999 and to 13.0 TB in 2002. About 55 to 67% of the total data used was from reanalysis done at NCEP (Table 3 and Figure 2).

    About 60% of the recent data use is output from the NCEP/NCAR 50-year reanalysis project (Table 3). This was a huge project by NCEP and by the Data Support (DSS) group at NCAR. It was recognized with a special award by the American Meteorological Society in Jan 2000. Work on the project started in Feb 1991, and analysis production started at NCEP in Jun 1994; 23 years were completed in Sep 1996, and 50 years (1948 - 97) were done in July 1998. Since then the analysis has been updated each month to provide a consistent, up-to-date analysis for seasonal forecasts and many other purposes.

    The reanalysis output (at each six hours for 55 years) has been very useful to research

    Large amounts of reanalysis data are being used, about 10 TBytes per year during 1999 - 2002. Most of this data is from the NCEP/NCAR project (also called Reanalysis 1). The NCEP-2 project started production in 05/1998 and now we have data from it for 1979 - 2002. During 2001, there were 162 GBytes of data used (Table 3) that were from the NCEP-2 reanalysis. The use of ECMWF reanalysis data is not included in this table. We note that data from the ERA-40 project (1957 - on) is not yet available (written Oct 2003), but is coming soon.

    Year   Custom orders   CD-ROM   NCAR compute   Total reanal.   Total user   Reanal. % of
           (GB)            (GB)     (GB)           used (GB)       data (GB)    all data used
    1995   573             0        415            988             6195         16
    1996   1192            0        2679           3871            7224         54
    1997   1820            759      3954           6533            11787        55
    1998   1940            1249     3872           7061            11653        61
    1999   1611            1585     7588           10784           16590        65
    2000   1389            1666     5450           8505            14420        59
    2001   1027            772      8693           10492           15741        67
    2002   1466            1064     10464          12994           23592        55

    Note: In 2001, there were 8531 GB used on NCAR computers from Reanalysis 1, and a little from Reanalysis 2 (total 8693 GB). The CD-ROMs have data from Reanalysis 1. Reanalysis data used on NCAR computers in 2002: Reanalysis 1 used 10,009 GB plus Reanalysis 2 used 455 GB (total 10464 GB).
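
    The last two derived columns of the table follow directly from the first three data columns and the total user data. A minimal sketch of that arithmetic, with illustrative names and the values copied from the table, is:

```python
# year: (custom orders GB, CD-ROM GB, NCAR compute GB, total user data GB)
usage_gb = {
    1995: (573, 0, 415, 6195),
    1996: (1192, 0, 2679, 7224),
    1997: (1820, 759, 3954, 11787),
    1998: (1940, 1249, 3872, 11653),
    1999: (1611, 1585, 7588, 16590),
    2000: (1389, 1666, 5450, 14420),
    2001: (1027, 772, 8693, 15741),
    2002: (1466, 1064, 10464, 23592),
}

reanalysis_gb = {}   # total reanalysis data used per year (GB)
reanalysis_pct = {}  # reanalysis share of all data used, rounded to whole percent

for year, (custom, cdrom, ncar, total_user) in usage_gb.items():
    reanalysis_gb[year] = custom + cdrom + ncar
    reanalysis_pct[year] = round(100 * reanalysis_gb[year] / total_user)

for year in usage_gb:
    print(year, reanalysis_gb[year], f"{reanalysis_pct[year]}%")
```

    The computed totals and percentages match the table, including the 55 to 67% range quoted for 1997 - 2002.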

  17. Reanalysis data used

  18. Use of data from the Data Support web server

    Data Support has had a web server since about 1992. The software on it was upgraded during 1999 - 2000, and we were able to provide more detailed statistics about the use of the server starting 08/2000.

    Columns 3-5 of Table 4 refer to user hits on either dataset information or on datasets parked on the server. A total of 395.8 GBytes of data went out by this mode during Jan - Sep 2002. We have 254 datasets parked on the server; these have a volume of 70.2 GBytes (in 12,868 files). Each time a user obtains a file, a hit is counted. During 2002, a total of 340.6 GBytes (column 7) of these special datasets were taken from the server by the users counted in column 6. The most popular dataset was monthly means from the NCEP/NCAR Reanalysis (users took 79.2 GB of this in 2002). Users also took 70.4 GB of Etopo-2 data, which gives elevation and depth for the world at 2-minute (about 4 km) resolution.

    Data and information taken from the DSS web server

    The total hits and data taken are given by columns 3-5. A special set of 254 datasets is on this server; columns 6 and 7 refer to web activity to take them.

    (1)               (2)      (3)          (4)            (5)         (6)                 (7)
    Period            Months   Total hits   Number of      Total       Number of unique    Gigabytes served
                                            unique users   gigabytes   users downloading   from dataset/data
                                                           served      data                directories
    08/2000-12/2000   5        493498       10054          37.40       1293                19.83
    01/2001-12/2001   12       1105266      31144          198.26      5882                150.49
    01/2002-09/2002   9        1290905      54176          395.84      15950               340.64
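
    Because the three periods cover 5, 12, and 9 months, the totals are easier to compare as monthly rates. The sketch below (illustrative names; values copied from columns 2, 3, and 5 of the table) normalizes hits and gigabytes served by the number of months:

```python
# (period, months, total hits, total gigabytes served)
periods = [
    ("08/2000-12/2000", 5, 493498, 37.40),
    ("01/2001-12/2001", 12, 1105266, 198.26),
    ("01/2002-09/2002", 9, 1290905, 395.84),
]

# Average gigabytes served and average hits per month for each period.
gb_per_month = {name: gb / months for name, months, _hits, gb in periods}
hits_per_month = {name: hits / months for name, months, hits, _gb in periods}

for name in gb_per_month:
    print(f"{name}: {hits_per_month[name]:,.0f} hits/month, "
          f"{gb_per_month[name]:.2f} GB/month")
```

    On this basis the volume served per month grew from roughly 7.5 GB in late 2000 to roughly 44 GB in the first nine months of 2002.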


  19. External linkages for Data Support

    We have had a number of links with other organizations over the years. These have made it possible to obtain much data and to combine forces to carry out huge projects such as reanalysis.

    • Work closely with NCEP during 1991 - 2002 to make it possible to accomplish the NCEP/NCAR global reanalysis of data for 1948 - 2002 (now 55 years). We also helped NCEP with observations for the Regional Reanalysis for North America for the years 1979 - 2003 (resolution 32 km). Production was during June - Sep 2003.

    • Work with ECMWF in Europe (provide observations) during about 1996 - 2002 so that they could do a long reanalysis for about 06/1957 - 2002 (45 years). They obtain much of the NCAR data for observations via NCEP.

    • Lead a big data exchange with Russia during 1982 - 2003. This has been very fruitful. Each year we have a meeting and write an updated plan. The work involves six US agencies. A meeting was in Sep 2001 in Russia, and in Nov 2002 we prepared another document.

    • There are many smaller, but important, linkages such as with China, with cloud-climate data (NASA-GISS, NYC), with satellite data archiving (NOAA), with helping solve NASA data strategy issues, etc.

  20. Dataset development in Data Support, 1965 - 2003 (history).

    Data Support has now completed 39 years of work on data acquisition and development. We have also participated in selected science projects and given data support for a wide variety of research projects both in the US and abroad. Research in a given science area will often blossom if we can just get the needed data and organize it so that it is easy to use. During 1990 - 2003, we have worked closely with the reanalysis projects. It was a huge project to develop and provide the necessary global observations so that analyses of the global atmosphere could be made every 6 hours for 1948 - 2003, now 56 years. This was done for the NCEP/NCAR Reanalysis project.

    A summarized history of our data and science work will be in another document. In early years, we opened up daily analyses for N. Hemisphere weather patterns, and research blossomed. We prepared a tape with monthly surface observations for the world…and temperature trend results soon followed. We joined a project to prepare a climatology of the S. Hemisphere, done during 1967 - 71, and the grid data plus publications followed. By working with three universities, we prepared some of the earliest motion films to show the changes in daily weather patterns and the annual climate cycle, done during 1970 - 72. In 1981 we joined with NOAA to prepare the best set of observations for the world's surface ocean for 1854 - on. The data are mostly from ships and buoys. The first version was available in 1983, and much research was then possible. Old observations and new updates are still being added.

    In this way, the data archives and the resulting research gradually expanded, with each step building on previous steps. The computer technology for calculations and data storage has improved a lot. By making a practical application of the technology, the capability for data archives and data delivery has increased by leaps and bounds. Also, the working relationships with weather forecast units (e.g., NCEP) and with other archives (NCDC, Asheville; USAF; Russia; etc.) have helped them and helped us. Please see the brief history for more information.

  21. Project to prepare backups for part of the NCAR archives (Status in Sep 2003)

    It has been observed that people working in data archives in the future may not understand enough about the basic datasets to make certain that the important data survive for 30 to 50 years. Therefore, the data often need better packaging and more summary information. We have a project to place the most important 8 to 12% of the DSS archives (about 2 TB) onto backup tapes. In most cases these data are not well covered by other archives in the US, and/or the data access would be difficult or costly.

    The "data backup project" started about 05/1999. The task to define the work was perhaps 20% done by Oct 2000. The data backup project has been very active during the past year. The main purpose is to move copies of our most important datasets into off-site archives. There are several other goals: (1) increase the staff knowledge about each dataset, (2) define many of the "most important" datasets, (3) be certain that staff can locate each dataset, (4) add some more important datasets, and (5) update some of these important datasets.

    By Sep 2002 we had completed backing up 1090 GB of data (Table 5). These were on about 28 DLT 8000 tapes (capacity 40 GB each, uncompressed). In Sep 2002 we started the conversion to SDLT technology (one tape holds 110 GB, native). We purchased two Super DLT drives and an automatic tape loader that holds 26 tapes. By Aug 2003 we had prepared a total of 1730 GB onto 18 SDLT tapes (Table 5). In Sep 2003, this became 19 tapes because we reduced the block size of some large records that were difficult for some users.

    Date           Total data on special backups   Total tapes done       One tape holds, native
    Nov 2000       0.5 GB                          None                   --
    Apr 2001       38.3 GB                         1 tape, DLT 8000       40 GB
    May 21, 2001   443 GB                          12 tapes, DLT 8000     40 GB
    Sep 26, 2002   1090 GB                         28 tapes, DLT 8000     40 GB
    Aug 22, 2003   1730 GB                         18 tapes, SDLT 220     110 GB
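
    As a rough cross-check of the backup figures above, the following minimal Python sketch compares the reported tape counts with the smallest number of tapes that could hold each volume at native capacity. The function names are hypothetical, and the calculation ignores how datasets are actually split across tapes, so it gives only a lower bound.

```python
import math

# (date, total GB on special backups, tapes reported, native GB per tape),
# copied from the backup table above.
BACKUPS = [
    ("Apr 2001", 38.3, 1, 40),
    ("May 21, 2001", 443, 12, 40),
    ("Sep 26, 2002", 1090, 28, 40),
    ("Aug 22, 2003", 1730, 18, 110),
]

def min_tapes(total_gb, tape_gb):
    """Fewest tapes that could hold total_gb if every tape were filled completely."""
    return math.ceil(total_gb / tape_gb)

def fill_fraction(total_gb, tapes, tape_gb):
    """Average fraction of native capacity used across the reported tapes."""
    return total_gb / (tapes * tape_gb)

for date, total_gb, tapes, tape_gb in BACKUPS:
    print(f"{date:13s} reported {tapes:2d}, lower bound {min_tapes(total_gb, tape_gb):2d}, "
          f"average fill {fill_fraction(total_gb, tapes, tape_gb):.0%}")
```

    The reported counts are never below the lower bound, which is what one would expect when datasets are kept together rather than packed to fill every tape.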

    Important data: We recognize that the definition of the most important datasets of analyses will gradually change with time. But key parts of older analyses should be saved indefinitely so that comparisons with other analyses can be made, and for historical reasons. The original basic datasets of observations should be saved forever, even if they are merged a dozen times into other datasets. Some of the merges may turn out to be wrong, but our original cleaned-up files will still be good.

  22. Methods for the bulk data transfer of large datasets

    To help users obtain data and to move data backups, it is useful to have better options to send data in bulk, on tapes. We have been developing better methods for several years. In 1986 the best tape was a 200 MB tape cartridge by IBM. It required about 5550 tapes to hold one terabyte (Table 6). In June 2002 a new DLT technology was introduced where one small DLT tape cartridge (size 10.5 x 10.5 cm) could hold 160 GB; therefore only 7 tapes could hold a terabyte. Within several years, DLT engineers expect that one tape will hold one TB.

    Tape technology for bulk data transfer

    The tape capacity is the native capacity (without compression). Random binary numbers usually do not compress very much, but the tape drives compress some data sets a lot. We can't fill a tape 100% full, so for the table we assume about 90% tape use and no compression.

    Date       Technology   One tape holds   Data rate (MB/s)   Tapes per terabyte (90%)
    1986       --           200 MB           3                  5550
    04/1995    DLT 4000     20 GB            --                 55
    1998       DLT 7000     35 GB            5                  32
    1999       DLT 8000     40 GB            6                  28
    ~09/2000   Super DLT    110 GB           11                 10
    06/2002    SDLT-2       160 GB           16                 7

    Note: A new technology that uses 200 GB (native) LTO tapes was introduced about Feb 2003. It is also a good technology. A previous 100-GB, LTO-1 technology existed.
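
    The "tapes per terabyte" column can be approximated with a short calculation. The Python sketch below is only an illustration: it assumes 1 TB = 1000 GB and the 90% tape use stated above, and the function name is hypothetical. The published values are rounded, so the results can differ slightly from the table (for example, about 5556 rather than 5550 for the 1986 cartridge).

```python
# Native capacities in GB, taken from the tape technology table.
CAPACITY_GB = {
    "1986 cartridge": 0.2,
    "DLT 4000": 20,
    "DLT 7000": 35,
    "DLT 8000": 40,
    "Super DLT": 110,
    "SDLT-2": 160,
}

def tapes_per_terabyte(capacity_gb, fill=0.9, tb_in_gb=1000):
    """Approximate tapes needed for 1 TB when each tape is filled to `fill`."""
    return round(tb_in_gb / (capacity_gb * fill))

for name, capacity_gb in CAPACITY_GB.items():
    print(f"{name:15s} {tapes_per_terabyte(capacity_gb):5d}")
```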

  23. A possible use of bulk data transfer methods

    The NCEP/NCAR reanalysis for the global atmosphere, made every 6 hours, now has data for 55 years (1948 - 2002). The total archive is about 2900 GB. The 5 most used component datasets have a volume of 800 GB. Update, Oct 2003: This reanalysis now runs through 09/2003 (55.8 years). The data from the NCEP-2 global reanalysis for 23 years (1979 - 2002) have a volume of 212 GB for the three most used datasets. NCEP started fast production on the regional reanalysis for N. America in June 2003. The analyses, made every 3 hours for 25 years (1979 - 2003), were mostly completed in Sep 2003 and will have a volume of about 12 TB.

    • We note that the 12 TB would fit onto 336 DLT 8000 tapes that each hold nearly 40 GB.

    • Or the 12 TB would fit onto 120 Super DLT tapes that each hold nearly 110 GB. NCAR will receive this data on STK tapes that each hold 200 GB native, thus about 65 tapes.

    • The dimensions of a DLT cartridge are 10.5 cm by 10.5 cm by 2.5 cm. Thus one cartridge has a volume of 276 cm³. So 120 cartridges (with 12 TB) would fit into a cube that is only a little larger than 33 cm (13 inches) on a side.

    • By 2006 one tape will probably hold 500 GB, so the 12 TB would fit onto only about 25 cartridges. Wow!

    These examples show that large volumes of data can be moved on a rather small number of tapes.
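
    The arithmetic behind these examples can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names: the tape counts use the tapes-per-terabyte figures from Table 6, and the cube calculation is an ideal lower bound that ignores packing. Stacking whole cartridges needs somewhat more room, which is in line with the figure of a little over 33 cm given above.

```python
# Cartridge dimensions in cm, as given in the text.
CARTRIDGE_CM = (10.5, 10.5, 2.5)

def tapes_needed(archive_tb, tapes_per_tb):
    """Tapes for an archive, given a tapes-per-terabyte figure from Table 6."""
    return archive_tb * tapes_per_tb

def cartridge_volume_cm3(dims=CARTRIDGE_CM):
    """Volume of one cartridge in cubic centimeters."""
    length, width, height = dims
    return length * width * height

def ideal_cube_side_cm(n_cartridges, dims=CARTRIDGE_CM):
    """Side of a cube with the same total volume as n cartridges (no packing loss)."""
    return (n_cartridges * cartridge_volume_cm3(dims)) ** (1 / 3)

print(tapes_needed(12, 28))                # DLT 8000, 28 tapes per TB
print(tapes_needed(12, 10))                # Super DLT, 10 tapes per TB
print(round(cartridge_volume_cm3()))       # cm^3 per cartridge
print(round(ideal_cube_side_cm(120), 1))   # cm, ideal lower bound
```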

  24. How to conduct data projects to help science (information about problems and strategy)

    The design of data systems for good data preparation, archiving, and data services can be difficult. We have developed more information about data problems and data strategy. Our goal has been to use methods that give good data archives, and good data access at a reasonable cost. To do this, one needs to control complexity and have a good strategy.

    There has been a lot of trouble in various government and business data systems. Not enough attention has been paid to using methods that achieve success, that keep the focus on real data tasks, and on limiting costs.

    Paul Strassmann writes "Consultants, IT vendors, and most government economists have long assumed that the more IT you have, the lower the transaction costs. - As I've said before, those claims are myths." (Computerworld, Apr 1, 2002)

    The data problems go beyond costs. There have been too many data system attempts that take too long and do not even solve the main problems. And unfortunately, they displace other methods that have a better track record.

    When agencies or institutions are trying to achieve good info tech results, or avoid problems, or cope with existing data system problems, it would help if there were a better collection of good and bad examples about info tech applications.

    Books have been written about trouble in data systems. Many U.S. federal projects have had problems. And commercial systems have not escaped. I have gathered together part of this material, so that readers can get a brief sense of the history, and can start learning more about how to avoid problems. Document RJ0052, 69 pages (online) has part of this information.

Total amount of DSS data that is used each year

A lot of DSS data is used on the NCAR computers (Table 2), and about 50 to 70% of that use is by non-NCAR users, mostly at universities. The custom orders for data go to non-DSS users, and nearly all of the CD-ROMs are sent to US and foreign locations, not NCAR.

The table below shows the total volume of DSS data used during 1990 - 2002. Table 2 and Figure 3 show whether this data is being sent by tapes and ftp, sent on CD-ROMs, or used on the main computers at NCAR. During 1997 - 2001, about 3 to 4 TBytes were sent each year by tapes, ftp, or CD-ROM. The total data use during this period was about 12 to 16 TBytes each year. But the total jumped to 23.6 TB in 2002, of which 20.4 TB was used on the NCAR computers.

The use of this data is really more than 16 TB per year. The data sent by tapes or ftp often goes to groups of users, and it also feeds several other archives that distribute data, both in the US and in other countries. Many other users obtain the data from these other archives. On average, the data is probably used at least 2 or 3 times after it is obtained by other users or archives.

TOTAL DATA VOLUME FOR USERS, 1990 - 2002 (GB)

NCAR Data Support sends data on tapes and by net ftp for special orders. In addition, much data has been sent on CD-ROMs at a price of about $10 per disc. Also, many users from NCAR and the universities run programs on NCAR computers and use DSS data directly from the mass store.

Year   Custom orders (GB)   CD-ROM (GB)   NCAR compute (GB)   Compute, % university   Total user data (GB)
1990   e130                 3             1100                52                      1233
1991   144                  10            2480                48                      2634
1992   135                  13            2638                59                      2786
1993   777                  11            3626                69                      4414
1994   1078                 15            4397                70                      5490
1995   1525                 4             4666                78                      6195
1996   1571                 32            5621                74                      7224
1997   2569                 799           8419                69                      11787
1998   2521                 1260          7872                60                      11653
1999   2498                 1587          12505               49                      16590
2000   1908                 1668          10844               74                      14420
2001   1970                 774           12997               61                      15741
2002   2085                 1064          20443               57                      23592

[Figure: Usage volume of DSS data]

[Figure: Usage volume of DSS data, by type]

Note: These data are all for full calendar years. Some previous tables had a mix of calendar years, fiscal years, and partial years. Table prepared May 2003.

The total volume of DSS scientific data that was used has increased from 1.23 TB in 1990 to 6.20 TB in 1995 and to 15.7 TB in 2001, as shown in the table and figure above. In 2002 the total use jumped to 23.6 TB. The figure gives the data volume in GB that was sent by tapes, CD-ROMs, and by direct mass store access at NCAR. The numbers on the bar graphs give the GB for each data component and the total.
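
The totals quoted above follow directly from the three components in the table. The minimal Python sketch below shows the arithmetic for a few selected years. The names are hypothetical, 1 TB is taken as 1000 GB, and the 1990 custom-order value is entered as 130 on the assumption that the "e" prefix in the table marks an estimate.

```python
# year: (custom orders GB, CD-ROM GB, NCAR compute GB), selected rows from the table.
USAGE_GB = {
    1990: (130, 3, 1100),     # custom orders listed as "e130" (assumed to mean estimated)
    1995: (1525, 4, 4666),
    1999: (2498, 1587, 12505),
    2001: (1970, 774, 12997),
    2002: (2085, 1064, 20443),
}

def total_gb(year):
    """Total user data for a year: custom orders + CD-ROM + NCAR compute."""
    return sum(USAGE_GB[year])

def shipped_gb(year):
    """Data sent out by tapes, ftp, or CD-ROM (everything except NCAR compute)."""
    custom, cdrom, _compute = USAGE_GB[year]
    return custom + cdrom

for year in sorted(USAGE_GB):
    print(f"{year}: total {total_gb(year) / 1000:.2f} TB, "
          f"shipped {shipped_gb(year) / 1000:.2f} TB")
```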