Research data support and services
The Data Support Section (DSS) maintains a
large, organized archive of computer-accessible
research data that is made available to
scientists around the world. The archive represents
an irreplaceable store of observed data and analyses
and is used for major national and international
atmospheric and oceanic research projects. DSS offers
data, processing capability, and services that are not
available from other groups. The group began work in 1965
and has been building the data archives and supporting
large projects ever since.
There are now over 550 distinct datasets in the
archive, ranging in size from less than 1 MB to over
1 TB. The total volume of data in the DSS archive was
2.4 terabytes in August 1990 and 22.5 terabytes in September 2003.
Data stored for Data Support, and total mass store
| Date | DSS bit files | DSS volume | Total NCAR Mass Store bit files | Total NCAR Mass Store volume | Volume DSS/MSS |
| 13 Aug 1990 | 61,335 | 2.437 TB | --- | 14.430 TB | 16.9% |
| 3 Aug 1992 | 80,538 | 3.085 TB | 1,060,000 | 27.270 TB | 11.3% |
| 15 Sep 1994 | 119,703 | 4.751 TB | 1,849,466 | 47.423 TB | 10.0% |
| 28 Aug 1996 | 143,340 | 6.770 TB | 2,888,639 | 78.964 TB | 8.6% |
| 17 Oct 1997 | 159,945 | 8.482 TB | 4,046,678 | 110.359 TB | 7.69% |
| 2 Sep 1998 | 167,073 | 10.032 TB | 5,038,611 | 147.439 TB | 6.80% |
| 7 Sep 1999 | 185,608 | 11.942 TB | 6,737,448 | 206.885 TB | 5.77% |
| 25 Aug 2000 | 192,404 | 13.875 TB | 8,187,688 | 267.796 TB | 5.18% |
| 6 Sep 2001 | 210,224 | 17.475 TB | 10,781,364 | 370.706 TB | 4.71% |
| 1 Nov 2002 | 225,257 | 20.543 TB | 14,996,451 | 568.519 TB | 3.61% |
| 18 Sep 2003 | 236,881 | 22.453 TB | 20,167,740 | 869.241 TB | 2.58% |
Note: The total data on the mass store
passed 300 TB on January 2, 2001. Over the next 8.1 months, it
added another 70.7 TB.
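The DSS/MSS column and the figures in the note follow from simple arithmetic on the table. A minimal Python sketch, using a few rows copied from the table; the names `ARCHIVE`, `dss_share_percent`, and `note_growth` are illustrative only, and the 30.44-day average month is an assumption made here:

```python
from datetime import date

# (date, DSS volume in TB, total NCAR mass store volume in TB), from the table above.
ARCHIVE = [
    (date(1990, 8, 13), 2.437, 14.430),
    (date(1997, 10, 17), 8.482, 110.359),
    (date(2001, 9, 6), 17.475, 370.706),
    (date(2003, 9, 18), 22.453, 869.241),
]

def dss_share_percent(dss_tb, mss_tb):
    """DSS volume as a percentage of the total mass store volume."""
    return 100.0 * dss_tb / mss_tb

def note_growth():
    """Months elapsed and TB added between 2 Jan 2001 (300 TB) and 6 Sep 2001."""
    days = (date(2001, 9, 6) - date(2001, 1, 2)).days
    months = days / 30.44  # average month length; an assumption of this sketch
    added_tb = 370.706 - 300.0
    return months, added_tb

if __name__ == "__main__":
    for day, dss_tb, mss_tb in ARCHIVE:
        print(day.isoformat(), f"{dss_share_percent(dss_tb, mss_tb):.2f}%")
    months, added_tb = note_growth()
    print(f"{added_tb:.1f} TB added in {months:.1f} months")
```

The DSS share falls over time not because the DSS archive shrinks (it grows roughly ninefold over the period) but because the total mass store grows much faster.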
The DSS staff provides assistance and expertise in using
the DSS archive, and they help researchers locate data appropriate
to their needs. Users may obtain copies of data by network access,
on various tape media, or they may use DSS data directly from the
NCAR mass store. DSS staff also assist scientists by providing
data access programs (to read and unpack data), other software
for data manipulation, and dataset documentation.
DSS has 10 staff members. There are 8.2 technical
staff that work on projects in meteorology and oceanography,
plus 1 FTE for administrative work. Also, 0.8 tech staff works
on the very high atmosphere (CEDAR).
A summary of main accomplishments by Data Support during
FY2003
-
Data development work to upgrade reanalysis observations
and to prepare for climate variability and other research
has continued. There are now seven major categories of
observations, each with many component datasets. The
NCEP/NCAR 50-year reanalysis (1948 - on) used Version 1
of the observations. Version 3 of the observations was
completed in June 2003.
-
DSS continues, as it always has, to update numerous research
data products on a regular basis. Data transfers using
magnetic tape are becoming less frequent, and scheduled
network transfers are replacing them. The advantage is
that network transfers can be highly automated, so new
MSS files and online files quickly become available to
users. The disadvantage is that network transfers need to
be carefully monitored to ensure dataset integrity.
DSS continues to work on monitoring systems that include
receipt reports and data delivery reconciliation.
-
Read old tapes: More than 2,500 7-track and 9-track
tape reels were processed to form MSS files. This effort
rescued many old source data collections and removed the
need for SCD to maintain significant reel tape capability.
DSS maintains small-scale reel tape capability for future
projects.
-
The DSS data and information server was upgraded with
new hardware, online data, and documentation. This affords
faster service, more easily accessible data, and better
metadata information for the users. This is an ongoing
effort that is monitored with statistical summaries of
data movement to the network and individual dataset usage.
-
Software was written to read existing DSS metadata,
built up since the mid-1980s, and to write new metadata
in a standard format used by the CDP. This makes the DSS
archives uniformly discoverable along with other data
UCAR-wide.
-
The Document Project: Historically, DSS has written
considerable hardcopy dataset documentation and collected
many data reports. A project that will preserve these
metadata is using scanning technology to create digital
page images. Roy Jenne has been gathering many smaller
documents into bundles of papers and writing more. The
library now holds 305 documents and more than 18K image
pages. This effort is ongoing. More scanning needs to
be done, and overview guides that will aid the users
have been written.
-
Data services have been provided to over 1,500
unique users during the past year. About 820 of the
users take data service directly from the MSS,
obtain data prepared upon request, or receive data
via CD-ROMs, ftp, and tapes. The remaining users
(about 700) are identified by unique IP addresses
that access the online services and explicitly
download files containing research data of
substantial size. There are many more web hits
for metadata, documents, etc.
-
Involvement in reanalysis with both preparation of
observations and product distribution stands as a key
activity for DSS. The Version 3 global surface and
upper-air observations for 1957 - 78, plus VTPR and
TOVS (1972 - on) satellite data were provided to
ECMWF for the ERA-40 project. Some Version 3 surface
hourly data was also specially prepared during Mar - Aug
2002 for NCEP Regional Reanalysis (NRR), which is a
25-year reanalysis (1979 - on) covering North and Central
America at high resolution. The world 3-hour surface data
(7,500 stations) was given better location data by June
2003. Outputs from these reanalyses are downloaded
frequently and provide users with one of the very best
datasets for studying long-term climate variability
and trends.
-
The DSS marine data collaborative project with Russia
(NSF funded) has proceeded on schedule. Marine surface
observations from about 150 Russian research vessel
cruises have been digitized, delivered to DSS, and
verified by format translation into the I-COADS
standard format.
-
This year progress on I-COADS has been focused on
data source development and a new underlying ASCII
format. Through these efforts, I-COADS will be more
complete, and accessing and incorporating international
contributions will be easier. Next year, I-COADS will
be extended to year 2002 and will be available in the
new ASCII format.
-
The widely used World Ocean Atlas and World Ocean
Database 2001 have been added to the archive along with
other key data resources, e.g. 1999 - 2003 NCEP analysis
and QSCAT blended dataset, the SST analyses from NCEP
and the UK Hadley Centre, and the MICOM model output.
These collections will be extended, and new ones will
be added next year.
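The accomplishments above mention receipt reports and data delivery reconciliation for scheduled network transfers. The report does not describe how DSS implements these, so the following is only a minimal sketch of the general idea: compare a sender's manifest (file name, size, checksum) against what actually arrived. The function names, the manifest layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Checksum used to detect corrupted transfers (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(files):
    """Sender side: map each file name to (size in bytes, checksum)."""
    return {name: (len(data), sha256_hex(data)) for name, data in files.items()}

def reconcile(manifest, received):
    """Receiver side: classify every file as ok, missing, corrupt, or unexpected.

    manifest: dict of name -> (size_bytes, checksum) from the sender
    received: dict of name -> bytes actually delivered
    """
    report = {"ok": [], "missing": [], "corrupt": [], "unexpected": []}
    for name, (size, digest) in manifest.items():
        if name not in received:
            report["missing"].append(name)
        elif len(received[name]) != size or sha256_hex(received[name]) != digest:
            report["corrupt"].append(name)
        else:
            report["ok"].append(name)
    report["unexpected"] = [name for name in received if name not in manifest]
    for names in report.values():
        names.sort()
    return report

if __name__ == "__main__":
    # Hypothetical file names and contents, for demonstration only.
    sent = {"obs_a.dat": b"12345", "obs_b.dat": b"abcdef", "obs_c.dat": b"xyz"}
    manifest = build_manifest(sent)
    arrived = {"obs_a.dat": b"12345", "obs_b.dat": b"abcdeX", "extra.dat": b"?"}
    print(reconcile(manifest, arrived))
```

A report of this kind can be generated automatically after each scheduled transfer, so that staff only need to look at transfers with a non-empty missing, corrupt, or unexpected list.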
Information about our main projects
-
More data development work for
reanalysis and climate research
Version 3 of all of the reanalysis observations for
1948 - on (now 55.8 years) was completed June 2003.
This completes a major goal, but it does not complete
enough work to satisfy all of the present research goals.
For example, we do not have enough surface observations
for South Africa for years before 1967. During 2000 - 2003,
people in the U.S. have been planning for a new long
reanalysis. It will need all of the present observations
plus some upgrades as noted. One part of the new national
plans is to also do some reanalysis work for 1900 - 1947.
There are projects in NOAA and NCAR to increase the
amount of digital data for those early years. The new
work for the COADS marine dataset will also help meet
these needs. Several groups (including NCAR) have been
gathering information about what data could be prepared.
-
The seven sets of observations for
reanalysis and climate research
We have tried to assemble all the observations
we could for 1946 - on, and we have some data for
earlier years. Most of the data are for 1948-2003.
These datasets provide a significant benefit to
the research community. The main types of
observations for reanalysis include:
- Rawinsondes (balloons for temp, RH, wind aloft)
- Pibals (wind aloft)
- Aircraft reports (wind, temperature)
- Satellite cloud winds
- Satellite soundings (temp aloft and trace gases).
The first of these data started in April 1969
- Sfc 3-hr synop (7500 stns, temp, pressure,
wind, weather, etc.)
- COADS ocean sfc (ships, buoys, temp, wind,
pressure, SST, etc.)
-
The Reanalysis paper of March 1996 is the most
cited paper in geosciences. Some good news about the
NCEP/NCAR Reanalysis:
Eugenia Kalnay was the lead author of a long paper:
"The NCEP/NCAR 40-Year Reanalysis Project," that was
published in the March 1996 Bull. AMS. In July 2003,
Eugenia was informed that this paper was the most
highly cited paper of all of the papers published
in the geosciences in the past decade, with 1,795
citations. In July 2003 the total citations had
increased to 2,064. The reanalysis has been a very
useful project.
The reanalysis project officially started in
February 1991. Analyses for the first 50 years were
produced during June 1994 - July 1998. Production was
done on a 1.0-Gflop (real) computer that could do
30 days of global analyses each day
(resolution: T62, 208 km, 28 levels).
NCEP updates this reanalysis each month. As of
October 2003, global data is available for each six
hours during 1948 - September 2003 (55.7 years).
Since 1998, our NCAR crew has been adding more
observations, reducing error tolerances, and
refining the good product created during the eight
years when the time pressures were very severe.
Now we have many of the world's meteorological
observations taken during 1948 - on. Progress on
documents and backups is also good, but that work is not finished.
-
Obtain ERA-40 reanalysis data from ECMWF
The previous reanalysis project at ECMWF was
ERA-15. It used T106 resolution (125 km). From
June 1994 to September 1996, ECMWF produced the
ERA-15 reanalysis for 1979 - 1993 (15 years).
The total volume from ERA-15 in the NCAR archives
is about 418 GB.
ECMWF started planning for a long reanalysis
(ERA-40) in about 1996. NCAR provided nearly all
of the observations for the early years (1957 - 78)
and part of the observations for later years. NCEP
converted the observations to Bufr and passed them
to ECMWF, where they were used to analyze observations
for the years 1957 - 2002 (45 years). Production used
three separate parallel datastreams.
A problem in surface observations for early years
was discovered and fixed in June - July 2001.
Production for the years 1957 - 1978 restarted about
September 2001. Production was completed in April 2003.
The resolution was T159 (83 km) and 60 levels.
The data flow to NCAR
NCAR expects to obtain about 15 to 35 TB from ERA-40.
The data may start to arrive about November 2003.
-
Mesoscale model data
NCAR has archives of mesoscale model data from
NCEP with data from October 1971 - on. The early
data had a resolution of 190 km, which is not a
very good resolution by 2003 standards. For
example, the global NCEP/NCAR reanalysis
(production started in June 1994) had a
resolution of 28 levels and T62 (208 km).
This global resolution is almost as good
as the mesoscale model of 1971.
NCAR has been the model data center
for mesoscale model data since 1994 and has
received some NOAA funds to pay for part of
this work. NCAR receives data from three main
mesoscale models, (1) the Eta model data of
NCEP, (2) the Maps model from NOAA FSL in
Boulder, and (3) the Gem model in Canada.
The Eta archive starts in May 1995,
the Maps archive in August 1996, and the data from
Canada in April 1997. The models have been
run at a resolution of about 32 km and
interpolated to a NWS grid with resolution
40 km. The table below shows the amount of
mesoscale data at NCAR, and the amount sent
to users:
Mesoscale model data at NCAR
We show the total data in the mesoscale model
archive for North America at dates in successive
years. The cumulative data sent to users is
also given. These data have a resolution of
30 to 40 km.
| Date | Data in archive (GB) | Cumulative data sent to users |
| Jan 2000 | 510 | 450 GB |
| Feb 20, 2001 | 737 | 748 GB |
| Apr 30, 2002 | 1,049 | 2,860 GB |
The numbers are from Chi-Fan Shih, NCAR.
-
The NCEP Regional Reanalysis (NRR) of North America
This NCEP mesoscale reanalysis of North America has
been under development for several years. It will
analyze this region at a resolution of 32 km for
1979 - 2003 (25 years). In June 2003, a fast operational
phase started with three or four parallel datastreams.
This was mostly completed in September 2003. About
12 TB will come to NCAR from this project, and the
first data may arrive at NCAR about November 2003.
It will arrive on STK tapes that each hold 200 GB
native.
-
Progress in preparing all of the world observations for
reanalysis and climate research
Version 1 of all of the observations was used for the
NCEP/NCAR 50-year reanalysis. Version 1 of observations
for the last block of years (1948 - 1957) was ready by
March 1998. There was still much work that could be done
to add more data and decrease the errors in station
locations. Summary of progress:
- Start on Version 1 of observations in February
1991 and finish March 1998.
- Version 2 ready and all sent to ECMWF by
February 1999.
- Version 3 was ready by June 2003. Most of
Version 3 was ready in time for use in the
ECMWF ERA-40 reanalysis. The years 1957 - 1978
are completely Version 3 in ERA-40. After a problem
was found, ECMWF received the observations
for all these years again in July 2001.
Production for years 1957 - 1978 restarted
September 2001.
- At NCAR, we have had to slow down the
development work, but some datasets for
post-Version 3 are being prepared.
-
Project to analyze a huge New York
snowstorm in December 1947.
The director of NCEP has a project to reanalyze
a big snowstorm in New York City in December 1947.
In March 2003, NCEP asked if DSS could help them
get more observations to analyze December 1947
and make forecasts. They tried an analysis and
forecast with the observations that we gave them
earlier, but it did not work. They were missing
all or part of three datasets from us: US raobs for
1946 - 1947 and two sets of pibals (wind aloft)
for 1918 - 1947. We also obtained Canadian
surface observations for 1947 - 1973 in
January 2003, processed them, and passed them to
NCEP about April 2003. NCDC Asheville sent
us a new set of U.S. hourly surface observations
for 137 U.S. stations for 1928 - 1948, received
in April 2003. We have done a lot of quality control
work on it and will pass it to NCEP about
November 2003. About July 2003, NCEP used the
new supply of data to make analyses and
forecasts for December 1947. They were very
happy to get good forecasts 24 to 36 hours
in advance. They will report on this at a
special AMS symposium about January 2004.
-
Work with NCDC, Asheville (NOAA) on hourly
U.S. observations, 1928 - 1948.
NCDC has key-entered this old U.S. data from
forms for 127 cities. We have done a lot of
quality control work on it and fixed problems.
-
Progress on the I-COADS dataset of surface
marine observations
NOAA/NCDC and NOAA/CDC have been cooperating
with NCAR since the early 1980s on the U.S. COADS
project. Since the beginning, COADS has benefited
from cooperation worldwide, and to reflect this
status has been renamed International Comprehensive
Ocean-Atmosphere Data Set (I-COADS). Over time,
I-COADS has been improved with additional
observations -- especially for periods with little
data -- through the discovery and corrections of
data problems and through enhancements to the
metadata. Currently, I-COADS is available for
1794 through 1997, and an overlapping extension
(1997-2002) is due for completion in late 2003.
International collaborative efforts, data exchanges,
and the U.S.-based Climate Data Modernization Project
(CDMP by NCDC) are all contributing previously
unavailable data to I-COADS. The Russian R/V
digitization project is now in its third year and
has produced 1.9 million observations from the global
ocean for 1937 to 2000. A new data exchange agreement
between the U.S. and China is nearly finalized and
will result in digitizing 500,000 - 750,000
observations for 1850-1868 from the German Maury
Logbooks. CDMP is contributing by providing digital
images of logbook pages and media suitable for
digitizing in China. These and other international
data sources that are available to the project will
be added to I-COADS when the appropriate time periods
are reprocessed.
A new format for marine surface records will
become an integral part of I-COADS in the coming years.
The International Marine Meteorological ASCII (IMMA)
has been under development by the I-COADS project for
several years and will receive acceptance by the WMO
in the future. This format is suitable for all marine
surface data; old historical records, modern keyed
ship logs, buoy, and GTS transmitted data. The official
archive copy of I-COADS and the forthcoming real-time
I-COADS (to be provided by NCDC) will be in the IMMA
format.
-
Status of oceanographic data at DSS
DSS has a broad and expanding selection of
oceanographic data. The more than 70 datasets
support climate research, model initialization
and verification, and development of the I-COADS,
which is a blend of many data archives covering
more than a century. All the I-COADS products
are available, but beyond those some of the newer
and most popular collections are:
- Blended global wind stress from QSCAT and NCEP
Analysis for 1999-2003
- SST global analyses from NCEP and the Hadley
Centre (UK) for the 1800s - 2003
- Miami Isopycnic Coordinate Ocean Model
(MICOM) results
- World Ocean Atlas (WOA) and World Ocean Database
(WOD) from NODC, release 2001
All the oceanographic datasets are organized
into categories and described in a title list that
is easy to review and is available online from the DSS
home page. The short title list provides convenient
links to more detailed metadata and, in many cases,
to the data itself. For needs not served by this approach,
all the data are available to users who have access
to the SCD MSS, and DSS staff can make the data available
to other users on request. These services are clearly
shown on the DSS information web pages.
-
Datasets for the high atmosphere, 70 - 1000 km; CEDAR
The archives of CEDAR data at NCAR for the high
atmosphere include data from about 100 instruments.
Data for about five of the large incoherent scatter
radars are included. There are also data from newer
instrument types such as lidars. This archive work at NCAR was
started in 1984. The data have been on a Sun server
at HAO, but will probably be moved to a new fast PC
server running Red Hat Linux. This is a computing
division (SCD) project with HAO. One DSS staff member
spends 0.8 FTE of time on CEDAR work. One online
document, RJ0160, 45 pages, has more information
about these data.
-
The document project; More progress in preparing
documents online
The document project started in April 1999, and
the production of scanned documents started in March
2000. The goals of this project are to (1) gather
many documents that describe datasets, observing
programs, and related information, (2) write more
documents, and (3) scan them for online use. Many of
the gathered documents were only 1 to 10 pages long
and needed to be assembled into larger "books."
In October 2003 we had completed 305 documents
with 18,233 pages. The progress on scanning
documents has been as follows:
| Date | Documents | Total pages | Comment |
| 3/31/2000 | -- | 206 | |
| 10/5/2000 | Up thru RJ0062 | 4,474 | Oct 2000 |
| 6/28/2001 | Up thru RJ0117 | 8,401 | Jun 2001 |
| 7/26/2002 | Up thru RJ0220 | 14,414 | Jul 2002 |
| 3/06/2003 | Up thru RJ0274 | 17,000 | |
| 6/12/2003 | Up thru RJ0287 | 17,521 | Jun 2003 |
| 10/16/2003 | Up thru RJ0305 | 18,233 | |
These documents will be a good resource to
go along with the datasets and the projects.
There is still too much work to do, but many
key subject areas have been covered.
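One way to read the progress table is as a scanning rate between milestones. The sketch below computes pages per day from the dates and cumulative page counts in the table; the names `PROGRESS`, `pages_per_day`, and `overall_pages_per_day` are illustrative, and the rate is a derived figure, not one reported by DSS.

```python
from datetime import date

# (date, cumulative scanned pages), copied from the progress table above.
PROGRESS = [
    (date(2000, 3, 31), 206),
    (date(2000, 10, 5), 4474),
    (date(2001, 6, 28), 8401),
    (date(2002, 7, 26), 14414),
    (date(2003, 3, 6), 17000),
    (date(2003, 6, 12), 17521),
    (date(2003, 10, 16), 18233),
]

def pages_per_day(progress):
    """Average scanning rate between each pair of consecutive milestones."""
    rates = []
    for (d0, p0), (d1, p1) in zip(progress, progress[1:]):
        rates.append((d1, (p1 - p0) / (d1 - d0).days))
    return rates

def overall_pages_per_day(progress):
    """Average rate from the first milestone to the last."""
    (d0, p0), (d1, p1) = progress[0], progress[-1]
    return (p1 - p0) / (d1 - d0).days

if __name__ == "__main__":
    for end_date, rate in pages_per_day(PROGRESS):
        print(f"up to {end_date.isoformat()}: {rate:.1f} pages/day")
    print(f"overall: {overall_pages_per_day(PROGRESS):.1f} pages/day")
```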
How to find the scanned documents
The documents are listed on the web under Scanned
Documents. We have prepared a guide to the
documents (RJ0297, 72 p) where there is a little
more information about the texts, and where similar
subjects are grouped together.
Subjects of some of the documents:
- Coverage plots of observations for reanalysis,
1948 - on
- Information about the seven main types of
observations for reanalysis, and about the component
sets of data
- Data in various countries (includes data not
at NCAR), 1,600 pages
- Observing projects (GATE, FGGE, etc.),
350 pages
- Selected science subjects, 815 pages
- Papers about satellite data, ~1,200 pages
- Data systems and data strategy, ~900 pages
- Energy issues, ~700 pages
- Papers about computing hardware and history,
~1,700 pages
-
Work on the DSS web server
Work started to extend the capabilities of the
DSS web server in November 1999. More metadata was
added, the layout was improved, and various new data
inventories were added. One good milestone of
improvements was reached in July 2000 and another
in February 2001. Then this work had to slow down
because other projects were being delayed too much.
In 2003, the DSS data and information server was
upgraded with new hardware, online data, and
documentation. This affords faster service, more easily
accessible data, and better metadata information for
the users. This is an ongoing effort that is monitored
with statistical summaries of data movement to the
network and individual dataset usage.
-
Give access to much data; Help many data users
Much of our time in DSS is needed to build the
archive and extend it. We are also an operational
unit that works with many users, providing data
and consulting help. Some statistics about our
delivery of data follow.
DSS supported about 880 main users each year during
1997 - 2001
The following table gives the number of requests
for data from DSS. It counts the main requests for data,
not the much higher number of web server accesses where
people can take data and information. The column for
CD-ROMs shows the number of requests and, in parentheses,
the total number of CD-ROMs sent. One CD-ROM order is often for
10 CDs or more. For data use on NCAR computers, we
count the number of unique users each year, not each
time that data is used.
| Year | Requests: Data on tape, etc. | Requests for CD-ROMs (CDs sent) | Unique users on NCAR computers | Total requests |
| 1995 | 376 | 11 (11?) | 391 | 778 |
| 1996 | 347 | ? (35) | 399 | 781 |
| 1997 | 329 | 146 (1150) | 414 | 889 |
| 1998 | 331 | 164 (1893) | 383 | 878 |
| 1999 | 308 | 141 (2401) | 429 | 878 |
| 2000 | 367 | 124 (2524) | 422 | 913 |
| 2001 | 436 | 47 (1170) | 400 | 883 |
| 2002 | 250 | 172 (1612) | 388 | 810 |
Sent 10,750 CD-ROMs in six years
A CD-ROM has been made with data for each year of
reanalysis (1950 - 2002, 53 years). We did not do
1948 - 1949. During 1997 through 2002, NCAR distributed
10,750 copies of these CD-ROMs (which held 7,095 GB of
data). These have gone to 47 countries (48% to U.S.,
9.7% to Japan, 5.0% to Canada, 2.5% to the U.K., 2.2%
to India, etc.). Nearly all of these CD-ROMs have data
from the NCEP/NCAR 50-year reanalysis project (now
55.7 years of data).
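The totals quoted in this section can be cross-checked against the request table. A minimal Python sketch with the table values typed in by hand; the names are illustrative, and cells that the table marks as unknown or uncertain ("?") are stored as `None` and skipped rather than guessed:

```python
# (year, tape/other requests, CD-ROM requests, CDs sent, unique NCAR users, total)
# None marks a cell the table lists as unknown or uncertain ("?").
REQUESTS = [
    (1995, 376, 11, None, 391, 778),
    (1996, 347, None, 35, 399, 781),
    (1997, 329, 146, 1150, 414, 889),
    (1998, 331, 164, 1893, 383, 878),
    (1999, 308, 141, 2401, 429, 878),
    (2000, 367, 124, 2524, 422, 913),
    (2001, 436, 47, 1170, 400, 883),
    (2002, 250, 172, 1612, 388, 810),
]

def years_with_consistent_totals(rows):
    """Years where tape + CD-ROM requests + NCAR users equals the listed total."""
    good = []
    for year, tape, cd_requests, _cds, users, total in rows:
        if cd_requests is not None and tape + cd_requests + users == total:
            good.append(year)
    return good

def cds_sent_between(rows, first, last):
    """Total CD-ROMs sent in the years first..last, skipping unknown cells."""
    return sum(cds for year, _t, _c, cds, _u, _tot in rows
               if first <= year <= last and cds is not None)

def mean_total_requests(rows, first, last):
    """Average of the 'Total requests' column over the years first..last."""
    totals = [tot for year, *_rest, tot in rows if first <= year <= last]
    return sum(totals) / len(totals)

if __name__ == "__main__":
    print(years_with_consistent_totals(REQUESTS))
    print(cds_sent_between(REQUESTS, 1997, 2002))     # compare with "10,750"
    print(mean_total_requests(REQUESTS, 1997, 2001))  # compare with "about 880"
```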
The total amount of DSS data used each year
The total use of DSS data has increased from 1.23 TB in 1990, to 6.2 TB
in 1995, and to 23.6 TB in 2002 (Figure 1 and Appendix 1). During 1997 -
2002, about 3.3 to 4.0 TB have been sent away from NCAR. The data sent away
often goes to university departments and to other archives. This data is
probably used at least 2 or 3 times.
Access to DSS data has grown about 20-fold since 1990. Fortunately,
the technology that we can use to deliver data has also improved a lot.
How much DSS data is used by NCAR users?
These users mainly use data directly from the mass store, onto NCAR computers.
About 60% of the use of DSS data on the NCAR computers was by universities
and 40% by NCAR users. This means that NCAR people used about 1.3 TB of
DSS data in 1994 and 8 TB in 2000 (see Appendix 1). The NCAR use is actually
greater than this because some groups move portions of the DSS archives
to their own storage area, and then use that data.
-
Over half of data used has been from reanalysis projects
The output from the reanalysis projects has been heavily used and has
been a big benefit for world research. Users obtained 6.5 TB of reanalysis
data from NCAR Data Support (DSS) in 1997, which increased to 10.8 TB in
1999 and to 13.0 TB in 2002. During 1997 - 2002, about 55 to 67% of the total
data used was from reanalysis done at NCEP (Table 3 and Figure 2).
About 60% of the recent data use is output from the NCEP/NCAR 50-year reanalysis
project (Table 3). This was a huge project by NCEP and by the Data Support
(DSS) group at NCAR. It was recognized with a special award by the American
Meteorological Society in Jan 2000. The work on this project started in Feb
1991, and analysis production started at NCEP in Jun 1994. Twenty-three years were
completed in Sep 1996, and 50 years (1948 - 97) were done in July 1998. Since then
the analysis has been updated each month to provide a consistent up-to-date analysis
to use for seasonal forecasts and for many other purposes.
The reanalysis output (at each six hours for 55 years) has been very
useful to research
Large amounts of reanalysis data are being used, about 10 TBytes per year
during 1999 - 2002. Most of this data is from the NCEP/NCAR project (also
called Reanalysis 1). The NCEP-2 project started production in 05/1998 and
now we have data from it for 1979 - 2002. During 2001, there were 162 GBytes
of data used (Table 3) that were from the NCEP-2 reanalysis. The use of
ECMWF reanalysis data is not included in this table. We note that data from
the ERA-40 project (1957 - on) is not yet available (written Oct 2003),
but is coming soon.
| Year | Custom orders (GB) | CD-ROM (GB) | NCAR compute (GB) | Total reanal. used (GB) | Total user data (GB) | Reanal. % of all data used |
| 1995 | 573 | 0 | 415 | 988 | 6195 | 16 |
| 1996 | 1192 | 0 | 2679 | 3871 | 7224 | 54 |
| 1997 | 1820 | 759 | 3954 | 6533 | 11787 | 55 |
| 1998 | 1940 | 1249 | 3872 | 7061 | 11653 | 61 |
| 1999 | 1611 | 1585 | 7588 | 10784 | 16590 | 65 |
| 2000 | 1389 | 1666 | 5450 | 8505 | 14420 | 59 |
| 2001 | 1027 | 772 | 8693 | 10492 | 15741 | 67 |
| 2002 | 1466 | 1064 | 10464 | 12994 | 23592 | 55 |
Note: In 2001, there were 8531 GB used on NCAR computers
from Reanalysis 1, and a little from Reanalysis 2 (total 8693 GB).
The CD-ROMs have data from Reanalysis 1. Reanalysis data used on
NCAR computers in 2002: Reanalysis 1 used 10,009 GB plus Reanalysis
2 used 455 GB (total 10464 GB).
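The derived columns of Table 3 can be reproduced from the three delivery columns and the total user data. The following Python sketch recomputes the reanalysis total and its share of all data for each year and compares the share with the published percentage; the names are illustrative, and the values are copied from the table above.

```python
# (year, custom orders, CD-ROM, NCAR compute, total user data), all in GB.
TABLE3 = [
    (1995, 573, 0, 415, 6195),
    (1996, 1192, 0, 2679, 7224),
    (1997, 1820, 759, 3954, 11787),
    (1998, 1940, 1249, 3872, 11653),
    (1999, 1611, 1585, 7588, 16590),
    (2000, 1389, 1666, 5450, 14420),
    (2001, 1027, 772, 8693, 15741),
    (2002, 1466, 1064, 10464, 23592),
]

# Last column of the table, as published.
PUBLISHED_PERCENT = {1995: 16, 1996: 54, 1997: 55, 1998: 61,
                     1999: 65, 2000: 59, 2001: 67, 2002: 55}

def reanalysis_summary(rows):
    """Map year -> (total reanalysis GB, share of all user data in whole percent)."""
    summary = {}
    for year, custom, cdrom, compute, all_data in rows:
        total = custom + cdrom + compute
        summary[year] = (total, round(100 * total / all_data))
    return summary

def cdrom_gb_between(rows, first, last):
    """Total GB delivered on CD-ROM in the years first..last."""
    return sum(cdrom for year, _c, cdrom, _n, _a in rows if first <= year <= last)

if __name__ == "__main__":
    for year, (total, percent) in reanalysis_summary(TABLE3).items():
        flag = "" if percent == PUBLISHED_PERCENT[year] else "  <-- differs"
        print(year, total, f"{percent}%{flag}")
    print(cdrom_gb_between(TABLE3, 1997, 2002))  # compare with "7,095 GB" on CD-ROM
```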

-
Use of data from the Data Support web server
Data Support has had a web server since about 1992. The software on it
was upgraded during 1999 - 2000, and we were able to provide more detailed
statistics about the use of the server starting 08/2000.
Columns 3-5 of Table 4 refer to user hits on either dataset information
or on datasets parked on the server. A total of 395.8 GBytes of data has
gone out by this mode during Jan - Sep 2002. We have 254 datasets parked
on the server; these have a volume of 70.2 GBytes (and have 12,868 files).
Each time a user obtains a file, a hit is counted. During Jan - Sep 2002, a total
of 340.6 GBytes (column 7) of these special datasets was taken from
the server by the users shown in column 6. The most popular dataset was
monthly means from the NCEP/NCAR Reanalysis (users took 79.2 GB of this in
2002). Users also took 70.4 GB of Etopo-2 data (2 minute, about 4 km, elevation
and depth data for the world).
Data and information taken from the DSS web server
The total hits and data taken are given by columns 3-5. A special set of
254 datasets is on this server; columns 6 and 7 refer to web activity to
take them.
| (1) Period | (2) Months | (3) Total hits | (4) Number of unique users | (5) Total gigabytes served | (6) Number of unique users downloading data | (7) Gigabytes served from dataset/data directories |
| 08/2000-12/2000 | 5 | 493498 | 10054 | 37.40 | 1293 | 19.83 |
| 01/2001-12/2001 | 12 | 1105266 | 31144 | 198.26 | 5882 | 150.49 |
| 01/2002-09/2002 | 9 | 1290905 | 54176 | 395.84 | 15950 | 340.64 |
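Because the three periods in the table cover 5, 12, and 9 months, the raw totals are not directly comparable. A small Python sketch that converts hits and gigabytes served to per-month averages (illustrative names; values copied from columns 1-3 and 5 of the table). Unique-user counts are left out because they cannot be meaningfully divided by the number of months.

```python
# (period, months, total hits, total GB served), from columns 1-3 and 5.
TABLE4 = [
    ("08/2000-12/2000", 5, 493498, 37.40),
    ("01/2001-12/2001", 12, 1105266, 198.26),
    ("01/2002-09/2002", 9, 1290905, 395.84),
]

def monthly_rates(rows):
    """Return (period, hits per month, GB served per month) for each row."""
    return [(period, hits / months, gigabytes / months)
            for period, months, hits, gigabytes in rows]

if __name__ == "__main__":
    for period, hits_pm, gb_pm in monthly_rates(TABLE4):
        print(f"{period}: {hits_pm:,.0f} hits/month, {gb_pm:.1f} GB/month")
```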
-
External linkages for Data Support
We have had a number of links with other organizations over the years.
These have made it possible to obtain much data and to combine forces to
carry out huge projects such as reanalysis.
-
Work closely with NCEP during 1991 - 2002 to make it possible to accomplish
the NCEP/NCAR global reanalysis of data for 1948 - 2002 (now 55 years).
- Also we helped NCEP with observations for the Regional Reanalysis
for North America for years 1979 - 2003. Resolution 32 km. Production
was during June - Sep 2003.
-
Work with ECMWF in Europe (provide observations) during about 1996
- 2002 so that they could do a long reanalysis for about 06/1957 - 2002
(45 years). They obtain much of the NCAR data for observations via NCEP.
-
Lead a big data exchange with Russia during 1982 - 2003. This has been
very fruitful. Each year we have a meeting and write an updated plan.
The work involves six US agencies. A meeting was in Sep 2001 in Russia,
and in Nov 2002 we prepared another document.
-
There are many smaller, but important, linkages such as with China,
with cloud-climate data (NASA-GISS, NYC), with satellite data archiving
(NOAA), with helping solve NASA data strategy issues, etc.
-
Dataset development in Data Support, 1965 - 2003 (history).
Data Support has now completed 39 years of work on data acquisition and development. We have also participated in selected science projects and given data support to a wide variety of research projects, both in the US and abroad. Research in a given science area will often blossom if we can just get the needed data and organize it so that it is easy to use. During 1990 - 2003, we have worked closely with the reanalysis projects. It was a huge project to develop and provide the necessary global observations so that analyses of the global atmosphere could be made every 6 hours for 1948 - 2003, now 56 years. This was done for the NCEP/NCAR Reanalysis project.
A summarized history of our data and science work will be in another document. In early years, we opened up daily analyses for N. Hemisphere weather patterns, and research blossomed. We prepared a tape with monthly surface observations for the world…and temperature trend results soon followed. We joined a project to prepare a climatology of the S. Hemisphere, done during 1967 - 71, and the grid data plus publications followed. By working with three universities, we prepared some of the earliest motion films to show the changes in daily weather patterns and the annual climate cycle, done during 1970 - 72. In 1981 we joined with NOAA to prepare the best set of observations for the world's surface ocean for 1854 - on. The data are mostly from ships and buoys. The first version was available in 1983, and much research was then possible. Old observations and new updates are still being added.
In this way, the data archives and resulting research gradually expanded, with each step building on previous steps. The computer technology for calculations and data storage has improved a lot. By making a practical application of the technology, the capability for data archives and data delivery has increased by leaps and bounds. Also, the working relationships with weather forecast units (e.g., NCEP) and with other archives (NCDC, Asheville; USAF; Russia; etc.) have helped both them and us. Please see the brief history for more information.
-
Project to prepare backups for part of the NCAR archives (Status in Sep 2003)
Future staff working in data archives may not understand enough about
the basic datasets to make certain that the important data survive for
30 to 50 years. Therefore, the data often need better packaging and more
summary information. We have a project to place the most important 8 to
12% of the DSS archives (about 2 TB) onto backup tapes. In most cases
these data are not well covered by other archives in the US, or access
to them would be difficult or costly.
The "data backup project" started about 05/1999. The task to define the
work was perhaps 20% done by Oct 2000. The data backup project has been
very active during the past year. The main purpose is to move copies of
our most important datasets into off-site archives. There are several other
goals: (1) increase the staff knowledge about each dataset, (2) define many
of the "most important" datasets, (3) be certain that staff can locate each
dataset, (4) add some more important datasets, and (5) update some of these
important datasets.
By Sep 2002 we had completed backing up 1090 GB of data (Table 5). These
were on about 28 DLT 8000 tapes (capacity 40 GB each, uncompressed). In
Sep 2002 we started the conversion to SDLT technology (one tape holds 110
GB, native). We purchased two Super DLT drives and an automatic tape loader
that holds 26 tapes. By Aug 2003 we had prepared a total of 1730 GB onto
18 SDLT tapes (Table 5). In Sep 2003, this became 19 tapes because we reduced
the block size of some large records that were difficult for some users.
| Date | Total data on special backups | Total tapes done | One tape holds, native |
| Nov 2000 | 0.5 GB | None | -- |
| Apr 2001 | 38.3 GB | 1 tape, DLT 8000 | 40 GB |
| May 21, 2001 | 443 GB | 12 tapes, DLT 8000 | 40 GB |
| Sep 26, 2002 | 1090 GB | 28 tapes, DLT 8000 | 40 GB |
| Aug 22, 2003 | 1730 GB | 18 tapes, SDLT 220 | 110 GB |
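The figures in the backup table imply how full the tapes were on average. The following is a minimal sketch of that arithmetic, not a description of how the backups were actually written. It assumes no compression and that the whole tabulated volume sits on the tapes counted in the same row; the name `average_fill` is introduced here only for illustration.

```python
# (date, data backed up in GB, tapes written, native capacity per tape in GB),
# copied from the backup table above. The Nov 2000 row has no tapes and is omitted.
BACKUPS = [
    ("Apr 2001", 38.3, 1, 40),
    ("May 21, 2001", 443, 12, 40),
    ("Sep 26, 2002", 1090, 28, 40),
    ("Aug 22, 2003", 1730, 18, 110),
]


def average_fill(data_gb, tapes, capacity_gb):
    """Fraction of native capacity used, averaged over all tapes written."""
    return data_gb / (tapes * capacity_gb)


for date, data_gb, tapes, capacity_gb in BACKUPS:
    per_tape = data_gb / tapes
    fill_pct = 100 * average_fill(data_gb, tapes, capacity_gb)
    print(f"{date:13s} {per_tape:6.1f} GB per tape, {fill_pct:5.1f}% of native capacity")
```

Under these assumptions the DLT 8000 tapes were nearly full, while the SDLT tapes carried somewhat less than their native 110 GB on average.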
Important data: We recognize that the definition of the most important
datasets of analyses will gradually change with time. But key parts of older
analyses should be saved indefinitely so that comparisons with other analyses
can be made, and for historical reasons. The original basic datasets of
observations should be saved forever, even if they are merged a dozen times
into other datasets. Some of the merges may turn out to be wrong. But our
original cleaned up files will still be good.
-
Methods for the bulk data transfer of large datasets
To help users obtain data and to move data backups, it is useful to have
better options for sending data in bulk on tapes. We have been developing
better methods for several years. In 1986 the best tape was a 200 MB tape
cartridge by IBM. It required about 5550 tapes to hold one terabyte
(Table 6). In June 2002 a new DLT technology was introduced where one small
DLT tape cartridge (size 10.5 x 10.5 cm) could hold 160 GB; therefore only
7 tapes could hold a terabyte. In several years, DLT engineers expect that
one tape will hold one TB.
Tape technology for bulk data transfer
The tape capacity is the native capacity (without compression). Random binary
numbers usually do not compress very much, but the tape drives compress
some data sets a lot. We can't fill a tape 100% full, so for the table we
assume about 90% tape use and no compression.
| Date | Technology | One tape holds | Data rate (MB/s) | Tapes per terabyte (90%) |
| 1986 | -- | 200 MB | 3 | 5550 |
| 04/1995 | DLT 4000 | 20 GB | -- | 55 |
| 1998 | DLT 7000 | 35 GB | 5 | 32 |
| 1999 | DLT 8000 | 40 GB | 6 | 28 |
| ~09/2000 | Super DLT | 110 GB | 11 | 10 |
| 06/2002 | SDLT-2 | 160 GB | 16 | 7 |
Note: A new technology that uses 200 GB (native) LTO tapes was introduced
about Feb 2003. It is also a good technology. A previous 100-GB, LTO-1 technology
existed.
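The "tapes per terabyte" column in the tape technology table can be reproduced approximately from the native capacities and the 90% fill assumption. The sketch below is only an illustration: it assumes a decimal terabyte (1000 GB) and no compression, and the function name `tapes_per_terabyte` is hypothetical. The table entries appear to be rounded versions of these values.

```python
# Native capacities in GB, taken from the tape technology table above.
CAPACITY_GB = {
    "1986 cartridge": 0.2,   # 200 MB
    "DLT 4000": 20,
    "DLT 7000": 35,
    "DLT 8000": 40,
    "Super DLT": 110,
    "SDLT-2": 160,
}


def tapes_per_terabyte(capacity_gb, fill=0.90, terabyte_gb=1000.0):
    """Tapes needed for one terabyte at the given fill fraction, no compression."""
    return terabyte_gb / (capacity_gb * fill)


for name, capacity in CAPACITY_GB.items():
    print(f"{name:15s} {tapes_per_terabyte(capacity):8.1f} tapes per TB")
```

Keeping the fill fraction as a parameter makes it easy to see how sensitive the tape count is to the 90% assumption.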
-
A possible use of bulk data transfer methods
The NCEP/NCAR reanalysis for the global atmosphere, made every 6 hours,
now has data for 55 years (1948 - 2002). The total archive is about 2900
GB. The 5 most used component datasets have a volume of 800 GB. Update,
Oct 2003: This reanalysis is now through 09/2003 (55.8 years). The data
from the NCEP-2 global reanalysis for 23 years (1979 - 2002) has a volume
of 212 GB for the three most used datasets. NCEP started fast production
on the regional reanalysis for N. America in June 2003. The analyses, made
every 3 hours for 25 years (1979 - 2003), were mostly completed in Sep
2003 and will have a volume of about 12 TB.
- We note that the 12 TB would fit onto 336 DLT 8000 tapes that each
  hold nearly 40 GB.
- Or the 12 TB would fit onto 120 Super DLT tapes that each hold nearly
  110 GB. NCAR will receive this data on STK tapes that each hold 200
  GB native, thus about 65 tapes.
- The dimensions of a DLT cartridge are 10.5 cm by 10.5 cm by 2.5 cm.
  Thus one cartridge has a volume of 276 cm³. So 120 cartridges (with
  12 TB) would fit into a cube that is only a little larger than 33 cm
  (13 inches) on a side.
- By 2006 one tape will probably hold 500 GB, so the 12 TB would fit
  onto only about 25 cartridges. Wow!
These examples show that large volumes of data can be moved on a rather
small number of tapes.
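The tape counts and the cube size quoted above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: a decimal 12 TB (12,000 GB), 90% tape fill, no compression, and the helper names are made up for the example. Dividing the archive size directly gives counts a few tapes different from the text (for example 334 rather than 336 DLT 8000 tapes), because the text's figures match 12 times the rounded per-terabyte counts in the tape technology table.

```python
import math

ARCHIVE_GB = 12_000               # regional reanalysis, about 12 TB (decimal GB assumed)
CARTRIDGE_CM = (10.5, 10.5, 2.5)  # DLT cartridge dimensions from the text


def tapes_needed(archive_gb, capacity_gb, fill=0.90):
    """Whole tapes needed for the archive at the given fill fraction."""
    return math.ceil(archive_gb / (capacity_gb * fill))


def cartridge_volume_cm3(dims=CARTRIDGE_CM):
    """Volume of one cartridge in cubic centimetres."""
    width, depth, height = dims
    return width * depth * height


def cube_side_cm(n_cartridges):
    """Side of a cube whose volume equals that of n cartridges (ideal packing)."""
    return (n_cartridges * cartridge_volume_cm3()) ** (1.0 / 3.0)


for label, capacity in [("DLT 8000", 40), ("Super DLT", 110),
                        ("STK 200 GB", 200), ("500 GB tape", 500)]:
    print(f"{label:12s} {tapes_needed(ARCHIVE_GB, capacity):4d} tapes")

print(f"cube side for 120 cartridges: {cube_side_cm(120):.1f} cm")
```

The cube side computed here is a pure volume figure of about 32 cm. Rigid cartridges cannot be packed perfectly into a cube, which is consistent with the slightly larger size quoted in the text.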
-
How to conduct data projects to help science (information about problems and strategy)
The design of data systems for good data preparation, archiving, and data
services can be difficult. We have developed more information about data
problems and data strategy. Our goal has been to use methods that give good
data archives, and good data access at a reasonable cost. To do this, one
needs to control complexity and have a good strategy.
There has been a lot of trouble in various government and business data
systems. Not enough attention has been paid to using methods that achieve
success, that keep the focus on real data tasks, and on limiting costs.
Paul Strassmann writes "Consultants, IT vendors, and most government economists
have long assumed that the more IT you have, the lower the transaction costs.
- As I've said before, those claims are myths." (Computerworld, Apr
1, 2002)
The data problems go beyond costs. There have been too many data system
attempts that take too long and do not even solve the main problems. And
unfortunately, they displace other methods that have a better track record.
When agencies or institutions are trying to achieve good info tech results,
or avoid problems, or cope with existing data system problems, it would
help if there were a better collection of good and bad examples about info
tech applications.
Books have been written about trouble in data systems. Many U.S. federal
projects have had problems. And commercial systems have not escaped. I have
gathered together part of this material, so that readers can get a brief
sense of the history, and can start learning more about how to avoid problems.
Document RJ0052, 69 pages (online) has part of this information.
Total amount of DSS data that is used each year
A lot of DSS data is used on the NCAR computers (Table 2), and about 50 to 70% of that use is by non-NCAR users, mostly at universities. The custom orders for data go to non-DSS users, and nearly all of the CD-ROMs are sent to US and foreign locations, not NCAR.
This table shows the total volume of DSS data used during 1990 - 2002. Table 2 and Figure 3 show whether the data were sent by tapes and ftp, sent on CD-ROMs, or used on the main computers at NCAR. During 1997 - 2001, about 3 to 4 TBytes were sent each year by tapes, ftp, or CD-ROM. The total data use during this period was about 12 to 16 TBytes each year, but it jumped to 23.6 TB in 2002, of which 20.4 TB was used on the NCAR computers.
The use of this data is really more than 16 TB per year. The data sent by tapes or ftp often goes to groups of users, and it also feeds several other archives that distribute data, both in the US and in other countries. Many other users obtain the data from these other archives. On an average, the data is probably used at least 2 or 3 times after it is obtained by other users or archives.
TOTAL DATA VOLUME FOR USERS, 1990 - 2002 (GB)
NCAR Data Support sends data on tapes and by net ftp for special orders. In addition, much data has been sent on CD-ROMs at a price of about $10 per disc. Also, many users from NCAR and the universities run programs on NCAR computers and use DSS data directly from the mass store.
| Year | Custom orders (GB) | CD-ROM (GB) | NCAR compute (GB) | Compute GB, % university | Total user data (GB) |
| 1990 | e130 | 3 | 1100 | 52 | 1233 |
| 1991 | 144 | 10 | 2480 | 48 | 2634 |
| 1992 | 135 | 13 | 2638 | 59 | 2786 |
| 1993 | 777 | 11 | 3626 | 69 | 4414 |
| 1994 | 1078 | 15 | 4397 | 70 | 5490 |
| 1995 | 1525 | 4 | 4666 | 78 | 6195 |
| 1996 | 1571 | 32 | 5621 | 74 | 7224 |
| 1997 | 2569 | 799 | 8419 | 69 | 11787 |
| 1998 | 2521 | 1260 | 7872 | 60 | 11653 |
| 1999 | 2498 | 1587 | 12505 | 49 | 16590 |
| 2000 | 1908 | 1668 | 10844 | 74 | 14420 |
| 2001 | 1970 | 774 | 12997 | 61 | 15741 |
| 2002 | 2085 | 1064 | 20443 | 57 | 23592 |


Note: These data are all for full calendar years.
Some previous tables had a mix of calendar years, fiscal years,
and partial years. Table prepared May 2003.
The total volume of DSS scientific data that was used has
increased from 1.23 TB in 1990 to 6.20 TB in 1995 and to
15.7 TB in 2001, as shown in the table and figure above. In 2002
the total use jumped to 23.6 TB. The figure gives the data volume
in GB that was sent by tapes, CD-ROMs, and by direct mass store
access at NCAR. The numbers on the bar graphs give the GB for
each data component and the total.
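The yearly totals in the table are the sum of the three components (custom orders, CD-ROM, and NCAR compute). The sketch below recomputes them from the tabulated values as a consistency check. It is an illustration only: the variable and function names are hypothetical, and the 1990 custom-order entry "e130" is treated as the number 130 on the assumption that the "e" prefix marks an estimate.

```python
# (custom orders, CD-ROM, NCAR compute) in GB per calendar year, from the table above.
# Assumption: the 1990 custom-order value "e130" is read as 130.
USE_GB = {
    1990: (130, 3, 1100),
    1991: (144, 10, 2480),
    1992: (135, 13, 2638),
    1993: (777, 11, 3626),
    1994: (1078, 15, 4397),
    1995: (1525, 4, 4666),
    1996: (1571, 32, 5621),
    1997: (2569, 799, 8419),
    1998: (2521, 1260, 7872),
    1999: (2498, 1587, 12505),
    2000: (1908, 1668, 10844),
    2001: (1970, 774, 12997),
    2002: (2085, 1064, 20443),
}


def total_gb(year):
    """Total user data for the year: custom orders + CD-ROM + NCAR compute."""
    return sum(USE_GB[year])


def shipped_gb(year):
    """Data sent away from NCAR: custom orders (tapes, ftp) plus CD-ROM."""
    custom, cdrom, _compute = USE_GB[year]
    return custom + cdrom


for year in sorted(USE_GB):
    print(f"{year}: shipped {shipped_gb(year):5d} GB, total {total_gb(year) / 1000:5.2f} TB")
```

Separating shipped data from on-site compute use makes it easier to compare the two delivery routes year by year.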