Community Data Portal (CDP) plans

Over the next year and beyond, the Community Data Portal project has two broad goals: to broaden the scope of the data services offered through the portal, and to increase the size and diversity of its data holdings. Both goals are meant to foster interoperability and integration with the data management and access systems currently maintained by different groups within UCAR, NCAR, and UOP.

To this end, the CDP will continue to collaborate with several projects and groups within and outside NCAR, in particular:

  • With DSS to offer high-level data services (aggregation, subsetting, and ultimately analysis) for highly requested datasets such as ERA-40 and NCEP.

  • With EOL, both to enable access to existing observational datasets and to develop a new paradigm for data infrastructure and support during NSF-funded field campaigns (proposed joint development of a Virtual Operations Center for Field Experiments in Atmospheric Science).

  • With WMO (World Meteorological Organization) to establish NCAR as a primary U.S. DCPC (Data Collection or Production Center) within the WIS (WMO Information System) global network.

  • With CISM (Center for Integrated Space Weather Modeling) to enable public and restricted access, through a single data portal, to Space Physics and Space Weather datasets stored at participating universities and institutions, with the underlying data management and transfer provided by SRB (Storage Resource Broker).

  • With UC-Irvine to provide advanced server-side data processing capabilities, backed by the netCDF Operators (NCO), for datasets hosted on the CDP.

  • With the NCAR GIS initiative to serve additional data types (CCSM ocean, land, and ice datasets) through the NCAR GIS portal.

  • With all other data providers who express a desire to leverage the CDP infrastructure to access and share data with a restricted team of collaborators or with the community at large.

On the technical side, CDP staff will be involved in the following major development areas:

  • Sub-portals and branding: the creation of a general framework such that sub-portals for a specific division (e.g. EOL), scientific theme (e.g. Global Warming), or project (e.g. Mirage) can be easily created and maintained to expose a specific branding and functionality while fully leveraging the broad portfolio of the CDP data services.

  • Data publishing: definition, development, and support of a machine-negotiated API for ingesting data and metadata into the CDP system.

  • OAI: further development of the OAI infrastructure for exchanging metadata with partner institutions, including establishing production-level services, possibly broadening the number of partners, and feeding metadata records to the Google search engine.

  • Visualization: initial exploration of server-side visualization capabilities for CDP data holdings through NCL and PyNGL, either via the portal interface or via standalone web service clients.

The CDP and RDA datasets currently reside on a high-speed file system that is shared between the systems and runs over a storage area network (SAN). The current 20 TB of storage will be expanded to more than 50 TB over the next year. The additional space will accommodate growth in the RDA, data from other providers in the community, and testbed space for CPG activities. Further details about the SAN development plan, including the need for data policies, appear in "Storage Area Network (SAN) deployment plans for CDP/ESG".

FY2005 Annual Report