The NSF NCAR Data Commons Initiative

Democratizing Data through FAIR Principles

The Data Commons Initiative advances FAIR data access by establishing shared infrastructure, standardized formats, and metadata-rich services.

We are excited to announce that the NSF NCAR Data Commons is entering the final phases of integration, modernization, and rollout. This strategic initiative is transforming how we manage, integrate, and share data across the organization to support AI/ML readiness, interdisciplinary collaboration, and long-term scientific discovery.

The Data Commons will unify the Research Data Archive (RDA), Climate Data Gateway (CDG), and Geoscience Data Exchange (GDEX) into a modernized, analysis-ready RDA. This merged system will provide a metadata-rich, standards-aligned foundation designed to make data more findable, accessible, interoperable, and reusable (FAIR). It supports everything from model development to AI training and enables the AI/ML roadmap by delivering consistent, high-quality inputs for next-generation science.

This work strengthens operational resilience, promotes efficiency, and lays the foundation for shared services, interoperable APIs, and unified data models—reducing silos and accelerating insights across disciplines.

Vision

An Integrated, FAIR-Driven Data Ecosystem for Earth System Science

The overarching goal of the Data Commons is to enable scalable, analysis-ready access to scientific data through a cohesive, standards-based data infrastructure. This work will position NSF NCAR to support current and future research needs by reducing fragmentation and improving cross-domain data use.

Core Capabilities and Benefits

The Data Commons initiative delivers more than just centralized storage—it is a forward-looking platform designed to accelerate discovery, democratize data access, support reproducible science, and empower AI/ML-driven research.

  • Unified Data Access: A single, structured entry point for curated datasets from across NSF NCAR, integrated with the distributed access capabilities of the Open Science Data Federation.
  • Discovery-Driven Design: Standardized, interoperable metadata enhances data discovery, cross-domain reuse, and streamlined access to research outputs.
  • Analysis-Ready Infrastructure: Supports high-performance processing, advanced workflows, and AI-assisted tools such as automated coding and workflow generation.
  • Augmented Search and Documentation: Intelligent services surface related datasets, models, and publications—while assisting with metadata validation, workflow documentation, and governance flagging.
  • AI/ML Enablement: Structured data environments support the training and validation of AI/ML models, enabling improved Earth system simulations, smarter data mining, and advanced pattern recognition (e.g., recommending related datasets via trained models).
  • Operational Resilience: Built on containerized infrastructure for scalability and uptime, with support for rapid deployment, disaster recovery, automated testing, and streamlined updates.
  • Trust and Reuse: A consistent metadata standard ensures reproducibility and benchmarking across domains and institutions. All services align with FAIR principles and community governance practices to support responsible data stewardship.

For inquiries or opportunities to get involved, please contact Doug Schuster.