SIParCS 2025 - Sofia Borukhovich

Sofia Borukhovich, Temple University

Natural Language Discovery of NSF NCAR Scientific Data

Recorded Talk

The National Center for Atmospheric Research (NCAR) generates vast and diverse scientific datasets that are publicly accessible. However, locating relevant datasets can be challenging for users because of the data's volume and complexity. Our project proposes an AI-driven search assistant that lets users pose natural-language queries and receive the most relevant datasets in response. It explores the use of Large Language Models (LLMs) to improve scientific data discovery through natural language interaction.
The goal is to make atmospheric and geoscience datasets easier to discover by providing a more intuitive search experience. This summer, our focus is on building the core functionality of the search application: processing and vectorizing metadata, integrating LLMs into the search pipeline, and evaluating dataset discoverability across NCAR’s atmospheric and geoscience data.
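
As a rough illustration of what "vectorizing metadata" and ranking by a natural-language query can look like, here is a minimal sketch. It is not the project's implementation: the abstract does not name an embedding model, vector store, or LLM, so this sketch swaps in simple term-count vectors and cosine similarity using only the Python standard library. The dataset IDs and descriptions, and the helper names `vectorize`, `cosine`, and `search`, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical metadata records, invented for illustration only.
DATASETS = {
    "ds-a": "Hourly surface temperature and precipitation reanalysis over North America",
    "ds-b": "Global ocean wave height and sea surface wind observations",
    "ds-c": "Upper-air radiosonde soundings of temperature, humidity, and wind",
}


def vectorize(text):
    """Map text to a sparse term-count vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, index, top_k=2):
    """Return up to top_k dataset IDs with nonzero similarity, best match first."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vec), ds_id) for ds_id, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ds_id for score, ds_id in scored[:top_k] if score > 0]


# Build the index once by vectorizing every metadata record.
INDEX = {ds_id: vectorize(text) for ds_id, text in DATASETS.items()}

print(search("ocean wave height", INDEX))  # → ['ds-b']
```

In a pipeline like the one described above, the term-count vectors would presumably be replaced by embeddings from a language model, and an LLM could be used to interpret the query or summarize the ranked results. Those choices are assumptions here, not details taken from the project.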

Mentors: Nathan Hook, Eric Nienhouse, Jason Cunning

Slides and poster