SIParCS 2025 - Sofia Borukhovich
Sofia Borukhovich, Temple University
Natural Language Discovery of NSF NCAR Scientific Data
Recorded Talk
The National Center for Atmospheric Research (NCAR) generates vast and diverse scientific datasets that are publicly accessible. However, users can find it challenging to locate relevant datasets because of their volume and complexity. Our project proposes an AI-driven search assistant that transforms this experience by allowing users to make natural-language queries and receive the most relevant datasets in response. This project explores the use of Large Language Models (LLMs) to improve scientific data discovery through natural language interaction.
The goal is to improve the discoverability of atmospheric and geoscience datasets by providing a more intuitive search experience for users. This summer, our focus is on building the core functionality of the search application, including processing and vectorizing metadata, integrating LLMs into the search pipeline, and evaluating dataset discoverability across NCAR’s atmospheric and geoscience data.
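To illustrate the "vectorize metadata, then rank by similarity to the query" idea in the pipeline above, here is a minimal sketch under stated assumptions. The abstract does not say which embedding model or vector store the project uses, so this sketch swaps in a plain TF-IDF vectorizer with cosine similarity as a lexical stand-in for LLM embeddings. The dataset IDs, descriptions, and the functions `build_idf`, `vectorize`, and `search` are hypothetical and are not part of the project's code or NCAR's catalog.

```python
import math
import re
from collections import Counter

# Hypothetical metadata records (illustrative only, not real NCAR catalog entries).
DATASETS = {
    "ds-a": "Global surface air temperature reanalysis, monthly means",
    "ds-b": "Ocean sea surface temperature observations from buoys",
    "ds-c": "Regional precipitation radar composites for the contiguous US",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: dict[str, str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms never seen in the corpus are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Vectorize all metadata once, then return the top_k datasets for the query."""
    idf = build_idf(docs)
    index = {ds_id: vectorize(text, idf) for ds_id, text in docs.items()}
    q = vectorize(query, idf)
    scores = {ds_id: cosine(q, vec) for ds_id, vec in index.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(ds_id, round(scores[ds_id], 3)) for ds_id in ranked[:top_k]]


if __name__ == "__main__":
    # The ocean/SST record should rank first for this query.
    print(search("sea surface temperature from the ocean", DATASETS))
```

In an LLM-backed version, `vectorize` would presumably be replaced by calls to an embedding model so that queries like "ocean warming" can match records that never use those exact words; the indexing and ranking structure would stay the same.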
Mentors: Nathan Hook, Eric Nienhouse, Jason Cunning
Slides and poster