Abstract

Generative agents powered by Large Language Models (LLMs) process and interpret complex information across domains. These AI tools handle complex queries, integrate diverse data systems, and provide analytical insights. In geospatial analysis, they enhance data interpretation and visualization. We built Earth Copilot, a proof-of-concept (PoC) agentic system on top of NASA VEDA, to help users navigate maps and contextualize relevant data for geospatial queries. For instance, it identifies and accesses datasets such as NDVI and burn severity indices to quantify ecological impacts.


However, applying advanced AI systems in critical domains like Earth science poses challenges. These models can generate factually incorrect information, leading to misinterpretation of crucial environmental data and potentially eroding trust. Racial, economic, and gender biases in the training corpus can manifest indirectly, skewing resource prioritization along the lines of existing societal inequities. Our prototype encountered these issues, underscoring the urgent need for robust guardrails and thorough red-teaming exercises.


The lessons learned point to the need to balance AI’s potential against the requirement for factual accuracy. The project revealed challenges in maintaining response consistency and mitigating biases inherited from pre-training data. In this talk, we share these insights to guide the development of more reliable, transparent, and effective agentic tools, especially in scientific fields like Earth science.