Expanding the discipline of climate informatics

At its seventh annual meeting, the 2017 Climate Informatics Workshop brought together participants from diverse disciplines to discuss and share novel analysis methods for climate data. Because climate data generated by satellites, environmental sensors, and computer models is growing so rapidly in both volume and diversity, it is becoming increasingly complex to use. Workshop participants meet to discuss and help develop new, practical, and accurate data analysis methods. This workshop series seeks to build collaborative relationships between climate scientists and researchers from statistics, machine learning, and data mining. The workshops stimulate new ideas, foster new collaborations, grow the climate informatics community, and thus accelerate discovery across disciplinary boundaries.

Hackathon session

Hackathon facilitators work with participants who were challenged to predict seasonal rainfall in California using Python tools. (All photos by Brian Bevirt)

Workshop participants share their work with an interdisciplinary audience and gain valuable feedback from a broad community of domain experts. Many attendees retain these ties and can tap into a growing network of potential collaborators at the http://www.climateinformatics.org/ website, which is hosted by founder Imme Ebert-Uphoff. Climate scientists who attend learn about cutting-edge methods in machine learning and statistics, and computer scientists learn about useful applications of their methods that have strong relevance to society.

“For me,” Andy Rhines (workshop co-chair from the University of Washington) noted, “one of the broader highlights is that we’ve formed an interdisciplinary community that understands where machine learning methods and climate applications intersect. That comes in part from seeing examples of work presented by invited speakers, and also from community-building discussions that occur throughout the workshop.” To highlight advancements in the field, he continued, “This year’s papers illustrated a surge of progress in deep learning being applied to track climate extremes such as hurricanes and atmospheric rivers. The development of several large training datasets has accelerated the pace of research in this area, similar to how the MNIST database accelerated the development of methods to recognize handwritten text.” (Submitted papers will be published as workshop proceedings in an OpenSky repository at NCAR.)

These workshops also incorporate the emerging area of decision support systems. Several groups are developing methods that use machine learning to help guide weather forecasters through the enormous volumes of data that are output by Earth observing systems and numerical simulations. This application of machine learning is likely to continue growing as it focuses on improving an extraordinarily complex human-machine interface.

This year, the Climate Informatics Hackathon was held in advance of the community discussions to enrich the experience of participants during the workshop’s main programs. In the words of David John Gagne, “This year’s hackathon provided participants the opportunity to apply Python and machine learning skills by working on a real-world climate problem of predicting seasonal California rainfall along with a diverse group of climate scientists, statisticians, and machine learning experts. The group implemented a variety of techniques and found clever ways to improve on the baseline predictions.”

Hackathon participants

The intensive one-day hackathon was organized into three multi-hour sessions followed by presentations of solutions to the challenge.

Andy Rhines continued: “Hackathon participants get a taste of the challenges and benefits of integrating machine learning and climate science, and each year has seen iterative improvement based on lessons learned by feedback from participants. This year’s hackathon is an excellent case study of the types of climate problems that can be solved through machine learning. It shows how an ensemble of multiple climate simulations can be used to train machine learning prediction models more accurately. In turn, these predictions help us to better understand teleconnections in the climate system – in this year’s case, between extreme precipitation in California and conditions in remote locations. This year’s hackathon was a big step forward in terms of infrastructure, with all of the software – Rapid Analytics and Model Prototyping, or RAMP – being administered remotely for the first time and running in the cloud. Our collaborator, Balázs Kégl at the Paris-Saclay Center for Data Science, has developed a very sophisticated and robust architecture that should be a model for hackathon-type events across many scientific disciplines.”
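The idea of training on an ensemble of simulations can be illustrated with a minimal sketch. It is not the hackathon's code or data, and it does not use RAMP. It assumes purely synthetic numbers in which a single hypothetical remote index drives a rainfall anomaly, pools the years from several simulated ensemble members to fit a simple linear regression, and compares that fit with a climatology baseline (always predicting the training mean) on a held-out member. All function names, parameters, and values below are made up for illustration.

```python
import random
import statistics


def make_member(n_years, slope, noise_sd, rng):
    """Synthetic (remote index, rainfall anomaly) pairs for one simulated member."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Five hypothetical ensemble members of 50 simulated years each (assumed values).
members = [make_member(50, slope=2.0, noise_sd=1.0, rng=rng) for _ in range(5)]

# Pool the first four members for training; hold out the last one for testing.
train, (x_test, y_test) = members[:-1], members[-1]
x_train = [x for xs, _ in train for x in xs]
y_train = [y for _, ys in train for y in ys]

a, b = fit_line(x_train, y_train)

# Climatology baseline: predict the pooled training mean for every test year.
baseline_rmse = rmse([statistics.fmean(y_train)] * len(y_test), y_test)
model_rmse = rmse([a + b * x for x in x_test], y_test)

print(f"fitted slope: {b:.2f}")
print(f"baseline RMSE: {baseline_rmse:.2f}, regression RMSE: {model_rmse:.2f}")
```

Pooling members gives the fit 200 training years instead of 50, which is the sense in which an ensemble supplies more training data than a single record. Real seasonal prediction involves many predictors and far weaker signals than this toy setup.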

Hackathon participants

The pre-workshop hackathon attracted 21 participants in 2017.

In summary, Andy Rhines writes, “An important prerequisite for interdisciplinary collaboration is having a forum that allows researchers from two or more fields to learn to speak the same language. Climate Informatics is a discipline that has always had fuzzy boundaries, by virtue of being welcoming to a wide range of methods and applications. But as the community has identified particularly well-matched methods and applications, those boundaries have started to come more into focus. For example, it is becoming clear that while machine learning algorithms cannot replace climate models, the two can be used in tandem to better understand the limited observations we have of the real world.”

Workshop participants

The 2017 Climate Informatics Workshop had more than 60 participants, many of whom return every year, and some of whom are its founding members.