Miglarese, Radiant Earth Advocate for Benefits of Open Training Datasets


Anne Hale Miglarese has a simple mantra when it comes to gathering and using training data for remote sensing.

Collect it once, the founder of the nonprofit Radiant Earth Foundation says. Then use it many times.


Anne Hale Miglarese, founding CEO of the Radiant Earth Foundation, pictured with the graphic for the USGS EROS podcast "Eyes On Earth."

In connecting the global development community with the remote sensing tools it needs to tackle social, economic, and environmental issues, Radiant Earth provides researchers and data scientists access to a growing catalog of satellite-based training data, such as African crop type classifications and benchmark land cover datasets. It also offers open machine learning models built on these training data, a feature that is set to expand. The availability of open models and cloud computing power helps scientists search for trends and change across space and time.

Radiant Earth provides that access through what it calls Radiant MLHub, an open library for geospatial training data to advance machine learning applications on Earth observations. Radiant MLHub serves as a resource for a community of practice, giving data scientists benchmarks they can use to train and validate their models and improve their performance.

Radiant MLHub hosts open training datasets generated by Radiant Earth Foundation’s team, as well as other training data catalogs contributed by Radiant Earth’s partners. Miglarese talked about that and more in this podcast conversation. Here are some highlights from that interview:

Describe the mission of Radiant Earth Foundation.

“Our mission is to empower organizations and individuals globally, really, giving them access to Earth observation data, training data, standards, and tools to address the world’s most challenging problems. We do that through three programs. The majority of our activity today is in Radiant MLHub. The second is to cultivate a community of practice to develop standards for machine learning on Earth observation and to expand the interoperability of these tools and datasets. And finally, third, raising awareness in the global development sector specifically and with data scientists on the innovation that machine learning and Earth observation can bring to the global development and international development community.”

How did you arrive at the Radiant Earth focus on Earth observations and machine learning as the pillars of your foundation’s emphasis?

“When I originally founded Radiant over four years ago now, my hypothesis was that ... the global development community, in particular, was having a very difficult time getting their hands on imagery, open imagery, and then getting access to the tool set needed to analyze that imagery. Our original mission was to build a cloud-based repository of open satellite imagery for the globe and associate that with open source software tools to make it freely accessible to anyone on the Earth. By the time we actually launched that platform, the Radiant Earth Platform, and opened it up to customers, we did an environmental scan, and 11 commercial companies had developed the same or similar product, many of them with a ‘freemium’ access plan. What that experience told me was that our hypothesis was right, but that it was a commercial marketplace. That is not a good place to spend philanthropic dollars.”

So, what did you do then?

“While we were building the original platform, we had received a grant from Schmidt Futures to look at building land cover training datasets and opening those up so that others could do machine learning. That work was exceptionally well received. And I think what we have found is that, as a neutral entity, we are in a place where we can help academics, where we can help governments, where we can help the commercial sector, by being this library, if you will, this first kind of Guggenheim Library specifically for training data on Earth observation. We have received a very positive market reception. It’s a very nice niche for a neutral entity.”

How did you decide to provide open access to training data?

“I’ve been in the field of remote sensing for 30-plus years. I started with Landsat data in college. But the intersection of cloud computing and machine learning, and with the plethora of data that we have now, we have to use machine learning and artificial intelligence (AI) to analyze all of this data. The very first step in that process, though, is good, high-quality training data. It’s the bottleneck; it’s what’s holding back this greater innovation. We need a repository so that training data ... high-quality training data ... are not just used once and discarded, but can be shared with others around the globe so that we over time dramatically reduce that bottleneck.”

You wrote in a recent article that there is “a lack of awareness within the funding community about the ways in which training data can buy down future costs and speed the pace of innovation. Instead the geospatial community must leverage lessons learned from the drive for open geospatial data and bring best practices forward by applying the same techniques and policies to open training data.” What are some of those lessons, and can you incentivize this process?

“I think there should be carrots, and there are sticks, particularly from the funding side of the fence. What we have seen for 20, 30 years, is that when research is funded, oftentimes the data are not openly shared even if that was in the grant language. NASA, USGS, ESA (European Space Agency) ... are very familiar with this issue and are really pushing and highlighting how important it is to share. What we are seeing are significant investments by major philanthropies in machine learning on Earth observation, and they’re in their early days. So, many of them are funding a project or two or planning a large program, but also really understanding the necessity, when they provide these grants, that they deal with and require data management plans. Sharing the data, I think, is fundamental to growing the community, particularly in the global development community.”


What would some of the carrots be?

“I think the carrots are recognition of best practices. I think the ability, if you are a principal investigator and you share a great dataset in your next study, hopefully you will be able to find through Radiant MLHub a dataset that allows you to use it in your next research project, and buy down the cost and speed of your ability to do the analysis. I think public recognition is also important, and hopefully we will be able to find some really wonderful use cases and stress how important that is.”

If organizations or individuals want to contribute their training data to the Radiant Earth effort, what do they need to do?

“It’s really very straightforward. You need some good metadata. We’ll actually host it. We’re in collaboration with the AWS (Amazon Web Services) Open Data program, so we can host it on an AWS bucket. If you would like to keep it in your environment, that’s fine. You will find instructions at Radiant Earth Foundation Radiant MLHub on our website. You can sign on and get an account. It’s all free. If we get a particularly large dataset, we’re happy to consult with organizations to help them register their data.”

Can you paint a picture of that point in time when machine learning and Earth observations are harnessed and available to the global community for addressing sustainable development goals?

“Oh yes, absolutely. I think there are a lot of wonderful initiatives underway and, as I’ve stressed, I think the intersection of a tremendous amount of Earth observation data, with cloud computing, with machine learning, with a global data science community, are going to really drive us to solutions. One that I am particularly interested in, and that Radiant is very focused on, is in agriculture, primarily in the global south. In helping nations and helping regions and helping farmers improve their farming practices by applying machine learning to the Earth observation data, and giving insights to those regions and those farmers on how to improve their yield, I think we are going to see dramatic improvements in solution development in support of the sustainable development goals in a whole host of arenas.”

Talk briefly about Landsat. What does Landsat bring to the Radiant Earth table that will make the organization’s goals a reality?

“I am very excited to see the launch of Landsat 9 next year, and the plans for Landsat Next and the level of collaboration between NASA and USGS and the Copernicus program of the European Union ... and as I understand it, some level of continuity and collaboration that is there to really maximize the quality of the data and the orbits and the timing. So, I’m very excited. Radiant will invest in building training datasets based on Landsat data and work with the community to bring additional Landsat training datasets to Radiant MLHub to support researchers around the globe.”

Related Content

Date published: September 21, 2020

Eyes on Earth Episode 34 – Open Training Data

Eyes on Earth is a podcast on remote sensing, Earth observation, land change and science, brought to you by the USGS Earth Resources Observation and Science (EROS) Center. In this episode, we learn about an organization working to make remote sensing data easier to find and use....

Contacts: Janice Nelson