Artificial Intelligence (AI) and Machine Learning (ML) include a broad suite of flexible, data-driven, empirical approaches for performing tasks that are difficult to implement with conventional methods. AI and ML use computing resources to identify the underlying patterns and relationships within a dataset without being given explicit instructions.
The North Atlantic-Appalachian AI/ML Capability Team comprises staff with a wide variety of scientific backgrounds who share the goal of improving how data are collected and interpreted using AI and ML. Because AI/ML is a rapidly evolving field of data science, the capability team serves as a resource for sharing information, connecting problems with expert knowledge, and lowering the barriers to applying AI/ML to earth science problems.
Machine Learning Maps pH and Redox in North Atlantic Coastal Plain Groundwater

Groundwater pH and redox conditions are fundamental chemical characteristics that control the distribution of many contaminants of concern for drinking water or for the ecological health of receiving waters. Machine learning methods are used to predict and map these characteristics.
Enabling Artificial Intelligence with Citizen Science in Fish Biology

Artificial Intelligence (AI) is revolutionizing ecology and conservation by enabling species recognition from photos and videos. This Leetown Science Center (LSC) project evaluates the potential to extend AI to individual fish recognition for population assessment.
Process‐Guided Deep Learning Predictions of Lake Water Temperature

The rapid growth of data in water resources has created new opportunities for applying advanced deep learning tools. This paper evaluates the Process‐Guided Deep Learning (PGDL) hybrid modeling framework, using the prediction of depth‐specific lake water temperatures as a use case.
TERMINOLOGY AND CATEGORIES OF AI/ML
While AI/ML is a broad field, there are some commonly discussed divisions among approaches for making predictions from data:
Artificial Intelligence
A broad field of data science where an algorithm “learns” from data. These approaches are highly flexible and capable of performing tasks typically requiring human intelligence.
Machine Learning
Approaches in which structured data are provided to an algorithm, which then “learns” the optimal way to predict the desired outcome. Constructing the structured data, that is, choosing and preparing the input variables (features) the algorithm will use, is called “feature engineering”.
Example: A hydrologist wishes to predict the pH and redox conditions in a groundwater aquifer. She provides well location, depth, land use, soil characteristics, and many other attributes along with the measured pH and redox condition. The algorithm learns the patterns in the data and can then predict the pH and redox condition in unmeasured areas using maps of the attributes provided to the model.
This approach was taken by DeSimone and others (2020) to map pH and redox conditions in the North Atlantic Coastal Plain aquifer system.
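The workflow described above can be sketched in a few lines. The example below is a minimal illustration only. It assumes made-up training wells described by three hypothetical engineered features (well depth, fraction of agricultural land, and a soil drainage index) and uses a k-nearest-neighbors classifier as a simple stand-in. It is not the model or the data used by DeSimone and others (2020).

```python
import math

# Hypothetical, made-up training wells (illustration only). Each entry pairs an
# engineered feature vector (well depth in meters, fraction of agricultural
# land, soil drainage index 0-1) with an observed redox class.
TRAINING_WELLS = [
    ((10.0, 0.8, 0.9), "oxic"),
    ((15.0, 0.7, 0.8), "oxic"),
    ((20.0, 0.6, 0.7), "oxic"),
    ((120.0, 0.1, 0.2), "anoxic"),
    ((150.0, 0.2, 0.1), "anoxic"),
    ((135.0, 0.1, 0.3), "anoxic"),
]


def feature_ranges(rows):
    """Return (min, max) for each feature column across the training rows."""
    columns = zip(*(features for features, _ in rows))
    return [(min(col), max(col)) for col in columns]


def scale(features, ranges):
    """Min-max scale each feature so large-unit variables (depth) do not dominate."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(features, ranges))


def predict_redox(features, rows=TRAINING_WELLS, k=3):
    """Predict a redox class for an unmeasured location by majority vote
    among the k nearest training wells in scaled feature space."""
    ranges = feature_ranges(rows)
    target = scale(features, ranges)
    neighbors = sorted(
        (math.dist(target, scale(f, ranges)), label) for f, label in rows
    )
    votes = {}
    for _, label in neighbors[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

To produce a map, the same prediction would be repeated for every grid cell using the mapped attribute values for that cell. Scaling the features first keeps a variable with large units, such as depth, from dominating the distance calculation.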
Deep Learning
Approaches in which unstructured data (such as images, audio, or raw time series) are provided to an algorithm. The algorithm “learns” the important features of the dataset as well as how to predict the desired outcome, so little or no manual “feature engineering” is required.
Example: An ecologist provides 5,000 photographs to an algorithm, each labeled with the correct fish species. The algorithm learns which features of the fish in the photographs distinguish the species and can then identify the correct species in previously unseen photographs.
This approach is being taken by the Leetown Science Center and others to enable citizen scientists and anglers to contribute to fish population assessments using submitted images.
Process/Knowledge-Guided Deep Learning
Approaches in which unstructured data and rules or constraints are provided to an algorithm. The algorithm “learns” the important features of the dataset as well as how to predict the desired outcome, while the rules keep the results consistent with prior knowledge. As with deep learning, little or no manual “feature engineering” is required.
Example: A hydrologist wishes to predict the temperature of water within a lake. The hydrologist provides time series of air temperature, solar radiation, wind speed, and many other variables to a deep learning algorithm. The hydrologist also provides rules, based on the law of conservation of energy, that describe what a physically reasonable temperature is. Predictions that are physically unreasonable are penalized during the learning process. As new data are provided, the algorithm predicts lake temperatures that are accurate and consistent with the law of conservation of energy.
Read and others (2019) applied this approach to 68 lakes and reported better predictive performance for the process-guided deep learning approach than for the alternative models they evaluated.
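The idea of penalizing physically unreasonable predictions can be expressed as a combined loss function. The sketch below is a simplified stand-in, not the formulation used by Read and others (2019). It treats the lake as a single well-mixed layer, and the names `heat_capacity`, `net_flux`, `lam` (the weight on the physics penalty), and `tolerance` are hypothetical. The energy term compares the predicted change in heat content with the net energy flux and, unlike the error term, does not require temperature observations.

```python
def mse(predicted, observed):
    """Mean squared error over time steps that have an observation.
    Missing observations are given as None and skipped."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def energy_penalty(predicted, net_flux, heat_capacity, dt, tolerance=0.0):
    """Average violation of a simplified, single-layer energy balance.

    For each step t, the heat-content change implied by the predicted
    temperatures, heat_capacity * (T[t] - T[t-1]), is compared with the energy
    delivered by the net flux over the step, net_flux[t] * dt.
    Assumed units (hypothetical): heat_capacity in J/degC, net_flux in W,
    dt in s. Imbalances smaller than `tolerance` (J) are not penalized."""
    violations = []
    for t in range(1, len(predicted)):
        delta_heat = heat_capacity * (predicted[t] - predicted[t - 1])
        expected_heat = net_flux[t] * dt
        violations.append(max(0.0, abs(delta_heat - expected_heat) - tolerance))
    return sum(violations) / len(violations)


def process_guided_loss(predicted, observed, net_flux, heat_capacity, dt,
                        lam=1.0, tolerance=0.0):
    """Data-fit error plus a weighted physics penalty."""
    return mse(predicted, observed) + lam * energy_penalty(
        predicted, net_flux, heat_capacity, dt, tolerance)
```

During training, a deep learning model would be adjusted to minimize this combined loss, so a prediction that fits the observations but breaks the energy balance still scores poorly. Because the penalty needs only model inputs and predictions, it can in principle also be evaluated for periods without observations.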
POTENTIAL AI/ML APPLICATIONS FOR EARTH SCIENCES
Predicting outcomes based on data patterns & trends
AI/ML Operations: Creating maps from individual observations, forecasting data from past data, classifying previously unseen data into groups
Earth Science Applications: Forecasting streamflow and groundwater levels, predicting water quality in unmonitored areas, predicting abundance and trends in wildlife populations, forecasting economic impacts of mineral commodity supply disruptions, predicting the occurrence of harmful algal blooms, etc.
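Forecasting from past data is commonly framed as a supervised learning problem by using earlier values of a series as features for predicting the next value. The sketch below assumes a regularly spaced series with no gaps and fits only a one-lag linear model as the simplest possible learner. The function names are hypothetical, and real streamflow or groundwater-level forecasts would typically use more lags and additional explanatory variables such as precipitation.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a time series into (features, targets): the previous n_lags
    values are the features and the next value is the target."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return features, targets


def fit_one_lag_linear(series):
    """Ordinary least squares fit of y[t] = a + b * y[t-1]."""
    X, y = make_lagged_dataset(series, 1)
    x = [row[0] for row in X]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_next(series):
    """One-step-ahead forecast from the fitted one-lag model."""
    a, b = fit_one_lag_linear(series)
    return a + b * series[-1]
```

The same lagged-feature table could be passed to any ML regressor in place of the linear fit; the linear model is used here only to keep the sketch short and dependency-free.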
Recognizing shapes in images and video or sounds in audio
AI/ML Operations: Classifying unseen images or sounds, counting or extracting features from an image, video, or audio recording
Earth Science Applications: Extracting agriculture or energy infrastructure from satellite imagery, classifying vegetation or land use types from aerial or drone imagery, extracting landforms or coastal features from high-resolution LiDAR, predicting streamflow from repeat images or videos taken in a channel, identifying wildlife in images or audio recordings, identifying features in microscopic images, classifying features in geophysical logs, etc.
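Many deep learning models for images build on convolution: a small grid of weights (a kernel) is slid across the image, and the output is large where the local pattern matches the kernel. The sketch below applies a single hand-set vertical-edge kernel to a tiny synthetic image. In a convolutional neural network, the kernel weights would be learned from labeled examples rather than chosen by hand, and many kernels would be stacked in layers.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D cross-correlation (what deep learning libraries usually
    call convolution): slide the kernel over the image without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hand-set kernel that responds to a dark-to-bright change from left to right.
VERTICAL_EDGE = [[-1, 1],
                 [-1, 1]]

# Tiny synthetic image: dark (0) on the left, bright (1) on the right.
IMAGE = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
```

The resulting feature map is nonzero only in the column where the dark and bright regions meet, which is the kind of intermediate "feature" that later layers of a network combine into shapes such as fish outlines or field boundaries.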
Finding answers to questions in an information source
AI/ML Operations: Data mining, creating AI assistants to answer questions, classifying or rating documents according to content, identifying sentiment within text
Earth Science Applications: Extracting information from paper map records, extracting text and information from scanned forms or reports, identifying linkages between scientific journal articles within different fields, identifying the sentiment of posts to social media about earth science observations, answering earth science questions using information extracted from published reports, etc.
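A very simple way to estimate how closely two documents, such as journal article abstracts, are linked is to compare their word counts. The sketch below uses bag-of-words cosine similarity with a deliberately naive tokenizer. It is an illustration only; practical text-mining systems generally use richer representations, for example term weighting or learned text embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts from a deliberately naive tokenizer (letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors:
    1.0 for identical word proportions, 0.0 for no shared words."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Computing this score for every pair of abstracts and keeping the highest-scoring pairs gives a rough list of candidate links between articles, which a specialist would then review.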
Optimization – finding the best solution to a complex problem
AI/ML Operations: Optimizing the routes between two locations, identifying the optimal configuration of monitoring points
Earth Science Applications: Water quality monitoring network optimization, quantifying tradeoffs between factors affecting environmental health, etc.
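Monitoring network optimization can be framed as choosing a limited number of sites that together represent as many locations or conditions of interest as possible. The sketch below uses the standard greedy heuristic for this maximum-coverage problem. The candidate sites and the locations each one represents are hypothetical inputs, and a real network design would also weigh costs, access, and the statistical value of each site.

```python
def greedy_network(candidates, k):
    """Pick up to k sites from `candidates` (a dict mapping a site name to the
    set of locations it represents), each time taking the site that adds the
    most locations not already covered. Returns (chosen_sites, covered)."""
    chosen, covered = [], set()
    remaining = dict(candidates)  # shallow copy so the caller's dict is untouched
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda site: len(remaining[site] - covered))
        if not remaining[best] - covered:
            break  # no remaining site adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical candidate sites and the subwatersheds (numbered) each represents.
SITES = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
network, represented = greedy_network(SITES, 2)
```

With a budget of two sites, the greedy choice is site C followed by site A, which together represent all seven subwatersheds. The greedy approach is simple and fast but is not guaranteed to find the best possible network in every case.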
Finding the natural patterns in a data set
AI/ML Operations: Identifying anomalies, identifying the relationships between groups of data
Earth Science Applications: Data quality assurance of laboratory or field monitoring, identifying sub-groups or facies within multivariate datasets, identifying ecological or hydrologic regimes, etc.
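Anomaly detection for data quality assurance can be as simple as flagging values far from the bulk of the data. The sketch below uses the modified z-score, based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The readings are made-up pH values, and the threshold would need to be tuned for any real monitoring record.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    The median and MAD are used instead of the mean and standard deviation
    because they are much less affected by the outliers being searched for."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


# Made-up field pH readings with one suspicious value (index 5).
READINGS = [7.1, 7.2, 7.0, 7.3, 7.1, 12.0, 7.2]
suspect = flag_anomalies(READINGS)
```

Flagged values are candidates for review rather than automatic deletion, since an unusual reading can be a real event rather than an instrument or transcription error.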
STRENGTHS
- A growing community of practice exists within the USGS at the national, regional, and Science Center levels to take advantage of the growing field of AI/ML.
- Over the past decade, a growing number of open-source tools for supporting AI/ML projects have become available within the USGS.
- Recent expansion of high-performance computing and cloud hosting resources has positioned the USGS to meet the computing needs of AI/ML projects.
MAJOR OBJECTIVES
- Connect stakeholder needs to regional and national AI/ML capabilities
- Increase efficiency and quality of producing and interpreting USGS data
- Develop a centralized listing of AI/ML training and computing resources
- Develop guidelines or a fact sheet to communicate to non-data scientists what types of problems are best evaluated with AI/ML methods
- Increase communication and knowledge transfer among Regional AI/ML scientists
- Identify AI/ML expertise across the Region and USGS
- Establish and incorporate best practices for the entire AI/ML project lifecycle
- Identify and disseminate training opportunities for all skill levels
Credit for Common AI/ML applications: Matt Kukuck, USGS Cloud Hosting Services
SCIENCE
USGS Community for Data Integration AI/ML Collaboration Area
Science Explorer: Machine Learning
USGS Associate CIO: Cloud Hosting Solutions
Predicting Groundwater Quality in Unmonitored Areas
Data Sciences: Water Resources
USGS High Performance Computing
USGS Cloud Hosting Services AI/ML Support Request
Landsat Missions: Satellites, Supercomputers, and Machine Learning Provide Real-Time Crop Type Data
Quantifying watershed controls on fine sediment particles and nutrient loading to Lake Tahoe using data mining and machine learning
Since the late 1980s, the USGS has collected discharge, sediment, and water quality data at seven major drainages under the Lake Tahoe Interagency Monitoring Program (LTIMP). Recently, continuous, real-time measurements of turbidity were added to the LTIMP. These data can be combined with in situ measurements, model simulations, and remotely sensed datasets available from the USGS, National Aeronautics and...
Enabling AI for citizen science in fish biology
Artificial Intelligence (AI) is revolutionizing ecology and conservation by enabling species recognition from photos and videos. Our project evaluates the potential to extend AI to individual fish recognition for population assessment. The success of this effort would facilitate fisheries analysis at an unprecedented scale by engaging anglers and citizen scientists in imagery collection. This...
Enabling AI for citizen science in fish ecology
Artificial Intelligence (AI) is revolutionizing ecology and conservation by enabling species recognition from photos and videos. Our project evaluates the potential to extend AI to individual fish recognition for population assessment. The success of this effort would facilitate fisheries analysis at an unprecedented scale by engaging anglers and citizen scientists in imagery collection. This proje...
PUBLICATIONS
Below are publications associated with this project.
Mapping forested wetland inundation in the Delmarva Peninsula, USA: Use of deep learning model
The Delmarva Peninsula in the eastern United States is dominated by thousands of small, forested depressional wetlands that are highly sensitive to climate change and climate variability but provide critical ecosystem services. Due to the relatively small size of these depressional wetlands and occurrence under forest canopy cover, it is very challenging to map their inundation status based on ex...
GeoNat v1.0: A dataset for natural feature mapping with artificial intelligence and supervised learning
Machine learning allows “the machine” to deduce the complex and sometimes unrecognized rules governing spatial systems, particularly topographic mapping, by exposing it to the end product. Often, the obstacle to this approach is the acquisition of many good and labeled training examples of the desired result. Such is the case with most types of natural features. To address such limitations, this r...
Naturally occurring uranium in groundwater in northeastern Washington State
Uranium is a radioactive element (radionuclide) that occurs naturally in rock, soil, and water, usually in low concentrations. Radionuclides are unstable atoms with excess energy and as radionuclides decay, they emit radiation. The uranium decay sequence also includes other radionuclides of concern such as radium and radon. This fact sheet addresses naturally occurring uranium in groundwater in no...
Deep convolutional neural networks for map-type classification
Maps are an important medium that enable people to comprehensively understand the configuration of cultural activities and natural elements over different times and places. Although a massive number of maps are available in the digital era, how to effectively and accurately locate and access the desired map on the Internet remains a challenge today. Previous works partially related to map-type cla...
Automated road breaching to enhance extraction of natural drainage networks from elevation models through deep learning
High-resolution (HR) digital elevation models (DEMs), such as those at resolutions of 1 and 3 meters, have increasingly become more widely available, along with lidar point cloud data. In a natural environment, a detailed surface water drainage network can be extracted from a HR DEM using flow-direction and flow-accumulation modeling. However, elevation details captured in HR DEMs, such as roads a...
NEWS
Below are news stories associated with this project.
Miglarese, Radiant Earth Advocate for Benefits of Open Training Datasets
Anne Hale Miglarese has a simple mantra when it comes to gathering and using training data for remote sensing.
“Collect it once,” says the founder of the nonprofit Radiant Earth Foundation. “Then use it many times.”