North Atlantic-Appalachian AI/ML Capabilities

Machine Learning Maps pH and Redox in North Atlantic Coastal Plain GW

Groundwater pH and redox conditions are fundamental chemical characteristics controlling the distribution of many contaminants of concern for drinking water or the ecological health of receiving waters. ML methods predict and map key characteristics.


Enabling Artificial Intelligence with Citizen Science in Fish Biology

Artificial Intelligence (AI) is revolutionizing ecology and conservation by enabling species recognition from photos and videos. This LSC project evaluates the capacity to expand the use of AI to individual fish recognition for population assessment.


Process‐Guided Deep Learning Predictions of Lake Water Temperature

The rapid growth of data in water resources has created opportunities to apply advanced deep learning tools. This paper evaluates the Process‐Guided Deep Learning (PGDL) hybrid modeling framework with a use‐case of predicting depth‐specific lake water temperatures.


Artificial Intelligence (AI) and Machine Learning (ML) include a broad suite of flexible, data-driven empirical approaches for performing tasks that are difficult to implement using conventional methods. AI and ML harness computing resources to evaluate the underlying patterns and relationships within a dataset without explicit instructions.

The North Atlantic-Appalachian AI/ML Capability Team is composed of staff with a wide variety of scientific backgrounds who are united by the desire to improve how data are collected and interpreted using AI and ML. Because AI/ML is a rapidly evolving field of data science, the capability team serves as a resource for sharing information, connecting problems with expert knowledge, and lowering the barriers to applying AI/ML to earth science problems.



While AI/ML is a broad field, approaches to predicting from data are commonly divided into the following types:


Artificial Intelligence

A broad field of data science where an algorithm “learns” from data. These approaches are highly flexible and capable of performing tasks typically requiring human intelligence.


Machine Learning

Approaches in which structured data are provided to an algorithm, which then “learns” the optimal way to predict the desired outcome. Creating the structured data is called “feature engineering”.

Example: A hydrologist wishes to predict the pH and redox conditions in a groundwater aquifer. She provides well location, depth, land use, soil characteristics, and many other attributes along with the measured pH and redox condition. The algorithm learns the pattern in the data and is able to predict the pH and redox condition in unmeasured areas using maps of the attributes provided to the model.

This approach was taken by DeSimone and others (2020) to map pH and redox conditions in the North Atlantic Coastal Plain aquifer system.
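The workflow in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the cited study: the training records, the three attributes, and the function names are hypothetical, and a simple nearest-neighbor average stands in for whatever algorithm a real project would select.

```python
import math

# Hypothetical structured records (the "engineered features"):
# (well depth in m, agricultural land-use fraction, soil clay fraction) -> measured pH
TRAINING = [
    ((12.0, 0.70, 0.10), 5.1),
    ((15.0, 0.65, 0.12), 5.3),
    ((60.0, 0.20, 0.30), 6.8),
    ((75.0, 0.10, 0.35), 7.2),
    ((40.0, 0.40, 0.20), 6.1),
]


def column_ranges(records):
    """Return (min, max) of each attribute so attributes can be scaled to 0-1."""
    columns = list(zip(*(features for features, _ in records)))
    return [(min(col), max(col)) for col in columns]


def scale(features, ranges):
    """Rescale each attribute to 0-1 so no single unit dominates the distance."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(features, ranges)
    ]


def predict_ph(features, records=TRAINING, k=2):
    """Predict pH at an unmeasured location as the mean pH of the k most similar wells."""
    ranges = column_ranges(records)
    target = scale(features, ranges)
    nearest = sorted(
        records,
        key=lambda rec: math.dist(target, scale(rec[0], ranges)),
    )[:k]
    return sum(ph for _, ph in nearest) / k


# Attributes read from maps at a location with no pH measurement (hypothetical values)
print(predict_ph((14.0, 0.68, 0.11)))
```

Applying `predict_ph` to every cell of the attribute maps would produce a predicted pH map, which is the mapping step described in the example.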


Deep Learning

Approaches in which unstructured data are provided to an algorithm. The algorithm “learns” the important features of the dataset as well as how to predict the desired outcome. No “feature engineering” is required.

Example: An ecologist provides 5,000 photographs to an algorithm, along with the correct fish species. The algorithm learns which features of the fish in the photograph distinguish the species and is able to identify the correct species when previously unseen fish photographs are provided.

This approach is being taken by the Leetown Science Center and others to enable citizen scientists and anglers to contribute to fish population assessments using submitted images.


Process/Knowledge Guided Deep Learning

Approaches in which unstructured data and rules or constraints are provided to an algorithm. The algorithm “learns” the important features of the dataset as well as how to predict the desired outcome, and the rules ensure that the results are supported by prior knowledge. No “feature engineering” is required.

Example: A hydrologist wishes to predict the temperature of water within a lake. The hydrologist provides timeseries of air temperature, radiation from the sun, wind speed, and many other variables to a deep learning algorithm. The hydrologist also provides rules, based on the law of conservation of energy, that define what is a physically reasonable temperature. Predictions that are physically unreasonable are penalized during the learning process. The algorithm is able to correctly predict the lake temperature as new data are provided, and these temperatures do not violate the law of conservation of energy.

Read and others (2019) applied this approach to a set of 68 lake observations and observed superior predictive performance from the process-guided deep learning approach.
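The idea of penalizing physically unreasonable predictions can be sketched as a loss function that adds a physics term to the usual data-fit term. This is a simplified illustration, not the formulation used in the cited study: the function names, the single lumped `heat_capacity`, the `tolerance`, and the `weight` are assumptions made for the sketch.

```python
def mean_squared_error(predicted, observed):
    """Ordinary data-fit term: how far predictions are from observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def energy_penalty(predicted, net_heat_flux, heat_capacity, tolerance):
    """Mean amount by which each predicted temperature change departs from
    the change that the net energy input could explain (hypothetical rule).

    net_heat_flux[t] is the energy added between step t-1 and step t.
    """
    violations = []
    for t in range(1, len(predicted)):
        predicted_change = predicted[t] - predicted[t - 1]
        explained_change = net_heat_flux[t] / heat_capacity
        excess = abs(predicted_change - explained_change) - tolerance
        violations.append(max(0.0, excess))  # only violations are penalized
    return sum(violations) / len(violations)


def process_guided_loss(predicted, observed, net_heat_flux,
                        heat_capacity=1.0, tolerance=0.5, weight=1.0):
    """Data-fit error plus a weighted penalty for energy-inconsistent predictions."""
    return (mean_squared_error(predicted, observed)
            + weight * energy_penalty(predicted, net_heat_flux,
                                      heat_capacity, tolerance))


# Hypothetical daily lake temperatures (deg C) and net energy inputs
observed = [10.0, 11.0, 12.0]
flux = [0.0, 1.0, 1.0]
consistent = [10.0, 11.0, 12.0]
inconsistent = [10.0, 15.0, 12.0]  # jumps that the energy input cannot explain

print(process_guided_loss(consistent, observed, flux))
print(process_guided_loss(inconsistent, observed, flux))
```

During training, a deep learning model would adjust its parameters to reduce this combined loss, so predictions that fit the data but break the energy rule are discouraged.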

Infographic: Process-Guided Deep Learning




Predicting outcomes based on data patterns & trends

AI/ML Operations: Creating maps from individual observations, forecasting data from past data, classifying previously unseen data into groups

Earth Science Applications: Forecasting streamflow and groundwater level, predicting water quality in unmonitored areas, predicting abundance and trends in wildlife populations, forecasting economic impacts of mineral commodity supply disruptions, predicting the occurrence of harmful algal blooms, etc.
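As a small illustration of forecasting data from past data, the sketch below fits a first-order autoregressive model by least squares and steps it forward. It is a deliberately simple stand-in for the far more capable methods used in practice, and the water-level values and function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((x - mean_prev) ** 2 for x in previous)
    sxy = sum((x - mean_prev) * (y - mean_curr)
              for x, y in zip(previous, current))
    slope = sxy / sxx
    intercept = mean_curr - slope * mean_prev
    return intercept, slope


def forecast(series, steps):
    """Extend the series by repeatedly applying the fitted relationship."""
    intercept, slope = fit_ar1(series)
    value = series[-1]
    predictions = []
    for _ in range(steps):
        value = intercept + slope * value
        predictions.append(value)
    return predictions


# Hypothetical depth-to-water measurements (m) during a recession period
levels = [10.0, 6.0, 4.0, 3.0, 2.5]
print(forecast(levels, steps=2))
```

The same pattern, learning a relationship from past observations and applying it to future time steps, underlies the streamflow and groundwater-level forecasts listed above, although those typically use many predictor variables.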


Recognizing shapes in images and video or sounds in audio

AI/ML Operations: Classifying unseen images or sounds, counting or extracting features from an image, video, or audio recording

Earth Science Applications: Extracting agriculture or energy infrastructure from satellite imagery, classifying vegetation or land use types from aerial or drone imagery, extracting landforms or coastal features from high-resolution LiDAR, predicting streamflow from repeat images or videos taken in a channel, identifying wildlife in images or audio recordings, identifying features in microscopic images, classification of features in geophysical logs, etc.


Finding answers to questions in an information source

AI/ML Operations: Data mining, creating AI assistants to answer questions, classifying or rating text according to content, identifying sentiment within text

Earth Science Applications: Extracting information from paper map records, extracting text and information from scanned forms or reports, identifying linkages between scientific journal articles within different fields, identifying the sentiment of posts to social media about earth science observations, answering earth science questions using information extracted from published reports, etc.


Optimization – finding the best solution to a complex problem

AI/ML Operations: Optimizing the routes between two locations, identifying the optimal configuration of monitoring points

Earth Science Applications: Water quality monitoring network optimization, quantifying tradeoffs between factors affecting environmental health, etc.


Finding the natural patterns in a data set

AI/ML Operations: Identifying anomalies, identifying the relationships between groups of data

Earth Science Applications: Data quality assurance of laboratory or field monitoring, identifying sub-groups or facies within multivariate datasets, identifying ecologic and hydrologic regimes, etc.
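Anomaly identification for data quality assurance can be illustrated with a robust outlier check. The sketch below uses the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5. The readings and the function name are hypothetical, and operational screening would usually consider more than one variable.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a few extreme readings do not distort the baseline.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread around the median; scores are undefined
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


# Hypothetical field pH readings from one monitoring site
readings = [7.1, 7.0, 7.2, 6.9, 7.1, 12.4, 7.0]
print(flag_anomalies(readings))
```

Flagged readings are candidates for review rather than automatic removal, since an unusual value may reflect a real event as well as a sensor or transcription error.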



  • A growing community of practice exists within USGS at the national, regional, and Science Center level to take advantage of the growing field of AI/ML.
  • Over the past decade, a growing number of open-source tools have become available within the USGS to accommodate AI/ML opportunities.
  • Recent expansion of high-performance computing and cloud hosting service resources has positioned the USGS for meeting the computing needs of AI/ML projects.



  • Connect stakeholder needs to regional and national AI/ML capabilities
  • Increase efficiency and quality of producing and interpreting USGS data
  • Develop a centralized listing of AI/ML training and computing resources
  • Develop guidelines or a fact sheet to communicate to non-data scientists what types of problems are best evaluated with AI/ML methods
  • Increase communication and knowledge transfer among Regional AI/ML scientists
  • Identify AI/ML expertise across the Region and USGS
  • Establish and incorporate best practices for the entire AI/ML project lifecycle
  • Identify and disseminate training opportunities for all skill levels

Credit for Common AI/ML applications: Matt Kukuck, USGS Cloud Hosting Services