Machine learning can assign geologic basin to produced water samples using major ion geochemistry

September 30, 2021

Understanding the geochemistry of waters produced during petroleum extraction is essential to informing the best treatment and reuse options, which can potentially be optimized for a given geologic basin. Here, we used the US Geological Survey’s National Produced Waters Geochemical Database (PWGD) to determine whether major ion chemistry could be used to accurately assign a produced water sample to a geologic basin based on its similarity to a training dataset. Two datasets were derived from the PWGD: one with seven features but more samples (PWGD7), and another with nine features but fewer samples (PWGD9). Before it was randomly split into training and testing (i.e., validation) datasets, the seven-feature dataset contained 58,541 samples from 20 basins, with samples classified based on total dissolved solids (TDS), bicarbonate (HCO3), Ca, Na, Cl, Mg, and sulfate (SO4). Before the same random split, the nine-feature dataset contained 33,271 samples from 19 basins, with samples classified based on TDS, HCO3, Ca, Na, Cl, Mg, SO4, pH, and specific gravity. Three supervised machine learning algorithms (Random Forest, k-Nearest Neighbors, and Naïve Bayes) were used to develop multi-class classification models to predict a basin of origin for produced waters using major ion chemistry. After training, the models were tested on three different datasets: Validation7, Validation9, and one based on data absent from the PWGD. Prediction accuracies across the models ranged from 23.5 to 73.5% when tested on the two PWGD-based datasets. A model using the Random Forest algorithm was the most accurate of all models tested. The models generally predicted basin of origin more accurately on the PWGD7-based dataset than on the PWGD9-based dataset.
An additional dataset, which contained data not in the PWGD, was used to test the most accurate model; results suggest that some basins may lack geochemical diversity or may not be well described, while others may be geochemically diverse or well described. A compelling result of this work is that the basin of origin of a produced water can be determined using major ions alone and, therefore, deep basinal fluid compositions may not be as variable within a given basin as previously thought. Applications include predicting the geochemistry of produced fluid prior to drilling at different intervals and assigning historical produced water data to a producing basin.
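
The workflow summarized above (split labeled samples into training and validation sets, train a classifier on major ion features, then score basin predictions) can be illustrated with a minimal sketch. It is not the authors' code. It implements only k-Nearest Neighbors, one of the three algorithm families named in the abstract, using the Python standard library. The two basins, their concentrations, the 25% validation fraction, the z-score scaling, and k = 5 are all assumptions made for illustration; the abstract does not state the study's split ratio, preprocessing, or hyperparameters, and no PWGD data are used here.

```python
import math
import random
from collections import Counter

# The seven PWGD7 features named in the abstract.
FEATURES = ["TDS", "HCO3", "Ca", "Na", "Cl", "Mg", "SO4"]

# Hypothetical basin means (mg/L), invented for illustration only.
# These are NOT values from the PWGD or from the study.
SYNTHETIC_BASINS = {
    "Basin A": [30000, 300, 1500, 10000, 18000, 300, 500],
    "Basin B": [150000, 100, 12000, 45000, 90000, 2000, 200],
}


def make_synthetic_samples(n_per_basin, rng):
    """Draw noisy samples around each hypothetical basin mean."""
    samples = []
    for basin, means in SYNTHETIC_BASINS.items():
        for _ in range(n_per_basin):
            row = [max(rng.gauss(m, 0.15 * m), 1.0) for m in means]
            samples.append((row, basin))
    return samples


def split(samples, test_fraction, rng):
    """Randomly split samples into (training, validation) lists."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train):
    """Compute per-feature mean and standard deviation on training data."""
    columns = list(zip(*(row for row, _ in train)))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return means, stds


def scale(row, scaler):
    """Z-score a sample so no single ion dominates the distance."""
    means, stds = scaler
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(row, train_scaled, k):
    """Predict a basin by majority vote among the k nearest samples."""
    nearest = sorted(train_scaled, key=lambda item: math.dist(row, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = make_synthetic_samples(100, rng)
train, validation = split(samples, 0.25, rng)  # assumed 75/25 split
scaler = fit_scaler(train)
train_scaled = [(scale(row, scaler), label) for row, label in train]

correct = sum(
    knn_predict(scale(row, scaler), train_scaled, k=5) == label
    for row, label in validation
)
accuracy = correct / len(validation)
print(f"Validation accuracy on synthetic data: {accuracy:.1%}")
```

Because the two synthetic basins are deliberately well separated, this toy score says nothing about real produced waters; the abstract reports 23.5 to 73.5% accuracy across models on the PWGD-based validation datasets, which cover 19 to 20 basins.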

Citation Information

Publication Year 2021
Title Machine learning can assign geologic basin to produced water samples using major ion geochemistry
DOI 10.1007/s11053-021-09949-8
Authors Jenna L. Shelton, Aaron M. Jubb, Samuel Saxe, Emil D. Attanasi, Alexei Milkov, Mark A. Engle, Philip A. Freeman, Christopher Shaffer, Madalyn S. Blondes
Publication Type Article
Publication Subtype Journal Article
Series Title Natural Resources Research
Series Number
Index ID 70224634
Record Source USGS Publications Warehouse
USGS Organization Geology, Energy & Minerals Science Center