Don’t Let Negatives Hold You Back: Accounting for Underlying Physics and Natural Distributions of Hydrothermal Systems When Selecting Negative Training Sites Leads to Better Machine Learning Predictions

December 29, 2023

Selecting negative training sites is an important challenge when using machine learning (ML) to predict hydrothermal resource favorability, because an ideal model would discriminate between hydrothermal systems (positives) and all types of locations without hydrothermal systems (negatives). The Nevada Machine Learning project (NVML) fit an artificial neural network to identify areas favorable for hydrothermal systems, training it with 62 negative sites where the research team had confidence that no hydrothermal resource exists. Herein, we compare the implications of the expert selection of negatives (i.e., the NVML strategy) with a random sample strategy, in which areas outside the favorable structural ellipses defined by NVML are assumed to be negative. Because hydrothermal systems are sparse, hydrothermal favorability is very likely to be low wherever a favorable geological structure is absent. We compare three training strategies: 1) the positive and negative labeled examples from NVML; 2) the positive examples from NVML with randomly selected negatives at the same frequency as in NVML; and 3) the positive examples from NVML with randomly selected negatives reflecting the expected natural distribution of hydrothermal systems relative to the total area. We apply these three training strategies to the NVML feature data (input data) using two ML algorithms (XGBoost and logistic regression), creating six favorability maps for hydrothermal resources. When accounting for the expected natural distribution of hydrothermal systems, we find that XGBoost outperforms the NVML neural network trained with the expert-selected negatives. Validating models with F1 scores, a common performance metric, was less reliable than comparing probability estimates at known positives, likely because of the extreme natural class imbalance and the lack of negatively labeled sites. This work demonstrates that expert selection of negatives for training in NVML likely imparted modeling bias. Accounting for the sparsity of hydrothermal systems and all the types of locations without hydrothermal systems allows us to create better models for predicting hydrothermal resource favorability.
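
The two random-sampling strategies described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy grid, the ellipse mask, the number of positives, and the 1% prevalence are placeholder assumptions, and the helper names `negatives_needed` and `sample_random_negatives` are hypothetical. Only the count of 62 negatives comes from the abstract, and reading "same frequency as in NVML" as "same count" is also an assumption.

```python
import random


def negatives_needed(n_positives, prevalence):
    """Negatives required so positives make up `prevalence` of the training set."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return round(n_positives * (1 - prevalence) / prevalence)


def sample_random_negatives(cells, inside_ellipse, n_negatives, seed=0):
    """Draw negatives uniformly at random from cells outside the favorable ellipses."""
    candidates = [c for c, inside in zip(cells, inside_ellipse) if not inside]
    if n_negatives > len(candidates):
        raise ValueError("not enough cells outside the ellipses")
    return random.Random(seed).sample(candidates, n_negatives)


# Hypothetical study area: 10,000 grid cells, 200 of them inside structural ellipses.
cells = list(range(10_000))
inside_ellipse = [i % 50 == 0 for i in cells]

# Strategy 2: random negatives, same count as the 62 expert-selected NVML negatives.
balanced_negatives = sample_random_negatives(cells, inside_ellipse, 62)

# Strategy 3: random negatives sized to an assumed natural prevalence of positives.
n_positives = 20           # placeholder, not a value from the paper
assumed_prevalence = 0.01  # placeholder, not a value from the paper
n_natural = negatives_needed(n_positives, assumed_prevalence)
natural_negatives = sample_random_negatives(cells, inside_ellipse, n_natural, seed=1)
```

Under these placeholder values, strategy 3 draws far more negatives than strategy 2, which is the kind of class imbalance the abstract identifies as a likely reason F1 scores were a less reliable validation metric.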

Publication Year: 2023
Title: Don’t Let Negatives Hold You Back: Accounting for Underlying Physics and Natural Distributions of Hydrothermal Systems When Selecting Negative Training Sites Leads to Better Machine Learning Predictions
Authors: Pascal D. Caraccioli, Stanley Paul Mordensky, Cary R. Lindsey, Jacob DeAngelo, Erick Burns, John Lipor
Publication Type: Article
Publication Subtype: Journal Article
Series Title: Geothermal Resources Council Transactions
Index ID: 70251035
Record Source: USGS Publications Warehouse
USGS Organization: Geology, Minerals, Energy, and Geophysics Science Center