Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models

February 24, 2021

Ensemble-tree machine learning (ML) regression models can be prone to systematic bias: small values are overestimated and large values are underestimated. Additional bias can be introduced if the dependent variable is a transform of the original data. Six methods were evaluated for their ability to correct systematic and introduced bias. Method performance was evaluated using four case studies of groundwater quality: the units of the dependent variable were pH in two and log-concentration in the others. When performance metrics (bias and RMSE for both points and the CDF) were computed using the same units as those in the ML model, empirical distribution matching (EDM) provided the best results. When the metrics were computed using retransformed concentration, EDM and a method incorporating Duan's smearing estimate were both effective. A method based on the Z-score transform approximates EDM if the correlation coefficient between rank-ordered ML estimates and rank-ordered observations approaches one.
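The three corrections named in the abstract can be illustrated with a minimal Python sketch. This is one plausible reading of each method, not the authors' implementation: the function names (`edm_correct`, `zscore_correct`, `duan_smearing_factor`), the linear interpolation between rank-ordered pairs, the clamping of estimates outside the training range, and the base-10 logarithm are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def edm_correct(train_est, train_obs, new_est):
    """Empirical distribution matching (assumed form): map each estimate to the
    observed value at the same rank, interpolating linearly between ranks."""
    xs = sorted(train_est)  # rank-ordered ML estimates
    ys = sorted(train_obs)  # rank-ordered observations
    corrected = []
    for v in new_est:
        if v <= xs[0]:          # assumption: clamp below the training range
            corrected.append(ys[0])
        elif v >= xs[-1]:       # assumption: clamp above the training range
            corrected.append(ys[-1])
        else:
            i = bisect_left(xs, v)  # guarantees xs[i - 1] < v <= xs[i]
            t = (v - xs[i - 1]) / (xs[i] - xs[i - 1])
            corrected.append(ys[i - 1] + t * (ys[i] - ys[i - 1]))
    return corrected


def zscore_correct(train_est, train_obs, new_est):
    """Z-score transform: rescale estimates so their mean and standard
    deviation match those of the observations."""
    m_e, s_e = mean(train_est), pstdev(train_est)
    m_o, s_o = mean(train_obs), pstdev(train_obs)
    return [m_o + (v - m_e) / s_e * s_o for v in new_est]


def duan_smearing_factor(train_est_log, train_obs_log, base=10.0):
    """Duan's smearing estimate: mean of the retransformed residuals.
    Multiply base**estimate by this factor to retransform to concentration."""
    residuals = [o - e for o, e in zip(train_obs_log, train_est_log)]
    return mean([base ** r for r in residuals])


# Hypothetical data: estimates are compressed toward the middle of the range,
# so small values are overestimated and large values are underestimated.
obs = [5.0, 6.0, 7.0, 8.0, 9.0]
est = [6.0, 6.5, 7.0, 7.5, 8.0]
print(edm_correct(est, obs, est))
print(zscore_correct(est, obs, est))
print(duan_smearing_factor(est, obs))
```

In this hypothetical data set the rank-ordered estimates are an exact linear function of the rank-ordered observations, so the correlation coefficient between them is one and the Z-score transform reproduces the EDM result, consistent with the last sentence of the abstract.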

Publication Year	2021
Title	Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models
DOI	10.1016/j.envsoft.2021.105006
Authors	Kenneth Belitz, Paul E. Stackelberg
Publication Type	Article
Publication Subtype	Journal Article
Series Title	Environmental Modeling and Software
Index ID	70219219
Record Source	USGS Publications Warehouse
USGS Organization	WMA - Earth System Processes Division

Water Resources Mission Area - Headquarters

Related

Data Release for Evaluation of Six Methods for Correcting Bias in Estimates from Ensemble Tree Machine Learning Regression Models

Ken Belitz (Former Employee)

Research Hydrologist Emeritus

Paul Stackelberg

Hydrologist
