Regional-scale, three-dimensional continuous probability models, were constructed for aspects of redox conditions in the groundwater system of the Central Valley, California. These models yield grids depicting the probability that groundwater in a particular location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions, or will have dissolved manganese (Mn) concentrations greater than selected threshold values representing secondary drinking water-quality contaminant levels (SMCL) and health-based screening levels (HBSL). The probability models were constrained by the alluvial boundary of the Central Valley to a depth of approximately 300 m. Probability distribution grids can be extracted from the 3-D models at any desired depth, and are of interest to water-resource managers, water-quality researchers, and groundwater modelers concerned with the occurrence of natural and anthropogenic contaminants related to anoxic conditions.
Models were constructed using a Boosted Regression Trees (BRT) machine learning technique that produces many trees as part of an additive model and has the ability to handle many variables, automatically incorporate interactions, and is resistant to collinearity. Machine learning methods for statistical prediction are becoming increasing popular in that they do not require assumptions associated with traditional hypothesis testing. Models were constructed using measured dissolved oxygen and manganese concentrations sampled from 2767 wells within the alluvial boundary of the Central Valley, and over 60 explanatory variables representing regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrologic properties. Models were trained on a USGS dataset of 932 wells, and evaluated on an independent hold-out dataset of 1835 wells from the California Division of Drinking Water. We used cross-validation to assess the predictive performance of models of varying complexity, as a basis for selecting final models. Trained models were applied to cross-validation testing data and a separate hold-out dataset to evaluate model predictive performance by emphasizing three model metrics of fit: Kappa; accuracy; and the area under the receiver operator characteristic curve (ROC). The final trained models were used for mapping predictions at discrete depths to a depth of 304.8 m. Trained DO and Mn models had accuracies of 86–100%, Kappa values of 0.69–0.99, and ROC values of 0.92–1.0. Model accuracies for cross-validation testing datasets were 82–95% and ROC values were 0.87–0.91, indicating good predictive performance. Kappas for the cross-validation testing dataset were 0.30–0.69, indicating fair to substantial agreement between testing observations and model predictions. Hold-out data were available for the manganese model only and indicated accuracies of 89–97%, ROC values of 0.73–0.75, and Kappa values of 0.06–0.30. The predictive performance of both the DO and Mn models was reasonable, considering all three of these fit metrics and the low percentages of low-DO and high-Mn events in the data.
|Title||Prediction and visualization of redox conditions in the groundwater of Central Valley, California|
|Authors||Celia Z. Rosecrans, Bernard T. Nolan, JoAnn M. Gronberg|
|Publication Subtype||Journal Article|
|Series Title||Journal of Hydrology|
|Record Source||USGS Publications Warehouse|
|USGS Organization||California Water Science Center; National Water Quality Assessment Program|