Probabilities of arsenic in groundwater at depths used for domestic and public supply in the Central Valley of California are predicted using weak-learner ensemble models (boosted regression trees, BRT) and more traditional linear models (logistic regression, LR). Both methods captured major processes that affect arsenic concentrations, such as the chemical evolution of groundwater, redox differences, and the influence of aquifer geochemistry. Inferred flow-path length was the most important variable but near-surface-aquifer geochemical data also were significant. A unique feature of this study was that previously predicted nitrate concentrations in three dimensions were themselves predictive of arsenic and indicated an important redox effect at >10 μg/L, indicating low arsenic where nitrate was high. Additionally, a variable representing three-dimensional aquifer texture from the Central Valley Hydrologic Model was an important predictor, indicating high arsenic associated with fine-grained aquifer sediment. BRT outperformed LR at the 5 μg/L threshold in all five predictive performance measures and at 10 μg/L in four out of five measures. BRT yielded higher prediction sensitivity (39%) than LR (18%) at the 10 μg/L threshold–a useful outcome because a major objective of the modeling was to improve our ability to predict high arsenic areas.
- Digital Object Identifier: 10.1021/acs.est.6b01914
- Source: USGS Publications Warehouse (indexId: 70184330)