Regional aquifers in the Mississippi embayment are the principal sources of water used for public and domestic supply, irrigation, and industrial uses throughout the region. An understanding of how water quality varies spatially, temporally, and with depth is critical to ensuring long-term sustainable use of these resources. A boosted regression tree (BRT) model was used by the U.S. Geological Survey (USGS) to map water quality in the three regional aquifers with the largest groundwater withdrawals in the embayment: the Mississippi River Valley alluvial (MRVA) aquifer, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ).
The BRT model was used to predict pH for 1-kilometer raster grid cells in seven aquifer layers (one MRVA, four MCAQ, two LCAQ) following the hydrogeologic framework of the Mississippi embayment aquifer system regional MODFLOW model. The methods and approach used for pH predictions are the same as those used recently by the USGS to predict specific conductance and chloride in the aquifers. Explanatory variables for the BRT model included variables describing well location and construction, surficial variables such as soil properties and land use, and variables extracted from the groundwater flow model, such as groundwater levels and ages. The primary source of pH data was the USGS National Water Information System database. Additional data from State ambient groundwater monitoring programs and the Safe Drinking Water Information System also were used. For wells sampled multiple times, the most recent sample was used. Because groundwater residence times are long (greater than 100 years) throughout much of the study area, the possible effects of changes in water quality over time were considered small compared to the improvement in overall model accuracy gained by using available historical data. Values of pH from 3,362 wells for samples collected between 1960 and 2018 were used as training data for the BRT model. An additional 839 samples were used as holdout data to evaluate model performance. The predictive performance of the pH model is lower for the holdout data than for the training data, as indicated by an r-squared value of 0.89 for the training data and 0.71 for the holdout data. The root mean squared errors for the training and holdout data are 0.32 and 0.50 standard pH units, respectively. Data generated during this study and the model output are available from the companion data release.
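The sample selection and performance metrics described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the tuple layout of the samples, and the example values are hypothetical, and the abstract does not state which definition of r-squared was applied, so the coefficient of determination is assumed here.

```python
from math import sqrt


def latest_sample_per_well(samples):
    """Keep only the most recent sample for each well.

    `samples` is an iterable of (well_id, iso_date, ph) tuples (a hypothetical
    layout). ISO 8601 date strings sort chronologically, so plain string
    comparison is enough to find the latest sample.
    """
    latest = {}
    for well_id, date, ph in samples:
        if well_id not in latest or date > latest[well_id][0]:
            latest[well_id] = (date, ph)
    return latest


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations (pH units)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Made-up values for illustration only; these are not data from the study.
example_samples = [
    ("W1", "1975-06-01", 6.8),
    ("W1", "2010-03-15", 7.1),
    ("W2", "1990-01-01", 5.9),
]
selected = latest_sample_per_well(example_samples)

observed_ph = [6.0, 7.0, 8.0]
predicted_ph = [6.5, 7.0, 7.5]
example_rmse = rmse(observed_ph, predicted_ph)
example_r2 = r_squared(observed_ph, predicted_ph)
```

Applying the same two metric functions to the training wells and to the holdout samples separately would give the kind of paired comparison reported above, where a gap between the two sets indicates how much accuracy is lost on data the model has not seen.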