Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California
The ASCII grids represent regional probabilities that groundwater at a particular location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions, or will have dissolved manganese (Mn) concentrations greater than selected threshold values representing secondary maximum contaminant levels (SMCL) and health-based screening levels (HBSL) for drinking-water quality. The probability models were constrained to the alluvial boundary of the Central Valley and extend to a depth of approximately 300 meters (m). We used prediction modeling methods, specifically boosted regression trees (BRT) with a Bernoulli error distribution, within a statistical learning framework implemented in R (http://www.r-project.org/) to produce two-dimensional probability grids at selected depths throughout the modeling domain. The statistical learning framework seeks to maximize the predictive performance of machine learning methods by tuning models through cross-validation. Models were constructed using measured dissolved oxygen and manganese concentrations sampled from 2,767 wells within the alluvial boundary of the Central Valley and more than 60 predictor variables from 7 sources (see metadata), which were assembled so that the models incorporate regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Previously developed Central Valley model outputs of textures (Central Valley Textural Model, CVTM; Faunt and others, 2010) and of MODFLOW-simulated vertical water fluxes and predicted depth to water table (Central Valley Hydrologic Model, CVHM; Faunt, 2009) were used to represent aquifer textures and groundwater hydraulics, respectively. Each well used in the BRT models was attributed with predictor variable values in ArcGIS using a 500-m buffer around the well.
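As a rough illustration of what BRT with a Bernoulli error distribution does, the following is a minimal pure-Python sketch of gradient boosting under the Bernoulli (logistic) deviance, using one-split trees (stumps) as base learners. It is not the authors' R workflow: the stump depth, learning rate, number of trees, and the toy data are assumptions chosen for illustration, and practical BRT models typically use deeper trees, random subsampling, and cross-validated tuning of these settings. (In R, the gbm package exposes this loss as `distribution = "bernoulli"`; which package the authors used is not stated here.)

```python
import math
from typing import List, Sequence, Tuple

# A stump is (feature index, split value, left-leaf value, right-leaf value).
Stump = Tuple[int, float, float, float]


def sigmoid(f: float) -> float:
    """Numerically stable logistic function: log-odds -> probability."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)
    return z / (1.0 + z)


def fit_stump(X: Sequence[Sequence[float]], r: List[float], w: List[float]) -> Stump:
    """Choose the split that best fits the pseudo-residuals r by least squares,
    then set each leaf to a one-step Newton estimate sum(r) / sum(w)."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for split in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= split]
            right = [i for i, row in enumerate(X) if row[j] > split]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                mean = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - mean) ** 2 for i in idx)
            if sse < best_sse:
                best, best_sse = (j, split, left, right), sse
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    j, split, left, right = best

    def leaf(idx: List[int]) -> float:
        den = sum(w[i] for i in idx)
        return sum(r[i] for i in idx) / den if den > 1e-12 else 0.0

    return (j, split, leaf(left), leaf(right))


def stump_value(stump: Stump, row: Sequence[float]) -> float:
    j, split, left_value, right_value = stump
    return left_value if row[j] <= split else right_value


def fit_brt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with Bernoulli deviance; y holds 0/1 outcomes
    (for example, 1 = DO below a threshold)."""
    p0 = sum(y) / len(y)  # requires both classes to be present
    f0 = math.log(p0 / (1.0 - p0))  # start from the overall log-odds
    F = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the deviance
        w = [pi * (1.0 - pi) for pi in p]  # second-derivative (Hessian) weights
        stump = fit_stump(X, r, w)
        stumps.append(stump)
        F = [f + learning_rate * stump_value(stump, row) for f, row in zip(F, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row: Sequence[float]) -> float:
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_value(s, row) for s in stumps))


# Toy data (hypothetical): one predictor, outcome 1 at low predictor values.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, 0, 0, 0]
model = fit_brt(X, y)
print(predict_proba(model, [0.15]) > 0.5, predict_proba(model, [0.85]) > 0.5)  # True False
```

The output of each fitted model is a probability per location, which is the quantity mapped in the grids; the Newton-step leaf values are one common choice for the Bernoulli loss, not necessarily the one used in the study.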
The response variable data consisted of measured DO and Mn concentrations from 2,767 wells within the alluvial boundary of the Central Valley. The data were compiled from two sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS at http://waterdata.usgs.gov/ca/nwis/nwis) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB at http://geotracker.waterboards.ca.gov/gama/). Only wells with well-depth data were selected, and for wells with multiple records, only the most recent sample from the period 1993–2014 that had the required water-quality data was used. Data were available for 932 wells in the NWIS dataset and 1,835 wells in the SWRCB-DDW dataset. Models were trained on the 932-well USGS NWIS dataset and evaluated on an independent hold-out dataset of 1,835 SWRCB-DDW wells. We used cross-validation to assess the predictive performance of models of varying complexity as a basis for selecting the final models used to create the prediction grids. Trained models were applied to the cross-validation testing data and to the separate hold-out dataset to evaluate predictive performance, emphasizing three fit metrics: Kappa, accuracy, and the area under the receiver operating characteristic (ROC) curve. The final trained models were used to map predictions at discrete depths to approximately 300 m. The trained DO and Mn models had accuracies of 86–100 percent, Kappa values of 0.69–0.99, and ROC values of 0.92–1.0. Model accuracies for the cross-validation testing datasets were 82–95 percent, and ROC values were 0.87–0.91, indicating good predictive performance. Kappa values for the cross-validation testing datasets were 0.30–0.69, indicating fair to substantial agreement between testing observations and model predictions.
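The three fit metrics can be computed from 0/1 observations and predicted probabilities as in the pure-Python sketch below. The 0.5 cutoff for turning probabilities into class predictions and the ten-well example are assumptions for illustration only; the cutoffs used in the study are not stated here. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement beyond what is expected by chance from the class proportions."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    obs_share_1 = sum(y_true) / n   # share of observed 1s
    pred_share_1 = sum(y_pred) / n  # share of predicted 1s
    expected = obs_share_1 * pred_share_1 + (1 - obs_share_1) * (1 - pred_share_1)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: probability that a randomly chosen positive
    outscores a randomly chosen negative (ties count one-half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 2 "events" (for example, low DO) among 10 wells.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.2, 0.1, 0.15]
y_pred = [1 if p >= 0.5 else 0 for p in probs]
print(round(accuracy(y_true, y_pred), 3))     # 0.8
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.375
print(round(roc_auc(y_true, probs), 4))       # 0.9375
```

With rare events, accuracy can stay high while Kappa drops sharply, because chance agreement on the dominant class is already large; this is why the metrics are weighed together rather than individually.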
Hold-out data were available for the Mn models only and indicated accuracies of 89–97 percent, ROC values of 0.73–0.75, and Kappa values of 0.06–0.30. The predictive performance of both the DO and Mn models was reasonable, considering all three of these fit metrics and the low percentages of low-DO and high-Mn events in the data. See the associated journal article (Rosecrans and others, 2017) for a complete summary of the BRT modeling methods, model fit metrics, and relative influence of predictor variables for each DO and Mn BRT model. The modeled response variables for the DO BRT models were based on measured DO values from wells at the following thresholds: <0.5 milligrams per liter (mg/L), <1.0 mg/L, and <2.0 mg/L; on the basis of literature reviews, these threshold values were considered to represent anoxic conditions. The modeled response variables for the Mn BRT models were based on measured Mn values from wells at the following exceedance thresholds: >50 micrograms per liter (µg/L), >150 µg/L, and >300 µg/L. (The 150 µg/L manganese threshold represents one-half the USGS HBSL.) Below land surface, the prediction grids were discretized in 15-m intervals to a depth of 122 m, followed by 30-m intervals to a depth of 300 m, resulting in 14 two-dimensional probability grids for each constituent (DO and Mn) and threshold. Probability grid maps were also created for the shallow and deep aquifers, represented by the median domestic and public-supply well depths, respectively. A depth of 46 m, derived from depth percentiles associated with domestic and public-supply wells in previous work by Burow and others (2013), was used to stratify wells in the training dataset into the shallow and deep aquifers. In this work, the median depth of wells categorized as domestic was 30 m below land surface (bls), and the median depth of wells categorized as public supply was 100 m bls.
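The threshold definitions and the depth discretization can be summarized in a short sketch. The nominal depths below (first grid at 15 m, 15-m steps to about 120 m, then 30-m steps to 300 m) are an assumption reconstructed from the stated intervals; the released grids' exact depth labels may differ slightly (for example, 122 m rather than 120 m). Treating a well exactly 46 m deep as shallow is likewise an assumption, and all function names here are hypothetical.

```python
from typing import List

# Response thresholds described above.
DO_THRESHOLDS_MG_L = (0.5, 1.0, 2.0)  # DO below threshold -> anoxic (1)
MN_THRESHOLDS_UG_L = (50, 150, 300)   # Mn above threshold -> exceedance (1)
SHALLOW_DEEP_SPLIT_M = 46.0


def do_response(do_mg_l: float, threshold: float) -> int:
    """1 if the measured DO is less than the threshold, else 0."""
    return 1 if do_mg_l < threshold else 0


def mn_response(mn_ug_l: float, threshold: float) -> int:
    """1 if the measured Mn is greater than the threshold, else 0."""
    return 1 if mn_ug_l > threshold else 0


def nominal_grid_depths_m() -> List[int]:
    """Assumed nominal depths: 15-m steps to ~120 m, then 30-m steps to 300 m."""
    upper = [15 * k for k in range(1, 9)]        # 15, 30, ..., 120
    lower = [120 + 30 * k for k in range(1, 7)]  # 150, 180, ..., 300
    return upper + lower


def aquifer_zone(well_depth_m: float) -> str:
    """Stratify a well as shallow or deep at the 46-m split (boundary handling assumed)."""
    return "shallow" if well_depth_m <= SHALLOW_DEEP_SPLIT_M else "deep"


depths = nominal_grid_depths_m()
print(len(depths))                            # 14
print(len(depths) * len(DO_THRESHOLDS_MG_L))  # 42 grids per constituent
print((len(DO_THRESHOLDS_MG_L) + len(MN_THRESHOLDS_UG_L)) * 2)  # 12 domestic/public-supply grids
print([mn_response(150, t) for t in MN_THRESHOLDS_UG_L])        # [1, 0, 0]
```

Note that the thresholds are strict: a measured Mn of exactly 150 µg/L counts as an exceedance of the 50 µg/L threshold but not of the 150 µg/L threshold.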
Accordingly, the files named "DO BRT prediction grids.zip" and "Mn BRT prediction grids.zip" each contain 42 probability grids (14 depths for each of the 3 thresholds of the corresponding DO or Mn BRT models described above). The file named "PublicSupply&DomesticGrids.zip" contains probability grids for the domestic and public-supply drinking-water depths for each of the six BRT models described above (12 grids total).
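The grids are described as ASCII grids. Assuming they follow the common Esri ASCII raster layout (a short header of `ncols`, `nrows`, `xllcorner`/`yllcorner` or `xllcenter`/`yllcenter`, `cellsize`, and an optional `NODATA_value`, followed by rows of values from north to south), a minimal reader could look like the sketch below. The sample text, its cell size, and its NODATA value are hypothetical and not taken from the released files; check each file's own header.

```python
from typing import Dict, List, Optional, Tuple

HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def read_ascii_grid(text: str) -> Tuple[Dict[str, float], List[List[Optional[float]]]]:
    """Parse an Esri-style ASCII grid; NODATA cells become None."""
    lines = text.strip().splitlines()
    header: Dict[str, float] = {}
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) == 2 and parts[0].lower() in HEADER_KEYS:
            header[parts[0].lower()] = float(parts[1])
            i += 1
        else:
            break
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header.get("nodata_value")
    values = [float(v) for line in lines[i:] for v in line.split()]
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, found {len(values)}")
    grid = [
        [None if nodata is not None and v == nodata else v
         for v in values[r * ncols:(r + 1) * ncols]]
        for r in range(nrows)
    ]
    return header, grid


def cell_center(header: Dict[str, float], row: int, col: int) -> Tuple[float, float]:
    """Map-coordinate center of a cell; row 0 is the northernmost row."""
    size = header["cellsize"]
    nrows = int(header["nrows"])
    if "xllcorner" in header:
        x0, y0 = header["xllcorner"], header["yllcorner"]
    else:  # xllcenter/yllcenter give the center of the lower-left cell
        x0, y0 = header["xllcenter"] - size / 2, header["yllcenter"] - size / 2
    return x0 + (col + 0.5) * size, y0 + (nrows - row - 0.5) * size


SAMPLE = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 1000
NODATA_value -9999
0.10 0.25 -9999
0.90 0.55 0.30
"""
header, grid = read_ascii_grid(SAMPLE)
print(grid[0])                    # [0.1, 0.25, None]
print(cell_center(header, 0, 0))  # (500.0, 1500.0)
```

Each non-NODATA cell value would be read as a modeled probability between 0 and 1 for the grid's constituent, threshold, and depth.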
Citation Information
Publication Year | 2018 |
---|---|
Title | Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California |
DOI | 10.5066/F7T151S1 |
Authors | Bernard T. Nolan, Celia Z Rosecrans, JoAnn M. Gronberg |
Product Type | Data Release |
Record Source | USGS Asset Identifier Service (AIS) |
USGS Organization | Water Resources Mission Area - Headquarters |
Rights | This work is marked with CC0 1.0 Universal |