Ascii grids of predicted pH in depth zones used by domestic and public drinking water supply depths, Central Valley, California

February 23, 2017

The ASCII grids associated with this data release are predicted distributions of continuous pH at the drinking water depth zones in the groundwater of the Central Valley, California. The two prediction grids produced in this work represent predicted pH at the domestic supply and public supply drinking water depths, respectively, and are bounded by the alluvial boundary that defines the Central Valley. A depth of 46 m was used to stratify wells into the shallow and deep aquifer; this depth was derived from depth percentiles associated with domestic and public supply in previous work by Burow et al. (2013). In this work, the median depth of wells categorized as domestic supply was 30 meters below land surface, and the median depth of wells categorized as public supply was 100 meters below land surface.
Prediction grids were created using prediction modeling methods, specifically Boosted Regression Trees (BRT) with a Gaussian error distribution, within a statistical learning framework in the R computing environment. The statistical learning framework seeks to maximize the predictive performance of machine learning methods through model tuning by cross validation.
The response variable was measured pH from 1337 wells, compiled from two sources: the US Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water quality data are publicly available from the SWRCB). Only wells with measured pH and well depth data were selected, and for wells with multiple records, only the most recent sample in the period 1993-2014 was used. A total of 1003 wells (training dataset) were used to train the BRT model and 334 wells (hold-out dataset) were used to validate the prediction model. The training r-squared was 0.70 and the RMSE was 0.26 standard pH units. The hold-out r-squared was 0.43 and the RMSE was 0.37 standard pH units.
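The well selection rule and the 46 m depth stratification described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow. The record field names (well_id, sample_date, ph, well_depth_m), the sample records, and the choice to count a well at exactly 46 m as shallow are assumptions made for illustration only.

```python
from datetime import date

DEPTH_SPLIT_M = 46.0  # stratification depth stated in the text


def select_wells(records):
    """Keep one record per well: the most recent 1993-2014 sample
    that has both a measured pH and a well depth."""
    latest = {}
    for rec in records:
        # Drop records missing pH or well depth.
        if rec["ph"] is None or rec["well_depth_m"] is None:
            continue
        # Drop samples outside the 1993-2014 period.
        if not (1993 <= rec["sample_date"].year <= 2014):
            continue
        current = latest.get(rec["well_id"])
        if current is None or rec["sample_date"] > current["sample_date"]:
            latest[rec["well_id"]] = rec
    return latest


def depth_zone(well_depth_m):
    # Assumption: a well at exactly 46 m is treated as shallow.
    return "shallow" if well_depth_m <= DEPTH_SPLIT_M else "deep"


# Hypothetical records, not taken from NWIS or SWRCB-DDW.
records = [
    {"well_id": "A", "sample_date": date(2001, 5, 1), "ph": 7.4, "well_depth_m": 30.0},
    {"well_id": "A", "sample_date": date(2012, 8, 9), "ph": 7.8, "well_depth_m": 30.0},
    {"well_id": "A", "sample_date": date(2016, 1, 4), "ph": 8.1, "well_depth_m": 30.0},
    {"well_id": "B", "sample_date": date(1999, 3, 2), "ph": None, "well_depth_m": 100.0},
    {"well_id": "C", "sample_date": date(2005, 7, 7), "ph": 7.1, "well_depth_m": 100.0},
]

selected = select_wells(records)
zones = {wid: depth_zone(rec["well_depth_m"]) for wid, rec in selected.items()}
```

In this sketch, well A keeps its 2012 sample because the 2016 sample falls outside the period, and well B is dropped because it has no measured pH.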
More than 60 predictor variables from 7 sources (see metadata) were assembled to develop a model that incorporates regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Previously developed Central Valley model outputs of textures (Central Valley Textural Model, CVTM; Faunt et al., 2010) and MODFLOW-simulated vertical water fluxes and predicted depth to water table (Central Valley Hydrologic Model, CVHM; Faunt, 2009) were used to represent aquifer textures and groundwater hydraulics, respectively. In this work, predictor variable values were attributed to wells in ArcGIS using a 500-m buffer. The influence of each predictor variable, as defined by Friedman (2001), for the variables used in the final BRT model used for mapping can be downloaded from this landing page (see file named PredictorVariableInfluence_CentralValley_pH_BRT.csv).
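The 500-m buffer attribution can be sketched in Python as follows. The source does not state how values inside the buffer were summarized in ArcGIS, so this sketch assumes a simple mean of predictor cell values whose centers fall within 500 m of the well, with coordinates in a projected system measured in meters. The function name buffer_mean and the sample cells are hypothetical.

```python
import math

BUFFER_M = 500.0  # buffer radius stated in the text


def buffer_mean(well_xy, cells, radius_m=BUFFER_M):
    """Mean of predictor values for cells whose centers lie within
    radius_m of the well; returns None if no cell falls inside.

    Assumption: coordinates are projected and in meters, and the
    buffer summary is an unweighted mean.
    """
    wx, wy = well_xy
    inside = [
        value
        for (x, y, value) in cells
        if math.hypot(x - wx, y - wy) <= radius_m
    ]
    return sum(inside) / len(inside) if inside else None


# Hypothetical predictor cells as (x, y, value) tuples.
cells = [
    (0.0, 0.0, 10.0),      # at the well
    (300.0, 300.0, 20.0),  # about 424 m away, inside the buffer
    (600.0, 0.0, 99.0),    # 600 m away, outside the buffer
]

value = buffer_mean((0.0, 0.0), cells)
no_data = buffer_mean((5000.0, 5000.0), cells)
```

An area-weighted or majority summary would be equally plausible for categorical predictors such as land use; the mean is used here only to keep the example short.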