A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA

May 16, 2017

The ASCII grids associated with this data release are model inputs representing the Central Valley aquifer, California, and predicted nitrate concentrations (as NO3-N, mg/L) at two depth zones associated with private and public drinking water supply wells, respectively. The model input and prediction grids are bounded by the alluvial bed boundary that defines the Central Valley. The prediction grids were produced with Boosted Regression Tree (BRT) modeling methods within a statistical modeling framework using the statistical modeling software R (R Core Team) and linear interpolation within Oasis Montaj software (Geosoft, version 9.0.2). The response variable was a set of nitrate concentrations measured in wells located within the Central Valley. We compiled the database of well nitrate measurements from private supply and public supply wells. Nitrate data came from two sources: the University of California at Davis (UC Davis) and the U.S. Geological Survey (USGS). Prior to statistical modeling, wells were spatially declustered using an equal-area grid cell approach to reduce the effect on the model of oversampling in areas of intensive agricultural land use. A total of 5170 wells were selected, 3508 of which were used for training and 1662 of which served as hold-out data. A database of 25 predictor variables was used for the final BRT model and included well characteristics, land use, climate, soil properties, aquifer properties, depth to the water table, and estimates of nitrogen loading and groundwater age. Based on the gridded predictor variables and the final model, nitrate predictions were made using the R raster package for 17 depth zones spaced throughout the aquifer (at 15.24, 30.48, 45.72, 60.96, 76.20, 91.44, 106.68, 121.92, 152.40, 182.88, 213.36, 243.84, 274.32, 304.80, 365.76, 426.72, and 487.68 m below ground surface) to create input layers for 3D mapping with Oasis Montaj software version 9.0.2 (Geosoft, Inc., 2016).
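The equal-area grid cell declustering mentioned above can be illustrated with a minimal Python sketch. It is not the authors' R workflow. The release does not state the cell size or how a well was chosen within each cell, so the 1000 m cell size, the random selection rule, the `decluster` function name, and the four sample wells are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def decluster(wells, cell_size_m, seed=0):
    """Keep one well per square grid cell (hypothetical helper).

    wells: iterable of (well_id, x_m, y_m) in a projected coordinate
    system with metre units, so that every cell covers the same area.
    The selection rule (random choice within a cell) is an assumption.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for well_id, x, y in wells:
        # Integer cell indices identify the grid cell containing the well.
        key = (int(x // cell_size_m), int(y // cell_size_m))
        cells[key].append(well_id)
    # One randomly chosen representative per occupied cell.
    return sorted(rng.choice(ids) for ids in cells.values())


# Made-up coordinates: A, B, and C share one cell; D sits alone in another.
wells = [
    ("A", 100.0, 200.0),
    ("B", 150.0, 250.0),
    ("C", 900.0, 100.0),
    ("D", 2500.0, 2500.0),
]
kept = decluster(wells, cell_size_m=1000.0)
```

With these sample points, the three clustered wells collapse to a single representative while the isolated well is retained, which is the intended effect of reducing oversampling in densely sampled areas.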
The nitrate prediction grids for each of the 17 depth zones were imported into the Oasis Montaj mapping environment for 3D interpolation and visualization. Each grid was assigned a vertical thickness of 1 m, and linear interpolation was used between the layers at a vertical resolution of 1 m to produce a complete representation of predicted nitrate concentration at depth throughout the Central Valley. For visualization purposes, nitrate predictions were extracted from the interpolated model at depths of 54.86 m and 121.92 m. These depths correspond to the average total depths of the private and public supply wells, respectively, among the training wells.
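The vertical linear interpolation between depth layers can be sketched for a single grid cell as follows. This is a simplified Python illustration, not the Oasis Montaj procedure: it interpolates directly at the requested depth and ignores the 1 m layer thickness and the 3D grid. The `interpolate_at_depth` function name and the nitrate values in `cell_values` are hypothetical; only the 17 depths and the two extraction depths come from the text.

```python
from bisect import bisect_right

# The 17 prediction depths listed in the data release (m below ground surface).
DEPTHS_M = [
    15.24, 30.48, 45.72, 60.96, 76.20, 91.44, 106.68, 121.92, 152.40,
    182.88, 213.36, 243.84, 274.32, 304.80, 365.76, 426.72, 487.68,
]


def interpolate_at_depth(depths, values, target):
    """Linearly interpolate one cell's values to a target depth.

    depths must be sorted ascending; values holds one prediction per depth.
    """
    if not depths[0] <= target <= depths[-1]:
        raise ValueError("target depth lies outside the modeled range")
    i = bisect_right(depths, target)
    if i == len(depths):
        # Target equals the deepest layer.
        return values[-1]
    upper, lower = depths[i - 1], depths[i]
    weight = (target - upper) / (lower - upper)
    return values[i - 1] + weight * (values[i] - values[i - 1])


# Made-up nitrate values (mg/L as N) for one cell, decreasing with depth.
cell_values = [float(17 - k) for k in range(17)]

private_depth_value = interpolate_at_depth(DEPTHS_M, cell_values, 54.86)
public_depth_value = interpolate_at_depth(DEPTHS_M, cell_values, 121.92)
```

The 54.86 m extraction depth falls between the 45.72 m and 60.96 m layers, so its value is a weighted blend of those two predictions, whereas 121.92 m coincides with a modeled layer and returns that layer's prediction unchanged.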

Citation Information

Publication Year: 2017
Title: A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA
DOI: 10.5066/F7V40SDN
Authors: Katherine M Ransom, Bernard T Nolan
Product Type: Data Release
Record Source: USGS Digital Object Identifier Catalog
USGS Organization: Office of Planning and Programming