Data for machine learning predictions of pH in the glacial aquifer system, northern USA

December 30, 2020

A boosted regression tree (BRT) model was developed to predict pH conditions in three dimensions throughout the glacial aquifer system (GLAC) of the contiguous United States using pH measurements in samples from 18,258 wells and predictor variables that represent aspects of the hydrogeologic setting. Model results indicate that the carbonate content of soils and aquifer materials strongly controls pH and, when coupled with long flow paths, results in the most alkaline conditions. Conversely, in areas where glacial sediments are thin and carbonate-poor, pH conditions remain acidic. At depths typical of drinking-water supplies, predicted pH greater than 7.5 (which is associated with arsenic mobilization) occurs more frequently than predicted pH less than 6 (which is associated with water corrosivity and the mobilization of other trace elements). A novel aspect of this model was the inclusion of numerically based estimates of groundwater flow characteristics (age and flow path length) as predictor variables. The sensitivity of pH predictions to these variables was consistent with hydrologic understanding of groundwater flow systems and the geochemical evolution of groundwater quality. The model was not developed to provide precise estimates of pH at any given location. Rather, it can be used more generally to identify areas where contaminants may be mobilized into groundwater and where corrosivity may be of concern, in order to set priorities among areas for future groundwater monitoring. Data are provided in 2 tables and 3 compressed files that contain various files associated with the BRT model.
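For readers unfamiliar with the technique, the following is a minimal sketch of how a boosted regression tree model works in general. It is not the authors' model or code (see README.txt in model_archive.7z for that). It assumes squared-error loss and single-split trees ("stumps"), and the two predictors and all toy values are synthetic and purely illustrative; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (feature index, split threshold, value if <= threshold, value if > threshold)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Optional[Stump]:
    """Find the single split that minimizes squared error on the residuals."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left)
            sse += sum((r - right_mean) ** 2 for r in right)
            if sse < best_sse:
                best_sse = sse
                best = (j, threshold, left_mean, right_mean)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_brt(X, y, n_trees: int = 50, learning_rate: float = 0.1):
    """Boosting: each stump is fit to the residuals left by the previous ones."""
    base = sum(y) / len(y)  # start from the mean response
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict_brt(model, row: List[float]) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Synthetic toy data (NOT from the data release):
# columns are a carbonate fraction and a flow path length; y is pH.
TOY_X = [[0.02, 0.5], [0.05, 1.0], [0.10, 2.0],
         [0.20, 4.0], [0.30, 8.0], [0.40, 12.0]]
TOY_Y = [5.6, 6.0, 6.8, 7.3, 7.8, 8.1]

model = fit_brt(TOY_X, TOY_Y)
print(round(predict_brt(model, [0.35, 10.0]), 2))
```

Production BRT implementations use deeper trees, subsampling, and cross-validated stopping rules; the sketch only shows the core idea of adding many small trees, each shrunk by a learning rate.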
The 2 tables include:
1) pH_Predictions_GLAC_GeochMod_Dataset.csv: This table is generally a subset of the pH dataset that was used to model pH conditions (the measured pH data for well sites, which were separated into the training and testing dataset files “trnData.txt” and “testData.txt” included in model_archive.7z), but it also includes some additional wells from Wilson and others (2019). The table includes pH, general chemical characteristics, concentrations of major and trace elements, calculated parameters, and mineral saturation indices (SI) computed with PHREEQC (Parkhurst and Appelo, 2013) for 9,655 groundwater samples from wells in the GLAC.
2) pH_Predictions_GLAC_Variable_Descriptions.txt: A table listing all variables (short abbreviation and long description) used in the model, including the importance rank of each variable, its units, and a reference.

The 3 compressed files include:
1) model_archive.7z: contains 15 files associated with the BRT model
2) rstack_dom.7z: contains rstack_dom.txt
3) rstack_pub.7z: contains rstack_pub.txt

Refer to the README.txt file in model_archive.7z for information about the files in the archive and how to use them to run the BRT model.

The "rstack" files represent raster stacks, which are collections of raster layer objects that share the same spatial extent and resolution and are vertically aligned. Rstack.dom consists of raster layer objects at the depth typically used for domestic supplies, and rstack_pub.txt contains those at the depth typically used for public supplies.
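
As a rough illustration of how the pH thresholds discussed above (greater than 7.5 and less than 6) might be applied to a table such as pH_Predictions_GLAC_GeochMod_Dataset.csv, the sketch below tallies samples by pH class using only the Python standard library. The column names `site_id` and `pH`, the class labels, and the sample rows are assumptions made for illustration; check pH_Predictions_GLAC_Variable_Descriptions.txt for the actual variable names before adapting it.

```python
import csv
import io
import math
from collections import Counter

ALKALINE_ABOVE = 7.5  # threshold the abstract associates with arsenic mobilization
ACIDIC_BELOW = 6.0    # threshold the abstract associates with corrosivity


def classify_ph(ph: float) -> str:
    """Assign a pH value to one of three illustrative classes."""
    if ph > ALKALINE_ABOVE:
        return "alkaline"
    if ph < ACIDIC_BELOW:
        return "acidic"
    return "intermediate"


def summarize_ph(csv_file, ph_column: str = "pH") -> Counter:
    """Count rows per pH class; blank or non-numeric values count as 'missing'."""
    counts: Counter = Counter()
    for row in csv.DictReader(csv_file):
        raw = (row.get(ph_column) or "").strip()
        try:
            ph = float(raw)
        except ValueError:
            counts["missing"] += 1
            continue
        if math.isnan(ph):
            counts["missing"] += 1
            continue
        counts[classify_ph(ph)] += 1
    return counts


# Made-up rows standing in for the real table (hypothetical column names).
SAMPLE_CSV = "site_id,pH\nA,5.4\nB,7.9\nC,7.0\nD,\nE,8.1\n"

counts = summarize_ph(io.StringIO(SAMPLE_CSV))
print(dict(counts))
```

To run it against the real file, one would pass an open file handle instead of the in-memory sample, for example `open("pH_Predictions_GLAC_GeochMod_Dataset.csv", newline="")`, and set `ph_column` to the actual column name.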