Data and Model Archive for Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York
Data and preliminary machine-learning models used to predict manganese and 1,4-dioxane in groundwater on Long Island are documented in this data release. Concentration data used to develop the models were from 910 wells for manganese and 553 wells for 1,4-dioxane, primarily public supply wells, from U.S. Geological Survey, U.S. Environmental Protection Agency (USEPA), and Suffolk County Water Authority sources. Thirty-two explanatory variables describe depth, groundwater flow, land use, soil properties, and other features of the aquifer system. The models use XGBoost, an ensemble tree machine learning method. Four models are documented for manganese, predicting the probability of concentrations relative to four thresholds: 10 micrograms per liter (detection), 50 micrograms per liter (the USEPA Secondary Maximum Contaminant Level), 150 micrograms per liter, and 300 micrograms per liter (the USEPA lifetime health advisory). One model is documented for 1,4-dioxane, predicting the probability of concentrations relative to 0.07 micrograms per liter (detection). The models were used to predict concentrations in two layers of the upper glacial aquifer and three layers of the Magothy aquifer. Predictions were made at a 500-square-foot resolution across the entire island for manganese and across Suffolk County, which occupies the eastern two-thirds of Long Island, for 1,4-dioxane.
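The threshold framing above can be illustrated with a minimal Python sketch that turns one measured concentration into the binary indicators a probability-of-exceedance classifier would be trained on. The threshold values come from the description above. The dictionary keys, the function name, and the use of "greater than or equal to" as the comparison are assumptions made for illustration and are not taken from the archived scripts.

```python
# Thresholds in micrograms per liter, as listed in the description above.
# The key names are hypothetical labels chosen for this sketch.
MN_THRESHOLDS_UG_L = {
    "detection": 10.0,
    "usepa_smcl": 50.0,
    "intermediate": 150.0,
    "usepa_lifetime_ha": 300.0,
}
DIOXANE_THRESHOLDS_UG_L = {"detection": 0.07}


def threshold_labels(conc_ug_l, thresholds):
    """Return {threshold name: 1 or 0} for a single concentration.

    Assumption: a sample is labeled 1 when its concentration is at or
    above the threshold; the archived models may define this differently.
    """
    return {name: int(conc_ug_l >= limit) for name, limit in thresholds.items()}


# A hypothetical manganese sample of 120 micrograms per liter.
example = threshold_labels(120.0, MN_THRESHOLDS_UG_L)
print(example)
```

Each of the four manganese indicators would correspond to one of the four documented manganese models, and the single 1,4-dioxane indicator to the one 1,4-dioxane model.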
The data are provided in data tables, raster files, and model files. One data table describes the 32 explanatory variables (LI_mn_14dx_exp_vars.txt). A second data table describes the well data and includes the manganese and 1,4-dioxane concentrations, explanatory variables, and predictions for the wells (LI_mn_14dx_well_data.txt). A zip file of five files provides the explanatory variable data used to make predictions for the five aquifer layers (LI_mn_14dx_predinput_griddata.zip), and a zip file of 25 files provides the model predictions for each of the five models in each of the five aquifer layers (LI_mn_14dx_predoutput_rasters.zip). The data release also contains a TIFF-format raster file of the prediction grid (LI_mn_14dx_prediction_grid.tif). The models are documented in a zip file (LI_mn_14dx_models.zip) that contains the model object files (R data format) and scripts that can be used to run the models to produce the predictions provided in this data release. Filenames for prediction input and model output identify the aquifer layer by number and name, as follows: 1_upper_glacial, top layer of the upper glacial aquifer; 3_upper_glacial, bottom layer of the upper glacial aquifer; 5_Magothy, top layer of the Magothy aquifer; 14_Magothy, middle layer of the Magothy aquifer; and 23_Magothy, bottom layer of the Magothy aquifer.
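The layer naming convention can be sketched in Python as a lookup table plus a helper that recovers the layer code from a filename. The five layer codes and their descriptions come from the text above. The model labels, the example filenames, and the helper name are hypothetical, since the full filenames inside the zip files are not listed here.

```python
import re
from itertools import product

# Layer codes and descriptions as given in the text above.
LAYERS = {
    "1_upper_glacial": "top layer of the upper glacial aquifer",
    "3_upper_glacial": "bottom layer of the upper glacial aquifer",
    "5_Magothy": "top layer of the Magothy aquifer",
    "14_Magothy": "middle layer of the Magothy aquifer",
    "23_Magothy": "bottom layer of the Magothy aquifer",
}

# Hypothetical labels for the four manganese models and one 1,4-dioxane model.
MODEL_LABELS = ["mn_10", "mn_50", "mn_150", "mn_300", "14dx_007"]

# Match a layer code that is not preceded by another digit, so that a name
# such as "mn_150_5_Magothy" resolves to "5_Magothy" at the correct position.
_LAYER_RE = re.compile(r"(?<!\d)(" + "|".join(re.escape(c) for c in LAYERS) + r")")


def layer_from_filename(filename):
    """Return the layer code found in a filename, or None if there is none."""
    match = _LAYER_RE.search(filename)
    return match.group(1) if match else None


# Five models times five layers gives the 25 prediction files described above.
combos = [(m, layer) for m, layer in product(MODEL_LABELS, LAYERS)]
print(len(combos))
print(layer_from_filename("example_14_Magothy.tif"))
```

The negative lookbehind is a design choice for this sketch: it prevents digits that belong to a model label from being read as part of a layer number.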
Citation Information
Field | Value |
---|---|
Publication Year | 2022 |
Title | Data and Model Archive for Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York |
DOI | 10.5066/P90AT9YG |
Authors | Leslie A Desimone |
Product Type | Data Release |
Record Source | USGS Asset Identifier Service (AIS) |
USGS Organization | Water Resources Mission Area - Headquarters |
Rights | This work is marked with CC0 1.0 Universal |