Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022)

October 10, 2022

This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data, using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data, using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the Water Resources Research paper (Sadler et al., 2022).

Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2014; Addor et al., 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016).

The contents of this item are broken into 13 files or groups of files aggregated into zip files:

input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments.
flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in the multi-task modeling experiments. The streamflow data were originally accessed from the CAMELS dataset. The data are stored in CSV and Zarr formats.
temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in the multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in CSV and Zarr formats.
temperature_sites.geojson: GeoJSON file of the locations of the water temperature and streamflow sites used in the analysis.
model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in CSV and Zarr formats.
catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data, see the CAMELS dataset.
experiment_workflow_files.zip: A zip file containing the workflow definitions used to run the multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment, one would run (for Experiment A) 'snakemake -s expA_Snakefile --configfile expA_config.yml'.
river-dl-paper_v0.zip: A zip file containing the Python code used to run the multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.
random_forest_scripts.zip: A zip file containing Python code and a Python Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.
plotting_code.zip: A zip file containing the Python code and Snakemake workflow used to produce figures showing the results of the multi-task deep learning experiments.
results.zip: A zip file containing results of the multi-task deep learning experiments. The results are stored in CSV and NetCDF formats. The NetCDF files were used by the plotting libraries in 'plotting_code.zip'. These files are for the five experiments, 'A', 'B', 'C', 'D', and 'AuxIn'. The experiment name is shown in each file name.
sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.
sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.

References

Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R. J., & Blodgett, D. (2014). A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

Addor, N., Newman, A., Mizukami, M., & Clark, M. P. (2017). Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

U.S. Geological Survey (2016). National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.
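The Snakemake invocation quoted for 'experiment_workflow_files.zip' can be generated for each experiment with a small helper. The following is a minimal sketch, not part of the release: the function name snakemake_command is hypothetical, and only the Experiment A file names (expA_Snakefile, expA_config.yml) are confirmed by the description above. The names for Experiments B, C, D, and AuxIn are an assumption that the archive follows the same naming pattern, so check the extracted files before relying on them.

```python
import shlex
from typing import List

# Experiment labels as listed in the release description.
EXPERIMENTS = ("A", "B", "C", "D", "AuxIn")


def snakemake_command(experiment: str) -> List[str]:
    """Build the argument list for running one experiment's workflow.

    Only the Experiment A file names are confirmed by the release text.
    ASSUMPTION: the other experiments follow the same
    'exp<label>_Snakefile' / 'exp<label>_config.yml' naming pattern.
    """
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return [
        "snakemake",
        "-s", f"exp{experiment}_Snakefile",
        "--configfile", f"exp{experiment}_config.yml",
    ]


if __name__ == "__main__":
    # Print one shell-ready command line per experiment.
    for label in EXPERIMENTS:
        print(shlex.join(snakemake_command(label)))
```

The argument list can be passed to subprocess.run from the directory where 'experiment_workflow_files.zip' was extracted. Depending on the installed Snakemake version, additional arguments such as a core count may be required.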

Publication Year	2022
Title	Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022)
DOI	10.5066/P9U0TG8L
Authors	Jeffrey M Sadler, Alison P Appling, Jordan S Read, Samantha K Oliver, Xiaowei Jia, Jacob A Zwart, Vipin Kumar
Product Type	Data Release
Record Source	USGS Asset Identifier Service (AIS)
USGS Organization	Water Resources Mission Area - Headquarters
Rights	This work is marked with CC0 1.0 Universal
