Fluvial Sediment Estimates in StreamStats
Fluvial sediment samples are valuable for addressing key environmental concerns such as aquatic habitat degradation and excess nutrients, but their collection is often impractical across all rivers and timeframes of interest. In addition, previously used analytical and numerical methods (Gray and Simões, 2008; Ellison and others, 2016) have not allowed for the transfer of knowledge from sites that have data to sites that do not have data. To overcome this limitation, the U.S. Geological Survey (USGS) developed machine learning (ML) models to predict suspended-sediment concentrations (SSCs) and bedload transport (BL) in Minnesota rivers that lack physical sediment data.
The USGS trained and validated the ML models with approximately 1,300 SSC samples from 56 sites and 600 BL samples from 43 sites across Minnesota (Lund and Groten, 2022; Lund and others, 2022). The ML models incorporate key features such as streamflow, watershed and catchment characteristics, and rate of change in streamflow (slope), which help explain sediment transport processes. The ML models explained approximately 70 percent of the variability in the SSC and BL samples (Lund and Groten, 2022; Lund and others, 2022). These ML models improved sediment transport predictions for rivers and streams with little or no physical sediment data by leveraging the ability of ML to learn complex nonlinear relations and to transfer knowledge from sites with data to sites without data.
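As an illustration of the rate-of-change feature mentioned above, the following minimal Python sketch computes a streamflow slope as a simple finite difference between consecutive readings. The exact slope definition used in the USGS models is not given here, so the function name `streamflow_slope`, the per-hour units, and the backward-difference approach are assumptions for illustration only.

```python
from datetime import datetime


def streamflow_slope(times, flows):
    """Return the rate of change in streamflow between consecutive readings.

    times: list of datetime objects in ascending order
    flows: list of streamflow values (for example, cubic feet per second)

    The result has one entry per reading, in flow units per hour. The first
    entry is None because no earlier reading exists to difference against.
    This backward difference is an assumed definition, not the USGS one.
    """
    if len(times) != len(flows):
        raise ValueError("times and flows must have the same length")
    slopes = [None]
    for i in range(1, len(flows)):
        hours = (times[i] - times[i - 1]).total_seconds() / 3600.0
        if hours <= 0:
            raise ValueError("times must be strictly increasing")
        slopes.append((flows[i] - flows[i - 1]) / hours)
    return slopes
```

Called with 15-minute readings, a rise of 10 flow units over one interval yields a slope of 40 units per hour; a positive value indicates a rising limb and a negative value a falling limb of the hydrograph.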


StreamStats Integration
The USGS collaborated with the Minnesota Pollution Control Agency to integrate the ML models into the USGS StreamStats web application. Users have three options for streamflow data input used to run the ML models:
- Select a USGS streamgage—Use existing streamflow data from a USGS streamgage.
- Upload streamflow data—Upload a comma-separated values file with daily or 15-minute streamflow data. Additional streamflow data can be found on the Minnesota Department of Natural Resources Cooperative Stream Gaging website.
- Estimate streamflow using flow duration curve transfer method—Estimate streamflow when data are unavailable.
Once users complete the required steps to run the ML models in StreamStats, the features used and the corresponding outputs will be available in graphical and comma-separated values formats.
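The third input option relies on a flow duration curve, which relates each streamflow value to the fraction of time that value is equaled or exceeded. The Python sketch below shows one common way to build such a curve and read a flow from it. The Weibull plotting position, the linear interpolation, and the function names are assumptions for illustration; how StreamStats builds the curve and transfers it to a site without data is not described here and is not shown.

```python
def flow_duration_curve(flows):
    """Return (exceedance_probabilities, ranked_flows) for a flow record.

    Flows are ranked from largest to smallest, and each rank r of n values
    is assigned the Weibull plotting position r / (n + 1) (an assumption).
    """
    if not flows:
        raise ValueError("flows must not be empty")
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    probs = [rank / (n + 1) for rank in range(1, n + 1)]
    return probs, ranked


def flow_at_exceedance(probs, ranked_flows, p):
    """Linearly interpolate the flow at exceedance probability p.

    Values of p outside the curve are clamped to the curve's end points.
    """
    if p <= probs[0]:
        return ranked_flows[0]
    if p >= probs[-1]:
        return ranked_flows[-1]
    for i in range(1, len(probs)):
        if p <= probs[i]:
            frac = (p - probs[i - 1]) / (probs[i] - probs[i - 1])
            return ranked_flows[i - 1] + frac * (ranked_flows[i] - ranked_flows[i - 1])
```

For a record of five flows (10, 20, 30, 40, and 50), the curve assigns exceedance probabilities of 1/6 through 5/6, and the flow at an exceedance probability of 0.5 is the median value of 30.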


Model Limitations and Data Gaps
Although the ML models cover the State of Minnesota, users may not be able to run the ML models at specific sites for a variety of reasons, including an incorrect basin delineation and (or) incomplete datasets at a specific site. The tool will not run on smaller-sized streams and does not output predictions beyond the range of data used to train the ML models (for example, SSC greater than 7,040 milligrams per liter and BL greater than 1,885 tons per day).
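The training-range limit described above can be pictured as a simple filter on model output. The Python sketch below withholds any prediction above the stated maxima of 7,040 milligrams per liter for SSC and 1,885 tons per day for BL. The dictionary keys, the function name, the lower bound of zero, and the choice to return `None` for withheld values are all assumptions; the way StreamStats actually handles out-of-range predictions is not specified here.

```python
# Upper limits of the training data, taken from the text above.
TRAINING_MAX = {
    "ssc_mg_per_l": 7040.0,          # suspended-sediment concentration
    "bedload_tons_per_day": 1885.0,  # bedload transport
}


def filter_predictions(kind, predictions):
    """Replace predictions outside the training range with None.

    kind: a key of TRAINING_MAX (hypothetical names)
    predictions: list of numeric model outputs
    A value is kept only if it lies between zero (assumed lower bound)
    and the training maximum, inclusive.
    """
    if kind not in TRAINING_MAX:
        raise KeyError(f"unknown prediction kind: {kind}")
    upper = TRAINING_MAX[kind]
    return [p if 0.0 <= p <= upper else None for p in predictions]
```

For example, an SSC prediction of 8,000 milligrams per liter would be withheld, whereas a value of exactly 7,040 would be kept because the limit applies only to values greater than the maximum.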
Reference linked below.
Extreme gradient boosting machine learning models, suspended sediment, bedload, streamflow, and geospatial data, Minnesota, 2007-2019
Highlighted publication is featured below, and references are linked below.
Using machine learning in Minnesota’s StreamStats to predict fluvial sediment
Using machine learning to improve predictions and provide insight into fluvial sediment transport
Application of dimensionless sediment rating curves to predict suspended-sediment concentrations, bedload, and annual sediment loads for rivers in Minnesota
Estimating sediment discharge: Appendix D
StreamStats