Regression model development and computational procedures to support estimation of real-time concentrations and loads of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-9

February 24, 2012

In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, together with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models to estimate concentrations of water-quality constituents in substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables are easily measured quantities that can serve as surrogate variables for estimating concentrations of the selected constituents through statistical regression. Statistical regression also provides accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. The regression diagnostics used as indicators of general model reliability and reported herein include, among others, the adjusted R-squared, the residual standard error, residual plots, and p-values.
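The two numerical diagnostics named above can be illustrated with a minimal sketch. It assumes a single explanatory variable and a base-10 logarithmic transformation of both variables. The models in the report considered several explanatory variables, so this is not the report's procedure. The function name `fit_log10_regression` and the sample values are hypothetical and made up for illustration; they are not data from the report. Prediction intervals and p-values are omitted.

```python
import math


def fit_log10_regression(x_values, y_values):
    """Fit log10(y) = intercept + slope * log10(x) by ordinary least squares.

    Returns the coefficients plus two diagnostics: the adjusted R-squared
    and the residual standard error (in base-10 logarithm units).
    """
    log_x = [math.log10(v) for v in x_values]
    log_y = [math.log10(v) for v in y_values]
    n = len(log_x)
    p = 1  # number of explanatory variables in this simplified sketch

    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(log_x, log_y)]
    sse = sum(r ** 2 for r in residuals)          # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in log_y)   # total sum of squares

    r_squared = 1.0 - sse / sst
    adjusted_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)
    residual_standard_error = math.sqrt(sse / (n - p - 1))

    return {
        "intercept": intercept,
        "slope": slope,
        "adjusted_r_squared": adjusted_r_squared,
        "residual_standard_error": residual_standard_error,
    }


# Made-up illustrative values (NOT data from the report):
# a surrogate such as turbidity and a constituent concentration.
surrogate = [5.0, 12.0, 30.0, 75.0, 150.0, 320.0]
concentration = [8.0, 20.0, 41.0, 120.0, 210.0, 500.0]
fit = fit_log10_regression(surrogate, concentration)
```

An estimated concentration in log space is `fit["intercept"] + fit["slope"] * math.log10(x)`. Converting it back to concentration units with `10 ** value` may call for a retransformation bias correction, which this sketch does not attempt.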
Adjusted R-squared values for the Spring Creek models ranged from 0.582 to 0.922 (dimensionless), and the residual standard errors ranged from 0.073 to 0.447 (base-10 logarithm). Adjusted R-squared values for the East Fork San Jacinto River models ranged from 0.253 to 0.853 (dimensionless), and the residual standard errors ranged from 0.076 to 0.388 (base-10 logarithm). Given an estimated concentration, the constituent load can be estimated by multiplying the concentration by the corresponding streamflow and applying the appropriate unit-conversion factor. The regression models presented in this report are site specific; that is, they apply only to the Spring Creek and East Fork San Jacinto River streamflow-gaging stations. However, the general methods that were developed and documented could be applied to most perennial streams for the purpose of estimating real-time water-quality data.
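The load computation described above (concentration times streamflow times a conversion factor) can be sketched as follows. The units are assumptions made for illustration: concentration in milligrams per liter, streamflow in cubic feet per second, and load in kilograms per day. The abstract does not state which units the report uses, and the function name `load_kg_per_day` is hypothetical.

```python
# Unit constants (each is exact by definition).
LITERS_PER_CUBIC_FOOT = 28.316846592
SECONDS_PER_DAY = 86_400
KG_PER_MG = 1e-6


def load_kg_per_day(concentration_mg_per_l, streamflow_cfs):
    """Estimate a constituent load in kilograms per day.

    Assumed units: concentration in mg/L, streamflow in ft^3/s.
    (mg/L) * (ft^3/s) * (L/ft^3) * (s/day) * (kg/mg) = kg/day
    """
    conversion_factor = LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY * KG_PER_MG
    return concentration_mg_per_l * streamflow_cfs * conversion_factor


# Hypothetical inputs, not values from the report.
example_load = load_kg_per_day(0.5, 200.0)
```

Under these assumed units the conversion factor is about 2.447, so a concentration of 0.5 mg/L at 200 ft³/s corresponds to roughly 245 kg/day. Other unit choices, such as bacteria counts or loads in tons per day, require a different factor.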

Citation Information

Publication Year: 2012
Title: Regression model development and computational procedures to support estimation of real-time concentrations and loads of selected constituents in two tributaries to Lake Houston near Houston, Texas, 2005-9
DOI: 10.3133/sir20125006
Authors: Michael T. Lee, William H. Asquith, Timothy D. Oden
Publication Type: Report
Publication Subtype: USGS Numbered Series
Series Title: Scientific Investigations Report
Series Number: 2012-5006
Index ID: sir20125006
Record Source: USGS Publications Warehouse
USGS Organization: Texas Water Science Center