Hydrology Monitoring Network: Data Mining and Modeling to Separate Human and Natural Hydrologic Dynamics
The application of data-mining techniques, including artificial neural network (ANN) models, to the Comprehensive Everglades Restoration Plan (CERP) supported databases demonstrates how empirical models of complex hydrologic systems can be developed, disparate databases and models can be integrated to support multidisciplinary research, and study results can be easily disseminated to meet the needs of a broad range of end users.
—————————————————————————————————————————————————————
SUMMARY
New technologies in environmental monitoring have made it cost effective to acquire tremendous amounts of hydrologic and water-quality data. Although these data are a valuable resource for understanding environmental systems, they are often underutilized and underinterpreted. The monitoring networks supported by the Comprehensive Everglades Restoration Plan (CERP) record tremendous amounts of data each day, and the database incorporates millions of data points describing the environmental response of the system to changing conditions. To enhance the evaluation of the CERP database, there is an immediate need to apply new methodologies to systematically analyze datasets to address critical issues such as water depths at ungaged locations, water-depth and water-quality responses to controlled flow releases, and optimization of existing hydrologic data-collection networks. There also is a need to maximize data resources by integrating disparate hydrologic and ecologic databases.
PROBLEM
An important part of the USGS mission is to provide scientific information for the effective water-resources management of the Nation. To assess the quantity and quality of the Nation's surface water, many agencies and universities collect hydrologic and water-quality data from rivers, lakes, and estuaries. The techniques used for this study in FY05, FY06, and FY07 have demonstrated how valuable information can be extracted from existing databases to assist local, State, and Federal agencies. The application of data-mining techniques, including artificial neural network (ANN) models, to the CERP-supported databases demonstrates how empirical models of complex hydrologic systems can be developed, disparate databases and models can be integrated to support multidisciplinary research, and study results can be easily disseminated to meet the needs of a broad range of end users.
The South Atlantic Water Science Center's data-mining activities in the Everglades have focused on three areas: the integration of hydrologic databases in support of an ecological study of the snail kite, the hydrologic record extension and water-level estimation for the Everglades Depth Estimation Network (EDEN), and the development of ANN models to analyze hydrologic and water-quality dynamics in the Loxahatchee Wildlife Refuge.
Snail Kite Hydrology
Hydrologists and ecologists have been working in the Everglades on integrating a long-term hydrologic data network and a short-term ecological database to support ecological models of the habitat of the snail kite, a threatened and endangered raptor. The vegetative structure of these habitats is an expression of both recent past and current hydrological conditions. It is critically important to determine how the species associations within these communities respond differentially to changes in hydrology through time and space.
Record Extension for the Everglades Depth Estimation Network
The Everglades Depth Estimation Network (EDEN) is an integrated network of real-time water-level gaging stations, ground-elevation models, and water-surface models designed to provide scientists, engineers, and water-resource managers with current (2000-present) water-level information for the entire freshwater portion of the greater Everglades. To increase the accuracy of the water-surface models, 25 real-time water-level gaging stations were added to the network of 253 established water-level gaging stations. The expansiveness of the Everglades, the limited number of gaging stations, and the extreme sensitivity of fauna to small changes in water depth have created a need to accurately predict water levels at the new gaging stations so that their records can be extended back in time (hindcast) to be concurrent with the beginning of the EDEN database. Simulating water levels at these new locations has been challenging because an ultra-low gradient makes interactions between meteorology, vegetation, topology, and hydrology complex.
Loxahatchee Hydrologic and Water-Quality Analysis
The Arthur R. Marshall Loxahatchee National Wildlife Refuge (NWR) is the last of the soft-water ecological systems in the Everglades. Historically, the ecosystem was driven by precipitation inputs to the system that were low in conductance and nutrients. With controlled releases into the canals that surround the Refuge, the transport of water with higher conductance and nutrient concentrations could potentially alter critical ecosystem functions. With potential alteration of flow patterns to accommodate the restoration of the Everglades, the Refuge could be affected not only by changes in the timing and frequency of hydroperiods but also by the quality of the water that inundates the Refuge.
OBJECTIVES AND SCOPE
The objectives and scope of the data-mining activities for the three focus areas are described below.
Snail Kite Hydrology
The principal objective of the snail kite study in Water Conservation Area (WCA) 3b is to separate plant community response due to typical seasonal and inter-annual variances in hydrologic regimes. Hydroperiods of water depths have a significant effect on the nesting and foraging of the snail kite. A critical element of the study is to determine how the vegetative communities respond to temporal and spatial changes in hydrology. Seventeen water-depth recorders are co-located at transects where extensive plant samples are collected. These continuous recorders were established in 2003. A long-term network of three water-level recorders has been maintained since 1991. Using inputs representing the three long-term gages, accurate ANN models were developed to predict the water levels at the 17 short-term sites. The models were then used to hindcast water levels to 1991, resulting in a much longer water-level record to help scientists better understand how the snail kite's habitat is affected by changing hydrology.
To maximize the usefulness of the ANN models and the hindcasted data to a broad range of users, a decision support system (DSS) was developed to integrate the historical data, ANN models, simulation controls, statistical analysis, and output. The DSS was developed as a Microsoft Excel™/Visual Basic for Applications (VBA) program.
Record Extension for the Everglades Depth Estimation Network
To incorporate the data from the newly added stations to the 7-year EDEN database in the greater Everglades, the short-term water-level records (generally less than 1 year) needed to be simulated back in time (hindcasted) to be concurrent with data from the established gaging stations in the database. A three-step modeling approach using artificial neural network models was used to estimate the water levels at the new stations. The artificial neural network models used static variables that represent the gaging station location and percent vegetation in addition to dynamic variables that represent water-level data from the established EDEN gaging stations. The final step of the modeling approach was to simulate the computed error of the initial estimate to increase the accuracy of the final water-level estimate.
Loxahatchee Hydrologic and Water-Quality Analysis
There are two objectives of this project. The first is to compile the current and historical databases from the 1990s to the present and then apply data-mining techniques, including ANN models, to analyze the inflows, outflows, precipitation, water-level, conductance, and phosphorus data. The second objective is to build a DSS that integrates the databases, ANN models, simulation controls, streaming graphics, optimization routines, and model output in an easily disseminated Excel application. The DSS will allow Refuge managers to run the water-level, conductance, and phosphorus models to evaluate various water-resource management scenarios.
APPROACH
The approaches for the three data-mining activities in the Everglades are described below.
Snail Kite Hydrology
Seventeen water-depth recorders are co-located at transects where extensive plant samples are collected. These continuous recorders were established in 2002. A long-term network of three water-level recorders has been maintained since 1991. Using inputs representing the three long-term gages, accurate ANN models were developed to predict the water depths at the 17 short-term sites. The models were then used to hindcast water depths to 1991, resulting in a much longer water-level record to help scientists better understand how the snail kite's habitat is affected by changing hydrology. The results from this study are described in Conrads and Roehl, 2006.
Record Extension for the Everglades Depth Estimation Network
To incorporate the data from the newly added stations to the 7-year EDEN database in the greater Everglades, the short-term water-level records (generally less than 1 year) needed to be simulated back in time (hindcasted) to be concurrent with data from the established gaging stations in the database. A three-step modeling approach using artificial neural network models was used to estimate the water levels at the new stations. The artificial neural network models used static variables that represent the gaging station location and percent vegetation in addition to dynamic variables that represent water-level data from the established EDEN gaging stations. The final step of the modeling approach was to simulate the computed error of the initial estimate to increase the accuracy of the final water-level estimate. The results of this study are presented in Conrads and Roehl, 2007.
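The idea of an initial estimate followed by a model of its error can be illustrated with a minimal sketch. It is not the study's method: ordinary least squares stands in for the ANN models, only two of the modeling stages are shown, and all gage names and values are hypothetical and synthetic.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; a simple stand-in for an ANN."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]

def rmse(obs, est):
    return (sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs)) ** 0.5

# Hypothetical water levels at two established gages (full record).
gage_a = [2.0, 2.4, 3.1, 2.8, 2.2, 2.6, 3.0, 3.4, 2.9, 2.5]
gage_b = [1.0, 1.5, 1.2, 1.8, 1.1, 1.6, 1.3, 1.9, 1.4, 1.7]

# The new station has data only for the last 5 time steps. These "observations"
# are synthetic, generated from a made-up relation so that the example runs.
overlap = slice(5, 10)
past = slice(0, 5)
new_obs = [0.5 + 0.9 * a + 0.3 * b for a, b in zip(gage_a[overlap], gage_b[overlap])]

# Initial estimate from one established gage, fitted on the overlap period.
initial_model = fit_line(gage_a[overlap], new_obs)
initial_fit = predict(initial_model, gage_a[overlap])

# Error model: simulate the error of the initial estimate from a second gage.
errors = [o - e for o, e in zip(new_obs, initial_fit)]
error_model = fit_line(gage_b[overlap], errors)

def estimate(a_vals, b_vals):
    """Final estimate = initial estimate + simulated error."""
    first = predict(initial_model, a_vals)
    correction = predict(error_model, b_vals)
    return [f + c for f, c in zip(first, correction)]

final_fit = estimate(gage_a[overlap], gage_b[overlap])   # overlap period
hindcast = estimate(gage_a[past], gage_b[past])          # period before the new station existed
```

Comparing `rmse(new_obs, initial_fit)` with `rmse(new_obs, final_fit)` shows how much the error model improves the fit over the overlap period; a real application would evaluate this on data withheld from fitting.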
Loxahatchee Hydrologic and Water-Quality Analysis
To understand the relationships between canal inflows/outflows and water level, conductance, and phosphorus, a data-mining-based model will be developed to predict water level, conductance, and phosphorus at various locations of interest. The steps to be taken are described below.
Step 1. Data Compilation and Merging
Historical hydrologic and meteorological data from the various Federal and State databases will be merged and time synchronized. Parameters of interest include inflows, outflows, rainfall, wind direction and speed, groundwater levels, water levels, conductance, and phosphorus.
Step 2. Data Preparation
Methods will be used to maximize the information content in the raw data, while diminishing the influence of poor or missing measurements. Signal (time series) processing methods include clustering, filtering, spectral decomposition, estimation of data characteristics and time delays, and synthesizing missing data. Signal processing transforms the "raw" data into "pre-processed" data for analysis and modeling. The data collected from the agencies have different sampling frequencies, ranging from every 15 minutes to once per month. The variables must be "time-merged" by either interpolating between less frequent measurements, or by averaging frequent samples to obtain fewer values.
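The time-merging step can be sketched as follows, assuming a daily target time step. Frequent samples are averaged to one value per day, and sparse samples are linearly interpolated onto the same days. The function names, variables, and values are hypothetical, and the frequent series is sampled a few times per day rather than every 15 minutes to keep the example short.

```python
from collections import defaultdict

def average_to_daily(samples):
    """Average frequent (time_in_days, value) samples into one value per whole day."""
    buckets = defaultdict(list)
    for t, v in samples:
        buckets[int(t)].append(v)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}

def interpolate_to_days(samples, days):
    """Linearly interpolate sparse (day, value) samples onto the requested days.
    Days outside the sampled range get None instead of an extrapolated value."""
    samples = sorted(samples)
    out = {}
    for d in days:
        out[d] = None
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            if t1 > t0 and t0 <= d <= t1:
                out[d] = v0 + (v1 - v0) * (d - t0) / (t1 - t0)
                break
    return out

def time_merge(frequent, sparse):
    """Return rows of (day, daily mean of frequent series, interpolated sparse series)."""
    daily = average_to_daily(frequent)
    interp = interpolate_to_days(sparse, daily.keys())
    return [(d, daily[d], interp[d]) for d in daily]

# Hypothetical water levels sampled several times per day (time in days).
water_level = [(0.0, 1.0), (0.25, 1.2), (0.5, 1.4), (0.75, 1.2),
               (1.0, 2.0), (1.5, 2.2), (2.0, 3.0)]
# Hypothetical conductance grab samples taken on day 0 and day 2 only.
conductance = [(0, 100.0), (2, 120.0)]

merged = time_merge(water_level, conductance)
```

Each row of `merged` holds one day with both variables on a common time step, which is the form needed for correlation analysis and modeling.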
Another signal processing task is "signal decomposition". The complex behaviors of the variables of a natural system result from interactions between multiple physical forces. Signal decomposition involves digital filtering to split a signal into sub-signals, called "components", that are independently attributable to different physical forces. Components can be periodic, chaotic, or random, or a combination. Digital filtering can also diminish the effect of noise in a signal to improve the amount of useful information that it contains. Working from filtered signals makes the modeling process more efficient, precise, and accurate.
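As a minimal sketch of signal decomposition, a centered moving average can serve as a simple low-pass digital filter: the filtered series is the slowly varying component, and the remainder is the rapidly varying component. The synthetic series, the 7-step window, and the function names are assumptions for illustration; the filters used in the study are not specified here.

```python
import math

def moving_average(signal, window):
    """Centered moving average (a simple low-pass digital filter).
    Near the ends, the window shrinks to the samples that exist."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def decompose(signal, window):
    """Split a signal into a low-frequency and a high-frequency component.
    The two components sum back to the original signal."""
    low = moving_average(signal, window)
    high = [s - l for s, l in zip(signal, low)]
    return low, high

# Synthetic daily series: a slow seasonal cycle plus a fast 7-day cycle.
n = 730
seasonal = [math.sin(2 * math.pi * t / 365) for t in range(n)]
weekly = [0.2 * math.sin(2 * math.pi * t / 7) for t in range(n)]
series = [s + w for s, w in zip(seasonal, weekly)]

low, high = decompose(series, window=7)
```

Away from the ends of the record, `low` closely tracks the seasonal cycle and `high` closely tracks the 7-day cycle, so each component can be analyzed or modeled separately.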
Step 3. Correlation Analysis and Sensitivity Estimation
Correlation analysis quantifies the relationships between many variables and provides a deeper understanding of the data. The computer systematically correlates parameters of interest, such as water level, conductance, and phosphorus, with combinations of controlled and uncontrolled variables, such as inflows, outflows, and rainfall. Correlation methods based on statistics and machine learning are applied in combination. Promising results found by the computer are validated by comparing them to known patterns of behavior. Correlation analysis identifies:
- Relative impact - For example, "What variables impact the increased conductance and phosphorus? And to what degree?"
- Relationships between controlled variables (inflows and outflows) and uncontrolled variables (meteorological forcing).
- Quantifiable answers to complex questions - For example, "What are the critical temporal and spatial relationships between the controlled releases and the water-level, conductance, and phosphorus responses in the interior of the Refuge? Which has more effect on these responses: large releases over a short period of time or weekly flow volumes? What are the relative impacts of the inflow/outflow locations on these responses?"
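One simple way to examine temporal relationships of this kind is to compute the correlation between a driver and a response at a range of time delays and look for the delay with the strongest correlation. The sketch below uses the Pearson correlation coefficient on synthetic data in which the response is built to lag the driver by 3 time steps; the series names, the lag, and the noise level are all hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lagged_correlations(driver, response, max_lag):
    """Correlate response[t] with driver[t - lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = driver[:len(driver) - lag]
        y = response[lag:]
        result[lag] = pearson(x, y)
    return result

# Synthetic example: the response follows the driver with a 3-step delay.
rng = random.Random(0)
inflow = [rng.random() for _ in range(200)]
conductance = [0.0, 0.0, 0.0] + [0.8 * q for q in inflow[:-3]]
conductance = [c + 0.05 * rng.random() for c in conductance]

correlations = lagged_correlations(inflow, conductance, max_lag=10)
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
```

Here `best_lag` recovers the delay built into the synthetic data. With field data, the same scan would be repeated for each candidate input and location to compare relative impacts.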
Step 4. Predictive Modeling
Using machine learning, predictive models are developed directly from the data and the correlations determined in Steps 2 and 3. To maximize accuracy, the model is constructed from sub-models, which independently correlate periodic and chaotic components. Their outputs are combined to obtain an overall prediction that reflects all of the different forcing functions that are represented by the input variables and that affect the output variables. The models of the Refuge will predict water level, conductance, and phosphorus at multiple locations from inputs such as inflow, outflow, rainfall, and wind direction and speed.
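The sub-model idea can be sketched under strong simplifying assumptions: a moving average splits each series into a slow and a fast component, a least-squares line stands in for each machine-learning sub-model, and the sub-model outputs are summed. The synthetic inflow and water-level series are hypothetical and are constructed so that the slow and fast components respond with different gains.

```python
import math

def moving_average(signal, window):
    """Centered moving average; the window shrinks near the ends of the record."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def split(signal, window=7):
    """Return (slow component, fast component) of a signal."""
    low = moving_average(signal, window)
    return low, [s - l for s, l in zip(signal, low)]

def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; a stand-in for an ANN sub-model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]

def rmse(obs, est):
    return (sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs)) ** 0.5

# Synthetic data: water level responds strongly to slow inflow changes
# and weakly to fast ones.
n = 240
slow = [math.sin(2 * math.pi * t / 120) for t in range(n)]
fast = [0.3 * math.sin(2 * math.pi * t / 7) for t in range(n)]
inflow = [5.0 + s + f for s, f in zip(slow, fast)]
level = [1.0 + 2.0 * s + 0.5 * f for s, f in zip(slow, fast)]

# Train one sub-model per component.
in_low, in_high = split(inflow)
lv_low, lv_high = split(level)
low_model = fit_line(in_low, lv_low)
high_model = fit_line(in_high, lv_high)

def predict_level(inflow_series):
    """Decompose the input, run each sub-model, and sum their outputs."""
    low, high = split(inflow_series)
    return [p + q for p, q in zip(predict(low_model, low), predict(high_model, high))]

combined = predict_level(inflow)
single = predict(fit_line(inflow, level), inflow)   # one model on the raw signal
```

Comparing `rmse(level, combined)` with `rmse(level, single)` shows why separate sub-models can help when components of the input affect the output differently; a single model fitted to the raw signal must compromise between the two gains.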
PUBLICATIONS
Many of the results of this ongoing study have been presented in conference proceedings papers and a USGS report. Listed below are publications that describe aspects of the data-mining activities in the Everglades.
Conrads, P.A., and Roehl, E.A., Jr., 2007, Hydrologic record extension of water-level data in the Everglades Depth Estimation Network (EDEN) using artificial neural network models, 2000-2006: U.S. Geological Survey Open-File Report 2007-1350, 56 p.
Conrads, P.A., and Roehl, E.A., 2006, Estimating water depths using artificial neural networks, Hydroinformatics 2006, edited by Philippe Gourbesville, Jean Cunge, Vincent Guinot, Shie-Yui Liong, Vol. 3, p. 1643-1650.
Conrads, P.A., Roehl, E.A., Daamen, R.C., and Kitchens, W.M., 2006, Using artificial neural network models to integrate hydrologic and ecological studies of the snail kite in the Everglades, USA, Hydroinformatics 2006, edited by Philippe Gourbesville, Jean Cunge, Vincent Guinot, Shie-Yui Liong, Vol. 3, p. 1651-1658.