EDNA Stage 1B Seamless Process
The EDNA Stage1B process involves collecting the raw data from the EDNA cooperators, performing QA/QC checks on the raw data, and preparing the data for Stage 2 of the project. ArcInfo AMLs (Arc Macro Language scripts) are executed to create the seamless database, and ArcView tools are used to determine seamless accuracy. The Stage1B database development provides seamless drainage basin delineations and synthetic streamline coverages that are passed on to Stage 2 cooperators for quality checks and resource applications.
Stage1B involves quality assurance and quality control checks of the Stage 1 data where thousands of hydrologic units forwarded to the EROS Data Center (EDC) from the EDNA cooperators are edited and processed into seamless 30-meter hydrologic derivatives. The above is an example where eighteen hydrologic units in the northwestern part of the United States are "pieced together" into a seamless dataset.
The three steps of the Stage1B seamless process are:
- Step 1: Preparing and organizing gigabytes of raw data (Stage 1 data processed by the EDNA cooperators)
- Step 2: Processing the raw data into a seamless dataset
- Step 3: Preparing the seamless data for the EDNA Stage 2 process
Preparing and organizing the EDNA raw data
Step one of the Stage1B seamless process is managing and organizing the coordinated efforts of the EDNA cooperators. This involves the transferring of the raw data from the cooperators, organizing the raw data spatially, and preparing the raw data for seamless production.
Creating a seamless dataset is a space and time issue. EDNA cooperators individually process over 2,200 hydrologic units at different times and locations throughout the United States during the first stage of the project. As each unit is completed, the cooperators transfer the data to EDC where they are tracked prior to the seamless production. Refer to the Stage 1 Status Graphics for the time series production.
In preparation for the Stage1B process, each hydrologic unit, identified by its Hydrologic Unit Code (HUC), is reviewed to confirm the following:
- The unit was processed with a 25,000 meter buffer
- Adjacent hydrologic units have been received and are accessible during the processing stage
- The unit was processed correctly as either an open basin or a closed basin
- All Stage 1 derivatives are included within the directory
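The review list above can be sketched as a small check run against each delivered unit. This is only an illustration: the `HucDelivery` record, the `review_huc` function, and the derivative names in `REQUIRED_DERIVATIVES` are hypothetical and do not come from the EDNA AMLs. The sketch can confirm only that a basin type is recorded, not that the open or closed choice is hydrologically correct.

```python
from dataclasses import dataclass, field

STANDARD_BUFFER_M = 25_000  # standard buffer size stated in the text

# Hypothetical derivative names; the actual Stage 1 list is not given here.
REQUIRED_DERIVATIVES = {"filled_dem", "flow_direction", "flow_accumulation", "streamlines"}


@dataclass
class HucDelivery:
    """Hypothetical record describing one HUC delivered by a cooperator."""
    huc: str
    buffer_m: int
    basin_type: str                      # "open" or "closed"
    derivatives: set = field(default_factory=set)
    neighbors: tuple = ()                # codes of adjacent HUCs


def review_huc(delivery, received_hucs):
    """Return a list of problems found for one delivered HUC (empty if none)."""
    problems = []
    if delivery.buffer_m != STANDARD_BUFFER_M:
        problems.append(f"buffer is {delivery.buffer_m} m, expected {STANDARD_BUFFER_M} m")
    missing_neighbors = sorted(set(delivery.neighbors) - set(received_hucs))
    if missing_neighbors:
        problems.append("adjacent HUCs not received: " + ", ".join(missing_neighbors))
    if delivery.basin_type not in ("open", "closed"):
        problems.append(f"unknown basin type: {delivery.basin_type!r}")
    missing = sorted(REQUIRED_DERIVATIVES - set(delivery.derivatives))
    if missing:
        problems.append("missing derivatives: " + ", ".join(missing))
    return problems
```

A unit that returns an empty list passes this preliminary review; any returned message names the item that still needs attention before seamless production.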
Processing the raw data into a seamless dataset
Step two is a QA/QC process. It consists of inputting the raw data received from the EDNA cooperators, editing the raw data, and outputting seamless hydrologic derivatives. A specified set of ArcInfo AMLs is executed to complete these processes. The objective is to edit the raw data to obtain coincidence between all seed points and streamlines throughout the dataset.
An open basin HUC may contain several inflow and outflow points magnifying the number of hydrologic editing situations throughout the country (see graphic below). Each inflow and outflow point is one pixel that contains a seed point and synthetic streamline that must be coincident with adjacent HUC(s) seed points and synthetic streamlines in order to create a seamless dataset.
The HUC in the graphic below contains three inflow seed points (yellow), and one outflow seed point. During the editing session of this unit, four seed points and streamlines are checked for coincidence with all adjacent HUC seed points and streamlines.
The graphic below shows that eight surrounding HUCs affect the probability of 12090106 becoming part of the seamless dataset. Each HUC contains its own set of seed points and streamlines. Therefore, not only are the four seed points and four streamlines that belong to 12090106 checked for coincidence during the processing of 12090106, but all seed points and streamlines that belong to 12090101, 12090105, 12090107, and 12090201 are checked for coincidence with 12090106.
This method also applies to each of the surrounding HUCs during their processing sessions. In order for all of the above HUCs to become part of the seamless dataset, a total of 24 seed points and 24 streamlines are checked for coincidence, and all surrounding raw HUC data must be received from the cooperators.
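The seed point part of the coincidence check can be illustrated with a minimal sketch. It assumes that each seed point is available as an (x, y) map coordinate in meters and that two seed points are coincident when they fall in the same 30-meter cell. The function names and the coordinate representation are assumptions made for illustration; the check on the synthetic streamlines themselves is not covered.

```python
CELL_SIZE_M = 30  # cell size of the hydrologic derivatives described in the text


def to_cell(x, y, cell=CELL_SIZE_M):
    """Map an (x, y) coordinate in meters to the index of the cell containing it."""
    return (int(x // cell), int(y // cell))


def uncoincident_seed_points(seeds_a, seeds_b):
    """Compare the seed points two adjacent HUCs place on their shared boundary.

    Returns two sorted lists of cell indices: cells holding a seed point only
    in HUC A, and cells holding a seed point only in HUC B. Both lists are
    empty when every boundary seed point has a partner in the same cell.
    """
    cells_a = {to_cell(x, y) for x, y in seeds_a}
    cells_b = {to_cell(x, y) for x, y in seeds_b}
    return sorted(cells_a - cells_b), sorted(cells_b - cells_a)
```

Any cell returned by the function marks a location where an editor would need to inspect the two units before they can be joined.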
Several factors that delay seamless production include:
- Adjacent raw data not received yet - HUC 12090106 may be processed through Stage1B; however, coincidence cannot be determined until the adjacent raw data are received. Thus, a true seamless dataset does not exist at the point of processing 12090106 without the adjacent HUC raw data.
- Coincidence does not exist - If coincidence between seed points and streamlines does not exist during the editing session, the seed points for the HUCs are moved either upstream or downstream to a point of coincidence (see graphic right).
- Time series production of raw data - Periodically DEM tiles are updated throughout the country to provide the best available data. This may create different EDNA derivatives. For example, if a cooperator processes a series of HUCs in an area of the country at the beginning of the year and then processes an adjacent area during the later part of the year, the possibility exists that different DEM tiles are accessed for the adjacent data (if that was the most recently updated area for DEMs). As a result, adjacent HUCs may not be coincident with the HUCs processed earlier in the year. To correct this situation, seed points are once again moved upstream or downstream to a point of coincidence between streamlines and seed points (see above graphic).
- Inconsistent buffers - The standard buffer size for one HUC is 25,000 meters. The buffer provides adequate relief surrounding the hydrologic unit boundary to correct the flow accumulation and flow direction. If one HUC is processed at a 25,000 meter buffer and an adjacent HUC is processed at a 5,000 meter buffer, inconsistencies may exist between the hydrologic derivatives. Thus coincidence problems may arise.
Creating the EDNA Stage1B seamless database involves "piecing together" open basins and closed basins throughout the country. In open basin processing, the DEM is "filled" to a specified threshold, which removes spurious sinks throughout the hydrologic unit and allows the synthetic flow to spill off the edge of the DEM. During this process, seed points are placed where synthetic streamlines intersect a HUC boundary. These "intersection points" are the inflow and outflow points of an open basin HUC.
In the case of a closed basin, seed points are placed wherever a "true sink(s)" exists within the HUC. The synthetic flow is directed toward the seed points rather than the edge of the DEM so that the flow remains within the closed basin hydrologic boundary.
Step three of the Stage1B process involves preparing the data for Stage 2 of the EDNA project. This includes:
- Attributing the flow accumulation values for each HUC.
- Attributing each reach catchment within a HUC with a Pfafstetter code to allow for upstream and downstream tracing.
The flow accumulation values represent the number of upstream cells that flow to a specific cell. These values are attributed at the outflow point of each HUC so the total number of upstream cells contributing to a particular HUC can be quantified. The mouth of the Mississippi contains the largest value for the country.
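To make the definition concrete, the following sketch counts the upstream cells for every cell of a small flow direction grid. It assumes the common ESRI-style D8 direction codes (1 = east, 2 = southeast, 4 = south, and so on, clockwise to 128 = northeast) and a grid without loops; the function name and the list-of-lists grid are illustrative choices, not the EDNA implementation.

```python
from collections import deque

# ESRI-style D8 direction codes mapped to (row, column) offsets.
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def flow_accumulation(flow_dir):
    """Return a grid giving, for each cell, the number of cells upstream of it."""
    rows, cols = len(flow_dir), len(flow_dir[0])

    def downstream(r, c):
        """Cell that (r, c) drains to, or None if flow leaves the grid."""
        offset = D8_OFFSETS.get(flow_dir[r][c])
        if offset is None:
            return None
        nr, nc = r + offset[0], c + offset[1]
        if 0 <= nr < rows and 0 <= nc < cols:
            return nr, nc
        return None

    # Count how many neighbors drain directly into each cell.
    inflows = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                inflows[target[0]][target[1]] += 1

    # Start at cells with no inflow and pass counts downstream.
    acc = [[0] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if inflows[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is None:
            continue
        nr, nc = target
        acc[nr][nc] += acc[r][c] + 1      # the cell itself plus everything above it
        inflows[nr][nc] -= 1
        if inflows[nr][nc] == 0:          # all contributors have been processed
            queue.append((nr, nc))
    return acc
```

In a 3 by 3 grid where every cell eventually drains to the lower right corner, that corner receives a value of 8, the number of other cells in the grid. The value at the outflow point of a HUC is read in the same way.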
The EDNA dataset contains a unique feature: Pfafstetter coding. This method delineates and codifies EDNA synthetic streamlines so that upstream and downstream tracing can be determined. Each HUC contains a reach catchment basin for every synthetic streamline confluence (see graphic below).
A "seed point" is added at each synthetic streamline confluence. Every seed point is attributed with a Pfafstetter code that provides the tracing capability (see next graphic).
The Pfafstetter code is based on the following methodology. Within a watershed boundary, the mainstem streamline is numbered one, and the four largest tributaries off the mainstem are assigned even numbered basins (2, 4, 6, & 8) (see graphic below).
The interbasins, every basin between the four largest tributary basins, are assigned odd numbers (3, 5, 7, 9) (see graphic below). The numbering system always starts at the mouth of the mainstem with 1 and ends with 9 at the top of the watershed boundary.
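One level of this numbering can be sketched as follows. The sketch assumes the tributaries are listed in order from the mouth of the mainstem upstream, each with a drainage area used to rank its size; the function name and the input format are hypothetical. The four largest tributaries receive 2, 4, 6, and 8 in downstream-to-upstream order, and every smaller tributary receives the odd number of the interbasin in which it joins the mainstem.

```python
from bisect import bisect_left


def pfafstetter_digits(tributaries):
    """Assign one level of Pfafstetter digits.

    `tributaries` is a list of (name, drainage_area) pairs ordered from the
    mouth of the mainstem upstream. Returns a dict mapping each name to its digit.
    """
    # Positions of the four largest tributaries, restored to mouth-to-headwater order.
    by_area = sorted(range(len(tributaries)), key=lambda i: tributaries[i][1], reverse=True)
    largest = sorted(by_area[:4])

    digits = {}
    for i, (name, _area) in enumerate(tributaries):
        k = bisect_left(largest, i)          # large tributaries downstream of position i
        if k < len(largest) and largest[k] == i:
            digits[name] = 2 * (k + 1)       # one of the four largest: 2, 4, 6, or 8
        else:
            digits[name] = 2 * k + 1         # joins an interbasin: 1, 3, 5, 7, or 9
    return digits
```

Applying the same function again inside each numbered basin, and appending the new digit to the existing code, would produce the more detailed levels.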
Within each basin, the Pfafstetter code becomes more defined for every synthetic streamline within each catchment. Every basin is considered independently when attributing the Pfafstetter code: the four largest tributaries within the basin are identified and assigned 2, 4, 6, and 8 accordingly, and the interbasins are then assigned the odd numbers 1, 3, 5, 7, and 9. As the watershed network becomes more detailed, the appropriate numbers are assigned at each digit level. This process continues until all the streamlines within each catchment and basin within a HUC have been assigned a Pfafstetter code.
Each Pfafstetter code within one watershed boundary (each HUC) must have the same number of digits. For instance, if basin 1 had only one tributary but basin 5 had six levels of tributaries, where one of the streams was numbered 521000, the Pfafstetter code for basin 1 would need to be assigned 100000 so that all the Pfafstetter codes for that particular HUC would have six digits. Within each of the numbered basins in the next graphic, a Pfafstetter code is assigned to the four largest tributaries 2, 4, 6, and 8. The remaining streams are assigned odd numbered basins.
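The equal-length rule amounts to padding shorter codes on the right with zeros. A minimal sketch, assuming the codes are held as strings of digits (the function name is hypothetical):

```python
def pad_pfafstetter_codes(codes):
    """Right-pad every code with zeros to the length of the longest code."""
    width = max(len(code) for code in codes)
    return [code.ljust(width, "0") for code in codes]


# The example from the text: basin 1 alongside a six-digit code in basin 5.
print(pad_pfafstetter_codes(["1", "521000"]))  # ['100000', '521000']
```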
This unique characteristic of the EDNA dataset is a meticulous methodology that can be applied to many types of natural resource applications. For more detailed information contact the authors at kverdin@usgs.gov.