Environmental DNA (eDNA) Data Management
The environmental DNA (eDNA) samples collected, processed, and sequenced by the Upper Midwest Environmental Sciences Center (UMESC) and partner agencies are being archived in a cloud-based database application. Consolidating eDNA data will significantly improve researchers' and managers' ability to visualize, analyze, and integrate sequence data for monitoring and early detection of invasive species. Warehousing eDNA sequencing data in a single location will also allow previously analyzed samples to be reexamined as analysis methods and DNA reference libraries improve. Both positive- and negative-detection data are retained in the database to provide an accurate picture of any analysis previously conducted on the sample data.
These tools and processes aim to improve the availability of eDNA data produced at UMESC and to establish a successful eDNA data management strategy that can be implemented across the USGS. Consolidating data storage and streamlining the analysis of sequencing data are critical for expanding the capacity of eDNA to serve as a monitoring and early detection system for invasive species. UMESC researchers and data management staff are working together to deploy cloud-based data pipelines that will generate actionable datasets from Next-Generation Sequencing (NGS) data using software and methods recognized throughout the eDNA community of practice. Managing eDNA data processing in a cloud environment provides an opportunity to consolidate the scripts and programs used for data processing and turn them into manageable, semi-automated pipelines. Naming conventions and minimum data requirements are being established at UMESC to standardize the release of eDNA data beyond metadata records, which will allow reproducible analytical comparisons and assessments across large geographic and taxonomic scales. As the amount of released eDNA data increases, adhering to data standards will ensure consistent and reliable access to datasets regardless of when they were produced.