Bureau Data Integration
The Science Data Management (SDM) Team, within the Core Science Systems Science Analytics and Synthesis program, leads efforts across the USGS to improve Bureau data integration and optimize information access.
SDM plays a core role in managing information assets and science products within the Bureau. This role includes hosting published scientific data outputs and supporting functional linkages between systems and tools to help meet business information needs. Below are examples of how SDM is advancing Bureau data integration.
Standardized Information Services
Hosting and Delivering Public Data Products
SDM supports a range of workflows that help scientists document and publish finalized data products in ScienceBase, a Bureau repository, and assign digital object identifiers (DOIs) through the Asset Identifier Service (AIS). This provides robust storage and a persistent data citation mechanism to support publication needs. SDM workflows also help scientists track downstream use of published resources and run queries that link related research products across different systems. The application programming interfaces (APIs) for AIS and ScienceBase provide much of the structured content displayed on the public-facing USGS website's Data Release pages, in the USGS Science Data Catalog, and in Data.gov.
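To illustrate how a DOI acts as a persistent citation link, here is a minimal Python sketch that normalizes a DOI string into a resolvable https://doi.org/ URL. It does not call the AIS or ScienceBase APIs. The function names are hypothetical, the sample DOI suffix is a placeholder, and the use of 10.5066 as the prefix for USGS data releases is an assumption that should be confirmed before relying on it.

```python
import re

# Assumption: USGS data release DOIs are registered under this prefix.
USGS_DATA_PREFIX = "10.5066"

# Loose syntactic check: "10.", a 4-9 digit registrant code, "/", then a suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in citations and metadata records.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (for example '10.5066/XXXX') from a citation string."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognizable DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build the persistent resolver URL for a DOI."""
    return "https://doi.org/" + normalize_doi(raw)


def is_usgs_data_release(raw: str) -> bool:
    """Check whether a DOI falls under the assumed USGS data release prefix."""
    return normalize_doi(raw).startswith(USGS_DATA_PREFIX + "/")
```

A call such as `doi_url("doi:10.5066/P9EXAMPLE")` returns the resolver URL for that placeholder DOI, which is the form typically used in a data citation.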
USGS Contact Lookup Service
The SDM Team maintains a RESTful API and reusable web interface components to support querying USGS people and non-USGS collaborators. Contact ask-sdm@usgs.gov if you’re interested in using these resources in your USGS application.
Controlled Vocabulary Hosting and Integration
The SDM Team hosts vocabulary services through ScienceBase Vocab that help support the National Geological and Geophysical Data Preservation Program’s Registry of Scientific Collections (ReSciColl). Science Analytics and Synthesis (SAS) also hosts the USGS Thesaurus so that tools can offer quality-controlled keyword lookups through an API service. These resources are used within the USGS Metadata Wizard tool and within several steps of the SDM Team’s data publication workflow to help authors label products accurately and to facilitate scientific search and reporting needs.
Helpful links:
- https://www.sciencebase.gov/vocab/
- https://doimspp.sharepoint.com/sites/usgs-sdm-apps/ScienceBaseProjectDocumentation
- https://apps.usgs.gov/thesaurus/
USGS Science Center Name Management across Bureau Tools
The SDM Team, in collaboration with the USGS Water Mission Area, maintains a standardized science center list and GraphQL service for use across USGS tools and applications such as ScienceBase, the Digital Object Identifier (DOI) Tool, the ScienceBase Data Release Summary Dashboard, and more. This resource helps ensure consistency in information across Business Information Systems in the USGS.
ScienceBase Directory services are available as machine-consumable JSON. Contact ask-sdm@usgs.gov to learn more about how this service is maintained and how it can be used to ensure a quality-controlled process to link resources or reporting needs back to active USGS Science Centers in local workflows and data management tasks.
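As a rough illustration of how a local workflow might use a machine-consumable science center list, the Python sketch below matches free-text center names against a list of active centers. The JSON shape (`name` and `active` fields), the sample records, and the function names are all assumptions made for illustration. They do not describe the actual service response, so check the real schema with the SDM Team before adapting this.

```python
import json
from typing import Dict, Optional

# Hypothetical sample payload; the real service's field names may differ.
SAMPLE_CENTERS_JSON = """
[
  {"name": "Example Science Center A", "active": true},
  {"name": "Example Science Center B", "active": false}
]
"""


def _key(name: str) -> str:
    """Normalize a name for comparison: collapse whitespace, ignore case."""
    return " ".join(name.split()).casefold()


def load_active_centers(payload: str) -> Dict[str, str]:
    """Map normalized names to standardized names for active centers only."""
    records = json.loads(payload)
    return {_key(r["name"]): r["name"] for r in records if r.get("active")}


def standardize_center(name: str, active_centers: Dict[str, str]) -> Optional[str]:
    """Return the standardized center name, or None if it is not an active center."""
    return active_centers.get(_key(name))


if __name__ == "__main__":
    centers = load_active_centers(SAMPLE_CENTERS_JSON)
    print(standardize_center("  example science  center a ", centers))
```

Returning `None` for unmatched or inactive names lets a reporting script flag records for manual review instead of silently keeping a nonstandard name.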
Science in the Modern Information Technology Landscape
ScienceBase Integration with USGS Cloud and Globus for Data Transfer and Storage
Transferring and storing large scientific datasets is a significant consideration for many research efforts. In partnership with the USGS Advanced Research Computing High Performance Compute team (ARC-HPC), SDM has built out and continues to refine workflows that connect ScienceBase and the USGS Cloud Hosting Solutions (CHS) environment through the Globus transfer client (www.globus.org). USGS researchers can now store ScienceBase files within USGS CHS cloud storage, which supports larger file uploads and advanced access functionality for cloud-optimized files. Researchers with very large file transfer and storage needs (100 GB+) can also now use Globus to move data into ScienceBase or into other storage locations. Interested users can request additional information and consultation at sciencebase@usgs.gov.
ScienceBase Integration with USGS Dremio Environment
To support USGS data integration goals and reusable workflows for importing data and its corresponding metadata into novel analysis platforms, the ScienceBase team has partnered with USGS CHS to develop a workflow that allows users to easily bring data from ScienceBase into the USGS Dremio environment. This workflow is still in a testing phase, but interested users can contact sciencebase@usgs.gov to learn more.
Linking USGS Systems
Connecting USGS Data with Published Manuscripts
In coordination with the USGS Library Publications Warehouse staff, SDM has implemented an automated workflow that uses the application programming interfaces (APIs) of both business systems to link USGS data release landing pages in ScienceBase to their related primary (manuscript) publications. Reach out to the team at ask-sdm@usgs.gov for additional details on this process.
Python Tools for Business Information Integration
In addition to the core ScienceBase library (https://github.com/usgs/sciencebasepy), which provides convenient functionality for using the ScienceBase Catalog with Python, the SDM Team maintains a separate Python package, usgs-datatools, that links USGS business systems to help with common data management tasks. This package currently contains modules for working with the DOI Tool, the Metadata Parser, ScienceBase data release data, and more. Contact ask-sdm@usgs.gov to learn more and to start using usgs-datatools.
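For readers who want a feel for working with ScienceBase Catalog records from Python, the sketch below uses only the standard library to request a public item as JSON. It does not use sciencebasepy or usgs-datatools. The `?format=json` URL pattern and the `id` and `title` fields are assumptions about the public catalog interface, the item ID shown is a placeholder, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for public ScienceBase Catalog items.
CATALOG_ITEM_BASE = "https://www.sciencebase.gov/catalog/item/"


def item_json_url(item_id: str) -> str:
    """Build the URL that requests a catalog item as JSON."""
    return f"{CATALOG_ITEM_BASE}{item_id}?{urlencode({'format': 'json'})}"


def fetch_item(item_id: str, timeout: float = 30.0) -> dict:
    """Download and parse one public item record (requires network access)."""
    with urlopen(item_json_url(item_id), timeout=timeout) as response:
        return json.load(response)


def summarize_item(item: dict) -> str:
    """Format a one-line summary from the assumed 'id' and 'title' fields."""
    return f"{item.get('id', '(no id)')}: {item.get('title', '(untitled)')}"


if __name__ == "__main__":
    # Placeholder ID; replace with the ID of a real public item.
    record = fetch_item("0123456789abcdef01234567")
    print(summarize_item(record))
```

For authenticated access, uploads, or bulk operations, the sciencebasepy library is likely the better starting point, since it wraps session handling that this sketch omits.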
Spatial Search and Display for USGS Science
ScienceBase integrates with the internal USGS BASIS+ system to store spatial information about ongoing science efforts. Advanced USGS users and tools can query and display this information to better understand where research and science products are located on the physical landscape. Please contact sciencebase@usgs.gov for more details.