Community Practices for Data and Sample Preservation
This is a compilation of practices and guidelines for data and sample preservation used and developed by the wider earth science and data preservation community. Practices are organized by topic, and topics are listed alphabetically. These topics focus on data and sample preservation; community practices for other data management topics can be found on the USGS Data Management website.
For more information regarding the USGS collections management policy and requirements, see the USGS Survey Manual Instructional Memorandum (IM) CSS 2019-01 and accompanying guidance.
Disclaimer
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
The inclusion of links and pointers to websites external to the Department of the Interior (DOI) is not intended to assign importance to those sites or to the information contained on those sites. It is also not intended to endorse or recommend the organizations sponsoring the websites nor the views expressed or products/services offered on these sites. DOI and the bureaus do not control or guarantee the accuracy, relevance, timeliness, completeness, or Section 508 compliance of information contained on a linked website. Additionally, DOI and the bureaus cannot authorize the use of copyrighted materials contained in linked websites. Visitors must request such authorization from the sponsor of the linked website. DOI and the bureaus are not responsible for any communications visitors receive from linked websites.
CARE Principles
The following resources provide an overview of CARE (Collective benefit, Authority to control, Responsibility, and Ethics) principles and guidance for operationalizing CARE and FAIR (Findable, Accessible, Interoperable, and Reusable) principles.
- An overview of the CARE Principles for Indigenous Data Governance, including definitions and components, developed by the Research Data Alliance International Indigenous Data Sovereignty Interest Group
- This publication, based on activities by the Research Data Alliance, provides an introduction to CARE principles and the intersection of CARE and FAIR principles. It outlines steps needed for operationalizing FAIR and CARE principles
- Overview of the CARE principles, the relationship between CARE and FAIR principles, and examples of the application of CARE principles
Collection and Acquisition of Physical Samples
The following resources provide information regarding the proper permitting and authority for collecting physical samples and community practices for acquiring samples and data.
- Department of the Interior Paleontological Resources Protection Act policy, relevant for collecting paleontological material
- USGS policy for obtaining permission for access to private lands
- Guidance on the USGS policy for entering private property
- The USGS Collection Plan Template provides an overview of the information to document when collecting materials and assists with the planning and documentation of USGS working collections
- Data collection practices from the Data Management Skillbuilding Hub
- USGS Data Management website overview of the methods for acquiring data
- Guidelines and best practices for creating and using data templates from the USGS Data Management website
Data Citation
The following resources provide guidelines for developing citations for earth and environmental science data.
- Data citation guidelines from the USGS Data Management website
- Data citation guidelines for earth and environmental science data generated by the Earth Science Information Partners Data Preservation and Stewardship Committee
Data Organization
- Guidance for file and data organization from the USGS Data Management website
- Guidelines for organizing data using the ‘tidy data’ system. These guidelines are intended for use with the tidyverse collection of R packages, but the organization principles can be applied to all data
- Guidelines for data standards from the USGS Data Management website
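As a rough illustration of the ‘tidy data’ organization mentioned above (each variable in its own column, each observation in its own row), the following Python sketch reshapes a small wide-format table into long form. The site names, years, values, and the `to_tidy` helper are hypothetical and exist only for this example; they do not come from any of the linked guidelines.

```python
# Hypothetical wide-format table: one row per site, one column per year.
wide_rows = [
    {"site": "A", "2021": 4.2, "2022": 4.8},
    {"site": "B", "2021": 3.1, "2022": 3.5},
]


def to_tidy(rows, id_field, variable_name, value_name):
    """Return one record per (id, variable) pair, so each row is one observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row instead
            tidy.append(
                {id_field: row[id_field], variable_name: key, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "site", "year", "measurement")
for record in tidy_rows:
    print(record)
```

In the reshaped table the year becomes a value in a `year` column rather than a column header, which is the property that makes the same layout usable outside R.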
Data Versioning and Backup
The following resources provide guidance and community best practices for data versioning and backup.
- Best practices for data backups and data security from the USGS Data Management website
- Community best practices for data versioning endorsed by the Research Data Alliance Data Versioning Working Group
- Community best practices for backing up data from the Data Management Skillbuilding Hub
- Community best practices for creating and documenting a backup policy from the Data Management Skillbuilding Hub
- Community best practices for ensuring integrity and accessibility when backing up data from the Data Management Skillbuilding Hub
- Guidance on Documenting Revisions to USGS Scientific Digital Data Releases
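One common way to check the integrity of a backup, offered here only as a minimal sketch and not as a method prescribed by any of the resources above, is to compare cryptographic checksums of the original files against their backed-up copies. The function names (`sha256_of`, `build_manifest`, `find_mismatches`) and the directory layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(original_dir, backup_dir):
    """List files that are missing from the backup or whose contents differ."""
    original = build_manifest(original_dir)
    backup = build_manifest(backup_dir)
    return sorted(
        name for name in original if backup.get(name) != original[name]
    )
```

Calling `find_mismatches("data", "backup/data")` (both paths hypothetical) would return an empty list when every original file has an identical copy in the backup. Storing the manifest alongside the backup also allows later checks without access to the originals.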
Digitization
The following resources provide recommendations and guidance for digitizing documents and physical materials.
- Guidance for the acquisition, processing, and archiving of digital media from iDigBio
- Publication that describes and illustrates five major clusters of tasks that enable efficient and effective digitization of biological collections
Examples of Online Databases and Data Catalogs
Links to online databases of scientific collections and data.
- USGS Registry of Scientific Collections (ReSciColl)
- Registry of Research Data Repositories
- DataONE
- U.S. Government’s Open Data
- USGS ScienceBase catalog
- USGS Coastal and Marine Geoscience Data System
- USGS list of Trusted Digital Repositories
- USGS list of acceptable digital repositories for scientific publications and data
- USGS Data Management list of data catalogs and portals
- Global Biodiversity Information Facility
- Integrated Taxonomic Information System
FAIR Principles
The following resources provide an overview of the FAIR principles as well as strategies and recommendations for achieving and maintaining FAIR practices.
Metadata
The following resources provide guidance on metadata standards and preparing metadata, as well as examples of collection metadata records within the USGS.
- Guide for preparing metadata for the USGS Registry of Scientific Collections (ReSciColl)
- Example of an individual scientific collection metadata record in ReSciColl
- Example of multiple working collections maintained within a USGS repository documented in ReSciColl
- A list of metadata standards for a variety of organizations and groups compiled by the Digital Curation Centre
- Guidelines for the development of metadata for physical samples that define the data elements and associated rules for both International Generic Sample Number (IGSN) description and registration
- Guidelines for metadata associated with vertebrate fossils from the Society of Vertebrate Paleontology
- Community practices from the Data Management Skillbuilding Hub for data and metadata quality assurance and quality control, covering how to improve data and metadata quality, identify potential errors, and apply techniques to address them
- An example of one agency developing a workflow for physical sample metadata rescue
Open Science
The following resources provide information about open science practices.
- Transform to Open Science (TOPS) is a NASA initiative designed to rapidly transform agencies, organizations, and communities to an inclusive culture of open science. The TOPS webpage provides information and guidance regarding open science practices.
- Open science announcements from federal agencies
- Fact sheet regarding the White House Office of Science and Technology Policy’s Year of Open Science 2023
Persistent Identifiers (PIDs)
The following resources provide an overview of the use of persistent unique identifiers, including the International Generic Sample Number (IGSN).
- An overview of the use of persistent identifiers for research samples, resources, and instruments
- An overview of the IGSN describing the current architecture and technical implementation of the IGSN, how IGSNs relate to other identifiers, and how the technical systems are supported by an international governance structure
- An overview of and best practices for the use of Digital Object Identifiers (DOIs) for publications, datasets, software, and physical specimens from the USGS Data Management website
- An overview of the use of Open Researcher and Contributor Identifiers (ORCIDs) for researchers from the USGS Data Management website
- The Research Organization Registry (ROR) is a global, community-led registry of open persistent identifiers for research organizations
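Some persistent identifiers carry a built-in check character that lets software catch typing errors before an identifier is stored. As a minimal sketch, assuming the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs, the following Python functions compute and verify the final character of an ORCID iD. The function names are hypothetical, and a passing check only shows that the identifier is well formed, not that it is registered to anyone.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if a string such as 0000-0002-1825-0097 has a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is widely used as a sample iD in ORCID documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this can run when researcher identifiers are entered into a metadata record, so that a mistyped digit is flagged immediately.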
Repositories/Physical Storage
The following are community practices for repositories and/or physical sample storage and preservation.
- National Park Service’s recommended practices for museum collections
- USGS National Geological and Geophysical Data Preservation Program’s information for safety, risk management, and continuing operations for samples and data
- American Alliance of Museums guidance on developing Collections Management Policy
Sample Access
The following are guidelines regarding accessing and sampling or subsampling materials in a repository.
- USGS National Geological and Geophysical Data Preservation Program recommended practices for accessing collections and sampling
- Lamont-Doherty Core Repository’s sample request guidelines
- The USGS Directory of Public Repositories of Geological Materials provides information for anyone looking for publicly accessible repositories of geological materials in the U.S. and Canada and has many examples of sample access policies and guidelines
Software and File Formats
The following resources provide guidelines for the use of software in data/sample preservation, including file format guidelines.
- Best practices for file formats for digital data preservation from the USGS Data Management website
- Library of Congress’ recommended formats statement
- Guidance for how to identify the most appropriate software from the Data Management Skillbuilding Hub
- Software, data tools, and resources for federal data from the Federal Enterprise Data Resources
- Guidelines from the Earth Science Information Partners for research code and, more broadly, software development within research communities, focusing on current development practices and ways to improve them
- USGS National Geological and Geophysical Data Preservation Program’s information for updating digital data to newer formats
Additional Resources
The following are other compilations of resources and community practices for data and sample preservation topics.
- USGS Data Management website
- USGS Community for Data Integration
- USGS National Geological and Geophysical Data Preservation Program’s collections management resources
- ESIP Data Management Training Clearinghouse, resources on a variety of data-related topics
- Research Data Alliance overview of 23 resources and tools for physical samples
- Research Data Alliance’s recommended practices and outputs catalog
- USGS Data Management training modules
- A curated, informative, and educational resource on data and metadata standards, inter-related to databases and data policies
- A list of digitization resources compiled by iDigBio
- Data Management Skillbuilding Hub’s catalog of best practices for data management
- Additional resources from the Data Management Skillbuilding Hub
- A repository of Federal Enterprise Data Resources