Office of Science Quality and Integrity

Fundamental Science Practices (FSP) Guide to Data Releases With or Without a Companion Publication

Version 02/13/2019

Purpose
Overview
Publishing Path Options
Data Release Obligations
Data Not Associated with a Publication
Data Associated with a Publication
Definitions/Explanations/Requirements
Related References and Tools

Purpose

This document provides guidance for U.S. Geological Survey (USGS) authors to meet the Office of Science and Technology Policy (OSTP) public access directive and the Office of Management and Budget (OMB) open data directive. It is intended to help USGS scientists who publish, and science center data managers, understand and meet the Bureau’s requirements for data release, understand how the publishing of scientific results must occur in parallel with the release of the associated data, and comply with USGS Fundamental Science Practices (FSP) and Survey Manual (SM) publishing policy requirements. As guidance and processes evolve, this document will be updated.

 

Overview

The OSTP and OMB public access and open data directives, which are interconnected, require that data used to support the conclusions in federally authored and (or) federally funded scholarly publications be provided free to the public. The USGS Public Access Plan, approved by OSTP and OMB, details how, effective October 1, 2016, the data upon which our scholarly conclusions are based must be made available free to the public before or simultaneously with publication of those conclusions. Providing public access to data that support scholarly conclusions is now embraced and required not only within the Federal Government but broadly across the worldwide scientific community (refer to this article). Within the USGS, this can result in two parallel processes: release of the peer-reviewed scholarly publication and, separately, release of the data. These processes have differing requirements, options, and timelines.

Historically, USGS funded data were commonly released as supplemental files associated with a USGS publication or were included as supporting material in outside publications such as journal articles. Considerations about publishing data were sometimes secondary to the analysis and interpretation of the data.

Over time, changes have made it necessary to implement new USGS data management requirements. As articulated in the USGS Public Access Plan, for example, datasets have grown in size and scope, technology has allowed more frequent updates to data, more stringent Federal requirements for data have been established, and new tools have emerged that allow users to view data on a map or in a model. These changes make it necessary to consider data separately from any associated scholarly publication and to ensure free public access to both. In all cases, the handling and release of USGS funded data and other data associated with USGS science must be described in the data management plan (DMP) associated with the project. A DMP, as described in SM 502.6, includes standards and intended actions as appropriate to the project for acquiring, processing, analyzing, preserving, publishing/sharing, describing, and managing the quality of, backing up, and securing the data holdings (Data Management Planning). The DMP is a living document that should be updated as needed to reflect the actual scope of work, and it serves as a record of data management activities throughout the project lifecycle.

USGS funded scientific data are those data collected with federally appropriated funds provided to the Bureau as part of the congressionally enacted USGS budget and therefore are subject to USGS requirements for data release (refer to SM 502.8). Data collected by USGS scientists with funding from another Federal agency or a non-Federal cooperator are not considered USGS funded data, and the responsibility for providing public release of these data should be clearly specified in the contract or agreement with the other agency or cooperator as well as in the USGS project data management plan. All final scientific data either released by the USGS or provided to research partners under cooperative or other collaborative agreements must first be reviewed for quality and accuracy and must include complete metadata.

 

Publishing Path Options

USGS authors should determine the appropriate path for publishing or releasing data before a project begins. The publication path must be identified in the DMP, which must be part of the project proposal as described in SM 502.6. Refer to this chart, which outlines the steps for each of the publishing paths described below and lists resources for help with specific requirements.

Nearly every publishing situation will fit into one of four paths, all of which require valid metadata associated with the USGS funded data (as described in SM 502.7):

  • dataset,
  • dataset with descriptive information beyond the required metadata,
  • dataset supporting interpretive content published by the USGS, or
  • dataset supporting interpretive content published outside the USGS (journal or periodical article, cooperator publication).

 

Data Release Obligations

Different requirements apply to data created with USGS funding than to data funded by others. The following list distinguishes cases by the source of funding for the data and by the role of the USGS scientist.

  1. USGS funded data, USGS scientist is principal investigator (PI) or first author – USGS data release is required. However, data that are already publicly available from existing sources, such as the National Water Information System (NWIS), the BioData Retrieval, or the Earth Resources Observation and Science (EROS) Center, and that are used to support scholarly conclusions in publications do not require a separate data release.
  2. USGS funded data, USGS scientist is not PI but a co-author – USGS data release is required, but not necessarily by the USGS co-author. The DMP states explicitly who releases these data and when.
  3. Federally (not USGS) funded data, USGS scientist is PI or first author – Federal funding agency has the primary responsibility for releasing data unless an alternative release path is explicitly stated in the DMP requiring USGS to take responsibility for and release the data.
  4. Federally (not USGS) funded data, USGS scientist is not PI but a co-author – Federal funding agency has the primary responsibility for releasing data unless an alternative release path is explicitly stated in the DMP requiring USGS to take responsibility for and release the data.
  5. State/local/non-governmental organization (NGO)/private sector (PS) funded data, USGS scientist is PI or first author – State/local/NGO/PS has the primary responsibility for releasing data unless an alternative release path is explicitly stated in the DMP requiring USGS to take responsibility for and release the data.
  6. State/local/NGO/PS funded data, USGS scientist is not PI but a co-author – State/local/NGO/PS has the primary responsibility for releasing the data unless an alternative release path is explicitly stated in the DMP requiring USGS to take responsibility for and release the data.
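The six cases above reduce to a small decision rule: for USGS funded data the USGS is responsible (with the existing-source exception stated in case 1), while for all other funding the funder is responsible unless the DMP assigns the release to the USGS. The following is a minimal Python sketch under that reading; the names Funder and release_responsibility and the returned labels are hypothetical illustrations, not part of any USGS tool, and the DMP remains the authoritative record.

```python
from enum import Enum


class Funder(Enum):
    """Funding source categories used in the obligations list (hypothetical labels)."""
    USGS = "USGS"
    OTHER_FEDERAL = "other Federal agency"
    NON_FEDERAL = "State/local/NGO/PS funder"


def release_responsibility(funder: Funder,
                           usgs_pi_or_first_author: bool,
                           dmp_assigns_usgs: bool = False,
                           already_public: bool = False) -> str:
    """Return the party with primary responsibility for releasing the data.

    Hypothetical helper that encodes the six listed cases; it does not
    replace the project DMP.
    """
    if funder is Funder.USGS:
        if not usgs_pi_or_first_author:
            # Case 2: release is required; who releases and when is set in the DMP.
            return "USGS (party named in DMP)"
        if already_public:
            # Case 1 exception: data already public from an existing source
            # (for example, NWIS or EROS) need no separate data release.
            return "no separate release"
        return "USGS"  # Case 1
    # Cases 3-6: the funder is responsible regardless of the USGS scientist's
    # role, unless the DMP explicitly assigns the release to the USGS.
    if dmp_assigns_usgs:
        return "USGS (per DMP)"
    return funder.value
```

Because cases 3 and 4 (and cases 5 and 6) differ only in the USGS scientist's role, the sketch ignores the role argument for non-USGS funding.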

 

Data Not Associated with a Publication

Is it allowable to release USGS funded data before or without a companion interpretive publication?

 

Data Associated with a Publication

When is a separate data release required for USGS funded data that support the conclusions in the publication?

  • A separate data release is required when the data are neither contained within the body of the outside publication or USGS series publication nor available from an acceptable digital repository.

Is there a size cutoff for data tables within the body of a publication or in associated appendixes and supplemental files?

  • The maximum size of a data table presented solely within the body of a publication depends on publisher requirements. Journals and other outside publishers generally have a size cutoff for within-article tables. For USGS series publications, authors should contact the Bureau Approving Official (BAO) or local Publishing Service Center Chief during the early stages of product development for guidance on the maximum sizes for tables and associated appendix and supplemental files. Although an appendix or supplemental file may contain a summary of the data that support the publication, the complete dataset may not be contained solely within an appendix or supplemental file, regardless of size.

Can graphs and other illustrations in a publication with USGS authors serve as a form of data release?

  • No. Illustrations in publications generally portray visual representations of the interpretations being presented. Although the illustrations (graphics, plots, or maps) support the interpretations and conclusions being presented, the data portrayed or plotted in them must be included in the data release associated with the publication, contained within the body of the publication, or available from an acceptable digital repository.

When is a separate data release not required for USGS funded data that support the conclusions in the publication?

  • A separate data release is not required when the data are contained (in a numbered table) within the body of the outside publication or USGS series publication or are publicly available from an acceptable digital repository.
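Taken together, the answers above amount to a two-condition test: only a numbered table in the body of the publication or an acceptable digital repository exempts USGS funded supporting data from a separate release. A minimal Python sketch of that reading follows; the function name and parameters are hypothetical.

```python
def separate_release_required(in_numbered_table_in_body: bool,
                              in_acceptable_repository: bool) -> bool:
    """Return True if USGS funded supporting data need their own data release.

    Hypothetical helper. Appendixes, supplemental files, and illustrations are
    not accepted alternatives, so data held only there still need a release.
    """
    return not (in_numbered_table_in_body or in_acceptable_repository)
```

For example, a complete dataset that appears only in a supplemental file falls under separate_release_required(False, False), which returns True.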

What are the citation and referencing requirements for data releases associated with a publication?

  • The references section of the companion publication must include a citation, with the DOI, for the data release.
  • The landing page and metadata for the data release must include a citation, with the DOI, for the companion publication.
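The citation requirement runs in both directions, so a pre-release check can look for each DOI in the other product. The following is a minimal Python sketch assuming the DOIs cited in the references section, on the landing page, and in the metadata have already been extracted as strings; the function names and DOI values are hypothetical. DOI names are case-insensitive, so the sketch compares them in lowercase.

```python
from typing import Iterable, List, Set


def _normalize(dois: Iterable[str]) -> Set[str]:
    # DOI names are case-insensitive, so compare them in lowercase.
    return {d.strip().lower() for d in dois}


def missing_cross_citations(release_doi: str,
                            publication_doi: str,
                            publication_reference_dois: Iterable[str],
                            landing_page_dois: Iterable[str],
                            metadata_dois: Iterable[str]) -> List[str]:
    """Return the cross-citation requirements that are not yet met (hypothetical)."""
    problems = []
    if release_doi.strip().lower() not in _normalize(publication_reference_dois):
        problems.append("publication references do not cite the data release DOI")
    if publication_doi.strip().lower() not in _normalize(landing_page_dois):
        problems.append("landing page does not cite the publication DOI")
    if publication_doi.strip().lower() not in _normalize(metadata_dois):
        problems.append("metadata do not cite the publication DOI")
    return problems
```

An empty list means all three citations were found; it does not verify that the citations are otherwise complete.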

What statement(s) must be used to indicate the availability and, if applicable, the location of data that support the conclusions in a publication, and where should the statement(s) be placed?

  • Below are example statements for various cases; they describe where the data reside or clarify the disposition of the data or the reasons for partial release or lack of release. Add the applicable data statement(s) to the Notes tab in the internal USGS Information Product Data System (IPDS) and to the publication manuscript before peer review.

    Insert appropriate text for bracketed information and retain parentheses where indicated. See “Data Citation” and USGS Publishing Standards Memorandum 2014.03 for additional citation guidance.

    Case 1. Data are available from an acceptable repository (includes USGS data release products). 

    • IPDS: Data generated during this study are available from the [acceptable repository], [DOI URL].
    • Manuscript (USGS data release): Data generated during this study are available as a USGS data release ([author], [date]).
    • Manuscript (other acceptable repository): Data generated during this study are available from the [acceptable repository] ([author], [date]).

    Case 2. Data are partially available from an acceptable repository.

    • IPDS: Data generated during this study are partially available from the [acceptable repository], [DOI URL]. Funding for this study was provided by [responsible agency]. [Describe funding and responsibility for data release].
    • Manuscript: Data generated during this study are partially available from the [acceptable repository] ([author], [date]). Funding for this study was provided by [responsible agency]. [Describe funding and responsibility for data release].

    Case 3. Data are not available at time of publication.

    • IPDS and Manuscript: At the time of publication, data are not available from the [responsible non-USGS agency].

    Case 4. Data either are not available or have limited availability owing to restrictions (proprietary or sensitivity).

    • IPDS and Manuscript: Data either are not available or have limited availability owing to restrictions ([state reason for restrictions, such as proprietary interest or sensitivity concern]). Contact [third party name] for more information.

    Case 5. Data generated or analyzed are included in the main text of the publication.

    • IPDS and Manuscript: All data generated or analyzed during this study are included in the main text of this publication.

    Case 6. Data were not generated or analyzed for this publication.

    • IPDS and Manuscript: No datasets were generated or analyzed for this publication.
  • Refer also to “Data Release” in the “Definitions/Explanations/Requirements” section of this guidance for additional requirements.
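Because the statements are fixed templates with bracketed fields, filling them can be sketched as simple substitution that keeps the parentheses and refuses to return text with unfilled brackets. The following is a minimal Python sketch; the case keys, function name, and example author are hypothetical, and only a subset of the manuscript statements above is transcribed.

```python
import re
from typing import Dict

# Subset of the manuscript statements above; keys are hypothetical labels.
MANUSCRIPT_STATEMENTS = {
    "case1_usgs_release": "Data generated during this study are available as a USGS data release ([author], [date]).",
    "case1_repository": "Data generated during this study are available from the [acceptable repository] ([author], [date]).",
    "case3": "At the time of publication, data are not available from the [responsible non-USGS agency].",
    "case5": "All data generated or analyzed during this study are included in the main text of this publication.",
    "case6": "No datasets were generated or analyzed for this publication.",
}

PLACEHOLDER = re.compile(r"\[[^\]]+\]")


def fill_statement(case: str, fields: Dict[str, str]) -> str:
    """Replace [name] placeholders; parentheses in the template are kept as-is."""
    text = MANUSCRIPT_STATEMENTS[case]
    for name, value in fields.items():
        text = text.replace(f"[{name}]", value)
    leftover = PLACEHOLDER.findall(text)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return text
```

For example, fill_statement("case1_usgs_release", {"author": "Smith and others", "date": "2019"}) returns "Data generated during this study are available as a USGS data release (Smith and others, 2019)."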

 

Definitions/Explanations/Requirements

Data

  • Observations or measurements (unprocessed or processed) represented as text, numbers, or multimedia (from SM 502.8).

Dataset

  • A structured collection of data (from SM 502.8).

USGS Funded Data

  • Those data collected or created by using USGS appropriated funds.

Data Release

  • A type of USGS information product designed to provide USGS scientists with a channel to publish reviewed and approved data. Unless exempted under provisions addressing sensitive or proprietary data, USGS authors are required to make Bureau-funded data publicly available. The data that support scholarly publications must be released prior to, or simultaneously with, the associated publication. Other project data must be released no later than the end of the project. These requirements may be met with USGS data releases. Data release products contain one or multiple datasets, alongside metadata that describe what the files contain, provide relevant details about the collection or production of the data, and offer guidelines regarding appropriate use of the data.
  • A USGS information product that, after receiving Bureau approval (SM 502.8), makes basic data, datasets, and databases, individually cataloged with the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) or International Organization for Standardization (ISO) compliant metadata, available to the public free of charge. Additionally, a data release may include digital data in the form of multimedia, animations, videos, and digital photographs.
  • May use a map-based viewer through which the data are accessed.
  • Does not include interpretation of what the data or observations mean.
  • Is approved at the Science Center Director level.
  • Requires FGDC or ISO metadata to describe the data. This applies to both nongeospatial and geospatial data (refer to SM 502.7).
  • Requires tracking and documentation of review and approval in the IPDS.
  • Requires assignment of a DataCite DOI by the author, Science Center data manager, or USGS ScienceBase staff using the USGS DOI Tool if hosted on a USGS data repository.
  • The metadata record must be registered with the USGS Science Data Catalog. How this is accomplished depends on where the data are housed (contact CSAS@usgs.gov).
  • Is made available to the public through an acceptable repository for USGS digital assets.
  • Data must be preserved in accordance with the requirements described in SM 502.9.
  • Commonly, but not necessarily, paired with and linked to a USGS series or outside publication that presents an extended description and (or) interpretation of those data.
  • USGS data series (DS) reports are used to release extended descriptions of data or a project and other noninterpretive information, including user manuals. DS reports also may include interpretation of what the data or observations mean. In either case, DS reports should not include the data themselves. Noninterpretive DS reports should be approved at the Science Center Director level; interpretive DS reports must be approved by a BAO.
  • A data release is not used to release preliminary data.
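Several of the characteristics listed above are yes/no requirements that can be checked before a release is submitted. The following is a minimal Python sketch that turns them into a checklist; the record fields, function name, and the "CSDGM"/"ISO" labels are hypothetical, and it does not cover every item (for example, preservation under SM 502.9 is not checked).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataReleaseRecord:
    """Hypothetical summary of a data release, one field per listed requirement."""
    metadata_standard: str            # "CSDGM" or "ISO" (hypothetical labels)
    datacite_doi: Optional[str]
    center_director_approved: bool
    review_tracked_in_ipds: bool
    in_science_data_catalog: bool
    in_acceptable_repository: bool
    includes_interpretation: bool
    is_preliminary: bool


def release_problems(r: DataReleaseRecord) -> List[str]:
    """List unmet requirements; an empty list means none of these checks failed."""
    checks = [
        (r.metadata_standard in {"CSDGM", "ISO"}, "needs FGDC CSDGM or ISO metadata"),
        (bool(r.datacite_doi), "needs a DataCite DOI"),
        (r.center_director_approved, "needs Science Center Director approval"),
        (r.review_tracked_in_ipds, "review and approval must be tracked in IPDS"),
        (r.in_science_data_catalog, "metadata must be registered in the Science Data Catalog"),
        (r.in_acceptable_repository, "must be available from an acceptable repository"),
        (not r.includes_interpretation, "must not include interpretation"),
        (not r.is_preliminary, "cannot be used to release preliminary data"),
    ]
    return [message for ok, message in checks if not ok]
```

Listing every unmet item, rather than stopping at the first, mirrors how review comments are usually resolved together before approval.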

Digital Object Identifier (DOI)

  • A unique, persistent identifier that is permanently assigned to a specific electronic resource and remains tied to that resource, no matter where the resource is located.
  • Required for USGS data approved for release.
  • Enhances access, discovery, and reuse of USGS scientific data and research results by others.
  • DataCite DOIs are used for data releases and can be reserved and created by using the USGS DOI Tool. These DOIs are for USGS science products only and should resolve only to USGS servers.
  • Crossref DOIs are used for USGS series publications and are assigned by USGS Science Publishing Network staff when files are prepared for publication.
  • The DOI links to the product landing page (refer to "Landing Page" below).

Metadata

  • Information about data or a dataset that helps the user understand and use the data.
  • Must follow a standard format such as that from the FGDC’s CSDGM or ISO (ISO 19115-1 with ISO 19115-3 preferred, or ISO 19115-2 accepted).
  • Tools and best practices to help authors create metadata are available here.
  • Does not include analysis or interpretation of the data.
  • Must include a DOI for the data being described in the record.
  • Must include as much information as possible to allow an understanding of how the data were processed and structured and to describe other details that enable use of the data.
  • Descriptive information that goes beyond the information contained in a complete metadata record should be released in a separate USGS series publication such as a DS report, scientific investigations report, open-file report, techniques and methods report, or outside publication and linked to the data release by identification in the metadata record associated with the data release.
  • Metadata must be reviewed before data and metadata are released (SM 502.7). This review can be done by the same person who reviews the data. Comments and resolution documentation must be submitted through the USGS IPDS as part of the data release approval process.
  • Metadata for a given released dataset must be included in the USGS Science Data Catalog.

Web Page

  • As defined in SM 601.1, a web page may be static, dynamic, a form, or an application interface. A web page may contain text, images, sounds, and video that are viewed through a web browser, at a single unique URL.
  • A web page cannot be used to release new interpretive material, except in certain applications, such as an interactive web-based map or model simulations based on real-time data, that cannot be effectively released in a USGS publication series or other information product. For these interactive applications, a minimum of two peer reviews and approval by a BAO in the USGS Office of Science Quality and Integrity are required, and these reviews and the approval must be entered in the IPDS.
  • Web pages with content based on previously published information products must also be entered into the IPDS. They are approved by the Science Center Director or designee, and the Center Director determines whether peer review is needed.
  • Web pages may not be assigned Crossref or DataCite DOIs unless they are associated with an approved USGS-owned data release, online database, or web data service.
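The web page rules above route a page to one of two review paths depending on whether it carries new interpretive material. The following is a minimal Python sketch of that routing; the function name and step labels are hypothetical, and pages that fit neither described case simply return an empty list because this section does not address them.

```python
from typing import List


def web_page_requirements(new_interpretive: bool,
                          interactive_application: bool,
                          based_on_published_products: bool) -> List[str]:
    """Return review and approval steps for a web page (hypothetical helper)."""
    if new_interpretive:
        if not interactive_application:
            # New interpretive material may not be released on an ordinary web page.
            raise ValueError("new interpretive material cannot be released on a web page")
        # Exception: interactive maps or model simulations based on real-time data.
        return ["enter in IPDS", "at least two peer reviews", "BAO approval"]
    if based_on_published_products:
        return ["enter in IPDS",
                "Science Center Director (or designee) approval",
                "peer review if the Center Director requires it"]
    return []  # Other web pages are not addressed in this section.
```

Raising an error for the prohibited case, rather than returning a list, keeps that path from being mistaken for an approval route.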

Landing Page

  • A landing page is a unique, persistent web page for each data release; the DataCite DOI points to the landing page rather than to the dataset itself or to the metadata record. Access to the data and metadata must be available from the landing page. The landing page should not be an exact replica of the metadata record. Similarly, a USGS-generated Crossref DOI points to a publication’s citation (or landing) page in the USGS Publications Warehouse.
  • A landing page may include information such as a description of the dataset; versions available; contextual guidance, caveats, and documentation; any restrictions; links to tools and software associated with the dataset; related datasets; the dataset citation; metadata; and links to studies and interpretations based on the dataset.

 

Related References and Tools

 
