Office of Science Quality and Integrity

Guidance on Documenting Revisions to USGS Scientific Digital Data Releases

Updated August 2020

Purpose

This guidance describes a revision process for scientific digital datasets (hereafter referred to as data) and associated metadata that have been released as U.S. Geological Survey (USGS) information products. This guidance supplements USGS Fundamental Science Practices (FSP) requirements for metadata and data releases in Survey Manual (SM) chapters SM 502.7 and SM 502.8.

Released data revisions are characterized as Level 1, Level 2, Level 3, Level 4, or Level 5, similar to the characterization of levels for revisions to USGS publication series information products. The procedures for documenting released data revisions vary depending on the level of revision needed.

The guidance covers USGS data that have been previously released. Not covered in this guidance are USGS approved databases and data services as defined in SM 502.8 because they have other approved processes in place for making revisions, including data quality evaluation prior to data being uploaded to the database system or data service. Examples of these systems or services include National Water Information System (NWIS-Web), USA National Phenology Network (USA-NPN), and Biodiversity Information Serving Our Nation (BISON).

Criteria for Determining Revision Level

The revision level (1, 2, 3, 4, or 5) depends upon whether the changes could affect outcomes of data use (including automated access) and on the proportion of the data that need to be corrected. The data author, their supervisor, the data manager, and the Center Director should collaborate to determine the level of revision needed. Levels 2-5 require Center Director approval. The following criteria determine the revision level. Refer to the section for the appropriate level and to the summary table for details on how to complete the revision process.

  • Level 1 revisions do not change the data itself.
  • Level 2 revisions are changes that are not expected to have a significant impact on the use of the data, and only apply to a small number of data values.
  • Level 3 revisions are data-appending revisions, usually adding new data records without changing the data structure. If the addition of new data would cause issues for automated access to the previously available data, refer to Level 4. 
  • Level 4 revisions are changes to data structure that are expected to cause issues for existing automated processes that have been using the data.
  • Level 5 revisions are changes that are expected to have such a significant impact on the use of the data that the original data must be withdrawn.
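
The criteria above can be read as a simple decision procedure. The following Python sketch is illustrative only and is not part of the official guidance; the function names and the yes/no inputs are hypothetical, and the actual determination is made collaboratively by the data author, supervisor, data manager, and Center Director.

```python
def revision_level(changes_data: bool,
                   appends_records_only: bool = False,
                   breaks_automated_access: bool = False,
                   requires_withdrawal: bool = False) -> int:
    """Suggest a revision level (1-5) from the criteria listed above.

    The inputs are hypothetical yes/no answers supplied by the reviewer;
    the checks run from the most severe outcome to the least severe.
    """
    if requires_withdrawal:
        return 5  # impact is significant enough that the original is withdrawn
    if breaks_automated_access:
        return 4  # structural change that disrupts existing automated processes
    if not changes_data:
        return 1  # metadata, landing page, or other non-data change
    if appends_records_only:
        return 3  # new records added without changing the data structure
    return 2      # small correction with no significant impact on use


def needs_center_director_approval(level: int) -> bool:
    """Levels 2-5 require Center Director approval."""
    return level >= 2
```

For example, appending a year of records that also changes a column format would be answered with both appends_records_only and breaks_automated_access set to True, and the sketch returns 4, matching the instruction to refer to Level 4 in that case.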

Specific guidance on each revision level, and a table that summarizes the instructions for each level, appear below.

Level 1 Revision

A Level 1 revision does not change the data. These revisions can include changes to the metadata record that do not affect understanding of the data; changes to data files that do not involve modifying the data itself; or changes to the data landing page.

The following are examples of Level 1 revisions:

  • Changes in the metadata record to add new keywords, contact information, a clarification of a processing step, or a link to a new publication.
  • Changes in a data file to correct a misspelling in a data header or in a site location name.
  • Changes in a data landing page to correct a misspelled word in the title or abstract, or to revise one of the contacts listed.

These revisions can be made by replacing or updating the erroneous file or text and updating the metadata record and any additional supporting documentation.

Level 2 Revision

A Level 2 revision requires creating a new version of the released dataset. The changes for a Level 2 revision, however, should not significantly impact the use of the data.

The following are examples of Level 2 revisions:

  • Adding five data values that were missing in the original released dataset.
  • Correcting transposed latitude and longitude values for geospatial locations in the metadata record.
  • Modifying a polygon shapefile by slightly shifting a line, so that a boundary is consistent with the boundary in another polygon shapefile that was subsequently released.

Level 3 Revision

A Level 3 revision updates the released dataset to include additional data, which might be from a new time period, place, or field activity. Required actions depend upon the expected status and persistence of the previous version. Level 3 actions, listed in the table below, assume the data author has decided the user will not have access to a previous version of the data.

The following are examples of Level 3 revisions:

  • Releasing data in stages to meet project timelines and increase the amount of data provided in an information product.
  • Appending a year of data to a time series dataset.

If the data author determines that access to the previous version should persist, follow the instructions for Level 4 in the table.

Level 4 Revision

A Level 4 revision modifies the data in a way that may break previously enabled automated access to the released dataset, for example by services, workflows, and Application Programming Interfaces (APIs). Existing downstream services and processes that depend on the data may stop working. Level 4 actions, listed in the table below, assume access to the previous version is retained and a revised version of the data will be made available.

Level 4 revisions can include:

  • Modifying a data structure to change a format of a column of values.
  • Changing column header names.

Level 5 Revision

For a Level 5 revision, the released data are substantially changed, and the changes are expected to have a significant impact on the use of the data. These data, and any prior versions, must therefore be withdrawn, and the new version of the data is issued as a new data release with a new DOI.

The following are examples of Level 5 revisions:

  • Correcting a large number of data values when an error is discovered in an algorithm used for calculating a column of numbers.
  • Correcting an error in a processing step; for example, a new data release of a bathymetry grid is prepared after an error is detected in the processing step that applied tide corrections.
  • Correcting errors in an underlying data source.

Version Numbering

Version numbers consist of two parts—a major component and a minor component, separated by a period. The originally released data is considered version 1.0, although the version annotation is not used unless a revision is made. Either the major component or the minor component of the version number will be incremented when a new version is released.

For example, in “version 1.3,” the number to the left of the period, “1,” is the major component, and the number to the right of the period, “3,” is the minor component, which represents the number of separate Level 2 revisions. Level 2 revisions, regardless of how many there are, do not change the major component of the version number. For instance, if the released data were revised on seven separate occasions for Level 2 revisions, the new version will be numbered “version 1.7.”

In “version 2.0,” a Level 3 revision was completed, so the major component was increased by one (to “2”) and the minor component was reset to zero (“0”). The instructions table below gives the same “2.0” example for Level 4 revisions.
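
The numbering rules above can be summarized in a short Python sketch. This is an illustration under stated assumptions rather than an official tool: the function name next_version is hypothetical, the Level 4 behavior is inferred from the “2.0” example in the instructions table, and the handling of Levels 1 and 5 reflects that the table assigns no version number at those levels.

```python
def next_version(current: str, level: int) -> str:
    """Return the version number that follows `current` after a revision.

    `current` is a "major.minor" string; the original release is "1.0".
    """
    major, minor = (int(part) for part in current.split("."))
    if level == 1:
        # Level 1 revisions do not change the data, so no new version is assigned.
        return current
    if level == 2:
        # Each Level 2 revision increments only the minor component.
        return f"{major}.{minor + 1}"
    if level in (3, 4):
        # Level 3 (and, by assumption from the table, Level 4) revisions
        # increment the major component and reset the minor component to zero.
        return f"{major + 1}.0"
    if level == 5:
        # A Level 5 revision is a new data release with a new DOI, not a new version.
        raise ValueError("Level 5 creates a new data release, not a new version")
    raise ValueError(f"unknown revision level: {level}")


print(next_version("1.6", 2))  # seventh Level 2 revision of version 1.0
print(next_version("1.3", 3))  # Level 3 revision after three Level 2 revisions
```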

Preserving Withdrawn Versions of Data

Versions of data that have been withdrawn are no longer publicly offered but may be made available to users on request. Because previous versions may have been used to support scientific conclusions in an associated publication or in a policy decision, their preservation is essential to ensure provenance. Potential preservation locations may include a dark archive (an offline location for preservation) or a non-public web page in a repository. The landing page for previous versions should make clear that the released data have been superseded. If frequent, minor revisions of large data files are anticipated, the Science Center or Program should consider investing in an automated version management system that can automatically recreate each prior version by generating a standard revision history file, rather than manually archiving each version.

Examples

The following examples show various notations for documenting data revision changes on the data release landing page.

1. Examples of citation changes:

Original citation:

Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle: U.S. Geological Survey data release, https://doi.org/10.5066/123456 (non-working link for example use only).

Revised citations:

Level 2 Example: Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle (ver. 1.1, May 2013): U.S. Geological Survey data release, https://doi.org/10.5066/123456 (non-working link for example use only).

Level 3 Example: Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle (2012-2013): U.S. Geological Survey data release, https://doi.org/10.5066/123456 (non-working link for example use only).

Level 4 Example (with new DOI): Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle (ver. 2.0, May 2013): U.S. Geological Survey data release, https://doi.org/10.5066/666999 (non-working link for example use only).

Level 5 Example: Klunk, O.T., 2018, Bathymetry of the Bermuda Triangle Using 2017 Datum: U.S. Geological Survey data release, https://doi.org/10.5066/658568 (non-working link for example use only).

Note that version changes can be indicated by adjusting a date range or appending a version number and date. Additionally, the publication year should reflect the year that the original version was released. The exception is Level 5, which represents a new data release. Citations in the metadata and on the landing page should match.  
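
For centers that generate citations programmatically, the version annotation pattern shown in the Level 2 and Level 4 examples can be sketched as follows. The function name and parameters are hypothetical, and the DOI is the non-working example value used above; the sketch covers only the "ver. X.Y, Month Year" form, not the date-range form of the Level 3 example.

```python
from typing import Optional


def format_citation(authors: str, year: int, title: str, doi: str,
                    version: Optional[str] = None,
                    revised: Optional[str] = None) -> str:
    """Build a data release citation, adding a version annotation if given.

    `year` stays the year of the original release for Levels 1-4;
    `version` and `revised` (for example "1.1" and "May 2013") are omitted
    for an unrevised release.
    """
    annotation = ""
    if version is not None and revised is not None:
        annotation = f" (ver. {version}, {revised})"
    return (f"{authors}, {year}, {title}{annotation}: "
            f"U.S. Geological Survey data release, https://doi.org/{doi}.")


# Non-working example DOI, as in the citations above.
print(format_citation("Klunk, O.T.", 2012, "Bathymetry of the Bermuda Triangle",
                      "10.5066/123456", version="1.1", revised="May 2013"))
```

Using one function for both the metadata record and the landing page is one way to keep the two citations identical, as required above.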

2. Example of a list of version release dates and version numbers as it would appear on the landing page:

First release: 2012

Revised: July 2012 (ver. 1.1)

Revised: May 2013 (ver. 2.0)
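
A version history list in this form could be generated from a simple record of revisions. The sketch below is a minimal illustration; the function name and the (date, version) input structure are assumptions, not part of any USGS system.

```python
from typing import List, Tuple


def version_history(first_release: str,
                    revisions: List[Tuple[str, str]]) -> List[str]:
    """Return landing-page history lines in the notation shown above.

    `revisions` is a list of (revision date, version number) pairs in
    chronological order.
    """
    lines = [f"First release: {first_release}"]
    lines += [f"Revised: {date} (ver. {version})" for date, version in revisions]
    return lines


for line in version_history("2012", [("July 2012", "1.1"), ("May 2013", "2.0")]):
    print(line)
```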

3. Examples of revised data releases:

Pendleton, E.A., Ackerman, S.D., Baldwin, W.E., Danforth, W.W., Foster, D.S., Thieler, E.R., and Brothers, L.L., 2014, High-resolution geophysical data collected along the Delmarva Peninsula, 2014, USGS Field Activity 2014-002-FA (ver. 4.0, October 2016): U.S. Geological Survey data release.

Pinzari, C.A. and Bonaccorso, F.J., 2018, Hawaiian Islands Hawaiian Hoary Bat Genetic Sexing 2009-2018 (ver. 3.0, November 2019): U.S. Geological Survey data release, https://doi.org/10.5066/P9R7L1NS.

Instructions for Revision Levels Table

| Instructions | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| IPDS | | | | | |
| Create a new data release record in the IPDS and complete the review and approval steps for the new data release record in IPDS | | x | x | x | x |
| Digital Object Identifier (DOI) (refer to detailed instructions for DOIs when data are versioned) | | | | | |
| Retain and update (e.g., dates of versioned release) original DOI for the revised data release | x | x | x | | |
| Create a new DOI for new version (ensure link to previous version’s DOI) | | | | x* | x |
| Update the original DOI (ensure link to new DOI) in the DOI Tool | | | | x | x** |
| Version Number | | | | | |
| Assign a version number to the revised dataset or update the existing version number in metadata, DOI, and landing page | | x (e.g. 1.1) | x (e.g. 2.0) | x (e.g. 2.0) | |
| Landing Page | | | | | |
| Maintain original landing page | x | x | x | x | x |
| Upload revised files to original landing page | x | x | x | | |
| Create a new landing page for the revised version; add version history | | | | x | x |
| Update citation and title on original landing page; add version history including a link to new version of data | | x | x | x | |
| Original landing page becomes tombstone page without data (original metadata remains on tombstone page) | | | | | x |
| Add instructions for obtaining prior versions | | x | x | | x** |
| Metadata | | | | | |
| Revise and replace previous version metadata record; document and link to the newer version | x | x | x | x | |
| Create new metadata record for new landing page. Previous version metadata remains on original landing page. | | | | x | x |
| Revise the title by which the data are known to communicate the update, for example by appending the new version number and/or a revision date | | x | x | x | |
| Ensure that the metadata date in the metadata record is updated from the previous version and the newest version is provided to the USGS Science Data Catalog | x | x | x | x | |
| Additional Requirements | | | | | |
| Preserve the previous version of the dataset and metadata in accordance with records management and litigation holds requirements in case that version is needed to understand any information that was based on it. Refer to the "Preserving Withdrawn Versions of Data" section for additional guidance. | x | x | x | x | x |

 

* (Level 4) Create a new DOI (ensure link to old DOI): At the landing page associated with the previous version, indicate that a newer version of the data is available and provide the linked DOI to the new version. The notice should also indicate the period of time for which the previous version will continue to be available.

** (Level 5) Update the original DOI (ensure link to new DOI) in the DOI Tool (and) Add instructions for obtaining prior versions: In this instance the “original DOI” refers to the DOI for the withdrawn data. The landing page for the withdrawn (superseded) data details why the data were withdrawn, points to the landing page for the replacement dataset, and explains how to access the prior data. The new landing page should provide a citation for the superseded dataset.