Office of Science Quality and Integrity

Guidance on Documenting Revisions to USGS Scientific Digital Data Releases

Updated October 4, 2019

Purpose

This guidance describes a formal revision process for scientific digital data and associated metadata that have been released as USGS information products. This guidance supplements U.S. Geological Survey (USGS) Fundamental Science Practices (FSP) requirements in SM 502.7 and SM 502.8.

Data release revisions are characterized as Level 1, Level 2, Level 3, or Level 4, similar to the characterization of levels for revisions to USGS publication series products. The procedures for documenting data release revisions vary depending on the level of revision.

This guidance covers individual USGS datasets. Not covered in this guidance are USGS approved databases and data services as defined in SM 502.8 because they have other approved processes in place for making revisions, including data quality evaluation, prior to data being uploaded. Examples of these systems or services include National Water Information System (NWIS-Web), USA National Phenology Network (USA-NPN), and Biodiversity Information Serving Our Nation (BISON).

Reasons for Revisions

The reason for revising a data release will guide the process of review and approval. The revision level (1, 2, 3, or 4) depends upon whether the changes could affect outcomes of future data use and on the proportion of the data that needs to be corrected.

  • Level 1 revisions are changes to the metadata record that do not affect the understanding of the data, changes to data files that do not involve modifying the data itself, and changes to a landing page.
  • Level 2 revisions are changes that are not expected to have a significant impact on the use of the data, and apply to a small number of data values. Examples include adding negative signs to one or two values in the data; adding five values that were missing from the original data release; or making corrections to transposed latitude and longitude values in the metadata record.
  • Level 3 revisions are data-appending revisions, that is, adding new data records without changing the data structure. A primary example is the release of data in stages to meet project timelines and increase the amount of data provided in an information product.
  • Level 4 revisions are changes that are expected to have a significant impact on the use of the data, including changing a large number of data values, such as correcting an error in the formula for calibrating the data. Changes to the data structure are also Level 4 revisions. These revisions might add new tables to a data release that is structured as a database, or add new variables to a table. These revisions are appropriate for data releases that are standalone research products, rather than for data that are the foundation of associated or companion scientific publications or of a policy decision.
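
The distinctions in the list above can be sketched as a small decision function. This is only an illustrative sketch: the function name, its parameters, and the order of the checks are assumptions made for this example and are not part of the guidance. Judging whether an impact is "significant" or whether a change affects "a small number" of values remains a decision for the data author and the Science Center approving official.

```python
def classify_revision(changes_structure: bool,
                      significant_impact: bool,
                      appends_records: bool,
                      corrects_values: bool) -> int:
    """Return a revision level (1-4) from yes/no answers about a change.

    All parameter names are hypothetical; they paraphrase the list above.
    corrects_values means a small number of values in the data or
    metadata are corrected or added.
    """
    # Structural changes, or changes with a significant impact on use: Level 4.
    if changes_structure or significant_impact:
        return 4
    # New records appended without changing the data structure: Level 3.
    if appends_records:
        return 3
    # A few corrected or added values, no significant impact: Level 2.
    if corrects_values:
        return 2
    # Keyword, spelling, contact, or landing-page edits only: Level 1.
    return 1
```

For example, `classify_revision(changes_structure=False, significant_impact=False, appends_records=False, corrects_values=True)` returns 2, which matches the example of adding negative signs to one or two values.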

Level 1 Revision

A Level 1 revision does not change the dataset. The following are examples of Level 1 revisions:

  • Changes in the metadata record to add new keywords, contact information, or a link to a new publication.
  • Changes in a data file to correct a misspelling in a data header or in a site location name.
  • Changes in a data landing page to correct a misspelled word in the title or abstract, or to revise one of the contacts listed.

These revisions can be done by replacing or updating the erroneous file or text and updating the metadata record and any additional supporting documentation. Ensure that the updated metadata record replaces the previous version provided to the USGS Science Data Catalog.

Although it is a good practice to have an independent reviewer check to ensure that no errors were introduced during the revision process, review and approval for Level 1 revisions do not need to be documented in the internal USGS Information Product Data System (IPDS).

Level 2 Revision

A Level 2 revision creates a new version of the data release that will normally be used instead of the previous version. The changes for a Level 2 revision, however, should not significantly impact the use of the data. The following are examples of Level 2 revisions:

  • Adding negative signs that were omitted from one or two data values in the original data release.
  • Adding five data values that were missing in the original data release.
  • Correcting latitude and longitude values for geospatial locations that were transposed in the metadata record.
  • Modifying a polygon shapefile by slightly shifting a line, so that a boundary is consistent with the boundary in another polygon shapefile that was subsequently released.

Science Center approving officials for data releases should be consulted if help is needed to distinguish between Level 2 and Level 4 error corrections, in recognition of the differences in methods among scientific disciplines. Level 2 review and approval not only focus on the sections of the data release that are corrected but also identify any inadvertent changes made to other sections as a consequence of the corrections.

When a Level 2 revision is needed, the following actions are required:

     1. Create a new data release record in the IPDS and complete the review and approval steps for the new data release version. Review and approval should focus on the new or corrected sections but also identify any inadvertent changes made to other sections as a consequence of the modification. The new IPDS record is used to ensure that the requirements of SM 502.7 and SM 502.8 have been met.

     2. Do not create a new Digital Object Identifier (DOI). The existing DOI should be used for the revised data release. If the data must be removed from public access for any period of time during the revision process, the DOI should be directed to a “temporary tombstone page,” explaining that the release is being revised and will be available again soon. Most Trusted Digital Repositories should be able to provide this messaging on the existing landing page of the data release, without needing to change the Location URL of the DOI.

     3. Assign a version number to the revised data release product or update the existing version number (for example, change version 1.1 to version 1.2), and revise the title of the data release in the recommended citation and the metadata file to include the new version number. Refer to the "Examples" section.

     4. Revise the metadata record as follows:

          a. Add processing steps that describe the changes.
          b. Insert the version number and version release date into the title and recommended citation.
          c. Update the metadata revision date.
          d. Add instructions for obtaining prior versions.
          e. Provide the revised metadata record to the USGS Science Data Catalog.

     5. Modify the landing page as follows:

          a. Point users to the new version of the data and metadata.

          b. Include a list of version numbers and version release dates.

          c. Link to a revision history text file that provides a detailed description of the changes and a justification for making the changes.

     6. Once the new version is published, update the DOI in the DOI Tool as follows:

          a. Log in to the DOI Tool, open the DOI, and go to the Supplemental Information tab.

          b. In the section “Dates Relevant to the Data,” add a Date Type of ‘updated’ and pair it to a Date that denotes the mm/dd/yyyy of the published update.

          c. Click the ‘Add’ button.

          d. On the ‘Manage Record’ tab, update the Title to include the version number (refer to action 3 above).

          e. Click ‘Update Published Record in DataCite’ in the left menu.

     7. Preserve the previous version of the data in accordance with records management and litigation-hold requirements in case that version is needed to understand any information that was based on it. Refer to the "Preserving Prior Versions of Data" section for additional guidance.

     8. If the revision could affect scientific conclusions in an existing USGS publication, consult your assigned Bureau Approving Official (BAO) in the Office of Science Quality and Integrity (OSQI) for guidance.

Level 3 Revision

For a Level 3 revision, the data are updated to include additional data, which might be from a new time period, place, or field activity. Level 3 review and approval focus on the new data that are added, but also identify any inadvertent changes made to other sections as a consequence of the appended data.

When a Level 3 revision is needed, the following actions are required:

     1. Create a new data release record in the IPDS and complete the review and approval steps for the new data release version. Review and approval should focus on the new sections but also identify any inadvertent changes made to other sections. The new IPDS record is used to ensure requirements in SM 502.7 and SM 502.8 have been met.

     2. Do not create a new Digital Object Identifier (DOI). If the data must be removed from public access for any period of time during the revision process, the DOI should be directed to a “temporary tombstone page,” explaining that the release is being revised and will be available again soon. Most Trusted Digital Repositories should be able to provide this messaging on the existing landing page of the data release, without needing to change the Location URL of the DOI.

     3. Assign a version number to the revised data product and revise the title of the data release in the recommended citation and the metadata file to include the new version number. The change in the version number for Level 3 revisions is usually done by changing the number before the decimal point, for example, changing version 1.1 to version 2.0. Refer to the "Examples" section.

     4. Once the new version is published, update the DOI in the DOI Tool as follows:

          a. Log in to the DOI Tool, open the DOI, and go to the Supplemental Information tab.
          b. In the section “Dates Relevant to the Data,” add a Date Type of ‘updated’ and pair it to a Date that denotes the mm/dd/yyyy of the published update.
          c. Click the ‘Add’ button.
          d. On the ‘Manage Record’ tab, update the Title to include the version number (refer to action 3 above).
          e. Click ‘Update Published Record in DataCite’ in the left menu.

     5. Revise the metadata record as follows:

          a. Add processing steps that describe the changes.
          b. Insert the version number and version release date into the title and recommended citation.
          c. Update the time period information to address the dates of the newly appended data.
          d. Update the metadata revision date.
          e. Add instructions for obtaining prior versions.
          f. Provide the revised metadata record to the USGS Science Data Catalog.

     6. Modify the landing page as follows:

          a. Point users to the new version of the data and metadata.
          b. Include a list of version numbers and version release dates.
          c. Link to a revision history text file that provides a detailed description of the changes and a justification for making the changes.

     7. Preserve the previous version of the data in accordance with records management and litigation-hold requirements in case that version is needed to understand any information that was based on it. Refer to the "Preserving Prior Versions of Data" section for additional guidance.

     8. If the revision could affect scientific conclusions in an existing USGS publication, consult your assigned BAO for guidance.

Level 4 Revision

For a Level 4 revision, the data structure is modified, or data are significantly and substantially changed. Review and approval focus on the new structure and the new data, but also identify any inadvertent changes made to other sections as a consequence of the revisions. The following are examples of Level 4 revisions:

  • Modifying a data structure to allow inclusion of a new table or column of values.
  • Correcting a large number of data values when an error is discovered in an algorithm used for calculating a column of numbers.
  • Correcting an error in a processing step. For example, a new data release of a bathymetry grid is prepared after an error is detected in the processing step that applied tide corrections.
  • Updating or changing the underlying authoritative data source.

When a Level 4 revision is needed to address a modification to the data structure, the following actions are required:

     1. Create a new data release record in the IPDS and complete the review and approval steps for the new data release version. Review and approval should focus on the new or corrected sections but also identify any inadvertent changes made to other sections. The new IPDS record is used to ensure that the requirements of SM 502.7 and SM 502.8 have been met.

     2. Create a new DOI for this new version.

     3. Update the status of the DOI for the previous version in the USGS DOI Tool as follows:

          a. Change the URL associated with the previous DOI to a web page (a ‘tombstone URL’) that explains the reason for the new version and provides the new DOI.

          b. Update the Date information on the Supplemental Information tab of the DOI Tool as follows: change Date Type to ‘withdrawn’ and enter or update the date (YYYY-MM-DD) to designate the date that the data were removed from public access.

          c. On the Supplemental Information tab, create a related identifier within the DOI records for the previous DOI and the new DOI, using the Relationship Type pair “Obsoletes/isObsoletedBy.” In the record for the previous DOI, assign the relationship ‘isObsoletedBy’ and enter the URL for the new DOI. In the record for the new DOI, assign the relationship ‘Obsoletes’ and enter the URL for the previous DOI.

Note: there may be cases when it is appropriate to leave a previous version of the dataset accessible online, thus eliminating step 3. Consult your assigned BAO if you have questions.
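
The paired relationship described in step 3c can be sketched as a small data structure. This is a minimal illustration only: the function name and dictionary keys are hypothetical and do not represent the DOI Tool interface, which is operated through its web pages as described above. The DOI URLs in the usage note are placeholders.

```python
def related_identifier_pair(previous_doi_url: str, new_doi_url: str) -> dict:
    """Return the related-identifier entry to record in each DOI record.

    The result is keyed by the DOI record in which the entry belongs.
    Key names ("relationship", "related_url") are hypothetical.
    """
    if previous_doi_url == new_doi_url:
        raise ValueError("previous and new DOI must differ for a Level 4 revision")
    return {
        # The record for the previous DOI points forward to its replacement.
        previous_doi_url: {"relationship": "isObsoletedBy",
                           "related_url": new_doi_url},
        # The record for the new DOI points back to the version it replaces.
        new_doi_url: {"relationship": "Obsoletes",
                      "related_url": previous_doi_url},
    }
```

Calling `related_identifier_pair("https://doi.org/10.5066/XXXXXXXX", "https://doi.org/10.5066/YYYYYYYY")` yields one entry per DOI record, so the two records always reference each other.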

When a Level 4 revision is needed to correct significant and substantial errors in the dataset, the following actions are required:

     1. Remove access to the data and metadata from the public landing page (for example, in a repository) and provide notice on the page to users that the data have been withdrawn.

     2. Preserve the previous version of the data in accordance with records management and litigation-hold requirements in case that version is needed to understand any information that was based on it. Refer to the "Preserving Prior Versions of Data" section for additional guidance.

     3. Log in to the USGS DOI Tool and update the DOI for the original data release. Update the Date information on the Supplemental Information tab of the DOI Tool as follows: add a Date Type ‘withdrawn’ and the date (YYYY-MM-DD) to designate the date upon which the data were removed from public access.

     4. Create a new data release record in the IPDS and complete the review and approval steps for the new data release version. Review and approval should focus on the new or corrected sections but also identify any inadvertent changes made to other sections. The new IPDS record is used to ensure that the requirements of SM 502.7 and SM 502.8 have been met.

     5. Create a new DOI for this new version. On the Supplemental Information tab, establish a Related Identifier linkage between this new DOI and the DOI for the withdrawn previous version. Assign the relationship ‘Obsoletes’ and enter the URL for the previous DOI.

     6. Reopen the DOI of the withdrawn version of the data release. On the Supplemental Information tab, assign the relationship ‘isObsoletedBy’ and enter the URL for the new DOI.

     7. Determine the version number for the revised data product. The change in the version number for Level 4 revisions is usually done by changing the number before the decimal point, for example, changing version 1.1 to version 2.0. Refer to the "Examples" section.

     8. Revise the metadata record as follows:

          a. Add processing steps that describe the changes.
          b. Insert the version number and version release date into the title and recommended data citation.
          c. Update the metadata revision date.
          d. Add instructions for obtaining prior versions.
          e. Provide the revised metadata record to the USGS Science Data Catalog.

     9. Create a new landing page for the new version of the data release:

          a. Include a list of version numbers and version release dates.
          b. Link to a revision history text file that provides a detailed description of the changes and a justification for making the changes (for an example, see the ‘Version History 2.0’ link for data release https://doi.org/10.5066/F7542MHG).

     10. Complete the new DOI with the Location URL of the new landing page, and publish the DOI.

     11. Return to the landing page of the withdrawn data release. Provide a detailed description that gives information on the reason for the revision and uses the new DOI to point the user to the landing page of the new version of the data release.

     12. If the revision could affect scientific conclusions in an existing USGS publication, consult your assigned BAO for guidance.
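
The main differences among the four procedures can be summarized in a lookup table. The sketch below is one reading of the sections above, not an authoritative checklist: the class and field names are hypothetical, and the "none" entry for Level 1 reflects only that the Level 1 section does not mention a version number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRequirements:
    new_ipds_record: bool   # review and approval documented in a new IPDS record
    new_doi: bool           # a new DOI is created for the revised release
    version_component: str  # component usually incremented: "none", "minor", "major"


# Keys are revision levels; values paraphrase the procedures described above.
REQUIREMENTS = {
    1: LevelRequirements(new_ipds_record=False, new_doi=False, version_component="none"),
    2: LevelRequirements(new_ipds_record=True, new_doi=False, version_component="minor"),
    3: LevelRequirements(new_ipds_record=True, new_doi=False, version_component="major"),
    4: LevelRequirements(new_ipds_record=True, new_doi=True, version_component="major"),
}
```

For example, `REQUIREMENTS[3].new_doi` is `False` because a Level 3 revision keeps the existing DOI, whereas `REQUIREMENTS[4].new_doi` is `True`.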

More About Version Numbering

Version numbers consist of two parts: a major component and a minor component, separated by a period. The original release is considered version 1.0, although the version annotation is not used if no subsequent versions are released. Either the major component or the minor component of the version number will be incremented when a new version is released.

In the example “version 1.2,” the number to the left of the period, “1,” is the major component and the number to the right of the period, “2,” is the minor component and represents the number of separate Level 2 revisions. Level 2 revisions, regardless of how many there are, do not initiate a change in the major component of the version number. For example, if the data release was revised on seven separate occasions for Level 2 revisions, the new version will be numbered “version 1.7.”

In the example “version 2.0,” a Level 3 or Level 4 revision was completed, and thus the major component (“2”) was increased by one and the minor component was reset to zero (“0”).
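
The numbering rule can be expressed as a short function. This is a minimal sketch, assuming version strings always have the form "major.minor"; the function name is hypothetical. Because the guidance ties version changes only to Level 2, 3, and 4 revisions, the sketch rejects any other level rather than guessing.

```python
def next_version(current: str, level: int) -> str:
    """Return the version number that follows `current` for a given revision level."""
    major_text, minor_text = current.split(".")
    major, minor = int(major_text), int(minor_text)
    if level == 2:
        # Level 2: increment the minor component; the major component is unchanged.
        return f"{major}.{minor + 1}"
    if level in (3, 4):
        # Level 3 or 4: increment the major component and reset the minor to zero.
        return f"{major + 1}.0"
    raise ValueError("only Level 2, 3, and 4 revisions change the version number")
```

With this sketch, `next_version("1.1", 2)` returns "1.2" and `next_version("1.1", 3)` returns "2.0", matching the examples given in the procedures above.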

Preserving Prior Versions of Data

When data releases are replaced with a new version, the previous versions are not publicly offered but may be made available to users on request. Because previous versions may have been used to support scientific conclusions in a publication or a policy decision, it is essential to preserve them, for example in a dark archive (an offline location for preservation) or on an inaccessible page in a repository. The file name and accompanying documentation for previous versions should make clear that the data have been superseded. If frequent small revisions of large data files are anticipated, the science center or program should consider investing in an automated version management system that can automatically recreate each prior version by processing a standard revision history file, rather than manually archiving each version.
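
One way to make a file name show that the data have been superseded is to add the version number and a marker to the name. The sketch below is purely illustrative: the guidance does not prescribe a naming pattern, so the "_ver…_superseded" pattern and the function name are assumptions, and the file name in the usage note is hypothetical.

```python
from pathlib import PurePath


def superseded_name(file_name: str, version: str) -> str:
    """Return a file name that marks a prior version as superseded.

    Only the final path component is used; any directory part is dropped.
    The naming pattern is hypothetical, not a USGS convention.
    """
    path = PurePath(file_name)
    # Insert the version and marker between the base name and the extension.
    return f"{path.stem}_ver{version}_superseded{path.suffix}"
```

For example, `superseded_name("bathymetry.csv", "1.1")` returns "bathymetry_ver1.1_superseded.csv". Accompanying documentation should still state which version replaced the file.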

Examples

The following examples show various notations for documenting data revision changes on the data release landing page.

1. Examples of citation changes:

Original citation:
Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle: U.S. Geological Survey data release, https://doi.org/10.5066/XXXXXXXX.

Revised citations:
Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle (ver. 1.1, July 2012): U.S. Geological Survey data release, https://doi.org/10.5066/XXXXXXXX.
Klunk, O.T., 2012, Bathymetry of the Bermuda Triangle (ver. 2.0, May 2013): U.S. Geological Survey data release, https://doi.org/10.5066/XXXXXXXX.
Note that the data product title and DOI do not change but that version information is added. Additionally, the publication year should reflect the year that the original version was released. Include the new version number and the version release date (month and year) in parentheses in the citation.
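
The citation pattern shown above can be reproduced with a short formatting function. This is a minimal sketch that follows only the punctuation of the examples in this section; the function and parameter names are hypothetical.

```python
def versioned_citation(authors, year, title, doi_url, version=None, version_date=None):
    """Format a data release citation, adding version information when given.

    `year` is the year of the original release; `version_date` is the
    month and year of the revision, for example "July 2012".
    """
    if (version is None) != (version_date is None):
        raise ValueError("provide both version and version_date, or neither")
    # The original release carries no version annotation.
    label = "" if version is None else f" (ver. {version}, {version_date})"
    return f"{authors}, {year}, {title}{label}: U.S. Geological Survey data release, {doi_url}."
```

Calling `versioned_citation("Klunk, O.T.", 2012, "Bathymetry of the Bermuda Triangle", "https://doi.org/10.5066/XXXXXXXX", "1.1", "July 2012")` reproduces the first revised citation above; omitting the last two arguments reproduces the original citation.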

2. Examples of version release dates and version numbers:
First release: 2012
Revised: July 2012 (ver. 1.1)
Revised: May 2013 (ver. 2.0)
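
A landing-page list in the format above can be generated from a simple record of revisions. This is a sketch only; the function name and the input structure (a list of date and version pairs) are assumptions made for illustration.

```python
def version_history_lines(first_release, revisions):
    """Return landing-page lines listing release dates and version numbers.

    `revisions` is a list of (release_date, version) pairs in release order.
    """
    lines = [f"First release: {first_release}"]
    for release_date, version in revisions:
        lines.append(f"Revised: {release_date} (ver. {version})")
    return lines
```

For example, `version_history_lines("2012", [("July 2012", "1.1"), ("May 2013", "2.0")])` returns the three lines shown in example 2.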

3. Example of revision history:
A revision history text file that concisely describes what changed in each revision is needed. For an example, refer to Pendleton, E.A., Ackerman, S.D., Baldwin, W.E., Danforth, W.W., Foster, D.S., Thieler, E.R., and Brothers, L.L., 2014, High-resolution geophysical data collected along the Delmarva Peninsula, 2014, USGS Field Activity 2014-002-FA (ver. 4.0, October 2016): U.S. Geological Survey data release.