Data Release Instructions

This workflow is intended for USGS authors who are publishing data through ScienceBase.

 

Instructions

1.  Prepare data and metadata
2.  Create a new record in IPDS
3.  Create a new landing page
4.  Finalize metadata
5.  Decide how to organize and display data and metadata
6.  Upload files and edit the landing page
7.  Format citation
8.  Final steps

Frequently Asked Questions

Links to additional information

 

1. Prepare data and metadata

A data release should contain only 1) data and 2) metadata.

1) Data:

  • A best practice is to release data in an open, machine-readable format. For example, tabular data in .csv or .txt format is preferable to Excel.
  • Data obtained from published sources do not need to be included - simply document the source and methods in your metadata.
  • Proprietary or sensitive data should not be included.

2) Metadata:

  • Metadata should be in XML format.
  • Metadata should also conform to an FGDC-endorsed metadata standard: FGDC CSDGM* or ISO**.
          *Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata
          **International Organization for Standardization
  • See step 4 below for information on how to finalize metadata for ScienceBase.

Additional Notes:

USGS Fundamental Science Practices (FSP) guidance states that "a data release is an information product that is non-interpretive and does not include extended descriptions beyond what is required in the full metadata record." Extended text descriptions, figures, maps, and files in PDF format are more appropriate for USGS series publications handled by the USGS Science Publishing Network (SPN).

 

2. Create a new record in IPDS

Data and metadata for a data release should be reviewed and approved according to the USGS Fundamental Science Practices (FSP) process. The USGS uses its Information Product Data System (IPDS) to track the data and metadata review process. 

  • When you create a new record in IPDS, select "Data Release" in the Product Type dropdown menu. 
  • New records in IPDS are assigned an IP number.
  • Each new data release should correspond to one IP number. 

Resources:

  • For more information on data and metadata review, see the review checklists on the USGS data management website.

Additional Notes:

Data releases often have associated manuscripts that also go through review. In these cases, the review processes are separate. There should be an IPDS record for the data release and another for the manuscript.

 

3. Create a new landing page

You can create a new data release landing page via the ScienceBase Data Release Tool.

  • Sign in to the ScienceBase Data Release Tool.
  • Follow the form instructions to provide basic information about the data you are releasing.
  • When the form is submitted successfully, you will receive an automated email with a link to your new landing page and a reserved Digital Object Identifier (DOI).

 

4. Finalize metadata

Note: if the following steps are not completed by the author, metadata will be finalized automatically by the ScienceBase team at the end of the process. 

These instructions are for metadata records in the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) format. The USGS metadata creation tool, the Metadata Wizard, creates metadata in this format.

 

1) Check the title in your metadata record
  • The title element from the metadata record will be prominently displayed in search results of the USGS Science Data Catalog, Department of the Interior's data catalog, and data.gov.
  • The title listed in the metadata should be the full title of the dataset it is describing, not the filename. 
  • Choose a descriptive title for your dataset that incorporates who, what, where, why, and scale. For example, the title could be "(measurement) of (phenomenon) in (geographic feature) at (geographic location) in (time period)".
  • All metadata records should have a unique title, even if there are multiple metadata records in the data release.
  • Remove non-ASCII characters and other special characters, such as <, >, and &, from the metadata title. See item 4 below.
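
The uniqueness requirement for titles can be checked before upload. The following is a minimal sketch, assuming each record is an FGDC CSDGM XML document whose title sits at idinfo/citation/citeinfo/title under the root element; the function names are hypothetical and are not part of any USGS tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Location of the title element in an FGDC CSDGM record,
# relative to the <metadata> root element.
TITLE_PATH = "idinfo/citation/citeinfo/title"


def read_title(xml_text: str) -> str:
    """Return the dataset title from one record, or '' if it is missing."""
    root = ET.fromstring(xml_text)
    return (root.findtext(TITLE_PATH) or "").strip()


def duplicate_titles(records: list[str]) -> list[str]:
    """Return every title that appears in more than one record."""
    counts = Counter(read_title(record) for record in records)
    return sorted(title for title, n in counts.items() if n > 1)
```

An empty list from duplicate_titles means every record in the data release has its own title. A record with a missing title is reported as an empty string if more than one record lacks a title.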

 

2) Add your Digital Object Identifier (DOI)
  • USGS policy requires the use of a Digital Object Identifier (DOI) for data releases. Note: if you use the ScienceBase Data Release Tool to start a new data release, you will have the option to reserve a DOI for your landing page. 
  • Add the full DOI URL (e.g., https://doi.org/10.5066/P9R7L1NS) to the online linkage element (<onlink>) in the citation information section of your metadata (instructions).
  • Optional: add the DOI URL to the network resource element (<networkr>) in the distribution section (instructions). Note: some advanced metadata authors use this field for data download links. 
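
For authors who edit the XML directly rather than through the Metadata Wizard, adding the DOI to the citation section might look like the sketch below. It assumes a CSDGM record with an idinfo/citation/citeinfo section, and the function name is hypothetical. Note that serializing with ElementTree drops any XML declaration or DOCTYPE line, so treat this as an illustration and re-validate the result.

```python
import xml.etree.ElementTree as ET


def set_doi_onlink(xml_text: str, doi_url: str) -> str:
    """Add doi_url as an <onlink> in the citation section if it is not already there."""
    root = ET.fromstring(xml_text)
    citeinfo = root.find("idinfo/citation/citeinfo")
    if citeinfo is None:
        raise ValueError("record has no idinfo/citation/citeinfo section")

    existing = [el.text for el in citeinfo.findall("onlink")]
    if doi_url not in existing:
        onlink = ET.Element("onlink")
        onlink.text = doi_url
        # CSDGM orders <onlink> before <lworkcit>, so insert ahead of
        # <lworkcit> when it exists and append otherwise.
        children = list(citeinfo)
        lworkcit = citeinfo.find("lworkcit")
        index = children.index(lworkcit) if lworkcit is not None else len(children)
        citeinfo.insert(index, onlink)

    return ET.tostring(root, encoding="unicode")
```

Calling the function a second time with the same DOI leaves the record unchanged, so it is safe to run more than once.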

 

3) Add the following distribution information to your metadata
  • Distribution liability statement: please select the USGS disclaimer statement(s) that are relevant to your data release (instructions). Disclaimer statements are available on the FSP website.
     
  • Distribution contact information: please add ScienceBase as the distribution contact (instructions).
    • In the Metadata Wizard, the distribution contact information is set to ScienceBase by default.

   Contact Organization and/or Contact Person: "U.S. Geological Survey - ScienceBase"
   Contact Address: "Denver Federal Center, Building 810, Mail Stop 302" "Denver" "CO" "80225"
   Contact Phone: "1-888-275-8747"
   Contact Email: "sciencebase@usgs.gov"
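
The contact values above map onto the CSDGM contact elements roughly as sketched here. The element short names (distrib, cntinfo, cntorgp, cntorg, cntaddr, cntvoice, cntemail) come from the CSDGM standard; the address type value is an assumption, since the instructions do not state one, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def sciencebase_distributor() -> ET.Element:
    """Build a <distrib> element holding the ScienceBase contact information."""
    distrib = ET.Element("distrib")
    cntinfo = ET.SubElement(distrib, "cntinfo")

    cntorgp = ET.SubElement(cntinfo, "cntorgp")
    ET.SubElement(cntorgp, "cntorg").text = "U.S. Geological Survey - ScienceBase"

    cntaddr = ET.SubElement(cntinfo, "cntaddr")
    # Assumption: the address type is not given in the instructions.
    ET.SubElement(cntaddr, "addrtype").text = "mailing address"
    ET.SubElement(cntaddr, "address").text = (
        "Denver Federal Center, Building 810, Mail Stop 302"
    )
    ET.SubElement(cntaddr, "city").text = "Denver"
    ET.SubElement(cntaddr, "state").text = "CO"
    ET.SubElement(cntaddr, "postal").text = "80225"

    ET.SubElement(cntinfo, "cntvoice").text = "1-888-275-8747"
    ET.SubElement(cntinfo, "cntemail").text = "sciencebase@usgs.gov"
    return distrib
```

The returned element would sit inside the distribution information (distinfo) section of the record; the Metadata Wizard fills in the same values by default.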

 

4) Check for non-ASCII characters in your metadata
  • Non-ASCII characters (curly quotes, em dash, en dash) and other special characters (greater than and less than signs, ampersands) can be problematic in downstream applications, such as USGS webpages.
  • Please avoid using non-ASCII characters and other special characters in your metadata title and abstract. The ScienceBase data release team will have to modify these characters in order to save your title and abstract to your digital object identifier (DOI).
  • Check your content for non-ASCII characters using an ASCII validation tool such as Online ASCII Tools.
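
As an alternative to an online tool, a title or abstract can be checked locally. The sketch below flags any character outside the ASCII range, plus the three special characters named above; the function name is hypothetical.

```python
# Special characters called out in the instructions above.
SPECIAL_CHARACTERS = set("<>&")


def flag_characters(text: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for non-ASCII or special characters."""
    return [
        (position, character)
        for position, character in enumerate(text)
        if ord(character) > 127 or character in SPECIAL_CHARACTERS
    ]
```

An empty result means the text contains only plain ASCII characters and none of <, >, or &. Positions are zero-based offsets into the string.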

 

5) Add keywords from a controlled vocabulary to your metadata
  • Standardized keywords improve discoverability of your data and help data catalogs like the Science Data Catalog tag and organize your metadata.
  • The USGS Thesaurus is the recommended source for keywords. The Metadata Wizard can auto-populate your metadata with keywords from USGS controlled vocabularies.
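
In a CSDGM record, controlled-vocabulary keywords are grouped in a theme element that names the thesaurus (themekt) followed by one themekey per keyword. The sketch below builds such a group; the function name is hypothetical, and any keywords passed in should be checked against the thesaurus itself.

```python
import xml.etree.ElementTree as ET


def theme_keywords(thesaurus: str, keywords: list[str]) -> ET.Element:
    """Build a <theme> element naming the thesaurus and listing its keywords."""
    theme = ET.Element("theme")
    ET.SubElement(theme, "themekt").text = thesaurus
    for keyword in keywords:
        ET.SubElement(theme, "themekey").text = keyword
    return theme
```

For example, theme_keywords("USGS Thesaurus", ["hydrology"]) produces a group that would be placed in the keywords part of the identification section (idinfo/keywords); the keyword here is illustrative only.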

 

5. Decide how to organize and display data and metadata

ScienceBase data releases can be organized in several ways. Data authors can choose the approach that works best for their product. The optimal organization often depends on the number of data and metadata files. 

Note: please upload only one metadata record per page in ScienceBase (it is possible to upload additional records if they are in zipped files). This is because the USGS Science Data Catalog, which harvests metadata records from data releases in ScienceBase, can only pull one metadata record from a page.

► If you have one metadata record to describe your data, upload your files (both data and metadata) directly to the landing page (example).

► If you have multiple metadata records and data sets, you have two options:

  1. Upload data and metadata directly to the landing page in zipped bundles (example). There should be one metadata record uploaded separately - a summary metadata record that describes the entire data release. The summary metadata record will be the only one harvested by the Science Data Catalog.
  2. Create subpages that are nested under the landing page (example). Use this option if you would like your data sets to be independently discoverable. Nested pages in ScienceBase are called "child items". To create a new child item, click the "Add" dropdown menu, then select “Add Child Item”. On each child item, upload one metadata record and its associated data file(s).
    • Note: a best practice is to also upload a summary metadata record to the landing page to describe the entire product.
    • Note: all unzipped metadata records will be harvested by the Science Data Catalog, including those attached to child items (please make sure all child item metadata records have unique and descriptive titles).
  • Adding an image to a data release: if you would like to display an image on a ScienceBase page, upload the image in .JPG or .PNG format. The image will be automatically displayed on the page.
     
  • ScienceBase can generate web services for certain geospatial file types: shapefiles, GeoTIFFs and ESRI Service Definition (.SD) files. The web services can be used to serve the data to outside applications and to display the data in the preview map on a ScienceBase page. For more information, see the ScienceBase Geospatial Services page.

Resources: This tutorial video can help you determine the best way to structure and document your data releases.

 

6. Upload files and edit the landing page  

Note: the current file size limit for uploads in ScienceBase is about 30 GB per file. If any of your files exceed 30 GB, please contact sciencebase@usgs.gov. Also note that no more than 100 files can be attached to a single item.
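
Both limits can be checked locally before starting an upload. This is a minimal sketch with hypothetical function names; the limit is described only as "about 30 GB", so the exact byte threshold used here is an assumption.

```python
from pathlib import Path

# Assumption: "about 30 GB" is treated as 30 * 1024**3 bytes.
MAX_FILE_BYTES = 30 * 1024**3
MAX_FILES_PER_ITEM = 100


def folder_sizes(folder: str) -> dict[str, int]:
    """Map each file name in the folder (not subfolders) to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}


def check_upload(sizes: dict[str, int]) -> list[str]:
    """Return a message for each limit that the planned upload would exceed."""
    problems = []
    if len(sizes) > MAX_FILES_PER_ITEM:
        problems.append(
            f"{len(sizes)} files planned; an item can hold {MAX_FILES_PER_ITEM}"
        )
    for name, size in sorted(sizes.items()):
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than about 30 GB")
    return problems
```

Running check_upload(folder_sizes("my_release")) on a local folder (the folder name is a placeholder) returns an empty list when the files fit within both limits.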

► The most efficient way to populate an empty ScienceBase page is to start by uploading an XML metadata record in an FGDC-endorsed format. Click the "Add" dropdown menu on the upper right side of the page, then select "Attach Files": 

Screenshot showing the "Add" dropdown menu with "Attach Files" selected

When you upload a metadata record, ScienceBase will recognize the format and bring up a popup window to ask if you would like to pull content from the metadata:

screenshot of dialog window

Select "Yes" to automatically populate the key fields in the edit form. You may still need to manually edit some of the information. Click "Save" to save your changes.

► To edit your page, click the "Manage Item" dropdown menu on the upper right side of the page, then select "Edit Item":

screenshot showing option to edit ScienceBase item through the edit form

► To add a child item (subpage nested under the landing page), click the "Add" dropdown menu, then select "Add Child Item":

screenshot showing option to add child item to landing page in ScienceBase

► If you need to give additional people access to your ScienceBase item while it is private, click the "Manage Item" dropdown menu, then select "Manage Item Permissions". You can then search for ScienceBase user accounts and grant read/write permissions. 

Screenshot showing option to manage permissions for a page in ScienceBase

To share a private data release with people outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page:

Screenshot showing option to manage anonymous access link for a ScienceBase page

You can generate a temporary URL to share with your reviewers, who can view the data release without having to sign up for a ScienceBase account. (Note: the data release will be locked for editing while the link is active).

 

7. Format citation

The data release citation should include the following information:

  • Each author (last name, first and middle initials)
  • Year
  • Title
  • Publication type (U.S. Geological Survey data release)
  • Digital Object Identifier URL

► ScienceBase can automatically generate citations from the content of uploaded metadata records, but the citation format usually needs to be modified. Please verify that automatically generated citations have the correct format and author order. The citation field can be edited in the first tab of the edit form.

Note: if a data release has child items, the ScienceBase team will propagate the landing page citation to all child items, so only the landing page citation should be edited.

Citation Examples:

  • Cartwright, J.M., 2015, Hydrologic and soil data collected in limestone cedar glades at Stones River National Battlefield, Tennessee: U.S. Geological Survey data release, https://doi.org/10.5066/F7NV9G9C.
  • Coates, P.S., Casazza, M.L., Ricca, M.A., Brussee, B.E., Blomberg, E.J., Gustufson, K.B., Overton, C.T., Davis, D.M., Niell, L.E., Espinosa, S.C., Gardner, S.C., and Delehanty, D.J., 2015, Integrating spatially explicit indices of abundance and habitat quality: an applied example for greater sage-grouse management: U.S. Geological Survey data release, https://doi.org/10.5066/F75D8PW8.
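
The pattern in the examples above can be expressed as a small formatting function. This is a sketch with a hypothetical function name, not the logic ScienceBase uses to generate citations. It expects each author already written as "Last, F.M." and does not handle version statements such as "(ver. 2.0, February 2018)".

```python
def format_citation(authors: list[str], year: int, title: str, doi_url: str) -> str:
    """Assemble a data release citation from its required parts."""
    if not authors:
        raise ValueError("at least one author is required")

    if len(authors) == 1:
        names = authors[0]
    elif len(authors) == 2:
        names = f"{authors[0]} and {authors[1]}"
    else:
        # Three or more authors: comma-separated, with "and" before the last.
        names = ", ".join(authors[:-1]) + ", and " + authors[-1]

    return f"{names}, {year}, {title}: U.S. Geological Survey data release, {doi_url}."
```

Passing the author, year, title, and DOI from the first example reproduces that citation exactly, which makes the function a convenient check on an automatically generated citation.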

Additional Notes:
If the citation format is not correct, the ScienceBase team will reformat the citation for the authors before making the data release public.

 

8. Final steps

► When you are ready to make the data release public, please email sciencebase_datarelease@usgs.gov.

  • Note: if the data release is associated with a primary publication and you haven't yet provided the publication's DOI or IP number, please include this information in your email.
  • A member of the ScienceBase team will check the data release against the checklist and share any recommendations they have. Please allow up to 2 business days for completion of this step. 

View Checklist

  • When the data release has been finalized, the ScienceBase team will:
         1. Make the data release public. (Public data releases are no longer open for modifications).
         2. Register the DOI so it's an active link.
  • Once you have been notified that your data are public, you can use the recommended citation on the landing page to cite your data.
  • If you cite the data in a publication, please send the publication's citation to sciencebase_datarelease@usgs.gov so that it can be added to the landing page.

 

_____________________________________________________________________

Frequently Asked Questions

 

Where can I find information about how to create and/or review a metadata record?

  • The USGS data management website: https://www.usgs.gov/products/data-and-tools/data-management/describe-metadatadocumentation.
  • The USGS tool for metadata creation is the Metadata Wizard. Users fill out a form by answering questions about their data. They can then generate and output XML metadata records in the correct format. The Metadata Wizard has the ability to parse information from certain geospatial and tabular file types, as well as automate the process of describing column (and value) definitions.
  • The USGS Metadata Parser tool (https://mrdata.usgs.gov/validation/) allows users to validate an XML metadata file against the FGDC CSDGM standard and view it in an easy-to-read format.
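
Before submitting a record to a validator, it can help to confirm locally that the file is well-formed XML. The sketch below does only that; it does not check the record against the FGDC CSDGM standard, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None
```

A return value of None means the record parsed cleanly; any other value is the parser's description of the first problem it found, including the line and column.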

 

How can I grant read/write permissions to USGS and non-USGS users while a data release is still in progress?

  • To give permissions to USGS employees and other users with ScienceBase accounts, select the "Manage Item" dropdown menu, then "Manage Item Permissions": 

    Screenshot showing option to manage permissions for a page in ScienceBase

    Select "Custom Permissions". Enter a user’s name or email address into the "User" text box. Wait for the autocomplete to find the user's ScienceBase account, then select it and click "Add".

    ScienceBase accounts are automatically created for users the first time they log in with their Active Directory credentials. If someone hasn't logged in to ScienceBase before, they won’t yet have an account. Users without Active Directory credentials can request a ScienceBase account if they are collaborating with USGS partners.

    If you would like to create a user group in ScienceBase for managing permissions, please contact sciencebase@usgs.gov.

  • To share a private data release with someone outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page: 

    Screenshot showing option to manage anonymous access link for a ScienceBase page

    Select "Create New Anonymous Entry Link". This will create a temporary URL you can share with reviewers, allowing them to view the data release without having to sign up for a ScienceBase account. The data release will be locked for editing while the link is active. To unlock, select "Manage Anonymous Access Links" again and remove the link.

 

► What if I need to revise my data after they have been released?

The USGS Fundamental Science Practices (FSP) website describes procedures for documenting revisions to data releases. Please follow this guidance if you need to correct or add to published data. Contact the ScienceBase team at sciencebase_datarelease@usgs.gov when you are ready to update your data release. 

Here are examples of revised data releases in ScienceBase:

  • Pinzari, C.A. and Bonaccorso, F.J., 2018, Hawaiian Islands Hawaiian Hoary Bat Genetic Sexing 2009-2018 (ver. 3.0, November 2019): U.S. Geological Survey data release, https://doi.org/10.5066/P9R7L1NS.
  • Engott, J.A., 2018, Mean annual water-budget components for the Island of Oahu, Hawaii, for current conditions, 2001-10 rainfall and 2001-10 land cover (ver. 2.0, February 2018): U.S. Geological Survey data release, https://doi.org/10.5066/F72F7KH4.

 

Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?

Yes, by default ScienceBase will automatically perform this function for authors. Metadata records on the landing page and all child items will be sent to the USGS Science Data Catalog (SDC) after the data release is made public. 

Some science centers and programs have alternate methods of submitting metadata records to the SDC and may not wish for their records to be sent from ScienceBase. This option is also supported; ScienceBase keeps a list of these centers, and XML records associated with their data release products will not be sent from ScienceBase. If you would like to add your center to this list, please contact sciencebase_datarelease@usgs.gov.

 

Why is CSV format recommended instead of Excel?

Comma-separated values format (.csv) is preferable to Microsoft Excel format (.xlsx) because .csv is often more machine-readable and can be more easily incorporated into other workflows. While both .csv and .xlsx are considered open formats (that is, you don't need proprietary software to view them), .xlsx supports features that can make it less machine-readable. For example, if there are multiple worksheets in an Excel workbook or if some of the information is conveyed through formatting, it would be more difficult to use or work with the data in other applications (e.g., Python, R).
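
To illustrate what machine-readable means in practice, the sketch below uses only the Python standard library to confirm that every row of a CSV file has the same number of fields as the header row. The function name is hypothetical.

```python
import csv
import io


def ragged_rows(csv_text: str) -> list[int]:
    """Return the line numbers of rows whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    # reader.line_num is the number of source lines read so far, that is,
    # the line on which the current row ends.
    return [reader.line_num for row in reader if len(row) != len(header)]
```

An empty result means the table is rectangular and can be loaded by other tools without special handling. Blank lines are reported as well, since they parse as rows with no fields.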

 

What is the file size limit for uploading and downloading files?

Files larger than 1 GB should be uploaded using the ScienceBase Cloud Uploader tool available in the "Item Actions" section at the bottom of a ScienceBase page. While performance may still be dependent on users' local internet connections, files up to ~30 GB in size can be uploaded.

 

► Can I release legacy data in ScienceBase?

Yes, but ScienceBase has a formal process for publicly releasing data, which enables the ScienceBase team to catalog, track, and update these resources in a uniform way. If you would like to release your legacy data in ScienceBase, you will need to go through FSP review and work with the ScienceBase team.

 

► A). My data release is associated with a publication. How will the two reference each other?

A). The citation will be added to the landing page in the "Related External Resources" section (see example). In associated publications, data release citations should be included in the reference section. USGS publications have links to their associated data releases at the top of their landing pages in the USGS Publications Warehouse.

B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?

B). Yes, a publication’s citation can be added to a data release at any time, even after it has been made public and the edit permissions have been restricted. If you would like to add a citation to a public data release, please send the citation to sciencebase_datarelease@usgs.gov (or to someone on the ScienceBase team) and we’ll add it to the landing page. If you’ve updated the metadata to include the publication’s citation, please also send the most recent version of the metadata and we’ll replace the metadata in the data release.

 

► Which repository should I use to release code? 

The repository for software is USGS GitLab (https://code.usgs.gov), a Git-based platform for software development (additional information). Users can mint a DOI using the USGS Asset Identifier Service to point to the software in GitLab.

If a data release has associated code (e.g., a Python script used to process the data), it can be included as part of the data release in ScienceBase. All code uploaded to ScienceBase must be well-documented.

 

► What repository services does ScienceBase provide for USGS data release products?

ScienceBase supports the following services:

  • Providing reliable access to public data release items
  • Curating landing page content
  • Creating multiple backups of data and metadata
  • Calculating checksums to ensure file integrity
  • Directing inquiries about the data to the point of contact listed for the data release

Science centers / data authors are responsible for the following:

  • Answering questions about the data
  • Correcting any errors discovered in the data
  • Records management and data archival responsibilities for internal Bureau purposes (e.g., Scientific Case Files) according to the USGS Records Program. These responsibilities extend beyond public data access requirements for open data. Contact your local Records Management Contact or the USGS Records Management Program at recman@usgs.gov for additional information.
  • Performing file format migrations or data transcriptions, if necessary
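
Checksums, listed among the ScienceBase services above, can also be computed locally, for example to confirm that a downloaded copy matches the file that was uploaded. The sketch below uses SHA-256 as an assumption; the algorithm ScienceBase uses is not stated here, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB pieces so large files need not fit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

Two copies of a file are byte-for-byte identical when sha256_of_file returns the same value for both.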

 

How can I see other data releases from my Science Center or from a particular period of time?

Check out the ScienceBase Data Release Summary Dashboard to see a breakdown of data releases by Mission Area, Region, and Science Center and to filter by time ranges. This dashboard uses ScienceBase's advanced querying capabilities to generate this information. Learn how to create these queries yourself here.

 

► How do I update my name on a previously published data release?

Once your name has been updated in Active Directory, email sciencebase_datarelease@usgs.gov with a list of the data release DOIs that need to be updated.