ScienceBase Updates - Spring 2023
Spring 2023 topics include information on the dynamic data release process in ScienceBase, making your data release more reusable, a tip on the ScienceBase training & resources SharePoint, and a featured data release on glaciers in Glacier National Park.
Table of Contents
- Dynamic Data Release - What it is and what it isn't
- How to Make Your Data Release More FAIR – Reusable
- Featured Data Release
- Did You Know? SBDR Training Resources
Dynamic Data Release - What it is and what it isn't
The standard (abbreviated) process for releasing data in ScienceBase is as follows:
- The author finalizes their data and metadata
- The Center Director (or designee) approves the reviewed data and metadata in IPDS
- The author uploads their data and metadata to ScienceBase
- The ScienceBase Data Release (SBDR) Team finalizes and publishes the landing page and digital object identifier (DOI)
This process, however, isn’t always a good fit for projects that are continuously collecting data. There are three options for releasing data for these types of projects:
- Versioned Data Releases
- Provisional Data Releases
- Dynamic Data Releases
Versioned Data Releases
Versioned data releases require that the author go through the same steps as the standard data release process (review and approval in IPDS, and finalization with the SBDR Team) for each new version. This approach works well for projects in which data collection is continuous, but processing and quality control steps are completed in discrete intervals (e.g., on a quarterly basis).
Provisional Data Releases
Provisional data releases are made publicly available before official review and approval in IPDS. These data releases include disclaimer statements to ensure that users are aware that the data are not finalized. Authors still need approval from their Center Director to release provisional data, and they will publish a final, static data release at the conclusion of the project. Provisional data releases work well when data are continuously collected and processed but not continuously quality controlled, and when there is an immediate need for access to the data. For example, provisional data releases can be appropriate in cases of natural disasters.
Dynamic Data Releases
Dynamic data releases require that the data, metadata, and the processing and quality control steps be reviewed and approved before initial release. This approach is a good fit for data that are continuously processed and quality controlled as they are collected. The released data are in their final form and no final static release is planned. Survey Manual Chapter 502.8, section 7 outlines requirements for the review, approval and maintenance of USGS-owned online databases and web data services.
The process for publishing a dynamic data release is as follows (subject to change):
- Authors finalize data collection, processing, and quality control procedures
- Center Directors (or designees) approve the reviewed data, metadata, and maintenance processes in IPDS
- Authors send the SBDR Team written approval from the Center Director to release the data as a dynamic data release
- Authors develop a scripted approach to making updates to their ScienceBase data release
- The SBDR Team finalizes and publishes the landing page and DOI, but the authors retain edit permissions to the landing page
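The scripted approach in the fourth step is left to the authors. As a minimal sketch of one piece of such a script, assume the authors keep a local folder of release files and a JSON manifest of checksums written after the last update (both hypothetical, not part of any ScienceBase tooling). The functions below identify which files are new or changed and would need to be pushed to the landing page. The upload itself is deliberately left out; it would go through whatever ScienceBase client the authors use.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_update(data_dir: Path, manifest_path: Path) -> list[str]:
    """List file names in data_dir that are new or changed since the manifest was written.

    The manifest (a hypothetical {file name: checksum} JSON file) should live
    outside data_dir so it is not treated as a release file.
    """
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())
    changed = []
    for path in sorted(data_dir.iterdir()):
        if path.is_file() and previous.get(path.name) != sha256_of(path):
            changed.append(path.name)
    return changed


def write_manifest(data_dir: Path, manifest_path: Path) -> None:
    """Record the current checksums; call this only after an update succeeds."""
    current = {p.name: sha256_of(p) for p in sorted(data_dir.iterdir()) if p.is_file()}
    manifest_path.write_text(json.dumps(current, indent=2))
```

Comparing checksums rather than modification times means a file is flagged only when its contents change, which keeps routine reprocessing runs from triggering unnecessary updates.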
Dynamic data releases require considerably more upfront work than a standard or versioned data release. Authors should not use this type of data release as a way to avoid data and metadata reviews.
The SBDR Team will be adding information on dynamic data releases to the SBDR Revision Trainings that they hold quarterly. The next quarterly training will be July 27, 2023 at 10:00 am MT. Contact sciencebase_datarelease@usgs.gov for more details.
How to Make Your Data Release More FAIR – Reusable
The FAIR (findable, accessible, interoperable, and reusable) guiding principles for data, first outlined in Wilkinson and others (2016), have quickly become a popular way to assess and improve the usability and utility of scientific datasets. However, it can be difficult to glean practical ways to implement the principles in your own data releases. In the last few ScienceBase Updates, we have explored a few small ways to make your data more FAIR. We will conclude this series with Reusable (see the Winter 2023 Updates for the piece on Interoperable).
Using the ScienceBase data release process and following USGS policy ensures that a few of the principles under Reusable are already fulfilled:
- data releases published in ScienceBase are open and accessible
- data releases contain the required USGS distribution liability statement
- data releases are assigned a unique and persistent identifier (a DOI).
We’ve also covered using standard formats, like .xml, .csv, and .shp, in previous newsletters. Here are a few other simple ways to make your data more Reusable on ScienceBase.
Include recommended use/reuse limits in metadata
One way to make your data more reusable is to include recommended reuse limits in the metadata. This information can help potential users understand any limitations of your dataset and avoid misusing it. For example, if your dataset is limited to a specific geographic region or time period, you could specify this information in the metadata, along with recommended reuse limits based on these parameters. You could also include information about the accuracy and precision of your data, as well as any known biases. By providing this information upfront, you can help ensure that your data are used appropriately and effectively.
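As a rough illustration, assume the metadata record follows the FGDC CSDGM XML layout, where reuse guidance typically sits in the `useconst` element and accuracy and completeness information sits under `dataqual`. A short check like the following could flag records in which those elements are absent or blank. The function name and the choice of elements to check are illustrative, not an SBDR requirement.

```python
import xml.etree.ElementTree as ET

# Element paths (relative to the <metadata> root) assumed for this sketch.
REUSE_ELEMENTS = {
    "use constraints": "idinfo/useconst",
    "attribute accuracy": "dataqual/attracc/attraccr",
    "completeness": "dataqual/complete",
}


def missing_reuse_info(xml_text: str) -> list[str]:
    """Return the labels of reuse-related elements that are absent or blank."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in REUSE_ELEMENTS.items():
        element = root.find(path)
        if element is None or not (element.text or "").strip():
            missing.append(label)
    return missing
```

A check like this only confirms that something has been written in each element; whether the stated limits are meaningful still depends on the author and the metadata reviewer.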
Make process/methodology information as detailed as possible
Another way to make your data more reusable is to provide detailed process/methodology information. This includes entity and attribute definitions, which should be defined as richly and specifically as possible. For example, if your dataset includes measurements of soil moisture, you could provide detailed information about the specific instruments used to collect the data, as well as any calibration procedures that were performed. You could also include information about any data processing or cleaning procedures, along with any assumptions or decisions that were made during the process. By providing this level of detail, you can help potential users understand how your dataset was collected and processed, and how it can be used effectively.
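One way to spot thin attribute definitions before review is sketched below. It assumes an FGDC CSDGM XML record, where each attribute sits in `eainfo/detailed/attr` with a label (`attrlabl`) and a definition (`attrdef`), and lists the attributes whose definition is missing or very short. The function name and the 20-character threshold are arbitrary placeholders for "too terse to be useful".

```python
import xml.etree.ElementTree as ET


def thin_attribute_definitions(xml_text: str, min_chars: int = 20) -> list[str]:
    """Return labels of attributes whose definition is missing or shorter than min_chars."""
    root = ET.fromstring(xml_text)
    flagged = []
    for attr in root.findall("eainfo/detailed/attr"):
        label = (attr.findtext("attrlabl") or "").strip() or "(unlabeled)"
        definition = (attr.findtext("attrdef") or "").strip()
        if len(definition) < min_chars:
            flagged.append(label)
    return flagged
```

Length is a crude proxy for richness, so the output is best treated as a list of definitions worth a second look rather than a pass/fail result.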
Link related resources
Finally, linking related resources can help make your data more reusable. This includes linking to larger works that your data may be a part of, as well as providing links on the landing page of your dataset. For example, if your dataset is part of a larger research project, you could provide a link to the project website or to related publications. You could also provide links to related datasets or tools that may be useful for analyzing your data. By providing these links, you can help potential users understand the context of your data and how it fits into the larger research landscape.
References:
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E. and Bouwman, J., 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), pp.1-9.
Featured Data Release
USGS Data Owner: Northern Rocky Mountain Science Center
Fagre, D.B., McKeon, L.A., Dick, K.A., and Fountain, A.G., 2017, Glacier margin time series (1966, 1998, 2005, 2015) of the named glaciers of Glacier National Park, MT, USA: U.S. Geological Survey data release, https://doi.org/10.5066/F7P26WB1.
Glacier National Park is home to 37 named glaciers, with two more glaciers situated on nearby Flathead National Forest land managed by the U.S. Forest Service. A time series of digitized glacier margins for each of these glaciers was published as a data release in 2017. The polygons in the data release represent the primary body portions of the glaciers, as determined by analyzing aerial imagery from 1966, 1998, 2005, and 2015.
This data release is currently among the most frequently downloaded resources in ScienceBase (according to the SBDR Dashboard). Furthermore, this dataset has been cited or used in thirteen other publications, covering a diverse range of topics such as glacier recession and vegetation change in glacier forefronts.
Did You Know? SBDR Training Resources
The ScienceBase Data Release team maintains two pages, Training & Resources and Resource Links, within the ScienceBase Data Release SharePoint site. These pages provide users with training material on how to release data in ScienceBase and with links to data management resources.
Past and upcoming training material can be found on the Training & Resources page. Users can access presentation slides, recordings of past trainings, and announcements of upcoming trainings on this page.
Helpful links and tools can be found on the Resource Links page. This page provides users with policy guidance, Science Data Management tools, and other useful information related to data management.
If you are unable to access the ScienceBase Data Release SharePoint, please contact sciencebase_datarelease@usgs.gov.