ScienceBase Updates - Winter 2023
Winter 2023 topics include news on the map service migration from ScienceBase ArcGIS server to ArcGIS Online (AGOL), making your data release more interoperable, a tip on the ScienceBase JSON model, and a featured data release on bathymetry and topography data from 14 surveys performed on the Elwha River delta between 2010 and 2017.
Table of Contents
- Map Service Migration from ScienceBase ArcGIS Server to ArcGIS Online (AGOL)
- How to Make Your Data Release More FAIR – Interoperable
- Featured Data Release
- Did You Know? ScienceBase JSON Model
Map Service Migration from ScienceBase ArcGIS Server to ArcGIS Online (AGOL)
The ScienceBase team has developed a workflow that pulls an existing ESRI service definition (.sd) file from on-premises storage in Denver and makes a small modification so that the file works with the USGS ArcGIS Online (AGOL) instance. An automated process can then move the file to ScienceBase cloud storage, push the file to AGOL to start an ESRI map service in AGOL, and update the ScienceBase item JSON with the new map service URL.
This will result in an updated map service URL being listed in the item, and a change in the file location in the JSON, but otherwise the item will remain unchanged.
This workflow is intended for public/finalized data that benefit from being additionally accessible via a public ESRI map service. The intent is to support better uptime and service resiliency for these resources, and the process is also consistent with Bureau goals to move resources into the cloud. The team is currently testing this process with authors, with plans to migrate all ScienceBase ArcGIS services to the cloud at a future point. We will communicate additional plans as they evolve.
Users who are interested in testing this new process, or who need maximum uptime in map services, can provide a list of ScienceBase item IDs with associated AGS services, and the ScienceBase team will perform the migration on their behalf. If you are interested in trying out this functionality for existing ScienceBase items (or know others who might be), please contact sciencebase@usgs.gov.
How to Make Your Data Release More FAIR – Interoperable
The FAIR (findable, accessible, interoperable, and reusable) guiding principles for data, first outlined in Wilkinson and others (2016), have quickly become a popular way to assess and improve the usability and utility of scientific datasets. However, it can be difficult to glean practical and straightforward ways to implement the principles in your own data releases. We will explore a few small ways to make your data more FAIR in the next few Updates, continuing with Interoperable (see the Fall 2022 Updates for the piece on Accessible).
Using the ScienceBase data release process ensures that a few of the principles under Interoperable are already fulfilled for you: for instance, we check that the attached metadata record is in a standard format such as ISO or CSDGM, and that identifiers (DOI and metadata PID) are present. Here are a few other simple ways to make your data more Interoperable on ScienceBase.
Use standards
Using standardized formats and guidelines makes your data easier to understand, integrate, and use. Standards can include dataset-level standards, which specify the scientific domain, structure, relationships, field labels, and parameter-level standards for the dataset (e.g., Darwin Core), or individual parameter-level standards, which help define the format and units for a given parameter or field within a dataset and help users correctly interpret the values (e.g., ISO 8601 as a standard for entering date/time). See the USGS Data Management Website for more on data standards.
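As a small illustration of a parameter-level standard, the sketch below writes a timestamp as ISO 8601 text and reads it back using only the Python standard library. The sampling time is a made-up value chosen for the example, not one taken from any data release.

```python
from datetime import datetime, timezone

# Hypothetical sampling time, used only for illustration.
sample_time = datetime(2017, 7, 25, 14, 30, 0, tzinfo=timezone.utc)

# Write the timestamp as ISO 8601 text, as it might appear in a data file.
iso_text = sample_time.isoformat()
print(iso_text)  # 2017-07-25T14:30:00+00:00

# Read the same text back; the parsed value equals the original.
parsed = datetime.fromisoformat(iso_text)
print(parsed == sample_time)  # True
```

Because the format is unambiguous and includes the UTC offset, another user or program can interpret the value without guessing the day/month order or the time zone.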
Use published data dictionaries/labels and controlled vocabularies where possible
Controlled vocabularies and data dictionaries provide crucial information about your data. These files describe various elements of the dataset so it can be correctly interpreted and reused by other collaborators in the future. Using a controlled vocabulary (like the USGS Thesaurus) in your metadata further helps facilitate the interoperability of data. It allows for the categorization, indexing, and retrieval of information across multiple platforms. For example, if metadata records from USGS Earthquake data all have the keyword “earthquake”, a person can search for earthquake in the Science Data Catalog (SDC) and find all USGS data on earthquakes. When no controlled vocabularies exist, it is advised to use a data dictionary that explains the terms being used. You can learn more on how to create a data dictionary on the USGS Data Management Website.
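One way to picture how a controlled vocabulary supports consistent keywords is a simple membership check. The sketch below is only an illustration: the vocabulary terms, the keyword list, and the function name split_keywords are all hypothetical, and a real vocabulary (such as the USGS Thesaurus) would be loaded from its published source rather than typed in by hand.

```python
# Hypothetical controlled vocabulary, typed in by hand for illustration only.
controlled_terms = {"earthquakes", "bathymetry", "topography", "sediment transport"}

def split_keywords(keywords, vocabulary):
    """Separate keywords into those found in the vocabulary and those not found."""
    normalized = [k.strip().lower() for k in keywords]
    matched = [k for k in normalized if k in vocabulary]
    unmatched = [k for k in normalized if k not in vocabulary]
    return matched, unmatched

# Hypothetical keywords from a metadata record.
matched, unmatched = split_keywords(
    ["Bathymetry", "Topography ", "delta mud"], controlled_terms
)
print(matched)    # ['bathymetry', 'topography']
print(unmatched)  # ['delta mud']
```

Keywords that land in the unmatched list are candidates for replacement with a vocabulary term or for definition in a data dictionary.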
Things to consider when choosing a file format
Choosing the format in which you’ll release your data can be a difficult decision. Federal policy dictates that data be released in a non-proprietary and machine-readable format, but these criteria leave many file formats available as options. Here are a few questions that may help guide your decision.
- How do you plan on analyzing and sharing your data?
- Is your file open or non-proprietary? Use file formats that are open or non-proprietary wherever possible. These types of file formats will most likely remain accessible in the future, even if the software that created them is no longer available.
- Does your community discipline have any specific norms or standards?
- Are the files machine readable? Machine-readable data, such as a .csv or .json file, can be automatically read and processed by a computer.
- Do you know which file formats are widely used for the long-term preservation of data? If you aren't aware of any of these types of file formats, you can find examples of preferred and acceptable file formats on the Federal Records Management website.
- Could important information be lost when converting between different file formats? JPEG files use lossy compression, which reduces file size by discarding data, whereas TIFF files typically use lossless compression, which retains all original data. As a result, data will be lost when converting a .tiff file to a .jpeg file.
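The difference between lossless and lossy storage in the last question can be sketched in a few lines. The example below is a toy under stated assumptions: the byte values are made up, zlib stands in for a lossless format, and the rounding step is a deliberately simple stand-in for lossy compression. It is not the JPEG algorithm.

```python
import zlib

# Hypothetical readings stored as raw bytes (values 0-255).
original = bytes([12, 13, 13, 14, 200, 201, 203, 90, 91, 95])

# Lossless: compress, then decompress. Every byte comes back unchanged.
restored = zlib.decompress(zlib.compress(original))
print(restored == original)  # True

# Lossy stand-in: round each value down to a multiple of 8, discarding detail.
# This is NOT JPEG, only a toy that throws information away in a similar spirit.
quantized = bytes((value // 8) * 8 for value in original)
print(quantized == original)  # False
print(list(quantized))        # [8, 8, 8, 8, 200, 200, 200, 88, 88, 88]
```

Once the fine detail has been rounded away, there is no way to recover the original values from the quantized copy, which is why the direction of a format conversion matters.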
Featured Data Release
USGS Data Owner: Pacific Coastal and Marine Science Center
Stevens, A.W., Gelfenbaum, G., Warrick, J.A., Miller, I.M., and Weiner, H.M., 2017, Bathymetry, topography, and sediment grain-size data from the Elwha River delta, Washington: U.S. Geological Survey data release, https://doi.org/10.5066/F72N51GC.
The removal of the Elwha and Glines Canyon Dams currently constitutes the largest dam removal project in U.S. history. These dams in Washington state trapped over 20 million m³ of sediment and contributed to the erosion of the Elwha River's coastal delta. With the removal of the dams, an opportunity arose to examine the response of a delta system to changes in sediment supply. This data release (Stevens and others, 2017) contains bathymetry and topography data from 14 surveys performed on the Elwha River delta between 2010 and 2017.
This data release is one of the most accessed and downloaded in ScienceBase (according to the SBDR Dashboard). The related publication (Ritchie and others, 2018), which evaluates geomorphic evolution during and after the sediment pulse precipitated by the dam removals, has been cited by 47 other publications relating to subjects from river restoration to climate change in the western United States. The publication was also included in the Top 100 in Earth Science collection, which gathers the most accessed Earth science articles published by Nature Scientific Reports each year.
References
Ritchie, A.C., Warrick, J.A., East, A.E., Magirl, C.S., Stevens, A.W., Bountry, J.A., Randle, T.J., Curran, C.A., Hilldale, R.C., Duda, J.J., and Gelfenbaum, G.R., 2018. Morphodynamic evolution following sediment release from the world’s largest dam removal. Scientific Reports, 8(1), pp. 1-13, https://doi.org/10.1038/s41598-018-30817-8.
Did You Know? ScienceBase JSON Model
Every ScienceBase item is based on a standardized data model written in JSON, which stands for JavaScript Object Notation. JSON is structured text that a computer can parse but that a human can also read easily. JSON stores information as key-value pairs: a key (e.g., "title") followed by the value for that key. For example:
"title": "Bathymetry, topography, and sediment grain-size data from the Elwha River delta, Washington"
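To see how a program reads a key and its value, the short sketch below wraps the title pair in braces to form a complete JSON document and parses it with Python's standard json module. Note that JSON requires straight double quotes around keys and text values. The variable names are chosen only for this example.

```python
import json

# A one-field JSON document built from the title example.
item_text = (
    '{"title": "Bathymetry, topography, and sediment grain-size data '
    'from the Elwha River delta, Washington"}'
)

# Parse the text into a Python dictionary, then look up a value by its key.
item = json.loads(item_text)
print(item["title"])
```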
Understanding the ScienceBase JSON item model (sbJSON) gives users a powerful way to read (or edit, when logged in) items in bulk, using code libraries such as sciencebasepy (Python) and sbtools (R) to pull information into workflows and tools. Parsing file information from the JSON can allow users to read some types of data directly from ScienceBase and incorporate them into programmatic workflows (see last quarter’s “Did You Know”). This core item model with defined fields is also how ScienceBase supports cross-walking (or ‘mapping’) from a user-provided metadata file (another form of structured text) into an item.
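As a rough sketch of pulling file information out of several items at once, the example below works on hand-written dictionaries rather than live ScienceBase items. The "files", "name", and "url" keys are assumptions about the item model, and the titles, file names, and URLs are placeholders; check the JSON of a real item before relying on any of them. In practice, the dictionaries would come from a library such as sciencebasepy or sbtools instead of being typed in.

```python
# Hand-written stand-ins for item JSON already parsed into Python dicts.
# The "files", "name", and "url" keys are assumed, not confirmed by the source.
items = [
    {
        "title": "Hypothetical item A",
        "files": [
            {"name": "survey_2010.csv",
             "url": "https://example.invalid/a/survey_2010.csv"},
        ],
    },
    {"title": "Hypothetical item B"},  # an item with no attached files
]

def list_files(item):
    """Return (item title, file name, file URL) tuples for one item dict."""
    return [
        (item.get("title", ""), f.get("name", ""), f.get("url", ""))
        for f in item.get("files", [])
    ]

# Collect file information across all items in one pass.
rows = [row for item in items for row in list_files(item)]
for title, name, url in rows:
    print(f"{title}: {name} -> {url}")
```

Using .get() with defaults keeps the loop from failing on items that lack a given field, which is common when reading many items in bulk.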
The best way to start understanding the ScienceBase item model is to view the JSON for an existing, well-populated item. From the item’s landing page, click “View” --> “JSON”. With a simple plugin or extension, browsers such as Chrome, Firefox, or Safari can format the JSON so that it is easier to read.
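If a browser extension is not available, JSON text that has been copied or downloaded can also be reformatted with a few lines of Python. The sketch below uses a small made-up item; the "tags" field and its contents are hypothetical and serve only to show nesting.

```python
import json

# Compact, hard-to-read JSON text (hypothetical content for illustration).
compact = '{"title": "Example item", "tags": [{"name": "bathymetry"}, {"name": "topography"}]}'

# Re-serialize with indentation and sorted keys so the structure is easy to scan.
pretty = json.dumps(json.loads(compact), indent=2, sort_keys=True)
print(pretty)
```

The indented output contains exactly the same data as the compact text; only the whitespace and key order change.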