ScienceBase Updates - Spring 2022

Spring 2022 topics include information on our migration from Confluence to SharePoint, making sure your metadata record gets into the SDC, a tip on non-ASCII characters, and a featured data release on a novel interaction between intraguild predators.

ScienceBase Updates Header

Migration from Confluence to SharePoint - Training and Resources

This past winter, there was a big push to migrate USGS content from Confluence to SharePoint. Previously, the ScienceBase Data Release (SBDR) Team maintained a Confluence site to document upcoming trainings and post recordings of past events. We have now migrated that content to a new ScienceBase Data Release SharePoint Site.

Screenshot of the ScienceBase SharePoint Site

This site is a great starting place for people who are new to the SBDR process. There are links to the SBDR Tool, the SBDR instructions, and our new SBDR Training & Resources page. On the SBDR Training & Resources page, you can see when we will be hosting our next General SBDR Training and Revision Training Events, as well as connection information for joining the events. You can also find recordings from previous events and notes from the question-and-answer sessions.   

Is there missing information or additional resources that would help you with the SBDR process? Let us know by emailing sciencebase_datarelease@usgs.gov!

Did You Know?

What are non-ASCII characters (sometimes called special characters) and why do we often see recommendations to avoid them?  

ASCII stands for "American Standard Code for Information Interchange". It’s a system that assigns numeric codes to individual characters such as letters, digits, and symbols. For example, the ASCII code for an exclamation mark is 33, and it can be written in HTML as &#33;. The purpose is to give computers a common language so that they can share information more seamlessly.
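As a small illustration of the exclamation mark example, the Python sketch below looks up the code with the built-in `ord()` and decodes the matching HTML numeric reference with the standard library's `html.unescape()`. The variable names are only illustrative.

```python
import html

# Look up the ASCII code for "!" and convert the code back to a character.
code = ord("!")
char = chr(33)

# An HTML numeric character reference is built from the same number.
entity = f"&#{code};"
decoded = html.unescape(entity)

print(code, char, entity, decoded)
```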

Non-ASCII characters are those that are not part of the standard ASCII code set, which is limited to 128 unique characters. For example, straight quotation marks (" ") are standard ASCII, but curly quotes (“ ”) are not. Most applications can handle extended code sets, such as Unicode or extended ASCII (which goes up to 256 characters); however, when information is shared between applications that assume different encodings, some things can get lost in translation. You may find that a character that looked fine in your original text, such as an em dash, shows up as â€” when it's parsed into a different application, and it could even show up as the dreaded � replacement character.
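One common way this happens is a mismatch between two encodings. The Python sketch below assumes a UTF-8 versus Windows-1252 mix-up, which is only one of several possible causes and is not necessarily what occurs in any particular USGS system. It shows an em dash turning into three unrelated characters in one direction and into the replacement character in the other.

```python
# A short string containing an em dash (written as an escape so it is easy to spot).
original = "1990\u20142020"

# Case 1: text saved as UTF-8 but read as Windows-1252.
# The em dash is three bytes in UTF-8, and each byte is shown as its own character.
garbled = original.encode("utf-8").decode("cp1252")

# Case 2: text saved as Windows-1252 but read as UTF-8.
# The single em dash byte is not valid UTF-8, so it becomes U+FFFD.
replaced = original.encode("cp1252").decode("utf-8", errors="replace")

print(garbled)
print(replaced)
```

Neither result can be reliably repaired without knowing which encodings were involved, which is one reason to keep shared text fields within the ASCII range.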

The ScienceBase data release process uses APIs to transfer standardized information between applications. For example, metadata records created in the Metadata Wizard are uploaded and parsed in ScienceBase to populate a landing page, and the information stored with the landing page is transferred via API to the USGS DOI Tool. From there, it's sent on to DataCite's DOI database. Because of all these connections, we recommend limiting use of special characters within text fields such as data release title and abstract.  

So, what should authors look out for? Common culprits within text fields are curly quotes, curly apostrophes, em dashes, en dashes, and symbols such as the degree sign. You can easily check whether your text contains non-ASCII characters by using online tools such as this one: https://onlineasciitools.com/validate-ascii
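If you would rather check text locally, a few lines of Python can do a similar job. This is a minimal sketch using only the standard library. The function name `find_non_ascii` and the sample title are made up for illustration and are not part of any ScienceBase tool.

```python
import unicodedata


def find_non_ascii(text):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127  # standard ASCII covers code points 0 through 127
    ]


# Hypothetical title containing a degree sign, an en dash, and an em dash.
title = "Soil temperatures (\u00b0C) at 5\u201310 cm depth, 2019\u20142021"

print(title.isascii())  # quick yes/no check (Python 3.7 or later)
for index, char, name in find_non_ascii(title):
    print(f"position {index}: {char!r} is {name}")
```

Reporting the position and the Unicode name makes it easier to find each character and replace it with a plain equivalent, such as a hyphen or the word "degrees".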

 

Upcoming Updates to the SBDR Tool

Updates are coming soon to the ScienceBase Data Release (SBDR) Tool to support looking up authors and importing their ORCIDs from Active Directory or from our new non-USGS author database. This people lookup will help us improve the quality of the metadata that we maintain in our digital object identifiers (DOIs). Currently, we have a lot of messy data about people. In the DOI Tool, we sometimes have multiple records for the same individual with conflicting information and we can't always tell if the multiple records represent the same person. Messy data are very difficult for people and computers to resolve, and make it challenging for our USGS systems to communicate. For example, if we don't have the correct information in a DOI record for an author, that product will not show up on their USGS staff profile. It's also impossible for us to understand who our USGS staff are collaborating with on products, especially non-USGS co-authors.  

The updates to the SBDR Tool will help us keep our author data clean. These updates will include uniquely identifying authors as they are entered into the system - both USGS and non-USGS authors - through a people lookup service. The people lookup service is a separate service that we are integrating into the Tool. The lookup service will work for both the IPDS Autofill feature and for entering authors manually.

Screenshot of SBDR Tool People lookup feature

If you work with non-USGS authors on data products, you will be asked to enter information about those authors into the non-USGS author database. The more information that you can provide about these authors, especially their ORCIDs or professional email addresses, the easier it will be to look them up in the future. If you have a list of non-USGS authors that you or your Science Center commonly collaborate with, please send us their information in advance (e.g., Full Name, ORCID, Email, Affiliation) and we can bulk load them into the non-USGS author database. Contact sciencebase_datarelease@usgs.gov for more information! 

 

Featured Data Release

Wildlife camera image of an altercation between a bobcat and a Burmese python
Photo 1-k - Burmese python (Python bivittatus) on nest facing bobcat (Lynx rufus) in left foreground. Bobcat swiping at python on nest.

Currylow, A.F., Anderson, G.E., and Yackel Adams, A.A., 2022, Photo-documented sequences from 01 Jun 2021-30 Aug 2021 showing novel interactions between intraguild predators in southern Florida, USA, bobcat and Burmese python: U.S. Geological Survey data release, https://doi.org/10.5066/P97ZDQHY

USGS Data Owner: Fort Collins Science Center 

Burmese pythons are a widespread invasive species in southern Florida and have been linked to mammal population declines in Everglades National Park, especially of raccoons, opossums, and bobcats (USGS, 2022). USGS researchers recently captured a novel interaction between a Burmese python and a bobcat using a wildlife surveillance camera. Photos of this interaction show a bobcat depredating, caching, and uncovering eggs from an unguarded nest over several days, eventually encountering and swiping at a female Burmese python after her return to the nest. This encounter marks the first recorded instance of a Burmese python actively defending a nest and the first record of a bobcat depredating a python nest.

The data release landing page in ScienceBase has been accessed over 34,000 times since its publication in February 2022 and the data have been downloaded over 28,000 times (metrics obtained from the ScienceBase Data Release Summary Dashboard). The story has also been picked up by over 49 news outlets according to Altmetrics on the related publication. 

If you know of a data product available in ScienceBase that has gone on to be reused in other projects, inform policy decisions, garner attention in major media outlets, or any other interesting use, we'd love to hear about it. Please complete this form to contribute your data story. 

U.S. Geological Survey, 2022, How have invasive pythons impacted Florida ecosystems?, accessed April 21, 2022 at URL https://www.usgs.gov/faqs/how-have-invasive-pythons-impacted-florida-ecosystems

 

Why Isn't My Metadata Record in the SDC?

USGS policy requires that metadata for approved USGS scientific data be deposited in and shared through the USGS Science Data Catalog (SDC). The SDC provides access to data across multiple USGS repositories. If you’re releasing data through the ScienceBase data release process, your metadata will automatically be sent to the SDC by the ScienceBase Data Release Team within 24 hours of your data going public. However, not all published data releases make it into the SDC, because some records do not meet specific requirements.

If your data release has been published, but you don’t see the metadata in the SDC, your metadata record may be missing one of the following SDC requirements: 

Title: 

Title is one of the most important pieces of information in a metadata record. A title must be present in the metadata record (.xml file) to be valid and indexed in the SDC. Best practices suggest using a title that incorporates who, what, where, when, and scale. To be admitted to the SDC, titles must be at least five characters. 

Metadata contact electronic mail address:  

The metadata contact email address of the organization or individual author must be present and valid. Weblinks are not valid email addresses and will prevent your metadata record from being harvested by the SDC. 

Metadata date:   

The metadata date is the date that the metadata was created or last updated. This date must be present and valid in the metadata. A valid metadata date includes year, month, and day in the following format: ‘20220430’ (YYYYMMDD).

Validation: 

The metadata record must pass validation against the metadata standard to ensure it has been structured properly and all required elements have been filled in. Validation compares the metadata's XML content to the metadata standard to confirm that the record conforms to the required structure and can therefore be parsed by the SDC. See best practices for Checking Metadata with Data [PDF] for FGDC-CSDGM metadata. Please be aware that many metadata creation and editing tools (such as OME and Metadata Wizard) can validate these records automatically.

PIDs:

The SDC requires a persistent identifier (PID) that’s been registered and is unique for every metadata record in the Catalog. This persistent identifier enables the SDC and downstream federal data catalogs to uniquely identify and recognize metadata records. The ScienceBase Data Release Team will register and add PIDs to all ScienceBase data releases automatically upon publication, so authors don’t have to worry about this step.  

All of the requirements above must be met, or the metadata attached to the release will not be harvested by the SDC. If one of the requirements was missed when the data release was created, please reach out to the ScienceBase Data Release Team (sciencebase_datarelease@usgs.gov) so that a solution can be determined.
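The title, email, and date requirements can be roughly pre-checked before you submit a record. The Python sketch below is an illustration only: it is not the SDC's harvester and does not replace validation against the metadata standard or the PID step. It assumes an FGDC-CSDGM record that uses the standard short element names (`title`, `cntemail`, `metd`). The sample record, the `precheck` function, and the simplified email pattern are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, heavily trimmed FGDC-CSDGM record used only for this example.
SAMPLE = """<metadata>
  <idinfo><citation><citeinfo>
    <title>Example stream temperature data, Colorado, 2021</title>
  </citeinfo></citation></idinfo>
  <metainfo>
    <metd>20220430</metd>
    <metc><cntinfo><cntemail>author@example.gov</cntemail></cntinfo></metc>
  </metainfo>
</metadata>"""

# Deliberately simple pattern: something@something.something with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _text(root, path):
    """Return the stripped text at an element path, or "" if it is absent."""
    node = root.find(path)
    return (node.text or "").strip() if node is not None else ""


def precheck(xml_text):
    """Return a list of problems; an empty list means these three checks passed."""
    root = ET.fromstring(xml_text)
    problems = []

    title = _text(root, "idinfo/citation/citeinfo/title")
    if len(title) < 5:
        problems.append("title is missing or shorter than five characters")

    email = _text(root, "metainfo/metc/cntinfo/cntemail")
    if not EMAIL_PATTERN.match(email):
        problems.append("metadata contact email is missing or not an email address")

    date = _text(root, "metainfo/metd")
    date_ok = bool(re.fullmatch(r"\d{8}", date))
    if date_ok:
        try:
            datetime.strptime(date, "%Y%m%d")  # rejects dates such as 20221345
        except ValueError:
            date_ok = False
    if not date_ok:
        problems.append("metadata date is missing or not a valid YYYYMMDD date")

    return problems


print(precheck(SAMPLE))
```

A record that lists a web link in place of the contact email, or a date such as 2022-04-30, would be reported by this sketch, which mirrors two of the common problems described above.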

Subscribe to the ScienceBase Mailing List for Quarterly Updates.