ScienceBase Modernization FAQs

This page will be updated periodically with frequently asked questions and updates on the ScienceBase modernization project.

ScienceBase Data Release Overview 

ScienceBase has served as the generalist data repository for USGS Data Releases since 2016. It’s developed and managed by the Science Analytics and Synthesis (SAS) program within the Core Science Systems Mission Area. Within SAS, the Science Data Management (SDM) branch is the team that directly supports ScienceBase and the data release process.

Originally launched in 2009, ScienceBase was designed as a content management system for data collections, with a special focus on programmatic access via its API. In 2016, following the release of the USGS data policies, the ScienceBase data release workflow was developed to help data authors publish data in full compliance with these policies. The integration of programmatic workflows and other data release tools (e.g., the Asset Identifier Service) improved efficiency and allowed the ScienceBase team to scale up their rate of publication and release data from across the Bureau.

Since then, the data release collection in ScienceBase has grown to over 13,000 data releases. More than 80% of the USGS data holdings in the Science Data Catalog are published through ScienceBase. Of the 8 USGS data repositories that existed in 2015, 4 have since consolidated their data releases into ScienceBase. ScienceBase also provides programmatic access to data and metadata for more than 20 public-facing USGS applications. 
 

What are the goals of ScienceBase modernization? 

The overarching goal is to move ScienceBase into a more flexible, modernized tech stack with less dependence on custom code. The final product will be more sustainable, especially in light of budget and contract fluctuations. We will continue supporting essential data repository functions and are taking steps to facilitate a smooth period of transition between systems.  

The new system will use Globus services for user auth, indexing, file transfer, and access to data storage. A lightweight frontend and API endpoints are being developed using FastAPI, a widely used framework that will provide faster delivery and easier integration with other systems.

Additional opportunities have been identified during this process. For example, we’re planning to unify two search and discovery catalogs and simplify tool integration. The updated system is being designed with a new, shared data model, which interoperates with multiple SDM tools (e.g., AIS, People Picker, and SDC). We’re also taking the opportunity to consolidate our development pipeline.

The new data release workflow will provide an improved user experience for both data authors and the ScienceBase data release team. In 2016, the original data release process leveraged an existing system, so there were adjustments that we had to make to adapt to it. The new process is being developed specifically for USGS data release and the needs of maintaining and managing a data repository.
 

What will the new system look like? 

We’re combining the ScienceBase Data Release (SBDR) Tool with the Science Data Catalog (SDC). The result will be one portal for USGS data release search and discovery, the USGS Science Data Portal. It's being built within the SDC website and deployment pipeline.

A simple summary: SDC + SBDR = USGS Science Data Portal

Within the combined search index, users will be able to find:

  • Landing pages for SDC metadata records – pointing to repositories external to ScienceBase 
  • Landing pages for ScienceBase data releases – providing access to both data and metadata

This consolidation is enabled by our new, shared data model, which is provisionally released through GitHub. The model was built using Pydantic, a Python library that defines consistent data structures and validation.

Users can expect a more streamlined workflow and simplified user interface. Data authors will be able to log in to the Science Data Portal and create a new data release within the application (in the current workflow, the ScienceBase Data Release Tool and ScienceBase have separate logins). Workflows for the SBDR team will also be more integrated (current workflows depend on external Jupyter Notebooks).

Another benefit of a consolidated search index is that we won’t need a harvest process for ScienceBase data releases, which takes time and can be a challenge to maintain. Data releases will be created within the SDC, so data authors can expect to see their products within the search index immediately upon publication. 
 

How will ScienceBase leverage Globus? 

The goal of partnering with Globus is to integrate products and services that are maintained beyond the USGS. SDM has a cooperative agreement to work with Globus developers and to integrate Globus services into the new data release process.

Globus Transfer is the flagship service offered by Globus. It’s a fast, secure, and reliable way to move large data files and will automatically resume transfers if there are network disruptions. Its integration will facilitate data transfer into and out of the new system, for both data authors and data users.  

We are also using the following Globus services: 

  • Globus Collections for organizing files and providing access to storage systems, including AWS S3 and BlackPearl 

What will stay the same? What will change?  


Staying the same:


The SBDR Team  

We’re still available and will continue to provide the same level of support to data authors and data managers. You can contact us at sciencebase_datarelease@usgs.gov
 

API accessibility 

The new system will have an API, and users will be able to run create, read, update, and delete operations. We also plan to create Python and R packages to support programmatic interactions. These will be analogous to the sciencebasepy and sbtools libraries that are currently used with ScienceBase. Due to changes in the data model, however, existing scripts will need to be updated. 
 

XML metadata parsing 

Data authors will still be able to build their landing pages by uploading XML metadata files and automatically parsing the content. 
 

Revisions 

Users will be able to revise their published data releases. The general revision process will be the same as the current one and will allow data authors to comply with all relevant USGS data policies.
 

Publishing to S3 

There will still be an option to copy data files to S3 and generate URLs for use in programmatic workflows. This option will be analogous to the 'publish to S3' feature currently in ScienceBase. Note: it may not be available in the first version of the system, but it will be added later as an advanced feature.
 


Changes: 

The data model

SDM has developed a new Pydantic data model for the Science Data Portal application, referred to as the 'horizon' data model. It’s been provisionally released on GitHub (please note there are still ongoing updates).

The benefit of the new model is that it allows us to consolidate the SDC and SBDR data models. It enables interoperability with the Asset Identifier Service (AIS) and the People Picker (a directory application for USGS authors and contributors), and it uses DCAT vocabulary whenever possible (DCAT is the standard used by metadata catalogs such as data.gov).

We’re creating a mapping between the two models, together with scripts to test migration on individual data releases. At a later date (we anticipate 4th quarter of FY26 or 1st quarter of FY27), we’ll run a batch migration of all ScienceBase data releases into the new system. For the more complex data releases in ScienceBase, we’ll work with authors and data managers to make sure we maintain the integrity of their data products.


Landing page content sync with XML

In ScienceBase, content displayed on data release landing pages has two sources: the user input form and an uploaded XML metadata file. This will be similar in the new system; however, an important difference is that the new system will automatically sync content between the landing page and the XML file. That is, certain content in an uploaded metadata file (e.g., title, description, dates) will display automatically on a landing page, without the option to edit manually.

Certain fields in ScienceBase, including title and abstract, can be independently edited through the user input form and can therefore be out of sync with the XML. In the new system, users will only be able to edit these fields by editing and reuploading their XML file.
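The one-way sync described above can be sketched roughly as follows. This is only an illustration, not the new system's implementation: it assumes the uploaded file follows FGDC CSDGM-style element paths (this FAQ does not name the metadata standard), and the function name `landing_page_fields` and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed FGDC CSDGM-style element paths; the FAQ does not specify the schema.
SYNCED_FIELDS = {
    "title": "idinfo/citation/citeinfo/title",
    "description": "idinfo/descript/abstract",
    "publication_date": "idinfo/citation/citeinfo/pubdate",
}


def landing_page_fields(xml_text: str) -> dict:
    """Return landing page fields derived solely from the XML metadata.

    Hypothetical helper: the XML is the single source of truth, so these
    values are read from the file and never edited independently.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for name, path in SYNCED_FIELDS.items():
        element = root.find(path)
        # Missing or empty elements map to None rather than raising.
        has_text = element is not None and element.text
        fields[name] = element.text.strip() if has_text else None
    return fields


SAMPLE_XML = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <title>Example data release</title>
      <pubdate>2025</pubdate>
    </citeinfo></citation>
    <descript><abstract>Example abstract text.</abstract></descript>
  </idinfo>
</metadata>"""

if __name__ == "__main__":
    print(landing_page_fields(SAMPLE_XML))
```

In this sketch, changing a title means editing the XML and parsing it again, which mirrors the edit-and-reupload workflow described above.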


ScienceBase identifiers

ScienceBase IDs are unique alphanumeric strings assigned to individual ScienceBase items. They are included within URLs (e.g., https://www.sciencebase.gov/catalog/item/5f63ad9182ce38aaa23b0340). Data releases migrated into the new system will be assigned new identifiers. Original ScienceBase IDs will be stored within the new system to maintain an association between the two.

All data release Digital Object Identifiers (DOIs) in AIS will be rerouted to their new landing pages when we migrate to the new system. Please be aware, however, that if original ScienceBase URLs are documented elsewhere (e.g., within XML metadata), they will no longer work.
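To find hard-coded ScienceBase URLs that will stop resolving after migration, a scan along the following lines may help. It is a minimal sketch: the URL pattern is taken from the example above, IDs are treated as any alphanumeric run, and the function name `find_sciencebase_ids` is hypothetical.

```python
import re

# Matches item URLs of the form shown in the example above. IDs are treated
# as any alphanumeric run, following the FAQ's description of them.
ITEM_URL = re.compile(
    r"https?://www\.sciencebase\.gov/catalog/item/([0-9A-Za-z]+)"
)


def find_sciencebase_ids(text: str) -> list[str]:
    """Return the unique ScienceBase item IDs referenced in text, in order."""
    seen: list[str] = []
    for match in ITEM_URL.finditer(text):
        item_id = match.group(1)
        if item_id not in seen:
            seen.append(item_id)
    return seen


if __name__ == "__main__":
    sample = (
        "Data are available at "
        "https://www.sciencebase.gov/catalog/item/5f63ad9182ce38aaa23b0340."
    )
    print(find_sciencebase_ids(sample))
```

Running this over XML metadata or documentation gives a list of references to replace with DOIs before the original URLs are retired.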


Geospatial services

Users won't be able to create new geospatial services within the Science Data Portal application. That is, the new system won't have its own geoserver instance or built-in connection that writes to USGS ArcGIS Online (AGOL). However, users will have the option to enter existing service URLs through the edit form for display on landing pages (the new data model contains an optional distribution section for services).

We know geospatial services are often a high priority for users, and there are existing services in ScienceBase that are actively used for data distribution. ArcGIS REST services published via AGOL (i.e., by uploading and processing .sd files in ScienceBase) will persist within AGOL. These existing service URLs will be migrated together with their data releases into the new system. Services created on ScienceBase’s geoserver, however, won’t persist. These are the OGC WMS/WFS services that can be automatically generated from shapefiles and rasters. If users would like to retain geoserver services, we recommend recreating them in AGOL to generate more persistent service URLs. Here are instructions on how to create an AGOL service from a service definition (.sd) file in ScienceBase.
 

Child item hierarchy

The new system will have only one level of child items. A 'component' class is built into the new data model for child items, but only one level of nesting will be possible. Although there won’t be support for multiple levels of hierarchy, the need for that should be reduced, because the new system will be able to store more files on a single entry. The current limit in ScienceBase is 100 files and the limit in the new system will be between 1,000 and 10,000. If you're currently working on a data release in ScienceBase, please only use one level of child items so that migration in the future will be easier.

For current data releases in ScienceBase that have a deeply nested structure, we’ll first notify their associated points of contact. Then we’ll work with them to reorganize their data releases. If there are 2 levels of child items, we may be able to programmatically flatten the structure. Complex products may require more manual edits.
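As an illustration of what flattening could look like, the sketch below promotes every nested descendant to a direct child and keeps the original path in the title. This is one possible approach under stated assumptions, not the SDM team's migration script: the dictionary layout (`title`, `files`, `children`) and the function names are hypothetical.

```python
def flatten_children(item: dict) -> dict:
    """Return a copy of item in which all descendants are direct children.

    Hypothetical layout: each item is a dict with 'title', 'files', and
    'children'. Nested titles are joined with ' / ' to keep their context.
    """
    flat_children = []

    def collect(children, prefix):
        for child in children:
            title = f"{prefix}{child['title']}"
            flat_children.append(
                {"title": title, "files": list(child.get("files", [])), "children": []}
            )
            # Recurse so that deeper levels are also promoted.
            collect(child.get("children", []), f"{title} / ")

    collect(item.get("children", []), "")
    return {
        "title": item["title"],
        "files": list(item.get("files", [])),
        "children": flat_children,
    }


def max_depth(item: dict) -> int:
    """Number of child levels below item (0 if it has no children)."""
    children = item.get("children", [])
    return 0 if not children else 1 + max(max_depth(c) for c in children)


if __name__ == "__main__":
    nested = {
        "title": "Release",
        "files": ["readme.txt"],
        "children": [
            {
                "title": "Site A",
                "files": ["a.csv"],
                "children": [{"title": "2020", "files": ["a2020.csv"], "children": []}],
            }
        ],
    }
    flat = flatten_children(nested)
    print(max_depth(nested), max_depth(flat))
    print([child["title"] for child in flat["children"]])
```

An alternative design would merge each grandchild's files into its parent, which the higher per-entry file limit makes feasible; which option fits depends on the data release.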
 

Project pages and associated items

Some ScienceBase data releases are grouped under project pages. When migrated, they won’t keep their top-level folder. In the new system, it will still be possible to use tags and queries to organize and retrieve related data releases. An alternative to ScienceBase project pages could be Drupal project pages in the USGS website to present collections of data releases (example). 

 
Dynamic data releases

The current dynamic data release process, in which data authors use scripts to automatically update their ScienceBase data releases, is described here. Because of changes to the data model and API, existing scripts will need to be modified. Also, we may not be able to roll out advanced features right away, due to our condensed time frame and development uncertainties. As a result, we recommend holding off on new dynamic data releases in ScienceBase until it’s possible to create them in the new system. If you’re planning to publish a dynamic dataset, please consider the pros and cons and decide what would work best for your project. 
 

Provisional data releases

We are still working out the details of how the new system will support provisional data releases. As with the dynamic data process, we are considering an interim recommended workflow until the new system is developed enough to support the more specialized provisional process.

If you have questions about how these changes will affect your data releases, please feel free to contact us at sciencebase_datarelease@usgs.gov. We can discuss your specific data products and any considerations relating to migration.  

 

What should data authors and managers keep in mind for their work now? 

Given these changes, here’s a summary of what we recommend for current work: 

  • Please be aware that the data model will change in the future and that current scripts will need to be updated.
  • For new data releases, keep landing page and XML metadata file content in sync.
  • Use DOIs instead of ScienceBase URLs to reference data.
  • If you’d like to retain geoserver services, recreate them in USGS AGOL to generate more persistent service links.
  • Use only one level of child items in new data releases.
  • If possible, don’t create new dynamic data releases in the current ScienceBase system.
  • Don't create new project pages. If you’d like to present a data release collection online, consider using project pages in USGS Drupal. 

 

What about USGS collections in ScienceBase that aren't data releases? 

ScienceBase as a whole will have a narrower, more targeted focus. For non-data release functions, such as content management and non-public data, we recommend evaluating departmental resources where possible: 

  • For content management (CMS capabilities), consider using USGS or center websites 
  • For team documents, SharePoint is a good option 
  • For large non-publication data collections, USGS Cloud Hosting Solutions (CHS) offers tools for data hosting, processing, and other services.

Again, the SDM team is available to help users navigate upcoming changes. For questions relating to non-data release content, you can contact us at sciencebase@usgs.gov or ask-sdm@usgs.gov to discuss options and recommendations. 
 

What will this mean for other Department of the Interior users and for non-Federal collaborators?

ScienceBase has historically supported several partnerships with Department of the Interior agencies (FWS, BLM, etc.) and non-Federal collaborators (universities and research cooperatives). Support has included providing data storage and permission-controlled access to ScienceBase resources.

In response to changes in program-level funding, the new system is currently scoped for USGS only. USGS data releases can have external authors, but permission-controlled access and editing will be limited to USGS personnel.

We're working with USGS teams and research partners to communicate these planned changes and work through migration strategies where required. We don't anticipate any breaking changes to workflows prior to December 2026; however, research teams should be aware of planned updates. If you have questions, please contact your project data manager or reach out to the SDM team at ask-sdm@usgs.gov.
 

What will replace ScienceBase Directory? 

ScienceBase Directory will be replaced by the People Picker application, an SDM tool that is already used by AIS and the current ScienceBase Data Release Tool (SBDR Tool). The People Picker is API-only; that is, it doesn’t have an interactive user interface. For information about the API, see the 'People' category here.

Author information in the existing ScienceBase data release collection is already synced with the People Picker (via the DOI Tool). So users of the new system should see very little change in how contact information is handled, for both USGS authors and external authors. 
 

Will data release display on USGS Drupal pages be affected? 

No. The USGS Website harvests data release info via their DOIs, not directly from content in ScienceBase (the Asset Identifier Service and its API are staying the same). 
 

Data Manager FAQs  

Some data managers have elevated permissions in ScienceBase. Will this still be an option?  

Not at first. Because of our condensed timeline, we are not rolling out advanced features right away. This includes the ability for non-admins to directly publish products or edit products that are already published. After the new system and data release process are more established, we’ll circle back to evaluate the need for this. Given the streamlined nature of the new system, however, there may be less of a need for direct access. If you are a data manager who needs elevated permissions to continue uninterrupted, please contact us at ask-sdm@usgs.gov.
 

What about custom workflows for science centers, e.g., notifying data managers when someone from their center creates a new data release?

This will also not be available at first. We are saving all configuration info from the SBDR Tool and our publication scripts, and this could be an advanced feature we add later.
 

What is the anticipated timeline?

ScienceBase and the data release process will remain largely unchanged through the end of FY26. Although users won’t see direct changes, we encourage people to keep the future developments in mind, to help make the eventual transition as smooth as possible.

In this 1st quarter of FY26, we’re actively working with the Globus Team on the new API. We’re also starting to test migration of existing data releases to our development environment. We hope to perform a final migration of existing data releases to our production environment in the 4th quarter of FY26. Based on our current timeline, users will be able to create new data releases in the Science Data Portal at the end of FY26 or in the 1st quarter of FY27.

Once the new system is operational, metadata records in the Science Data Catalog (SDC) will also be migrated to the Science Data Portal. Until then, we’ll continue to harvest, index, and publish metadata using the existing SDC, and will potentially begin the SDC migration in the 2nd quarter of FY27.

The current API will remain operational through FY26. We anticipate deprecation starting in calendar year 2027. Because of this, we recommend that users begin testing and converting scripts to the new API during FY26. As of December 2025, the new API is still in development, and we'll let users know when it's ready for testing (likely in March or April 2026).
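Because this timeline mixes fiscal-year quarters with calendar dates, a small conversion helper may be useful when planning script updates. The sketch below relies on the U.S. federal fiscal year starting on October 1 of the previous calendar year; the function name `fiscal_quarter_range` is hypothetical.

```python
from datetime import date, timedelta


def fiscal_quarter_range(fiscal_year: int, quarter: int) -> tuple[date, date]:
    """Return (first day, last day) of a U.S. federal fiscal quarter.

    The federal fiscal year starts on October 1 of the previous calendar
    year, so FY26 Q1 runs from October through December 2025.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    # Start months are October, January, April, July for Q1 through Q4.
    start_month = (9 + 3 * (quarter - 1)) % 12 + 1
    start_year = fiscal_year - 1 if quarter == 1 else fiscal_year
    start = date(start_year, start_month, 1)
    # The quarter ends the day before the next quarter begins.
    next_month = start_month + 3
    next_year = start_year
    if next_month > 12:
        next_month -= 12
        next_year += 1
    end = date(next_year, next_month, 1) - timedelta(days=1)
    return start, end


if __name__ == "__main__":
    for q in (1, 4):
        start, end = fiscal_quarter_range(2026, q)
        print(f"FY26 Q{q}: {start.isoformat()} to {end.isoformat()}")
```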
 

How can users submit questions and feedback? 

If you have questions or feedback, please contact us at ask-sdm@usgs.gov.
 

How will SDM share updates? 

As additional questions come up, we’ll expand on this FAQ document and provide up-to-date information. 

We’ll also provide updates through the ScienceBase Notify email list (sign up here).
