
Repositories

A data repository is a centralized location to store, curate, and maintain data. Data repositories are often managed by data curation personnel who ensure that files are managed and preserved for the long term.

Why Use a Repository? 

Storing data in data repositories and data warehouses is highly encouraged and is part of the Preserve portion of the data lifecycle. Data repositories can help make a researcher's data more discoverable and accessible, which can lead to reuse.

Data repositories can also serve as backups in the rare event that a researcher's data are lost and must be retrieved. However, researchers should still perform their own data backups and not rely on data repositories as their only backup.

Depending on the field, scientists may be required to store their data in certain repositories. Examples of repositories include the Core Research Center, the National Ice Core Laboratory, and the National Water Information System.

 

Best Practices 

 

Example USGS Repositories 


ScienceBase

ScienceBase is an information management platform designed to centralize and preserve USGS science and products. ScienceBase is considered a Trusted Digital Repository by the USGS and accepts USGS data from all disciplines. USGS researchers can learn more about formally publishing data to ScienceBase on the ScienceBase Instructions and Documentation site.

 

NWIS - National Water Information System

Screenshot of the National Water Information System web interface (NWISweb)

The National Water Information System (NWIS) provides access to water-resources data collected at approximately 1.5 million sites in all 50 States, the District of Columbia, Puerto Rico, the Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands. Online access to these data is organized around the following categories:

  • Current Conditions
  • Site Information
  • Surface Water
  • Groundwater
  • Water Quality

The USGS investigates the occurrence, quantity, quality, distribution, and movement of surface and underground waters and disseminates the data to the public, State and local governments, public and private utilities, and other Federal agencies involved with managing our water resources.

 

EROS - Earth Resources Observation and Science Center

Screenshot of EROS Data Repository Website

The Earth Resources Observation and Science (EROS) Center is a USGS data repository that is also considered an archive. EROS stores and serves remotely sensed images of the Earth's land surface. These data are acquired by civilian satellites and aircraft and are used to study a wide range of natural hazards, global environmental change, and economic development and conservation issues.

Available data include:

  • Aerial Photography
  • Satellite Imagery
  • Elevation
  • Land Cover
  • Digitized Maps
  • Image Gallery Collections

EROS staff members manage and distribute these data to scientists, policy makers, and educators worldwide.

 

Coastal and Marine Geoscience Data System (CMGDS) 

Screenshot of the Coastal and Marine Geoscience Data System 

The Coastal and Marine Geoscience Data System (CMGDS) provides data services for published U.S. Geological Survey Coastal and Marine Geology Program (CMGP) data. Access to CMGP data is provided in three ways: through services that follow Open Geospatial Consortium (OGC) standards; by serving CMGP data to the GeoMapApp 2-D and 3-D earth browsing tools for data integration, visualization, and analysis; and through metadata catalogs for data discovery. Note that this site is a work in progress. Currently, the bulk of our content is geophysical data. In time, we will expand our holdings to include other data types.

The CMGDS can be used in two ways: data discovery and data access. Data access is provided by direct download and by a variety of web services. Data discovery can be done locally by a single user, or the site can be harvested by other metadata collections. The CMGDS can also be accessed by software that is capable of using our information for metadata search or GIS display.

 

GenBank (National Center for Biotechnology Information (NCBI)) 

Screenshot of GenBank homepage, including an overview, information on access, and data usage

GenBank, managed by the National Institutes of Health (NIH), is broadly used by geneticists globally as a repository for genetic sequence data. Every sequence is given a globally unique accession number, which can be used to find the exact sequence and should be included in the accompanying documentation. USGS genetic data releases often include GenBank accession numbers along with other associated data.
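As an illustration of how an accession number identifies one exact sequence, the following minimal Python sketch builds a request to the NCBI E-utilities EFetch service for a single nucleotide record. It is not part of USGS guidance. The function names `build_efetch_url` and `fetch_record` are hypothetical, and the choice of the `nuccore` database and the `gb` (GenBank flat file) return type are assumptions that suit nucleotide records.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "gb") -> str:
    """Return an EFetch URL for one nucleotide accession number.

    Hypothetical helper: assumes the record lives in the 'nuccore' database.
    """
    accession = accession.strip()
    if not accession:
        raise ValueError("accession number must not be empty")
    params = {
        "db": "nuccore",      # assumption: nucleotide sequence record
        "id": accession,      # the GenBank accession number
        "rettype": rettype,   # 'gb' for flat file, 'fasta' for sequence only
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "gb", timeout: float = 30.0) -> str:
    """Download the record as text. Requires network access."""
    with urlopen(build_efetch_url(accession, rettype), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_record("U49845")` would request the record for accession U49845, which NCBI uses as its sample GenBank record. Recording the accession number in the accompanying documentation lets a reader repeat this lookup and retrieve the same sequence.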

GenBank has very specific QA/QC protocols and documentation requirements for sequences that are submitted for accession in the database; however, the metadata standard required for GenBank is different from the FGDC-endorsed metadata required for USGS data releases. Because of this difference, scientists are required to create a separate metadata record in a USGS-approved standard (CSDGM or ISO) to describe the GenBank data. This metadata record must also be submitted to the USGS Science Data Catalog (SDC).

For data in GenBank, authors could create a high-level metadata record, describing genomic data generated by the project, and submit the record to the SDC directly. For cases in which a data release contains data files in addition to the data in GenBank, authors could create a ScienceBase data release for the additional data and metadata. The landing page could describe the project, with information about the generated data (GenBank and other) and could contain the GenBank accession number. The metadata record from the landing page in ScienceBase would then be automatically harvested by the SDC.

See the full list of acceptable digital repositories for USGS scientific publications and data

 

What the U.S. Geological Survey Manual Requires: 

SM 502.9 - Fundamental Science Practices: Preservation Requirements for Digital Scientific Data states:

At the start of the project, as part of the data management plan, USGS scientists must identify an appropriate digital data repository. USGS digital data and associated metadata must be stored in digital repositories approved by the USGS. Non-USGS repositories may also be used for release of USGS data as long as the Bureau maintains the authoritative copy.

SM 502.8 - Fundamental Science Practices: Review and Approval of Scientific Data for Release states:

USGS data approved for release are made available at no cost to the public and are managed through a data repository that can ensure their long-term preservation, discoverability, accessibility, and usability as described in SM 502.9.

 


Page last updated 8/20/24.
