Data Management

Repositories

A data repository is a centralized place to store and maintain data. A repository can consist of one or more databases or files, which may be distributed across a network. Data repositories are often managed by data curation personnel who ensure that files are managed and preserved over the long term.

Archive vs. Repository: Is There a Difference?


In the field of data management, the terms "archive" and "repository" often are used interchangeably; however, within the Federal government, the term "archive" has a special meaning.


Many Types of Repositories


This page primarily discusses digital data repositories; however, there are many different types of repositories. The USGS has a number of repositories for physical samples, as well.


Why Use a Repository? 

    Storing data in data repositories and data warehouses is highly encouraged and is part of the Preserve stage of the data lifecycle. Data repositories can make a researcher's data more discoverable and accessible, which can lead to reuse.

    Data repositories can also serve as backups in the rare event that data are lost to the researcher and must be retrieved. However, researchers should still perform their own data backups and not rely on data repositories as their only backups.

    Depending on the field, scientists may be required to store their data in certain repositories. Examples of repositories include the Core Research Center, the National Ice Core Laboratory, and the National Water Information System.

     

    Best Practices 

     

    Example USGS Repositories 


    ScienceBase

    ScienceBase is an information management platform designed to centralize and preserve USGS science and products. ScienceBase is considered a Trusted Digital Repository by the USGS and accepts USGS data from all disciplines. USGS researchers can learn more about formally publishing data to ScienceBase at https://www.sciencebase.gov/about/content/data-release.

     

    NWIS - National Water Information System


    The National Water Information System (NWIS) provides access to water-resources data collected at approximately 1.5 million sites in all 50 States, the District of Columbia, Puerto Rico, the Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands. Online access to these data is organized around the following categories:

    • Current Conditions
    • Site Information
    • Surface Water
    • Groundwater
    • Water Quality

    The USGS investigates the occurrence, quantity, quality, distribution, and movement of surface and underground waters and disseminates the data to the public, State and local governments, public and private utilities, and other Federal agencies involved with managing our water resources.
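As one illustration of programmatic access, the Python sketch below builds a request URL for recent instantaneous values at a single NWIS site. The endpoint `https://waterservices.usgs.gov/nwis/iv/`, its query parameters (`format`, `sites`, `parameterCd`, `period`), and the example site number are assumptions drawn from the legacy NWIS web services rather than from this page; check current USGS documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of the legacy NWIS instantaneous-values web service.
# USGS services change over time; verify against current documentation.
NWIS_IV_BASE = "https://waterservices.usgs.gov/nwis/iv/"


def nwis_iv_url(site: str, parameter_code: str = "00060", period: str = "P7D") -> str:
    """Build a request URL for recent instantaneous values at one NWIS site.

    site: USGS site number (the value used below is only an example)
    parameter_code: USGS parameter code; "00060" is discharge in cubic feet per second
    period: ISO 8601 duration, e.g. "P7D" for the last seven days
    """
    query = {
        "format": "json",
        "sites": site,
        "parameterCd": parameter_code,
        "period": period,
    }
    return f"{NWIS_IV_BASE}?{urlencode(query)}"


if __name__ == "__main__":
    # Example site number, used for illustration only.
    print(nwis_iv_url("01646500"))
```

To retrieve the data, the URL could be passed to `urllib.request.urlopen` and the response parsed with `json.load`; the structure of the returned JSON is not described on this page, so inspect it before relying on specific fields.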

     

    EROS - Earth Resources Observation and Science Center


    The Earth Resources Observation and Science (EROS) Center is a USGS data repository that is also considered an archive. EROS stores and serves remotely sensed images of the Earth's land surface. These data are acquired by civilian satellites and aircraft and are used to study a wide range of natural hazards, global environmental change, and economic development and conservation issues.

    Available data include:

    • Aerial Photography
    • Satellite Imagery
    • Elevation
    • Land Cover
    • Digitized Maps
    • Image Gallery Collections

    EROS staff members manage and distribute these data to scientists, policy makers, and educators worldwide.

     

    Coastal and Marine Geoscience Data System (CMGDS) 

    The Coastal and Marine Geoscience Data System (CMGDS) provides data services for published U.S. Geological Survey Coastal and Marine Geology Program (CMGP) data. CMGP data are accessible through Open Geospatial Consortium (OGC) standards-based services; through the GeoMapApp and Virtual Ocean 2-D and 3-D Earth-browsing tools, for data integration, visualization, and analysis; and through metadata catalogs for data discovery. Note that this site is a work in progress. Currently, the bulk of its content is geophysical data; in time, its holdings will expand to include other data types.

    The CMGDS can be used in two ways: data discovery and data access. Data access is provided through direct download and a variety of web services. Data discovery can be done locally by a single user, or the site can be harvested by other metadata collections. The CMGDS can also be accessed by software capable of using its information for metadata search or GIS display.
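OGC services like those mentioned above can be queried by any standards-aware client. As a minimal sketch, assuming a Web Map Service (WMS) 1.3.0 endpoint, the Python below builds a GetCapabilities request and extracts layer names from the returned XML. The endpoint URL is a hypothetical placeholder, and whether a particular CMGDS service exposes WMS (rather than another OGC service type) is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hypothetical placeholder endpoint; substitute the URL of an actual service.
EXAMPLE_ENDPOINT = "https://example.org/cmgds/wms"


def wms_get_capabilities_url(base_url: str, version: str = "1.3.0") -> str:
    """Build an OGC WMS GetCapabilities request URL for a service endpoint."""
    params = {"service": "WMS", "request": "GetCapabilities", "version": version}
    # Choose the separator depending on whether a query string already exists.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return f"{base_url}{sep}{urlencode(params)}"


def layer_names(capabilities_xml: str) -> list[str]:
    """Return the names of all named layers in a WMS 1.3.0 capabilities document."""
    root = ET.fromstring(capabilities_xml)
    return [
        el.text.strip()
        for el in root.iterfind(".//wms:Layer/wms:Name", WMS_NS)
        if el.text
    ]


if __name__ == "__main__":
    print(wms_get_capabilities_url(EXAMPLE_ENDPOINT))
```

A GIS client performs essentially these two steps (request the capabilities document, then read the advertised layers) before displaying any map data.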

     

    See the full list of acceptable digital repositories for USGS scientific publications and data

     

    What the U.S. Geological Survey Manual Requires: 

    SM 502.9 - Fundamental Science Practices: Preservation Requirements for Digital Scientific Data states:

    At the start of the project, as part of the data management plan, USGS scientists must identify an appropriate digital data repository. USGS digital data and associated metadata must be stored in digital repositories approved by the USGS. Non-USGS repositories may also be used for release of USGS data as long as the Bureau maintains the authoritative copy.

     

    SM 502.8 - Fundamental Science Practices: Review and Approval of Scientific Data for Release states:

    USGS data approved for release are made available at no cost to the public and are managed through a data repository that can ensure their long-term preservation, discoverability, accessibility, and usability as described in SM 502.9.

     
