An official website of the United States government
A data repository is a centralized location to store, curate, and maintain data. Data repositories are often managed by data curation staff who ensure that files are managed and preserved over the long term.
Archive vs. Repository: Is There a Difference?
In the field of data management, the terms "archive" and "repository" often are used interchangeably. Within the Federal government, however, the term "archive" is specific to the mission and activities of the National Archives and Records Administration (NARA).
Many Types of Repositories
This page primarily discusses digital data repositories; however, there are many different types of repositories. The USGS has a number of repositories for physical samples, as well.
Storing data in data repositories and data warehouses is highly encouraged and is part of the Preserve stage of the data lifecycle. Data repositories can make a researcher's data more discoverable and accessible, which can lead to reuse.
Data repositories can also serve as a backup in the rare event that data are lost to the researcher and must be retrieved. However, researchers should still perform their own data backups rather than rely on a data repository as their only backup.
Depending on the field, scientists may be required to store their data in certain repositories. Examples of repositories include the Core Research Center, the National Ice Core Laboratory, and the National Water Information System.
Best Practices
Check the list of acceptable digital repositories for USGS Scientific Publications and Data. Follow appropriate guidelines specified by the data repository to which you are submitting.
ScienceBase is an information management platform designed to centralize and preserve USGS science and products. ScienceBase is considered a Trusted Digital Repository by the USGS and accepts USGS data from all disciplines. USGS researchers can learn more about formally publishing data to ScienceBase on the ScienceBase Instructions and Documentation site.
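As an illustration of how ScienceBase content can be found programmatically, the following minimal sketch builds a keyword search against the public ScienceBase catalog and returns the parsed JSON response. It assumes the catalog accepts the `q`, `format`, and `max` query parameters at `https://www.sciencebase.gov/catalog/items`; the function names are hypothetical, and the ScienceBase Instructions and Documentation site remains the authoritative reference.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ScienceBase catalog search endpoint; confirm against the
# ScienceBase Instructions and Documentation site before relying on it.
SCIENCEBASE_ITEMS = "https://www.sciencebase.gov/catalog/items"


def sciencebase_search_url(query: str, max_items: int = 10) -> str:
    """Build a keyword-search URL that asks the catalog for JSON results.

    The parameter names q, format, and max are assumptions about the API.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    params = {"q": query, "format": "json", "max": max_items}
    return f"{SCIENCEBASE_ITEMS}?{urlencode(params)}"


def search_sciencebase(query: str, max_items: int = 10) -> dict:
    """Run the search over HTTPS and return the decoded JSON body."""
    with urlopen(sciencebase_search_url(query, max_items), timeout=30) as resp:
        return json.load(resp)
```

For example, `sciencebase_search_url("groundwater nitrate", max_items=5)` returns `https://www.sciencebase.gov/catalog/items?q=groundwater+nitrate&format=json&max=5`. Keeping URL construction separate from the network call makes the request easy to inspect and test.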
The National Water Information System (NWIS) provides access to water-resources data collected at approximately 1.5 million sites in all 50 States, the District of Columbia, Puerto Rico, the Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands. Online access to these data is organized around the following categories:
Current Conditions
Site Information
Surface Water
Groundwater
Water Quality
The USGS investigates the occurrence, quantity, quality, distribution, and movement of surface and underground waters and disseminates the data to the public, State and local governments, public and private utilities, and other Federal agencies involved with managing our water resources.
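NWIS data can also be retrieved through web services. The sketch below assumes the legacy NWIS Instantaneous Values service at `https://waterservices.usgs.gov/nwis/iv/`, which returns the most recent value for a site when no time period is given, and its WaterML-style JSON layout (`value.timeSeries[].values[].value[]`). USGS is modernizing its water data APIs, so the endpoint and response structure should be checked against current USGS documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Legacy NWIS Instantaneous Values service (assumed; USGS is migrating
# NWIS to newer APIs, so confirm this endpoint is still current).
NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"


def nwis_iv_url(site: str, parameter_code: str = "00060") -> str:
    """Build a request for the most recent values at one site.

    00060 is the NWIS parameter code for discharge, cubic feet per second.
    """
    params = {"format": "json", "sites": site, "parameterCd": parameter_code}
    return f"{NWIS_IV_URL}?{urlencode(params)}"


def latest_values(payload: dict) -> list[tuple[str, str]]:
    """Flatten a WaterML-style JSON response into (dateTime, value) pairs."""
    pairs = []
    for series in payload["value"]["timeSeries"]:
        for block in series["values"]:
            for point in block["value"]:
                pairs.append((point["dateTime"], point["value"]))
    return pairs


def fetch_latest(site: str, parameter_code: str = "00060") -> list[tuple[str, str]]:
    """Download and flatten the latest values for one site."""
    with urlopen(nwis_iv_url(site, parameter_code), timeout=30) as resp:
        return latest_values(json.load(resp))
```

Calling `fetch_latest` with a USGS site number returns a list of `(dateTime, value)` pairs. Values are kept as the strings that appear in the JSON, so the caller decides how to convert them and how to handle missing-data markers.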
Earth Resources Observation and Science (EROS) Center is a USGS data repository that is also considered an archive. EROS stores and serves remotely sensed images of the Earth's land surface. These data are acquired by civilian satellites and aircraft and used to study a wide range of natural hazards, global environmental change, and economic development and conservation issues.
Available data include:
Aerial Photography
Satellite Imagery
Elevation
Land Cover
Digitized Maps
Image Gallery Collections
EROS staff members manage and distribute these data to scientists, policy makers, and educators worldwide.
The Coastal and Marine Geoscience Data System (CMGDS) provides data services for published data from the U.S. Geological Survey Coastal and Marine Geology Program (CMGP). CMGP data are made available through Open Geospatial Consortium (OGC) standards-based services; are served to the GeoMapApp 2-D and 3-D earth-browsing tools for data integration, visualization, and analysis; and are described in metadata catalogs for data discovery. The CMGDS is a work in progress: currently, most of its content is geophysical data, and it plans to expand its holdings to include other data types over time.
The CMGDS supports two uses: data discovery and data access. Data access is provided through direct download and a variety of web services. For data discovery, a single user can search the site directly, or other metadata collections can harvest it. The CMGDS can also be accessed by software capable of using its information for metadata search or GIS display.
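As one example of an OGC standards-based service, the sketch below assumes a Web Map Service (WMS) 1.3.0 endpoint. It builds a `GetCapabilities` request, the standard WMS operation that describes a server and its layers, and extracts the named layers from the returned XML. The endpoint URLs are hypothetical placeholders, not actual CMGDS addresses, and the CMGDS may expose other OGC service types instead of or in addition to WMS.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by WMS 1.3.0 capabilities documents.
WMS_NS = {"wms": "http://www.opengis.net/wms"}


def get_capabilities_url(endpoint: str) -> str:
    """Append a WMS 1.3.0 GetCapabilities query to a (hypothetical) endpoint."""
    params = urlencode({"service": "WMS", "request": "GetCapabilities", "version": "1.3.0"})
    # Some servers already carry a query string (e.g. a map parameter).
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{params}"


def layer_names(capabilities_xml: str) -> list[str]:
    """Return the Name of every layer that has one, in document order.

    Layers without a Name (such as a grouping root layer) are skipped.
    """
    root = ET.fromstring(capabilities_xml)
    return [
        name.text
        for name in root.iterfind(".//wms:Layer/wms:Name", WMS_NS)
        if name.text
    ]
```

Listing the layer names first is a common way to discover what a map service offers before requesting any map images.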
GenBank (National Center for Biotechnology Information (NCBI))
GenBank, managed by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), is used by geneticists worldwide as a repository for genetic sequence data. Every sequence is given a globally unique accession number, which can be used to retrieve the exact sequence and should be cited in the accompanying documentation. USGS genetic data releases often include GenBank accession numbers along with other associated data.
GenBank has very specific QA/QC protocols and documentation requirements for sequences submitted for accession in the database; however, the metadata standard required by GenBank differs from the FGDC-endorsed metadata required for USGS data releases. Because of this difference, scientists are required to create a separate metadata record in a USGS-approved standard (CSDGM or ISO) to describe the GenBank data. This metadata record must also be submitted to the USGS Science Data Catalog (SDC).
For data in GenBank, authors could create a high-level metadata record, describing genomic data generated by the project, and submit the record to the SDC directly. For cases in which a data release contains data files in addition to the data in GenBank, authors could create a ScienceBase data release for the additional data and metadata. The landing page could describe the project, with information about the generated data (GenBank and other) and could contain the GenBank accession number. The metadata record from the landing page in ScienceBase would then be automatically harvested by the SDC.
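Because each GenBank sequence has a unique accession number, the exact sequences cited in a data release can be retrieved programmatically. The sketch below builds a request to NCBI's E-utilities `efetch` service for nucleotide records in FASTA format. The accession numbers in the usage example are placeholders, the function names are hypothetical, and NCBI asks automated clients to identify themselves (for example with `tool` and `email` parameters) and to limit their request rate, so consult the E-utilities documentation before running this at scale.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions: list[str]) -> str:
    """Build an efetch URL that returns FASTA text for nucleotide accessions."""
    cleaned = [a.strip() for a in accessions if a.strip()]
    if not cleaned:
        raise ValueError("at least one accession number is required")
    params = {
        "db": "nuccore",          # nucleotide database that holds GenBank records
        "id": ",".join(cleaned),  # comma-separated accession numbers
        "rettype": "fasta",
        "retmode": "text",
    }
    # Keep commas readable in the query string instead of percent-encoding them.
    return f"{EFETCH_URL}?{urlencode(params, safe=',')}"


def fetch_fasta(accessions: list[str]) -> str:
    """Download FASTA text for the given accessions."""
    with urlopen(efetch_fasta_url(accessions), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `efetch_fasta_url(["AB123456", "XY654321.1"])` (placeholder accessions) produces a single request for both sequences, which is gentler on the service than one request per accession.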
At the start of the project, as part of the data management plan, USGS scientists must identify an appropriate digital data repository. USGS digital data and associated metadata must be stored in digital repositories approved by the USGS. Non-USGS repositories may also be used for release of USGS data as long as the Bureau maintains the authoritative copy.
USGS data approved for release are made available at no cost to the public and are managed through a data repository that can ensure their long-term preservation, discoverability, accessibility, and usability as described in SM 502.9.