Data Citation

Data citation is the practice of referencing data products used in research. A data citation includes key descriptive information about the data, such as the title, source, and responsible parties.

Why Cite Data? 

The goal of data citation is to provide scientific transparency and attribution. Data citations benefit the researcher, funding organization, data repositories, scientific community, and general public. Data citations serve many purposes:  

  • To aid scientific trustworthiness and reproducibility  

  • To provide fair credit for data creators or authors and data stewards

  • To ensure scientific transparency and reasonable accountability for data authors and stewards

  • To aid in tracking the impact of a dataset and the associated repository 

  • To help data authors verify how their data are being used

  • To help future users identify how others have used the data (ESIP Data Preservation and Stewardship Committee, 2019)

For more information see Data Citation FAQ 1.2

 

Data Citation Elements 

A typical data citation generally consists of seven elements:

If relevant, other elements can be included, such as query parameters, a direct access link, the data format, a third-party producer, the name of an editor or contributor, the publication place, or the larger work that contains the data.

For more details on the elements that should be included in a data citation see Data Citation FAQ 1.3

 

Examples 

Example Data Citations in a Publication’s Text and References 

  • Reference to data release in publication: https://doi.org/10.3133/sir20225004  

    • In text: “Ground surface elevation (GSE) data were collected at the downstream section of the Middle Rice Creek Restoration (fig. 1; Groten and others, 2022)”. 
    • In references: Groten, J.T., Livdahl, C.T., and DeLong, S.B., 2022, Suspended sediment and bedload data, simple linear regression models, loads, elevation data, and FaSTMECH models for Rice Creek, Minnesota, 2010-2019: U.S. Geological Survey data release, https://doi.org/10.5066/P9SJIY32.
  • Reference to data release in publication: https://doi.org/10.3133/sir20225013

    • In text: “Scripts and data used to perform analyses for this study are available in a USGS data release (Tatge and others, 2022)”. 

    • In references: Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2022, Data and scripts used in water-quality trend and load analysis in the Heart River Basin, North Dakota, 1970–2020: U.S. Geological Survey data release, accessed February 2022 at https://doi.org/10.5066/P987APZ8.

  • Reference to data release in publication: https://doi.org/10.1016/j.isprsjprs.2021.08.014

    • In text: “Model predictor variables and outputs developed for this study are included in Enwright et al. (2021)”. 

    • In references: Enwright, N.M., Kranenburg, C.J., Patton, B.A., Dalyander, P.S., Brown, J.A., Piazza, S.C., Cheney, W.C., 2021, Developing Bare-Earth Digital Elevation Models from Structure-from-Motion Data on Barrier Islands: U.S. Geological Survey data release, https://doi.org/10.5066/P99PX0O3

For examples of the types of data you should cite in reports and publications, see Data Citation FAQ 2.1

Example Data Citations for USGS Released Data 

  • Groten, J.T., Livdahl, C.T., and DeLong, S.B., 2022, Suspended sediment and bedload data, simple linear regression models, loads, elevation data, and FaSTMECH models for Rice Creek, Minnesota, 2010-2019: U.S. Geological Survey data release, https://doi.org/10.5066/P9SJIY32

  • Abdollahian, N., Jones, J.L., Ball, J.L., Wood, N.J., and Mangan, M.T., 2018, Data release for results of societal exposure to California's volcanic hazards (ver. 3.0, November 2019): U.S. Geological Survey data release, accessed February 10, 2020, at https://doi.org/10.5066/F7W66JRH.  

  • U.S. Geological Survey, 2020, BioData—Aquatic bioassessment data for the Nation: U.S. Geological Survey database, accessed February 20, 2020, at https://doi.org/10.5066/F77W698B.  
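The USGS examples above follow a recognizable pattern: authors, year, title (with an optional version), publisher and resource type, an optional access date, and the DOI. As a rough illustration only, the Python sketch below assembles a citation string in that pattern. The `DataCitation` class and its field names are hypothetical, chosen for this example from the citations shown on this page; they are not an official USGS schema or tool, and the recommended citation supplied with a data release should always take precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the parts visible in the example citations."""

    authors: str                    # e.g. "Groten, J.T., Livdahl, C.T., and DeLong, S.B."
    year: int                       # publication year of the data
    title: str                      # title of the data product
    publisher_and_type: str         # e.g. "U.S. Geological Survey data release"
    doi_url: str                    # persistent identifier as a resolvable URL
    version: Optional[str] = None   # e.g. "ver. 3.0, November 2019"
    accessed: Optional[str] = None  # e.g. "February 10, 2020"
    note: Optional[str] = None      # e.g. "Subset Query: 8297991"

    def render(self) -> str:
        """Return the citation as one line, mirroring the examples on this page."""
        # The version, when present, follows the title in parentheses.
        title = self.title if self.version is None else f"{self.title} ({self.version})"
        # The access date, when present, precedes the DOI.
        if self.accessed is None:
            location = self.doi_url
        else:
            location = f"accessed {self.accessed}, at {self.doi_url}"
        text = f"{self.authors}, {self.year}, {title}: {self.publisher_and_type}, {location}."
        # Extra details such as query parameters go in trailing brackets.
        if self.note:
            text += f" [{self.note}]."
        return text


if __name__ == "__main__":
    citation = DataCitation(
        authors="Abdollahian, N., Jones, J.L., Ball, J.L., Wood, N.J., and Mangan, M.T.",
        year=2018,
        title="Data release for results of societal exposure to California's volcanic hazards",
        publisher_and_type="U.S. Geological Survey data release",
        doi_url="https://doi.org/10.5066/F7W66JRH",
        version="ver. 3.0, November 2019",
        accessed="February 10, 2020",
    )
    print(citation.render())
```

Keeping the version and access date optional reflects the difference between static data, which may be cited with the DOI alone, and versioned or dynamic data, where the reader needs to know which state of the data was used.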

Example Data Citations for Non-USGS Data

  • Hall, D.K., and Riggs, G.A., 2016, MODIS/Terra snow cover daily L3 global 500m grid (ver. 6.0): National Snow and Ice Data Center Data Set MOD10A1, accessed February 2, 2019, at https://doi.org/10.5067/MODIS/MOD10A1.006. [Query parameters: Oct. 2007–Sep. 2008, 84°N, 75°W; 44°N, 10°W].

  • Ocean Networks Canada Society, 2019, Barkley Canyon upper slope fluorometer turbidity deployed 2019-05-16: Ocean Networks Canada Society dataset, accessed April 13, 2020, at https://doi.org/10.34943/fa04d675-3df2-4dc3-810b-cb365f7ec492. [Subset Query: 8297991].

For more examples of data citations see Data Citation FAQ 1.4

 

Still have questions?

USGS has documented frequently asked questions about data citation, including: 

  • Differences in citing static, versioned, and dynamic data

  • Who to include in the author list and when it is appropriate for the author list to change over time

  • How to cite a data product without a recommended data citation

 

What the U.S. Geological Survey Manual Requires: 

The USGS Survey Manual chapter SM 502.8, Fundamental Science Practices: Review and Approval of Scientific Data for Release, requires that data approved for release be assigned a persistent identifier, specifically a Digital Object Identifier (DOI) for scientific data obtained from the USGS registration agent, and be accompanied by a recommended citation.
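Because the DOI is the persistent identifier that a citation depends on, it can help to check that a DOI copied into a reference list is intact. The Python sketch below is one possible approach, not a USGS tool: the function name `normalize_doi_url` is hypothetical, the regular expression is a loose heuristic rather than the formal DOI syntax, and stripping a trailing period assumes it is sentence punctuation rather than part of the identifier.

```python
import re

# Heuristic shape of a DOI: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for illustration, not the formal DOI specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in reference lists.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi_url(raw: str) -> str:
    """Return a clean https://doi.org/ URL, or raise ValueError if it does not look like a DOI."""
    # Remove all whitespace, including spaces introduced by line wraps when copying.
    compact = "".join(raw.split())
    # Drop a resolver prefix or "doi:" label if one is present.
    for prefix in PREFIXES:
        if compact.lower().startswith(prefix):
            compact = compact[len(prefix):]
            break
    # Assume a trailing period is the end of the citation sentence.
    compact = compact.rstrip(".")
    if not DOI_PATTERN.match(compact):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return "https://doi.org/" + compact


if __name__ == "__main__":
    print(normalize_doi_url("doi:10.5066/F7W66JRH"))
```

A check like this only confirms the shape of the identifier; it does not confirm that the DOI is registered or that it resolves to the intended data release.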



SM 1100.5 - Authorship, Acknowledgments, and Credits in USGS Information Products states that authorship of USGS information products provides credit and assigns responsibility for information contained in the product. The senior author is responsible for acknowledging contributions and crediting cooperators from other agencies.

 


References 

  • ESIP Data Preservation and Stewardship Committee, 2019, Data citation guidelines for earth science data, version 2: Earth Science Information Partners web page, accessed April 22, 2020, at https://doi.org/10.6084/m9.figshare.8441816