Data Management

Data Citation

Data citation is the practice of citing a dataset in the same way that books or journal articles are referenced in research publications. In general, data citation is a good practice that benefits the researcher, data repositories and stewards, the scientific community, and the general public.

Cite Your Own Data, Too!

It is important to cite other people's data when you use them, but it is also important to cite your own data to enable readers to locate and potentially reuse your data.

Digital Object Identifiers in Data Citations

Assigning a digital object identifier to a dataset is important for ensuring that a data citation remains persistent.

Why Cite Data? 

  • Citing datasets provides acknowledgement for those who create and manage data. It gives the researcher proper credit and serves as recognition of scholarly effort. It also gives credit to data stewards and repositories who manage the data for the long term.
  • Data citation creates accountability for creators and stewards of the dataset and reduces the danger of plagiarism once the dataset has been properly cited.
  • Data citation allows others to more easily locate and access a researcher's dataset for the purposes of replicating or verifying their results. Additionally, easy location and access can facilitate discovery and encourage possible reuse of the dataset.
  • Data citation creates a formalized system of recognition and reward for data producers by treating a dataset as a citable contribution to the scientific community. It also allows the impact of the dataset to be tracked through the publications that cite it.
  • Data citation in publications can increase the transparency of data production and encourage the production of more high-quality datasets.


Data Citation Standards 

To support proper data citation, several institutions and organizations have created standards for citing datasets. The mechanics of citing datasets are generally similar to those of citing journal articles and other publications. The author(s), year, title, archive/distributor, and access date are the most obvious components of a data citation.

However, datasets can be more difficult to cite because they can be more dynamic in terms of content and version. For example, a dataset can consist of multiple versions of the raw data, or it can be part of a larger dataset. The dataset itself can change over time as researchers modify or add more data. Therefore, a dataset needs a persistent identifier or locator that can be added to the citation in order to better track the dataset.

A typical data citation consists of seven elements:

  • Author or principal investigator
  • Release date
  • Title of the data
  • Version or edition number
  • Archive and/or distributor
  • Persistent locator/identifier [see Publish/Share > Digital Object Identifiers for more information]
  • Access date and time

If relevant, other elements can be included, such as data format, third-party producer, subset of the data, name of editor or contributor, publication place, and data within a larger work.
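The seven elements above can be assembled into a citation string programmatically. The following is a minimal sketch, not an official format: the `DataCitation` class, its field names, the `render` method, and the ordering and punctuation are all assumptions made for illustration, and the sample values are hypothetical placeholders rather than a real dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Holds the seven citation elements (names are illustrative only)."""
    authors: str                    # author or principal investigator
    release_year: str               # release date (year only in this sketch)
    title: str                      # title of the data
    archive: str                    # archive and/or distributor
    identifier: str                 # persistent locator/identifier, e.g. a DOI
    version: Optional[str] = None   # version or edition number
    accessed: Optional[str] = None  # access date (and time, if needed)

    def render(self) -> str:
        """Join the elements into one string, skipping optional ones that are absent."""
        title = self.title if self.version is None else f"{self.title}, {self.version}"
        parts = [self.authors, self.release_year, title, self.archive]
        # Strip trailing periods so that joining does not produce "..".
        text = ". ".join(p.rstrip(".") for p in parts) + "."
        if self.accessed:
            return f"{text} Accessed {self.accessed} at {self.identifier}."
        return f"{text} {self.identifier}."


# Hypothetical placeholder values, not a real dataset or DOI.
citation = DataCitation(
    authors="Doe, J., and Roe, R.",
    release_year="2020",
    title="Example stream temperature data",
    archive="Example Data Repository",
    identifier="doi:10.0000/example",
    version="ver. 2.0",
    accessed="2021-01-15",
)
print(citation.render())
```

When an editor or publisher requires a particular citation style, only the `render` method would need to change; the stored elements stay the same.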


Best Practices to Support Data Citation 

  • Assign persistent identifiers to your datasets.
    • If possible, assign a new identifier to each new version of the dataset.
  • Use applications that support metadata creation for your dataset.
    • Good metadata associated with a dataset is important for access and potential reuse.
    • See Describe > Metadata under "Tools" for more information.
  • Archive the dataset with journal publishers and data repositories during the publication process.
  • When citing a dataset in a paper:
    • Use the citation style required by the editor or publisher. If there is no standard, follow a typical format and adapt it to match the style for textual publications.
    • Notify the data repository that holds the dataset so they can add a link to the dataset in your paper.
  • Encourage other data producers to cite their datasets and make their data available for reuse.



Example Data Citations for USGS Released Data

  • Moschetti, M.P., 2017, Database of earthquake ground motions from 3-D simulations on the Salt Lake City segment of the Wasatch fault zone, Utah: U.S. Geological Survey data release,
  • McLeod, J.M., Jelks, Howard, Pursifull, Sandra, and Johnson, N.A., 2016, Characterizing the early life history of an imperiled freshwater mussel (Ptychobranchus jonesi): U.S. Geological Survey data release,
  • Barber, L.B., Weber, A.K., LeBlanc, D.R., Hull, R.B., Sunderland, E.M., and Vecitis, C.D., 2017, Poly- and perfluoroalkyl substances in contaminated groundwater, Cape Cod, Massachusetts, 2014-2015 (ver. 1.1, March 24, 2017): U.S. Geological Survey data release,


Example Data Citation for Non-USGS Data

The following example of a dataset citation is from the Earth Science Information Partners (ESIP).

  • Zwally, H.J., R. Schutz, C. Bentley, J. Bufton, T. Herring, J. Minster, J. Spinhirne, and R. Thomas. 2003. GLAS/ICESat L1A Global Altimetry Data V018, 15 October to 18 November 2003. National Snow and Ice Data Center. dataset accessed 2011-07-21 at doi:10.3334/NSIDC/gla01.
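The identifier at the end of this citation is written with a `doi:` prefix. A DOI in that form can be turned into a resolvable link by appending it to the `https://doi.org/` resolver. The helper below is a small sketch of that conversion; the function name `doi_to_url` is hypothetical, and the check that a DOI begins with `10.` is a simplification rather than full validation. Characters in the DOI that need URL encoding are not handled here.

```python
def doi_to_url(identifier: str) -> str:
    """Turn 'doi:10.xxxx/suffix', a bare DOI, or a DOI link into an https://doi.org/ URL."""
    doi = identifier.strip()
    lowered = doi.lower()
    # Drop a leading "doi:" label or an existing resolver address, if present.
    for prefix in (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    ):
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Simplified sanity check: DOI names begin with the "10." directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + doi


# The identifier from the citation above.
print(doi_to_url("doi:10.3334/NSIDC/gla01"))
```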


What the U.S. Geological Survey Manual Requires: 

The USGS Survey Manual chapter SM 502.8 Fundamental Science Practices: Review and Approval of Scientific Data for Release requires that data approved for release be assigned a persistent identifier, specifically a Digital Object Identifier (DOI) for scientific data, obtained from the USGS registration agent, and be accompanied by a recommended citation.

SM 1100.5 - Authorship, Acknowledgments, and Credits in USGS Information Products states that authorship of USGS information products provides credit and assigns responsibility for information contained in the product. The senior author is responsible for acknowledging contributions and crediting cooperators from other agencies.


Recommended Reading