An official website of the United States government
Here's how you know
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
Secure .gov websites use HTTPS
A lock () or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.
Data standards are the guidelines by which data are described and recorded. In order to share, exchange, combine and understand data, we must standardize the format as well as the meaning.
Who produces and ratifies data standards?
Data standards are produced by a consensus of subject matter experts and are ratified by a standards authority such as the International Organization for Standardization (ISO) and the Federal Geographic Data Committee (FGDC).
Data stewards and data managers can help determine the appropriate data standards to use for a project. Researchers are responsible for implementing the use of data standards in their projects.
Standards make it easier to create, share, and integrate data by ensuring that the data are represented and interpreted correctly. Standards also reduce the time spent cleaning and translating data. Cleansing “dirty data” is a common barrier encountered by scientists, taking 26% of data scientists’ on-the-job time (Anaconda, 2020). For example, when integrating datasets from different sources, each of which used a different format for their date variable (e.g., April 2, 2024, 04-02-24, 04/02/2024), it would be a time consuming task to interpret and convert the dates into a common format before integrating the data.
Dataset-level standards specify the scientific domain, structure, relationships, field labels, and parameter-level standards for the dataset as a whole. A dataset-level standard is normally documented with a data dictionary (link to data dictionary page). See below for examples of formal dataset-level standards:
Parameter-level standards define the format and units for a given parameter or field within a dataset and help users correctly interpret the values. Parameter-level standards should be adopted at the time of data collection, that is when values in a field are created or recorded. If standardization of a parameter in an existing dataset will result in any loss of original detail or information, a best practice is to retain the original parameter and add a separate field for the standardized parameter. The Darwin Core standard, for example, provides verbatim fields for this purpose.
Below are some examples of common earth and biological science data parameters and data standards.
Data Standard: IUPAC-IUGS common definition and convention on the use of the year as a derived unit of time (IUPAC Recommendations 2011): http://doi.org/10.1351/PAC-REC-09-01-22
* If ITIS does not address your nomenclature requirements, (contact the ITIS team (itiswebmaster@itis.gov)); and/or reference another appropriate taxonomic authority .
Table Caption: This excerpt from an occurrence record for a U.S. native bee species uses parameter-level standards recommended by the Darwin Core dataset-level standard (the full record can be viewed at https://www.gbif.org/occurrence/1456598984).
Data encoding and interface standards
Most dataset-level standards (see above) also offer guidance on how to encode data. Data encoding standards define the rules for structuring and organizing data for use in a given context. These standards ensure that when applications read data, the information and context is preserved (OGC, 2020a). Data encoding standards are generally associated with a file format (see File Formats page). Researchers should use universal, open source data encoding standards and long-term access, open file formats whenever possible. Character encoding standards, such as Unicode Transformation Format (UTF-8) ensure that characters in the data are correctly interpreted.
Below are some examples of open data encoding standards used in the earth and biological sciences. Abbreviations and acronyms are defined at the end of this section.
GeoTIFF and Cloud Optimized GeoTIFF: provides the rules for describing geographic image data using the TIFF file format
GeoJSON: defines the rules for describing geographic features using JSON.
NetCDF: supports electronic encoding of geospatial data, specifically digital geospatial information representing space and time-varying phenomena, using the HDF file format.
NARA RFC 4180: There is no single encoding standard for creating CSV files; however, the Library of Congress uses the NARA RFC 4180 as the encoding format specification to define the structure of CSV files.
OGC Web Map Service: allows users to remotely access georeferenced map images via HTTPS requests.
OGC Web Coverage Service: allows users to access online geospatial data in multiple raster-based data formats (e.g. GeoTiffs, .img, ENVI (.hdr) file types).
GML Web Feature Service: allows users to access online geospatial data at the feature level using formats such as Shapefile, GML, etc.
Abbreviations and Acronyms
CSV – comma separated values
ENVI - ENvironment for Visualizing Images
GML - Geography Markup Language
HTTPS – Hypertext Transfer Protocol Secure
JSON - JavaScript Object Notation
NARA RFC - National Archives and Records Administration Request For Comments
NetCDF - Network Common Data Form
OGC - Open Geospatial Consortium
TIFF - Tag Image File Format
Documenting data standards in metadata
Parameter-level and dataset-level data standards should be documented in the accompanying data dictionary and metadata record. For example, if following the Content Standard for Digital Geospatial Metadata (CSDGM), the dataset-level data standards can be documented in the Entity and Attribute Overview Description section of the metadata and the parameter-level data standards can be documented in the Entity and Attribute Detailed Description for each attribute described in the metadata record.
In any given Federal agency, there can be multiple groups of experts that produce standards and authorities that approve them, often based on the science topic. There is no single group that establishes or recommends standards for the USGS.
Standards often evolve out of communities of practice coming together and agreeing on common practices. Here is an example of an evolving best practice that hasn’t been formally ratified by a standards authority. Does your USGS project use community guidelines that you’d like us to link here? Contact GS_Data_Management@usgs.gov.
"The data collected and the techniques used by USGS scientists should conform to or reference national and international standards and protocols if they exist and when they are relevant and appropriate. For datasets of a given type, and if national or international metadata standards exist, the data are indexed with metadata that facilitate access and integration."
USGS Survey ManualChapter SM 502.6 - Fundamental Science Practices: Scientific Data Management Foundation specifies that a data management plan will include standards and intended actions as appropriate to the project for acquiring, processing, analyzing, preserving, publishing/sharing, describing, and managing the quality of, backing up, and securing the data holdings.