A data dictionary is used to catalog and communicate the structure and content of data, and provides meaningful descriptions for individually named data objects.
Data dictionaries store and communicate metadata about the data in a database, in a system, or used by applications. Data dictionary contents can vary but typically include some or all of the following:
- A listing of data objects (names and definitions)
- Detailed properties of data elements (data type, size, nullability, optionality, indexes)
- Entity-relationship (ER) and other system-level diagrams
- Reference data (classification and descriptive domains)
- Missing data and quality-indicator codes
- Business rules, such as for validation of a schema or data quality
Data dictionaries support a range of activities:
- Documentation - provide data structure details for users, developers, and other stakeholders
- Communication - equip users with a common vocabulary and definitions for shared data, data standards, data flow and exchange, and help developers gauge impacts of schema changes
- Application Design - help application developers create forms and reports with proper data types and controls, and ensure that navigation is consistent with data relationships
- Systems Analysis - enable analysts to understand overall system design and data flow, and to find where data interact with various processes or components
- Data Integration - provide clear definitions of data elements, giving the contextual understanding needed when deciding how to map one data system to another, or whether to subset, merge, stack, or transform data for a specific use
- Decision Making - assist in planning data collection, project development, and other collaborative efforts
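As an illustration of the typical contents listed earlier (names and definitions, element properties, reference domains, and missing-data codes), a single dictionary entry might be modeled as below. This is only a sketch: the `DataElement` class, its field names, the `describe` helper, and the example codes are hypothetical and do not come from any USGS standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """One entry in a data dictionary (field names are illustrative only)."""
    name: str
    definition: str
    data_type: str
    size: Optional[int] = None
    nullable: bool = True
    indexed: bool = False
    domain: List[str] = field(default_factory=list)         # allowed values, if any
    missing_codes: List[str] = field(default_factory=list)  # codes meaning "no data"


def describe(element: DataElement) -> str:
    """Format one entry as a single line of documentation."""
    size_text = f"({element.size})" if element.size is not None else ""
    null_text = "NULL allowed" if element.nullable else "NOT NULL"
    return f"{element.name}: {element.data_type}{size_text}, {null_text} - {element.definition}"


# Hypothetical elements for a sampling-site table.
site_id = DataElement(
    name="site_id",
    definition="Unique identifier assigned to a sampling site.",
    data_type="text",
    size=15,
    nullable=False,
    indexed=True,
)
flow_quality = DataElement(
    name="flow_quality",
    definition="Quality-indicator code for the discharge measurement.",
    data_type="text",
    size=1,
    domain=["G", "E", "S"],   # hypothetical codes: good, estimated, suspect
    missing_codes=["-9999"],  # hypothetical missing-data code
)

print(describe(site_id))
print(describe(flow_quality))
```

Keeping each property in its own field, rather than in free text, makes it easier to export the dictionary to a table or check data against it later.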
For groups of people working with similar data, having a shared data dictionary facilitates standardization by documenting common data structures and providing the precise vocabulary needed for discussing specific data elements. Shared dictionaries ensure that the meaning, relevance, and quality of data elements are the same for all users. Data dictionaries also provide information needed by those who build systems and applications that support the data. Lastly, if there is a common, vetted, and documented data resource, it is not necessary to produce separate documentation for each implementation.
Examples of Shared USGS Data Dictionaries
- EarthExplorer USGS Landsat Data Dictionary
- U.S. Geological Survey Open-File Report 03-001: Data Dictionary
- Aerial Photo Single Frames Data Dictionary
- National Elevation Dataset (NED) Data Dictionary [PDF]
(Example only - updated NED Data Dictionary will be available soon)
- National Hydrography Dataset Data Dictionary
Examples of non-USGS Data Dictionaries
- Planetary Science Dictionary (NASA)
- MODIS Level 1B Products Data Dictionary (NASA)
- Data Dictionary for Organic Carbon Sorption and Decomposition in Selected Global Soils (ORNL)
- Human Health Risk Assessment Data Dictionary (ORNL)
- Climate and Forecast Conventions Standard Name Table
- Data Dictionary for the National Database of Deep-Sea Corals (NOAA)
- JPL Planetary Data System Data Dictionary
Plan ahead for storing data at the start of any project by developing a schema or data model as a guide to data requirements. As required and optional data elements are identified, add them to the data dictionary. When data structures change, update the dictionary. Try to use naming conventions appropriate to the system or subject area. The easiest path is to adopt and cite a data standard, thus avoiding the need to provide and manage your own documentation.
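One way to notice when a data structure has drifted from its dictionary is to compare the two directly. The sketch below assumes the dictionary is held as a list of records with `name` and `required` keys; that layout, the `compare_to_dictionary` function, and the column names are hypothetical.

```python
def compare_to_dictionary(dictionary, columns):
    """Return (missing_required, undocumented) for a dataset's column names."""
    documented = {entry["name"] for entry in dictionary}
    required = {entry["name"] for entry in dictionary if entry["required"]}
    present = set(columns)
    missing_required = sorted(required - present)  # required by the dictionary, absent from the data
    undocumented = sorted(present - documented)    # in the data, absent from the dictionary
    return missing_required, undocumented


# Hypothetical dictionary and dataset columns.
dictionary = [
    {"name": "site_id", "required": True},
    {"name": "sample_date", "required": True},
    {"name": "remarks", "required": False},
]
columns = ["site_id", "remarks", "collector"]

missing, undocumented = compare_to_dictionary(dictionary, columns)
print(missing)       # ['sample_date']
print(undocumented)  # ['collector']
```

An undocumented column signals that the dictionary needs an update; a missing required element signals that the data do not yet meet the planned schema.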
The Alaska Science Center Research Data Management Plan [PDF] has excellent examples of a Data Description Form and other forms to capture metadata before, during, and at the end of a project.
For both data reviewers and data users, the data dictionary can reveal potential credibility problems within the data. Poor table organization and object naming can severely limit how understandable and easy to use the data are, incomplete data definitions can render otherwise stellar data virtually useless, and failure to keep the dictionary up to date with the actual data structures suggests a lack of data stewardship. Although getting critical feedback about their data may be initially troublesome for some data creators, developing good data design and description habits is worth the effort and ultimately benefits everyone who will use the data.
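A reviewer could screen a dictionary for incomplete definitions with a check along these lines. It is a rough sketch, assuming the dictionary is available as a mapping of element name to definition text; the `audit_definitions` function and the example entries are hypothetical, and the two rules shown are far from a complete quality review.

```python
import re


def audit_definitions(entries):
    """Flag definitions that are empty or that merely restate the element name."""
    problems = []
    for name, definition in entries.items():
        text = (definition or "").strip()
        # Compare on lowercase letters and digits only, so "Site ID" matches "site_id".
        normalized_name = re.sub(r"[^a-z0-9]", "", name.lower())
        normalized_text = re.sub(r"[^a-z0-9]", "", text.lower())
        if not text:
            problems.append((name, "missing definition"))
        elif normalized_text == normalized_name:
            problems.append((name, "definition only restates the name"))
    return problems


# Hypothetical dictionary entries.
entries = {
    "site_id": "Site ID",
    "depth_m": "Depth of the sample below the water surface, in meters.",
    "qual_cd": "",
}

for name, problem in audit_definitions(entries):
    print(f"{name}: {problem}")
```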
Learn more about naming conventions and find guides to writing column descriptions at Best Practices for Data Dictionary Definitions and Usage and Captain Obvious' Guide to Column Descriptions - Data Dictionary Best Practices.
Most database management systems (DBMS) have built-in, active data dictionaries and can generate documentation as needed (for example, SQL Server, Oracle, and MySQL). The same is true when designing data systems with computer-aided software engineering (CASE) tools. The open source Analyzer tool for MS Access can be used to document Access databases and Access-connected data (SQL Server, Oracle, and others). Finally, use the Data Dictionary - Blank Template to manually create a simple data dictionary in Excel.
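To illustrate reading column properties from a database's built-in catalog, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is an assumption chosen only because it needs no server; it is not one of the systems named above, and each of those exposes its catalog differently. The `site` table and the `table_dictionary` function are hypothetical.

```python
import sqlite3


def table_dictionary(conn, table):
    """List column properties for one table from SQLite's own catalog."""
    # PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass trusted names only.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


# Hypothetical in-memory table used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site ("
    "site_id TEXT PRIMARY KEY NOT NULL, "
    "site_name TEXT NOT NULL, "
    "elevation_m REAL)"
)

site_columns = table_dictionary(conn, "site")
for row in site_columns:
    print(row)
```

The catalog supplies names, types, and nullability, but not definitions; those still have to be written by someone who knows the data.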
For information on creating a data dictionary in a formal metadata file (Entity and Attribute section) refer to the Metadata page.
USGS Survey Manual Chapter 502.7, "Fundamental Science Practices: Metadata for USGS Scientific Information Products Including Data," requires that metadata records for data include information such as who produced the data and why, methodologies and citations, collection and processing methods, definitions of entities and attributes, geographic location, and any access or use constraints, all of which facilitate evaluation of the data and information for use.
- Data Acquisition Methods - check the data dictionary when acquiring data from external sources
- Data and File Formats - capture file, table, and field names and properties in a data dictionary
- Data Modeling - gather data requirements and use design standards to help build data dictionaries
- Data Standards - use a standard that includes a fully defined data structure
- Data Templates - use a template for a predefined schema and data dictionary
- Domains - include domains (reference lists, lookup tables) as part of the dictionary information
- Naming Conventions - apply a consistent approach to create meaningful table and field names; consider a similar naming convention for files and folders
- Organize Files and Data - include the name and description of data files in the metadata and associate the file names with tables in the data dictionary
- DOI. 2008. Data Quality Management Guide [PDF].
- USGS Science Analytics and Synthesis (SAS) - Biocomplexity Thesaurus.
- Northwest Environmental Data-Network. Best Practices for Data Dictionary Definitions and Usage [PDF].
- Craven, T. University of Western Ontario. Thesaurus Construction: Welcome to the Introductory Tutorial on Thesaurus Construction.
Examples, Tools and Templates
- Entity/Attribute metadata for: Knight, R.R., Cartwright, J.M., and Ladd, D.E., 2016, Streamflow and fish community diversity data for use in developing ecological limit functions for the Cumberland Plateau, northeastern Middle Tennessee and southwestern Kentucky, 2016: U.S. Geological Survey Data Release: http://dx.doi.org/10.5066/F7JH3J83.
- JPL, 2008, Planetary Science Data Dictionary, JPL D-7116, Rev. F (Corresponds to Database Build pdscat1r71), https://mirrors.asun.co/climate-mirror/pds/pds.nasa.gov/documents/psdd/PSDDmain_1r71.pdf.
- National Water Information System (NWIS). Search Criteria and Codes.
- USDA, Ag Data Commons Data Submission Manual v1.3. Data Dictionary Blank Template.
- 24 Data Dictionary Tools.