A data dictionary is used to catalog and communicate the structure and content of data, and provides meaningful descriptions for individually named data objects.
Data Dictionaries & Metadata
Data dictionary information can be used to fill in the entity and attribute section or the feature catalog of formal metadata. If you are working with data dictionary information within formal metadata, a number of tools can help.
Data dictionaries store and communicate metadata about data in a database, a system, or data used by applications. A useful introduction to data dictionaries is provided in this video. Data dictionary contents can vary but typically include some or all of the following:
A listing of data objects (names and definitions)
Detailed properties of data elements (data type, size, nullability, optionality, indexes)
Entity-relationship (ER) and other system-level diagrams
Reference data (classification and descriptive domains)
Missing data and quality-indicator codes
Business rules, such as for validation of a schema or data quality
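As an illustration of how the items above fit together, here is a minimal Python sketch of a data dictionary held as structured entries. The class name, field names, the three example elements, and the missing-data code -9999 are all hypothetical and chosen only for illustration; a real dictionary would follow whatever standard or schema the project has adopted.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataElement:
    """One data dictionary entry (all field names are illustrative)."""
    name: str
    definition: str
    data_type: type
    nullable: bool = False
    domain: Optional[set] = None                     # reference data: allowed values
    missing_codes: set = field(default_factory=set)  # documented missing-data codes

# Hypothetical entries for a small stream-survey table.
DICTIONARY = {
    "site_id": DataElement("site_id", "Unique identifier of the sampling site", str),
    "flow_cfs": DataElement("flow_cfs", "Mean daily streamflow in cubic feet per second",
                            float, missing_codes={-9999.0}),
    "quality": DataElement("quality", "Quality-indicator code", str,
                           domain={"A", "P", "E"}),
}

def validate(record: dict) -> list:
    """Apply the dictionary as simple business rules; return a list of violations."""
    problems = []
    for name, element in DICTIONARY.items():
        value = record.get(name)
        if value is None:
            if not element.nullable:
                problems.append(f"{name}: required value is missing")
            continue
        if value in element.missing_codes:
            continue  # a documented missing-data code is not an error
        if not isinstance(value, element.data_type):
            problems.append(f"{name}: expected {element.data_type.__name__}")
        elif element.domain is not None and value not in element.domain:
            problems.append(f"{name}: {value!r} not in allowed domain")
    return problems
```

Calling `validate({"site_id": "S1", "flow_cfs": -9999.0, "quality": "X"})` would flag only the `quality` value, because -9999.0 is recorded in the dictionary as a missing-data code rather than a measurement.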
How Data Dictionaries are Used
Documentation - provide data structure details for users, developers, and other stakeholders
Communication - equip users with a common vocabulary and definitions for shared data, data standards, data flow and exchange, and help developers gauge impacts of schema changes
Application Design - help application developers create forms and reports with proper data types and controls, and ensure that navigation is consistent with data relationships
Systems Analysis - enable analysts to understand overall system design and data flow, and to find where data interact with various processes or components
Data Integration - clear definitions of data elements provide the contextual understanding needed when deciding how to map one data system to another, or whether to subset, merge, stack, or transform data for a specific use
Decision Making - assist in planning data collection, project development, and other collaborative efforts
Data Dictionaries are for Sharing
For groups of people working with similar data, having a shared data dictionary facilitates standardization by documenting common data structures and providing the precise vocabulary needed for discussing specific data elements. Shared dictionaries help ensure that the meaning, relevance, and quality of data elements are understood the same way by all users. Data dictionaries also provide information needed by those who build systems and applications that support the data. Lastly, if there is a common, vetted, and documented data resource, it is not necessary to produce separate documentation for each implementation.
Plan ahead for storing data at the start of any project by developing a schema or data model as a guide to data requirements. As required and optional data elements are identified, add them to the data dictionary. When data structures change, update the dictionary. Try to use naming conventions appropriate to the system or subject area. The easiest path is to adopt and cite a data standard, thus avoiding the need to provide and manage your own documentation.
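One way to notice when a dictionary has fallen behind the data is to compare the documented element names with the columns actually present. The sketch below assumes both are available as simple lists of names; the function name and the keys of the returned report are hypothetical.

```python
def dictionary_drift(documented, actual):
    """Compare documented element names with the columns actually present."""
    documented, actual = set(documented), set(actual)
    return {
        # columns in the data that the dictionary does not describe
        "undocumented": sorted(actual - documented),
        # dictionary entries that no longer match any column
        "stale": sorted(documented - actual),
    }

report = dictionary_drift(
    documented=["site_id", "flow_cfs"],
    actual=["site_id", "flow_cfs", "temp_c"],
)
```

Here `report["undocumented"]` lists `temp_c`, a signal that the dictionary needs a new entry before the data are shared.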
The Alaska Science Center Research Data Management Plan [PDF] has excellent examples of a Data Description Form and other forms to capture metadata before, during, and at the end of a project.
Data Dictionaries Can Reveal Poor Design Decisions
For both data reviewers and data users, the data dictionary can reveal potential credibility problems within the data. Poor table organization and object naming can severely limit data understandability and ease-of-use, incomplete data definitions can render otherwise stellar data virtually useless, and failure to keep the dictionary up to date with the actual data structures suggests a lack of data stewardship. Although getting critical feedback about their data may be initially troublesome for some data creators, developing good data design and description habits is worth the effort and ultimately benefits everyone who will use the data.
Most database management systems (DBMS) have built-in, active data dictionaries and can generate documentation as needed (SQL Server, Oracle, MySQL). The same is true when designing data systems using computer-aided software engineering (CASE) tools. The open source Analyzer tool for MS Access can be used to document Access databases and Access-connected data (SQL Server, Oracle, and others). Finally, use the Data Dictionary - Blank Template for manually creating a simple 'data dictionary' in Excel.
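To show what reading a built-in data dictionary can look like, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is an assumption made for the sake of a self-contained example and is not one of the systems named above; SQL Server, Oracle, and MySQL expose similar information through their own catalog views. The table and the `describe_table` helper are hypothetical.

```python
import sqlite3

def describe_table(conn, table):
    """List column properties from SQLite's built-in catalog.

    PRAGMA table_info returns one row per column:
    (cid, name, type, notnull, dflt_value, pk).
    The table name is interpolated directly, so it must come from a trusted source.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": r[1],
            "data_type": r[2],
            "nullable": not r[3],
            "primary_key": bool(r[5]),
        }
        for r in rows
    ]

# Hypothetical table used only to demonstrate the lookup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site (site_id TEXT PRIMARY KEY, flow_cfs REAL NOT NULL, note TEXT)"
)
catalog = describe_table(conn, "site")
```

The resulting list holds structural properties only. Definitions, units, and missing-data codes still have to be written by the people who know the data.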
For information on creating a data dictionary in a formal metadata file (Entity and Attribute section) refer to the Metadata page.
Entity/Attribute metadata for: Knight, R.R., Cartwright, J.M., and Ladd, D.E., 2016, Streamflow and fish community diversity data for use in developing ecological limit functions for the Cumberland Plateau, northeastern Middle Tennessee and southwestern Kentucky, 2016: U.S. Geological Survey Data Release: https://doi.org/10.5066/F7JH3J83.