Data Management

Quality by Design: Recommended Practices

Having a plan for how to store, enter, edit, and manipulate data BEFORE data collection will save time and directly affect your ability to use those data.

<< Back to Manage Quality

  • Use a Data Template whenever possible that is designed to support the type of data you work with; share your templates with others
  • Make a data model or schema for your data to capture an overview of the data elements and their relationships
  • Make sure you have planned to accommodate all of the data you expect to collect
  • Consider dropping any data elements that you don't think you will be able to collect, or that you'd ever use (and save time by not having to manage them)
  • Classification and descriptive terms should be stored in separate tables (lookup or domain tables) and used where needed within the dataset
  • Utilize built-in quality controls such as table and row validation rules and enforced relationships between tables or files
  • For data columns (fields), specify those that are required vs optional, their data types (integer vs decimal), and acceptable value ranges - all these can prevent incorrect data from being entered
  • Give each data element a good description - sometimes the description is key to determining whether or not a value is valid (and in preparation for creating a data dictionary)

Quality by Design: Design Constraint Examples

  • Required vs Optional: For a dataset listing projects, the StartDate can be required (mandatory), but the EndDate must initially be optional as it can't be known or entered until the project is completed
  • Domain requirements: StateAbbrv field must use an official state postal abbreviation; species names must come from ITIS
  • Enforced relationships: In a Site-Sample-Observation dataset, don't enter a SiteCode in the Sample table that does not yet exist in your Site table
  • Data-type controls: make sure columns to hold date values are set for a Date datatype
  • Null value representation: missing values should be left Null or identified by a code in a separate field; missing value codes must be defined
  • Add a timestamp field to each row in your dataset, to record when the data were entered or updated