Having a plan for how to store, enter, edit, and manipulate data BEFORE data collection will save time and directly affect your ability to use those data.
- Use a Data Template whenever possible that is designed to support the type of data you work with; share your templates with others
- Make a data model or schema for your data to capture an overview of the data elements and their relationships
- Make sure you have planned to accommodate all of the data you expect to collect
- Consider dropping any data elements that you don't think you will be able to collect, or that you'd ever use (and save time by not having to manage them)
- Classification and descriptive terms should be stored in separate tables (lookup or domain tables) and used where needed within the data
- Utilize built-in quality controls such as table and row validation rules and enforced relationships between tables or files
- For data columns (fields), specify those that are required vs optional, their data types (integer vs decimal), and acceptable value ranges - all these can prevent incorrect data from being entered
- Avoid storing multiple data types in one column.
- Give each data element a good description - sometimes the description is key to determining whether or not a value is valid (and in preparation for creating a data dictionary)
Quality by Design: Design Constraint Examples
- Required vs Optional: For data listing projects, the StartDate can be required (mandatory), but the EndDate must initially be optional as it can't be known or entered until the project is completed
- Domain requirements: StateAbbrv field must use an official state postal abbreviation; species names must come from ITIS
- Enforced relationships: For Site-Sample-Observation data, don't enter a SiteCode in the Sample table that does not yet exist in your Site table
- If a numeric value requires a remark code such as > or <, add a separate remark field to store these numeric qualifiers instead of storing them in the column that should be reserved for numeric data types.
- Data-type controls: make sure columns to hold date values are set for a Date datatype
- Null value representation: missing values should be left Null or identified by a code in a separate field; missing value codes must be defined
- Add a timestamp field to each row in your data, to record when the data were entered or updated