Data-quality management is the process of employing protocols and methods to ensure that data are properly collected, handled, processed, used, and maintained at all stages of the scientific data lifecycle.
Quality Assurance (QA) & Quality Control (QC)
QA & QC are often used interchangeably, but they mean different things. QA refers to defect prevention, whereas QC refers to defect detection. Generally, QA is applied before and during data acquisition, whereas QC is applied after data are in hand.
What is a Data 'Defect'?
In a data context, a 'defect' is any data issue that negatively affects fitness for use, such as numeric value errors, incorrect classification terms, gaps in a data series, or failed data transformations.
Yes, you can plan ahead for high-quality data! A Quality Assurance Plan (QAP) is used to define the criteria and processes that will ensure and verify that data meet specific data-quality objectives throughout the Data Lifecycle. Some agencies and organizations require a QAP as part of a research proposal, before funding a project (for example, USEPA). Like the DMP, the QAP (if a separate document) would be revised as needed during a project timeline to reflect the reality of the data workflow and activities.
Preventing the creation of defective data is the most effective means of ensuring the ultimate quality of your data products and the research that depends upon them. QA refers to utilizing written criteria, methods, and processes that will ensure the production of data that meet a specified quality standard.
Quality by Design
Having a plan for how to store, enter, edit, and manipulate data BEFORE data collection will save time and directly affect your ability to use those data. By starting with a conceptual design (or schema) of the data you can ensure that you have considered all of the data you intend to store, the data types they represent, the relationships between different chunks of data, and the data domains that will support the primary data you collect.
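As a minimal sketch of what a conceptual design might look like in practice, the snippet below defines two related record types for a hypothetical field-survey dataset. The field names, types, and the site/observation relationship are illustrative assumptions, not part of any USGS standard:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical schema for a field-survey dataset: each Observation
# links back to a Site, and each field has a declared data type.

@dataclass
class Site:
    site_id: str          # primary key; observations reference this
    name: str
    ecoregion: str        # classification term drawn from a fixed domain

@dataclass
class Observation:
    site_id: str          # relationship: points at Site.site_id
    obs_date: date
    water_temp_c: float   # numeric measurement, degrees Celsius

site = Site(site_id="S001", name="Clear Creek", ecoregion="Central Basin")
obs = Observation(site_id=site.site_id,
                  obs_date=date(2017, 6, 1),
                  water_temp_c=14.2)
```

Writing the schema down first, even this informally, forces the decisions about data types, keys, and relationships to happen before collection rather than during analysis.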
Domain Management and Reference Data
Terms used to classify or describe data elements can help or hurt the usefulness of a dataset. Data domains and reference data define the allowable values for an attribute and are often implemented as lookup tables or drop-down boxes on forms. Descriptive terms (such as color and size) are relative, whereas classification terms (such as ecoregion or land use category) are more discrete.
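A lookup table of this kind can be as simple as a dictionary of allowable codes. The land-use codes below are hypothetical, chosen only to illustrate domain checking:

```python
# Hypothetical lookup table defining the allowable values (the domain)
# for a 'land_use' attribute; the codes are illustrative only.
LAND_USE_DOMAIN = {
    "URB": "Urban",
    "AGR": "Agricultural",
    "FOR": "Forest",
    "WET": "Wetland",
}

def in_domain(code: str) -> bool:
    """Return True if the code is an allowable domain value."""
    return code in LAND_USE_DOMAIN

records = [{"site": "S001", "land_use": "FOR"},
           {"site": "S002", "land_use": "FRO"}]  # typo: not in the domain

out_of_domain = [r for r in records if not in_domain(r["land_use"])]
```

Entering data through a form bound to such a table prevents the out-of-domain typo from ever being recorded, which is QA in action.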
Quality control (QC) of data refers to the application of methods or processes that determine whether data meet overall quality goals and defined quality criteria for individual values. To determine whether data are 'good' or 'bad' - or to what degree they are so - one must have a set of quality goals and specific criteria against which data are evaluated. Rapid data-scanning methods can be used to tag records or sets of records that meet or fail to meet a particular criterion. Remember that QC is a partner to QA: when errors are found, a way to prevent them via QA might also be revealed.
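A rapid data scan of this sort can be a simple pass over the values, tagging each one against the defined criteria. The water-temperature bounds below are assumed for illustration, not an agency standard:

```python
# Hypothetical quality criteria for a water-temperature series:
# values must be present and fall within a plausible range.
TEMP_MIN_C, TEMP_MAX_C = 0.0, 40.0

def qc_scan(values):
    """Tag each value as 'pass' or with the criterion it failed."""
    tags = []
    for v in values:
        if v is None:
            tags.append("fail:missing")       # gap in the series
        elif not (TEMP_MIN_C <= v <= TEMP_MAX_C):
            tags.append("fail:range")         # outside plausible bounds
        else:
            tags.append("pass")
    return tags

tags = qc_scan([14.2, 98.6, None, 21.0])
```

The tagged records can then be routed for review, and a recurring failure mode (say, a sensor reporting Fahrenheit) points back to a QA fix at the collection step.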
Data Quality Assessment and Review
Project staff should perform periodic data assessments during the project cycle to discover errors before project completion. These reviews do not need to be overly complicated; they serve as an opportunity to keep your data management plan, quality goals and metrics, and metadata up to date, and to document adherence to your quality plan. Data from outside sources should be assessed for quality issues prior to use, and real-time and streaming data processes should include some level of quality control.
Using Data Quality Indicators
The quality of individual measurements or observations should not be hidden in the metadata or documentation associated with a dataset. Rather, indicators of quality or usability can and should be stored with the data themselves, in separate fields or columns. This allows potential data users to determine which values are fit for specific uses, and to avoid re-validating unusual values that have already been reviewed and justified.
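Stored this way, a quality indicator is just another column that travels with each value. The flag codes below ('A' approved, 'E' estimated, 'R' rejected) are illustrative assumptions, not a formal standard:

```python
# Hypothetical measurement records with a quality-indicator column
# stored alongside the values themselves.
rows = [
    {"site": "S001", "temp_c": 14.2, "qc_flag": "A"},
    {"site": "S001", "temp_c": 39.5, "qc_flag": "E"},  # unusual but reviewed
    {"site": "S002", "temp_c": -9.0, "qc_flag": "R"},  # failed QC
]

# A data user filters on the flag to keep only values fit for their use,
# without re-validating the unusual-but-justified 'E' record.
usable = [r for r in rows if r["qc_flag"] in {"A", "E"}]
```

Because the flag sits in the data rather than in a separate document, any downstream tool can apply the same fitness-for-use filter automatically.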
Describing your data, like managing quality, is a cross-cutting element of the USGS Science Data Lifecycle. In addition to using data quality indicators within your dataset, quality-management documentation may take the form of a QAP or sections within the DMP about specific quality goals and criteria, along with any quality assessment summaries and notes on massaging data to meet the content needs of your project. The FGDC metadata standard includes sections specifically reserved for Data Quality Information.
Responsibilities for quality work and work products are reflected within the Code of Conduct for Department of the Interior staff (poster), specifically to ensure the highest level of data quality in scientific and scholarly information products:
"I will be responsible for the quality of the data I use or create and the integrity of the conclusions, interpretations, and applications I make. I will adhere to appropriate quality assurance and quality control standards, and not withhold information because it might not support the conclusions, interpretations, and applications I make."
"The USGS provides unbiased, objective scientific information upon which other entities may base judgments. Since its inception in 1879, the USGS has maintained comprehensive internal and external procedures for ensuring the quality, objectivity, utility, and integrity of data, analyses, and scientific conclusions. ... Information Quality ... covers all information produced by the USGS in any medium, including data sets, web pages, maps, audiovisual presentations in USGS-published information products, or in publications of outside entities."
General Policies that apply to Data Quality within the USGS [Links Verified November 30, 2017]
- OMB Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies
- DOI Data Quality Management Guide (pdf)
- U.S. Geological Survey Information Quality Guidelines
USGS Fundamental Science Practices [Links Verified November 30, 2017]
- SM 500.25 - Scientific Integrity
- SM 502.3 - Fundamental Science Practices: Peer Review
- SM 502.4 - Fundamental Science Practices: Review, Approval, and Release of Information Products
- SM 502.7 - Fundamental Science Practices: Metadata for USGS Scientific Information Products Including Data
- SM 502.8 - Fundamental Science Practices: Review and Approval of Scientific Data for Release
- Dilbert on Data Quality: Scott Adams offers serious insight into TQM [Link Verified November 30, 2017]
- Chapman, A.D., 2005, Principles of Data Quality, version 1.0 (pdf)
- Helsel, D.R., and Hirsch, R.M., 2002, Statistical Methods in Water Resources: Techniques of Water Resources Investigations, Book 4, Chapter A3. U.S. Geological Survey. 522 pages. [Link Verified November 30, 2017]
- DataONE education modules. [Link Verified July 17, 2017]
- Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson. 2010. Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. [Link Verified July 17, 2017]
- A. D. Chapman, "Principles of Data Quality: Report for the Global Biodiversity Information Facility" (Global Biodiversity Information Facility, Copenhagen, 2004). [Link Verified July 17, 2017]