Data Management: Manage Quality
Protocols and methods must be employed to ensure that data are properly collected, handled, processed, used, and maintained at all stages of the scientific data lifecycle. This is commonly referred to as "QA/QC" (Quality Assurance/Quality Control). QA focuses on building-in quality to prevent defects while QC focuses on testing for quality (e.g., detecting defects). QA makes sure you are doing the right things, the right way. QC makes sure the results of what you've done are what you expected.
What is QA/QC?
- QA/QC should occur throughout the data lifecycle.
- Budget time and funds for QA/QC.
- Describe the QA/QC methods in the metadata, including any software tools.
- Record any modifications to the data.
- Identify and use the proper metadata standards.
- Have the metadata reviewed by someone who is not familiar with the data and format.
- Have two, but preferably more, people transcribe the same data and compare results; have another person compare the original record to the transcription.
- Quality control practices will vary with the data type (including multi-media) and means of collection.
- Use tools and analysis to identify quality issues when possible.
- Have a system to identify data at different points in the QA/QC process as well as a means to identify and/or correct quality issues.
- Identify Best Practices at the beginning of a research project and assign responsibility for follow-through.
- Be aware of data contamination from sources outside the study, such as faulty instruments or error introduced during a conversion process.
- Build in quality control measures when possible.
Quality Assurance (QA) can be defined as a set of activities designed to ensure that a product or service meets specified requirements. Examples of Quality Assurance can include project audits and process checklists.
Quality Control (QC) can be defined as a set of activities designed to evaluate a developed work product. An example of Quality Control can be testing products for defects.
The main difference between Quality Assurance and Quality Control is that while QA is process oriented, QC is product oriented. QC focuses on testing for quality (e.g., detecting defects) while QA focuses on building-in quality to prevent defects. QA makes sure you are doing the right things, the right way. QC makes sure the results of what you've done are what you expected.
When do we need QA/QC?
When creating a Plan, decide on the quality level, method for measuring quality, and how often the quality should be evaluated.
When setting out to organize data, establish the data Acquisition acceptance criteria and the acceptance testing process.
When undertaking to Preserve your data, conduct ongoing data improvement.
- Develop a QA/QC plan.
- Document QA/QC procedures and follow them.
- Automate QA/QC where possible.
QA Before Data Collection
- Define standards prior to collection of the data.
- Format - Decide the format of how the data will be collected.
- Will the data be collected by hand on paper? Electronically via an instrument? If the data are digital, what format will be used?
- Codes - Define what each code word means.
- Specify units of measurement.
- Metadata - Be sure to create metadata in unison with the data to be collected.
- Assign responsibility for quality assurance to a specific person.
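The standards listed above (format, codes, units) can be written down as a machine-readable data dictionary before collection starts and then used to check records as they arrive. The following is a minimal sketch, not a USGS tool: the field names, codes, units, and limits are hypothetical and chosen only for illustration.

```python
# Hypothetical data dictionary, defined before any data are collected.
# Field names, codes, units, and limits are illustrative assumptions.
DATA_DICTIONARY = {
    "site_id": {"type": str, "description": "Sampling site identifier"},
    "water_temp": {
        "type": float,
        "unit": "degrees Celsius",
        "min": -5.0,
        "max": 45.0,
    },
    "habitat": {
        "type": str,
        "codes": {"RIF": "riffle", "POO": "pool", "RUN": "run"},
    },
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one record (empty list = passes)."""
    problems = []
    for name, spec in dictionary.items():
        # Every field in the dictionary must be present and non-empty.
        if name not in record or record[name] in ("", None):
            problems.append(f"{name}: missing value")
            continue
        raw = record[name]
        try:
            value = spec["type"](raw)
        except (TypeError, ValueError):
            problems.append(f"{name}: {raw!r} is not a valid {spec['type'].__name__}")
            continue
        # Coded fields may only use code words defined in advance.
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{name}: unknown code {value!r}")
        # Numeric fields are checked against the agreed range, in the agreed unit.
        if "min" in spec and value < spec["min"]:
            problems.append(f"{name}: {value} is below {spec['min']} {spec['unit']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{name}: {value} is above {spec['max']} {spec['unit']}")
    # Fields that were never defined are reported rather than silently accepted.
    for name in record:
        if name not in dictionary:
            problems.append(f"{name}: field not defined in the data dictionary")
    return problems


for problem in validate_record({"site_id": "A1", "water_temp": "99", "habitat": "XYZ"}):
    print(problem)
```

Keeping the dictionary in one place also gives the metadata author a single source for field descriptions, code definitions, and units.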
QA/QC During Data Entry
- Double entry:
- Have two people independently enter the data.
- Use a computer to check for agreement.
- Record a reading of the data and transcribe from the recording.
- Use a text-to-speech program to read the data back.
- Design an efficient storage system for the data.
- Minimize the number of times the data need to be entered. Instead, use reference mechanisms such as a relational database.
- Use consistent terminology.
- Reduce data to one piece of information per cell.
- Always document any modifications to the dataset. This avoids duplicate error checking.
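The double-entry check described above can be automated with a short script. The sketch below assumes the two people each produced a CSV file with the same layout; the function names `read_rows` and `compare_double_entry` and the sample values are hypothetical.

```python
import csv
import io
from itertools import zip_longest


def read_rows(csv_text):
    """Parse CSV text into a list of rows, trimming stray whitespace."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(csv_text))]


def compare_double_entry(rows_a, rows_b):
    """Compare two independently entered copies of the same data.

    Returns (row_number, column_number, value_a, value_b) for every cell
    that disagrees, using 1-based positions. A cell present in only one
    copy is reported with None for the other.
    """
    mismatches = []
    for row_no, (row_a, row_b) in enumerate(
        zip_longest(rows_a, rows_b, fillvalue=[]), start=1
    ):
        for col_no, (a, b) in enumerate(
            zip_longest(row_a, row_b, fillvalue=None), start=1
        ):
            if a != b:
                mismatches.append((row_no, col_no, a, b))
    return mismatches


# Illustrative data: the second person keyed 18.1 where the first keyed 13.1.
first_entry = "site,temp\nA1,12.5\nA2,13.1\n"
second_entry = "site,temp\nA1,12.5\nA2,18.1\n"

print(compare_double_entry(read_rows(first_entry), read_rows(second_entry)))
# prints [(3, 2, '13.1', '18.1')]
```

The script only reports where the two copies disagree. Deciding which value is correct still requires going back to the original record.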
QC After Data Entry
- Make sure data columns and rows line up properly.
- Look for missing or irregular data entries.
- Perform statistical summaries.
- Check for outliers. An outlier may result from a mistake or from some level of "contamination" introduced during data collection or data entry. An outlier may not be a mistake at all, but it is still important to check outliers as part of quality control and assurance. Use the following methods to look for outliers.
- Graphical methods - Normal probability plots, regression, scatterplots.
- Subtract the mean from each value to see how far each value deviates from it.
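The statistical summary and the deviation-from-the-mean check can be sketched with Python's standard `statistics` module. Dividing each deviation by the standard deviation (a z-score) is one common way to judge whether a deviation is large; that choice, the threshold of 2.0, and the sample readings are assumptions made for this example rather than part of the guidance above.

```python
import statistics


def summarize(values):
    """Basic statistical summary for a column of numeric values."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def flag_outliers(values, threshold=2.0):
    """Return (index, value, z_score) for values far from the mean.

    The z-score is the deviation from the mean divided by the sample
    standard deviation. The threshold is an assumption: in small samples
    a single extreme value cannot reach a high z-score, so a lower cutoff
    is used here than the 3.0 often quoted for large samples.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for index, value in enumerate(values):
        z_score = (value - mean) / stdev
        if abs(z_score) > threshold:
            flagged.append((index, value, z_score))
    return flagged


# Illustrative readings; 121.0 looks like a misplaced decimal point.
readings = [12.1, 12.4, 11.9, 12.3, 12.0, 12.2, 121.0, 12.1, 11.8, 12.2]

print(summarize(readings))
for index, value, z_score in flag_outliers(readings):
    # Only the 121.0 reading at index 6 is flagged for this sample.
    print(f"row {index}: value {value} has z-score {z_score:.2f}")
```

A flagged value is a prompt to check the original record, not a reason to delete the value automatically, since the outlier may be a genuine observation.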
What the U.S. Geological Survey Manual Says:
The USGS Manual Chapter 502.2 - Fundamental Science Practices: Planning and Conducting Data Collection and Research discusses documenting your quality-assurance procedures:
"Standard USGS methods are employed for distinct research activities that are conducted on a frequent or ongoing basis and for types of data that are produced in large quantities. Methods must be documented to describe the processes used and the quality-assurance procedures applied."