Chapter 6 - Evaluating Information Quality
Once a data system exists, the key to achieving and maintaining a high level
of data quality is to regularly assess all aspects of data quality, both to
improve the data collection and processing procedures and to correct specific
problems with the data as they arise. This can be accomplished through regular
assessments of the data collected, special studies of aspects of the data and
of the effectiveness of the collection and processing methods, and quality
control of key processes to control process quality and collect data quality
information.
6.1 Data Quality Assessments
Principles
- "Data quality assessments" are data quality audits of data systems
and the data collection process.
- Data quality assessments are comprehensive reviews of the data system
to note to what degree the system follows these guidelines and to assess
sources of error and other potential quality problems in the data.
- The assessments are intended to help the data system owner to improve
data quality.
- The assessments will conclude with a report on findings and results.
Guidelines
- Since data users do not have the same access to information about the
data system that its owners have, the data system owners should perform
data quality assessments.
- Data quality assessments should be undertaken periodically to ensure that
the quality of the information disseminated meets requirements.
- Data quality assessments should be used as part of a data system redesign
effort.
- Data users, including secondary data users, should be consulted to suggest
areas to be assessed, and to provide feedback on the usefulness of the data
products.
- Assessments should involve at least one member with knowledge of data
quality who is not involved in preparing the data system information for
public dissemination.
- Findings and results of a data quality assessment should always be documented.
References
- General Accounting Office, Performance Plans: Selected Approaches for
Verification and Validation of Agency Performance Information, GAO/GGD-99-139
(July 1999).
6.2 Evaluation Studies
Principles
- Evaluation studies are focused experiments carried out to evaluate some
aspect of data quality.
- Many aspects of data quality cannot be assessed by examining end-product
data.
- Evaluation studies include re-measurement, independent data collection,
user surveys, collection method parallel trials (e.g., incentive tests),
census matching, administrative record matching, comparisons to other collections,
methodology testing in a cognitive lab, and mode studies.
- "Critical data systems" are systems that either contain data
identified as "influential" or provide input to DOT-level performance
measures.
Guidelines
- Critical data systems should have a program of evaluation studies to estimate
the extent of each aspect of non-sampling error periodically and after a
major system redesign.
- Critical data systems should periodically evaluate bias due to missing
data, coverage bias, measurement error, and user satisfaction.
- All data systems should conduct an evaluation study when there is evidence
that one or more error sources could be compromising key data elements enough
to make them fail to meet data requirements.
- All data systems should conduct an evaluation study if analysis of the
data reveals a significant problem, but the source is not obvious.
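One evaluation named above, bias due to missing data, can be sketched in a few
lines of code. The idea is to compare respondents to the full frame on an
auxiliary variable known for every unit (for example, from administrative
records). The function name, data, and values below are hypothetical
illustrations, not part of any DOT system.

```python
# Sketch: estimating potential nonresponse bias by comparing respondents
# to the full frame on an auxiliary variable known for every unit.
# All data below are hypothetical.

def nonresponse_bias(frame_values, responded):
    """Difference between the respondent mean and the full-frame mean.

    frame_values: auxiliary value known for every unit in the frame.
    responded:    parallel list of booleans, True if the unit responded.
    """
    full_mean = sum(frame_values) / len(frame_values)
    resp_vals = [v for v, r in zip(frame_values, responded) if r]
    resp_mean = sum(resp_vals) / len(resp_vals)
    return resp_mean - full_mean

# Hypothetical frame of 6 units; larger units were likelier to respond.
values = [10.0, 12.0, 8.0, 30.0, 28.0, 26.0]
responded = [False, False, True, True, True, True]

bias = nonresponse_bias(values, responded)
print(f"Respondent mean exceeds frame mean by {bias:.2f}")
# prints: Respondent mean exceeds frame mean by 4.00
```

A nonzero difference on the auxiliary variable suggests that estimates built
from respondents alone may be biased, which would motivate a fuller evaluation
study.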
References
- General Accounting Office, Performance Plans: Selected Approaches for
Verification and Validation of Agency Performance Information, GAO/GGD-99-139
(July 1999).
- Office of Management and Budget, Statistical Policy Working Paper 31:
Measuring and Reporting Sources of Error in Surveys (July 2001).
- Lessler, J. and W. Kalsbeek. 1992. Nonsampling Error in Surveys.
New York, NY: Wiley.
6.3 Quality Control Processes
Principles
- Activities in survey collection and processing will add error to the data
to some degree. Therefore, each activity needs some form of quality control
process to prevent and/or correct error introduced during the activity.
- The more complex or tedious an activity is, the more likely it is that
error will be introduced, and therefore the more elaborate the quality
control needs to be.
- A second factor that will determine the level of quality control is the
importance of the data being processed.
- Data system activities that need extensive quality control are check-in
of paper forms, data entry from paper forms, coding, editing, and imputation.
- Quality control methods include 100% replication, as with key entry of
critical data, sample replication (usually used in a stable continuous process),
analysis of the data file before and after the activity, and simple reviews.
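Sample replication, one of the quality control methods listed above, can be
illustrated with a short sketch: a sample of records is independently re-keyed
and compared field by field to the production file, and the disagreement rate
estimates the keying error rate. The record identifiers, field names, and
values below are hypothetical.

```python
# Sketch: sample replication for quality control of data entry.
# A sample of records is independently re-keyed and compared field by
# field to the production file. All records below are hypothetical.

def keying_error_rate(production, rekeyed):
    """Fraction of sampled fields where the two keyings disagree."""
    compared = 0
    disagree = 0
    for rec_id, fields in rekeyed.items():
        for name, value in fields.items():
            compared += 1
            if production[rec_id][name] != value:
                disagree += 1
    return disagree / compared

production = {
    101: {"state": "VA", "miles": "1200"},
    102: {"state": "MD", "miles": "845"},
}
rekeyed = {  # independent second keying of the sampled records
    101: {"state": "VA", "miles": "1200"},
    102: {"state": "MD", "miles": "854"},  # transposition caught here
}

rate = keying_error_rate(production, rekeyed)
print(f"Estimated keying error rate: {rate:.1%}")
# prints: Estimated keying error rate: 25.0%
```

Tracking this rate over time indicates whether the keying process is stable
and whether the level of quality control should be changed.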
Guidelines
- Each activity should be examined for its potential to introduce error.
- The extent of quality control for each activity should be based on the
potential of the activity to introduce error combined with the importance
of the data.
- Data should be collected from the quality control efforts to indicate
the effectiveness of the quality control and to help determine whether it
should be changed.
- The quality control should be included in the documentation of methods
at each stage.
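The guideline to collect data from quality control efforts pairs naturally
with the before-and-after file analysis mentioned under Principles. A minimal
sketch, assuming a file of records with a hypothetical numeric field, is to
compare key totals before and after an editing step so the effect of the
activity can be monitored:

```python
# Sketch: comparing a data file before and after an editing activity so
# the activity's effect on key totals can be monitored over time.
# Field names and values are hypothetical.

def compare_totals(before, after, field):
    """Return (total before, total after, change) for a numeric field."""
    t_before = sum(rec[field] for rec in before)
    t_after = sum(rec[field] for rec in after)
    return t_before, t_after, t_after - t_before

before_edit = [{"tons": 500}, {"tons": 9999}, {"tons": 620}]  # 9999 is an outlier
after_edit = [{"tons": 500}, {"tons": 999}, {"tons": 620}]    # edit applied

b, a, change = compare_totals(before_edit, after_edit, "tons")
print(f"Total before: {b}, after: {a}, change: {change}")
# prints: Total before: 11119, after: 2119, change: -9000
```

An unexpectedly large change in a key total flags the editing step for review
before the file moves to the next stage.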
References
- Ott, E., E. Schilling, and D. Neubauer. 2000. Process Quality Control:
Troubleshooting and Interpretation of Data. New York, NY: McGraw-Hill.
6.4 Data Error Correction
Principles
- No data system is free of errors.
- Actions taken when evidence of data error comes to light are dependent
on the strength of the evidence, the impact that the potential error would
have on primary estimates produced by the data system, and the resources
required to verify and correct the problem.
Guidelines
- A standard process for dealing with possible errors in the data system
should exist and be documented.
- If a disseminated data file is "frozen" for practical reasons
(e.g., reproducibility and configuration management) when errors in the
data become known, the errors should be documented and accompany the data.
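The guideline on frozen files can be made concrete with a small sketch of an
errata log: each known error is documented alongside the release rather than
by altering the file. The log fields, record identifier, and example entry
below are hypothetical.

```python
# Sketch: documenting known errors that accompany a "frozen" data file.
# The errata log records each error without altering the released file,
# so users of the frozen release can judge the impact themselves.
# The fields and example entry are hypothetical.

import json
from datetime import date

def add_erratum(errata, record_id, field, description, impact):
    """Append a documented error to the errata log for a frozen release."""
    errata.append({
        "date_logged": date.today().isoformat(),
        "record_id": record_id,
        "field": field,
        "description": description,
        "estimated_impact": impact,
    })
    return errata

errata = []
add_erratum(errata, 4721, "ton_miles",
            "Unit error: value reported in tons, not ton-miles.",
            "Overstates the national total by roughly 0.2 percent.")

# The errata file is disseminated alongside the frozen data file.
print(json.dumps(errata, indent=2))
```

Keeping the log in a machine-readable format lets the corrections be applied
automatically in a later, unfrozen release.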
References
- Ott, E., E. Schilling, and D. Neubauer. 2000. Process Quality Control:
Troubleshooting and Interpretation of Data. New York, NY: McGraw-Hill.