How To... Review Data Quality
CDC Data Editing
CDC Editing of the Data for Completeness and Data Quality
The edit process at CDC begins with a transaction file submitted by the
contributor: a flat ASCII data file containing either PedNSS or
PNSS transaction records. The transaction file is received electronically
through the Secure Data Network (SDN) or on mailed diskettes, CD-ROMs, tapes,
or cartridges on a monthly or quarterly schedule. Once CDC receives the
transaction file, the automated editing process is initiated.
- First, the computer edit program counts the total number of
transaction records received and the record volume by month and year.
For PedNSS, the month and year are taken from the child’s initial date
of visit. For PNSS, they are taken from the mother’s expected date of
delivery (EDD), the last menstrual period (LMP) if EDD is not available,
or the infant’s birth date.
- Next, the transaction file is edited for duplicate records and
errors in the critical fields of a record. Duplicate records and
records with critical errors are rejected by CDC and are excluded from
further editing for data quality.
- A duplicate transaction record is a record that is mostly or
entirely identical to another record in the same transaction file.
When duplicate records are identified, the first reported record is
retained.
- A critical error is missing or invalid data in a field that is
considered critical for data analysis; without that field, analysis
of the PedNSS or PNSS data is not possible. Records with critical
errors are therefore rejected from the transaction file. Critical
fields are defined differently for PedNSS and PNSS, but they
include fields such as state, substate, date of visit, and individual
identifier.
- Then, the fields in each record are edited for data quality. The
data quality edits are conducted in the following order: missing data,
mis-codes, biologically implausible values (BIVs), cross-check errors,
unusual data distributions, and low and high standard deviation. When a
field is identified with a data quality problem, the records causing the
problem are excluded from the next level of editing so that the same
problem does not appear under more than one data quality category in the
report. For example, records with missing data are not included in any
of the remaining data quality edits.
The only exception to this rule is the completion code and record
linkage edit for PNSS. This edit is conducted after the missing data
edit. If a completion code and record linkage error is identified,
the individual fields in the record continue to be edited until the next
data quality error in the field is found. For PNSS, therefore, a field
can have a completion code and record linkage error as well as one
additional data quality error.
- The transaction file is added to the master file and edited for
duplicate records on the master file. If duplicate records are
identified, the record on the master file is replaced.
- Finally, the Periodic Summary of Record Volume and Data Quality
report is generated.
- The transaction file is then used to update the master file of
records for the contributor. The master file is a cleaned file that CDC
updates after editing each transaction file from the contributor. The
master file is saved for the next transaction file and update.
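The first two steps above (record volume counts, then rejection of duplicates and critical errors) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CDC's edit program: the field names (`state`, `substate`, `visit_date`, `person_id`) are hypothetical, records are assumed to be dictionaries, and "mostly or entirely identical" is simplified to "identical in the critical fields".

```python
from collections import Counter
from datetime import date

# Hypothetical critical fields; the actual lists differ for PedNSS and PNSS.
CRITICAL_FIELDS = ("state", "substate", "visit_date", "person_id")


def count_by_month(records):
    """Count records per (year, month) of visit_date; undated records are counted under None."""
    counts = Counter()
    for rec in records:
        d = rec.get("visit_date")
        counts[(d.year, d.month) if d else None] += 1
    return counts


def reject_duplicates_and_critical_errors(records):
    """Split records into (accepted, duplicates, critical_errors).

    Assumption: a record is a duplicate if its critical fields match an
    earlier record in the same file. The first reported record is retained.
    """
    seen = set()
    accepted, duplicates, critical_errors = [], [], []
    for rec in records:
        key = tuple(rec.get(field) for field in CRITICAL_FIELDS)
        if any(value is None or value == "" for value in key):
            critical_errors.append(rec)   # missing critical field: reject
        elif key in seen:
            duplicates.append(rec)        # later copy: reject
        else:
            seen.add(key)
            accepted.append(rec)          # first occurrence: keep
    return accepted, duplicates, critical_errors
```

Only the accepted records would go on to the data quality edits; the two rejected lists correspond to the duplicate and critical-error counts.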
See a graphic illustration of the CDC editing process.
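The ordered data quality edits described above, in which a value flagged at one level is left out of every later level, can be sketched as follows. This is a minimal illustration rather than CDC's implementation: the function name `classify_field`, the example field, and the numeric limits are hypothetical placeholders, and only the first three edit levels are shown.

```python
def classify_field(values, checks):
    """Assign each value to the first check it fails, in the given order.

    Returns (flagged, clean): flagged maps each check name to the indices
    it caught, and clean lists the indices that passed every check.
    """
    flagged = {name: [] for name, _ in checks}
    remaining = list(range(len(values)))
    for name, is_problem in checks:
        still_ok = []
        for i in remaining:
            if is_problem(values[i]):
                flagged[name].append(i)   # counted once, at this level only
            else:
                still_ok.append(i)
        remaining = still_ok              # later checks never see flagged values
    return flagged, remaining


# Hypothetical checks for a numeric field; the limits are placeholders,
# not CDC cutoffs.
EXAMPLE_CHECKS = [
    ("missing", lambda v: v is None),
    ("mis-code", lambda v: not isinstance(v, (int, float))),
    ("biv", lambda v: v < 1.0 or v > 30.0),
]
```

For example, `classify_field([12.5, None, "XX", 99.0, 11.0], EXAMPLE_CHECKS)` reports the `None` only under `missing`, even though it would also fail the mis-code check, so each problem appears in a single category.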