How To... Review Data Quality - Periodic Data Quality
The Summary section of the PNSS Periodic Summary of Record Volume and Data Quality Report summarizes the errors identified in the Data Quality Section of the report. The Summary includes a list of the types of data quality problems identified in the report and the number of fields with each type of data quality problem.
The data quality section includes:
Missing is used to measure the completeness of the data. The edit
criterion is a field with missing data on more than 10% of PedNSS records
or more than 20% of PNSS records.
If 100% of data are missing, then ask the questions: a) is the information
being collected in clinics, b) is the computer information system
capturing the information, and c) are the data being extracted from the
computer information system and included in the transaction file?
If more than 10% (PedNSS) or 20% (PNSS) but less than 100% of data are
missing, this indicates that the data are captured by the computer
information system, but some clinics are not collecting the data or
collect them only some of the time. For PNSS, however, this may instead be
the result of data not being selected and extracted from the computer
information system for all the appropriate record types (complete,
prenatal only, or postpartum only records).
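The Missing edit and the two follow-up cases above can be sketched in a few lines of Python. This is only an illustration, not the program CDC uses: the function name `classify_missing`, the return labels, and the rule that `None` or a blank string counts as missing are all assumptions. Only the 10% and 20% thresholds come from the text.

```python
# Thresholds from the text: more than 10% (PedNSS) or 20% (PNSS) missing.
MISSING_THRESHOLD = {"PedNSS": 0.10, "PNSS": 0.20}


def classify_missing(values, program):
    """Return 'ok', 'partial', or 'all' for one field's values.

    Assumption: a value is missing when it is None or a blank string.
    Real transaction files may use other missing-data codes.
    """
    if not values:
        raise ValueError("no records to evaluate")
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    if missing == len(values):
        return "all"      # ask: collected? captured? extracted?
    if missing / len(values) > MISSING_THRESHOLD[program]:
        return "partial"  # captured, but not collected consistently
    return "ok"
```

With 2 of 10 values missing, the same field would be flagged for PedNSS (20% exceeds 10%) but not for PNSS (20% does not exceed 20%).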
Special Considerations:
Review list of PedNSS fields edited for Missing.
Review list of PNSS
fields edited for Missing.
Mis-codes are unacceptable data for a specific field. The edit criteria for mis-code errors are:
Review list of PedNSS fields edited for Mis-codes.
Review list of PNSS fields edited for Mis-codes.
A biologically implausible value (BIV) is a data value beyond the range
considered to be biologically plausible. These BIVs represent values that
are rarely observed, generally fewer than 1 in 10,000 records (0.01% of
records), and are therefore thought to be in error. When more than 3% of
records have a field with a BIV, the field is reported as an error. CDC has
tried to develop a consistent definition for BIVs across the different
health indicators by using cut-off points that generally represent ± 4
standard deviations.
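As a rough check on how rare values beyond ± 4 standard deviations are, the short Python sketch below computes the two-sided tail of a normal distribution. The normality assumption and the helper name `two_sided_tail` are ours, not part of the PedNSS or PNSS edit criteria; real health indicators are only approximately normal.

```python
import math


def two_sided_tail(z):
    """Share of a normal distribution lying more than z SDs from the mean."""
    return math.erfc(z / math.sqrt(2))


p = two_sided_tail(4)
# Under the normality assumption, p is below 1 in 10,000 (0.0001).
print(f"share beyond +/- 4 SD: {p:.2e}")
```

Under that assumption the share is below 1 in 10,000, which is in line with the rarity described above.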
For example, the biologically plausible range of prenatal hemoglobin (Hb)
is 8–17 g/dL, so the biologically implausible range is defined as < 8.0 g/dL
or > 17.0 g/dL. Hemoglobin BIVs on the high side may in part reflect
hematocrits mistakenly entered in the Hb field. Similarly, hematocrit (Hct)
BIVs on the low side may in part reflect Hb mistakenly entered in the Hct
field.
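The prenatal hemoglobin example can be written as a small Python sketch. The plausible range (8 to 17 g/dL) and the 3% reporting threshold come from the text; the function names and the sample values are hypothetical and chosen only for illustration.

```python
HB_PLAUSIBLE = (8.0, 17.0)  # g/dL, prenatal hemoglobin range from the text
BIV_ERROR_SHARE = 0.03      # field is reported when more than 3% are BIVs


def is_biv(value, plausible=HB_PLAUSIBLE):
    """True when the value lies outside the plausible range."""
    low, high = plausible
    return value < low or value > high


def field_has_biv_error(values, plausible=HB_PLAUSIBLE):
    """True when more than 3% of the records hold a BIV in this field."""
    if not values:
        raise ValueError("no records to evaluate")
    share = sum(is_biv(v, plausible) for v in values) / len(values)
    return share > BIV_ERROR_SHARE


# Made-up data: five values near 36 look like hematocrits typed into Hb.
sample_hb = [12.1] * 95 + [36.0] * 5
print(field_has_biv_error(sample_hb))
```

In this made-up sample, 5% of records are above 17 g/dL, so the field would be reported; the high values are the kind that could result from hematocrits entered in the Hb field.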
Often reporting and recording errors contribute to a high proportion of
records with BIVs in a particular field. The BIV cut-offs selected for the
edit criteria for each field or indicator were based on a review of PedNSS
and PNSS data and external data sources. Additional information about how
the cut-offs for BIVs were developed for each field that is edited is
provided below.
Review list of PedNSS fields edited and BIV
cutoffs.
Review list of PNSS fields edited and BIV
cutoffs.
Cross-check errors are coding inconsistencies between specific fields. The edit criteria for cross-check errors are:
Review list of PedNSS fields edited for Cross-Check Errors.
Review list of PNSS
fields edited for Cross-Check Errors.
Unusual data distributions are fields that have data following a
pattern that is not typical based on observations of national PedNSS and
PNSS data.
The edit criteria for unusual data distribution errors are:
Additional information about how the edits for unusual data
distributions were developed for each field that is edited is provided
below.
Review list of PedNSS fields and edit criteria for Unusual Data
Distributions.
Review list of PNSS fields and edit criteria for Unusual Data
Distributions.
Standard deviation (SD) is a measure of the amount of variation among
values such as hemoglobin or weight-for-height in a
population. A low (smaller) standard deviation indicates data that are
less spread out (with less variation) than would be expected for
the population. A high (larger) standard deviation indicates data that are
more spread out than would be expected for the population.
In PNSS, the standard deviation of the prenatal hemoglobin (Hb)/hematocrit
(Hct) distribution compares the variability in the hemoglobin/hematocrit
measures reported in the PNSS to the variability observed for healthy,
iron-supplemented pregnant women measured in four European studies. Data from
the four studies are aggregated into a reference for hematologic status
during pregnancy. Because hemoglobin changes during pregnancy, and the PNSS
data reflect measures taken throughout pregnancy on iron-supplemented and
unsupplemented women, we expect greater variability in the PNSS data than
in the European reference (SD = 0.9 g/dL for hemoglobin and SD = 2.5% for
hematocrit). Therefore, the expected SD in PNSS is 0.9 to 1.2 g/dL for
hemoglobin and 2.5% to 3.5% for hematocrit. The cutoffs for low and high
standard deviation were established slightly outside these limits
(Hb < 0.8 g/dL or > 1.3 g/dL and Hct < 2.4% or > 3.6%).
In PedNSS, the standard deviation of the hemoglobin/hematocrit
distribution compares the variability in Hb/Hct measures reported to the
PedNSS to the variability observed for Hb and Hct measured among
children 1-5 years old in the Second National Health and Nutrition
Examination Survey (NHANES II). We do not expect the PedNSS standard
deviations to be identical to the Hb/Hct SD of NHANES II (SD = 0.8 g/dL
for hemoglobin and SD = 2.3% for hematocrit). Therefore, the expected SD
in PedNSS is 0.8 to 1.1 g/dL for hemoglobin and 2.3% to 3.3% for
hematocrit. The cutoffs for low and high standard deviations
were established slightly outside these limits (Hb < 0.7 g/dL or > 1.2 g/dL
and Hct < 2.2% or > 3.4%).
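The low and high standard deviation cutoffs for both programs can be collected in one lookup, as in the Python sketch below. The cutoff values are taken from the text; the function name `sd_flag`, the dictionary layout, and the use of the sample standard deviation (`statistics.stdev`) are assumptions, since the text does not say which SD formula the reports apply.

```python
import statistics

# (low cutoff, high cutoff); Hb in g/dL, Hct in percent. Values from the text.
SD_CUTOFFS = {
    ("PNSS", "Hb"): (0.8, 1.3),
    ("PNSS", "Hct"): (2.4, 3.6),
    ("PedNSS", "Hb"): (0.7, 1.2),
    ("PedNSS", "Hct"): (2.2, 3.4),
}


def sd_flag(values, program, measure):
    """Return 'low', 'high', or 'ok' for the spread of the reported values.

    Assumption: the sample standard deviation is used.
    """
    low, high = SD_CUTOFFS[(program, measure)]
    sd = statistics.stdev(values)  # needs at least two values
    if sd < low:
        return "low"
    if sd > high:
        return "high"
    return "ok"
```

A "low" flag points to values that are less spread out than expected, and a "high" flag to values that are more spread out than expected.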
In PedNSS, the low and high standard deviation errors for growth
indicators, including BMI-for-age, weight-for-length, weight-for-age, and
height-for-age, are identified only in the Annual Summary of Record
Volume and Data Quality report and will be discussed in that section.
Review list of PedNSS fields edited for Low or High Standard Deviation.
Review list of PNSS fields edited for Low or High Standard Deviation.
PNSS records contain prenatal and postpartum data that are recorded at
different times, i.e., during and after a pregnancy. Contributors are
expected to combine information from these two different time periods into
a single record. A completion code is assigned to a record to indicate
whether the record contains data from both time periods (prenatal and
postpartum), which defines a "complete record." Records with data from
only the prenatal or only the postpartum period are defined as
"prenatal only" or "postpartum only" records.
This data quality error identifies problems with:
Completion Code or Record Linkage Errors are errors that result in
incorrect data for the record type or insufficient data for the record
type, or duplicate field values on a record. The errors that are reported
include:
Errors of insufficient data for the record type are most likely the result of incorrect assignment of the Completion Codes. For example, a prenatal only record with fewer than 2 prenatal fields containing data is probably a postpartum only record that was incorrectly assigned the Completion Code of Prenatal Only rather than Postpartum Only.
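The prenatal only example above can be sketched as a simple check in Python. The rule of fewer than 2 populated prenatal fields comes from the text; the field names, the `completion_code` key, and its `"prenatal_only"` value are hypothetical stand-ins, not the actual PNSS record layout.

```python
MIN_PRENATAL_FIELDS = 2  # from the text: fewer than 2 suggests a wrong code

# Hypothetical field names, for illustration only.
PRENATAL_FIELDS = ("prenatal_weight", "prenatal_hb", "prenatal_visit_date")


def populated(record, fields):
    """Count the listed fields that hold a non-empty value."""
    return sum(1 for f in fields if record.get(f) not in (None, ""))


def insufficient_prenatal_data(record):
    """True for a prenatal only record with fewer than 2 prenatal fields."""
    if record.get("completion_code") != "prenatal_only":
        return False
    return populated(record, PRENATAL_FIELDS) < MIN_PRENATAL_FIELDS


suspect = {"completion_code": "prenatal_only", "prenatal_hb": 11.8}
print(insufficient_prenatal_data(suspect))
```

A record flagged this way is a candidate for review of its Completion Code, since it may really be a postpartum only record.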
Review list of PNSS fields edited for Completion Code or Record Linkage Errors.