Interactive Atlas of Reproductive Health: Statistical Methods
Statistical Stability
Rates
calculated and used in the interactive map and table displays
must pass a test of statistical stability. If any of the conditions
below apply, the rate is not calculated and a special code is
generated.
- Rates with no counts in the denominator are coded as "no population."
- Rates with no counts in the numerator are coded as "no events."
- Rates whose numerator and denominator together generate a relative standard error (RSE) greater than 30% are coded as "insufficient data."
- As an added safety precaution, rates are also screened for numerators of fewer than five. Any rate with a numerator of fewer than five and an RSE less than or equal to 30% is also coded as "insufficient data."
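The screening rules above can be expressed as a short function. This is a minimal Python sketch, not the Atlas's own code. The source does not say how the RSE is computed, so the sketch assumes the Poisson approximation commonly used for vital statistics counts, under which the RSE of a rate is 100/sqrt(numerator) percent; with that assumption the denominator affects the rate but not its RSE. The function names, the per-1,000 `multiplier`, and the string return codes are hypothetical.

```python
import math

def rse_percent(numerator: int) -> float:
    # Assumption: event counts are Poisson, so SE(rate) / rate = 1 / sqrt(numerator).
    return 100.0 / math.sqrt(numerator)

def screen_rate(numerator: int, denominator: float,
                multiplier: float = 1000.0,
                max_rse: float = 30.0,
                min_count: int = 5):
    """Return the rate, or a special code string if the rate fails screening."""
    if denominator == 0:
        return "no population"
    if numerator == 0:
        return "no events"
    if rse_percent(numerator) > max_rse:
        return "insufficient data"
    if numerator < min_count:
        # Added safety screen for small numerators, even when the RSE passes.
        return "insufficient data"
    return numerator / denominator * multiplier
```

For example, `screen_rate(11, 5000)` returns "insufficient data" (RSE about 30.2%), while `screen_rate(12, 5000)` passes (RSE about 28.9%) and returns a rate of about 2.4 per 1,000. Under the Poisson assumption, any numerator of four or fewer already has an RSE of at least 50%, so the separate fewer-than-five screen acts only as a backstop.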
The purpose of this screening is to provide rates that are statistically stable, so
that trends over time and differences between geographic areas can be evaluated
with reasonable confidence.
Because confidence intervals are not available, it is impossible to tell whether
one rate is statistically different from another. Therefore, the actual
rates are not displayed on the map itself. Actual rates can be obtained
by using the information icon for a single geographic area or by switching
to the interactive table mode.
Ranking actual rates across geographic areas is strongly discouraged, because
usually only the differences between the highest and the lowest rates
are potentially statistically significant.
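To see why rankings mislead, consider a rough significance test for the difference between two rates. The Python sketch below is illustrative only: it assumes Poisson counts, independent areas, and a normal approximation, none of which the source specifies, and the function names and counts are hypothetical.

```python
import math

def rate_and_se(numerator: int, denominator: float, multiplier: float = 1000.0):
    # Poisson approximation: SE(rate) = rate / sqrt(numerator).
    rate = numerator / denominator * multiplier
    return rate, rate / math.sqrt(numerator)

def differ_significantly(n1: int, d1: float, n2: int, d2: float,
                         z_crit: float = 1.96) -> bool:
    """Two-sided z-test (about the 5% level) for the difference between two independent rates."""
    r1, se1 = rate_and_se(n1, d1)
    r2, se2 = rate_and_se(n2, d2)
    z = abs(r1 - r2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z > z_crit
```

Under these assumptions, two areas with 20 events per 2,500 population (8.0 per 1,000) and 30 events per 3,000 (10.0 per 1,000) would rank differently, yet `differ_significantly(20, 2500, 30, 3000)` returns False (z about 0.78). Only a much wider gap, such as 12 versus 45 events per 3,000 (4.0 versus 15.0 per 1,000), gives z of about 4.4 and returns True.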
Confidentiality
The chance of
generating analysis cells with very small numbers (fewer than five) grows
with the number of strata used in an analysis. Because of the
variability of subpopulation groups across a large land mass such as
the United States, it was desirable to generate the numbers for all
groups, even though some areas of the country had very few or no members
of several of the groups.
No personal identifiers were used in creating the data tables for the
Interactive Atlas of Reproductive Health. However, subgroup combinations
that lead to small cell numbers in certain geographic areas carry a risk
of revealing individual identities, so it was desirable to set a cutoff
number to suppress small cell numbers that may compromise individual
confidentiality.
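The way small cells multiply as strata are added, and the suppression cutoff described above, can be sketched in Python. This is a hypothetical illustration: the field names, the record layout, and the use of None for a suppressed cell are assumptions; only the cutoff of five comes from the text.

```python
from collections import Counter

def suppress_small_cells(records, strata, cutoff=5):
    """Cross-tabulate records by the given strata; replace counts below cutoff with None."""
    counts = Counter(tuple(r[field] for field in strata) for r in records)
    return {cell: (n if n >= cutoff else None) for cell, n in counts.items()}

# Hypothetical records: 12 events in one county, split by a second attribute.
records = [{"county": "A", "group": "x"}] * 8 + [{"county": "A", "group": "y"}] * 4

print(suppress_small_cells(records, ["county"]))           # {('A',): 12}
print(suppress_small_cells(records, ["county", "group"]))  # {('A', 'x'): 8, ('A', 'y'): None}
```

A single county total of 12 is reportable, but splitting it by one more attribute produces a cell of 4 that must be suppressed.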
Statistical stability and confidentiality go hand in hand. An analysis
of infant mortality rates, used to determine the RSE level for statistical
stability, revealed that the numerators required to generate an
RSE of 30% or less were well above the safety cutoff of five. Therefore,
the confidentiality requirements were met by the requirement to ensure
statistical stability.
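The source reports this result from an empirical analysis of infant mortality rates. If one assumes instead (this is not stated in the source) that event counts follow a Poisson distribution, so that the RSE of a rate with numerator n is 1/sqrt(n), the same conclusion follows directly:

```latex
\mathrm{RSE}(n) = \frac{1}{\sqrt{n}} \le 0.30
\quad\Longleftrightarrow\quad
n \ge \frac{1}{0.30^{2}} \approx 11.1
```

Under this assumption the smallest numerator that passes is 12, more than twice the confidentiality cutoff of five.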
Missing Values
Demographic groups:
For records with missing values in demographic group fields, a special
code is used to denote that the value is unknown. This way, the records
can be included in the summary rates but excluded from the group rates.
Whenever
possible, records with missing values are dropped from both the numerator
and the denominator of the rate. This is possible when the numerator and
denominator information are drawn from the same record (linked data), and
consequently not possible when the numerator and denominator are drawn
from different records or databases (period data).
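For linked data, the handling of unknown demographic values can be sketched in Python. This is a minimal illustration under assumed conditions: each record is one member of the denominator (for example, one birth), `event` marks whether the same record counts toward the numerator, and the `UNKNOWN` code, field names, and records are hypothetical.

```python
from collections import defaultdict

UNKNOWN = "unknown"  # hypothetical special code for a missing demographic value

def linked_rates(records, group_field, multiplier=1000.0):
    """Summary and group-specific rates for linked data."""
    # Summary rate: every record counts, whatever its demographic value.
    summary = sum(r["event"] for r in records) / len(records) * multiplier

    events = defaultdict(int)
    persons = defaultdict(int)
    for r in records:
        group = r[group_field]
        if group == UNKNOWN:
            # Linked data: drop the record from numerator and denominator alike.
            continue
        persons[group] += 1
        events[group] += r["event"]
    group_rates = {g: events[g] / persons[g] * multiplier for g in persons}
    return summary, group_rates

records = (
    [{"group": "a", "event": True}] + [{"group": "a", "event": False}] * 3
    + [{"group": "b", "event": True}] * 2 + [{"group": "b", "event": False}] * 2
    + [{"group": UNKNOWN, "event": True}, {"group": UNKNOWN, "event": False}]
)
summary, group_rates = linked_rates(records, "group")
```

Here the summary rate uses all ten records (400 per 1,000), while the group rates are 250 and 500 per 1,000 and the two unknown records appear in neither part of them. With period data, such a record could only be left out of the numerator, because the denominator comes from a different source.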
Indicators:
An indicator
may be based on one data element or on several, as in the case of
indexes. If any of the data elements needed to define the indicator
is missing, the record is not counted. In the case of linked data,
the record is dropped from both the numerator and the denominator.
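The rule for indicators built from several data elements can be sketched the same way. In this hypothetical Python example, the two-element index, its cutoffs, and the field names are invented for illustration and do not reproduce any index used in the Atlas.

```python
def indicator_rate(records, required_fields, is_event, multiplier=100.0):
    """Percentage of eligible records meeting an indicator definition.

    A record is eligible only if every field the indicator needs is present;
    with linked data this removes it from numerator and denominator alike.
    """
    eligible = [r for r in records
                if all(r.get(field) is not None for field in required_fields)]
    if not eligible:
        return None
    events = sum(1 for r in eligible if is_event(r))
    return events / len(eligible) * multiplier

# Hypothetical two-element index: "low care" if care began after month 4
# or fewer than 5 visits were recorded. Illustrative only.
def low_care(r):
    return r["care_month"] > 4 or r["visits"] < 5

records = [
    {"care_month": 2, "visits": 10},
    {"care_month": 6, "visits": 8},
    {"care_month": None, "visits": 9},   # missing element: not counted
    {"care_month": 3, "visits": None},   # missing element: not counted
]
```

`indicator_rate(records, ["care_month", "visits"], low_care)` counts only the two complete records and returns 50.0 (percent).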
Status of Occurrence and Residence
All indicators are analyzed on the basis of either the place of occurrence
of the event or the place of residence of the individual. Occurrence
and residence status are determined by the state FIPS codes for occurrence
and residence. Records coded as foreign are dropped from the analysis,
including all summary statistics.
Users should note that when residence is used to define the records
included in a geographic area, records of events that occurred in
neighboring areas to residents of the first area will be included in the
analysis dataset. For example, infants born in one state (Georgia) to
residents of another state (Alabama) will be included in any area analysis
of the state of residence (Alabama). This can produce unusual results when
a state does not report all data elements. For example, if a state chooses
not to report a certain data element, the only records for its residents
that will have valid information for that element are records where the
event occurred in other states that do report the data element.
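The residence example can be illustrated with a small Python sketch. The field names, the `FOREIGN` code, and the rule that a record is dropped if either code is foreign are assumptions; the FIPS codes 01 (Alabama) and 13 (Georgia) are the real state codes. The second record hypothetically comes from a state that does not report smoking.

```python
from collections import defaultdict

FOREIGN = "foreign"  # hypothetical code for a foreign occurrence or residence

def records_by_state(records, basis):
    """Group records by state FIPS code of residence or of occurrence."""
    field = {"residence": "res_fips", "occurrence": "occ_fips"}[basis]
    grouped = defaultdict(list)
    for r in records:
        # Assumption: a record is dropped if either of its codes is foreign.
        if FOREIGN in (r["res_fips"], r["occ_fips"]):
            continue
        grouped[r[field]].append(r)
    return dict(grouped)

records = [
    {"occ_fips": "13", "res_fips": "01", "smoking": "no"},   # born in Georgia to an Alabama resident
    {"occ_fips": "01", "res_fips": "01", "smoking": None},   # element not reported where the birth occurred
    {"occ_fips": "13", "res_fips": FOREIGN, "smoking": "yes"},
]
by_residence = records_by_state(records, "residence")
by_occurrence = records_by_state(records, "occurrence")
```

Grouped by residence, both non-foreign records are assigned to Alabama ("01"), but only the one that occurred in Georgia carries a valid smoking value. Grouped by occurrence, the same two records split between "01" and "13".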
National Statistics
Traditionally, statistics that describe the United States (US) as
a nation include only the 50 states and the District of Columbia (DC).
For the purposes of the Interactive Atlas of Reproductive Health, national
statistics include data from the 50 states, DC, and five US territories:
American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and the
US Virgin Islands.
The US territories were included, when available, to provide
as much information as possible in areas where it was important. National
statistics that include the territories are not significantly different
from statistics that exclude them; however, there are minor
differences for some subpopulations. Note also that data
from the territories are not as complete as those from the 50 states and
DC. For example, mortality statistics are not available for the Pacific
territories for the years 1995 through 1997. Even where the records
are available, many of the attributes are not; for example, Puerto Rico
does not report ethnicity.
Spatial Smoothing
Spatially smoothed area rates are spatial moving averages. In the Interactive
Atlas of Reproductive Health, the rate numerator is created by summing the
events for each map feature (county) with the events from all neighboring
map features (counties). The rate denominator is created by summing the
populations of the same features. The summed numerator is then divided by the
summed denominator and multiplied by a factor of 10^x. This process produces
spatially smoothed area rates. For the Interactive Atlas of Reproductive
Health, area "neighbors" are defined solely by contiguity with the
target feature (as opposed to distance).
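The smoothing steps above can be sketched in Python. This is a minimal illustration, assuming the neighbor lists have already been derived from county contiguity; the function and argument names are hypothetical, `exponent` stands for the x in 10^x, and the counties and counts are invented.

```python
def smoothed_rates(events, population, neighbors, exponent=3):
    """Spatial moving-average rates using contiguity-based neighbors.

    events, population: dicts keyed by county identifier.
    neighbors: dict mapping each county to the counties sharing a border with it.
    """
    rates = {}
    for county in events:
        area = [county, *neighbors.get(county, [])]
        numerator = sum(events[c] for c in area)
        denominator = sum(population[c] for c in area)
        rates[county] = numerator / denominator * 10 ** exponent if denominator else None
    return rates

# Hypothetical three counties in a row: A borders B; B borders A and C.
events = {"A": 1, "B": 4, "C": 10}
population = {"A": 100, "B": 400, "C": 500}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
rates = smoothed_rates(events, population, neighbors)
```

The unsmoothed rates here are 10, 10, and 20 per 1,000; after smoothing they become 10.0, 15.0, and about 15.6, as each county's rate is pulled toward that of its neighbors.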
Geographic smoothing algorithms "borrow information" from neighboring
areas to stabilize results from sparsely populated areas. This reduces
the variability in the data, allowing patterns to emerge, but increases
the bias in the estimates for each small area. Consequently, the user
should not attempt to interpret the results for any single county. The
variance reduction, however, allows the user to identify and compare
clusters of counties with similar values.
Misclassification
Misclassification of information is defined as the incorrect reporting
of record attributes. Unlike missing data, misclassified data misrepresent
the true value of the record attribute. An example would be reporting
a female infant as a male. Misclassification rates vary by reporting
system and by the attribute being reported.
Misclassification is of particular concern in the case
of miscoded county locations. Local geographic information on vital
records and other public databases is used for many purposes, including
the distribution of state and federal funds for community infrastructure
and development. Studies comparing address-matched records with direct
coding on birth certificates have reported change rates as high as 9%
in some of the study areas. By 1997, most states (49 of 54) had already
begun using or investigating the use of address matching to improve
the accuracy of county codes in vital statistics data.1
Misclassification of record attributes other than geographic location
is also a problem. Most misclassification is unintentional and results
from clerical errors or misinterpretation of handwritten documents.
Unintentional misclassification usually does not result in misrepresentation
of the data, since the error is random in nature, although it may dilute
the effect seen in an analysis. Some misclassification, however, is either
intentional on the part of an individual, such as denial of smoking during
pregnancy,2 or systematic, such as that caused by the use of default codes.3
Intentional and systematic misclassification can result in misrepresentation
of the data, so care should be exercised when interpreting any data element
known to be prone to intentional or systematic misclassification. Examples
of intentional and systematic misclassification are smoking during pregnancy
and prenatal care visits, respectively.
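The contrast between random and intentional misclassification can be illustrated with expected counts. The Python sketch below is hypothetical: the exposure (smoking), the outcome (low birth weight), the cohort sizes, risks, sensitivity, and specificity are made-up numbers, and misclassification of the exposure is assumed to be independent of the outcome. Sensitivity is the share of true smokers recorded as smokers; specificity is the share of true nonsmokers recorded as nonsmokers.

```python
def observed_table(exposed, unexposed, risk_exposed, risk_unexposed,
                   sensitivity, specificity):
    """Expected recorded prevalence and risk ratio after exposure misclassification.

    Assumes misclassification of exposure does not depend on the outcome.
    """
    cases_exposed = exposed * risk_exposed
    cases_unexposed = unexposed * risk_unexposed
    # Recorded as exposed: true exposed correctly classified plus
    # true unexposed wrongly recorded as exposed.
    rec_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    rec_cases_exposed = sensitivity * cases_exposed + (1 - specificity) * cases_unexposed
    # Everyone else is recorded as unexposed.
    rec_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    rec_cases_unexposed = (1 - sensitivity) * cases_exposed + specificity * cases_unexposed
    prevalence = rec_exposed / (exposed + unexposed)
    risk_ratio = (rec_cases_exposed / rec_exposed) / (rec_cases_unexposed / rec_unexposed)
    return prevalence, risk_ratio

truth = observed_table(1000, 4000, 0.12, 0.06, 1.0, 1.0)
random_err = observed_table(1000, 4000, 0.12, 0.06, 0.8, 0.95)  # errors in both directions
denial = observed_table(1000, 4000, 0.12, 0.06, 0.7, 1.0)       # smokers under-report only
```

In this example the true prevalence is 20% and the true risk ratio is 2.0. Random errors in both directions happen to leave the recorded prevalence at 20% but dilute the risk ratio to about 1.71, while one-directional denial understates prevalence as 14%, misrepresenting the data.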
1 MacDorman M, Gay GA. State initiatives in geocoding vital statistics data. J Public Health Manag Pract. 1999 Mar;5(2):91-3.
2 Ventura SJ, Hamilton BE, Mathews TJ, Chandra A. Trends and variations in smoking during pregnancy and low birth weight: evidence from the birth certificate, 1990–2000. Pediatrics. 2003 May;111(5):1176-80. http://www.pediatrics.org/cgi/content/full/111/5/S1/1176
3 Carter JT. Systematic bias in the reporting of prenatal care data on birth certificates in Georgia, 2001. Rollins School of Public Health, Emory University (unpublished work).
Limitations of the Data
Population Estimates: The populations used to calculate rates
in the atlas are based on estimates as of July 1 for 1991–1999 and as of
April 1 for 1990 and 2000. The intercensal population estimates for 1991–1999
are consistent with the April 1, 2000, census. Because the April 1, 2000,
census and birth certificates report race in incompatible ways, it was
necessary to “bridge” the population data from the census to make them
consistent with the birth certificates. The bridged-race
population estimates were produced through a collaborative arrangement
between the National Center for Health Statistics and the U.S. Census
Bureau, with support from the National Cancer Institute. It should be
noted that the bridged-race intercensal estimates for 1990–99 used on
this Web site and the NCHS Web site differ from the estimates on the
NCI Web site. NCI modified the Census Bureau’s estimates for the State
of Hawaii. (NCHS, http://www.cdc.gov/nchs/about/major/dvs/popbridge/datadoc.htm#inter1)
When earlier NCHS reports were published, it was noted that the rates
in those reports were generally larger than they would have been if 2000
census-based estimates had been used (NCHS,
http://www.cdc.gov/nchs/data/nvsr/nvsr51/nvsr51_12.pdf, pp. 1–4, 18–27).
The magnitude of the overestimates varies by population subgroup,
but the overestimates were particularly large for Hispanic and American
Indian population groups. For example, the 2001 fertility rate for
Hispanic women in the originally published report (based on the 1990
census) is about 11 percent higher than the rate in the current report
(projected from the 2000 census). The differences between the 1990- and
2000-based fertility rates are negligible for non-Hispanic white women,
but are sizeable for non-Hispanic black women (3 percent in 2001), Asian
or Pacific Islander (API) women (7 percent), and American Indian women
(18 percent). The overall effect of the revised rates is that the range
in rates among population subgroups is somewhat smaller than indicated
by the previously published rates, mainly reflecting the lower revised
rates for Hispanic women. In addition to these differences by population
subgroup, the revised rates by age differ from the originally published
rates; they are notably lower than the originally published rates for
women aged 25–29 years.
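If, as the comparison above implies, only the population estimates changed between the two sets of reports, the size of a rate revision follows directly from the change in the denominator: with the same numerator, a rate computed with population P_old exceeds one computed with P_new by (P_new / P_old - 1) × 100 percent. The hypothetical Python helpers below apply this relation in both directions; they are an arithmetic illustration, not an NCHS method.

```python
def pct_rate_difference(pop_old: float, pop_new: float) -> float:
    """Percent by which a rate based on pop_old exceeds the same-numerator rate based on pop_new."""
    return (pop_new / pop_old - 1.0) * 100.0

def implied_pop_ratio(pct_higher: float) -> float:
    """Ratio pop_old / pop_new implied by a rate that is pct_higher percent larger."""
    return 1.0 / (1.0 + pct_higher / 100.0)
```

For instance, `implied_pop_ratio(11)` is about 0.90, so the 11 percent higher fertility rate for Hispanic women is consistent with a 1990-based population estimate roughly 10 percent smaller than the 2000-based one, assuming births were unchanged.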
Population estimates for the US territories are calculated by linear
interpolation between the 2000 census and the most recent previous
census available from the US Census Bureau.
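Linear interpolation between two census counts can be sketched as follows. This Python helper is hypothetical: it treats census dates as whole years (ignoring the April 1 census date), and the example figures are invented.

```python
def interpolate_population(year: float, earlier: tuple, later: tuple) -> float:
    """Linear interpolation between two census counts given as (year, population)."""
    (y0, p0), (y1, p1) = earlier, later
    if not y0 <= year <= y1:
        raise ValueError("year must lie between the two census years")
    return p0 + (p1 - p0) * (year - y0) / (y1 - y0)
```

With a hypothetical count of 100,000 in 1990 and 150,000 in 2000, `interpolate_population(1995, (1990, 100_000), (2000, 150_000))` gives an estimate of 125,000 for 1995.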
Population estimates for teenage subgroups, where the available age
grouping (women aged 15–19) differs from the required grouping
(women aged 15–17 and 18–19), are calculated by multiplying the single
5-year age group of women aged 15–19 by 3/5 and 2/5 to create two
subgroups: women aged 15–17 and women aged 18–19.
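The 3/5 and 2/5 split weights each subgroup by the number of single years of age it covers, which amounts to assuming that women are evenly distributed across ages 15 to 19. A hypothetical Python helper that generalizes this rule:

```python
def split_age_group(total: float, first_age: int, last_age: int, subgroups):
    """Split a grouped population in proportion to the single years each subgroup covers."""
    width = last_age - first_age + 1
    return {(a, b): total * (b - a + 1) / width for a, b in subgroups}
```

`split_age_group(1000, 15, 19, [(15, 17), (18, 19)])` returns 600.0 for ages 15–17 and 400.0 for ages 18–19.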
Fetal Loss: Comparisons of fetal loss rates across state boundaries
should be interpreted with caution. Past studies have shown that fetal
death reporting is inconsistent between states. All state vital records
registries use the WHO definition of a fetal death, which was adopted
in 1950 to avoid confusion from terms such as miscarriage and stillbirth.
However, there is evidence that fetal deaths continue to be underreported
in some areas. Also, the reports that are submitted are often incomplete
for medical and lifestyle risk factors, leading to the possibility of
selection bias being introduced into the dataset. In 1989, both the
standard birth certificate and the fetal death report were revised.
It was anticipated that the use of checkboxes on vital statistics records
would encourage better reporting of the specific risks and conditions listed.
Nevertheless, subsequent research has found that medical and lifestyle
risk factors continue to be underreported on the revised live birth
certificate and fetal death report (NCHS,
http://www.cdc.gov/nchs/data/series/sr_20/sr20_031.pdf, Methods, p. 2).
In most states, registration of a fetal death is required only for deaths
occurring after 20 weeks of gestation; however, a handful of states
require reporting of all products of conception. To obtain comparability
across states, the Atlas of Reproductive Health database includes
only fetal death records coded with a valid gestational age of 22 weeks
or greater. A review of the fetal death records from 1995–1999 reveals
that only 4.8% of the records are missing valid values for gestational
age; however, the range by state is 0.1% to 51.3%, indicating a high
degree of variability between states in the completeness of the report.
A second review of all records with valid gestational ages reveals that
35.6% of them fall into the 22-weeks-or-greater category. However, the
range by state was 6.0% to 100.0%, highlighting the differences between
the states in reporting requirements.
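The gestational-age filter and the completeness review can be sketched together in Python. The field names, the use of None for a missing or invalid gestational age, the output layout, and the records are assumptions made for illustration; only the 22-week threshold comes from the text.

```python
from collections import defaultdict

def screen_fetal_deaths(records, min_weeks=22):
    """Keep records with a valid gestational age >= min_weeks; report per-state completeness."""
    kept = []
    stats = defaultdict(lambda: {"total": 0, "missing": 0, "included": 0})
    for r in records:
        s = stats[r["state"]]
        s["total"] += 1
        weeks = r.get("gest_weeks")
        if weeks is None:            # missing or invalid gestational age
            s["missing"] += 1
        elif weeks >= min_weeks:
            s["included"] += 1
            kept.append(r)
    summary = {}
    for state, s in stats.items():
        valid = s["total"] - s["missing"]
        summary[state] = {
            "pct_missing": 100.0 * s["missing"] / s["total"],
            "pct_included_of_valid": 100.0 * s["included"] / valid if valid else None,
        }
    return kept, summary

records = [
    {"state": "X", "gest_weeks": 24}, {"state": "X", "gest_weeks": 30},
    {"state": "X", "gest_weeks": None}, {"state": "X", "gest_weeks": 18},
    {"state": "Y", "gest_weeks": 10}, {"state": "Y", "gest_weeks": 12},
]
kept, summary = screen_fetal_deaths(records)
```

In these hypothetical records, state X has 25% missing gestational ages and two of its three valid records (about 66.7%) meet the 22-week threshold, while none of state Y's records do.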
Page last reviewed: 7/28/08
Page last modified: 1/29/07
Content source: Division of Reproductive Health, National Center for Chronic Disease Prevention and Health Promotion