Interactive Atlas of Reproductive Health: Statistical Methods

Statistical Stability

Rates calculated and used in the interactive map and table displays must pass a test of statistical stability. If any of the conditions below applies, the rate is not calculated and a special code is generated instead.

  • Rates with no counts in the denominator are coded as "no population."

  • Rates with no counts in the numerator are coded as "no events."

  • Rates whose numerator and denominator together generate a relative standard error (RSE) greater than 30% are coded as "insufficient data."

  • The rates are also screened for numerators less than five, as an added safety precaution. Any rates with numerators less than five and RSE less than or equal to 30% are also coded as "insufficient data."
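The screening rules above can be sketched as a small function. This is only an illustration under stated assumptions, not the Atlas's actual code: the name `screen_rate`, the "ok" code, and the per-1,000 multiplier are hypothetical, and the RSE formula (100 / sqrt(numerator), which treats the event count as Poisson distributed) is an assumption, since this page does not say how the RSE is computed.

```python
import math

def screen_rate(numerator, denominator, multiplier=1000):
    """Return (rate, code); rate is None when the rate is suppressed.

    Assumption: RSE (%) = 100 / sqrt(numerator), i.e. a Poisson count.
    """
    if denominator == 0:
        return None, "no population"
    if numerator == 0:
        return None, "no events"
    rse = 100.0 / math.sqrt(numerator)
    # Suppress unstable rates, and numerators under five as a safety net.
    if rse > 30.0 or numerator < 5:
        return None, "insufficient data"
    return numerator * multiplier / denominator, "ok"
```

For example, `screen_rate(12, 1000)` yields a rate of 12.0 per 1,000, while `screen_rate(4, 1000)` is coded as "insufficient data".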

The purpose for this action is to provide rates that are statistically stable, so that trends over time and between geographic areas can be evaluated with reasonable confidence.

Without the availability of confidence intervals it is impossible to tell whether one rate is statistically different from another. Therefore, the actual rates are not displayed on the map itself. Actual rates can be obtained by using the information icon for a single geographic area or by transferring to the interactive table mode.

Ranking actual rates between geographic areas is strongly discouraged because usually only the differences between the highest and the lowest rates are potentially statistically significant.

Confidentiality

The chance of generating analysis cells with very small numbers (less than five) grows with the number of strata used in an analysis. Because of the variability of subpopulation groups across a large land mass such as the United States, it was desirable to generate the numbers for all groups even though some areas of the country had very few or no members of several of the groups.

No personal identifiers were used in creating the data tables for the Interactive Atlas of Reproductive Health. However, subgroup combinations that lead to small cell numbers in certain geographic areas carry a risk of revealing individual identities, so it was desirable to set a cutoff number to suppress small cell numbers that may compromise individual confidentiality.

Statistical stability and confidentiality go hand in hand. An analysis of infant mortality rates used to determine the RSE level for statistical stability revealed that the numerator numbers required to generate an RSE of less than 30% were well above the safety cutoff of five. Therefore the confidentiality requirements were met by the requirement to ensure statistical stability.
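A rough check of this relationship is possible if one assumes the event count is Poisson distributed, so that RSE (%) = 100 / sqrt(count). That formula is an assumption made here for illustration; the page does not give the formula used in the infant mortality analysis, and `min_numerator_for_rse` is a hypothetical helper.

```python
import math

def min_numerator_for_rse(max_rse_percent=30.0):
    """Smallest event count whose RSE is strictly below the threshold.

    Assumption: RSE (%) = 100 / sqrt(count), i.e. a Poisson count.
    """
    count = 1
    while 100.0 / math.sqrt(count) >= max_rse_percent:
        count += 1
    return count
```

Under this assumption, `min_numerator_for_rse(30.0)` returns 12, which is well above the cutoff of five.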

Missing Values

Demographic groups:
For records that have demographic group fields with missing values, a special code is used to denote that the value is unknown. This way the records can be included in the summary rates, and excluded from the group rates.

Whenever possible, records with missing values are dropped from both the numerator and denominator of the rate. This is possible where numerator and denominator information are drawn from the same record (linked data). It is not possible when the numerator and denominator are drawn from different records or databases (period data).

Indicators:
An indicator may be based on one or more than one data element as in the case with indexes. If any of the data elements necessary to define the indicator is missing, then the record is not counted. In the case of linked data, the record is dropped from both the numerator and denominator.
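For linked data, the rule can be sketched as follows. This is a minimal illustration, not the Atlas's code: the function name, the record layout (a list of dictionaries with `None` marking a missing value), and the multiplier are all assumptions.

```python
def linked_rate(records, required_fields, is_event, multiplier=1000):
    """Rate from linked records, dropping incomplete records entirely.

    A record missing any required field leaves both the numerator
    and the denominator.
    """
    complete = [r for r in records
                if all(r.get(f) is not None for f in required_fields)]
    if not complete:
        return None
    events = sum(1 for r in complete if is_event(r))
    return events * multiplier / len(complete)
```

With three records of which one has a missing value, the rate is computed from the two complete records only.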

Status of Occurrence and Residence

All indicators are analyzed on the basis of either the place of occurrence of the event or the place of residency of the individual. Occurrence and residence status are determined by the state FIPS codes for occurrence and residency. Records coded as foreign are dropped from the analysis including all summary statistics.

Users should note that when residence is used to define the records included in a geographic area, events that occur in neighboring areas are included in that area's analysis dataset if the individual's residence is recorded as that area. For example, infants born in one state (Georgia) to a resident of another state (Alabama) will be included in any area analysis set of the state of residence (Alabama). This can have unusual results when a state does not report all data elements. For example, if a certain state chooses not to report a certain data element, the only records for its residents that will have valid information for that element are records where the event occurs in other states that do report the data element.
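The difference between the two bases can be sketched with a small counting function. The record layout, the postal abbreviations standing in for state FIPS codes, and the `"foreign"` marker are hypothetical. Dropping a record when either field is foreign is also an assumption, since the page does not say which field carries the foreign code.

```python
from collections import Counter

FOREIGN = "foreign"  # hypothetical marker; the real files use FIPS-based codes

def count_by_place(records, basis):
    """Count records per state using 'occurrence' or 'residence' as the basis."""
    if basis not in ("occurrence", "residence"):
        raise ValueError("basis must be 'occurrence' or 'residence'")
    counts = Counter()
    for rec in records:
        # Assumption: a foreign code in either field drops the record.
        if FOREIGN in (rec["occurrence"], rec["residence"]):
            continue
        counts[rec[basis]] += 1
    return counts
```

A birth recorded as `{"occurrence": "GA", "residence": "AL"}` is counted under GA on the occurrence basis and under AL on the residence basis.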

National Statistics

Traditionally, statistics that describe the United States (US) as a nation include only the 50 states and the District of Columbia (DC). For the purposes of the Interactive Atlas of Reproductive Health, national statistics include data from the 50 states, DC, and five US territories: American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and the American Virgin Islands.

The US territories were included, when their data were available, to provide as much information as possible in areas where it was important. National statistics that include the territories are not significantly different from statistics that exclude the territories; however, there are minor differences for some subpopulations. It should also be noted that data from the territories are not as complete as those from the 50 states and DC. For example, mortality statistics are not available for the Pacific territories for the years 1995 through 1997. Even where the records are available, many of the attributes are not. For example, Puerto Rico does not report ethnicity.

Spatial Smoothing

Spatially smoothed area rates are spatial moving averages. In the Interactive Atlas of Reproductive Health, the rate numerator is created by summing events for each map feature (county) with events from all neighboring map features (counties). The rate denominator is created by summing the corresponding feature populations. The summed numerator is then divided by the summed denominator and multiplied by a factor of 10^x. This process produces spatially smoothed area rates. For the Interactive Atlas of Reproductive Health, area "neighbors" are defined based solely on contiguity to the target feature (as opposed to distance).
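The smoothing step can be sketched as below. This is an illustration under assumptions, not the Atlas's implementation: the function name, the dictionary-based inputs, and the default multiplier are hypothetical, and the denominator is assumed to pool the same counties as the numerator.

```python
def smoothed_rates(events, population, neighbors, multiplier=1000):
    """Spatial moving average: pool each county with its contiguous neighbors.

    events, population: dicts keyed by county id.
    neighbors: dict mapping a county id to the ids of contiguous counties.
    """
    rates = {}
    for county in events:
        pool = [county] + list(neighbors.get(county, []))
        num = sum(events[c] for c in pool)
        den = sum(population[c] for c in pool)
        rates[county] = num * multiplier / den if den > 0 else None
    return rates
```

With three counties in a row (A next to B, B next to C), the rate for A pools A and B, while the rate for B pools all three. A county with no events of its own can therefore still show a non-zero smoothed rate.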

Geographic smoothing algorithms "borrow information" from neighboring areas to stabilize results from sparsely populated areas. This reduces the variability in the data, allowing patterns to emerge, but increases the bias in the estimates for each small area. Consequently, the user should not attempt to interpret the results for any single county. The variance reduction, however, allows the user to identify and compare clusters of counties with similar values.

Misclassification

Misclassification of information is defined as the incorrect reporting of record attributes. Unlike missing data, misclassified data misrepresents the true value of the record attribute. An example would be reporting a female infant as a male. Misclassification rates vary by reporting system and the attribute being reported.

The issue of misclassification is of particular concern in the case of miscoded county locations. Local geographic information on vital records and other public databases is used for many purposes including the distribution of state and federal funds for community infrastructure and development. Studies comparing address matched records to direct coding on birth certificates have reported change rates as high as 9% in some of the study areas. By 1997, most states (49 of 54) had already begun using or investigating the use of address matching to improve the accuracy of county codes in vital statistics data.1

Misclassification of record attributes other than geographic localities is also a problem. Most misclassification is unintentional and results from clerical errors or misinterpretation of handwritten documents. Unintentional misclassification usually does not result in misrepresentation of the data, since the error is random in nature, although it may dilute the effect of an analysis. But some misclassification is either intentional on the part of an individual, such as denial of smoking during pregnancy,2 or systematic, such as that caused by the use of default codes.3 Intentional and systematic misclassification can result in misrepresentation of the data, and therefore care should be exercised when interpreting any data element that is known to be prone to intentional or systematic misclassification. Examples of intentional and systematic misclassification are smoking during pregnancy and prenatal care visits, respectively.

1 MacDorman M, Gay GA. State initiatives in geocoding vital statistics data. J Public Health Manag Pract. 1999 Mar;5(2):91-3.

2 Ventura SJ, Hamilton BE, Mathews TJ, Chandra A. Trends and variations in smoking during pregnancy and low birth weight: evidence from the birth certificate, 1990–2000. Pediatrics. 2003 May;111(5):1176-1180. http://www.pediatrics.org/cgi/content/full/111/5/S1/1176

3 Carter JT. Systematic Bias in the Reporting of Prenatal Care Data on Birth Certificates in Georgia, 2001. Rollins School of Public Health, Emory University (Unpublished work)

Limitations of the Data

Population Estimates: The populations used to calculate rates in the atlas are based on estimates as of July 1 for 1991–1999 and April 1 for 1990 and 2000. The intercensal population estimates for 1991–1999 are consistent with the April 1, 2000, census. Due to the incompatibility between the April 1, 2000, census and birth certificates on the reporting of race, it was necessary to “bridge” the population data from the census to be consistent with that of the birth certificates. The bridged-race population estimates were produced through a collaborative arrangement between the National Center for Health Statistics and the U.S. Census Bureau with support from the National Cancer Institute. It should be noted that the bridged-race intercensal estimates for 1990–99 used on this Web site and the NCHS Web site differ from the estimates on the NCI Web site. NCI modified the Census Bureau’s estimates for the State of Hawaii.
(NCHS, http://www.cdc.gov/nchs/about/major/dvs/popbridge/datadoc.htm#inter1)

When earlier NCHS reports were published, it was noted that the rates in those reports were generally larger than would be the case if 2000 census-based estimates were used. (NCHS, http://www.cdc.gov/nchs/data/nvsr/nvsr51/nvsr51_12.pdf pp. 1-4, 18-27.) The magnitude of the overestimates varies by population subgroup, but the overestimates were particularly large for Hispanic and American Indian population groups. For example, the fertility rate for 2001 for Hispanic women in the originally published report (based on the 1990 census) is about 11 percent higher than the rate in the current report (projected from the 2000 census). The differences between the 1990- and 2000-based fertility rates are negligible for non-Hispanic white women, but are sizeable for non-Hispanic black women (3 percent in 2001), Asian or Pacific Islander (API) women (7 percent), and American Indian women (18 percent). The overall effect of the revised rates is that the range in rates among population subgroups is somewhat smaller than indicated by the previously published rates, mainly reflecting the lower revised rates for Hispanic women. In addition to these differences by population subgroup, the revised rates by age differ from the originally published rates. The revised rates are notably lower than the originally published rates for women aged 25–29 years.

Population estimates for the US territories are calculated using a linear interpolation between the 2000 census, and the most recent previous census available from the US Census Bureau.
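Linear interpolation between two census counts can be sketched as follows. The function name and arguments are hypothetical, and the sketch ignores details the page does not specify, such as the exact reference dates of each census.

```python
def interpolate_population(year, year0, pop0, year1, pop1):
    """Linearly interpolate a population between two census counts."""
    if not year0 <= year <= year1 or year0 == year1:
        raise ValueError("year must lie between two distinct census years")
    fraction = (year - year0) / (year1 - year0)
    return pop0 + fraction * (pop1 - pop0)
```

For made-up counts of 100,000 in 1990 and 120,000 in 2000, the 1995 estimate is the midpoint, 110,000.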

Where the available age grouping (15-19 year old women) differs from the required grouping (15-17 and 18-19 year old women), population estimates for the teen age subgroups are calculated by multiplying the single 5-year age group of 15-19 year old women by 3/5 and 2/5 to create two subgroups: 15-17 year old women and 18-19 year old women, respectively.
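The proportional split is simple arithmetic; a minimal sketch follows, with a hypothetical function name. The fractions correspond to three of the five single years of age falling in the 15-17 group and two in the 18-19 group.

```python
from fractions import Fraction

def split_teen_population(pop_15_19):
    """Split a 15-19 count into 15-17 (3/5) and 18-19 (2/5) estimates."""
    pop_15_17 = pop_15_19 * Fraction(3, 5)
    pop_18_19 = pop_15_19 * Fraction(2, 5)
    return float(pop_15_17), float(pop_18_19)
```

A made-up population of 1,000 women aged 15-19 is split into 600 aged 15-17 and 400 aged 18-19.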

Fetal Loss: Comparison of fetal loss rates across state boundaries should be interpreted with caution. Past studies have shown that fetal death reporting is inconsistent between states. All state vital record registries use the WHO definition of a fetal death, which was adopted in 1950 to avoid confusion from terms such as miscarriage, stillbirth, etc. However, there is evidence that fetal deaths continue to be underreported in some areas. Also, the reports that are submitted are often incomplete for medical and lifestyle risk factors, leading to the possibility of a selection bias being introduced into the dataset. In 1989, there was a revision of both the standard birth certificate and fetal death report. It was anticipated that use of checkboxes on vital statistics records would encourage better reporting of specific risks and conditions listed. Nevertheless, subsequent research has found that medical and lifestyle risk factors continue to be underreported on the revised live birth certificate and fetal death report. (NCHS, http://www.cdc.gov/nchs/data/series/sr_20/sr20_031.pdf Methods, p. 2)

In most states, registration of a fetal death is only required for those occurring after 20 weeks of gestation; however, a handful of states require reporting of all products of conception. To obtain comparability across the states, the Atlas of Reproductive Health database includes only fetal death records coded with a valid gestational age of 22 weeks or greater. A review of the fetal death records from 1995–1999 reveals that only 4.8% of the records are missing valid values for gestational age; however, the range by state is 0.1% to 51.3%, indicating a high degree of variability between states in the completeness of the report. A second review of all records with valid gestational ages reveals that 35.6% of the records with valid gestational data fall into the 22 weeks or greater category. However, the range by state was 6.0% to 100.0%, highlighting the differences between the states in reporting requirements.
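The two review percentages described above can be sketched as a small summary function. The record layout, the `gest_weeks` field name, and the use of `None` for an invalid or missing gestational age are assumptions for illustration; the page does not define what counts as a valid value.

```python
def summarize_gestation(records, min_weeks=22):
    """Return (% missing gestational age, % of valid records >= min_weeks)."""
    total = len(records)
    valid = [r for r in records if r.get("gest_weeks") is not None]
    included = [r for r in valid if r["gest_weeks"] >= min_weeks]
    pct_missing = 100.0 * (total - len(valid)) / total if total else None
    pct_included = 100.0 * len(included) / len(valid) if valid else None
    return pct_missing, pct_included
```

Running this per state would show the spread between states that the text describes. Only the records counted in `included` would enter the Atlas database.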

Page last reviewed: 7/28/08
Page last modified: 1/29/07
Content source: Division of Reproductive Health, National Center for Chronic Disease Prevention and Health Promotion
