U.S. Census Bureau

Source and Accuracy of Estimates for the June 2002 CPS Microdata File on Fertility and Birth Expectations

Table of Contents

SOURCE OF DATA
  Basic CPS
  June 2002 Supplement
  Basic CPS Sample Design
  Sample Redesign
  Estimation Procedure

ACCURACY OF THE ESTIMATES
  Sampling Error
  Nonsampling Error
  Nonresponse
  Coverage
  A Nonsampling Error Warning
  Standard Errors and Their Use
  Standard Errors of Estimated Numbers
  Standard Errors of Estimated Percentages
  Standard Error of a Difference
  Standard Error of a Fertility Ratio
  Standard Error of a Ratio
  Standard Errors for Region, State and Nonmetropolitan Estimates

Tables

Table 1. CPS Coverage Ratios
Table 2. Parameters for Computation of Standard Errors for Labor Force Characteristics
Table 3. Parameters for Computations of Standard Errors for June 2002 Supplement Characteristics
Table 4. Parameters for Computation of Standard Errors for June 2002 Fertility Ratios
Table 5. State Factors to be Applied to 2002 Parameters
Table 6. Region Factors to be Applied to 2002 Parameters



SOURCE OF DATA

The data in this microdata file come from the June 2002 Current Population Survey (CPS). The June survey uses two sets of questions, the basic CPS and the supplement.

Basic CPS. The basic CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member 15 years old and over in every sample household.

June 2002 Supplement. In addition to the basic CPS questions, interviewers asked supplementary questions in June 2002 about fertility of women between 15 and 44 years of age.

Basic CPS Sample Design. The present CPS sample was selected from the 1990 Decennial Census files with coverage in all 50 states and the District of Columbia. The sample is continually updated to account for new residential construction. To obtain the sample, the United States was divided into 2,007 geographic areas. In most states, a geographic area consisted of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions are used instead of counties. These 2,007 geographic areas were then grouped into 754 strata, and one geographic area was selected from each stratum.

About 60,000 occupied households are eligible for interview every month out of these 754 areas. Interviewers are unable to obtain interviews at about 4,500 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason.

The number of households that are eligible for interview in the basic CPS increased from 50,000 to 60,000 in July of 2001. This increase in the number of eligible households is due to the implementation of the State Children’s Health Insurance Program (SCHIP) sample expansion. The SCHIP sample expansion increased the monthly CPS sample in states with high sampling errors for low-income uninsured children. With the increase in eligible households, the number of units where interviewers were unable to obtain an interview increased from 3,200 to 4,500.

Sample Redesign. Since the introduction of the CPS, the Census Bureau has redesigned the CPS sample several times. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. The most recent changes were completely implemented in July 1995.

Estimation Procedure. This survey’s estimation procedure adjusts weighted sample results to agree with independent estimates of the civilian noninstitutional population of the United States by age, sex, race, Hispanic/non-Hispanic origin, and state of residence. The adjusted estimate is called the post-stratification ratio estimate. The independent estimates are calculated based on information from three primary sources:

The independent population estimates include some, but not all, undocumented immigrants.

ACCURACY OF THE ESTIMATES

A sample survey estimate has two possible types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error. The nature of the sampling error is known given the survey design. The full extent of the nonsampling error, however, is unknown.

Sampling Error. Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. This possible variation in the estimates due to sampling error is known as "sampling variability."

Consequently, one should be particularly careful when interpreting results based on a relatively small number of cases or on small differences between estimates. The standard errors for CPS estimates, as calculated by methods described in "Standard Errors and Their Use", primarily indicate the magnitude of sampling error. They also partially measure the effect of some nonsampling errors in responses and enumeration, but do not measure systematic biases in the data. (Bias is the average over all possible samples of the differences between the sample estimates and the desired value.)

Nonsampling Error. All other sources of error in the survey estimates are collectively called nonsampling error. Sources of nonsampling errors include the following:

Two types of nonsampling error that can be examined to a limited extent are nonresponse and undercoverage.

Nonresponse. The effect of nonresponse cannot be measured directly, but one indication of its potential effect is the nonresponse rate. For the June 2002 basic CPS, the nonresponse rate was 7.2%. The nonresponse rate for the Fertility supplement was an additional 4.6%, for a total supplement nonresponse rate of 11.5%.
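
The total supplement nonresponse rate is not the simple sum of the two rates. The figures above are consistent with treating the supplement nonresponse rate as applying only to households that responded to the basic CPS; that interpretation is an assumption of the following sketch, and the function name is illustrative.

```python
# Sketch: combine basic CPS and supplement nonresponse rates.
# Assumption (not stated in the source): the supplement rate is measured
# among basic CPS respondents, so the two response rates multiply.

def combined_nonresponse(basic: float, supplement: float) -> float:
    """Return the overall nonresponse rate for the supplement."""
    response_rate = (1.0 - basic) * (1.0 - supplement)
    return 1.0 - response_rate

# Rates quoted in the text for June 2002.
total = combined_nonresponse(basic=0.072, supplement=0.046)
print(f"Total supplement nonresponse: {total:.1%}")
```

Under this assumption the combined rate rounds to the 11.5 percent quoted in the text.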

Coverage. The concept of coverage in the survey sampling process is the extent to which the total population that could be selected for sample covers the survey’s target population. CPS undercoverage results from missed housing units and missed persons within sample households. Overall CPS undercoverage is estimated to be about 8 percent. CPS undercoverage varies with age, sex, and race. Generally, undercoverage is larger for males than for females and larger for Blacks and other races combined than for Whites.

As described previously, ratio estimation to independent age-sex-race-Hispanic population controls partially corrects for the bias due to undercoverage. However, biases exist in the estimates to the extent that missed persons in missed households or missed persons in interviewed households have different characteristics from those of interviewed persons in the same age-sex-race-origin-state group.

A common measure of survey coverage is the coverage ratio, the estimated population before post-stratification divided by the independent population control. Table 1 shows CPS coverage ratios for age-sex-race groups for a typical month. The CPS coverage ratios can exhibit some variability from month to month. Other Census Bureau household surveys experience similar coverage.

Table 1. CPS Coverage Ratios

        Non-Black       Black           All Persons
Age     Male    Female  Male    Female  Male    Female  Total
0-14    0.929   0.964   0.850   0.838   0.916   0.943   0.929
15      0.933   0.895   0.763   0.824   0.905   0.883   0.895
16-19   0.881   0.891   0.711   0.802   0.855   0.877   0.866
20-29   0.847   0.897   0.660   0.811   0.823   0.884   0.854
30-39   0.904   0.931   0.680   0.845   0.877   0.920   0.899
40-49   0.928   0.966   0.816   0.911   0.917   0.959   0.938
50-59   0.953   0.974   0.896   0.927   0.948   0.969   0.959
60-64   0.961   0.941   0.954   0.953   0.960   0.942   0.950
65-69   0.919   0.972   0.982   0.984   0.924   0.973   0.951
70+     0.993   1.004   0.996   0.979   0.993   1.002   0.998
15+     0.914   0.945   0.767   0.874   0.898   0.927   0.918
0+      0.918   0.949   0.793   0.864   0.902   0.931   0.921
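
The coverage ratio defined above, and the undercoverage it implies, can be sketched as follows. The function names and the population counts in the first example are hypothetical and serve only to show the arithmetic; the 0.921 figure is the all-persons ratio for ages 0+ from the coverage ratio table.

```python
# Sketch: coverage ratio and implied undercoverage.

def coverage_ratio(estimate_before_poststrat: float,
                   population_control: float) -> float:
    """Weighted estimate before post-stratification divided by the
    independent population control."""
    return estimate_before_poststrat / population_control

def undercoverage(ratio: float) -> float:
    """Share of the control population not represented before adjustment."""
    return 1.0 - ratio

# Hypothetical counts, chosen only to illustrate the definition.
example_ratio = coverage_ratio(9_210_000, 10_000_000)

# All persons, ages 0+, from the coverage ratio table.
overall_undercoverage_pct = round(undercoverage(0.921) * 100, 1)
print(example_ratio, overall_undercoverage_pct)
```

The overall ratio of 0.921 corresponds to undercoverage of roughly 8 percent, in line with the estimate given in the Coverage paragraph.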

A Nonsampling Error Warning. Since the full extent of the nonsampling error is unknown, one should be particularly careful when interpreting results based on small differences between estimates. Even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test. Caution should also be used when interpreting results based on a relatively small number of cases. Summary measures probably do not reveal useful information when computed on a base (subpopulation) smaller than 75,000.

For additional information on nonsampling error including the possible impact on CPS data when known, refer to

Standard Errors and Their Use. The sample estimate and its standard error enable one to construct a confidence interval. A confidence interval is a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.

Standard errors may also be used to perform hypothesis testing. This is a procedure for distinguishing between population parameters using sample estimates. The most common type of hypothesis is that the population parameters are different. An example of this would be comparing the percentage in the labor force among women between 15 and 44 years of age who had a child in 2002 with the same percentage among women between 15 and 44 years of age who had a child in 2000. An illustration of this is included under "Standard Error of a Difference."

Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. To conclude that two parameters are different at the 0.10 level of significance, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.645 times the standard error of the difference.

The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical textbooks for alternative criteria.

For information on calculating standard errors for labor force data from the CPS which involve quarterly or yearly averages, changes in consecutive quarterly or yearly averages, consecutive month-to-month changes in estimates, and consecutive year-to-year changes in monthly estimates, see "Explanatory Notes and Estimates of Error: Household Data" in the corresponding Employment and Earnings published by the Bureau of Labor Statistics.

Standard Errors of Estimated Numbers. The approximate standard error, s_x, of an estimated number from this microdata file can be obtained using this formula:

$$s_x = \sqrt{ax^2 + bx} \qquad (1)$$

Here x is the size of the estimate and a and b are the parameters in Table 2 or 3 associated with the particular type of characteristic. When calculating standard errors from cross-tabulations involving different characteristics, use the set of parameters for the characteristic which will give the largest standard error.

Illustration

Suppose there were 3,210,000 unemployed women 15-44 years of age in the civilian labor force. Use the appropriate parameters from Table 2 and formula (1) to get

Number, x 3,210,000
a parameter -0.000033
b parameter 2,693
Standard error 91,000
90% conf. int. 3,060,000 to 3,360,000

The standard error is calculated as

$$s_x = \sqrt{-0.000033 \times 3{,}210{,}000^2 + 2{,}693 \times 3{,}210{,}000} \approx 91{,}000$$

The 90-percent confidence interval is calculated as 3,210,000 ± 1.645 × 91,000.

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.
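
The calculation in this illustration can be sketched in Python. The sketch assumes that formula (1) has the generalized variance form s_x = sqrt(a x^2 + b x), an assumption that reproduces the illustrated standard error of 91,000 from the Table 2 parameters. The function names are illustrative and are not part of any Census Bureau software.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval

def se_number(x: float, a: float, b: float) -> float:
    """Approximate standard error of an estimated number.
    Assumed form of formula (1): sqrt(a*x^2 + b*x)."""
    return math.sqrt(a * x**2 + b * x)

def conf_int_90(estimate: float, se: float) -> tuple[float, float]:
    """90-percent confidence interval: estimate +/- 1.645 * se."""
    margin = Z_90 * se
    return estimate - margin, estimate + margin

# Illustration: 3,210,000 unemployed women 15-44, Table 2 parameters.
x = 3_210_000
se = se_number(x, a=-0.000033, b=2_693)
se_rounded = round(se, -3)              # rounded to thousands, as in the text
low, high = conf_int_90(x, se_rounded)
print(se_rounded, round(low, -4), round(high, -4))
```

Rounding the standard error before forming the interval follows the presentation in the illustration; carrying the unrounded value through changes the endpoints only slightly.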

Standard Errors of Estimated Percentages. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends on the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the parameter from Table 2 or 3 indicated by the numerator.

The approximate standard error, s_{x,p}, of an estimated percentage can be obtained by use of the formula

$$s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (2)$$

Here x is the total number of persons, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 ≤ p ≤ 100), and b is the parameter in Table 2 or 3 associated with the characteristic in the numerator of the percentage.

Illustration

Suppose that 6.1 percent of the 61,361,000 women 15-44 years old had a child in the last year. Use the appropriate parameter from Table 3 and formula (2) to get

Percentage, p 6.1
Base, x 61,361,000
b parameter 2,016
Standard error 0.14
90% conf. int. 5.9 to 6.3

The standard error is calculated as

$$s_{x,p} = \sqrt{\frac{2{,}016}{61{,}361{,}000} \times 6.1 \times (100 - 6.1)} \approx 0.14$$

The 90-percent confidence interval for the percentage of women 15-44 years old who had a child in the last year is calculated as 6.1 ± 1.645 × 0.14.
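
A short Python sketch of this illustration follows. It assumes that formula (2) has the form s_{x,p} = sqrt((b/x) p (100 - p)), which reproduces the illustrated standard error of 0.14; the function name is illustrative.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval

def se_percentage(p: float, x: float, b: float) -> float:
    """Approximate standard error of an estimated percentage.
    Assumed form of formula (2): sqrt((b / x) * p * (100 - p)),
    where x is the base of the percentage and 0 <= p <= 100."""
    return math.sqrt((b / x) * p * (100.0 - p))

# Illustration: 6.1 percent of 61,361,000 women 15-44, Table 3 b parameter.
p = 6.1
se = round(se_percentage(p, x=61_361_000, b=2_016), 2)
low = round(p - Z_90 * se, 1)
high = round(p + Z_90 * se, 1)
print(se, low, high)
```

Because the standard error depends on p(100 - p), it is largest for percentages near 50 and shrinks toward zero as the percentage approaches 0 or 100.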

Standard Error of a Difference. The standard error of the difference between two sample estimates is approximately equal to

$$s_{x-y} = \sqrt{s_x^2 + s_y^2} \qquad (3)$$

where s_x and s_y are the standard errors of the estimates, x and y. The estimates can be numbers, percentages, ratios, etc. This will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.

Illustration

Suppose that of the 3,766,000 women in 2002 between 15 and 44 years of age who had a child in the previous year, 2,056,000 or 54.6 percent were in the labor force, and of the 3,934,000 women in 2000 between 15 and 44 years of age who had a child in the previous year, 2,170,000 or 55.2 percent were in the labor force. Use the appropriate parameters from Table 2 and formulas (2) and (3) to get

                  X (2002)        Y (2000)        Difference
Percentage, p     54.6            55.2            0.6
Number, x         3,766,000       3,934,000       -
b parameter       2,693           2,530           -
Standard error    1.33            1.26            1.83
90% conf. int.    52.4 to 56.8    53.1 to 57.3    -2.4 to 3.6

The standard error of the difference is calculated as

$$s_{x-y} = \sqrt{1.33^2 + 1.26^2} \approx 1.83$$

The 90-percent confidence interval around the difference is calculated as 0.6 ± 1.645 × 1.83. Since this interval includes zero, we cannot conclude with 90 percent confidence that the percentage in the labor force among women 15 to 44 years of age who had a child in the previous year was different in 2002 from the percentage in 2000.
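
The test described in this illustration can be sketched in Python. The sketch assumes that the standard error of a difference is the square root of the sum of the squared standard errors, which reproduces the illustrated value of 1.83, and it applies the 0.10-level rule stated under "Standard Errors and Their Use." The function names are illustrative.

```python
import math

Z_90 = 1.645  # critical value for the 0.10 level of significance

def se_difference(sx: float, sy: float) -> float:
    """Approximate standard error of x - y for uncorrelated estimates.
    Assumed form of formula (3): sqrt(sx^2 + sy^2)."""
    return math.sqrt(sx**2 + sy**2)

def differs_at_10_percent(x: float, y: float, sx: float, sy: float) -> bool:
    """True when |x - y| is at least 1.645 standard errors of the difference."""
    return abs(x - y) >= Z_90 * se_difference(sx, sy)

# Illustration: percent in the labor force, 2002 (X) versus 2000 (Y).
se_diff = round(se_difference(1.33, 1.26), 2)
significant = differs_at_10_percent(54.6, 55.2, 1.33, 1.26)
low = round(0.6 - Z_90 * se_diff, 1)
high = round(0.6 + Z_90 * se_diff, 1)
print(se_diff, significant, low, high)
```

Checking whether the absolute difference reaches 1.645 standard errors is equivalent to checking whether the 90-percent confidence interval around the difference excludes zero.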

Standard Error of a Fertility Ratio. The standard error of a fertility ratio is a function of the number of children ever born per 1,000 women and the number of women in a given category. The formula for the standard error of a fertility ratio is

Select to see text version of Formula 4. (4)

where a, b and c are the parameters from Table 4, x is the number of children ever born or expected per 1,000 women and y is the number of women, in thousands. This formula should be used when calculating standard errors for estimates involving the possibility of more than one event per woman, i.e., number of children ever born. For data involving at most one event per woman, convert the ratio to a percentage and use formula (2) and the parameters in Table 2 or 3 to calculate the standard errors.

Illustration

Suppose that 11,561,000 women 40-44 years old had 1,930 children ever born per 1,000 women. Use formula (4) and the parameters in Table 4 to get

Children Ever Born, x 1,930
Base, 1,000y 11,561,000
a parameter +0.0000013
b parameter 810
c parameter 1,479
Standard error 25
90% conf. int. 1,889 to 1,971

The standard error is calculated as

Select to see text version of Standard Error 4.

The 90-percent confidence interval is from 1,889 to 1,971 children ever born per 1,000 women (i.e., 1,930 ± 1.645 × 25). A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.

Standard Error of a Ratio. Certain estimates may be calculated as the ratio of two numbers. The standard error of a ratio, x/y, may be computed using

$$s_{x/y} = \frac{x}{y} \sqrt{\left(\frac{s_x}{x}\right)^2 + \left(\frac{s_y}{y}\right)^2 - 2r\,\frac{s_x s_y}{xy}} \qquad (5)$$

The standard error of the numerator, s_x, and that of the denominator, s_y, may be calculated using formulas described earlier. In formula (5), r represents the correlation between the numerator and the denominator of the estimate.

For one type of ratio, the denominator is a count of families or households and the numerator is a count of persons in those families or households with a certain characteristic. If there is at least one person with the characteristic in every family or household, use 0.7 as an estimate of r. An example of this type is the mean number of children per family with children.

For all other types of ratios, r is assumed to be zero. If r is actually positive (negative), then this procedure will provide an overestimate (underestimate) of the standard error of the ratio. Examples of this type are the mean number of children per family and the poverty rate.

NOTE: For estimates expressed as the ratio of x per 100 y or x per 1,000 y, multiply formula (5) by 100 or 1,000, respectively, to obtain the standard error.

Illustration

Suppose there are 35,579,000 ever-married women 15-44 years old and 25,782,000 never-married women 15-44 years old. The ratio of ever-married women, x, to never-married women, y, is 1.38. Use the appropriate parameters from Table 3 and equations (1) and (5) to get

                  x                           y                           Ratio
Estimate          35,579,000                  25,782,000                  1.38
a parameter       -0.000022                   -0.000022                   -
b parameter       4,687                       4,687                       -
Standard error    373,000                     326,000                     0.02
90% conf. int.    34,965,000 to 36,193,000    25,246,000 to 26,318,000    1.35 to 1.41

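Using formula (5) with r = 0, the estimate of the standard error is

$$s_{x/y} = \frac{35{,}579{,}000}{25{,}782{,}000} \sqrt{\left(\frac{373{,}000}{35{,}579{,}000}\right)^2 + \left(\frac{326{,}000}{25{,}782{,}000}\right)^2} \approx 0.02$$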

The 90-percent confidence interval is calculated as 1.38 ± 1.645 × 0.02.
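
The ratio illustration can be sketched in Python. The sketch assumes that formula (5) has the form (x/y) sqrt((s_x/x)^2 + (s_y/y)^2 - 2 r s_x s_y / (x y)), which reproduces the illustrated standard error of 0.02 when r = 0. The function name is illustrative.

```python
import math

def se_ratio(x: float, y: float, sx: float, sy: float, r: float = 0.0) -> float:
    """Approximate standard error of the ratio x / y.
    Assumed form of formula (5); r is the correlation between
    numerator and denominator (0 unless stated otherwise)."""
    rel_var = (sx / x) ** 2 + (sy / y) ** 2 - 2.0 * r * sx * sy / (x * y)
    return (x / y) * math.sqrt(rel_var)

# Illustration: ever-married (x) to never-married (y) women 15-44.
x, y = 35_579_000, 25_782_000
sx, sy = 373_000, 326_000            # standard errors from formula (1)
se = se_ratio(x, y, sx, sy)          # r = 0, as in the illustration
print(round(x / y, 2), round(se, 2))

# For contrast only: a positive correlation lowers the standard error.
se_correlated = se_ratio(x, y, sx, sy, r=0.7)
```

The second call is included only to show the direction of the effect described in the text; r = 0.7 applies to the family and household ratios discussed above, not to this illustration.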

Standard Errors for Region, State and Nonmetropolitan Estimates. Multiply the parameters in Tables 2, 3 and 4 by the factors in Tables 5 and 6 to get state, region and nonmetropolitan parameters for labor force and fertility estimates.
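
As a minimal sketch of this adjustment, the following Python code scales the Table 3 fertility parameters for Total or White by the Table 5 factor for California and then computes a standard error. The estimate of 500,000 is hypothetical, the function names are illustrative, and the standard error uses the assumed form sqrt(a x^2 + b x) for an estimated number.

```python
import math

# Excerpt of Table 5 (state factors); other states are omitted here.
STATE_FACTORS = {"California": 1.49, "New York": 1.00}

def scaled_parameters(a: float, b: float, factor: float) -> tuple[float, float]:
    """Multiply both parameters by the state or region factor."""
    return a * factor, b * factor

def se_number(x: float, a: float, b: float) -> float:
    """Assumed form of the standard error of an estimated number."""
    return math.sqrt(a * x**2 + b * x)

# Table 3 fertility parameters, Total or White.
a_us, b_us = -0.000033, 2_016
a_ca, b_ca = scaled_parameters(a_us, b_us, STATE_FACTORS["California"])

x = 500_000  # hypothetical state-level estimate, for illustration only
se_national_params = se_number(x, a_us, b_us)
se_state_params = se_number(x, a_ca, b_ca)
print(round(se_national_params), round(se_state_params))
```

Because both parameters are multiplied by the same factor, the standard error of an estimated number changes by the square root of that factor under the assumed formula.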


Table 2. Parameters for Computation of Standard Errors for
Labor Force Characteristics: June 2002
Characteristic a b

Civilian Labor Force, Employed, and Not in Labor Force
  Total or White -0.000008 1,586
    Men -0.000035 2,927
    Women -0.000033 2,693
    Both sexes, 16 to 19 years -0.000244 3,005
  Black -0.000154 3,296
    Men -0.000336 3,332
    Women -0.000282 2,944
    Both sexes, 16 to 19 years -0.001531 3,296
  Hispanic ancestry -0.000187 3,296
    Men -0.000363 3,332
    Women -0.000380 2,944
    Both sexes, 16 to 19 years -0.001822 3,296

Unemployment
  Total or White -0.000017 3,005
    Men -0.000035 2,927
    Women -0.000033 2,693
    Both sexes, 16 to 19 years -0.000244 3,005
  Black -0.000154 3,296
    Men -0.000336 3,332
    Women -0.000282 2,944
    Both sexes, 16 to 19 years -0.001531 3,296
  Hispanic ancestry -0.000187 3,296
    Men -0.000363 3,332
    Women -0.000380 2,944
    Both sexes, 16 to 19 years -0.001822 3,296

Agricultural Employment 0.001345 2,989

Notes: These parameters are to be applied to basic CPS monthly labor force estimates. For foreign-born and noncitizen characteristics for Total and White, the a and b parameters should be multiplied by 1.3. No adjustment is necessary for foreign-born and noncitizen characteristics for Blacks and Hispanics.



Table 3. Parameters for Computation of Standard Errors for
June 2002 Supplement Characteristics
Characteristic    Persons a    Persons b    Households, etc. a    Households, etc. b

FERTILITY1
  Total or White -0.000033 2,016 (X) (X)
  Black -0.000224 2,016 (X) (X)
  Hispanic -0.000382 3,688 (X) (X)
  Asian/Pacific Islander -0.000548 2,016 (X) (X)

NUMBER OF BIRTHS
  Total or White -0.000060 3,676 (X) (X)
  Black -0.000408 3,670 (X) (X)
  Hispanic -0.000682 6,576 (X) (X)
  Asian/Pacific Islander -0.000997 3,670 (X) (X)

MARITAL STATUS, HOUSEHOLD & FAMILY CHARACTERISTICS
  Total or White -0.000022 4,687 -0.000009 1,860
  Black -0.000253 6,733 -0.000063 1,683
  Hispanic Origin -0.000464 11,347 -0.000116 2,836
  Asian/Pacific Islander -0.000616 6,733 -0.000154 1,683

INCOME
  Total or White -0.000010 2,207 -0.000009 2,016
  Black -0.000095 2,527 -0.000083 2,201
  Hispanic Origin -0.000174 4,259 -0.000152 3,709
  Asian/Pacific Islander -0.000231 2,527 -0.000201 2,201

EDUCATIONAL ATTAINMENT
  Total or White -0.000010 2,131 -0.000009 1,860
  Black -0.000091 2,410 -0.000063 1,683
  Hispanic Origin -0.000112 2,745 -0.000116 2,836
  Asian/Pacific Islander -0.000178 1,946 -0.000154 1,683

NATIVITY - Born in:
  Mexico, other N. America, S. America -0.000036 9,942 (X) (X)
  Europe -0.000021 5,712 (X) (X)
  Asia, Africa, Oceania -0.000034 9,310 (X) (X)
  United States -0.000017 4,687 (X) (X)



Table 4. Parameters for Computation of Standard Errors
for June 2002 Fertility Ratios
a b c
0.000013 810 1,479

Note: Multiply the parameters by 1.3 to get foreign born parameters.



Table 5. State Factors to be Applied to 2002 Parameters
State Factor
Alabama 0.94
Alaska 0.12
Arizona 1.15
Arkansas 0.64
California 1.49
Colorado 0.67
Connecticut 0.55
Delaware 0.18
District of Columbia 0.14
Florida 1.14
Georgia 1.70
Hawaii 0.26
Idaho 0.30
Illinois 1.08
Indiana 0.92
Iowa 0.51
Kansas 0.48
Kentucky 0.83
Louisiana 1.05
Maine 0.21
Maryland 0.93
Massachusetts 0.93
Michigan 1.05
Minnesota 0.81
Mississippi 0.73
Missouri 1.00
Montana 0.23
Nebraska 0.34
Nevada 0.35
New Hampshire 0.22
New Jersey 0.92
New Mexico 0.46
New York 1.00
North Carolina 1.09
North Dakota 0.13
Ohio 1.13
Oklahoma 0.72
Oregon 0.68
Pennsylvania 1.04
Rhode Island 0.16
South Carolina 0.83
South Dakota 0.13
Tennessee 1.35
Texas 1.37
Utah 0.62
Vermont 1.12
Virginia 1.32
Washington 1.11
West Virginia 0.34
Wisconsin 0.82
Wyoming 0.10



Table 6. Region Factors to be Applied to 2002 Parameters
Region Factor
Northeast 0.91
Midwest 0.93
South 1.14
West 1.15
Nonmetropolitan characteristics 1.50


1 subpopulation