Annual Demographic Survey (March Supplement)

Source and Accuracy of the Data for the March 1999
Current Population Survey Microdata File


Table of Contents

SOURCE OF DATA
Basic CPS
March Supplement
Sample design
Sample redesign
Estimation procedure

ACCURACY OF THE ESTIMATES
Sampling error
Nonsampling error
Nonresponse
Coverage
A Nonsampling error warning
Standard errors and their use
Estimating standard errors
Generalized Variance Parameters
Standard errors of estimated numbers
Standard errors of estimated percentages
Standard error of a difference
Standard error of an average for grouped data
Standard error of a ratio
Standard error of a median
Standard error of estimated per capita deficit
Accuracy of state estimates
Computation of standard errors for state estimates
Computation of a factor for groups of states
Computation of standard errors for data for combined years
Comparability of data

Tables
CPS Coverage Ratios
Parameters for Computation of Standard Errors for Labor Force Characteristics: March 1999
a and b Parameters for Standard Error Estimates for People and Families: March 1999
Factors for State Standard Errors and Parameters and State Populations: 1999

SOURCE OF DATA

The data for this survey came from the March 1999 Current Population Survey (CPS), conducted by the Census Bureau. The March survey uses two sets of questions, the basic CPS and the supplement.

Basic CPS. The monthly CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member 15 years old and over in every sample household.

March Supplement. In March 1999, the interviewers asked additional questions to supplement the basic CPS questions. These additional questions covered the following topics:

Sample design. The present CPS sample was selected from the 1990 Decennial Census files with coverage in all 50 states and the District of Columbia. The sample is continually updated to account for new residential construction. To obtain the sample, the United States was divided into 2,007 geographic areas. In most states, a geographic area consisted of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions were used instead of counties. These 2,007 geographic areas were then grouped into 754 strata, and one geographic area was selected from each stratum. In these 754 areas, about 50,000 occupied households are eligible for interview every month. Interviewers are unable to obtain interviews at about 3,200 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason.

To obtain more reliable data for the Hispanic population (Hispanics may be of any race), the March CPS sample was increased by about 2,500 eligible housing units. These housing units were interviewed the previous November and found to contain at least one sample person of Hispanic ancestry. In addition, the sample included people in the armed forces living off post or with their families on post.

Sample redesign. Since the introduction of the CPS, the Census Bureau has redesigned the CPS sample several times. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. The most recent changes were phased in and implementation was completed in July 1995.

Estimation procedure. This survey's estimation procedure adjusts weighted sample results to agree with independent estimates of the civilian noninstitutional population of the United States by age, sex, race, Hispanic/non-Hispanic ancestry, and state of residence. The adjusted estimate is called the post-stratification ratio estimate. The independent estimates are calculated based on information from four primary sources:

The estimation procedure for the March supplement included a further adjustment so that the husband and wife in a household received the same weight. The independent population estimates include some, but not all, undocumented immigrants.
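The ratio adjustment described above can be illustrated with a minimal Python sketch. It assumes a single set of post-stratification cells and one ratio factor per cell (independent control divided by the weighted sample total for that cell). The function name `poststratify`, the record layout, and the numbers in the usage example are hypothetical, and the production CPS procedure, including the husband-wife weight equalization, may involve further steps not described here.

```python
from collections import defaultdict

def poststratify(records, controls):
    """Scale each weight by (control / weighted sample total) for its cell.

    records:  list of (cell, weight) pairs; a cell stands in for a
              hypothetical age-sex-race-ancestry-state group.
    controls: dict mapping each cell to its independent population estimate.
    """
    totals = defaultdict(float)
    for cell, weight in records:
        totals[cell] += weight
    # One ratio factor per cell: control / weighted sample estimate.
    factors = {cell: controls[cell] / total for cell, total in totals.items()}
    return [(cell, weight * factors[cell]) for cell, weight in records]

# Hypothetical sample: two cells with made-up weights and controls.
sample = [("A", 1500.0), ("A", 1700.0), ("B", 1600.0)]
adjusted = poststratify(sample, {"A": 4000.0, "B": 1800.0})
print(adjusted)  # [('A', 1875.0), ('A', 2125.0), ('B', 1800.0)]
```

After adjustment, the weighted total in each cell equals its control, which is the defining property of the post-stratification ratio estimate.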

ACCURACY OF THE ESTIMATES

A sample survey estimate has two types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error. The nature of the sampling error is known given the survey design. The full extent of the nonsampling error, however, is unknown.

Sampling error. Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. This possible variation in the estimates due to sampling error is known as "sampling variability."

Nonsampling error. All other sources of error in the survey estimates are collectively called nonsampling error. Sources of nonsampling error include the following:

Two types of nonsampling error that can be examined to a limited extent are nonresponse and coverage.

Nonresponse. The effect of nonresponse cannot be measured directly, but one indication of its potential effect is the nonresponse rate. For the March 1999 basic CPS, the nonresponse rate was 7.9%. The nonresponse rate for the supplement was an additional 8.9%, for a total supplement nonresponse rate of 16.1%.
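The three rates fit together if the 8.9 percent supplement nonresponse is measured among households that had already responded to the basic CPS, so that the response rates multiply rather than the nonresponse rates adding (7.9 + 8.9 would give 16.8). A short Python check under that assumption; `combined_nonresponse` is a hypothetical helper:

```python
def combined_nonresponse(basic_rate, supplement_rate):
    """Overall supplement nonresponse, assuming supplement_rate is the share
    of basic CPS respondents who did not answer the supplement."""
    return 1.0 - (1.0 - basic_rate) * (1.0 - supplement_rate)

rate = combined_nonresponse(0.079, 0.089)
print(f"{rate:.1%}")  # 16.1%
```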

Coverage. The concept of coverage in the survey sampling process is the extent to which the total population that could be selected for sample "covers" the survey's target population. CPS undercoverage results from missed housing units and missed people within sample households. Overall CPS undercoverage is estimated to be about 8 percent. CPS undercoverage varies with age, sex, and race. Generally, undercoverage is larger for males than for females and larger for Blacks and other races combined than for Whites. As described previously, ratio estimation to independent age-sex-race-Hispanic population controls partially corrects for the bias due to undercoverage. However, biases exist in the estimates to the extent that missed people in missed households or missed people in interviewed households have different characteristics from those of interviewed people in the same age-sex-race-ancestry-state group.


Table 1. CPS Coverage Ratios

          Non-Black         Black             All People
Age       M       F         M       F         M       F       Total
0-14      0.929   0.964     0.850   0.838     0.916   0.943   0.929
15        0.933   0.895     0.763   0.824     0.905   0.883   0.895
16-19     0.881   0.891     0.711   0.802     0.855   0.877   0.866
20-29     0.847   0.897     0.660   0.811     0.823   0.884   0.854
30-39     0.904   0.931     0.680   0.845     0.877   0.920   0.899
40-49     0.928   0.966     0.816   0.911     0.917   0.959   0.938
50-59     0.953   0.974     0.896   0.927     0.948   0.969   0.959
60-64     0.961   0.941     0.954   0.953     0.960   0.942   0.950
65-69     0.919   0.972     0.982   0.984     0.924   0.973   0.951
70+       0.993   1.004     0.996   0.979     0.993   1.002   0.998
15+       0.914   0.945     0.767   0.874     0.898   0.927   0.918
0+        0.918   0.949     0.793   0.864     0.902   0.931   0.921

M = male; F = female.

A common measure of survey coverage is the coverage ratio, the estimated population before post-stratification divided by the independent population control. Table 1 shows CPS coverage ratios for age-sex-race groups for a typical month. The CPS coverage ratios can exhibit some variability from month to month. Other Census Bureau household surveys experience similar coverage.
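As a rough illustration, the sketch below computes a coverage ratio, the implied undercoverage (1 minus the ratio), and the factor by which a simple ratio adjustment would scale weights in one cell (the reciprocal of the ratio). The helper names are hypothetical, and treating the weight factor as exactly 1/ratio assumes a single-step ratio adjustment within one cell rather than the full CPS estimation procedure. The ratios are taken from Table 1.

```python
def coverage_ratio(pre_poststratification_estimate, independent_control):
    """Estimated population before post-stratification divided by the control."""
    return pre_poststratification_estimate / independent_control

def undercoverage(ratio):
    """Fraction of the control population not covered: 1 - coverage ratio."""
    return 1.0 - ratio

def ratio_adjustment_factor(ratio):
    """Weight multiplier that brings a cell up to its control: 1 / ratio
    (assumes a single-step ratio adjustment within one cell)."""
    return 1.0 / ratio

# Coverage ratios from Table 1.
all_people = 0.921          # all ages ("0+"), total
black_males_20_29 = 0.660
print(f"{undercoverage(all_people):.1%}")                    # 7.9%
print(round(ratio_adjustment_factor(black_males_20_29), 2))  # 1.52
```

The overall undercoverage of 7.9 percent is consistent with the "about 8 percent" quoted in the Coverage paragraph.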

A Nonsampling error warning. Since the full extent of the nonsampling error is unknown, one should be particularly careful when interpreting results based on small differences between estimates. Even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test. Caution should also be used when interpreting results based on a relatively small number of cases. Summary measures probably do not reveal useful information when computed on a base (subpopulation) smaller than 75,000.

For additional information on nonsampling error, including its possible impact on CPS data when known, refer to Statistical Policy Working Paper 3, An Error Profile: Employment as Measured by the Current Population Survey, Office of Federal Statistical Policy and Standards, U.S. Department of Commerce, 1978; and Technical Paper 40, The Current Population Survey: Design and Methodology, Census Bureau, U.S. Department of Commerce.

Standard errors and their use. The sample estimate and its standard error enable one to construct a confidence interval. A confidence interval is a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.
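A minimal Python sketch of the 90 percent interval just described. `statistics.NormalDist` is used only to confirm that 1.645 is the 95th percentile of the standard normal distribution (5 percent in each tail). The helper name is hypothetical; the inputs are those of Illustration No. 1.

```python
from statistics import NormalDist

# Two-sided 90 percent interval: 5 percent in each tail of a standard normal.
Z_90 = 1.645
assert round(NormalDist().inv_cdf(0.95), 3) == Z_90

def confidence_interval_90(estimate, standard_error):
    """Return (lower, upper) = estimate -/+ 1.645 standard errors."""
    half_width = Z_90 * standard_error
    return estimate - half_width, estimate + half_width

low, high = confidence_interval_90(3_182_000, 96_000)
print(int(round(low, -3)), int(round(high, -3)))  # 3024000 3340000
```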

Standard errors may be used to perform hypothesis testing. This is a procedure for distinguishing between population parameters using sample estimates. The most common type of hypothesis is that the population parameters are different. An example of this would be comparing the percentage of Whites with a college education to the percentage of Blacks with a college education.

Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. For example, to conclude that two parameters are different at the 0.10 level of significance, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.645 times the standard error of the difference.

The Census Bureau uses 90 percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical texts for alternative criteria.
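The decision rule for a test at the 0.10 level reduces to a one-line comparison. This is a sketch; `differs_at_10_percent` is a hypothetical name, and the standard error of the difference must be supplied separately. The differences and standard errors below are those of Illustrations No. 4 and No. 5.

```python
Z_90 = 1.645  # multiplier for a two-sided test at the 0.10 level

def differs_at_10_percent(difference, se_difference, z=Z_90):
    """True when |difference| >= z * SE(difference)."""
    return abs(difference) >= z * se_difference

print(differs_at_10_percent(37_000, 163_000))  # False (Illustration No. 4)
print(differs_at_10_percent(2.1, 0.14))        # True  (Illustration No. 5)
```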

Estimating standard errors. To estimate the standard error of a CPS estimate, the Census Bureau uses replicated variance estimation methods. These methods primarily measure the magnitude of sampling error. However, they do measure some effects of nonsampling error as well. They do not measure systematic biases in the data due to nonsampling error. (Bias is the average of the differences, over all possible samples, between the sample estimates and the desired value.)
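The document does not say which replication method or scaling constant the Census Bureau used, so the following is only a generic sketch of a replicate variance estimator: the squared deviations of replicate estimates from the full-sample estimate, summed and multiplied by a method-dependent constant c. The constants in the docstring are the standard ones for those methods; the function name and the toy numbers are hypothetical.

```python
import math

def replicate_variance(full_sample_estimate, replicate_estimates, c):
    """Generic replicate variance: c * sum((theta_r - theta_0) ** 2).

    c depends on the replication method (R = number of replicates), e.g.
    1/R for balanced repeated replication, (R - 1)/R for the delete-one
    jackknife, and 4/R for successive difference replication.
    """
    return c * sum((r - full_sample_estimate) ** 2 for r in replicate_estimates)

# Hypothetical full-sample estimate and four replicate estimates,
# scaled with a BRR-style constant c = 1/R.
theta_0 = 100.0
replicates = [98.0, 103.0, 101.0, 99.0]
variance = replicate_variance(theta_0, replicates, c=1 / len(replicates))
print(variance, math.sqrt(variance))
```

Such an estimator captures sampling variability (and some random nonsampling error) but, as noted above, not systematic bias.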

Generalized Variance Parameters. Consider all of the possible estimates of characteristics of the population that are of interest to data users. Now consider all of the subpopulations such as racial groups, age ranges, etc. Finally, consider every possible comparison or ratio combination. The list would be completely unmanageable. Similarly, a list of standard errors to go with every estimate would be unmanageable.

Through experimentation, we have found that certain groups of estimates have similar relationships between their variances and expected values. We provide a generalized method for calculating standard errors for any of the characteristics of the population of interest. The generalized method uses parameters for groups of estimates. These parameters are in Table 2, for basic CPS monthly labor force estimates, and Table 3, for March supplement data, including the Hispanic supplement.

Standard errors of estimated numbers. The approximate standard error, $s_x$, of an estimated number from this microdata file can be obtained using this formula:

$$ s_x = \sqrt{a x^2 + b x} \qquad (1) $$

Here x is the size of the estimate and a and b are the parameters in Table 2 or 3 associated with the particular type of characteristic. When calculating standard errors for numbers from cross-tabulations involving different characteristics, use the factor or set of parameters for the characteristic which will give the largest standard error.
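A minimal Python sketch of Formula (1), assuming it has the generalized-variance form s_x = sqrt(a x^2 + b x), which reproduces the standard errors shown in Illustrations No. 1 and No. 2. The helper names are hypothetical; the second function applies the rule above of taking the parameter set that yields the largest standard error.

```python
import math

def se_number(x, a, b):
    """Standard error of an estimated number x, assuming Formula (1)
    has the form s_x = sqrt(a * x**2 + b * x)."""
    return math.sqrt(a * x**2 + b * x)

def se_number_largest(x, parameter_sets):
    """Cross-tabulations: use the (a, b) pair giving the largest standard error."""
    return max(se_number(x, a, b) for a, b in parameter_sets)

# Parameters as quoted in Illustrations No. 1 (Table 2) and No. 2 (Table 3).
se_1 = se_number(3_182_000, -0.000018, 2_957)
se_2 = se_number(15_191_000, -0.000011, 2_369)
print(round(se_1, -3), round(se_2, -3))  # 96000.0 183000.0
```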

For information on calculating standard errors for labor force data from the CPS that involve quarterly or yearly averages, see "Explanatory Notes and Estimates of Error: Household Data" in Employment and Earnings, a monthly report published by the Bureau of Labor Statistics.

Illustration No. 1

Suppose you want to calculate the standard error and a 90 percent confidence interval for the number of unemployed females in the civilian labor force, estimated at about 3,182,000. Use Formula (1) and the appropriate parameters from Table 2 to get

 

Number, x           3,182,000
a parameter         -0.000018
b parameter         2,957
Standard error      96,000
90% conf. int.      3,024,000 to 3,340,000

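where the standard error is calculated as

$$ s_x = \sqrt{-0.000018 \times (3{,}182{,}000)^2 + 2{,}957 \times 3{,}182{,}000} \approx 96{,}000 $$

and the 90 percent confidence interval is calculated as 3,182,000 ± 1.645 × 96,000.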

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.

Illustration No. 2

Suppose you want to calculate the standard error and a 90 percent confidence interval for the number of high school graduates aged 20 to 24, estimated at about 15,191,000. Use the appropriate parameters from Table 3 and Formula (1) to get

 

Number, x           15,191,000
a parameter         -0.000011
b parameter         2,369
Standard error      183,000
90% conf. int.      14,890,000 to 15,492,000

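where the standard error is calculated as

$$ s_x = \sqrt{-0.000011 \times (15{,}191{,}000)^2 + 2{,}369 \times 15{,}191{,}000} \approx 183{,}000 $$

and the 90 percent confidence interval is calculated as 15,191,000 ± 1.645 × 183,000.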

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.

Standard errors of estimated percentages. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends on the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the factor or parameter from Table 2 or 3 indicated by the numerator.

Alternatively, Formula (2) will provide more accurate results:

$$ s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (2) $$

Here x is the total number of people, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 < p < 100), and b is the parameter in Table 2 or 3 associated with the characteristic in the numerator of the percentage.
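A minimal Python sketch of Formula (2), assuming it has the form s_(x,p) = sqrt((b / x) p (100 - p)), which reproduces Illustration No. 3. The function name is hypothetical.

```python
import math

def se_percent(p, x, b):
    """Standard error of an estimated percentage p (0 < p < 100) with base x,
    assuming Formula (2) is s_(x,p) = sqrt((b / x) * p * (100 - p))."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    return math.sqrt(b / x * p * (100 - p))

# Illustration No. 3: 14.0 percent of a base of 15,191,000, b = 2,680.
se = se_percent(14.0, 15_191_000, 2_680)
print(round(se, 2))  # 0.46, shown as 0.5 in the illustration
```

Because p (100 - p) peaks at p = 50, the standard error for a given base is largest for percentages near 50.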

Illustration No. 3

Suppose you want to calculate the standard error and a 90 percent confidence interval for the percentage of high school graduates aged 20 to 24 who were Black, when there were about 15,191,000 high school graduates aged 20 to 24, of whom 14 percent were Black. Use the appropriate parameter from Table 3 and Formula (2) to get

 

Percentage, p       14.0
Base, x             15,191,000
b parameter         2,680
Standard error      0.5
90% conf. int.      13.2 to 14.8

where the standard error is calculated as

$$ s_{x,p} = \sqrt{\frac{2{,}680}{15{,}191{,}000} \times 14.0 \times (100 - 14.0)} \approx 0.5 $$

and the 90 percent confidence interval for the percentage of high school graduates aged 20 to 24 who were Black is calculated as 14.0 ± 1.645 × 0.5.

Standard error of a difference. The standard error of the difference between two sample estimates is approximately equal to

$$ s_{x-y} = \sqrt{s_x^2 + s_y^2} \qquad (3) $$

where $s_x$ and $s_y$ are the standard errors of the estimates x and y. The estimates can be numbers, percentages, ratios, etc. This will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.
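The effect of correlation can be seen from the general variance of a difference, Var(x - y) = s_x^2 + s_y^2 - 2 rho s_x s_y. The sketch below assumes Formula (3) is the rho = 0 case, sqrt(s_x^2 + s_y^2); the correlation `rho` is a hypothetical input for which the document supplies no values, and the function name is also hypothetical.

```python
import math

def se_difference(se_x, se_y, rho=0.0):
    """SE of x - y. With rho = 0 this is Formula (3), sqrt(se_x**2 + se_y**2).
    rho is a hypothetical correlation between the two estimates."""
    variance = se_x**2 + se_y**2 - 2.0 * rho * se_x * se_y
    return math.sqrt(max(variance, 0.0))

print(round(se_difference(0.12, 0.08), 2))  # 0.14 (Illustration No. 5)
# Positive correlation shrinks the true SE, so Formula (3) overestimates it.
print(se_difference(1.0, 1.0, rho=0.5) < se_difference(1.0, 1.0))  # True
```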

For information on calculating standard errors for labor force data from the CPS which involve differences in consecutive quarterly or yearly averages, consecutive month-to-month differences in estimates, and consecutive year-to-year differences in monthly estimates see "Explanatory Notes and Estimates of Error: Household Data" in Employment and Earnings, a monthly report published by the Bureau of Labor Statistics.

Illustration No. 4

Suppose you want to calculate the standard error and a 90 percent confidence interval for the difference between the number of people 20 to 24 years old and the number of people 25 to 29 years old who have completed four years of high school and no more, when they numbered 5,717,000 and 5,754,000, respectively. Use the appropriate parameters from Table 3 and Formulas (1) and (3) to get

 

 

 

                  x (aged 20-24)          y (aged 25-29)          Difference
Estimate          5,717,000               5,754,000               37,000
a parameter       -0.000011               -0.000011               -
b parameter       2,369                   2,369                   -
Standard error    115,000                 115,000                 163,000
90% conf. int.    5,528,000 to 5,906,000  5,565,000 to 5,943,000  -231,000 to 305,000

where the standard error of the difference is calculated as

$$ s_{x-y} = \sqrt{(115{,}000)^2 + (115{,}000)^2} \approx 163{,}000 $$

and the 90 percent confidence interval around the difference is calculated as 37,000 ± 1.645 × 163,000.

Since the 90 percent confidence interval contains zero, we cannot conclude, at the 10 percent significance level, that the number of people who completed four years of high school and no more is different for 20 to 24 year olds and 25 to 29 year olds.
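The arithmetic of Illustration No. 4 can be reproduced in Python as a check. The sketch assumes Formula (1) is s_x = sqrt(a x^2 + b x) and Formula (3) is sqrt(s_x^2 + s_y^2). It works from unrounded standard errors, so intermediate values differ slightly from the rounded figures in the table, although the rounded results agree.

```python
import math

A, B = -0.000011, 2_369  # Table 3 parameters as quoted in Illustration No. 4
Z_90 = 1.645

def se_number(x):
    # Assumes Formula (1): s_x = sqrt(a*x^2 + b*x)
    return math.sqrt(A * x**2 + B * x)

x, y = 5_717_000, 5_754_000          # aged 20-24 and 25-29
se_x, se_y = se_number(x), se_number(y)
se_diff = math.sqrt(se_x**2 + se_y**2)  # assumes Formula (3)
diff = y - x
low, high = diff - Z_90 * se_diff, diff + Z_90 * se_diff
print(round(se_x, -3), round(se_y, -3), round(se_diff, -3))  # 115000.0 115000.0 163000.0
print(round(low, -3), round(high, -3), low <= 0 <= high)     # -231000.0 305000.0 True
```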

Illustration No. 5

Suppose you want to calculate the standard error and a 90 percent confidence interval for the difference between the percentage of employed males aged 20 and over who were employed in agriculture and the corresponding percentage of employed females. Suppose that of the 68,212,000 employed males aged 20 and over, 2,468,000, or 3.6 percent, were employed in agriculture, and that of the 57,837,000 employed females aged 20 and over, 894,000, or 1.5 percent, were employed in agriculture. Use the appropriate parameters from Table 2 and Formulas (2) and (3) to get

 

                  x (males)     y (females)    Difference
Percentage        3.6           1.5            2.1
Base              68,212,000    57,837,000     -
b parameter       2,825         2,582          -
Standard error    0.12          0.08           0.14
90% conf. int.    3.4 to 3.8    1.4 to 1.6     1.9 to 2.3

where the standard error of the difference is calculated as

$$ s_{x-y} = \sqrt{(0.12)^2 + (0.08)^2} \approx 0.14 $$

and the 90 percent confidence interval around the difference is calculated as 2.1 ± 1.645 × 0.14.

Since this interval does not include zero, we can conclude with 90 percent confidence that the percentage of agriculturally employed females age 20 and over is less than the percentage of agriculturally employed males age 20 and over.
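Illustration No. 5 can be checked the same way. This sketch assumes Formula (2) is sqrt((b / x) p (100 - p)) and Formula (3) is sqrt(s_x^2 + s_y^2); the percentages are rounded to one decimal place as in the text, and the b parameters are those quoted above from Table 2.

```python
import math

Z_90 = 1.645

def se_percent(p, base, b):
    # Assumes Formula (2): sqrt((b / base) * p * (100 - p))
    return math.sqrt(b / base * p * (100 - p))

p_men = round(100 * 2_468_000 / 68_212_000, 1)   # 3.6
p_women = round(100 * 894_000 / 57_837_000, 1)   # 1.5
se_men = se_percent(p_men, 68_212_000, 2_825)
se_women = se_percent(p_women, 57_837_000, 2_582)
se_diff = math.sqrt(se_men**2 + se_women**2)     # assumes Formula (3)
diff = p_men - p_women
low, high = diff - Z_90 * se_diff, diff + Z_90 * se_diff
print(round(se_men, 2), round(se_women, 2), round(se_diff, 2))  # 0.12 0.08 0.14
print(round(low, 1), round(high, 1))                            # 1.9 2.3
```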

Standard error of an average for grouped data. The formula used to estimate the standard error of an average for grouped data is

$$ s_{\bar{x}} = \sqrt{\frac{b}{y}\, S^2} \qquad (4) $$

In this formula, y is the size of the base of the distribution and b is a parameter from Table 2 or 3. The variance, $S^2$, is given by the following formula:

$$ S^2 = \sum_{i=1}^{c} p_i m_i^2 - \bar{x}^2 \qquad (5) $$

where $\bar{x}$, the average of the distribution, is estimated by

$$ \bar{x} = \sum_{i=1}^{c} p_i m_i \qquad (6) $$

where