Source and Accuracy of Estimates for the October 2001 CPS Microdata File on School Enrollment

Table 1. CPS Coverage Ratios for October 2001
Table 2. Parameters for Computation of Standard Errors for Labor Force Characteristics: October 2001
Table 3. Parameters for Computation of Standard Errors for School Enrollment Characteristics: October 2001
Table 4. Year Factors for Non-School Enrollment Characteristics (1942-2000)
Table 4A. School Enrollment Year Factors (1942-2000)
Table 5. Regional Factors to Apply to 2001 Parameters

The data in this microdata file come from the October 2001 Current Population Survey (CPS). The Census Bureau conducts the survey every month, although this file has only October data. The October survey uses two sets of questions, the basic CPS and the supplement.

Basic CPS. The basic CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member fifteen years old and over in every sample household.

October Supplement. In addition to the basic CPS questions, interviewers asked supplementary questions in October about school enrollment for all household members three years old and over.

Sample Design. The present monthly CPS sample was selected from the 1990 Decennial Census files with coverage in all fifty states and the District of Columbia. The sample is continually updated to account for new residential construction. To obtain the sample, the United States was divided into 2,007 geographic areas. In most states, a geographic area consisted of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions were used instead of counties. These 2,007 geographic areas were then grouped into 754 strata, and one geographic area was selected from each stratum for the sample.

About 60,000 occupied households are eligible for interview every month out of the 754 strata. Interviewers are unable to obtain interviews at about 4,500 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason.

The number of households that are eligible for interview in the basic CPS increased from 50,000 to 60,000 in July of 2001. This increase in the number of eligible households is due to the implementation of the State Children’s Health Insurance Program (SCHIP) sample expansion. The SCHIP sample expansion increased the monthly CPS sample in states with high sampling errors for low-income uninsured children. With the increase in eligible households, the number of units where interviewers were unable to obtain an interview increased from 3,200 to 4,500.

Sample Redesign. Since the introduction of the CPS, the Census Bureau has redesigned the CPS sample several times. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. The most recent changes were phased in and implementation was completed in July 1995.

Estimation Procedure. This survey’s estimation procedure adjusts weighted sample results to agree with independent estimates of the civilian noninstitutional population of the United States by age, sex, race, Hispanic/non-Hispanic ancestry, and state of residence. The adjusted estimate is called the post-stratification ratio estimate. The independent estimates are calculated based on information from three primary sources:

The independent population estimates include some, but not all, unauthorized migrants.

A sample survey estimate has two types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error. The nature of the sampling error is known given the survey design. The full extent of the nonsampling error, however, is unknown.

Sampling Error. Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. This possible variation in the estimates due to sampling error is known as "sampling variability." Standard errors, as calculated by methods described in "Standard Errors and Their Use," are primarily measures of sampling variability, although they may include some nonsampling error.

Nonsampling Error. All other sources of error in the survey estimates are collectively called nonsampling error. Sources of nonsampling error include the following:

Two types of nonsampling error that can be examined to a limited extent are nonresponse and undercoverage.

Nonresponse. The effect of nonresponse cannot be measured directly, but one indication of its potential effect is the nonresponse rate. For the October 2001 basic CPS, the nonresponse rate was 6.7%. The nonresponse rate for the October supplement was an additional 3.6%. These two nonresponse rates lead to a total nonresponse rate of 10.1%.
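The total is slightly lower than the simple sum of the two rates (10.3 percent). One plausible reading, which is an assumption and not something the source states, is that the supplement rate applies only to households that completed the basic interview, so the two response rates multiply. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
def total_nonresponse(basic_rate, supplement_rate):
    """Combine two nonresponse rates given as fractions.

    Assumption: the supplement rate applies only to households that
    responded to the basic CPS, so the response rates multiply.
    """
    return 1.0 - (1.0 - basic_rate) * (1.0 - supplement_rate)


total = total_nonresponse(0.067, 0.036)
print(f"{100 * total:.1f}%")  # 10.1%
```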

Coverage. The concept of coverage in the survey sampling process is the extent to which the total population that could be selected for sample "covers" the survey’s target population. CPS undercoverage results from missed housing units and missed people within sample households. Overall CPS undercoverage is estimated to be about 10 percent. CPS undercoverage varies with age, sex, and race. Generally, undercoverage is larger for males than for females and larger for Blacks than for non-Blacks.

The Current Population Survey weighting procedure uses ratio estimation whereby sample estimates are adjusted to independent estimates of the national population by age, race, sex and Hispanic ancestry. This weighting partially corrects for bias due to undercoverage, but biases may still be present when people who are missed by the survey differ from those interviewed in ways other than age, race, sex, and Hispanic ancestry. How this weighting procedure affects other variables in the survey is not precisely known. All of these considerations affect comparisons across different surveys or data sources.

A common measure of survey coverage is the coverage ratio, the estimated population before post-stratification divided by the independent population control. Table 1 shows CPS coverage ratios for age-sex-race groups for October 2001. The CPS coverage ratios can exhibit some variability from month to month. Other Census Bureau household surveys experience similar coverage.

Table 1. CPS Coverage Ratios for October 2001
Age	Non-Black		Black		All Persons
	Male	Female	Male	Female	Male	Female	Total
0-14	0.9296	0.9203	0.8483	0.8097	0.9165	0.9022	0.9095
15	1.0316	1.0072	0.9168	0.6240	1.0141	0.9464	0.9811
16-19	0.8734	0.9157	0.7870	0.8434	0.8603	0.9045	0.8820
20-29	0.8264	0.8675	0.7082	0.7945	0.8119	0.8565	0.8344
30-39	0.8767	0.9203	0.7881	0.8981	0.8666	0.9172	0.8923
40-49	0.9241	0.9531	0.8403	0.9066	0.9149	0.9471	0.9314
50-59	0.9317	0.9272	0.8788	0.8980	0.9266	0.9240	0.9253
60-64	0.9007	0.9323	0.9214	0.9804	0.9026	0.9374	0.9209
65-69	0.9133	0.9086	0.9557	0.9841	0.9172	0.9172	0.9172
70+	0.8986	0.9141	0.9669	0.9770	0.9033	0.9190	0.9127
15+	0.8934	0.9196	0.8199	0.8821	0.8854	0.9149	0.9007
0+	0.9012	0.9197	0.8285	0.8638	0.8924	0.9123	0.9026
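To relate Table 1 to the statement that overall undercoverage is about 10 percent, the short Python sketch below converts a few of the "Total" coverage ratios into an undercoverage percentage. It assumes undercoverage can be read as 100 x (1 - coverage ratio); that definition and the function name are illustrative assumptions, not taken from the source.

```python
# Selected "All Persons, Total" coverage ratios copied from Table 1.
coverage = {"0-14": 0.9095, "20-29": 0.8344, "0+": 0.9026}


def undercoverage_pct(ratio):
    """Percent shortfall of the pre-adjustment estimate relative to the
    independent control (assumed to be 100 * (1 - coverage ratio))."""
    return 100.0 * (1.0 - ratio)


for group, ratio in coverage.items():
    print(group, round(undercoverage_pct(ratio), 1))
```

Under this reading, the all-ages ratio of 0.9026 corresponds to an undercoverage of roughly 9.7 percent.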

Comparability of Data. Data obtained from the CPS and other sources are not entirely comparable. The differences arise from variation in interviewer training and experience and from differing survey processes. This is an example of nonsampling variability not reflected in the standard errors. Therefore, caution should be used when comparing results from different sources.

A number of changes were made in data collection and estimation procedures beginning with the January 1994 CPS. The major change was the use of a new questionnaire. The questionnaire was redesigned to measure the official labor force concepts more precisely, to expand the amount of data available, to implement several definitional changes, and to adapt to a computer-assisted interviewing environment. The supplemental questions were also modified for adaptation to computer-assisted interviewing, although there were no changes in definitions and concepts. See Appendix C of Report P-60, No. 188 on "Conversion to a Computer Assisted Questionnaire" for a description of these changes and the effect they had on the data. Due to these and other changes, one should use caution when comparing estimates from data collected in 1994 and later years with estimates from earlier years.

Caution should also be used when comparing data from this microdata file, which reflects 2000 census-based population controls, with microdata files from October 1994-2000, which reflect 1990 census-based population controls. Microdata files from previous years reflect the latest available census-based population controls.

Although the change in population controls had relatively little impact on summary measures such as means, medians, and percentage distributions, it did have a significant impact on levels. For example, use of 2000 census-based population controls results in about a one-percent increase in the civilian noninstitutional population and in the number of families and households. Thus, estimates of levels for data collected in 2001 and later years will differ from those for earlier years by more than what could be attributed to actual changes in the population. These differences could be disproportionately greater for certain subpopulation groups than for the total population.

Caution should also be used when comparing Hispanic estimates over time. No independent population control totals for people of Hispanic ancestry were used before 1985.

Based on the results of each decennial census, the Census Bureau gradually introduces a new sample design for the CPS¹. During this phase-in period, CPS data are collected from sample designs based on different censuses. While most CPS estimates were unaffected by this mixed sample, geographic estimates are subject to greater error and variability. Users should exercise caution when comparing estimates across years for metropolitan/nonmetropolitan categories.

A Nonsampling Error Warning. Since the full extent of the nonsampling error is unknown, one should be particularly careful when interpreting results based on small differences between estimates. Even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test. Caution should also be used when interpreting results based on a relatively small number of cases. Summary measures probably do not reveal useful information when computed on a base² smaller than 75,000.

For additional information on nonsampling error including the possible impact on CPS data when known, refer to

Standard Errors and Their Use. A number of approximations are required to derive, at a moderate cost, standard errors applicable to all the estimates in this microdata file. Instead of providing an individual standard error for each estimate, parameters are provided to calculate standard errors for various types of characteristics. These parameters are listed in Tables 2 and 3. Also, tables are provided that allow the calculation of parameters for prior years and parameters for U.S. regions. Tables 4 and 4A provide factors to derive prior year parameters; Table 5 provides factors to derive U.S. regional parameters.

The sample estimate and its standard error enable one to construct a confidence interval, a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.

Standard errors may also be used to perform hypothesis testing, a procedure for distinguishing between population parameters using sample estimates. One common type of hypothesis is that the population parameters are different. An example of this would be comparing the percentage of employed males 20 to 24 years old working part time to the percentage of employed females in the same age group who were part-time workers. An illustration of this is included in the following pages.

Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. To conclude that two parameters are different at the 0.10 level of significance, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.645 times the standard error of the difference.
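The decision rule in the preceding paragraph can be written as a one-line check. The Python sketch below is illustrative only; the function name is hypothetical, and the first call uses the part-time worker percentages and the standard error of their difference (1.29) that appear in the illustration later in this document.

```python
Z_90 = 1.645  # multiplier the document uses for the 0.10 significance level


def differs_at_10_percent(estimate_x, estimate_y, se_difference):
    """Return True when |x - y| is at least 1.645 standard errors of the difference."""
    return abs(estimate_x - estimate_y) >= Z_90 * se_difference


# Part-time percentages for employed men and women 20 to 24 years old.
print(differs_at_10_percent(20.8, 34.6, 1.29))  # True: 13.8 exceeds 1.645 * 1.29
# A hypothetical pair of estimates that are too close to distinguish.
print(differs_at_10_percent(20.8, 22.0, 1.29))  # False: 1.2 is below 1.645 * 1.29
```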

The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical textbooks for alternative criteria.

Estimating Standard Errors. To estimate the standard error of a CPS estimate, the Census Bureau uses replicated variance estimation methods. These methods primarily measure the magnitude of sampling error. However, they do measure some effects of nonsampling error as well. They do not measure systematic biases in the data due to nonsampling error. Bias is the average over all possible samples of the differences between the sample estimates and the true value.

Generalized Variance Parameters. Consider all the possible estimates of characteristics of the population that are of interest to data users. Now consider all of the subpopulations such as racial groups, age ranges, etc. Finally, consider every possible comparison or ratio combination. The list would be completely unmanageable. Similarly, a list of standard errors to go with every estimate would be unmanageable. Therefore, rather than providing an individual standard error for every possible estimate, we provide generalized variance parameters to allow calculation of standard errors.

Through experimentation, we have found that certain groups of estimates have similar relationships between their variances and expected values. We provide a generalized method for calculating standard errors for any of the characteristics of the population of interest. The generalized method uses generalized variance parameters for groups of estimates. Table 2 provides the labor force parameters, Table 3 provides the school enrollment parameters, and Tables 4, 4A, and 5 provide factors for use with the parameters.

Standard Errors of Estimated Numbers. The approximate standard error, s_x, of an estimated number, with the exception of school enrollment estimates, from this microdata file can be obtained using the following formula:

$$s_x = \sqrt{ax^2 + bx} \qquad (1)$$

Here x is the size of the estimate and a and b are the parameters in Table 2 associated with the particular type of characteristic. When calculating standard errors from cross-tabulations involving different characteristics, use the set of parameters for the characteristic which will give the largest standard error.

In October 2001, there were 3,900,000 unemployed men in the civilian labor force. Use the appropriate parameters from Table 2 and formula (1) to get the following:

The 90-percent confidence interval is calculated as 3,900,000 ± 1.645 x 104,321, or approximately 3,730,000 to 4,070,000.

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.
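The unemployed-men illustration can be reproduced with a few lines of Python. This is a sketch that assumes formula (1) has the generalized variance form s_x = sqrt(a x^2 + b x), which yields the standard error of 104,321 quoted above; the function names are illustrative.

```python
import math


def se_number(x, a, b):
    """Standard error of an estimated number, assuming s_x = sqrt(a*x^2 + b*x)."""
    return math.sqrt(a * x**2 + b * x)


def confidence_interval_90(estimate, se):
    """90-percent interval: estimate plus or minus 1.645 standard errors."""
    margin = 1.645 * se
    return estimate - margin, estimate + margin


# Unemployed men, October 2001; a and b are the "Unemployment, Men" row of Table 2.
se = se_number(3_900_000, a=-0.000035, b=2_927)
low, high = confidence_interval_90(3_900_000, se)
print(round(se))                        # 104321
print(round(low, -4), round(high, -4))  # 3730000.0 4070000.0
```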

Standard Errors of Estimated School Enrollment Numbers. The approximate standard error, s_x, of an estimated school enrollment number from this microdata file can be obtained using the following formula:

$$s_x = \sqrt{-\frac{b}{T}x^2 + bx} \qquad (2)$$

Here x is the size of the estimate, T is the total number of persons in a specific age group and b is the parameter in Table 3 associated with the particular type of characteristic. If T is not known, for Total or White use 100,000,000; for Blacks and Hispanic use 10,000,000. When calculating standard errors for numbers from cross-tabulations involving different characteristics, use the set of parameters for the characteristic which will give the largest standard error.

There were 4,100,000 three and four year olds enrolled in school and 7,900,000 children in that age group in October 2001. Use the appropriate b parameter from Table 3 and formula (2) to get the following:

The 90-percent confidence interval for this estimate is approximately 3,990,000 to 4,210,000 (i.e., 4,100,000 ± 1.645 x 69,553). Therefore, a conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.
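The same check works for the school enrollment illustration. The Python sketch below assumes formula (2) has the form s_x = sqrt(-(b/T) x^2 + b x), which yields the 69,553 quoted above. The fallback values for T are the ones the text gives for cases where T is unknown; the dictionary and function names are illustrative.

```python
import math

# Fallback totals from the text, for use only when T is not known.
DEFAULT_T = {"total_or_white": 100_000_000, "black": 10_000_000, "hispanic": 10_000_000}


def se_enrollment_number(x, b, T):
    """Standard error of a school enrollment number,
    assuming s_x = sqrt(-(b/T)*x^2 + b*x)."""
    return math.sqrt(-(b / T) * x**2 + b * x)


# Three and four year olds enrolled in school; b is the
# "Children 13 and under, Total or White" value from Table 3.
x, T, b = 4_100_000, 7_900_000, 2_453
se = se_enrollment_number(x, b, T)
low, high = x - 1.645 * se, x + 1.645 * se
print(round(se))                        # 69553
print(round(low, -4), round(high, -4))  # 3990000.0 4210000.0
```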

Standard Errors of Estimated Percentages. The reliability of an estimated percentage, computed using sample data for both numerator and denominator, depends on the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the parameter from Table 2 or 3 indicated by the numerator.

The approximate standard error, s_x,p, of an estimated percentage can be obtained by use of the following formula:

$$s_{x,p} = \sqrt{\frac{b}{x}\,p\,(100 - p)} \qquad (3)$$

Here x is the total number of persons, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 ≤ p ≤ 100), and b is the parameter in Table 2 or 3 associated with the characteristic in the numerator of the percentage.

In October 2001, there were 16,000,000 persons aged 18 to 21, and 44.0 percent were enrolled in college. Use the appropriate parameter from Table 3 and formula (3) to get the following:

The 90 percent confidence interval for the estimated percentage of persons aged 18 to 21 in October 2001 enrolled in college is from 43.06 to 44.94 percent (i.e., 44.00 ± 1.645 x 0.57).
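The college enrollment illustration can be verified the same way. This Python sketch assumes formula (3) has the form s_x,p = sqrt((b/x) p (100 - p)), which yields the 0.57 quoted above; the function name is illustrative.

```python
import math


def se_percentage(p, x, b):
    """Standard error of an estimated percentage p (0 to 100) on base x,
    assuming s = sqrt((b / x) * p * (100 - p))."""
    return math.sqrt((b / x) * p * (100.0 - p))


# Persons aged 18 to 21 enrolled in college; b is the
# "Persons Enrolled in School, Total" value from Table 3.
p, x, b = 44.0, 16_000_000, 2_131
se = se_percentage(p, x, b)
low, high = p - 1.645 * se, p + 1.645 * se
print(round(se, 2))                   # 0.57
print(round(low, 2), round(high, 2))  # 43.06 44.94
```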

Standard Error of a Difference. The standard error of the difference between two sample estimates is approximately equal to the following:

$$s_{x-y} = \sqrt{s_x^2 + s_y^2} \qquad (4)$$

where s_x and s_y are the standard errors of the estimates, x and y. The estimates can be numbers, percentages, ratios, etc. This will result in accurate estimates of the standard error of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.

Suppose that of the 6,850,000 employed men between 20-24 years of age in October 2001, 20.8 percent were part-time workers, and of the 6,400,000 employed women between 20-24 years of age, 34.6 percent were part-time workers. Use the appropriate parameters from Table 2 and formulas (3) and (4) to get the following:

The 90-percent confidence interval around the difference is calculated as 13.8 ± 1.645 x 1.29. Since this interval does not include zero, we can conclude with 90 percent confidence that the percentage of part-time women workers between 20-24 years of age is greater than the percentage of part-time men workers between 20-24 years of age.
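The full chain for this illustration (two percentage standard errors, then the standard error of their difference, then the interval) is sketched below in Python. It assumes formula (3) is s = sqrt((b/x) p (100 - p)) and formula (4) is sqrt(s_x^2 + s_y^2), and it treats the two estimates as uncorrelated; the function names are illustrative.

```python
import math


def se_percentage(p, x, b):
    """Formula (3), assumed form: sqrt((b / x) * p * (100 - p))."""
    return math.sqrt((b / x) * p * (100.0 - p))


def se_difference(se_x, se_y):
    """Formula (4), assumed form: sqrt(se_x^2 + se_y^2), for uncorrelated estimates."""
    return math.sqrt(se_x**2 + se_y**2)


# b values are the Men and Women rows of Table 2.
se_men = se_percentage(20.8, 6_850_000, 2_927)
se_women = se_percentage(34.6, 6_400_000, 2_693)
se_diff = se_difference(se_men, se_women)

diff = 34.6 - 20.8
low, high = diff - 1.645 * se_diff, diff + 1.645 * se_diff
print(round(se_men, 2), round(se_women, 2), round(se_diff, 2))  # 0.84 0.98 1.29
print(round(low, 2), round(high, 2))                            # 11.68 15.92
print(low > 0)  # True: the interval excludes zero
```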

Table 2. Parameters for Computation of Standard Errors for Labor Force Characteristics: October 2001
Characteristic	a	b
Civilian Labor Force, Employed, and Not in Labor Force
Total or White	-0.000008	1,586
Men	-0.000035	2,927
Women	-0.000033	2,693
Both sexes, 16 to 19 years	-0.000244	3,005

Black	-0.000154	3,296
Men	-0.000336	3,332
Women	-0.000282	2,944
Both sexes, 16 to 19 years	-0.001531	3,296

Hispanic ancestry	-0.000187	3,296
Men	-0.000363	3,332
Women	-0.000380	2,944
Both sexes, 16 to 19 years	-0.001822	3,296

Unemployment
Total or White	-0.000017	3,005
Men	-0.000035	2,927
Women	-0.000033	2,693
Both sexes, 16 to 19 years	-0.000244	3,005

Black	-0.000154	3,296
Men	-0.000336	3,332
Women	-0.000282	2,944
Both sexes, 16 to 19 years	-0.001531	3,296

Hispanic ancestry	-0.000187	3,296
Men	-0.000363	3,332
Women	-0.000380	2,944
Both sexes, 16 to 19 years	-0.001822	3,296

Agricultural Employment	0.001345	2,989

Notes:

These parameters are to be applied to basic CPS monthly labor force estimates. For foreign-born and noncitizen characteristics for Total and White, the a and b parameters should be multiplied by 1.3. No adjustment is necessary for foreign-born and noncitizen characteristics for Blacks and Hispanics.

Table 3. Parameters for Computation of Standard Errors for School Enrollment Characteristics: October 2001
Characteristics	Total or White b	Black b	Hispanic b
People
Persons Enrolled in School:
Total	2,131	2,410	2,744
Children 13 and under	2,453	2,775	3,159

Marital Status, Household and Family Characteristics, Health Insurance
Some household members	4,687	6,733	11,347
All household members	5,695	9,929	16,733

Families, Households, or Unrelated Individuals
Income, Earnings	2,016	2,201	3,709

Marital Status, Household and Family Characteristics, Educational Attainment, Population by Age and/or Sex	1,860	1,683	2,836

We produced updated March 1994 educational attainment parameters directly from the March 1994 data. Using the updated March 1994 educational attainment parameters as a base, we also updated the October school enrollment parameters. Therefore, when calculating past educational attainment and school enrollment parameters, a separate set of year factors should be used.

Table 4 shows the prior year factors to apply to the Non-School Enrollment parameters while Table 4A shows prior year factors to apply to School Enrollment parameters.

Table 4. Year Factors for Non-School Enrollment Characteristics (1942-2000)
Time Period	Total/White	Black	Hispanic
January 1996 - October 2000	1.11	1.11	1.11
April 1989 - December 1995	1.03	1.03	1.03
April 1988 - March 1989	1.14	1.14	1.20
January 1985 - March 1988	0.96	0.96	0.96
January 1982 - December 1984	0.96	0.96	1.35
March 1973 - December 1981	0.86	0.86	1.20
January 1967 - February 1973	0.86	0.86	1.20
May 1956 - December 1966	1.29	1.29	1.81
August 1942 - April 1956	1.93	1.93	2.71

Table 4A. School Enrollment Year Factors (1942-2000)
Time Period	Total/White	Black	Hispanic
January 1996 - October 2000	1.11	1.11	1.11
March 1994 - December 1995	1.03	1.03	1.03
April 1989 - February 1994	1.19	1.42	2.10
April 1988 - March 1989	1.32	1.58	2.45
January 1985 - March 1988	1.11	1.33	1.97
January 1982 - December 1984	1.11	1.33	2.76
March 1973 - December 1981	0.99	1.19	2.46
January 1967 - February 1973	0.99	1.19	2.46
May 1956 - December 1966	1.49	1.78	3.69
August 1942 - April 1956	2.24	2.67	5.54

Table 5. Regional Factors to Apply to 2001 Parameters
Type of Characteristic	Factor
U.S. Totals	1.00
Regions:
Northeast	0.91
Midwest	0.93
South	1.14
West	1.15
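The source does not spell out here how the year and regional factors are applied. The Python sketch below assumes each factor multiplies the 2001 parameters (both a and b where a formula uses both, only b otherwise) before the standard error formulas are used. That rule and the function name are assumptions for illustration and should be checked against the original Census Bureau documentation.

```python
def scale_parameters(factor, b, a=None):
    """Return (a, b) multiplied by a year or regional factor.

    Assumption: the factor multiplies each parameter. `a` stays None
    for formulas that use only b.
    """
    scaled_a = None if a is None else a * factor
    return scaled_a, b * factor


# School enrollment, Total or White (b = 2,131), using the Table 4A
# factor for April 1989 - February 1994 (1.19).
_, b_1990 = scale_parameters(1.19, b=2_131)

# Unemployment, Men (a = -0.000035, b = 2,927), using the South factor (1.14).
a_south, b_south = scale_parameters(1.14, b=2_927, a=-0.000035)

print(round(b_1990, 2), round(b_south, 2))
```

If the parameters do scale this way, a factor f changes the resulting standard error by a multiple of the square root of f, since every term under the square root is multiplied by f.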

¹For detailed information on the 1990 sample redesign, see the Department of Labor, Bureau of Labor Statistics report, Employment and Earnings, Volume 41, Number 5, May 1994.

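Illustration for formula (1): unemployed men
Number, x	3,900,000
a parameter	-0.000035
b parameter	2,927
Standard error	104,321
90% conf. int.	3,730,000 to 4,070,000

Illustration for formula (2): three and four year olds enrolled in school
Number, x	4,100,000
Total, T	7,900,000
b parameter	2,453
Standard error	69,553
90% conf. int.	3,990,000 to 4,210,000

Illustration for formula (3): persons aged 18 to 21 enrolled in college
Percentage, p	44.0
Base, x	16,000,000
b parameter	2,131
Standard error	0.57
90% conf. int.	43.06 to 44.94

Illustration for formulas (3) and (4): part-time workers among employed men (x) and women (y) 20 to 24 years old
	x	y	difference
Percentage, p	20.8	34.6	13.8
Number, x	6,850,000	6,400,000	-
b parameter	2,927	2,693	-
Standard error	0.84	0.98	1.29
90% conf. int.	19.42 to 22.18	32.99 to 36.21	11.68 to 15.92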