CPS Voting and Registration - 2000 Source and Accuracy Statement

Source and Accuracy of the Data for the November 2000
CPS Microdata File for Voting and Registration in the U.S.

SOURCE OF DATA

The data for this microdata file come from the November 2000 Current Population Survey (CPS). The November survey uses two sets of questions, the basic CPS and the supplement.

Basic CPS. The monthly CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member 15 years old and over in every sample household.

November 2000 supplement. In addition to the basic CPS questions, interviewers asked supplementary questions on voting and registration.

Sample design. The present CPS sample was selected from the 1990 Decennial Census files with coverage in all 50 states and the District of Columbia. The sample is continually updated to account for new residential construction. To obtain the sample, the United States was divided into 2,007 geographic areas. In most states, a geographic area consisted of a county or several contiguous counties. In some areas of New England and Hawaii, minor civil divisions were used instead of counties. These 2,007 geographic areas were then grouped into 754 strata, and one geographic area was selected from each stratum. About 50,000 occupied households are eligible for interview every month out of these 754 areas. Interviewers are unable to obtain interviews at about 3,200 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason.

Sample redesign. Since the introduction of the CPS, the Census Bureau has redesigned the CPS sample several times. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. The most recent changes were phased in and implementation was completed in July 1995.

Estimation procedure. This survey's estimation procedure adjusts weighted sample results to agree with independent estimates of the civilian noninstitutional population of the United States by age, sex, race, Hispanic/non-Hispanic origin, and state of residence. The adjusted estimate is called the post-stratification ratio estimate. The independent estimates are calculated based on information from four primary sources:

The 1990 Decennial Census of Population and Housing.

An adjustment for undercoverage in the 1990 census.

Statistics on births, deaths, immigration, and emigration.

Statistics on the size of the Armed Forces.

The independent population estimates include some, but not all, undocumented immigrants.
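The post-stratification ratio estimate described above can be sketched as a single cell-by-cell ratio adjustment. This is a simplified illustration under stated assumptions, not the Census Bureau's production procedure, which involves additional adjustment stages. The cells, weights, and control totals are hypothetical, and `poststratify` is an illustrative name.

```python
from collections import defaultdict

def poststratify(records, controls):
    """Ratio-adjust weights so weighted totals match controls in every cell.

    records:  list of (cell, weight) pairs; a cell might be an
              (age group, sex) tuple.
    controls: dict mapping each cell to its independent population estimate.
    Returns a new list of (cell, adjusted_weight) pairs.
    """
    weighted_totals = defaultdict(float)
    for cell, weight in records:
        weighted_totals[cell] += weight
    # Ratio adjustment factor per cell: control total / weighted sample total.
    factors = {cell: controls[cell] / total for cell, total in weighted_totals.items()}
    return [(cell, weight * factors[cell]) for cell, weight in records]

# Hypothetical sample records and control totals (not CPS data).
records = [
    (("20-29", "M"), 1500.0),
    (("20-29", "M"), 1700.0),
    (("20-29", "F"), 1600.0),
]
controls = {("20-29", "M"): 4000.0, ("20-29", "F"): 1800.0}
adjusted = poststratify(records, controls)
```

After adjustment, the weights in each cell sum to that cell's control total, which is what "adjusts weighted sample results to agree with independent estimates" means at the cell level.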

ACCURACY OF THE ESTIMATES

A sample survey estimate has two possible types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error. The nature of the sampling error is known given the survey design. The full extent of the nonsampling error, however, is unknown.

Sampling error. Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. This possible variation in the estimates due to sampling error is known as "sampling variability."

Nonsampling error. All other sources of error in the survey estimates are collectively called nonsampling error. Sources of nonsampling error include the following:

Inability to get information about all sample cases.

Definitional difficulties.

Differences in interpretation of questions.

Respondent inability or unwillingness to provide correct information.

Respondent inability to recall information.

Errors made in data collection, such as recording and coding data.

Errors made in processing the data.

Errors made in estimating values for missing data.

Failure to represent all units with the sample.

Two types of nonsampling error that can be examined to a limited extent are nonresponse and coverage.

Nonresponse. The effect of nonresponse cannot be measured directly, but one indication of its potential effect is the nonresponse rate. For the November 2000 basic CPS, the nonresponse rate was 7.54%. An additional 5.8% of basic CPS respondents did not respond to the supplement, for a total supplement nonresponse rate of 12.90%.
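The published rates combine multiplicatively rather than additively (7.54% + 5.8% would give 13.34%, not 12.90%). A minimal sketch, assuming the 5.8% applies only to households that responded to the basic CPS, reproduces the published total:

```python
basic_nonresponse = 0.0754        # November 2000 basic CPS
supplement_nonresponse = 0.058    # assumed: rate among basic CPS respondents

# A case is a supplement respondent only if it responded to both parts.
combined_response = (1 - basic_nonresponse) * (1 - supplement_nonresponse)
total_supplement_nonresponse = 1 - combined_response
print(f"{total_supplement_nonresponse:.2%}")  # 12.90%
```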

Coverage. The concept of coverage in the survey sampling process is the extent to which the total population that could be selected for sample "covers" the survey's target population. CPS undercoverage results from missed housing units and missed people within sample households. Overall CPS undercoverage is estimated to be about 8 percent. CPS undercoverage varies with age, sex, and race. Generally, undercoverage is larger for males than for females and larger for Blacks and other races combined than for Whites. As described previously, ratio estimation to independent age-sex-race-Hispanic population controls partially corrects for the bias due to undercoverage. However, biases exist in the estimates to the extent that missed people in missed households or missed people in interviewed households have different characteristics from those of interviewed persons in the same age-sex-race-ancestry-state group.

Table 1. CPS Coverage Ratios
Age	Non-Black M	Non-Black F	Black M	Black F	All Persons M	All Persons F	All Persons Total
0-14	0.929	0.964	0.850	0.838	0.916	0.943	0.929
15	0.933	0.895	0.763	0.824	0.905	0.883	0.895
16-19	0.881	0.891	0.711	0.802	0.855	0.877	0.866
20-29	0.847	0.897	0.660	0.811	0.823	0.884	0.854
30-39	0.904	0.931	0.680	0.845	0.877	0.920	0.899
40-49	0.928	0.966	0.816	0.911	0.917	0.959	0.938
50-59	0.953	0.974	0.896	0.927	0.948	0.969	0.959
60-64	0.961	0.941	0.954	0.953	0.960	0.942	0.950
65-69	0.919	0.972	0.982	0.984	0.924	0.973	0.951
70+	0.993	1.004	0.996	0.979	0.993	1.002	0.998
15+	0.914	0.945	0.767	0.874	0.898	0.927	0.918
0+	0.918	0.949	0.793	0.864	0.902	0.931	0.921

A common measure of survey coverage is the coverage ratio, the estimated population before post-stratification divided by the independent population control. Table 1 shows CPS coverage ratios for age-sex-race groups for a typical month. The CPS coverage ratios can exhibit some variability from month to month. Other Census Bureau household surveys experience similar coverage.
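Given the definition above, the coverage ratio and two quantities that follow from it can be computed directly. The sketch assumes the simple reading that a ratio r implies an undercoverage of 1 - r and a post-stratification factor of about 1/r for that cell; the function names are illustrative. Applied to the 0+ All Persons total of 0.921 in Table 1, it gives the "about 8 percent" overall undercoverage stated earlier.

```python
def coverage_ratio(estimate_before_poststrat, independent_control):
    """Coverage ratio: pre-post-stratification estimate / independent control."""
    return estimate_before_poststrat / independent_control

def undercoverage(ratio):
    """Share of the control population missing from the pre-adjustment estimate."""
    return 1.0 - ratio

def adjustment_factor(ratio):
    """Factor that scales the pre-adjustment estimate up to the control total."""
    return 1.0 / ratio

# Ratios taken from Table 1.
all_persons_total = 0.921        # 0+, All Persons, Total
black_male_20_29 = 0.660         # 20-29, Black, M

print(f"{undercoverage(all_persons_total):.1%}")     # 7.9%
print(round(adjustment_factor(black_male_20_29), 3))  # 1.515
```

The second line shows why cells with low coverage, such as Black men aged 20 to 29, depend most heavily on the ratio adjustment.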

A nonsampling error warning. Since the full extent of the nonsampling error is unknown, one should be particularly careful when interpreting results based on small differences between estimates. Even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test. Caution should also be used when interpreting results based on a relatively small number of cases. Summary measures probably do not reveal useful information when computed on a base (subpopulation) smaller than 75,000.

For additional information on nonsampling error, including its possible impact on CPS data when known, refer to the following:

Statistical Policy Working Paper 3, An Error Profile: Employment as Measured by the Current Population Survey, Office of Federal Statistical Policy and Standards, U.S. Department of Commerce, 1978.

Technical Paper 63, The Current Population Survey: Design and Methodology, Bureau of the Census, U.S. Department of Commerce, 2000.

Standard errors and their use. The sample estimate and its standard error enable one to construct a confidence interval. A confidence interval is a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.
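The 1.645 multiplier is the two-sided standard normal value for 90-percent confidence. A minimal sketch, assuming the normal approximation, shows where it comes from using Python's standard library; the function names are illustrative.

```python
from statistics import NormalDist

def z_multiplier(confidence=0.90):
    """Two-sided standard normal multiplier for the given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def confidence_interval(estimate, se, confidence=0.90):
    """Interval estimate +/- z * se under a normal approximation."""
    z = z_multiplier(confidence)
    return estimate - z * se, estimate + z * se

print(round(z_multiplier(0.90), 3))  # 1.645
```

With `confidence=0.95` the same function gives about 1.96, the multiplier for a 95-percent interval.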

Standard errors may be used to perform hypothesis testing, a procedure for distinguishing between population parameters using sample estimates. The most common hypothesis tested is whether two population parameters are different. An example would be comparing the percentage of Whites with a college education to the percentage of Blacks with a college education.

Tests may be performed at various levels of significance. A significance level is the probability of concluding that the characteristics are different when, in fact, they are the same. For example, to conclude that two parameters are different at the 0.10 level of significance, the absolute value of the estimated difference between characteristics must be greater than or equal to 1.645 times the standard error of the difference.

The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical texts for alternative criteria.
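The decision rule above can be written as a small function. It assumes the normal approximation behind the 1.645 multiplier; the names are illustrative, and the p-value helper is an equivalent restatement of the same rule (a difference is significant at the 0.10 level exactly when p is at most 0.10), not a quantity this statement reports.

```python
from statistics import NormalDist

def significant_at(diff, se_diff, alpha=0.10):
    """Two-sided test: True if |diff| >= z * se_diff (z = 1.645 for alpha = 0.10)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff) >= z * se_diff

def two_sided_p_value(diff, se_diff):
    """Normal-approximation p-value; significant at alpha exactly when p <= alpha."""
    return 2 * (1 - NormalDist().cdf(abs(diff) / se_diff))
```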

Estimating standard errors. To estimate the standard error of a CPS estimate, the Census Bureau uses replicated variance estimation methods. These methods primarily measure the magnitude of sampling error. However, they do measure some effects of nonsampling error as well. They do not measure systematic biases in the data due to nonsampling error. (Bias is the average of the differences, over all possible samples, between the sample estimates and the desired value.)
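The statement does not say which replication scheme was used. As an assumption, the sketch below uses the generic replicate-variance form: the variance is a scheme-specific constant times the sum of squared deviations of the replicate estimates from the full-sample estimate (under successive difference replication, that constant is 4/R for R replicates). The function name and all estimates are hypothetical.

```python
import math

def replicate_se(full_sample_estimate, replicate_estimates, factor):
    """SE = sqrt(factor * sum of squared deviations of replicates from the full sample)."""
    squared_deviations = sum((r - full_sample_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(factor * squared_deviations)

# Hypothetical full-sample and replicate estimates (not CPS data).
theta_0 = 100.0
replicates = [101.0, 99.0, 102.0, 98.0]
# Successive difference replication would use factor = 4 / R.
se = replicate_se(theta_0, replicates, factor=4 / len(replicates))
```

Because every replicate is drawn from the same flawed frame and questionnaire, a calculation like this cannot detect the systematic biases mentioned above.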

Generalized Variance Parameters. Consider all of the possible estimates of characteristics of the population that are of interest to data users. Now consider all of the subpopulations such as racial groups, age ranges, etc. Finally, consider every possible comparison or ratio combination. The list would be completely unmanageable. Similarly, a list of standard errors to go with every estimate would be unmanageable.

Through experimentation, we have found that certain groups of estimates have similar relationships between their variances and expected values. We provide a generalized method for calculating standard errors for any of the characteristics of the population of interest. The generalized method uses generalized variance parameters for groups of estimates. These parameters are in Table 2, for basic CPS monthly labor force estimates, and Tables 3 through 9, for November supplement data.

Standard errors of estimated numbers. The approximate standard error, s_x, of an estimated number from this microdata file can be obtained using this formula:

$$s_x = \sqrt{a x^2 + b x} \qquad (1)$$

Here x is the size of the estimate and a and b are the parameters in Tables 2 through 9 associated with the particular type of characteristic. When calculating standard errors for numbers from cross-tabulations involving different characteristics, use the set of parameters for the characteristic that gives the largest standard error.
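The standard-error formula and the cross-tabulation rule can be sketched as follows. The cell size of 500,000 is hypothetical; the two candidate parameter sets are the Table 3 rows for marital status and education of persons, as they might apply to a cell that cross-classifies those two characteristics. Function names are illustrative.

```python
import math

def se_number(x, a, b):
    """Approximate standard error of an estimated number: sqrt(a*x**2 + b*x)."""
    return math.sqrt(a * x ** 2 + b * x)

def se_for_crosstab(x, parameter_sets):
    """Evaluate every candidate (a, b) pair and keep the largest standard error."""
    return max(
        ((name, se_number(x, a, b)) for name, (a, b) in parameter_sets.items()),
        key=lambda item: item[1],
    )

# Table 3 (Total or White) rows for the two characteristics in the cell.
candidates = {
    "Marital status": (-0.000019, 5211),
    "Education of persons": (-0.000011, 2369),
}
name, se = se_for_crosstab(500_000, candidates)  # hypothetical cell size
```

Taking the maximum errs on the side of a wider interval when it is unclear which parameter set fits the cell.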

Illustration

In November 2000, there were 2,957,000 unemployed men in the civilian labor force. Use the appropriate parameters from Table 2 and formula (1) to get:

Number, x	2,957,000
a parameter	-0.000035
b parameter	2,927
Standard error	91,000
90% conf. int.	2,807,000 to 3,107,000

The standard error is calculated as

$$s_x = \sqrt{-0.000035 \times 2{,}957{,}000^2 + 2{,}927 \times 2{,}957{,}000} \approx 91{,}000$$

The 90-percent confidence interval is calculated as 2,957,000 ± 1.645 × 91,000.

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.
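The unemployed-men illustration can be reproduced with a few lines of arithmetic, using the Table 2 Unemployment, Men parameters. Following the text, the interval is built from the standard error rounded to the nearest thousand.

```python
import math

x = 2_957_000               # unemployed men, November 2000
a, b = -0.000035, 2927      # Table 2, Unemployment, Men

se = math.sqrt(a * x ** 2 + b * x)     # formula (1); about 91,373
se_published = round(se, -3)           # 91,000, as shown in the text
low = x - 1.645 * se_published
high = x + 1.645 * se_published
print(f"SE {se_published:,.0f}; 90% CI {low:,.0f} to {high:,.0f}")
```

Rounding the interval endpoints to the nearest thousand gives the published 2,807,000 to 3,107,000.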

Standard errors of estimated percentages. The reliability of an estimated percentage, computed using sample data from both numerator and denominator, depends on both the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the parameter from Tables 2 through 9 indicated by the numerator.

The approximate standard error, s_{x,p}, of an estimated percentage can be obtained by use of the formula:

$$s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)} \qquad (2)$$

Here x is the total number of people, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 ≤ p ≤ 100), and b is the parameter in Tables 2 through 9 associated with the characteristic in the numerator of the percentage.
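To illustrate the earlier point that a percentage is relatively more reliable than its numerator, the sketch compares relative standard errors (standard error divided by the estimate) for the voting figures in the next illustration, using the Table 3 voting parameters. Ignoring the small a term, the ratio of the two relative standard errors works out to the square root of (1 - p/100), which shrinks as p grows; this is why the advantage is largest for percentages of 50 or more. Function names are illustrative.

```python
import math

def se_percent(x, p, b):
    """Formula (2): approximate standard error of a percentage p with base x."""
    return math.sqrt(b / x * p * (100 - p))

def se_number(y, a, b):
    """Formula (1): approximate standard error of an estimated number y."""
    return math.sqrt(a * y ** 2 + b * y)

x, p = 13_590_000, 24.7          # base and percentage from the voting illustration
a, b = -0.000016, 3274           # Table 3, voting (Total or White)
numerator = x * p / 100          # estimated number of voters in the base

rel_se_percent = se_percent(x, p, b) / p
rel_se_numerator = se_number(numerator, a, b) / numerator
```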

Illustration

In November 2000, out of 13,590,000 people with an elementary school education, 24.7 percent reported voting. Use the appropriate parameter from Table 3 and formula (2) to get:

Percentage, p	24.7
Base, x	13,590,000
b parameter	3,274
Standard error	0.7
90% conf. int.	23.5 to 25.9

The standard error is calculated as

$$s_{x,p} = \sqrt{\frac{3{,}274}{13{,}590{,}000} \times 24.7 \times (100 - 24.7)} \approx 0.7$$

The 90-percent confidence interval of the percentage of people with an elementary school education who reported voting is calculated as 24.7 ± 1.645 × 0.7.
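The voting-by-education illustration can be checked the same way, using formula (2) with the Table 3 voting parameter and the standard error rounded to one decimal as in the text.

```python
import math

x, p = 13_590_000, 24.7     # base and percentage from the illustration
b = 3274                    # Table 3, voting (Total or White)

se = math.sqrt(b / x * p * (100 - p))   # formula (2); about 0.67
se_published = round(se, 1)             # 0.7, as shown in the text
low = p - 1.645 * se_published
high = p + 1.645 * se_published
```

Rounded to one decimal, `low` and `high` match the published interval of 23.5 to 25.9.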

Standard error of a difference. The standard error of the difference between two sample estimates is approximately equal to

$$s_{x-y} = \sqrt{s_x^2 + s_y^2} \qquad (3)$$

where s_x and s_y are the standard errors of the estimates, x and y. The estimates can be numbers, percentages, ratios, etc. This will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.
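When the correlation between the two estimates is known, the general expression sqrt(s_x² + s_y² - 2ρ s_x s_y) shows the direction of the error described above: the approximation is the ρ = 0 case, so it overstates the standard error when ρ > 0 and understates it when ρ < 0. The ρ values below are hypothetical; this statement provides no correlations.

```python
import math

def se_difference(se_x, se_y, rho=0.0):
    """Standard error of x - y; rho = 0 reduces to sqrt(se_x**2 + se_y**2)."""
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2 * rho * se_x * se_y)

uncorrelated = se_difference(0.9, 0.9)              # about 1.27
positive = se_difference(0.9, 0.9, rho=0.5)         # hypothetical rho; smaller
negative = se_difference(0.9, 0.9, rho=-0.5)        # hypothetical rho; larger
```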

Illustration

Out of 6,810,000 men who had an elementary school education, 1,690,000 or 24.8 percent had voted, and of the 6,779,000 women who had an elementary school education, 1,670,000 or 24.6 percent had voted. Use the appropriate parameters from Table 3 and formulas (2) and (3) to get:

	Men, x	Women, y	Difference
Percentage, p	24.8	24.6	0.2
Base	6,810,000	6,779,000	-
b parameter	3,274	3,274	-
Standard error	0.9	0.9	1.3
90% conf. int.	23.2 to 26.4	23.0 to 26.2	-1.9 to 2.3

The standard error of the difference is calculated as

$$s_{x-y} = \sqrt{0.9^2 + 0.9^2} \approx 1.3$$

The 90-percent confidence interval around the difference is calculated as 0.2 ± 1.645 × 1.3. Since this interval includes zero, we cannot conclude, at the 0.10 level of significance, that the percentage of women with an elementary school education who voted is different from the percentage of men with an elementary school education who voted.
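The men-versus-women comparison combines formulas (2) and (3), with b = 3,274 from the Table 3 voting row. The sketch rounds the standard error of the difference to 1.3 before forming the interval, matching the published -1.9 to 2.3.

```python
import math

b = 3274                                  # Table 3, voting (Total or White)
men = {"x": 6_810_000, "p": 24.8}
women = {"x": 6_779_000, "p": 24.6}

def se_percent(x, p, b):
    """Formula (2): approximate standard error of a percentage."""
    return math.sqrt(b / x * p * (100 - p))

se_men = se_percent(men["x"], men["p"], b)
se_women = se_percent(women["x"], women["p"], b)
se_diff = math.sqrt(se_men ** 2 + se_women ** 2)   # formula (3)

diff = men["p"] - women["p"]
se_diff_published = round(se_diff, 1)              # 1.3, as shown in the text
low = diff - 1.645 * se_diff_published
high = diff + 1.645 * se_diff_published
significant = not (low <= 0 <= high)               # False: interval contains 0
```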

Comparability of data. Data obtained from the CPS and other sources are not entirely comparable. This results from differences in interviewer training and experience and in differing survey processes. This is an example of nonsampling variability not reflected in the standard errors. Therefore, caution should be used when comparing results from different sources.

A number of changes were made in data collection and estimation procedures beginning with the January 1994 CPS. The major change was the use of a new questionnaire. The questionnaire was redesigned to measure the official labor force concepts more precisely, to expand the amount of data available, to implement several definitional changes, and to adapt to a computer-assisted interviewing environment. The March supplemental income questions were also modified for adaptation to computer-assisted interviewing, although there were no changes in definitions and concepts. Due to these and other changes, one should use caution when comparing estimates from data collected before 1994 with estimates from data collected in 1994 and later.

Caution should also be used when comparing data from this microdata file, which reflects 1990 census-based population controls, with microdata files from March 1993 and earlier years, which reflect 1980 census-based population controls. Although this change in population controls had relatively little impact on summary measures such as averages, medians, and percentage distributions, it did have a significant impact on levels. For example, use of 1990 census-based population controls results in about a 1 percent increase in the civilian noninstitutional population and in the number of families and households. Thus, estimates of levels for data collected in 1994 and later years will differ from those for earlier years by more than what could be attributed to actual changes in the population. These differences could be disproportionately greater for certain subpopulation groups than for the total population.

Caution should also be used when comparing Hispanic estimates over time. No independent population control totals for people of Hispanic ancestry were used before 1985.

Based on the results of each decennial census, the Census Bureau gradually introduces a new sample design for the CPS. During this phase-in period, CPS data are collected from sample designs based on different censuses. While most CPS estimates were unaffected by this mixed sample, geographic estimates are subject to greater error and variability. Users should exercise caution when comparing estimates across years for metropolitan/nonmetropolitan categories.

Note when using small estimates. Because of the large standard errors involved, summary measures probably do not reveal useful information when computed on a base smaller than 75,000.

Take care in the interpretation of small differences. Even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.

Technical Assistance. If you require assistance or additional information, please contact the Demographic Statistical Methods Division via e-mail at DSMD_S&A@census.gov.

Table 2. Parameters for Computation of Standard Errors for Labor Force Characteristics: November 2000
Characteristic	a	b
Civilian Labor Force, Employed, and Not in Labor Force
Total or White	-0.000008	1,586
Men	-0.000035	2,927
Women	-0.000033	2,693
Both sexes, 16 to 19 years	-0.000244	3,005

Black	-0.000154	3,296
Men	-0.000336	3,332
Women	-0.000282	2,944
Both sexes, 16 to 19 years	-0.001531	3,296

Hispanic ancestry	-0.000187	3,296
Men	-0.000363	3,332
Women	-0.000380	2,944
Both sexes, 16 to 19 years	-0.001822	3,296

Unemployment
Total or White	-0.000017	3,005
Men	-0.000035	2,927
Women	-0.000033	2,693
Both sexes, 16 to 19 years	-0.000244	3,005

Black	-0.000154	3,296
Men	-0.000336	3,332
Women	-0.000282	2,944
Both sexes, 16 to 19 years	-0.001531	3,296

Hispanic ancestry	-0.000187	3,296
Men	-0.000363	3,332
Women	-0.000380	2,944
Both sexes, 16 to 19 years	-0.001822	3,296

Agricultural Employment	0.001345	2,989

Notes: These parameters are to be applied to basic CPS monthly labor force estimates. For foreign-born and noncitizen characteristics for Total and White, the a and b parameters should be multiplied by 1.3. No adjustment is necessary for foreign-born and noncitizen characteristics for Blacks and Hispanics.
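The note above can be expressed as a small lookup that applies the 1.3 multiplier only where the note calls for it. Only the Unemployment group rows of Table 2 are transcribed, and the dictionary keys and function name are hypothetical.

```python
# Unemployment rows transcribed from Table 2; labels are illustrative keys.
TABLE_2_UNEMPLOYMENT = {
    "Total or White": (-0.000017, 3005),
    "Black": (-0.000154, 3296),
    "Hispanic ancestry": (-0.000187, 3296),
}

FOREIGN_BORN_FACTOR = 1.3  # applies to Total and White only, per the note

def labor_force_params(group, foreign_born_or_noncitizen=False):
    """Return (a, b) for a Table 2 Unemployment group, applying the note's adjustment."""
    a, b = TABLE_2_UNEMPLOYMENT[group]
    if foreign_born_or_noncitizen and group == "Total or White":
        a, b = a * FOREIGN_BORN_FACTOR, b * FOREIGN_BORN_FACTOR
    return a, b
```

The same pattern would cover the Table 3 footnote, which applies the 1.3 multiplier to the foreign born.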

Table 3. Parameters for Computation of Standard Errors for Voting and Registration in November 2000: Total or White Persons¹
Characteristic	a	b
Voting, registration, reasons for not voting or registering (includes breakdowns by: Citizenship, Household relationship, Family heads by presence of children, Marital status, Duration of residence, Tenure, Education level, Family income of persons, Occupation group)	-0.000016	3,274

Characteristics of all persons, Voting and nonvoting:
Marital status	-0.000019	5,211
Education of persons	-0.000011	2,369
Education of family head	-0.000010	2,068
Persons by family income	-0.000023	4,901
Duration of residence tenure	-0.000019	5,211

Household relationships, Voting and nonvoting:
Head, spouse of head	-0.000010	2,068
Nonrelative or other relative of head	-0.000019	5,211

¹For Foreign Born parameters, multiply the appropriate parameter by 1.3.

Table 4. Parameters for Computation of Standard Errors for Voting and Registration in November 2000: Black Persons
Characteristic	a	b
Voting, registration, reasons for not voting or registering (includes breakdowns by: Citizenship, Household relationship, Family heads by presence of children, Marital status, Duration of residence, Tenure, Education level, Family income of persons, Occupation group)	-0.000199	4,799

Characteristics of all persons, Voting and nonvoting:
Marital status	-0.000210	7,486
Education of persons	-0.000103	2,680
Education of family head	-0.000072	1,871
Persons by family income	-0.000216	5,611
Duration of residence tenure	-0.000210	7,486

Household relationships, Voting and nonvoting:
Head, spouse of head	-0.000072	1,871
Nonrelative or other relative of head	-0.000210	7,486

Table 5. Parameters for Computation of Standard Errors for Voting and Registration in November 2000: Hispanic Persons
Characteristic	a	b
Voting, registration, reasons for not voting or registering (includes breakdowns by: Citizenship, Household relationship, Family heads by presence of children, Marital status, Duration of residence, Tenure, Education level, Family income of persons, Occupation group)	-0.000357	8,088

Characteristics of all persons, Voting and nonvoting:
Marital status	-0.000377	12,616
Education of persons	-0.000131	3,052
Education of family head	-0.000136	3,153
Persons by family income	-0.000407	9,456
Duration of residence tenure	-0.000377	12,616

Household relationships, Voting and nonvoting:
Head, spouse of head	-0.000144	3,153
Nonrelative or other relative of head	-0.000576	12,616

Table 6. Parameters for Computation of Standard Errors for Voting and Registration in November 2000: Asians or Pacific Islanders
Characteristic	a	b
Voting, registration, reasons for not voting or registering (includes breakdowns by: Citizenship, Household relationship, Family heads by presence of children, Marital status, Duration of residence, Tenure, Education level, Family income of persons, Occupation group)	-0.000514	5,231

Characteristics of all persons, Voting and nonvoting:
Marital status	-0.000540	7,486
Education of persons	-0.000208	2,164
Education of family head	-0.000180	1,871
Persons by family income	-0.000540	5,611
Duration of residence tenure	-0.000540	7,486

Household relationships, Voting and nonvoting:
Head, spouse of head	-0.000180	1,871
Nonrelative or other relative of head	-0.000540	7,486

Table 7. State Voting and Registration Parameters

State	a	b

Alabama	-0.000970	3,307
Alaska	-0.001139	491
Arizona	-0.000878	3,176
Arkansas	-0.000988	1,932
California	-0.000167	4,223
Colorado	-0.000973	3,045
Connecticut	-0.001311	3,274
Delaware	-0.001231	720
Dist. of Col.	-0.001257	524
Florida	-0.000267	3,176
Georgia	-0.000769	4,584
Hawaii	-0.001286	1,146
Idaho	-0.000925	884
Illinois	-0.000345	3,274
Indiana	-0.000999	4,518
Iowa	-0.001059	2,325
Kansas	-0.001066	2,128
Kentucky	-0.000977	3,012
Louisiana	-0.000946	3,110
Maine	-0.001217	1,211
Maryland	-0.001135	4,518
Massachusetts	-0.000554	2,652
Michigan	-0.000405	3,045
Minnesota	-0.001003	3,634
Mississippi	-0.001002	2,095
Missouri	-0.001077	4,485
Montana	-0.000951	655
Nebraska	-0.001099	1,375
Nevada	-0.001034	1,441
New Hampshire	-0.001339	1,244
New Jersey	-0.000427	2,685
New Mexico	-0.000997	1,310
New York	-0.000206	2,914
North Carolina	-0.000529	3,078
North Dakota	-0.001099	524
Ohio	-0.000388	3,339
Oklahoma	-0.000940	2,390
Oregon	-0.001082	2,816
Pennsylvania	-0.000339	3,143
Rhode Island	-0.001311	982
South Carolina	-0.001096	3,307
South Dakota	-0.001015	557
Tennessee	-0.001024	4,387
Texas	-0.000264	3,962
Utah	-0.000929	1,408
Vermont	-0.001253	589
Virginia	-0.000921	4,846
Washington	-0.001088	4,813
West Virginia	-0.000883	1,277
Wisconsin	-0.001002	4,027
Wyoming	-0.001065	393

Table 8. Census Division Voting and Registration Parameters

Division	a	b

New England	-0.000194	2,020
Middle Atlantic	-0.000086	2,563
East North Central	-0.000093	3,162
West North Central	-0.000227	3,242
South Atlantic	-0.000088	3,371
East South Central	-0.000257	3,312
West South Central	-0.000157	3,582
Mountain	-0.000182	2,361
Pacific	-0.000115	3,865

Table 9. Census Region Voting and Registration Parameters

Region	a	b

Northeast	-0.000060	2,423
Midwest	-0.000066	3,184
South	-0.000046	3,424
West	-0.000075	3,492
All Except South	-0.000023	3,037

Source: U.S. Census Bureau
Author: Thomas Moore III-Census/DSMD
Contact: (cpshelp@info.census.gov) CPS Help-Census/DSD/CPSB
Last revised: February 07, 2002
URL: http://www.bls.census.gov/cps/vote/2000/ssrcacc.htm
