Source and Accuracy Statement for the November 1998 CPS Microdata File for Voting and Registration in the U.S.


SOURCE OF DATA

The data for this microdata file come from the November 1998 Current Population Survey (CPS). This month's survey uses two sets of questions, the basic CPS and the supplement. The Bureau of the Census conducts the basic CPS every month and asks supplementary questions during certain months.

Basic CPS. The basic CPS collects primarily labor force data about the civilian noninstitutional population. Interviewers ask questions concerning labor force participation about each member 15 years old and over in every sample household.

November 1998 supplement. In addition to the basic CPS questions, interviewers asked supplementary questions on voting and registration.

Sample Design. The CPS sample includes coverage in all 50 states and the District of Columbia. The Census Bureau continually updates the sample to account for new residential construction. The Census Bureau divides the United States into 2,007 geographic areas. In most states, a geographic area consists of a county or several contiguous counties. In some areas of New England and Hawaii, the Census Bureau uses minor civil divisions instead of counties. We select a total of 754 geographic areas for sample. About 50,000 occupied households are eligible for interview every month. Field representatives are unable to obtain interviews at about 3,200 of these units. This occurs when the occupants are not found at home after repeated calls or are unavailable for some other reason.

Since the introduction of the CPS, the Bureau of the Census has redesigned the CPS sample several times. These redesigns have improved the quality and accuracy of the data and have satisfied changing data needs. The Census Bureau completely implemented the most recent changes in July 1995.

Estimation procedure. This survey's estimation procedure adjusts weighted sample results to agree with independent estimates of the civilian noninstitutional population of the United States by age, gender, race, Hispanic/non-Hispanic origin, and state of residence. This adjustment is called the post-stratification ratio estimate. The independent estimates are based on:

The independent population estimates include some, but not all, undocumented immigrants.

ACCURACY OF THE ESTIMATES

Since the CPS estimates come from a sample, they may differ from figures from a complete census using the same questionnaires, instructions, and enumerators. A sample survey estimate has two possible types of error: sampling and nonsampling. The accuracy of an estimate depends on both types of error, but the full extent of the nonsampling error is unknown. Consequently, one should be particularly careful when interpreting results based on a relatively small number of cases or on small differences between estimates. The standard errors for CPS estimates primarily indicate the magnitude of sampling error. They also partially measure the effect of some nonsampling errors in responses and enumeration, but do not measure systematic biases in the data. (Bias is the average over all possible samples of the differences between the sample estimates and the desired value.)

Nonsampling variability. We can attribute nonsampling errors to several sources including the following:

For the November 1998 basic CPS, the nonresponse rate was 6.3% and for the supplement the nonresponse rate was an additional 3.6% for a total supplement nonresponse rate of 9.7%.
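
The 9.7 percent total is slightly less than the simple sum of the two rates. The sketch below shows one way the rates could combine. It assumes the additional 3.6 percent applies only to households that completed the basic interview; that is an interpretation, not something stated above. The function name is illustrative.

```python
def combined_nonresponse(basic_rate, supplement_rate):
    """Total nonresponse, assuming supplement nonresponse is measured
    among households that responded to the basic CPS (an assumption)."""
    responded_to_both = (1.0 - basic_rate) * (1.0 - supplement_rate)
    return 1.0 - responded_to_both


# Rates quoted for November 1998: 6.3% basic, 3.6% additional for the supplement.
total = combined_nonresponse(0.063, 0.036)
print(f"Total supplement nonresponse: {total:.1%}")
```

Under that assumption the result rounds to the 9.7 percent quoted above.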

CPS undercoverage results from missed housing units and missed people within sample households. Compared to the level of the 1990 Decennial Census, overall CPS undercoverage is about 8 percent. Undercoverage varies with age, gender, and race. Generally, undercoverage is larger for males than for females and larger for Blacks and other races combined than for Whites. The post-stratification ratio estimate described previously partially corrects for bias due to undercoverage. However, biases exist in the estimates to the extent that missed people in missed households or missed people in interviewed households have different characteristics from those of interviewed people in the same age-gender-race-origin-state group.

A common measure of survey coverage is the coverage ratio, the estimated population before the post-stratification ratio estimate divided by the independent population control. Table A shows CPS coverage ratios for age-sex-race groups for a typical month. The CPS coverage ratios can exhibit some variability from month to month, but these are a typical set of coverage ratios.

Table A. CPS Coverage Ratios

        Non-Black       Black           All Persons
Age     M       F       M       F       M       F       Total
0-14    0.929   0.964   0.850   0.838   0.916   0.943   0.929
15      0.933   0.895   0.763   0.824   0.905   0.883   0.895
16-19   0.881   0.891   0.711   0.802   0.855   0.877   0.866
20-29   0.847   0.897   0.660   0.811   0.823   0.884   0.854
30-39   0.904   0.931   0.680   0.845   0.877   0.920   0.899
40-49   0.928   0.966   0.816   0.911   0.917   0.959   0.938
50-59   0.953   0.974   0.896   0.927   0.948   0.969   0.959
60-64   0.961   0.941   0.954   0.953   0.960   0.942   0.950
65-69   0.919   0.972   0.982   0.984   0.924   0.973   0.951
70+     0.993   1.004   0.996   0.979   0.993   1.002   0.998
15+     0.914   0.945   0.767   0.874   0.898   0.927   0.918
0+      0.918   0.949   0.793   0.864   0.902   0.931   0.921
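
A coverage ratio can be read as an implied undercoverage rate by subtracting it from 1. The sketch below does this for a few ratios copied from Table A. The dictionary layout and function name are illustrative only.

```python
# Selected coverage ratios copied from Table A (keys: age, race group, sex).
coverage = {
    ("20-29", "Black", "M"): 0.660,
    ("20-29", "Non-Black", "M"): 0.847,
    ("0+", "All Persons", "Total"): 0.921,
}


def undercoverage(ratio):
    """Share of the independent population control not reached by the
    weighted sample before the post-stratification ratio estimate."""
    return 1.0 - ratio


for group, ratio in coverage.items():
    print(group, f"{undercoverage(ratio):.1%}")
```

The all-persons ratio of 0.921 implies undercoverage of roughly 8 percent, in line with the overall figure quoted earlier. The ratio for Black men aged 20 to 29 implies a much larger shortfall.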

For additional information on nonsampling error including the possible impact on CPS data when known, refer to Statistical Policy Working Paper 3, An Error Profile: Employment as Measured by the Current Population Survey, Office of Federal Statistical Policy and Standards, U.S. Department of Commerce, 1978 and Technical Paper 40, The Current Population Survey: Design and Methodology, Bureau of the Census, U.S. Department of Commerce.

Comparability of data. Data obtained from the CPS and other sources are not entirely comparable. This results from differences in interviewer training and experience and in differing survey processes. This is an example of nonsampling variability not reflected in the standard errors. Use caution when comparing results from different sources.

A number of changes were made in data collection and estimation procedures beginning with the January 1994 CPS. The major change was the use of a new questionnaire. The Bureau of Labor Statistics redesigned the questionnaire to measure the official labor force concepts more precisely, to expand the amount of data available, to implement several definitional changes, and to adapt to a computer-assisted interviewing environment. The Census Bureau also modified the supplemental questions for adaptation to computer-assisted interviewing, but did not change definitions and concepts. Because of these and other changes, one should use caution when comparing estimates from data collected in 1994 and later years with estimates from earlier years.

Data users should also use caution when comparing estimates from this microdata file (which reflects 1990 census-based population controls) with estimates for 1993 and earlier years (which reflect 1980 census-based population controls). This change in population controls had relatively little impact on summary measures such as means, medians, and percentage distributions, but it did have a significant impact on levels. For example, the 1990-based population controls caused about a 1-percent increase in the civilian noninstitutional population and in the number of families and households. Thus, estimates of levels for data collected in 1994 and later years will differ from those for earlier years by more than can be attributed to actual changes in the population. These differences could be disproportionately greater for certain subpopulation groups than for the total population.

For more information on the introduction of the new questionnaire, the modernized data collection methods, and the introduction of new population controls based on the 1990 census, see "Revisions in the Current Population Survey Effective January 1994" in the February 1994 issue of Employment and Earnings published by the Bureau of Labor Statistics.

Since no independent population control totals for persons of Hispanic origin were used before 1985, compare Hispanic estimates over time cautiously.

Note when using small estimates. Because of the large standard errors involved, summary measures (such as medians and percent distributions) probably do not reveal useful information when computed on a base smaller than 75,000. Take care in the interpretation of small differences. For instance, even a small amount of nonsampling error can cause a borderline difference to appear significant or not, thus distorting a seemingly valid hypothesis test.
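
To show why a base of 75,000 is treated as a floor, the sketch below computes the standard error of a 50 percent figure on that base. It assumes the percentage approximation takes the form sqrt((b / x) * p * (100 - p)), which reproduces the voting illustration later in this statement, and uses the Table C voting parameter b = 3,274. The function and variable names are illustrative.

```python
import math


def se_percentage(p, base, b):
    """Approximate standard error of a percentage p (0-100) on a given base,
    assuming the form sqrt((b / base) * p * (100 - p))."""
    return math.sqrt((b / base) * p * (100.0 - p))


B_VOTING = 3274  # b parameter for voting, Total or White persons (Table C)

se = se_percentage(50.0, 75_000, B_VOTING)
half_width = 1.645 * se  # half-width of a 90-percent confidence interval
print(f"SE = {se:.1f} points; 90% interval = 50 +/- {half_width:.1f} points")
```

Under these assumptions the standard error exceeds 10 percentage points, so the 90-percent interval spans more than 30 points. An interval that wide supports few conclusions.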

Sampling variability. Sampling variability is variation that occurred by chance because a sample was surveyed rather than the entire population. Standard errors, as calculated below, are primarily measures of sampling variability, but they may include some nonsampling error.

Standard errors and their use. The Census Bureau had to make a number of approximations to derive, at a moderate cost, standard errors applicable to estimates from this microdata file. Instead of providing an individual standard error for each estimate, we have provided two parameters, a and b, to calculate standard errors for each type of characteristic. These parameters are in Tables B through I.

The sample estimate and its standard error enable one to construct a confidence interval, a range that would include the average result of all possible samples with a known probability. For example, if all possible samples were surveyed under essentially the same general conditions and using the same sample design, and if an estimate and its standard error were calculated from each sample, then approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.

A particular confidence interval may or may not contain the average estimate derived from all possible samples. However, one can say with specified confidence that the interval includes the average estimate calculated from all possible samples.

Data users may also use standard errors to perform hypothesis testing. This is a procedure for distinguishing between population parameters using sample estimates. One common type of hypothesis is that two population parameters are different. An example of this would be comparing the number of men who were part-time workers with the number of women who were part-time workers.

The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine statistical validity. Consult standard statistical textbooks for alternative criteria.

For information on calculating standard errors for CPS labor force data that involve quarterly or yearly averages, changes in consecutive quarterly or yearly averages, consecutive month-to-month changes in estimates, or consecutive year-to-year changes in monthly estimates, see "Explanatory Notes and Estimates of Error: Household Data" in the corresponding issue of Employment and Earnings published by the Bureau of Labor Statistics.

Standard errors of estimated numbers. One can obtain the approximate standard error, $s_x$, of an estimated number from this microdata file by using the formula:

Formula 1: $s_x = \sqrt{ax^2 + bx}$

Here x is the size of the estimate and a and b are the parameters in Tables B through F associated with the particular type of characteristic. When calculating standard errors from cross-tabulations involving different characteristics, use the set of parameters for the characteristic which will give the largest standard error.
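
The rule for cross-tabulations can be sketched as follows. The code assumes Formula 1 has the form sqrt(a*x^2 + b*x), which reproduces the standard error in the illustration that follows. The estimate x is hypothetical; the two parameter pairs are taken from Table C.

```python
import math


def se_number(x, a, b):
    """Approximate standard error of an estimated number (assumed Formula 1)."""
    return math.sqrt(a * x**2 + b * x)


# (a, b) pairs from Table C for the two characteristics being crossed.
candidates = {
    "Marital status": (-0.000028, 5203),
    "Education of persons": (-0.000015, 2753),
}

x = 5_000_000  # hypothetical estimate from a marital status by education table

errors = {name: se_number(x, a, b) for name, (a, b) in candidates.items()}
chosen = max(errors, key=errors.get)  # keep the more conservative parameter set
print(chosen, round(errors[chosen]))
```

Taking the largest standard error keeps the resulting confidence interval from being too narrow for the cross-classified estimate.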

Illustration

Suppose there were 2,516,000 unemployed men in the civilian labor force. Use the appropriate parameters from Table B and Formula 1 to get

Number, x 2,516,000
a parameter -0.000018
b parameter 2,957
Standard error 85,600
90% conf. int. 2,375,200 to 2,656,800

The standard error is calculated as

$$s_x = \sqrt{-0.000018 \times 2{,}516{,}000^2 + 2{,}957 \times 2{,}516{,}000} \approx 85{,}600$$

The 90-percent confidence interval is calculated as 2,516,000 ± 1.645 × 85,600.

A conclusion that the average estimate derived from all possible samples lies within a range computed in this way would be correct for roughly 90 percent of all possible samples.
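
The illustration can be reproduced with a short script. This is a minimal sketch, assuming Formula 1 has the form sqrt(a*x^2 + b*x). The function names are illustrative, and rounding to the nearest hundred follows the figures shown in the illustration.

```python
import math

Z_90 = 1.645  # multiplier for a 90-percent confidence interval


def se_number(x, a, b):
    """Approximate standard error of an estimated number (assumed Formula 1)."""
    return math.sqrt(a * x**2 + b * x)


def confidence_interval(estimate, se, z=Z_90):
    """Return (lower, upper) bounds of the interval estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


x = 2_516_000  # unemployed men, from the illustration
se = round(se_number(x, a=-0.000018, b=2957), -2)  # rounded to hundreds
low, high = confidence_interval(x, se)
print(f"SE = {se:,.0f}; 90% CI = {round(low, -2):,.0f} to {round(high, -2):,.0f}")
```

With the Table B unemployment parameters, this yields the standard error of 85,600 and the interval of 2,375,200 to 2,656,800 shown in the illustration.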

Standard errors of estimated percentages. The reliability of an estimated percentage, computed using sample data from both numerator and denominator, depends on both the size of the percentage and its base. Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of the percentages, particularly if the percentages are 50 percent or more. When the numerator and denominator of the percentage are in different categories, use the parameter from one of the parameter tables (Tables B through F) indicated by the numerator.

One can obtain the approximate standard error, $s_{x,p}$, of an estimated percentage by using the formula

Formula 2: $s_{x,p} = \sqrt{\frac{b}{x}\, p\,(100 - p)}$

Here x is the total number of people, families, households, or unrelated individuals in the base of the percentage, p is the percentage (0 <= p <= 100), and b is the parameter in Table B or C associated with the characteristic in the numerator of the percentage.

Illustration - Voting and Registration

Suppose that of 13,453,000 people with an elementary school education, 24.0 percent reported voting. Use the appropriate parameter from Table C and Formula 2 to get

Percentage, p 24.0
Base, x 13,453,000
b parameter 3,274
Standard error 0.7
90% conf. int. 22.9 to 25.1

The standard error is calculated as

$$s_{x,p} = \sqrt{\frac{3{,}274}{13{,}453{,}000} \times 24.0 \times (100 - 24.0)} \approx 0.7$$

The 90-percent confidence interval of the percentage of people with an elementary school education who reported voting is calculated as 24.0 ± 1.645 × 0.7.
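
The percentage illustration can be checked the same way. This sketch assumes Formula 2 has the form sqrt((b / x) * p * (100 - p)). The function name is illustrative. The interval here is computed from the unrounded standard error, which is what appears to give the 22.9 to 25.1 range in the illustration.

```python
import math


def se_percentage(p, base, b):
    """Approximate standard error of a percentage p (0-100), assumed Formula 2."""
    return math.sqrt((b / base) * p * (100.0 - p))


p = 24.0           # percent who reported voting
base = 13_453_000  # people with an elementary school education
b = 3274           # Table C voting parameter

se = se_percentage(p, base, b)
low = p - 1.645 * se
high = p + 1.645 * se
print(f"SE = {se:.1f}; 90% CI = {low:.1f} to {high:.1f}")
```

The a parameter does not enter this calculation; only b, the base, and the percentage itself are needed.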

Standard error of a difference. The standard error of the difference between two sample estimates is approximately equal to

Formula 3: $s_{x-y} = \sqrt{s_x^2 + s_y^2}$

where $s_x$ and $s_y$ are the standard errors of the estimates, x and y. The estimates can be numbers, percentages, ratios, etc. This will represent the actual standard error quite accurately for the difference between estimates of the same characteristic in two different areas, or for the difference between separate and uncorrelated characteristics in the same area. However, if there is a high positive (negative) correlation between the two characteristics, the formula will overestimate (underestimate) the true standard error.

Illustration

Suppose there were 2,516,000 unemployed men 20 years of age or older and 2,333,000 unemployed women 20 years of age or older. Use the appropriate parameters from Table B and Formulas 1 and 3 to get

                  x                        y                        difference
Number            2,516,000                2,333,000                183,000
a parameter       -0.000018                -0.000018                -
b parameter       2,957                    2,957                    -
Standard error    85,600                   82,500                   118,900
90% conf. int.    2,375,200 to 2,656,800   2,197,300 to 2,468,700   -12,600 to 378,600

The standard error of the difference is calculated as

$$s_{x-y} = \sqrt{85{,}600^2 + 82{,}500^2} \approx 118{,}900$$

The 90-percent confidence interval around the difference is calculated as 183,000 ± 1.645 × 118,900. Since this interval includes zero, we cannot conclude with 90 percent confidence that the number of unemployed men is greater than the number of unemployed women.
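
The comparison can be sketched as a simple test of whether the 90-percent interval around the difference contains zero. The code assumes the two estimates are uncorrelated, so that the standard error of the difference is the square root of the sum of the squared standard errors. The names are illustrative.

```python
import math


def se_difference(se_x, se_y):
    """Approximate standard error of x - y for uncorrelated estimates."""
    return math.sqrt(se_x**2 + se_y**2)


men, women = 2_516_000, 2_333_000   # estimates from the illustration
se_men, se_women = 85_600, 82_500   # their standard errors

diff = men - women
se_diff = round(se_difference(se_men, se_women), -2)  # rounded to hundreds
low = diff - 1.645 * se_diff
high = diff + 1.645 * se_diff

# The difference is significant at the 0.10 level only if the interval excludes zero.
significant = not (low <= 0 <= high)
print(f"SE of difference = {se_diff:,.0f}; significant at 0.10: {significant}")
```

Here the lower bound is negative, so the difference is not significant at the 0.10 level, which matches the conclusion above.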

Table B. Parameters for Computation of Standard Errors for Labor Force Characteristics: November 1998

Characteristic a b
Labor Force and Not In Labor Force Data Other than Agricultural Employment and Unemployment
- Total 1 -0.000018 2,985
- - Men 1 -0.000033 2,764
- - Women -0.000030 2,530
- - Both sexes, 16 to 19 years -0.000172 2,545
- White 1 -0.000020 2,985
- - Men -0.000037 2,767
- - Women -0.000034 2,527
- - Both sexes, 16 to 19 years -0.000204 2,550
- Black -0.000125 3,139
- - Men -0.000302 2,931
- - Women -0.000183 2,637
- - Both sexes, 16 to 19 years -0.001295 2,949
- Hispanic origin -0.000206 3,896
Not In Labor Force (use only for Total, Total Men, and White) +0.000006 829
Agricultural Employment
- Total or White +0.000782 3,049
- - Men +0.000858 2,825
- - Women or both sexes, 16 to 19 years -0.000025 2,582
- Black -0.000135 3,155
- Hispanic origin
- - Total or Women +0.011857 2,895
- - Men or both sexes, 16 to 19 years +0.015736 1,703
Unemployment
- Total or White -0.000018 2,957
- Black -0.000212 3,150
- Hispanic origin -0.000102 3,576
1 For Foreign Born parameters, multiply the appropriate parameter by 1.3.

Table C. Parameters for Computation of Standard Errors for Voting and Registration in November 1998: Total or White Persons 1
Characteristic a b
Voting, registration, reasons for not voting or registering (includes breakdowns by: citizenship, household relationship, Family heads by presence of children, marital status, duration of residence, tenure, education level, family income of persons, occupation group) -0.000017 3,274
Characteristics of all persons,
Voting and nonvoting:
- Marital status -0.000028 5,203
- Education of persons -0.000015 2,753
- Education of family head -0.000011 2,065
- Persons by family income -0.000026 4,901
- Duration of residence tenure -0.000028 5,203
Household relationships, voting and nonvoting:
- Head, spouse of head -0.000011 2,065
- Nonrelative or other relative of head -0.000028 5,203
1 For Foreign Born parameters, multiply the appropriate parameter by 1.3.

Table D. Parameters for Computation of Standard Errors for Voting and Registration in November 1998: Black Persons
Characteristic a b
Voting, registration, reasons for not voting or registering (includes breakdowns by: citizenship, household relationship, Family heads by presence of children, marital status, duration of residence, tenure, education level, family income of persons, occupation group) -0.000222 4,799
Characteristics of all persons,
Voting and nonvoting:
- Marital status -0.000346 7,474
- Education of persons -0.000173 3,729
- Education of family head -0.000087 1,868
- Persons by family income -0.000260 5,611
- Duration of residence tenure -0.000346 7,474
Household relationships, voting and nonvoting:
- Head, spouse of head -0.000087 1,868
- Nonrelative or other relative of head -0.000346 7,474

Table E. Parameters for Computation of Standard Errors for Voting and Registration in November 1998: Hispanic Persons
Characteristic a b
Voting, registration, reasons for not voting or registering (includes breakdowns by: citizenship, household relationship, Family heads by presence of children, marital status, duration of residence, tenure, education level, family income of persons, occupation group) -0.000375 8,088
Characteristics of all persons,
Voting and nonvoting:
- Marital status -0.000583 12,596
- Education of persons -0.000291 6,284
- Education of family head -0.000146 3,168
- Persons by family income -0.000438 9,456
- Duration of residence tenure -0.000583 12,596
Household relationships, voting and nonvoting:
- Head, spouse of head -0.000146 3,148
- Nonrelative or other relative of head -0.000583 12,596

Table F. Parameters for Computation of Standard Errors for Voting and Registration in November 1998: Asian or Pacific Islanders
Characteristic a b
Voting, registration, reasons for not voting or registering (includes breakdowns by: citizenship, household relationship, Family heads by presence of children, marital status, duration of residence, tenure, education level, family income of persons, occupation group) -0.000665 5,231
Characteristics of all persons,
Voting and nonvoting:
- Marital status -0.001036 8,147
- Education of persons -0.000517 4,065
- Education of family head -0.000259 2,037
- Persons by family income -0.000778 6,117
- Duration of residence tenure -0.001036 8,147
Household relationships, voting and nonvoting:
- Head, spouse of head -0.000259 2,037
- Nonrelative or other relative of head -0.001036 8,147

Table G. State Voting and Registration Parameters
State a b
Alabama -0.001029 3,307
Alaska -0.001200 491
Arizona -0.001058 3,176
Arkansas -0.001042 1,932
California -0.000181 4,223
Colorado -0.001116 3,045
Connecticut -0.001290 3,274
Delaware -0.001341 720
Dist. of Col. -0.001152 524
Florida -0.000295 3,176
Georgia -0.000874 4,584
Hawaii -0.001331 1,146
Idaho -0.001087 884
Illinois -0.000369 3,274
Indiana -0.001040 4,518
Iowa -0.001096 2,325
Kansas -0.001141 2,128
Kentucky -0.001039 3,012
Louisiana -0.000988 3,110
Maine -0.001272 1,211
Maryland -0.001184 4,518
Massachusetts -0.000568 2,652
Michigan -0.000426 3,045
Minnesota -0.001079 3,634
Mississippi -0.001071 2,095
Missouri -0.001137 4,485
Montana -0.001031 655
Nebraska -0.001156 1,375
Nevada -0.001327 1,441
New Hampshire -0.001441 1,244
New Jersey -0.000439 2,685
New Mexico -0.001092 1,310
New York -0.000207 2,914
North Carolina -0.000577 3,078
North Dakota -0.001129 524
Ohio -0.000397 3,339
Oklahoma -0.000990 2,390
Oregon -0.001197 2,816
Pennsylvania -0.000338 3,143
Rhode Island -0.001292 982
South Carolina -0.001200 3,307
South Dakota -0.001082 557
Tennessee -0.001109 4,387
Texas -0.000295 3,962
Utah -0.001095 1,408
Vermont -0.001333 589
Virginia -0.000988 4,846
Washington -0.001208 4,813
West Virginia -0.000887 1,277
Wisconsin -0.001062 4,027
Wyoming -0.001136 393

Table H. Census Division Voting and Registration Parameters
Division a b
New England -0.000198 2,020
Middle Atlantic -0.000087 2,563
East North Central -0.000097 3,162
West North Central -0.000241 3,242
South Atlantic -0.000096 3,371
East South Central -0.000275 3,312
West South Central -0.000172 3,582
Mountain -0.000213 2,361
Pacific -0.000125 3,865

Table I. Census Region Voting and Registration Parameters
Region a b
Northeast -0.000061 2,423
Midwest -0.000069 3,184
South -0.000050 3,424
West -0.000083 3,492
All Except South -0.000024 3,037


Source: U.S. Census Bureau
Author: Thomas Moore III-Census/DSMD
Contact: (cpshelp@info.census.gov) CPS Help-Census/DSD/CPSB
Last revised: July 28, 1999
URL: http://www.bls.census.gov/cps/vote/1998/ssrcacc.htm