U.S. Department of Labor, Bureau of Labor Statistics
Geographic Profile of Employment and Unemployment, 2003 (Bulletin 2591)
Geographic Profile, 2003

Appendix B: Sampling and Estimation Procedures and Sampling Error Tables




Appendix B Tables: (PDF 21K)

The estimates presented in this bulletin are based on annual averages of monthly data obtained from the Current Population Survey (CPS), a sample survey of the civilian noninstitutional population. The survey is conducted each month by the U.S. Census Bureau for the Bureau of Labor Statistics, and provides comprehensive data on the labor force, employed, and unemployed, including such characteristics as age, sex, race, Hispanic or Latino ethnicity, occupation, and industry. The survey also provides data on the characteristics of those not in the labor force.

Each month, trained interviewers collect information from a scientifically selected sample of about 60,000 eligible households, designed to represent the civilian noninstitutional population. This sample includes approximately 10,000 eligible households that were added to the regular CPS sample to meet the requirements of the State Children’s Health Insurance Program (SCHIP) legislation. The SCHIP legislation requires that the Census Bureau improve State estimates of the number of children who live in low-income families and lack health insurance. These estimates are obtained from the Annual Demographic Supplement to the CPS. In September 2000, the Census Bureau began expanding the monthly CPS sample in 31 States and the District of Columbia based on the reliability of the March 2000 estimate of low-income children without health insurance. Selected respondents in the 60,000 eligible households are interviewed to obtain information about the employment status of each household member 16 years of age and over. The information collected pertains to a "reference week," usually the calendar week (Sunday to Saturday) that includes the 12th of the month; actual interviewing occurs during the following week, known as the "survey week."

Sampling procedures

The 2003 sample encompasses 754 sample areas, with coverage in every State and the District of Columbia. It is based to a large extent on information about the distribution of the population as reported in the 1990 decennial census. (A redesigned 1990 census-based sample was phased in during the April 1994 through July 1995 period.) These areas were selected by dividing the entire area of the United States into 2,007 primary sampling units (PSUs). With some minor exceptions, a PSU consists of a county or number of contiguous counties. Most metropolitan areas constitute separate PSUs.

To improve the efficiency of the sample, the 2,007 PSUs are grouped into strata within each State. Those PSUs that are in a stratum by themselves are called "self-representing" and are generally the most populous in each State. Other strata are formed by combining PSUs that are similar in such characteristics as population growth, proportion of blacks and Hispanics, and occupation/industry and age/sex distributions. PSUs selected from these strata are "non-self-representing," because each one chosen represents the entire stratum. One PSU is selected from each stratum, with the probability of selection proportionate to the relative population size of the PSU.
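The selection of one PSU per stratum with probability proportionate to population size can be illustrated with a minimal sketch. The stratum, PSU names, and populations below are hypothetical, and the sketch uses Python's general-purpose weighted random draw rather than the Census Bureau's actual selection procedure.

```python
import random


def select_psu(stratum, rng):
    """Pick one PSU from a stratum with probability proportional to its population."""
    names = [name for name, _ in stratum]
    populations = [population for _, population in stratum]
    return rng.choices(names, weights=populations, k=1)[0]


# Hypothetical non-self-representing stratum: (PSU name, population).
stratum = [("PSU A", 120_000), ("PSU B", 60_000), ("PSU C", 20_000)]

rng = random.Random(2003)  # fixed seed so the run is repeatable
draws = [select_psu(stratum, rng) for _ in range(10_000)]
share_a = draws.count("PSU A") / len(draws)
print(f"PSU A selected in {share_a:.1%} of repeated draws (expected about 60%)")
```

In the survey itself only one draw is made per stratum; the repeated draws here serve only to show that the selection frequencies track the population shares.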

In States with an SCHIP sample, the self-representing PSUs are the same for both the regular CPS and SCHIP. In most States, the same non-self-representing sample PSUs are in the sample for both the regular CPS and SCHIP; however, to improve the reliability of the SCHIP estimates in Maine, Maryland, and Nevada, the SCHIP non-self-representing PSUs are selected independently of the regular CPS sample PSUs, with replacement. The methodology for stratification of PSUs for SCHIP in these States is similar to the other stratifications, except that the stratification variable used is the number of people under age 18 with household income below 200 percent of the poverty level.

Within each of the selected PSUs, the number of households to be enumerated each month is determined in two steps. First, a sample of the unit’s census enumeration districts (EDs) is selected using the population size probability selection procedure. EDs are administrative units and contain, on average, about 300 households. Second, clusters of approximately four addresses (contiguous wherever possible) are selected to be enumerated within each designated ED.

Part of the sample is changed, or rotated, each month. A given rotation group is in the sample for 4 consecutive months, leaves the sample during the following 8 months, and then returns for another 4 consecutive months. A primary reason for rotating the sample is to minimize the lack of cooperation that may result from interviewing a constant panel indefinitely. The rotation plan provides for three-fourths of the sample to be identical from one month to the next and one-half to be identical with that from the same month a year earlier.
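The overlap fractions quoted above follow directly from the 4-8-4 rotation pattern. The sketch below checks them under the simplifying assumption that one equal-sized rotation group enters the sample each month; the month numbers are arbitrary labels.

```python
# Offsets (in months since a rotation group entered) at which it is interviewed:
# 4 months in, 8 months out, 4 months in.
IN_SAMPLE_OFFSETS = (0, 1, 2, 3, 12, 13, 14, 15)


def groups_in_sample(month):
    """Entry months of the eight rotation groups interviewed in `month`."""
    return {month - offset for offset in IN_SAMPLE_OFFSETS}


def share_in_common(month_a, month_b):
    """Fraction of month_a's rotation groups that are also interviewed in month_b."""
    a = groups_in_sample(month_a)
    b = groups_in_sample(month_b)
    return len(a & b) / len(a)


print(share_in_common(100, 101))  # 0.75: six of eight groups repeat next month
print(share_in_common(100, 112))  # 0.5: four of eight groups repeat a year later
```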

Estimating methods

Under the estimating methods used in the CPS, all of the results for a given month become available simultaneously and are based on returns from the entire sample of respondents. The estimation procedure involves weighting the data from each sample person by the inverse of the probability of the person being in the sample. This gives a rough measure of the number of actual persons that each sample person represents. Through a series of estimation steps (outlined below), the selection probabilities are adjusted for noninterviews and survey undercoverage; data from previous months are incorporated into the estimates through the composite estimation procedure.
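As a rough illustration of the starting point for this weighting, the sketch below assigns each sample person a base weight equal to the inverse of the selection probability and sums the weights to estimate a population total. The records and probabilities are hypothetical, and the later adjustment steps are omitted.

```python
def base_weight(selection_probability):
    """Number of persons a sample person roughly represents."""
    return 1.0 / selection_probability


def weighted_total(sample, characteristic):
    """Sum the base weights of sample persons who have the characteristic."""
    return sum(
        base_weight(person["probability"])
        for person in sample
        if person[characteristic]
    )


# Hypothetical sample persons from two areas with different sampling rates.
sample = [
    {"probability": 1 / 2000, "unemployed": True},
    {"probability": 1 / 2000, "unemployed": False},
    {"probability": 1 / 500, "unemployed": True},
]

estimate = weighted_total(sample, "unemployed")
print(f"Estimated unemployed persons: {estimate:,.0f}")
```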

1. Noninterview adjustment. The weights for all interviewed households are adjusted to the extent needed to account for occupied sample households for which no information was obtained because of absence, impassable roads, refusals, or unavailability of the respondents for other reasons. This noninterview adjustment is made separately for clusters of similar sample areas that are usually, but not necessarily, contained within a single State. Similarity of sample areas is based on metropolitan area status and size. Within each cluster, there is a further breakdown by residence. The proportion of sample households not interviewed averages about 7 to 8 percent, depending upon a number of factors, including weather and vacations.

2. Ratio estimates. The distribution of the population selected for the sample may differ somewhat, by chance, from that of the population as a whole in such characteristics as age, race, sex, and State of residence. Because these characteristics are closely correlated with labor force participation and other principal measurements made from the sample, the survey estimates can be substantially improved when weighted appropriately by the known distribution of these population characteristics. This is accomplished through four stages of adjustment, as follows:

    a. First-stage ratio adjustment. The purpose of the first-stage ratio adjustment is to reduce the contribution to the variance of the sample state-level estimates arising from the sampling of PSUs, that is, the variance that would still be associated with the state-level estimates even if the survey included all households in every sample PSU. This is called between-PSU variance. For some States, the between-PSU variance makes up a relatively large proportion of the total variance, while the relative contribution of the between-PSU variance at the national level is generally quite small. There are several factors to be considered in determining what information to use in applying the first-stage adjustment. The information must be available for each PSU, correlated with as many of the statistics of importance from the CPS as possible, and reasonably stable over time so that the gain from the ratio adjustment procedure does not deteriorate. The basic labor force categories (unemployed, nonagricultural employed, etc.) could be used; however, this information probably would fail the stability criterion. The distribution of the population by race (black alone/non-black alone) by age groups 0-15 and 16+ satisfies all three criteria.

    The use of black alone/non-black alone categories compensates for the fact that the racial composition of a non-self-representing (NSR) sample PSU could differ substantially from the racial composition of the stratum it is representing. This adjustment is not necessary for self-representing (SR) PSUs since they represent only themselves. Adjustment factors are computed for the two race categories for each State containing NSR PSUs. The black alone and non-black alone cells are collapsed within a State when a cell meets one of four sample criteria. As a result of these criteria, the first-stage ratio adjustment actually is used (i.e., does not collapse to 1.0) in less than half of the States.

    b. National coverage adjustment. A national coverage adjustment was added to the CPS weighting process for 2003. The purpose of the national coverage adjustment is to correct for interactions between race and ethnicity that are not addressed in the second-stage weighting (see d. below). Research has shown that the under-coverage of certain race-ethnicity combinations (e.g., non-black Hispanic) cannot be corrected with second-stage adjustment alone. The national coverage adjustment also helps to speed the convergence of the second-stage adjustment, resulting in fewer iterations to reach the final national controls. The national coverage adjustment factors are based on independently derived estimates of the population. Person records are grouped into four pairs based on month-in-sample (MIS). MIS 1 and 5, 2 and 6, 3 and 7, and 4 and 8 form the four pairs. Each MIS pair is then adjusted to age/sex/race/ethnicity population controls--using between two and twenty-eight age cells depending on which of the six major coverage groups (black alone non-Hispanic, white alone non-Hispanic, white alone Hispanic, non-white alone Hispanic, Asian alone non-Hispanic, and residual race non-Hispanic) is being adjusted, by sex.

    c. State coverage adjustment. In addition to a national coverage adjustment, a State coverage adjustment also was added to the CPS weighting process for 2003. The purpose of the State coverage adjustment is to adjust for State differences in sex/age/race coverage. Research has shown that estimates of characteristics of certain race groups (e.g., blacks) can differ greatly from the controls if a State coverage adjustment is not used. However, unlike the national coverage adjustment, the State coverage adjustment slows the convergence of the second-stage ratio adjustment process. The State coverage adjustment is based on independently derived estimates of the population. Except for the District of Columbia, person records for non-black alone are grouped into four pairs based on month-in-sample (MIS)--with the same MIS pairings (1/5, 2/6, 3/7, and 4/8) used as in the national coverage adjustment. Person records for black alone for all States and non-black alone for the District of Columbia are formed at the state level with all months in sample combined. For the black alone component of the adjustment, States are adjusted using a varying number of age/sex/race cells based on the expected number of sample records in each age/sex cell. For example, for non-black alone, all States except the District of Columbia are adjusted for three age groupings (0-15, 16-44, and 45+), by sex. Each cell is adjusted to independent age/sex/race population controls in each State.

    d. Second-stage ratio adjustment. The second-stage ratio adjustment is performed to decrease the variance of the vast majority of the CPS sample estimates. Because the labor force status of individuals in the general population is correlated with their specific geographic and demographic identification (e.g., teenagers and unemployment, or rural married women and labor force participation), the variance of the labor force estimates can be reduced by controlling the CPS sample estimates to independent estimates of selected geographic and demographic population categories. The procedure also is believed to reduce the bias due to coverage errors. Within each month-in-sample pair, the procedure adjusts the sample weights so that the sample-based estimates of the population for a number of geographic and demographic subgroups match independent population controls for each of these categories. These independent population controls are updated each month. Three sets of controls are used: (1) the civilian noninstitutional population for the 50 States and the District of Columbia by sex and age (0-15, 16-44, and 45+); (2) the national civilian noninstitutional population for 36 Hispanic and 36 non-Hispanic age-sex categories; and (3) the total national civilian noninstitutional population for 56 white, 36 black, and 26 residual race age-sex categories.

    The adjustment is done separately for each month-in-sample pair (1/5, 2/6, 3/7, and 4/8). Because adjusting the weights to match one set of controls can cause differences in other controls, an iterative process is used to simultaneously control all variables. Successive iterations begin with the weights as adjusted by all previous iterations. A total of ten iterations are performed, which result in (virtual) consistency between the sample estimates and the population controls.

    The independent population controls used for the CPS are produced by the Census Bureau’s Population Division. The CPS population controls are based on a demographic framework of population accounting. Under this framework, time series of population estimates and projections are anchored by the latest decennial census enumerations, with populations for dates since the latest decennial census derived by the estimation, or projection, of population change. In the simplest terms, estimates of population change are derived by adjusting the resident population as enumerated in the latest decennial census for births, deaths, and net migration, using information from a variety of data sources. Estimates of the resident population are adjusted to represent the civilian noninstitutional population 16 years of age and over (the eligible CPS population) by subtracting estimates of the number of residents under 16 years of age, the number of residents in the Armed Forces, and the number of residents that are institutionalized.

3. Composite estimation procedure. The last step in the preparation of most CPS estimates makes use of a composite estimation procedure. The composite estimate consists of a weighted average of two factors: (1) the second-stage ratio estimate based on the entire sample from the current month and (2) the composite estimate for the previous month, plus an estimate of the month-to-month change based on the six rotation groups common to both months. In addition, a bias adjustment term is added to the weighted average to account for relative bias associated with month-in-sample estimates. The compositing procedure results in a reduction in sampling error beyond that which is achieved after the two stages of ratio adjustment.

Effective with the release of January 1998 data, BLS implemented a new composite estimation method for the CPS. The new technique provides increased operational simplicity for micro-data users and allows optimization of compositing coefficients for different labor force categories. Under the new procedure, weights are derived for each record which, when aggregated, produce estimates consistent with those produced by the composite estimator.

Under the previous procedure, composite estimation was performed at the macro level. The composite estimator for each tabulated cell was a function of the aggregated weights for sample persons contributing to that cell in current and prior months. The different months of data were combined together using compositing coefficients. Thus, micro-data users needed several months of data to compute composite estimates. To ensure consistency, the same coefficients had to be used for all estimates. The values of the coefficients selected were much closer to optimal for unemployment than for employment or labor force values.

The new composite weighting method involves two steps: (1) the computation of composite estimates for the main labor force categories, classified by important demographic characteristics, and (2) the adjustment of the micro-data weights, through a series of ratio adjustments, to agree with these composite estimates, thus incorporating the effect of composite estimation into the micro-data weights. Under this procedure, the sum of the composite weights of all sample persons in a particular labor force category equals the composite estimate of the level for that category. Thus, to produce a composite estimate for a particular month, a data user needs simply to access the micro-data file for that (single) month and compute a weighted sum. The new composite weighting approach also improves the accuracy of labor force estimates by using different compositing coefficients for different labor force categories. The weighting adjustment method assures additivity while allowing variation in compositing coefficients.
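The iterative second-stage ratio adjustment described under step 2d, in which weights are repeatedly scaled to one set of controls after another, can be sketched as follows. This is a generic illustration of iterative ratio adjustment (often called raking), not the Census Bureau's production code: it uses two hypothetical control dimensions instead of the three sets of controls listed above, and the records and control totals are invented.

```python
def rake(records, controls, iterations=10):
    """Iteratively scale weights so weighted totals match each set of controls.

    records  -- list of dicts holding a starting "weight" and category labels
    controls -- {dimension: {category: control total}}
    """
    weights = [record["weight"] for record in records]
    for _ in range(iterations):
        for dimension, targets in controls.items():
            # Current weighted total in each category of this dimension.
            totals = {}
            for record, weight in zip(records, weights):
                category = record[dimension]
                totals[category] = totals.get(category, 0.0) + weight
            # Scale every weight by control / current total for its category.
            weights = [
                weight * targets[record[dimension]] / totals[record[dimension]]
                for record, weight in zip(records, weights)
            ]
    return weights


def margin(records, weights, dimension):
    """Weighted totals by category, used to check the result."""
    totals = {}
    for record, weight in zip(records, weights):
        totals[record[dimension]] = totals.get(record[dimension], 0.0) + weight
    return totals


# Hypothetical records and control totals (not CPS values).
records = [
    {"sex": "M", "age": "16-44", "weight": 120.0},
    {"sex": "M", "age": "45+", "weight": 80.0},
    {"sex": "F", "age": "16-44", "weight": 90.0},
    {"sex": "F", "age": "45+", "weight": 110.0},
]
controls = {
    "sex": {"M": 220.0, "F": 200.0},
    "age": {"16-44": 230.0, "45+": 190.0},
}

raked = rake(records, controls)
print(margin(records, raked, "sex"))
print(margin(records, raked, "age"))
```

Each pass starts from the weights left by the previous pass, so matching the age controls slightly disturbs the sex totals and vice versa; the disturbance shrinks with every iteration until both sets of totals agree with their controls to within rounding.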

Reliability of the estimates

The estimates in this bulletin are based upon a sample of the population rather than a complete count. Therefore, they may differ from the figures that would have been obtained if it had been possible to take a complete census using the same questionnaire and procedures as are used in the CPS. There are two types of errors in an estimate based on a sample survey--sampling and nonsampling. The sampling error tables provided later in this appendix indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration, but do not measure any systematic biases in the data.

Sampling variability. The standard error is primarily a measure of sampling variability, that is, the variation that occurs by chance because a sample rather than the entire population is surveyed. The sample estimate and its standard error enable one to construct confidence intervals, that is, ranges that would include the average result of all possible samples with a known probability. For example, if all possible samples were selected, each of these samples were surveyed under essentially the same conditions using the same sample design, and an estimate and its estimated standard error were calculated from each sample, then the following would occur:

  1. Approximately 68 percent of the intervals from 1 standard error below the estimate to 1 standard error above the estimate would include the average result of all possible samples.
  2. Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.
  3. Approximately 95 percent of the intervals from 2 standard errors below the estimate to 2 standard errors above the estimate would include the average result of all possible samples.
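The "all possible samples" interpretation can be approximated by simulation. The sketch below assumes, for illustration only, that sample estimates are normally distributed around the average of all possible samples and that the standard error is known exactly; the values 1,000 and 50 are arbitrary.

```python
import random


def coverage(multiplier, true_value=1000.0, standard_error=50.0,
             samples=20_000, seed=1):
    """Share of simulated intervals that contain the average of all samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # One simulated sample estimate.
        estimate = rng.gauss(true_value, standard_error)
        lower = estimate - multiplier * standard_error
        upper = estimate + multiplier * standard_error
        if lower <= true_value <= upper:
            hits += 1
    return hits / samples


for multiplier in (1.0, 1.645, 2.0):
    print(f"{multiplier} standard errors: {coverage(multiplier):.1%} of intervals")
```

The printed shares should land close to 68, 90, and 95 percent, differing slightly from run to run with the seed.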

The error of a sample estimate varies inversely with the size of the sample and directly with the size of the estimate. Hence, an estimate for a subgroup constituting a small proportion of a population will tend to have a larger error relative to its size than will an estimate for a larger subgroup.

Reliability standards

The CPS sample design takes into consideration both national and State reliability. For the State data, a minimum reliability standard is set: An expected maximum coefficient of variation (CV) on the level of total unemployment of 8 percent annually. This is calculated based on a 6-percent unemployment rate. Because each State's design must meet the reliability standard, the CPS sampling rate differs by State. (The sampling rate is the proportion of all households that are selected for the sample.) Generally, the smaller the State population, the higher the sampling rate. The average State sampling rates range roughly from 1 in every 200 households to 1 in every 2,500 households in each stratum within the State.

Publication standards for State and area CPS data

To achieve comparability of the data for regions, divisions, States, metropolitan areas, and cities for publication purposes, a unique requirement for minimum labor force, employment, or unemployment was developed for each area. This requirement is based on the known differences in sampling rates among these areas. Before estimates are published for a specific category (such as Hispanic unemployment in a particular State), a predetermined "critical cell" must meet a 50-percent CV requirement. As a result of this requirement, minimum bases for publication have been developed for each area. Table B-1 lists the minimum necessary base for publication of data in each of the regions, divisions, States, the District of Columbia, metropolitan areas, and cities appearing in this bulletin.

Estimates are not shown when they do not meet the minimum base for the State or area listed in table B-1. In tables showing the labor force status of the population, that is, the employed and unemployed, publishability is determined by whether the labor force level exceeds the minimum base for unemployment in table B-1. If the labor force level is less than the unemployment minimum base, all data elements--labor force, employment, unemployment, and unemployment rate--are suppressed. In all other tables, the determining factor is whether the size of the base of the distribution exceeds the minimum base for employment or unemployment separately, depending on whether the table presents a distribution of employment or unemployment for the area or population subgroup. For example, in the percent distribution of unemployed persons by reason table, the entire line of data will be suppressed if the total unemployment is less than the unemployment minimum base. If a subgroup appears in the table (such as by sex or race), the subgroup also will be suppressed if the total for that reason does not meet the minimum base. Data are not published for any cell with a level of fewer than 500 persons or less than 0.05 percent of the total for a given characteristic.
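The suppression rules for labor force status tables and for individual cells can be expressed as two small checks. This is a sketch of the rules as worded above; the function names are hypothetical, the minimum base of 25,000 is an invented value rather than one taken from table B-1, and a level exactly equal to the minimum base is assumed to be publishable.

```python
def labor_force_row_publishable(labor_force_level, unemployment_minimum_base):
    """Labor force, employment, unemployment, and rate are shown together or not at all."""
    return labor_force_level >= unemployment_minimum_base


def cell_publishable(cell_level, characteristic_total):
    """Apply the 500-person floor and the 0.05-percent-of-total floor to one cell."""
    if cell_level < 500:
        return False
    return cell_level >= 0.0005 * characteristic_total


# Hypothetical minimum base for an area (not a table B-1 value).
minimum_base = 25_000

print(labor_force_row_publishable(30_000, minimum_base))  # True
print(labor_force_row_publishable(20_000, minimum_base))  # False
print(cell_publishable(400, 100_000))                     # False: under 500 persons
print(cell_publishable(600, 2_000_000))                   # False: under 0.05 percent
```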

Using the sampling error tables

Tables B-2 through B-5 provide sampling errors for use in constructing 90-percent confidence intervals (approximately 1.645 standard errors) for major labor force characteristics. They are approximations and thus indicate the order of magnitude of the sampling error rather than the precise amount of the possible error in an estimate. Illustrations of the use of these tables are provided below. In all cases, the computations present the estimated levels in thousands of persons.

Sampling error of an estimated number. Table B-5 shows that an estimate of 50,000 unemployed persons in North Carolina will have an absolute sampling error of 10,000, or a relative sampling error of 20 percent (10,000/50,000). In comparison, an estimate of 100,000 unemployed persons in North Carolina has an absolute sampling error of 14,000, yielding a relative sampling error of 14 percent (14,000/100,000). A statement that unemployment for some group in North Carolina is between 40,000 and 60,000 in the first instance, and between 86,000 and 114,000 in the second, can be made with approximately 90-percent confidence.

This can be interpreted as follows: if one were to draw all possible samples and make an estimate from each sample (using the same methods and techniques) and construct an interval around each estimate (using the sampling errors shown in the tables), then 90 percent of these intervals would contain the average value of all possible samples.

To convert a sampling error from 90-percent confidence, as displayed in the tables, to 68-percent confidence (1 standard error), multiply the sampling error shown in the tables by 0.63. To convert the sampling error from 90- to 95-percent confidence (approximately 2 standard errors) multiply the sampling error by 1.23. For the example given above, the sampling error at 90-percent confidence was 10,000. At 68-percent confidence, the error would be about 6,300 (10,000 x 0.63). At 95-percent confidence, the error would be about 12,300 (10,000 x 1.23).
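The interval and conversion arithmetic for an estimated number can be reproduced with a short sketch. The factors 0.63 and 1.23 are the ones stated in the text; the function names are illustrative.

```python
# Conversion factors stated in the text for errors tabulated at 90-percent confidence.
FACTORS = {68: 0.63, 90: 1.0, 95: 1.23}


def convert_error(error_90, confidence):
    """Convert a tabulated 90-percent sampling error to another confidence level."""
    return error_90 * FACTORS[confidence]


def interval(estimate, error):
    """Confidence interval as (lower bound, upper bound)."""
    return (estimate - error, estimate + error)


# North Carolina examples from the text: estimate and tabulated 90-percent error.
print(interval(50_000, 10_000))    # (40000, 60000)
print(interval(100_000, 14_000))   # (86000, 114000)

for confidence in (68, 90, 95):
    error = convert_error(10_000, confidence)
    print(f"{confidence}-percent confidence: error of about {error:,.0f}")
```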

Sampling error of a difference. To compute the error of a difference from the tables, an additional step is required. If, for instance, one wishes to know whether a change in the unemployment rate from one year to the next in a particular area for a particular population group is statistically significant, or whether the difference in the unemployment rate between two areas or population groups is statistically meaningful, the significance of the difference needs to be computed. (Differences between estimates for 2 consecutive years may be influenced, to some extent, by the redesign of the CPS concepts, questionnaire, and collection procedures, such as that which occurred in 1994.)

As noted above, differences can take two general forms: (1) Differences between population groups and/or geographic areas; or (2) differences for the same population group and geographic area over time. Either type of difference can be calculated using the following formula, and noting the limiting covariance assumption discussed below.

    SE_d = \sqrt{SE_1^2 + SE_2^2 - 2C \, (SE_1 \times SE_2)}
    where:

      SE_d = the sampling error of the difference.
      SE_1 = the sampling error of one group or year.
      SE_2 = the sampling error of another group or year.
      C = the covariance (or relationship) term.

If the comparison is between different years, SE_1 and SE_2 should be taken from the appropriate table of the Geographic Profile for each year, because the size of the samples and, consequently, the sampling errors may differ from year to year. Values for the covariance, or "C" term, for employment and unemployment for differences between consecutive years are as follows: For labor force or employment levels, C = 0.58; for unemployment levels or rates, C = 0.37. It is important to note that these "C" terms are usable only for calculating the sampling error of a difference for over-the-year change for the same geographic area and population group.

Covariance terms for the relationship between different population groups or geographic areas in this bulletin are not available. When calculating sampling errors for differences between two different population groups or geographic areas, a "C" term of zero must be assumed. The effect of this assumption is: (1) If the relationship between two groups, areas, or years (differences for nonconsecutive years) is small, the "C" term can legitimately be ignored and the sampling errors will not be adversely affected, or (2) if there is a strong positive relationship between the two groups, areas, or years (differences for consecutive years), then the error computed without a "C" term will be overstated. This could lead one to erroneously state that a difference or change was not statistically significant when, in fact, it was. When there is a strong relationship over time for a labor force characteristic such as employment (that is, people tend to remain employed from one year to the next), the importance of using a "C" term when calculating the sampling error of a difference over time increases greatly.

The following example illustrates how to calculate a sampling error for a difference. Suppose one wished to know whether a hypothetical difference between the unemployment level of 250,000 for a particular population group in California and an unemployment level of 200,000 for the same group in New York was statistically significant at 90-percent confidence. Table B-5 gives the error for an unemployment level of 250,000 in California as approximately 24,000 and the error for an unemployment level of 200,000 in New York as 17,000. Using the formula described above without the "C" term produces the following results:

    SE_1 = 24; SE_2 = 17
    SE_1^2 + SE_2^2 = 24^2 + 17^2 = 576 + 289 = 865
    SE_d = \sqrt{SE_1^2 + SE_2^2} = \sqrt{865} \approx 29

Because each State's sample is independent, there is no measurable correlation between the two estimates and a "C" term of zero can be assumed. Thus, the error of the difference is approximately 29,000. Because the actual difference (50,000) is greater than the error of the difference, it can be stated, with 90-percent confidence, that the difference in the unemployment level is attributable to factors other than sampling variability alone.
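The California and New York comparison can be checked with a short sketch of the difference formula. The function names are illustrative. The final lines reuse the same two errors with C = 0.37 purely to show the effect of the covariance term; that is a hypothetical over-the-year comparison, not a figure from the tables.

```python
import math


def se_difference(se1, se2, c=0.0):
    """Sampling error of a difference; c is the covariance ("C") term."""
    return math.sqrt(se1**2 + se2**2 - 2.0 * c * se1 * se2)


def is_significant(estimate1, estimate2, se1, se2, c=0.0):
    """True when the difference exceeds its sampling error."""
    return abs(estimate1 - estimate2) > se_difference(se1, se2, c)


# Levels and 90-percent errors in thousands, as in the example above.
error = se_difference(24, 17)
print(f"Error of the difference: about {error:.0f} thousand")
print("Significant at 90-percent confidence:", is_significant(250, 200, 24, 17))

# Hypothetical over-the-year change in unemployment for one area (C = 0.37).
print(f"With C = 0.37: about {se_difference(24, 17, c=0.37):.1f} thousand")
```

A positive "C" term shrinks the error of the difference, which is why omitting it for strongly related estimates overstates the error.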

Sampling errors for unemployment rates. Unemployment rates and error ranges for these rates are provided in tables 1, 14, and 27. This information can be used to derive a sampling error for an unemployment rate if one is needed. The error range is a 90-percent confidence interval around the unemployment rate. By subtracting the estimated unemployment rate from the upper bound of the range (or the lower bound of the range from the estimated unemployment rate), the sampling error for that rate can be obtained. This sampling error can then be used in the above formula for computing the sampling error of a difference, or for whatever purpose the user chooses.
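As a small illustration, the sampling error of a rate is the distance from the rate to the upper bound of its published error range. The rate and upper bound below are hypothetical, not values from tables 1, 14, or 27.

```python
def rate_sampling_error(rate, upper_bound):
    """90-percent sampling error implied by a published error range."""
    return upper_bound - rate


# Hypothetical unemployment rate of 5.5 percent with an error range of 5.1 to 5.9.
error = rate_sampling_error(rate=5.5, upper_bound=5.9)
print(f"Sampling error: about {error:.1f} percentage point")
```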

Interpolation and extrapolation. Although sampling errors are listed for selected levels of employment and unemployment in tables B-2 through B-5, users may wish to know the sampling error for an estimate whose value is not listed. To derive such a sampling error, it is necessary to use interpolation or extrapolation.

For example, in order to derive the sampling error for the 2003 total unemployment level for men in Washington, it is necessary to use interpolation because table B-5 contains no sampling error for an unemployment level estimate of 137,000. The following formula and accompanying example show how to interpolate for this estimate:

    SE = {[(A-G) / (F-G)] x (X-Y)} + Y
    where:
      SE = the sampling error for the estimated value.
      A = the estimated value (137,000).
      F = the table value (200,000) immediately above the estimated value.
      G = the table value (100,000) immediately below the estimated value.
      X = the sampling error of F (20,000).
      Y = the sampling error of G (14,000).

    SE = {[(137 - 100) / (200 - 100)] x (20 - 14)} + 14
    SE = ( 0.37 x 6 ) + 14
    SE = 2.22 + 14
    SE = 16

If the sample-based estimate lies outside the boundaries of the error tables, extrapolation can be used to approximate the sampling error. The formula for extrapolation is the same as that for interpolation; however, the "F" term becomes the highest value in the table and the "G" term becomes the next highest value.
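Both interpolation and extrapolation use the same arithmetic, which the following sketch implements with the symbols from the formula above. The Washington figures come from the example in the text; the extrapolation inputs are hypothetical table rows chosen only to show the calculation.

```python
def table_error(a, f, g, x, y):
    """SE = ((A - G) / (F - G)) * (X - Y) + Y, using the symbols from the text.

    a -- estimated value          f, g -- table values bracketing (or nearest) a
    x -- sampling error of f      y    -- sampling error of g
    """
    return (a - g) / (f - g) * (x - y) + y


# Interpolation: Washington example, all values in thousands.
washington = table_error(a=137, f=200, g=100, x=20, y=14)
print(round(washington, 2))  # 16.22, or about 16 thousand

# Extrapolation with hypothetical rows: F is the highest table value and
# G the next highest, and the estimate lies above both.
beyond_table = table_error(a=600, f=500, g=400, x=30, y=27)
print(round(beyond_table, 2))  # 33.0
```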

Derivation of sampling errors

The State and area sampling errors are developed using a generalized regression procedure and are not based on sample data for each individual area, population group, or labor force characteristic. As with all sampling error tables produced for CPS State and area data, a number of approximations are required in order to derive sampling errors that would apply to a wide variety of items. As a result, these sampling errors indicate the order of magnitude of a sampling error rather than a precise sampling error for any specific item. The sampling error tables are derived from standard error equations and special parameters developed by the Bureau of Labor Statistics. These parameters are available upon request from the Division of Local Area Unemployment Statistics, Room 4675, 2 Massachusetts Avenue NE, Washington, DC 20212-0001. Telephone: (202) 691-6406.

Tables B-2 through B-5 can be used for estimates pertaining to any race/ethnic group. As noted, the sampling errors are based on a generalized regression procedure and are approximate. Generally, the degree of precision in these tables is slightly greater for whites (and the total of all race/ethnic groups) than it is for blacks or Hispanics.


E-Mail: gpinfo@bls.gov
Last Updated: November 6, 2007