Technical Notes to Establishment Survey Data Published in Employment and Earnings

THE SAMPLE

Design

The CES sample is a stratified, simple random sample of worksites, clustered by UI account number. The UI account number is a major identifier on the BLS longitudinal database of employer records, which serves as both the sampling frame and the benchmark source for the CES employment estimates. The sample strata, or subpopulations, are defined by State, industry, and employment size, yielding a State-based design. The sampling rates for each stratum are determined through a method known as optimum allocation, which distributes a fixed number of sample units across a set of strata to minimize the overall variance, or sampling error, on the primary estimate of interest. The total nonfarm employment level is the primary estimate of interest, and the CES sample design gives top priority to measuring it as precisely as possible, or, in other words, minimizing the statistical error around the statewide total nonfarm employment estimates.

Frame and sample selection. The Longitudinal Data Base (LDB) is the universe from which BLS draws the CES sample. The LDB contains data on the roughly 8.9 million U.S. business establishments covered by UI, representing nearly all elements of the U.S. economy. The Quarterly Census of Employment and Wages (QCEW) program collects these data from employers, on a quarterly basis, in cooperation with State Workforce Agencies (SWAs). The LDB contains employment and wage information from employers, as well as name, address, and location information. It also contains identification information such as Unemployment Insurance (UI) account number and reporting unit or worksite number.

The LDB contains records of all employers covered under the Unemployment Insurance tax system. That system covers 98 percent of all employers in the 50 States, the District of Columbia, Puerto Rico, and the Virgin Islands. There are a few sections of the economy that are not covered, including the self-employed, unpaid family workers, railroads, religious organizations, small agricultural employers, and elected officials. Data for employers generally are reported at the worksite level. Employers who have multiple establishments within a State usually report data for each individual establishment. The LDB tracks establishments over time and links them from quarter to quarter.

Permanent Random Numbers (PRNs) have been assigned to all UI accounts on the sampling frame. As new units appear on the frame, random numbers are assigned to those units as well. As records are linked across time, the PRN is carried forward in the linkage.

The CES sample is stratified by State, industry, and size. Stratification groups population members together for the purpose of sample allocation and selection. The strata, or groups, are composed of homogeneous units. With 13 industries and 8 size classes, there are 104 total allocation cells per State. The sampling rate for each stratum is determined through a method known as optimum allocation. Optimum allocation minimizes variance at a fixed cost or minimizes cost for a fixed variance. Under the CES probability design, a fixed number of sample units for each State is distributed across the allocation strata in such a way as to minimize the overall variance, or sampling error, of the total State employment level. The number of sample units in the CES probability sample was fixed according to available program resources. The optimum allocation formula places more sample in cells for which data cost less to collect, cells that have more units, and cells that have a larger variance.
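The allocation rule described above can be sketched in a few lines. This is a minimal illustration that assumes the textbook form of optimum allocation, in which each cell's share of the fixed sample is proportional to N_h * S_h / sqrt(c_h); the function name, the cell labels, and all input values are hypothetical and are not taken from the CES design. The sketch does not cap a cell's allocation at its population size or treat certainty units separately.

```python
from math import sqrt


def optimum_allocation(cells, total_sample):
    """Split a fixed sample across cells in proportion to N_h * S_h / sqrt(c_h).

    cells maps a cell name to (number of units N_h, standard deviation S_h,
    per-unit collection cost c_h). All inputs here are hypothetical.
    """
    scores = {
        name: n_units * std_dev / sqrt(unit_cost)
        for name, (n_units, std_dev, unit_cost) in cells.items()
    }
    total_score = sum(scores.values())
    return {name: total_sample * score / total_score for name, score in scores.items()}


# Hypothetical size classes within one State and industry (not CES figures).
cells = {
    "small": (9000, 2.0, 1.0),
    "medium": (900, 20.0, 1.0),
    "large": (100, 90.0, 1.0),
}
allocation = optimum_allocation(cells, total_sample=450)
sampling_rates = {name: allocation[name] / cells[name][0] for name in cells}
```

With these made-up inputs, the small, medium, and large cells receive 180, 180, and 90 units, which corresponds to sampling rates of 2, 20, and 90 percent: cells with more units or larger variance draw more sample, and lowering a cell's cost would raise its share further.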

During the first quarter of each year, a new sample is drawn from the LDB. Annual sample selection helps keep the CES survey current with respect to employment from business births and business deaths. In addition, the updated universe files provide the most recent information on industry, size, and metropolitan area designation.

After all out-of-scope records are removed, the sampling frame is sorted into allocation cells. Within each allocation cell, units are sorted by MSA and by the size of the MSA, defined as the number of UI accounts in that MSA. As the sampling rate is uniform across the entire allocation cell, implicit stratification by MSA ensures that a proportional number of units are sampled from each MSA. Some MSAs may have too few UI accounts in the allocation cell; these MSAs are collapsed and treated as a single MSA. Within each selection cell, the units are sorted by PRN, and units are selected according to the specified sample selection rate. The number of units selected randomly from each selection cell is equal to the product of the sample selection rate and the number of eligible units in the cell, plus any carryover from the prior selection cell. The result is rounded to the nearest whole number. Carryover is defined as the amount that is rounded up or down to the nearest whole number.
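The carryover arithmetic in the preceding paragraph can be illustrated with a short sketch. It assumes that, once the number of units for a selection cell is known, the units with the lowest PRNs in that cell are taken; the source states only that units are sorted by PRN and selected at the specified rate, so that detail, along with the function name and the sample data, is an assumption made for illustration.

```python
from math import floor


def select_with_carryover(selection_cells, rate):
    """Select units cell by cell, passing the rounding remainder forward.

    selection_cells is a list of cells in frame sort order; each cell is a
    list of (unit_id, prn) pairs. Taking the lowest PRNs is an assumption.
    """
    carryover = 0.0
    sample = []
    for cell in selection_cells:
        target = rate * len(cell) + carryover
        n_select = floor(target + 0.5)  # round to the nearest whole number
        n_select = max(0, min(n_select, len(cell)))
        carryover = target - n_select  # amount rounded away, carried forward
        ordered = sorted(cell, key=lambda unit: unit[1])
        sample.extend(unit_id for unit_id, _ in ordered[:n_select])
    return sample


# Hypothetical selection cells with made-up PRNs.
cell_a = [("a1", 0.62), ("a2", 0.07), ("a3", 0.91), ("a4", 0.35), ("a5", 0.48)]
cell_b = [("b1", 0.15), ("b2", 0.80), ("b3", 0.03), ("b4", 0.55), ("b5", 0.29)]
cell_c = [("c1", 0.44), ("c2", 0.72), ("c3", 0.11), ("c4", 0.95), ("c5", 0.60), ("c6", 0.38)]
sample = select_with_carryover([cell_a, cell_b, cell_c], rate=0.25)
```

In this example the three cells have targets of 1.25, 1.5 (after adding a carryover of 0.25), and 1.0 (after a carryover of -0.5), so 1, 2, and 1 units are taken. The total of 4 equals the rate times the 16 eligible units, which is the purpose of carrying the remainder forward.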

Because of the cost and workload associated with enrolling new sample units, all units remain in the sample for a minimum of two years. To ensure that all units meet this minimum requirement, BLS has established a "swapping in" procedure. The procedure allows units that were newly selected during the previous sample year, but not reselected as part of the current probability sample, to be swapped into the sample. It removes a unit within the same selection cell and places the newly selected unit from the previous year back into the sample.

Selection Weights. Once the sample is drawn, sample selection weights are calculated based on the number of UI accounts actually selected within each allocation cell. The sample selection weight is approximately equal to the inverse of the probability of selection, or the inverse of the sampling rate. It is computed as:

Sample selection weight   =   Nh / nh

where:

Nh  =   the number of noncertainty UI accounts within the allocation cell that are eligible for sample selection

nh   =   the number of noncertainty UI accounts selected within the allocation cell
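As a small worked illustration of the weight formula, the sketch below computes Nh / nh for one allocation cell. The function name and the counts are hypothetical.

```python
def selection_weight(eligible_accounts, selected_accounts):
    """Return Nh / nh for the noncertainty UI accounts in one allocation cell."""
    if selected_accounts <= 0:
        raise ValueError("at least one account must be selected in the cell")
    return eligible_accounts / selected_accounts


# Hypothetical cell: 1,200 eligible noncertainty accounts, 12 of them selected.
weight = selection_weight(1200, 12)
represented_accounts = weight * 12  # weighted count of the sampled accounts
```

Here the sampling rate is 1 in 100, so each selected account carries a weight of 100, and the weighted count of the 12 sampled accounts reproduces the 1,200 eligible accounts in the cell.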

Sample Rotation. Sample rotation eases the burden on respondents who have been participating in the survey for an extended time period. A 20 percent rotation is utilized in selection cells with weights greater than 2.00. Units that rotate out of the sample will not be reselected as part of the sample for three years. In an effort to keep units from moving back into the sample after a single year, a "swap out" procedure has been established. The "swap out" procedure typically removes units from the current sample that had been rotated out of the sample within the last three years and replaces them with eligible units from the same selection cell.

The sample rotation procedure was not applied to the current sample. Instead, a sample redistribution was applied in order to accommodate the re-allocation of a sample across the individual states. As a result of sample redistribution, approximately 76 percent of the Current Employment Statistics sample for the private industries overlaps from the previous sample to the current sample.

Frame maintenance and sample updates. Due to the dynamic economy, there is a constant cycle of business births and deaths. A semi-annual update is performed during the third quarter of each year. This update selects units from the population of births and other units not previously eligible for selection and includes them as part of the sample. Updated location, contact, and administrative information is provided for all establishments that were selected in the annual sample selection.

Subsampling.  The primary enrollment of new establishments takes place in BLS Regional Office Data Collection Centers (DCCs). After the sample has been sent to the DCCs, interviewers enroll the selected establishments. While the UI account is the sample unit, interviewers attempt to collect the data for all individual establishments within a UI account.

For multiple-worksite UI accounts, it is sometimes necessary to subsample employers. This occurs when:

- the company cannot report for all worksites from a central location;

- the company cannot provide an aggregate report for the entire UI account;

- there are too many individual worksites to make it practical to contact each of them.

With subsampling of a smaller number of worksites, both interviewer workload and respondent burden are reduced without significantly reducing the accuracy of the estimates, but this technique will result in a small increase in variance. In the event that a UI account is subsampled, weight adjustments are made to reflect each of the worksites' probability of selection.

Sample Coverage

Table 2-Ca shows the latest benchmark employment levels and the approximate proportion of total universe employment coverage at the total nonfarm and major industry supersector levels. The coverage for individual industries within the supersectors may vary from the proportions shown.

CES Sample by Industry

The sample distribution by industry reflects the goal of minimizing the sampling error on the total nonfarm employment estimate, while also providing for reliable employment estimates by industry. Sample coverage rates vary by industry as a result of building a design to meet these goals (see Table 2-Ca). For example, the Manufacturing and Leisure and Hospitality industries are of similar size: Manufacturing has about 14 million employees, while Leisure and Hospitality has about 13 million employees. However, their relative sample sizes are different. Manufacturing has 17,000 sample units with a total of about 3.5 million employees, while Leisure and Hospitality has many more sample units, about 33,000, but covers only about 2.0 million employees. The Manufacturing sample therefore covers about 25 percent of all employment in manufacturing, while the Leisure and Hospitality sample covers about 16 percent of all employment in that industry. The differences are linked in part to the fact that Manufacturing is characterized by a much larger average firm size than Leisure and Hospitality. These types of differences do not cause a bias in the CES employment estimates because of the use of industry sampling strata and sampling weights, which ensure each firm is properly represented in the estimates.

Note on Government Sampling - The CES government sample is not part of the program’s probability-based design. The program is able to achieve a very high level of universe employment coverage (about 70 percent) by obtaining full payroll employment counts for many government agencies; thus, a probability-based sample design is not necessary for this industry. The high coverage rate virtually assures a high degree of reliability for the government employment estimates. The large government sample does not bias the total nonfarm employment estimates because it is used to estimate only the government portion of total nonfarm employment. The probability sample is used to estimate employment for all private industries. The private and government estimates are summed to derive total nonfarm employment estimates.

CES Sample by Employment Size Class

The employment universe that the CES sample represents is highly skewed, as shown by Table 2-Cb. The largest UI accounts comprise only 0.2 percent of all UI accounts but contain approximately 27 percent of total nonfarm employment. Therefore, it is very efficient to sample these UIs with certainty: by sampling only 0.2 percent of the UIs, the survey can cover 27 percent of total universe employment. Conversely, the smallest size class (0-9 employees) contains nearly 74 percent of all UIs but only about 11 percent of total nonfarm employment; therefore, it is efficient to sample these UIs at a much lower rate. Sampling larger firms at a higher rate than smaller firms is a standard technique in business establishment surveys.

Table 2-Cc shows the distribution of the active CES sample units. A much greater proportion of large UIs than small UIs is selected; however, that does not create a bias in either the sample or the estimates made from the sample. Each sample unit selected is assigned a weight based on its probability of selection, which ensures that all firms of its size are properly represented in the estimates. UIs with 1,000 or more employees are selected with certainty and assigned a weight of 1, meaning they represent only themselves in the estimates. Conversely, a UI in a small-firm stratum where 1 in every 100 firms is selected is assigned a weight of 100, because it represents itself and 99 other firms that were not sampled. The use of sample weights in the estimation process prevents a large (or small) firm bias in the estimates.

Reliability

The establishment survey, like other sample surveys, is subject to two types of error, sampling and nonsampling error. The magnitude of sampling error, or variance, is directly related to the size of the sample and the percentage of universe coverage achieved by the sample. The establishment survey sample covers over one-third of total universe employment; this yields a very small variance on the total nonfarm estimates. Measurements of error associated with sample estimates are provided in tables 2-D through 2-F.

Benchmark revision as a measure of survey error. The sum of sampling and nonsampling error can be considered total survey error. Unlike most sample surveys which publish sampling error as their only measure of error, the CES can derive an annual approximation of total error, on a lagged basis, because of the availability of the independently derived universe data. While the benchmark error is often used as a proxy measure of total error for the CES survey estimate, it actually represents the difference between two employment estimates derived from separate statistical processes (i.e., the CES sample process and the UI administrative process) and thus reflects the net of the errors present in each program. Historically, the benchmark revision has been very small for total nonfarm employment. Over the past decade, percentage benchmark error has averaged 0.2 percent, with an absolute range from less than 0.05 percent to 0.6 percent.

Revisions between preliminary and final data. First preliminary estimates of employment, hours, and earnings, based on less than the total sample, are published immediately following the reference month. Final revised sample-based estimates are published 2 months later when nearly all the reports in the sample have been received. Table 2-D presents the root-mean-square error, the mean percent, and the mean absolute percent revision over the past 5 years between the preliminary and final employment estimates.

Revisions of preliminary hours and earnings estimates are normally not greater than 0.1 of an hour for weekly hours and 2 cents for hourly earnings, at the total private level, and may be slightly larger for the more detailed industry groupings.

Variance estimation. The estimation of sample variance for the CES survey is accomplished through use of the method of Balanced Half Samples (BHS). This replication technique uses half samples of the original sample and calculates estimates using those subsamples. The sample variance is calculated by measuring the variability of the subsample estimates. The weighted link estimator is used to calculate both estimates and variances. The sample units in each cell - where a cell is based on State, industry, and size classification - are divided into two random groups. The basic BHS method is applied to both groups. The subdivision of the cells is done systematically, in the same order as the initial sample selection. Weights for units in the half sample are multiplied by a factor of $1 + \gamma$, whereas weights for units not in the half sample are multiplied by a factor of $1 - \gamma$. Estimates from these subgroups are calculated using the estimation formula described above.

The formula used to calculate CES variances is as follows:

$$ v_k^{+}(\hat{\theta}) = \frac{1}{\gamma^{2} k} \sum_{\alpha=1}^{k} \left( \hat{\theta}_{\alpha}^{+} - \hat{\theta} \right)^{2}, $$

where

- $\hat{\theta}_{\alpha}^{+}$ is the half-sample estimator for half-sample $\alpha$

- $\gamma$ = 1/2

- $k$ is the number of half-samples

- $\hat{\theta}$ is the original full-sample estimate
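The variance calculation can be sketched as follows. The sketch assumes the variance takes the usual perturbed half-sample form: the sum of squared deviations of the half-sample estimates from the full-sample estimate, divided by gamma squared times k, with gamma = 1/2. The function name and the numbers are hypothetical, and the sketch takes the half-sample estimates as given rather than forming balanced half samples or applying the weighted link estimator.

```python
from math import sqrt


def bhs_variance(half_sample_estimates, full_sample_estimate, gamma=0.5):
    """Variance from k half-sample estimates: sum of squared deviations / (gamma^2 * k)."""
    k = len(half_sample_estimates)
    if k == 0:
        raise ValueError("at least one half-sample estimate is required")
    squared_deviations = sum(
        (estimate - full_sample_estimate) ** 2 for estimate in half_sample_estimates
    )
    return squared_deviations / (gamma ** 2 * k)


# Hypothetical half-sample estimates around a full-sample estimate of 1,000.
variance = bhs_variance([1010.0, 990.0, 1005.0, 995.0], full_sample_estimate=1000.0)
standard_error = sqrt(variance)
```

With these made-up values the squared deviations sum to 250, and dividing by 0.25 * 4 gives a variance of 250. Dividing by gamma squared offsets the fact that weights are perturbed by only a factor of 1 +/- gamma rather than units being dropped outright.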

Appropriate uses of sampling variances. Variance statistics are useful for comparison purposes, but they do have some limitations. Variances reflect the error component of the estimates that is due to surveying only a subset of the population, rather than conducting a complete count of the entire population. However, they do not reflect nonsampling error, such as response errors, and bias due to nonresponse. The variances of the over-the-month change estimates are very useful in determining when changes are significant at some level of confidence. Variance statistics for first closing are available in Table 2-F. In addition, second and third closing variances are available upon request.

Sampling errors. The sampling errors shown for all private industries and total nonfarm have been calculated for estimates that follow the benchmark employment revision by a period of 16 to 20 months. The errors are presented as median values of the observed error estimates. They have been estimated using the method of Balanced Half Samples (BHS) with the probability sample data and sample weights assigned at the time of sample selection.

Illustration of the use of table 2-E. Table 2-E provides a reference for relative standard errors of three major series developed from the CES - estimates of the numbers of all employees (AE), of average hourly earnings (AHE), and of average weekly hours (AWH) within the same industry. The standard error of the difference between estimates in 2 non-overlapping industries is calculated as

$$ s_{\hat{X}_1 - \hat{X}_2} = \sqrt{s_1^{2} + s_2^{2}} $$

where $s_1$ and $s_2$ are the standard errors of the two estimates, since the two estimates are independent.

The errors are presented as relative standard errors (standard error divided by the estimate and expressed as a percent). Multiplying the relative standard error, expressed as a proportion, by the estimate gives the standard error.

Suppose that the level of all employees for financial activities in a given month is estimated at 7,819,000. The approximate relative standard error of this estimate (0.5 percent) is provided in table 2-E. A 90-percent confidence interval would then be the interval:

7,819,000 +/- (1.645*.005*7,819,000) = 7,819,000 +/- 64,311 = 7,754,689 to 7,883,311
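The arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the 1.645 multiplier and the financial activities figures come from the example in the text, and the helper for the difference of two estimates assumes the estimates are independent.

```python
from math import sqrt

Z_90 = 1.645  # normal multiplier for a 90-percent confidence interval


def confidence_interval(estimate, relative_se_percent, z=Z_90):
    """Convert a relative standard error (in percent) into an interval around the estimate."""
    standard_error = estimate * relative_se_percent / 100.0
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


def se_of_difference(se_1, se_2):
    """Standard error of the difference between two independent estimates."""
    return sqrt(se_1 ** 2 + se_2 ** 2)


low, high = confidence_interval(7_819_000, 0.5)
print(round(low), round(high))  # 7754689 7883311
```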

 

Illustration of the use of table 2-F. Table 2-F provides a reference for the standard errors of 1-, 3-, and 12-month changes in AE, AHE, and AWH. The errors are presented as standard errors of the changes. Suppose that the over-the-month change in AHE from January to February in coal mining is $0.11. The standard error for a 1-month change for coal mining from the table is $0.35. The interval estimate of the over-the-month change in AHE that will include the true over-the-month change with 90-percent confidence is calculated:

$0.11 +/- (1.645*$0.35) = $0.11 +/- $0.58 = [-$0.47, $0.69]

With 90-percent confidence, the true value of the over-the-month change lies in the interval -$0.47 to $0.69. Because this interval includes $0.00 (no change), the change of $0.11 shown is not significant at the 90-percent confidence level. Equivalently, the estimated change of $0.11 does not exceed $0.58 (1.645 * $0.35); therefore, one could conclude from these data that the change is not significant at the 90-percent confidence level.
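The same significance check can be written as a short sketch. The function names are hypothetical; the inputs are the coal mining figures from the example above.

```python
def change_interval(change, standard_error, z=1.645):
    """Return the confidence interval around an estimated change."""
    half_width = z * standard_error
    return change - half_width, change + half_width


def is_significant(change, standard_error, z=1.645):
    """A change is significant when its absolute size exceeds z times its standard error."""
    return abs(change) > z * standard_error


low, high = change_interval(0.11, 0.35)
significant = is_significant(0.11, 0.35)
print(round(low, 2), round(high, 2), significant)  # -0.47 0.69 False
```

The two tests are equivalent: the interval includes zero exactly when the absolute change does not exceed 1.645 times the standard error.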

STATISTICS FOR STATES AND AREAS

(Tables B-7, B-14, and B-18)

As described above, State agencies in cooperation with BLS collect and prepare State and area employment, hours, and earnings data. These statistics are based on the same establishment reports used by BLS; however, BLS uses the full CES sample to produce monthly national employment estimates, while each State agency uses its portion of the sample to independently develop a State employment estimate. The CES area statistics relate to metropolitan areas. Definitions for all areas are published each year in the issue of Employment and Earnings that contains State and area annual averages (usually the May issue). Changes in definitions are noted as they occur. Estimates for States and areas are produced using two methods. The majority of State and area estimates are produced using direct sample-based estimation. However, published area and industry combinations (domains) that do not have a large enough sample to support estimation using only sample responses have been estimated using a small domain model.

Small Domain Model.  

The small domain model consists of a weighted sum of three different relative over-the-month change estimates, $\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. These three relative over-the-month estimates are weighted based on the variance of each of the three estimates. The larger the variance of an estimate relative to the other variances, the smaller its weight. The resulting estimate of current month employment $\hat{Y}_{ia,t}$ is defined as:

$$ \hat{Y}_{ia,t} = \hat{Y}_{ia,t-1} \left( W_{ia,t,1} \hat{R}_{ia,t,1} + W_{ia,t,2} \hat{R}_{ia,t,2} + W_{ia,t,3} \hat{R}_{ia,t,3} \right) $$

where $\hat{Y}_{ia,t}$ = current month t employment estimate for domain ia defined by the intersection of industry i and area a.

$\hat{R}_{ia,t,1}$ = current month relative over-the-month change estimate based on available sample responses for domain ia.

$W_{ia,t,1}$ = current month weight assigned to $\hat{R}_{ia,t,1}$ based on the variances of $\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. The weights $W_{ia,t,2}$ and $W_{ia,t,3}$ are defined similarly.

$\hat{R}_{ia,t,2}$ = current month relative over-the-month change estimate based on time series forecasts using historical universe employment counts for domain ia. These historical universe employment counts are available from January 1990 to 12 months prior to the current month t.

$\hat{R}_{ia,t,3}$ = current month relative over-the-month change estimate based on a synthetic estimate of the relative change that uses all sample responses in the State that includes area a, for industry i.

$\hat{Y}_{ia,t-1}$ = previous month employment estimate for domain ia from the small domain model.

It is possible that for a given industry i and area a, one or even two of the inputs to the model are assigned weights of 0. A weight of 0 is assigned to a model input because of concerns regarding the stability of that input. For example, if the sample-based estimate $\hat{R}_{ia,t,1}$ or the synthetic estimate $\hat{R}_{ia,t,3}$ has five or fewer responses, then it is assigned a weight of 0. If the time series estimate $\hat{R}_{ia,t,2}$ exhibits an unstable variance or has extremely poor model fit, then it may also be assigned a weight of 0. In these cases, the small domain model estimate may be based on only one or two of the three described inputs.
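The weighting logic can be illustrated with a minimal sketch. Two points are assumptions rather than statements from the source: the weights are taken to be inverse-variance weights normalized to sum to one (the text says only that a larger variance means a smaller weight), and each relative over-the-month change is treated as the ratio of current month to previous month employment. The function name, the `usable` flags, and all numbers are hypothetical.

```python
def small_domain_estimate(previous_estimate, relative_changes, variances, usable=None):
    """Combine relative-change inputs into a current month employment estimate.

    Inputs flagged as not usable receive a weight of 0; the remaining weights
    are inverse-variance weights normalized to sum to one (an assumption).
    """
    if usable is None:
        usable = [True] * len(relative_changes)
    inverse = [1.0 / var if ok else 0.0 for var, ok in zip(variances, usable)]
    total = sum(inverse)
    if total == 0.0:
        raise ValueError("at least one input must be usable")
    weights = [value / total for value in inverse]
    combined_change = sum(w * r for w, r in zip(weights, relative_changes))
    return previous_estimate * combined_change, weights


# Hypothetical inputs: sample-based, time series, and synthetic relative changes.
changes = [1.010, 1.004, 1.006]
variances = [4e-4, 1e-4, 2e-4]
estimate, weights = small_domain_estimate(5000.0, changes, variances)

# Same domain, but with the sample-based input dropped (for example, too few responses).
reduced_estimate, reduced_weights = small_domain_estimate(
    5000.0, changes, variances, usable=[False, True, True]
)
```

With these made-up inputs the weights are 1/7, 4/7, and 2/7, so the time series input, which has the smallest variance, dominates. When the sample-based input is dropped, the remaining two weights are rescaled to 2/3 and 1/3.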

Sampling errors are not applicable to the estimates made using the small domain models. The measure available to judge the reliability of these modeled estimates is their performance over past time periods compared with the universe values for those time periods. These measures are useful; however, it is not certain that the past performance of the modeled estimates accurately reflects their current performance.

It should also be noted that extremely small estimates of 2,000 employees or fewer are potentially subject to large percentage revisions that are caused by occurrences such as the relocation of one or two businesses, or a change in the activities of one or two businesses. These are non-economic classification changes that relate to the activity or location of businesses and will be present for sample-based estimates as well as model-based estimates. Error measures for State and area estimates are available on the BLS website at www.bls.gov/sae/790stderr.htm.

Caution in aggregating State data. The national estimation procedures used by BLS are designed to produce accurate national data by detailed industry; correspondingly, the State estimation procedures are designed to produce accurate data for each individual State. State estimates are not forced to sum to national totals, nor vice versa. Because each State series is subject to larger sampling and nonsampling errors than the national series, summing them accumulates individual State-level errors and can cause distortion at an aggregate level. This has been a particular problem at turning points in the U.S. economy, when the majority of the individual State errors tend to be in the same direction. Due to these statistical limitations, the Bureau does not compile or publish a "sum-of-State" employment series. Additionally, BLS cautions users that such a series is subject to a relatively large and volatile error structure, particularly at turning points.

 

Last Modified Date: February 1, 2008