Technical Notes to Establishment Survey Data

The Sample
- Design
- Sample coverage
Reliability
Statistics for States and areas
- Small domain model

The Sample

Design

The Current Employment Statistics (CES) sample is a stratified, simple random sample of worksites, clustered by Unemployment Insurance (UI) account number. The UI account number is a major identifier on the BLS longitudinal database of employer records, which serves as both the sampling frame and the benchmark source for the CES employment estimates. The sample strata, or subpopulations, are defined by State, industry, and employment size, yielding a State-based design. The sampling rates for each stratum are determined through a method known as optimum allocation, which distributes a fixed number of sample units across a set of strata to minimize the overall variance, or sampling error, on the primary estimate of interest. The Total nonfarm employment level is the primary estimate of interest, and the CES sample design gives top priority to measuring it as precisely as possible, or minimizing the statistical error around the statewide Total nonfarm employment estimates.

Frame and sample selection. The Longitudinal Data Base (LDB) is the universe from which BLS draws the CES sample. The LDB contains data on the roughly 9 million U.S. business establishments covered by UI, representing nearly all elements of the U.S. economy. The Quarterly Census of Employment and Wages (QCEW) program collects these data from employers on a quarterly basis in cooperation with State Workforce Agencies (SWAs). The LDB contains employment and wage information from employers, as well as name, address, and location information. It also contains identification information such as UI account number and reporting unit or worksite number.

The LDB contains records of all employers covered under the UI tax system. That system covers 97 percent of all employers in the 50 States, the District of Columbia, Puerto Rico, and the Virgin Islands. There are a few sections of the economy that are not covered, including the self-employed, unpaid family workers, railroads, religious organizations, small agricultural employers, and elected officials. Data for employers generally are reported at the worksite level. Employers who have multiple establishments within a State usually report data for each individual establishment. The LDB tracks establishments over time and links them from quarter to quarter.

Permanent Random Numbers (PRNs) have been assigned to all UI accounts on the sampling frame. As new units appear on the frame, random numbers are assigned to those units as well. As records are linked across time, the PRN is carried forward in the linkage.

The CES sample is stratified by State, industry, and size. Stratification groups population members together for the purpose of sample allocation and selection. The strata, or groups, are composed of homogeneous units. With 13 industries and 8 size classes, there are 104 total allocation cells per State. The sampling rate for each stratum is determined through a method known as optimum allocation. Optimum allocation minimizes variance at a fixed cost or minimizes cost for a fixed variance. Under the CES probability design, a fixed number of sample units for each State is distributed across the allocation strata in such a way as to minimize the overall variance, or sampling error, of the total State employment level. The number of sample units in the CES probability sample was fixed according to available program resources. The optimum allocation formula places more sample in cells for which data cost less to collect, cells that have more units, and cells that have a larger variance.
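The allocation idea can be illustrated with a minimal sketch. It assumes the textbook cost-weighted optimum allocation, in which a cell's share of the fixed sample is proportional to N * S / sqrt(cost); the exact CES formula is not given in these notes. The function name, cell labels, unit counts, standard deviations, and costs are all hypothetical.

```python
import math

def optimum_allocation(cells, total_sample):
    """Split a fixed sample across cells in proportion to N * S / sqrt(cost).

    Assumed textbook form of optimum allocation, not the published CES formula.
    Returns fractional sample sizes; rounding is left to the caller.
    """
    scores = {
        name: c["N"] * c["S"] / math.sqrt(c["cost"])
        for name, c in cells.items()
    }
    total_score = sum(scores.values())
    return {name: total_sample * s / total_score for name, s in scores.items()}

# Hypothetical allocation cells: N = units in the cell, S = standard deviation
# of employment within the cell, cost = relative cost of collecting one unit.
cells = {
    "small": {"N": 7000, "S": 3.0, "cost": 1.0},
    "medium": {"N": 2500, "S": 20.0, "cost": 1.0},
    "large": {"N": 500, "S": 60.0, "cost": 1.0},
}
allocation = optimum_allocation(cells, total_sample=1000)
sampling_rates = {name: allocation[name] / cells[name]["N"] for name in cells}
```

In this made-up example the large cell ends up with a much higher sampling rate than the small cell, because its units vary far more in employment.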

In the fall of each year, a new sample is drawn from that year’s first quarter LDB data. Annual sample selection helps keep the CES survey current with respect to employment from business births and business deaths. In addition, the updated universe files provide the most recent information on industry, size, and metropolitan area designation. About a full year separates the sample draw and the sample implementation to allow time for enrollment and collection of selected units. Enrollment of the selected units begins immediately following the sample draw and collection begins immediately following enrollment. Preliminary estimates for January through December 2012 will be made using the sample selected from the 2010 LDB data.

After all out-of-scope records are removed, the sampling frame is sorted into allocation cells. Within each allocation cell, units are sorted by metropolitan statistical area (MSA) and by the size of the MSA, defined as the number of UI accounts in that MSA. As the sampling rate is uniform across the entire allocation cell, implicit stratification by MSA ensures that a proportional number of units are sampled from each MSA. Some MSAs may have too few UI accounts in the allocation cell; these MSAs are collapsed and treated as a single MSA. Within each selection cell, the units are sorted by PRN, and units are selected according to the specified sample selection rate. The number of units selected randomly from each selection cell is equal to the product of the sample selection rate and the number of eligible units in the cell plus any carryover from the prior selection cell. The result is rounded to the nearest whole number. Carryover is defined as the amount that is rounded up or down to the nearest whole number.
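The carryover rule described above can be sketched as follows. The function names and the example counts are hypothetical. Taking the units with the lowest PRNs is an assumption; the notes say only that units are sorted by PRN and selected at the specified rate.

```python
import math

def counts_with_carryover(eligible_counts, rate):
    """Number of units to select in each selection cell.

    Each cell's target is rate * eligible units plus the rounding
    remainder (carryover) left over from the previous cell.
    """
    selected = []
    carryover = 0.0
    for n_eligible in eligible_counts:
        target = rate * n_eligible + carryover
        n_select = math.floor(target + 0.5)  # round to nearest whole number
        carryover = target - n_select        # amount rounded up or down
        selected.append(n_select)
    return selected

def select_by_prn(units, n_select):
    """Pick n_select units after sorting by PRN (lowest first is an assumption)."""
    return sorted(units, key=lambda unit: unit["prn"])[:n_select]

# Three hypothetical selection cells within one allocation cell.
counts = counts_with_carryover([7, 12, 5], rate=0.25)

cell = [{"id": "A", "prn": 0.81}, {"id": "B", "prn": 0.07}, {"id": "C", "prn": 0.42}]
picked = select_by_prn(cell, 2)
```

Carrying the remainder forward keeps the total number selected across the allocation cell close to the rate times the total number of eligible units, even when individual selection cells are small.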

As a result of the cost and workload associated with enrolling new sample units, all units remain in the sample a minimum of two years. To ensure that all units meet this minimum requirement, BLS has established a "swapping in" procedure. The procedure applies to units that were newly selected during the previous sample year but were not reselected as part of the current probability sample. For each such unit, the procedure removes a unit within the same selection cell and places the unit selected in the previous year back into the sample. Approximately 68 percent of the CES sample for the private industries overlaps from the previous sample to the current sample.

Selection weights. Once the sample is drawn, sample selection weights are calculated based on the number of UI accounts actually selected within each allocation cell. The sample selection weight is approximately equal to the inverse of the probability of selection, or the inverse of the sampling rate. It is computed as:

Sample selection weight = N_h / n_h

where:

N_h = the number of noncertainty UI accounts within the allocation cell that are eligible for sample selection

n_h = the number of noncertainty UI accounts selected within the allocation cell
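As a small worked illustration of the weight formula, the following sketch computes N_h / n_h for one allocation cell. The function name and the counts are hypothetical.

```python
def selection_weight(eligible_noncertainty, selected_noncertainty):
    """Sample selection weight N_h / n_h for one allocation cell."""
    if selected_noncertainty <= 0:
        raise ValueError("at least one UI account must be selected")
    return eligible_noncertainty / selected_noncertainty

# Hypothetical cell: 1,200 eligible noncertainty UI accounts, 48 selected.
weight = selection_weight(1200, 48)
realized_rate = 48 / 1200
```

Here each selected UI account carries a weight of 25, the inverse of the realized sampling rate of 4 percent.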

Frame maintenance and sample updates. Due to the dynamic economy, there is a constant cycle of business births and deaths. A semi-annual update is performed during the summer each year drawing from the previous year’s third quarter LDB data. This update selects units from the population of births and other units not previously eligible for selection and includes them as part of the sample. Updated location, contact, and administrative information is provided for all establishments that were selected in the annual sample selection.

Subsampling. The primary enrollment of new establishments takes place in BLS Regional Office Data Collection Centers (DCCs). After the sample has been sent to the DCCs, interviewers enroll the selected establishments. While the UI account is the sample unit, interviewers attempt to collect the data for all individual establishments within a UI account.

For multiple-worksite UI accounts, it is sometimes necessary to subsample worksites. This occurs when:

- the company cannot report for all worksites from a central location;
- the company cannot provide an aggregate report for the entire UI account;
- there are too many individual worksites to make it practical to contact each of them.

Subsampling a smaller number of worksites reduces both interviewer workload and respondent burden. The technique results in a small increase in variance but does not significantly reduce the accuracy of the estimates. In the event that a UI account is subsampled, weight adjustments are made to reflect each worksite's probability of selection.
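One way to picture the weight adjustment is the sketch below. It assumes that worksites within a UI account are subsampled with equal probability, which these notes do not state; the function name and the numbers are hypothetical.

```python
def worksite_weight(account_weight, worksites_in_account, worksites_subsampled):
    """Adjusted weight for a subsampled worksite.

    Assumes equal-probability subsampling within the UI account, so the
    account weight is divided by (subsampled / total) worksites.
    """
    if not 0 < worksites_subsampled <= worksites_in_account:
        raise ValueError("subsample size must be between 1 and the worksite count")
    return account_weight * worksites_in_account / worksites_subsampled

# Hypothetical UI account with selection weight 10 and 40 worksites,
# of which 8 are subsampled.
adjusted = worksite_weight(10.0, 40, 8)
```

In this example each subsampled worksite stands in for five worksites in its account, so its weight rises from 10 to 50.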

Sample coverage

Table 2-Ca shows the latest benchmark employment levels and the approximate proportion of total universe employment coverage at the Total nonfarm and major industry levels. The coverage for individual industries within the supersectors may vary from the proportions shown.

CES sample by industry. The sample distribution by industry reflects the goal of minimizing the sampling error on the Total nonfarm employment estimate, while also providing for reliable employment estimates by industry. Sample coverage rates vary by industry as a result of building a design to meet these goals (see Table 2-Ca). For example, the Manufacturing and Leisure and hospitality industries are of similar size: Manufacturing has about 11.6 million employees, while Leisure and hospitality has 12.9 million employees. However, their relative sample sizes are different. Manufacturing has about 14,800 sample units with a total of 2.9 million employees, while Leisure and hospitality has many more sample units, about 52,900, but covers only about 2.4 million employees. The Manufacturing sample therefore covers about 25 percent of all employment in Manufacturing, while the Leisure and hospitality sample covers about 19 percent of all employment in that industry. The differences are linked in part to the fact that Manufacturing is characterized by a much larger average firm size than Leisure and hospitality. These types of differences do not cause a bias in the CES employment estimates because the industry sampling strata and sampling weights ensure that each firm is properly represented in the estimates.

Note on Government sampling - The CES Government sample is not part of the program's probability-based design. The program is able to achieve a very high level of universe employment coverage (68 percent) by obtaining full payroll employment counts for many government agencies; thus, a probability-based sample design is not necessary for this industry. The high coverage rate virtually assures a high degree of reliability for the Government employment estimates. The large Government sample does not bias the Total nonfarm employment estimates because it is used to estimate only the Government portion of Total nonfarm employment. The probability sample is used to estimate employment for all Private industries. The Private and Government estimates are summed to derive Total nonfarm employment estimates.

CES sample by employment size class. The employment universe that the CES sample represents is highly skewed, as shown by Table 2-Cb. The largest UI accounts comprise only 0.2 percent of all UI accounts but contain approximately 28 percent of Total nonfarm employment. Therefore, it is very efficient to sample these UIs with certainty: by sampling only 0.2 percent of the UIs, the survey can cover 28 percent of total universe employment. Conversely, the smallest size class (0-9 employees) contains nearly 71 percent of all UIs but only about 10 percent of Total nonfarm employment; therefore, it is efficient to sample these UIs at a much lower rate. Sampling larger firms at a higher rate than smaller firms is a standard technique commonly used in business establishment surveys.

Table 2-Cc shows the distribution of the active CES sample units. A much greater proportion of large than small UIs are selected; however, that does not create a bias in either the sample or the estimates made from the sample. Each sample unit selected is assigned a weight based on its probability of selection, which ensures that all firms of its size are properly represented in the estimates. UIs with a large number of employees are selected with certainty and assigned a weight of one, meaning they represent only themselves in the estimates. Conversely, a UI in a smallest-firm stratum in which 1 in every 100 firms is selected is assigned a weight of 100, because it represents itself and 99 other firms that were not sampled. The use of sample weights in the estimation process prevents a large (or small) firm bias in the estimates.

Reliability

The establishment survey, like other sample surveys, is subject to two types of error: sampling error and nonsampling error. The magnitude of sampling error, or variance, is directly related to the size of the sample and the percentage of universe coverage achieved by the sample. The establishment survey sample covers over one-third of total universe employment; this yields a very small variance on the Total nonfarm estimates. Measurements of error associated with sample estimates are provided in Table 2-D and the all employees, production employees, and women employees standard error tables.

Benchmark revision as a measure of survey error. The sum of sampling and nonsampling error can be considered total survey error. Unlike most sample surveys which publish sampling error as their only measure of error, the CES can derive an annual approximation of total error, on a lagged basis, because of the availability of the independently derived universe data. While the benchmark error is often used as a proxy measure of total error for the CES survey estimate, it actually represents the difference between two employment estimates derived from separate statistical processes (i.e., the CES sample process and the UI administrative process) and thus reflects the net of the errors present in each program. Historically, the benchmark revision has been small for Total nonfarm employment. Over the past decade, percentage benchmark error has averaged 0.3 percent, with an absolute range from 0.1 percent to 0.7 percent.

Revisions between preliminary and final data. First preliminary estimates of employment, hours, and earnings, based on less than the total sample, are published immediately following the reference month. Final revised sample-based estimates are published two months later when nearly all the reports in the sample have been received. Table 2-D presents the root-mean-square error, the mean percent, and the mean absolute percent revision over the past five years between the preliminary and final employment estimates.

Revisions of preliminary hours and earnings estimates are normally not greater than 0.1 of an hour for weekly hours and 1 cent for hourly earnings, at the Total private level, and may be slightly larger for the more detailed industry groupings.
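The three revision measures named above can be sketched as follows. The exact definitions used for Table 2-D are not given in these notes, so this sketch assumes that the root-mean-square error is taken over revisions in levels and that percent revisions are expressed relative to the final estimate. The function name and the data are hypothetical.

```python
import math

def revision_summary(preliminary, final):
    """Root-mean-square, mean percent, and mean absolute percent revision.

    Assumed definitions: revision = final - preliminary, and percent
    revisions are taken relative to the final estimate.
    """
    if len(preliminary) != len(final) or not final:
        raise ValueError("need two equal-length, non-empty series")
    revisions = [f - p for p, f in zip(preliminary, final)]
    percents = [100.0 * (f - p) / f for p, f in zip(preliminary, final)]
    n = len(revisions)
    return {
        "rmse": math.sqrt(sum(r * r for r in revisions) / n),
        "mean_percent": sum(percents) / n,
        "mean_absolute_percent": sum(abs(x) for x in percents) / n,
    }

# Hypothetical preliminary and final estimates for two months (thousands).
summary = revision_summary(preliminary=[98.0, 204.0], final=[100.0, 200.0])
```

The mean percent revision shows whether preliminary estimates run consistently high or low, while the mean absolute percent revision shows their typical size regardless of direction.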

Variance estimation. The estimation of sample variance for the CES survey is accomplished through use of the method of Balanced Half Samples (BHS). This replication technique uses half samples of the original sample and calculates estimates using those subsamples. The sample variance is calculated by measuring the variability of the subsample estimates. The weighted link estimator is used to calculate both estimates and variances. The sample units in each cell (where a cell is based on State, industry, and size classification) are divided into two random groups. The basic BHS method is applied to both groups. The subdivision of the cells is done systematically, in the same order as the initial sample selection. Weights for units in the half sample are multiplied by a factor of $1 + \gamma$, while weights for units not in the half sample are multiplied by a factor of $1 - \gamma$. Estimates from these subgroups are calculated using the estimation formula described above.

The formula used to calculate CES variances is as follows:

$$v(\hat{\theta}) = \frac{1}{\gamma^{2} k} \sum_{\alpha=1}^{k} \left( \hat{\theta}_{\alpha}^{+} - \hat{\theta} \right)^{2}$$

where

$\hat{\theta}_{\alpha}^{+}$ is the half-sample estimator for half sample $\alpha$
$\gamma$ is the factor used to adjust the weights
$k$ is the number of half samples
$\hat{\theta}$ is the original full-sample estimate.
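A minimal sketch of the replication calculation follows. It assumes the perturbed-weight form of balanced half samples, in which the variance is the sum of squared deviations of the half-sample estimates from the full-sample estimate, divided by gamma squared times the number of half samples. The function names, the default gamma of 0.5, and the example values are assumptions for illustration only.

```python
def perturb_weights(weights, in_half_sample, gamma=0.5):
    """Multiply weights by (1 + gamma) inside the half sample, (1 - gamma) outside."""
    return [
        w * (1 + gamma) if inside else w * (1 - gamma)
        for w, inside in zip(weights, in_half_sample)
    ]

def bhs_variance(half_sample_estimates, full_sample_estimate, gamma=0.5):
    """Replicate variance: sum of squared deviations / (gamma^2 * k)."""
    k = len(half_sample_estimates)
    if k == 0:
        raise ValueError("at least one half-sample estimate is required")
    squared_deviations = sum(
        (estimate - full_sample_estimate) ** 2
        for estimate in half_sample_estimates
    )
    return squared_deviations / (gamma ** 2 * k)

# Hypothetical estimates from four half samples and from the full sample.
variance = bhs_variance([101.0, 99.0, 100.5, 99.5], full_sample_estimate=100.0)
new_weights = perturb_weights([10.0, 10.0], [True, False])
```

Dividing by gamma squared compensates for the fact that perturbing the weights, rather than dropping half of the sample outright, shrinks the spread of the replicate estimates.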

Appropriate uses of sampling variances. Variance statistics are useful for comparison purposes, but they do have some limitations. Variances reflect the error component of the estimates that is due to surveying only a subset of the population, rather than conducting a complete count of the entire population. However, they do not reflect nonsampling error, such as response errors and bias due to nonresponse. The variances of the over-the-month change estimates are very useful in determining when changes are significant at some level of confidence. Variance statistics for first and second closings are available for all employees, production employees, and women employees. In addition, third closing variances are available upon request.

Sampling errors. The sampling errors shown for all private industries and Total nonfarm have been calculated for estimates that follow the benchmark employment revision by a period of 16 to 20 months. The errors are presented as median values of the observed error estimates. These errors have been estimated using the method of Balanced Half Samples (BHS) with the probability sample data and sample weights assigned at the time of sample selection.

Illustration of the use of relative standard error tables. All employees, production employees, and women employees standard error tables provide a reference for relative standard errors of all major series developed from the CES. The standard errors of differences between estimates in two non-overlapping industries are calculated as

$$SE(\hat{X}_{1} - \hat{X}_{2}) = \sqrt{SE(\hat{X}_{1})^{2} + SE(\hat{X}_{2})^{2}}$$

since the two estimates are independent.
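For two independent estimates, the variance of the difference is the sum of the two variances, so the standard error of the difference is the square root of the sum of the squared standard errors. A short sketch with hypothetical values:

```python
import math

def se_of_difference(se_first, se_second):
    """Standard error of the difference between two independent estimates."""
    return math.hypot(se_first, se_second)  # sqrt(se_first**2 + se_second**2)

# Hypothetical standard errors (in thousands) for two non-overlapping industries.
se_diff = se_of_difference(3.0, 4.0)
```

The result is always at least as large as the larger of the two inputs, so a difference between two industries is less precisely measured than either industry estimate alone.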

The errors are presented as relative standard errors (standard error divided by the estimate and expressed as a percent). Multiplying the relative standard error by the estimate gives the standard error.

Suppose that the level of all employees for Financial activities in a given month at first closing is estimated at 7,819,000. The approximate relative standard error of this estimate (0.5 percent) is provided in Table 2-E. A 90-percent confidence interval would then be the interval:

7,819,000 +/- (1.645*.005*7,819,000) = 7,819,000 +/- 64,311 = 7,754,689 to 7,883,311
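The same arithmetic can be written as a short sketch. The function name is hypothetical; the inputs are the figures from the example above, and 1.645 is the normal multiplier for a 90-percent interval.

```python
def confidence_interval(estimate, relative_se_percent, z=1.645):
    """Interval around a level estimate, given its relative standard error in percent."""
    standard_error = estimate * relative_se_percent / 100.0
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width

# Financial activities example: 7,819,000 with a 0.5 percent relative standard error.
low, high = confidence_interval(7_819_000, 0.5)
```

Rounded to whole employees, the bounds reproduce the interval shown above.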

Illustration of the use of standard error tables. All employees, production employees, and women employees standard error tables provide a reference for the standard errors of 1-, 3-, and 12-month changes in the employment, hours, and earnings series. The errors are presented as standard errors of the changes. Suppose that the over-the-month change in all employee average hourly earnings (AHE) from January to February in Coal mining at second closing is $0.11. The standard error for a 1-month change for Coal mining from the table is $0.34. The interval estimate of the over-the-month change in AHE that will include the true over-the-month change with 90-percent confidence is calculated:

$0.11 +/- (1.645*$0.34) = $0.11 +/- $0.56 = [-$0.45, $0.67]

With 90-percent confidence, the true value of the over-the-month change lies in the interval -$0.45 to $0.67. Because this interval includes $0.00 (no change), the change of $0.11 shown is not significant at the 90-percent confidence level. Alternatively, the estimated change of $0.11 does not exceed $0.56 (1.645 * $0.34); therefore, one could conclude from these data that the change is not significant at the 90-percent confidence level.
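The significance check can be sketched as a comparison between the absolute change and the margin of error. The function name is hypothetical; the values are those from the Coal mining example.

```python
def is_significant(change, standard_error, z=1.645):
    """True when the change exceeds z standard errors in absolute value."""
    return abs(change) > z * standard_error

# Coal mining example: change of $0.11 with a standard error of $0.34.
margin = 1.645 * 0.34
significant = is_significant(0.11, 0.34)
interval = (0.11 - margin, 0.11 + margin)
```

Comparing the change with the margin and checking whether the interval contains zero are the same test, so the two approaches always agree.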

Statistics for States and areas

(Tables D-1, D-2, D-3, D-4, D-5, and D-6)

BLS independently develops National and State and area employment, hours, and earnings series. Both sets of estimates are based on the same establishment reports; however, BLS uses the full CES sample to produce monthly National employment estimates, while BLS uses only the State-specific portion of the sample to develop State employment estimates. CES area statistics relate to metropolitan areas. CES uses the most recent Office of Management and Budget (OMB) Bulletin regarding statistical area definitions (OMB Bulletin No. 10-02 http://www.whitehouse.gov/sites/default/files/omb/assets/bulletins/b10-02.pdf) to define metropolitan statistical areas and metropolitan divisions. CES also produces area statistics for non-standard areas (areas which are not defined in the OMB Bulletin), noted at http://www.bls.gov/sae/saenonstd.htm. Changes in definitions are noted as they occur. Estimates for States and areas are produced using two methods. The majority of State and area estimates are produced using direct sample-based estimation. However, published area and industry combinations (domains) that do not have a large enough sample to support estimation using only sample responses have been estimated using a small domain model.

Small domain model

The small domain model consists of a weighted sum of three different relative over-the-month change estimates, $\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. These three relative over-the-month estimates are then weighted based on the variance of each of the three estimates. The larger the variance of each estimate relative to the other variances, the smaller the weight. The resulting estimate of current month employment is defined as:

$$\hat{Y}_{ia,t} = \hat{Y}_{ia,t-1} \left( W_{ia,t,1} \hat{R}_{ia,t,1} + W_{ia,t,2} \hat{R}_{ia,t,2} + W_{ia,t,3} \hat{R}_{ia,t,3} \right)$$

where $\hat{Y}_{ia,t}$ = current month t employment estimate for domain ia defined by the intersection of industry i and area a.

$\hat{R}_{ia,t,1}$ = current month relative over-the-month change estimate based on available sample responses for domain ia.

$W_{ia,t,1}$ = current month weight assigned to $\hat{R}_{ia,t,1}$ based on the variances of $\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. The weights $W_{ia,t,2}$ and $W_{ia,t,3}$ are defined similarly.

$\hat{R}_{ia,t,2}$ = current month relative over-the-month change estimate based on time series forecasts using historical universe employment counts for domain ia. These historical universe employment counts are available from January 1990 to 12 months prior to the current month t.

$\hat{R}_{ia,t,3}$ = current month relative over-the-month change estimate based on a synthetic estimate of the relative change that uses all sample responses in the State that includes area a for industry i.

$\hat{Y}_{ia,t-1}$ = previous month employment estimate for domain ia from the small domain model.

It is possible that for a given industry i and area a, one or even two of the inputs to the model are assigned weights of zero. The reasons for assigning a weight of zero to a model input are due to concerns regarding the stability of the inputs. For example, if $\hat{R}_{ia,t,1}$ or $\hat{R}_{ia,t,3}$ has five or fewer responses, then it is assigned a weight of zero. If $\hat{R}_{ia,t,2}$ exhibits an unstable variance or has extremely poor model fit, then it may also be assigned a weight of zero. In these cases, the small domain model estimate may be based on only one or two of the three described inputs.
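The weighting logic can be illustrated with a minimal sketch. It makes two assumptions that go beyond these notes: each relative change is expressed as the ratio of current month to previous month employment, and the weights are inverse variances rescaled to sum to one. The notes say only that a larger variance leads to a smaller weight. The function name and all input values are hypothetical.

```python
def small_domain_estimate(previous_estimate, relative_changes, variances, usable):
    """Combine three relative-change inputs into a current month estimate.

    Inputs flagged as not usable get a weight of zero; the remaining
    weights are inverse variances rescaled to sum to one (an assumption).
    """
    inverse = [1.0 / v if ok else 0.0 for v, ok in zip(variances, usable)]
    total = sum(inverse)
    if total == 0.0:
        raise ValueError("at least one input must be usable")
    weights = [x / total for x in inverse]
    combined_change = sum(w * r for w, r in zip(weights, relative_changes))
    return previous_estimate * combined_change, weights

# Hypothetical inputs: sample-based, time series, and synthetic relative changes.
changes = [1.004, 1.002, 1.001]
variances = [4.0, 1.0, 2.0]

estimate, weights = small_domain_estimate(10_000.0, changes, variances, [True, True, True])

# Same domain when the sample-based input has too few responses to be used.
fallback, fallback_weights = small_domain_estimate(
    10_000.0, changes, variances, [False, True, True]
)
```

When an input is dropped, the remaining weights are rescaled so that the estimate still rests on a complete weighted average of the inputs that are left.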

Sampling errors are not applicable to the estimates made using the small domain models. The measure available to judge the reliability of these modeled estimates is their performance over past time periods compared with the universe values for those time periods. These measures are useful; however, it is not certain that the past performance of the modeled estimates accurately reflects their current performance.

It should also be noted that extremely small estimates of 2,000 employees or fewer are potentially subject to large percentage revisions that are caused by occurrences such as the relocation of one or two businesses, or a change in the activities of one or two businesses. These are non-economic classification changes that relate to the activity or location of businesses and will be present for sample-based estimates as well as the model-based estimates. Error measures for State and area estimates are available on the BLS website at http://www.bls.gov/sae/790stderr.htm.

Caution in aggregating State data. The National estimation procedures used by BLS are designed to produce accurate National data by detailed industry; correspondingly, the State estimation procedures are designed to produce accurate data for each individual State. State estimates are not forced to sum to National totals nor vice versa. Because each State series is subject to larger sampling and nonsampling errors than the National series, summing them cumulates individual State level errors and can cause distortion at an aggregate level.

The CES program employs a concurrent seasonal adjustment methodology to seasonally adjust its National estimates of employment, hours, and earnings. Under concurrent methodology, new seasonal factors are calculated each month using all relevant data up to and including the current month. In contrast, CES uses a 2-step seasonal adjustment procedure for adjusting State and area nonfarm payroll employment estimates. This process uses UI seasonal trends to adjust the benchmarked historical data, but incorporates sample seasonal trends to adjust the current sample-based estimates in the postbenchmark months. State and area seasonal factors are projected annually and do not incorporate current data. Users should take note of these differences in methodology.

Last Modified Date: February 3, 2012
