Technical Notes to Establishment Survey Data
The Sample
Design
The Current Employment Statistics (CES) sample is a stratified, simple random sample of worksites, clustered
by Unemployment Insurance (UI) account number. The UI account number is a major identifier on the BLS
longitudinal database of employer records, which serves as both the sampling
frame and the benchmark source for the CES employment estimates. The sample
strata, or subpopulations, are defined by State, industry, and employment size,
yielding a State-based design. The sampling rates for each stratum are
determined through a method known as optimum allocation, which distributes a
fixed number of sample units across a set of strata to minimize the overall
variance, or sampling error, of the primary estimate of interest. The Total
nonfarm employment level is the primary estimate of interest, and the CES sample
design gives top priority to measuring it as precisely as possible, that is, to minimizing the statistical error around the statewide Total nonfarm
employment estimates.
Frame and sample selection. The Longitudinal Data Base (LDB)
is the universe from which BLS draws the CES sample. The LDB contains data on
the roughly 9 million U.S. business establishments covered by UI, representing
nearly all elements of the U.S. economy. The Quarterly Census of Employment and
Wages (QCEW) program collects these data from employers on a quarterly basis
in cooperation with State Workforce Agencies (SWAs). The LDB contains employment
and wage information from employers, as well as name, address, and location
information. It also contains identification information such as UI account number and reporting unit or worksite number.
The LDB contains records of all employers covered under the UI
tax system. That system covers 97 percent of all employers in the 50
States, the District of Columbia, Puerto Rico, and the Virgin Islands. There are
a few sections of the economy that are not covered, including the self-employed,
unpaid family workers, railroads, religious organizations, small agricultural
employers, and elected officials. Data for employers generally are reported at
the worksite level. Employers who have multiple establishments within a State
usually report data for each individual establishment. The LDB tracks
establishments over time and links them from quarter to quarter.
Permanent Random Numbers (PRNs) have been assigned to all UI accounts on the
sampling frame. As new units appear on the frame, random numbers are assigned to
those units as well. As records are linked across time, the PRN is carried
forward in the linkage.
The CES sample is stratified by State, industry, and size. Stratification
groups population members together for the purpose of sample allocation and
selection. The strata, or groups, are composed of homogeneous units. With 13
industries and 8 size classes, there are 104 total allocation cells per State.
The sampling rate for each stratum is determined through a method known as
optimum allocation. Optimum allocation minimizes variance at a fixed cost or
minimizes cost for a fixed variance. Under the CES probability design, a fixed
number of sample units for each State is distributed across the allocation
strata in such a way as to minimize the overall variance, or sampling error, of
the total State employment level. The number of sample units in the CES
probability sample was fixed according to available program resources. The
optimum allocation formula places more sample in cells for which data cost less
to collect, cells that have more units, and cells that have a larger variance.
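As a rough illustration of the allocation rule described above, the following Python sketch applies the textbook optimum-allocation formula, in which a cell's share of the sample is proportional to (number of units) x (standard deviation) / sqrt(unit cost). The cell labels, unit counts, standard deviations, and costs are hypothetical, and the exact formula and inputs used by CES are not given in these notes.

```python
import math

def optimum_allocation(cells, total_sample):
    """Distribute total_sample across cells in proportion to N * S / sqrt(cost).

    cells maps a cell label to (N, S, cost): the number of units, the standard
    deviation of employment within the cell, and the per-unit collection cost.
    Returns unrounded allocations; no cell is capped at its population size.
    """
    scores = {label: n * s / math.sqrt(cost) for label, (n, s, cost) in cells.items()}
    total_score = sum(scores.values())
    return {label: total_sample * score / total_score for label, score in scores.items()}

# Hypothetical allocation cells: (units, std. dev. of employment, relative unit cost)
cells = {
    "small": (9000, 4.0, 1.0),
    "medium": (900, 40.0, 1.0),
    "large": (100, 300.0, 4.0),
}
allocation = optimum_allocation(cells, total_sample=500)
for label, n_h in allocation.items():
    print(f"{label}: {n_h:.1f} of {cells[label][0]} units")
```

In this sketch the large cell receives by far the highest sampling rate because of its large variance, even though its higher assumed cost pulls its allocation down somewhat.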
In the fall of each year, a new sample is drawn from that year’s first quarter LDB data. Annual sample selection helps keep the CES survey current with respect to employment from business births and business deaths. In addition, the updated universe files provide the most recent information on industry, size, and metropolitan area designation. About a full year separates the sample draw and the sample implementation to allow time for enrollment and collection of selected units. Enrollment of the selected units begins immediately following the sample draw and collection begins immediately following enrollment. Preliminary estimates for January through December 2012 will be made using the sample selected from the 2010 LDB data.
After all out-of-scope records are removed, the sampling frame is sorted into
allocation cells. Within each allocation cell, units are sorted by metropolitan statistical area (MSA) and by
the size of the MSA, defined as the number of UI accounts in that MSA. As the
sampling rate is uniform across the entire allocation cell, implicit
stratification by MSA ensures that a proportional number of units are sampled
from each MSA. Some MSAs may have too few UI accounts in the allocation cell;
these MSAs are collapsed and treated as a single MSA. Within each selection
cell, the units are sorted by PRN, and units are selected according to the
specified sample selection rate. The number of units selected randomly from each
selection cell is equal to the product of the sample selection rate and the
number of eligible units in the cell plus any carryover from the prior
selection cell. The result is rounded to the nearest whole number. Carryover is
defined as the fractional amount by which the unrounded result differs from the rounded whole
number.
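The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the production procedure: the unit identifiers and PRN values are hypothetical, and taking the units with the lowest PRNs after sorting is an assumption, since the notes state only that units are sorted by PRN and selected at the specified rate.

```python
import math

def select_units(selection_cells, rate):
    """Select units cell by cell, passing the rounding carryover forward.

    selection_cells is a list of cells; each cell is a list of (unit_id, prn).
    Returns the selected unit ids and the carryover left after the last cell.
    """
    selected = []
    carryover = 0.0
    for units in selection_cells:
        target = rate * len(units) + carryover
        count = math.floor(target + 0.5)          # round to nearest whole number
        count = max(0, min(count, len(units)))    # cannot select more than exist
        carryover = target - count                # amount gained or lost in rounding
        ordered = sorted(units, key=lambda unit: unit[1])
        # Assumption: the units with the lowest PRNs are the ones selected.
        selected.extend(unit_id for unit_id, _ in ordered[:count])
    return selected, carryover

# Hypothetical selection cells (for example, MSAs within one allocation cell)
cells = [
    [("A1", 0.91), ("A2", 0.12), ("A3", 0.55), ("A4", 0.37), ("A5", 0.78), ("A6", 0.04)],
    [("B1", 0.63), ("B2", 0.29), ("B3", 0.85), ("B4", 0.47), ("B5", 0.71)],
    [("C1", 0.33), ("C2", 0.96), ("C3", 0.08), ("C4", 0.59), ("C5", 0.21)],
]
selected, carryover = select_units(cells, rate=0.25)
print(selected, carryover)
```

With a rate of 0.25, the three cells call for 1.5, 1.25, and 1.25 units. The carryover keeps the total at 4 units, which is 0.25 times the 16 eligible units, instead of letting rounding errors accumulate.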
As a result of the cost and workload associated with enrolling new sample
units, all units remain in the sample a minimum of two years. To ensure all
units meet this minimum requirement, BLS has established a "swapping in"
procedure. Under this procedure, a unit that was newly selected during the
previous sample year but not reselected as part of the
current probability sample is placed back into the sample, and a unit within the same
selection cell is removed to make room for it.
Approximately 68 percent of the CES sample for the
private industries overlaps from the previous sample to the current sample.
Selection weights. Once the sample is drawn, sample selection
weights are calculated based on the number of UI accounts actually selected
within each allocation cell. The sample selection weight is approximately equal
to the inverse of the probability of selection, or the inverse of the sampling
rate. It is computed as:
Sample selection weight = Nh / nh
where:
Nh = the number of noncertainty UI accounts within
the allocation cell that are eligible for sample selection
nh = the number of noncertainty UI accounts selected
within the allocation cell
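The following Python sketch shows the weight calculation and why the weight is only approximately the inverse of the sampling rate: the number of accounts actually selected is a whole number. The cell counts and the nominal rate are hypothetical.

```python
def selection_weight(eligible, selected):
    """Return Nh / nh for the noncertainty UI accounts in one allocation cell."""
    if selected <= 0:
        raise ValueError("the cell has no selected units")
    return eligible / selected

# Hypothetical cell: 250 eligible accounts and a nominal sampling rate of 0.03.
# The rate calls for 7.5 accounts; suppose 8 were actually selected.
N_h, n_h, rate = 250, 8, 0.03
weight = selection_weight(N_h, n_h)
print(weight)      # 31.25
print(1 / rate)    # about 33.33, the inverse of the nominal rate
```

Because the weight is based on the realized count, the weighted number of selected accounts (8 x 31.25) reproduces the 250 eligible accounts exactly.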
Frame maintenance and sample updates. Due to the dynamic
economy, there is a constant cycle of business births and deaths. A semi-annual update is performed during the summer each year drawing from the previous year’s third quarter LDB data. This update selects
units from the population of births and other units not previously eligible for
selection and includes them as part of the sample. Updated location, contact,
and administrative information is provided for all establishments that were
selected in the annual sample selection.
Subsampling. The primary enrollment of new
establishments takes place in BLS Regional Office Data Collection Centers
(DCCs). After the sample has been sent to the DCCs, interviewers enroll the
selected establishments. While the UI account is the sample unit, interviewers
attempt to collect the data for all individual establishments within a UI
account.
For multiple-worksite UI accounts, it is sometimes necessary to subsample
worksites. This occurs when:
- the company cannot report for all worksites from a central
location;
- the company cannot provide an aggregate report for the entire UI
account;
- there are too many individual worksites to make it practical to contact
each of them.
Subsampling a smaller number of worksites reduces both interviewer workload
and respondent burden without significantly reducing the accuracy of
the estimates, although it does result in a small increase in variance.
In the event that a UI account is subsampled, weight adjustments are made to
reflect each worksite's probability of selection.
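The notes do not give the adjustment formula. As a sketch under a stated assumption, if worksites within a UI account were subsampled with equal probability, a worksite's overall probability of selection would be the product of the account's probability and the within-account subsampling rate, and its weight would be the inverse of that product. The function name and all values below are hypothetical.

```python
def worksite_weight(account_weight, worksites_in_account, worksites_subsampled):
    """Two-stage weight, assuming an equal-probability subsample of worksites."""
    if not 0 < worksites_subsampled <= worksites_in_account:
        raise ValueError("subsample size must be between 1 and the worksite count")
    within_account_probability = worksites_subsampled / worksites_in_account
    return account_weight / within_account_probability

# Hypothetical account with selection weight 4 and 30 of its 120 worksites subsampled
adjusted = worksite_weight(account_weight=4.0, worksites_in_account=120,
                           worksites_subsampled=30)
print(adjusted)  # 16.0
```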
Sample coverage
Table 2-Ca shows the latest
benchmark employment levels and the approximate proportion of total universe
employment coverage at the Total nonfarm and major industry levels.
The coverage for individual industries within the supersectors may vary from the
proportions shown.
CES sample by industry. The sample distribution by industry reflects the goal of minimizing the
sampling error on the Total nonfarm employment estimate, while also providing
for reliable employment estimates by industry. Sample coverage rates vary by
industry as a result of building a design to meet these goals (see Table 2-Ca). For example,
the Manufacturing and Leisure and hospitality industries are of similar size:
Manufacturing has about 11.6 million employees, while Leisure and hospitality has
12.9 million employees. However, their relative sample sizes are different.
Manufacturing has about 14,800 sample units with a total of 2.9 million
employees while Leisure and hospitality has many more sample units, about 52,900
sample units but covers only about 2.4 million employees. The Manufacturing
sample therefore covers about 25 percent of all employment in Manufacturing
while the Leisure and hospitality sample covers about 19 percent of all
employment in that industry. The differences are linked in part to the fact that
Manufacturing is characterized by a much larger average firm size than Leisure
and hospitality. These types of differences do not cause a bias in the CES
employment estimates because of the use of industry sampling strata and sampling
weights which ensure each firm is properly represented in the estimates.
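The coverage figures in this comparison can be checked with a few lines of Python. The inputs are the rounded values quoted above; the employees-per-sample-unit figure is a derived illustration based on sample units, not a published statistic on universe firm size.

```python
# Rounded figures quoted in the text above
industries = {
    "Manufacturing": {
        "universe_emp": 11.6e6, "sample_units": 14_800, "sample_emp": 2.9e6,
    },
    "Leisure and hospitality": {
        "universe_emp": 12.9e6, "sample_units": 52_900, "sample_emp": 2.4e6,
    },
}

summary = {}
for name, data in industries.items():
    coverage = data["sample_emp"] / data["universe_emp"]
    employees_per_unit = data["sample_emp"] / data["sample_units"]
    summary[name] = (coverage, employees_per_unit)
    print(f"{name}: coverage {coverage:.0%}, "
          f"about {employees_per_unit:.0f} employees per sample unit")
```

The Manufacturing sample averages roughly 196 employees per sample unit against roughly 45 for Leisure and hospitality, which is why fewer Manufacturing units cover a larger share of employment.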
Note on Government sampling - The CES Government sample is not part of the
program's probability-based design. The program is able to achieve a very
high level of universe employment coverage (68 percent) by obtaining full
payroll employment counts for many government agencies, thus a probability-based
sample design is not necessary for this industry. The high coverage rate
virtually assures a high degree of reliability for the Government employment
estimates. The large Government sample does not bias the Total nonfarm
employment estimates because it is used to estimate only the Government portion
of Total nonfarm employment. The probability sample is used to estimate
employment for all Private industries. The Private and Government estimates are
summed to derive Total nonfarm employment estimates.
CES sample by employment size class. The employment universe that the CES sample represents is highly
skewed, as shown by Table 2-Cb.
The largest UI accounts comprise only 0.2 percent of all UI accounts but contain
approximately 28 percent of Total nonfarm employment. Therefore, it is very
efficient to sample these UIs with certainty: by sampling only 0.2 percent of
the UIs, the survey can cover 28 percent of total universe employment.
Conversely, the smallest size class (0-9 employees) contains nearly 71 percent of
all UIs but only about 10 percent of Total nonfarm employment; therefore, it is
efficient to sample these UIs at a much lower rate. Sampling larger firms at a
higher rate than smaller firms is a standard technique commonly used in business
establishment surveys.
Table 2-Cc shows the
distribution of the active CES sample units. A much greater proportion of large
than small UIs are selected; however, that does not create a bias in either the
sample or the estimates made from the sample. Each sample unit selected is
assigned a weight based on its probability of selection, which ensures that all
firms of its size are properly represented in the estimates. UIs with a large number of employees are selected with certainty and assigned a weight of one, meaning
they represent only themselves in the estimates. Conversely, a UI in the smallest
firm stratum, where 1 in every 100 firms is selected, is assigned a weight of
100 because it represents itself and 99 other firms that were not sampled. The
use of sample weights in the estimation process prevents a large (or small) firm
bias in the estimates.
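The effect of the weights can be seen in a small Python sketch. The universe here is hypothetical (2 certainty accounts and 300 small accounts of 5 employees each), and the estimator is a simple weighted total used only for illustration; it is not the estimator CES uses in production.

```python
# (employment, selection weight) for each sampled UI account; values are hypothetical
sample = [
    (12_000, 1),   # certainty unit: represents only itself
    (8_000, 1),    # certainty unit: represents only itself
    (5, 100),      # 1-in-100 stratum: represents itself and 99 others
    (5, 100),
    (5, 100),
]

# Weighted total: each unit's employment counts once per firm it represents.
weighted_total = sum(employment * weight for employment, weight in sample)

# Naive alternative that ignores the weights: sample mean times the universe count.
universe_count = 2 + 300
naive_total = sum(employment for employment, _ in sample) / len(sample) * universe_count

true_total = 12_000 + 8_000 + 300 * 5
print(weighted_total, naive_total, true_total)
```

The weighted total matches the hypothetical universe total of 21,500, while the unweighted calculation is dominated by the two large accounts and overstates employment many times over.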
Reliability
The establishment survey, like other sample surveys, is subject to two types
of error: sampling error and nonsampling error. The magnitude of sampling error, or
variance, depends on the size of the sample and the percentage of
universe coverage achieved by the sample. The establishment survey sample covers
over one-third of total universe employment; this yields a very small variance
on the Total nonfarm estimates. Measurements of error associated with sample
estimates are provided in Table 2-D and in the all employees, production employees, and women employees standard error tables.
Benchmark revision as a measure of survey error. The sum of
sampling and nonsampling error can be considered total survey error. Unlike most
sample surveys which publish sampling error as their only measure of error, the
CES can derive an annual approximation of total error, on a lagged basis,
because of the availability of the independently derived universe data. While
the benchmark error is often used as a proxy measure of total error for the CES
survey estimate, it actually represents the difference between two employment
estimates derived from separate statistical processes (i.e., the CES sample
process and the UI administrative process) and thus reflects the net of the
errors present in each program. Historically, the benchmark revision has been
small for Total nonfarm employment. Over the past decade, percentage
benchmark error has averaged 0.3 percent, with an absolute range from 0.1 percent to 0.7 percent.
Revisions between preliminary and final data. First
preliminary estimates of employment, hours, and earnings, based on less than the
total sample, are published immediately following the reference month. Final
revised sample-based estimates are published two months later when nearly all the
reports in the sample have been received. Table 2-D presents the
root-mean-square error, the mean percent, and the mean absolute percent revision
over the past five years between the preliminary and final employment estimates.
Revisions of preliminary hours and earnings estimates are normally not
greater than 0.1 of an hour for weekly hours and 1 cent for hourly earnings, at
the Total private level, and may be slightly larger for the more detailed
industry groupings.
Variance estimation. The estimation of sample
variance for the CES survey is accomplished through use of the method of
Balanced Half Samples (BHS). This replication technique uses half samples of the
original sample and calculates estimates using those subsamples. The sample
variance is calculated by measuring the variability of the subsample estimates.
The weighted link estimator is used to calculate both estimates and variances.
The sample units in each cell — where a cell is based on State, industry, and
size classification — are divided into two random groups. The basic BHS method
is applied to both groups. The subdivision of the cells is done systematically,
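in the same order as the initial sample selection. Weights for units in the half
sample are multiplied by a factor of $1 + \gamma$, whereas weights for units not in the half sample are multiplied by
a factor of $1 - \gamma$. Estimates from these
subgroups are calculated using the estimation formula described above.
The formula used to calculate CES variances is as follows:
$$v(\hat{\theta}) = \frac{1}{\gamma^{2} k} \sum_{\alpha=1}^{k} \left( \hat{\theta}_{\alpha}^{+} - \hat{\theta} \right)^{2},$$
where
- $\hat{\theta}_{\alpha}^{+}$ is the half-sample estimator
- $\gamma$ is the factor used to adjust the weights, as described above
- k is the number of half samples
- $\hat{\theta}$ is the original full-sample estimate.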
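The replication idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than the CES implementation: it uses a simple weighted total instead of the weighted link estimator, builds the balanced half samples from a Hadamard matrix, names the weight adjustment factor `gamma` with a default of 0.5 chosen only for the example, and uses hypothetical group totals.

```python
def hadamard(order):
    """Build a Hadamard matrix by the Sylvester construction (order = power of 2)."""
    matrix = [[1]]
    while len(matrix) < order:
        matrix = ([row + row for row in matrix]
                  + [row + [-x for x in row] for row in matrix])
    return matrix

def bhs_variance(cell_groups, gamma=0.5):
    """Balanced half-sample variance of a weighted total.

    cell_groups holds one (group_a_total, group_b_total) pair per cell: the
    weighted totals of the two random groups. gamma is the weight adjustment
    factor; its default here is an assumption made for the example.
    """
    k = 1
    while k <= len(cell_groups):      # need more rows than cells for full balance
        k *= 2
    signs = hadamard(k)
    full_estimate = sum(a + b for a, b in cell_groups)
    squared_deviations = 0.0
    for row in signs:
        half_estimate = 0.0
        for cell, (a, b) in enumerate(cell_groups):
            s = row[cell + 1]         # skip column 0, which is all ones
            # Group in the half sample gets weight 1 + gamma, the other 1 - gamma.
            half_estimate += (1 + s * gamma) * a + (1 - s * gamma) * b
        squared_deviations += (half_estimate - full_estimate) ** 2
    return squared_deviations / (gamma ** 2 * k)

# Hypothetical weighted group totals for three cells
groups = [(10.0, 12.0), (20.0, 17.0), (5.0, 5.0)]
variance = bhs_variance(groups)
print(variance)
```

For a linear estimator such as this total, a fully balanced set of half samples gives the sum of squared differences between the two groups in each cell (here 4 + 9 + 0 = 13), regardless of the value chosen for `gamma`.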
Appropriate uses of sampling variances. Variance
statistics are useful for comparison purposes, but they do have some
limitations. Variances reflect the error component of the estimates that is due
to surveying only a subset of the population, rather than conducting a complete
count of the entire population. However, they do not reflect nonsampling error,
such as response errors and bias due to nonresponse. The variances of the
over-the-month change estimates are very useful in determining when changes are
significant at some level of confidence. Variance statistics for first and second closings
are available for all employees, production employees, and women employees. In addition, third closing variances are available upon request.
Sampling errors. The sampling errors shown for all
private industries and Total nonfarm have been calculated for estimates that
follow the benchmark employment revision by a period of 16 to 20 months. The
errors are presented as median values of the observed error estimates. These
errors have been estimated using the method of Balanced Half Samples (BHS)
with the probability sample data and sample weights assigned at the time of
sample selection.
Illustration of the use of relative standard error tables. All employees, production employees, and women employees standard error tables provide a reference for
relative standard errors of all major series developed from the CES.
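The standard
errors of differences between estimates in two non-overlapping industries are
calculated as
$$s_{\text{difference}} = \sqrt{s_{1}^{2} + s_{2}^{2}},$$
where $s_{1}$ and $s_{2}$ are the standard errors of the two estimates, since the
two estimates are independent.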
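For independent estimates, the standard error of a difference is the square root of the sum of the squared standard errors. A minimal Python sketch, using hypothetical standard errors for two industries:

```python
import math

def se_of_difference(se_1, se_2):
    """Standard error of the difference between two independent estimates."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

# Hypothetical standard errors (in employees) for two non-overlapping industries
se_diff = se_of_difference(30_000, 40_000)
print(se_diff)  # 50000.0
```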
The errors are presented as relative standard errors (standard error divided
by the estimate and expressed as a percent). Multiplying the relative standard
error by its estimated value gives the estimate of the standard error.
Suppose that the level of all employees for Financial activities in a given
month at first closing is estimated at 7,819,000. The approximate relative standard error of this
estimate (0.5 percent) is provided in Table 2-E. A 90-percent confidence
interval would then be the interval:
7,819,000 +/- (1.645*.005*7,819,000) = 7,819,000 +/- 64,311 = 7,754,689 to
7,883,311
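The same calculation can be written as a short Python function. The function name is illustrative; the inputs are the Financial activities figures from the example above.

```python
def interval_from_rse(estimate, rse_percent, z=1.645):
    """Confidence interval for a level, given its relative standard error in percent.

    z = 1.645 corresponds to a 90-percent confidence level.
    """
    standard_error = estimate * rse_percent / 100
    margin = z * standard_error
    return estimate - margin, estimate + margin, margin

low, high, margin = interval_from_rse(7_819_000, 0.5)
print(round(margin), round(low), round(high))  # 64311 7754689 7883311
```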
Illustration of the use of standard error tables. All employees, production employees, and women employees standard error tables provide a reference for
the standard errors of 1-, 3-, and 12-month changes in the employment, hours, and earnings series. The
errors are presented as standard errors of the changes. Suppose that the
over-the-month change in all employee average hourly earnings (AHE) from January to February in Coal mining at second closing is $0.11.
The standard error for a 1-month change for Coal mining from the table is $0.34.
The interval estimate of the over-the-month change in AHE that will include the
true over-the-month change with 90-percent confidence is calculated:
$0.11 +/- (1.645*$0.34) = $0.11 +/- $0.56 = [-$0.45, $0.67]
With 90-percent confidence, the true value of the over-the-month change lies in the interval -$0.45 to
$0.67. Because this interval includes $0.00 (no change), the change of $0.11
shown is not significant at the 90-percent confidence level. Alternatively, the
estimated change of $0.11 does not exceed $0.56 (1.645 * $0.34); therefore, one
could conclude from these data that the change is not significant at the
90-percent confidence level.
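Both forms of the significance check can be expressed in a few lines of Python. The function names are illustrative; the inputs are the Coal mining figures from the example above.

```python
def change_interval(change, se_change, z=1.645):
    """Interval around an over-the-month change (z = 1.645 for 90 percent)."""
    margin = z * se_change
    return change - margin, change + margin

def is_significant(change, se_change, z=1.645):
    """A change is significant when its magnitude exceeds z standard errors."""
    return abs(change) > z * se_change

low, high = change_interval(0.11, 0.34)
print(f"{low:.2f} to {high:.2f}")      # -0.45 to 0.67
print(is_significant(0.11, 0.34))      # False
```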
Statistics for States and areas
(Tables D-1, D-2, D-3, D-4, D-5, and D-6)
BLS independently develops National and State and area employment, hours, and earnings series. Both sets of estimates are based on the same establishment reports; however, BLS uses the full CES sample to produce monthly National employment estimates, while BLS uses only the State-specific portion of the sample to develop State employment estimates.
CES area statistics relate to metropolitan areas. CES uses the most recent Office of Management and Budget (OMB) Bulletin regarding statistical area definitions (OMB Bulletin No. 10-02 http://www.whitehouse.gov/sites/default/files/omb/assets/bulletins/b10-02.pdf) to define metropolitan statistical areas and metropolitan divisions. CES also produces area statistics for non-standard areas (areas which are not defined in the OMB Bulletin), noted at http://www.bls.gov/sae/saenonstd.htm. Changes in definitions are noted as they occur.
Estimates for States and areas are produced using two methods. The majority of State and area estimates are produced using direct sample-based estimation. However, published area and industry combinations (domains) that do not have a large enough sample to support estimation using only sample responses have been estimated using a small domain model.
Small domain model
The small domain model consists of a weighted sum of three different relative
over-the-month change estimates, $\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. These three relative
over-the-month estimates are then weighted based on the variance of each of the
three estimates. The larger the variance of each
estimate relative to the other variances, the smaller the weight. The resulting estimate of current
month employment is defined as:
$$\hat{X}_{ia,t} = \hat{X}_{ia,t-1} \left( W_{ia,t,1}\hat{R}_{ia,t,1} + W_{ia,t,2}\hat{R}_{ia,t,2} + W_{ia,t,3}\hat{R}_{ia,t,3} \right)$$
where $\hat{X}_{ia,t}$ = current month t employment estimate for domain
ia defined by the intersection of industry i and area a.
$\hat{R}_{ia,t,1}$ = current month relative over-the-month
change estimate based on available sample responses for domain ia.
$W_{ia,t,1}$ = current month weight assigned to $\hat{R}_{ia,t,1}$ based on the variances of
$\hat{R}_{ia,t,1}$, $\hat{R}_{ia,t,2}$, and $\hat{R}_{ia,t,3}$. The weights $W_{ia,t,2}$ and $W_{ia,t,3}$ are defined
similarly.
$\hat{R}_{ia,t,2}$ = current month
relative over-the-month change estimate based on time series forecasts using
historical universe employment counts for domain ia. These historical universe
employment counts are available from January 1990 to 12 months prior to the
current month t.
$\hat{R}_{ia,t,3}$ = current month
relative over-the-month change estimate based on a synthetic estimate of the
relative change that uses all sample responses in the State that includes area
a for industry i.
$\hat{X}_{ia,t-1}$ = previous month employment estimate for
domain ia from the small domain model.
It is possible that for a given industry i and area a, one or even two of the
inputs to the model are assigned weights of zero.
A model input is assigned a weight of zero because of concerns
regarding the stability of the inputs. For example, if the sample-based estimate or the synthetic estimate has five or fewer responses, then
it is assigned a weight of zero. If the time series forecast exhibits an unstable variance or has extremely poor
model fit, then it may also be assigned a weight of zero. In these cases, the small
domain model estimate may be based on only one or two of the three described
inputs.
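The weighting logic can be sketched in Python as follows. The notes state only that a larger variance means a smaller weight; weights proportional to the inverse of each variance are an assumption made here for illustration, as are treating each relative change as a ratio of current to previous month employment, the function name, and all input values.

```python
def small_domain_estimate(previous_employment, changes, variances,
                          usable=(True, True, True)):
    """Combine three relative over-the-month change estimates.

    changes are ratios of current to previous month employment (e.g. 1.004).
    Weights are assumed proportional to 1 / variance; an input flagged as not
    usable receives a weight of zero.
    """
    inverse = [1.0 / v if ok else 0.0 for v, ok in zip(variances, usable)]
    total = sum(inverse)
    if total == 0:
        raise ValueError("at least one input must be usable")
    weights = [value / total for value in inverse]
    combined_change = sum(w * r for w, r in zip(weights, changes))
    return previous_employment * combined_change, weights

# Hypothetical inputs: sample-based, time series, and synthetic estimates
changes = (1.010, 1.004, 1.000)
variances = (4e-6, 1e-6, 2e-6)
estimate, weights = small_domain_estimate(10_000, changes, variances)
print(round(estimate, 2), [round(w, 3) for w in weights])

# Same domain with the sample-based input dropped (for example, too few responses)
estimate_2, weights_2 = small_domain_estimate(10_000, changes, variances,
                                              usable=(False, True, True))
print(round(estimate_2, 2), [round(w, 3) for w in weights_2])
```

When an input is dropped, the remaining weights are rescaled so that they still sum to one, which matches the statement that the model estimate may rest on only one or two of the three inputs.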
Sampling errors are not applicable to the estimates made using the small
domain models. The measure available to judge the reliability of these modeled
estimates is their performance over past time periods compared with the universe
values for those time periods. These measures are useful; however, it is not
certain that the past performance of the modeled estimates accurately reflects
their current performance.
It should also be noted that extremely small estimates of 2,000 employees or
fewer are potentially subject to large percentage revisions that are caused by
occurrences such as the relocation of one or two businesses, or a change in the
activities of one or two businesses. These are non-economic classification
changes that relate to the activity or location of businesses and will be
present for sample-based estimates as well as the model-based estimates. Error
measures for State and area estimates are available on the BLS website at http://www.bls.gov/sae/790stderr.htm.
Caution in aggregating State data.
The National estimation
procedures used by BLS are designed to produce accurate National data by
detailed industry; correspondingly, the State estimation procedures are designed
to produce accurate data for each individual State. State estimates are not
forced to sum to National totals nor vice versa. Because each State series is
subject to larger sampling and nonsampling errors than the National series,
summing them cumulates individual State level errors and can cause distortion at
an aggregate level.
The CES program employs a concurrent seasonal adjustment methodology to seasonally adjust its National estimates of employment, hours, and earnings. Under concurrent methodology, new seasonal factors are calculated each month using all relevant data up to and including the current month period. In contrast, CES uses a 2-step seasonal adjustment procedure for adjusting State and area nonfarm payroll employment estimates. This process uses UI seasonal trends to adjust the benchmarked historical data, but incorporates sample seasonal trends to adjust the current sample-based estimates in the postbenchmark months. State and area seasonal factors are projected annually and do not incorporate current data. Users should take note of these differences in methodology.
Last Modified Date: February 3, 2012