U.S. Census Bureau

2002 Economic Census:
Business Expenses
Methodology



Appendix C. Business Expenses Survey Methodology



SAMPLE DESIGN OF THE 2002 BUSINESS EXPENSES SURVEY

The estimates for merchant wholesale, retail trade and service industries are derived from the 2002 Business Expenses Survey (BES). The BES sample is the combination of the samples used for the 2002 Annual Trade Survey, the 2002 Annual Retail Trade Survey, and the 2002 Service Annual Survey. These samples are probability samples of firms engaged in the various industries. A firm is a business organization consisting of one or more establishments under common ownership or control. An establishment is a single physical location where business is conducted or where services are performed.

The initial sample frames for the surveys were constructed from the Census Bureau's Standard Statistical Establishment List (SSEL) as of June 1999. The sample frames contained two types of sampling units: large multiple-establishment firms and Employer Identification Numbers (EINs). Both sampling units can represent one or more establishments owned or controlled by the same firm. Firms were stratified by kind-of-business and then by a measure-of-size related to their annual receipts, revenue, or sales.

The frames included only employers, and only employers were actually mailed questionnaires in the survey. In the retail and service industries, sales data for nonemployers were obtained from administrative records. Estimates of the expenses for nonemployers were derived from the administrative sales for the nonemployers and the sales and expense data for employers.

To reduce the variability of the estimates, the sampling units with the largest measures of size were selected "with certainty." This means they are self-representing (i.e., each has a selection probability of one and a sampling weight of one). Within each kind-of-business, a substratum boundary (or cutoff) that divides the certainty units from the noncertainty units was determined. If a unit was included in the certainty portion, the firm was the sampling unit. All firms not selected with certainty were subjected to sampling on an EIN basis.

Data from the 1997 Economic Census were analyzed to determine the certainty cutoffs, noncertainty stratum boundaries, and the sampling rates needed to achieve specified sampling variability objectives for each kind-of-business group. These sampling rates were applied to the sample frames to determine the total sample size for each group, which was then allocated to the size classes optimally based on the number of sampling units and the standard deviation of the units' measures of size. Within each noncertainty stratum, a simple random sample of EINs was selected. The sampling rates for the EINs varied between one in three and one in 1,000.
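
The selection scheme described above can be illustrated with a minimal sketch. It is not the Census Bureau's production procedure: the function name, the cutoff, the stratum boundaries, and the sampling rates are hypothetical, and the sample size in each stratum is simply the rounded product of the rate and the stratum count. Units at or above the certainty cutoff receive a weight of one; units drawn by simple random sampling within a noncertainty stratum receive a weight equal to the reciprocal of their realized selection probability.

```python
import random

def select_sample(units, certainty_cutoff, stratum_bounds, sampling_rates, seed=0):
    """Select a sample within one kind-of-business group (illustrative only).

    units            -- list of (unit_id, measure_of_size) pairs
    certainty_cutoff -- units at or above this size are selected with certainty
    stratum_bounds   -- ascending upper bounds of the noncertainty size strata;
                        the last bound should equal certainty_cutoff
    sampling_rates   -- one rate in (0, 1] per noncertainty stratum
    Returns a list of (unit_id, weight) pairs.
    """
    rng = random.Random(seed)
    sample = []
    strata = [[] for _ in stratum_bounds]

    for unit_id, size in units:
        if size >= certainty_cutoff:
            # Certainty units are self-representing: probability 1, weight 1.
            sample.append((unit_id, 1.0))
            continue
        # Assign the unit to the first stratum whose upper bound exceeds its size.
        for i, bound in enumerate(stratum_bounds):
            if size < bound:
                strata[i].append(unit_id)
                break

    for members, rate in zip(strata, sampling_rates):
        if not members:
            continue
        n = max(1, round(len(members) * rate))
        # Simple random sample without replacement within the stratum.
        for unit_id in rng.sample(members, n):
            # Weight is the reciprocal of the realized selection probability n / N.
            sample.append((unit_id, len(members) / n))
    return sample


# Hypothetical frame: 100 units whose measure of size equals their id.
frame = [(i, float(i)) for i in range(1, 101)]
chosen = select_sample(frame, certainty_cutoff=90.0,
                       stratum_bounds=[30.0, 90.0], sampling_rates=[0.1, 0.5])
```

With these weights, the sum of the weights over the selected units equals the number of units on the frame, which is a quick check that the weights are consistent with the design.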

A two-phase sample selection procedure was used for births (new EINs issued after the initial frames were created). EIN births are new EINs assigned by the Internal Revenue Service (IRS) on its latest available list of FICA (Federal Insurance Contributions Act) taxpayers. No receipts values are available for these EINs, so a large sample was drawn and canvassed to obtain a more reliable measure of size (sales or receipts) and, if needed, a more reliable kind-of-business code. Using this more reliable information, the selected births were subjected to probability-proportional-to-size sampling with overall probabilities equivalent to those used in drawing the initial sample from the 1999 SSEL.

ESTIMATION PROCEDURES

Data on Merchant Wholesale and Retail sales are reproduced on a 2002 NAICS basis from the 2002 Economic Census. Data compiled on Merchant Wholesale and Retail merchandise purchases are the same as presented in reports from the Annual Trade Survey and Annual Retail Trade Survey, respectively. These annual data had previously been adjusted to 2002 NAICS-based sales reported in the 2002 Economic Census. Data on service industries receipts and revenue presented in this report are reproduced on a 1997 NAICS basis from the 2002 Economic Census.

All estimates are computed as the sum of weighted data (reported and imputed) for all sampling units. The weight for a sampling unit is the reciprocal of the probability of selection (or sampling rate). Wholesale, Retail, and Accommodation and Food sales and expenses are adjusted to 2002 NAICS-based Census sales for the industry by multiplying them by the ratio of sales from the 2002 Census to sales from the BES. Service revenue and expenses are adjusted to 1997 NAICS-based Census revenue for the industry by multiplying them by the ratio of revenue from the 2002 Census to revenue from the BES. This adjustment puts revenue and expenses in line with the Census figures.
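
As a rough illustration of the two steps described above, the following sketch computes a weighted total and then applies the Census ratio adjustment. The function names and all figures are hypothetical and are not taken from the survey; the sketch assumes a single industry and ignores imputation.

```python
def weighted_total(values, weights):
    """Sum of weighted data, where each weight is 1 / selection probability."""
    return sum(v * w for v, w in zip(values, weights))


def census_adjusted(bes_estimate, bes_sales, census_sales):
    """Scale a BES estimate by the ratio of Census sales to BES sales."""
    return bes_estimate * (census_sales / bes_sales)


# Hypothetical data for three sampling units: one certainty unit (weight 1)
# and two noncertainty units sampled at a rate of one in five (weight 5).
weights = [1.0, 5.0, 5.0]
sales = [100.0, 20.0, 30.0]
expenses = [60.0, 10.0, 18.0]

bes_sales = weighted_total(sales, weights)        # 100 + 100 + 150 = 350
bes_expenses = weighted_total(expenses, weights)  # 60 + 50 + 90 = 200

census_sales = 385.0  # hypothetical Census total for the industry
print(round(census_adjusted(bes_expenses, bes_sales, census_sales), 1))  # 220.0
print(round(census_adjusted(bes_sales, bes_sales, census_sales), 1))     # 385.0
```

Because sales and expenses are multiplied by the same ratio, the adjusted sales match the Census total exactly while the expense-to-sales relationship measured by the BES is preserved.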

SAMPLING ERROR

The sample used in this survey is one of many possible samples that could have been selected using the same sampling methodology. Each of these possible samples would likely yield different results. The Relative Standard Error (RSE), also referred to as the coefficient of variation (CV), is a measure of the variability among the estimates from these possible samples. The RSE accounts for sampling variability but does not account for nonsampling error or systematic biases in the data. Bias is the difference, averaged over all possible samples of the same design and size, between the estimate and the true value being estimated. The sample estimate and an estimate of its relative standard error can be used to estimate the standard error (SE) and then construct interval estimates with a prescribed level of confidence that the interval includes the average results of all samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates were calculated from each sample, then:

  1. Intervals defined by one SE above and below the sample estimate will contain the true value about 68 percent of the time,
  2. Intervals defined by 1.6 SEs above and below the sample estimate will contain the true value about 90 percent of the time,
  3. Intervals defined by two SEs above and below the sample estimate will contain the true value about 95 percent of the time.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.

Example of a confidence interval. Suppose the estimated operating expenses are $4,572 million, and the estimated relative standard error is 1.8 percent. Then the estimated standard error is $4,572 million × 0.018 = $82.3 million. An approximate 90-percent confidence interval is $4,572 million ± (1.6 × $82.3 million), or $4,440.3 million to $4,703.7 million.
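
The same arithmetic can be written as a short function. This is an illustrative sketch only; the function name and its parameters are not part of the survey documentation, and the default multiplier of 1.6 is the 90-percent value used in this appendix.

```python
def confidence_interval(estimate, rse_percent, multiplier=1.6):
    """Return (lower, upper) bounds of an approximate confidence interval.

    estimate    -- the sample estimate (here, in millions of dollars)
    rse_percent -- the relative standard error, expressed in percent
    multiplier  -- number of standard errors on each side of the estimate
                   (1 for about 68 percent, 1.6 for about 90, 2 for about 95)
    """
    standard_error = estimate * rse_percent / 100.0
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(4572, 1.8)
print(round(low, 1), round(high, 1))  # 4440.3 4703.7
```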

Relative Standard Errors have not been calculated for the percent estimates shown in this report. An upper bound on the RSE of a percent can be estimated by taking the square root of [(RSE for the value in the numerator squared) plus (RSE for the value in the denominator squared)].
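
The upper bound described above can be computed as follows. The function name and the input values are hypothetical and serve only to illustrate the formula; both RSEs are expressed in percent.

```python
import math


def percent_rse_upper_bound(rse_numerator, rse_denominator):
    """Upper bound on the RSE of a percent, given the RSEs (in percent)
    of the values in its numerator and denominator."""
    return math.sqrt(rse_numerator ** 2 + rse_denominator ** 2)


# Hypothetical inputs: numerator RSE of 3.0 percent, denominator RSE of 4.0 percent.
print(percent_rse_upper_bound(3.0, 4.0))  # 5.0
```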

A description of sample design and estimation procedures can be found on the Internet for the:

NONSAMPLING ERRORS

Nonsampling errors can be attributed to many sources: inability to obtain information about all companies in the sample; inability or unwillingness on the part of respondents to provide correct information; response errors; definition difficulties; differences in the interpretation of questions; mistakes in recording or coding the data; and other errors of collection, response, coverage, and estimation for nonresponse. Explicit measures of the effects of these nonsampling errors are not available. Precautionary steps were taken in all phases of the collection, processing, and tabulation of the data to minimize the influence of nonsampling error.

A potential source of bias in the estimates is the imputation of data for nonrespondents and for data that failed the edit. Imputation is the process of replacing a missing value with administrative data or a predicted value obtained from an appropriate model for nonresponse. Nonresponse is defined as the inability to obtain all the intended measurements or responses about all selected units. Two types of nonresponse are often distinguished. Unit nonresponse describes the inability to obtain any of the substantive measurements about a sampled unit. In most cases of unit nonresponse, the questionnaire was never returned to the Census Bureau despite several attempts to elicit a response. Item nonresponse occurs either when a question is unanswered or when the response to the question fails computer or analyst edits.

DATA SUPPRESSION

Estimates are withheld, or suppressed, when publication standards are not met. Suppression occurs when one or more of the following criteria are met:

Suppressed data are denoted by the character ‘S’ in the data tables.