The U.S. Census Bureau conducts the Service Annual Survey (SAS) to provide national estimates of annual revenues and expenses of establishments classified in select service sectors. (See the Coverage section below for more information on the industries included in the 2007 Service Annual Survey.)
We develop the estimates in this report using data from a probability sample and administrative data. Survey questionnaires are mailed to a probability sample that is regularly updated and periodically re-selected from a universe of firms located in the United States and having paid employees. The sample includes firms of all sizes and covers both taxable firms and firms exempt from Federal income taxes. Firms without paid employees (nonemployers) are included in the estimates through administrative data provided by other Federal agencies and through imputation.
The estimates contained in this report are summarized by industry classification based on the 2002 North American Industry Classification System (NAICS). The NAICS groups establishments into industries based on the activities in which they are primarily engaged. This system, developed jointly by the statistical agencies of Canada, Mexico, and the United States, allows for comparisons of business activity across North America.
Estimates in this report are presented for select industries in the following NAICS sectors:
48-49 Transportation and Warehousing
51 Information
52 Finance and Insurance
53 Real Estate and Rental and Leasing
54 Professional, Scientific, and Technical Services
56 Administrative and Support and Waste Management and Remediation Services
62 Health Care and Social Assistance
71 Arts, Entertainment, and Recreation
81 Other Services (except Public Administration)
Detailed information about NAICS can be found on the Census Bureau website at:
http://www.census.gov/epcd/www/naics.html
Title 13 of the United States Code authorizes the Census Bureau to conduct censuses and surveys. Section 9 of the same Title requires that any information collected from the public under the authority of Title 13 be maintained as confidential. Section 214 of Title 13 and Sections 3559 and 3571 of Title 18 of the United States Code provide for the imposition of penalties of up to five years in prison and up to $250,000 in fines for wrongful disclosure of confidential census information. In accordance with Title 13, no estimates are published that would disclose the operations of an individual firm.
The Census Bureau’s internal Disclosure Review Board sets the confidentiality rules for all data releases. A checklist approach is used to ensure that all potential risks to the confidentiality of the data are considered and addressed.
Some unpublished estimates can be derived directly from this report by subtracting published estimates from their respective totals. However, the figures obtained by such subtraction are subject to poor response rates, high sampling variability, or other factors that result in their failure to meet Census Bureau standards for publication.
Individuals who use Service Annual Survey estimates to create new estimates should cite the Census Bureau as the source of only the original estimates.
The sampling frame used for the Service Annual Survey (SAS) contains two types of sampling units: Employer Identification Numbers (EINs) and large, multiple-establishment firms. Both types of sampling units represent clusters of one or more establishments owned or controlled by the same firm. The information used to create these sampling units was extracted from data collected in the 2002 Economic Census and from establishment records on the Census Bureau's Business Register as updated through December 2004. The next few paragraphs describe the Business Register; the distinction between firms, EINs, and establishments; and the construction of the sampling units. Though important, they are not essential to understanding the basic sample design, and readers may skip ahead to the Stratification, Sampling Rates, and Allocation section.
The Business Register is a multi-relational database that contains a record for each known establishment that is located in the United States or one of its territories and has paid employees. An establishment is a single physical location where business transactions take place and for which payroll and employment records are kept. Groups of one or more establishments under common ownership or control are firms. A single-unit firm owns or operates only one establishment. A multiunit firm owns or operates two or more establishments. The treatment of establishments on the Business Register differs according to whether the establishment is part of a single-unit or multiunit firm. In particular, the structure of an establishment’s primary identifier on the Business Register differs depending on whether it is owned by a single-unit firm or by a multiunit firm.
A single-unit firm’s primary identifier is its EIN. The Internal Revenue Service (IRS) issues the EIN, and the firm uses it as an identifier to report social security payments for its employees under the Federal Insurance Contributions Act (FICA). The same act requires all employer firms to use EINs. Each employer firm is associated with at least one EIN and only one firm can use a given EIN. Because a single-unit firm has only one establishment, there is a one-to-one relationship between the firm and the EIN. Thus the firm, the EIN, and the establishment all reference the same physical location and all three terms can be used interchangeably and unambiguously when referring to a single-unit firm.
For multiunit firms, however, a different structure connects the firm with its establishments via the EIN. Essentially, a multiunit firm is associated with a cluster of one or more EINs, and each EIN is associated with one or more establishments. A multiunit firm consists of at least two establishments. Each firm is associated with at least one EIN, and only one firm can use a given EIN. However, one multiunit firm may have several EINs. Similarly, there is a one-to-many relationship between EINs and establishments: each EIN can be associated with many establishments, but each establishment is associated with only one EIN. Because of these one-to-many relationships, we must distinguish between the firm, its EINs, and its establishments. The multiunit firm that owns or controls a particular establishment is identified on the Business Register by way of the establishment’s primary identifier.
The primary identifier of an establishment owned by a multiunit firm consists of a unique combination of an alpha number and a plant number. The alpha number identifies the multiunit firm, and the plant number identifies a particular establishment within that firm. All establishments owned or controlled by the same multiunit firm have the same alpha number. Different multiunit firms have different alpha numbers, and different establishments within the same multiunit firm have different plant numbers. The Census Bureau assigns both the alpha number to the multiunit firm and plant numbers to the corresponding establishments based on the results of the quinquennial economic census and the annual Company Organization Survey.
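The identifier rules above can be summarized in a small data model. The following is a minimal Python sketch, not the Business Register's actual schema: the class name `Establishment`, its fields, the "SU"/"MU" tags, and the identifier values are all hypothetical. It shows that a single-unit establishment is fully identified by its EIN, while a multiunit establishment is identified by its alpha and plant numbers, with the alpha number alone identifying the firm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Establishment:
    """One physical location with its own payroll and employment records (hypothetical model)."""
    ein: str                     # each establishment is associated with exactly one EIN
    alpha: Optional[str] = None  # multiunit firms only: identifies the firm
    plant: Optional[str] = None  # multiunit firms only: identifies the establishment

    def __post_init__(self) -> None:
        # A multiunit identifier needs both parts; a single-unit identifier needs neither.
        if (self.alpha is None) != (self.plant is None):
            raise ValueError("alpha and plant numbers must be given together")

    @property
    def is_multiunit(self) -> bool:
        return self.alpha is not None

    @property
    def primary_id(self) -> Tuple[str, ...]:
        # Single-unit: the EIN alone identifies firm, EIN, and establishment.
        # Multiunit: the (alpha, plant) combination is unique to the establishment.
        if self.is_multiunit:
            return ("MU", self.alpha, self.plant)
        return ("SU", self.ein)

    @property
    def firm_id(self) -> Tuple[str, str]:
        # The alpha number identifies a multiunit firm; a single-unit firm is its EIN.
        return ("MU", self.alpha) if self.is_multiunit else ("SU", self.ein)


# Hypothetical identifiers, for illustration only.
single = Establishment(ein="11-1111111")
shop_a = Establishment(ein="22-2222222", alpha="A001", plant="0001")
shop_b = Establishment(ein="22-2222222", alpha="A001", plant="0002")
shop_c = Establishment(ein="33-3333333", alpha="A001", plant="0003")
```

Here one multiunit firm (alpha `A001`) has two EINs, and one of those EINs covers two establishments, matching the one-to-many relationships described above.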
To create the sampling frame, we extract the records for all establishments located in the United States and classified in select service sectors as defined by the 2002 NAICS. For these establishments, we extract revenue, payroll, employment, name and address information, as well as primary identifiers and, for establishments owned by multiunit firms, associated EINs.
To create the sampling units for multiunit firms, we aggregate the economic data of the establishments owned by these firms to an EIN level by tabulating the establishment data for all service establishments associated with the same EIN. Similarly we aggregate the data to a multiunit firm level by tabulating the establishment data for all service establishments associated with the same alpha number. No aggregation is necessary to put single-unit establishment information on an EIN basis or a firm basis. Thus, the sampling units created for single-unit firms simultaneously represent establishment, EIN, and firm information. In summary, the sampling frame is a complex amalgam of establishments, EINs, and firms.
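The two aggregations described above (to the EIN level and to the multiunit-firm level) are simple group-by sums. The sketch below assumes hypothetical record fields (`ein`, `alpha`, `revenue`, `payroll`, `employment`) and made-up values; a single-unit record (with `alpha` set to `None`) is keyed by its own EIN, so it needs no aggregation.

```python
from collections import defaultdict

# Hypothetical service-establishment records extracted for the frame.
establishments = [
    {"ein": "E1", "alpha": None,   "revenue": 500, "payroll": 120, "employment": 6},
    {"ein": "E2", "alpha": "A001", "revenue": 900, "payroll": 300, "employment": 14},
    {"ein": "E2", "alpha": "A001", "revenue": 400, "payroll": 110, "employment": 5},
    {"ein": "E3", "alpha": "A001", "revenue": 250, "payroll": 90,  "employment": 4},
]

ITEMS = ("revenue", "payroll", "employment")


def aggregate(records, key):
    """Sum the economic items of all records that share the same key value."""
    totals = defaultdict(lambda: dict.fromkeys(ITEMS, 0))
    for rec in records:
        k = key(rec)
        for item in ITEMS:
            totals[k][item] += rec[item]
    return dict(totals)


# EIN level: tabulate all service establishments associated with the same EIN.
by_ein = aggregate(establishments, key=lambda r: r["ein"])

# Firm level: multiunit establishments roll up by alpha number; a single-unit
# establishment is already its own firm, so its EIN serves as the firm key.
by_firm = aggregate(
    establishments,
    key=lambda r: ("MU", r["alpha"]) if r["alpha"] else ("SU", r["ein"]),
)
```

Note that the single-unit record `E1` produces identical entries in `by_ein` and `by_firm`, which is the sense in which a single-unit sampling unit simultaneously represents establishment, EIN, and firm information.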
The first step in sample selection identified the firms to be selected with certainty: those whose estimated annual revenue exceeded the corresponding certainty cutoff.
All firms not selected with certainty were subjected to sampling on an EIN basis. If a firm had more than one EIN, we treated each of its EINs as a separate sampling unit. To be eligible for the initial sampling, an EIN had to have nonzero payroll in 2003. The EINs were stratified according to their major industry and their estimated revenue (on a 2002 basis). Within each noncertainty stratum, a simple random sample of EINs was selected without replacement.
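The two selection steps can be sketched in Python as follows. This is an illustrative sketch under simplifying assumptions: a single certainty cutoff is used (the survey's cutoffs correspond to industries), and the function name `select_sample`, the field names, the stratum labels, and the allocation are hypothetical. Certainty firms contribute all their EINs with weight 1. Every other EIN with nonzero payroll is placed in an industry-by-size stratum, and a simple random sample is drawn without replacement in each stratum, giving each selected EIN a weight of N/n.

```python
import random
from collections import defaultdict


def select_sample(firm_revenue, eins, cutoff, allocation, seed=0):
    """
    firm_revenue: dict firm_id -> estimated annual revenue
    eins: list of dicts with keys ein, firm_id, industry, size_class, payroll
    cutoff: certainty cutoff (a single hypothetical value here)
    allocation: dict (industry, size_class) -> number of EINs to draw
    Returns dict ein -> sampling weight (reciprocal of selection probability).
    """
    rng = random.Random(seed)
    certainty = {f for f, revenue in firm_revenue.items() if revenue > cutoff}

    weights = {}
    strata = defaultdict(list)
    for e in eins:
        if e["firm_id"] in certainty:
            weights[e["ein"]] = 1.0                # every EIN of a certainty firm
        elif e["payroll"] > 0:                     # eligibility: nonzero payroll
            strata[(e["industry"], e["size_class"])].append(e["ein"])

    for stratum, members in strata.items():
        N = len(members)
        n = min(allocation.get(stratum, 0), N)
        # Simple random sample of n EINs without replacement; each has probability n/N.
        for ein in rng.sample(members, n):
            weights[ein] = N / n
    return weights


firm_revenue = {"F1": 5000, "F2": 60, "F3": 40}
eins = [
    {"ein": "E1", "firm_id": "F1", "industry": "5411", "size_class": "large", "payroll": 900},
    {"ein": "E2", "firm_id": "F1", "industry": "5411", "size_class": "large", "payroll": 400},
    {"ein": "E3", "firm_id": "F2", "industry": "5411", "size_class": "small", "payroll": 30},
    {"ein": "E4", "firm_id": "F3", "industry": "5411", "size_class": "small", "payroll": 20},
    {"ein": "E5", "firm_id": "F3", "industry": "5411", "size_class": "small", "payroll": 0},
]
weights = select_sample(firm_revenue, eins, cutoff=1000,
                        allocation={("5411", "small"): 1})
```

In this toy run, firm `F1` is a certainty firm, so both of its EINs enter with weight 1. `E5` is ineligible because its payroll is zero, and one of `E3`/`E4` is drawn with weight 2. A noncertainty firm's EINs (here `F3`'s) are treated as separate sampling units, as the text describes.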
Periodically, we update the sample to represent new EINs appearing on the Business Register. These new EINs, called births, are EINs that the IRS has recently assigned, that appear on the latest available IRS mailing list for FICA taxpayers, and that the Social Security Administration (SSA) has assigned an industry classification where possible.
The EIN births are sampled on a quarterly basis using a two-phase selection procedure. To be eligible for selection, a birth must either have no industry classification or be classified in an industry within the scope of SAS, the Annual Wholesale Trade Survey (AWTS), or the Annual Retail Trade Survey (ARTS), and it must meet certain criteria regarding its number of paid employees or quarterly payroll. In the first phase, births are stratified by broad industry groups and a measure of size based on quarterly payroll. A relatively large sample is selected using equal probability systematic sampling. The selected births are canvassed to obtain a more reliable measure of size, consisting of sales in 2 recent months, company affiliation information, and a new or more detailed industry classification code. Births that haven’t returned their questionnaire after 30 days are contacted by telephone.
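The first-phase draw, which is an equal-probability systematic sample within each stratum, can be illustrated as below. This is a sketch under stated assumptions. The stratum keys (`industry_group`, `payroll_class`), the function name `phase_one`, and the integer sampling intervals are hypothetical, and the eligibility screens on employment or payroll are assumed to have been applied already. With an integer interval k and a random start, every birth in the stratum has selection probability 1/k.

```python
import random
from collections import defaultdict


def phase_one(births, intervals, seed=0):
    """
    births: list of dicts with keys ein, industry_group, payroll_class
    intervals: dict (industry_group, payroll_class) -> integer sampling interval k
    Returns a list of (ein, phase-one probability) pairs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for b in births:
        strata[(b["industry_group"], b["payroll_class"])].append(b["ein"])

    selected = []
    for stratum, eins in strata.items():
        k = intervals[stratum]
        start = rng.randrange(k)          # random start in 0..k-1
        for ein in eins[start::k]:        # every k-th birth after the start
            selected.append((ein, 1 / k))
    return selected


births = (
    [{"ein": f"B{i}", "industry_group": "54", "payroll_class": "low"} for i in range(10)]
    + [{"ein": f"C{i}", "industry_group": "56", "payroll_class": "high"} for i in range(3)]
)
selected = phase_one(births, {("54", "low"): 5, ("56", "high"): 1})
```

An interval of 1 selects every birth in the stratum. Larger intervals give the "relatively large" but still partial first-phase sample that is then canvassed.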
Using this more reliable information, the selected births from the first phase are subjected to probability-proportional-to-size sampling, with overall probabilities equivalent to those used in drawing the initial SAS, AWTS, and ARTS samples from the December 2004 Business Register. Because a new employer firm takes time to acquire an EIN from the IRS, and because the two-phase birth-selection procedure itself takes time, births are added to the sample approximately 9 months after they begin operation.
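The key arithmetic in the second phase is that a birth's overall probability is the product of its two phase probabilities. To reach a target overall probability π after a first-phase probability p1, the conditional second-phase probability must be p2 = π / p1, capped at 1. The Python sketch below is illustrative only. The size rule `pps_rule` (one unit per 1,000 of sales) and the function name `phase_two` are hypothetical. The independent (Poisson-style) draw is our simplification, chosen because it keeps the example short, and the Bureau's actual PPS selection scheme is not specified here.

```python
import random


def phase_two(phase_one_units, target_probability, seed=0):
    """
    phase_one_units: list of (ein, p1, size) tuples, where p1 is the phase-one
        selection probability and size is the canvassed measure of size
    target_probability: function size -> desired overall selection probability
    Returns dict ein -> overall probability (p1 * p2) for the births kept.
    """
    rng = random.Random(seed)
    kept = {}
    for ein, p1, size in phase_one_units:
        overall = min(1.0, target_probability(size))
        # Conditional phase-two probability so that p1 * p2 hits the target;
        # it is capped at 1 when phase one already undersampled the unit.
        p2 = min(1.0, overall / p1)
        if rng.random() < p2:          # independent (Poisson-style) draw
            kept[ein] = p1 * p2
    return kept


def pps_rule(size, interval=1000.0):
    """Hypothetical PPS rule: probability proportional to size."""
    return size / interval


kept = phase_two([("B1", 0.2, 1000.0), ("B2", 0.2, 0.0), ("B3", 0.5, 200.0)], pps_rule)
```

The reciprocal of the stored probability `p1 * p2` would serve as the birth's sampling weight. Unit `B1` shows the capped case: its target is 1, but it cannot exceed its first-phase probability of 0.2.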
The births that are selected in the quarterly birth-selection procedure in November of the survey year are included in the initial mailing of the SAS questionnaires in January of the following year. To better represent all EIN births in the reference year, and specifically to account for the time it takes to identify and select new EINs, we add births to the SAS sample that are selected in February, May, and August of the year following the reference year. We mail survey forms to these births in June and August to supplement the initial survey mailing.
To be eligible for the sample canvass and tabulation, an EIN selected in the noncertainty sampling operations must meet both of the following requirements:
If a firm was selected with certainty and had more than one establishment at the time of sampling, any new establishments that the firm acquires, even if under new or different EINs, are included in the sample with certainty. However, if a single-unit firm was selected with certainty, only future establishments associated with that firm’s originally selected EIN are included in the sample with certainty; any new EINs that might later be associated with that firm are subjected to sampling through the quarterly birth-selection procedure.
EINs selected into the sample with certainty are not dropped from canvass and tabulation if they are no longer on the IRS mailing list. Rather, the firm that used the EIN is contacted, and if a successor EIN is found, it is added to the survey. For both inactive and reactivated EINs, data are tabulated for only the portion of the reference year that these EINs reported payroll to the IRS.
The current sample was introduced with the 2005 SAS to compute estimates based on the 2002 North American Industry Classification System (NAICS). This sample replaced one that was designed to produce estimates based on the 1997 NAICS. For more information on the NAICS industries covered by the 2006 SAS, see the Coverage section.
Totals estimated from this sample survey are computed as the sum of weighted data (reported and imputed) for all selected sampling units that meet the tabulation criteria given in the Sample Maintenance section. The weight for a given sampling unit is the reciprocal of its probability of selection into the sample. The sample-based estimated totals are then adjusted to the 2002 Economic Census using the procedure described below.
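The sample-based total (before adjustment to the Economic Census) is a weighted sum in which each unit's weight is the reciprocal of its selection probability. A minimal Python sketch follows. The tuple layout and the function name `estimate_total` are hypothetical, and the `eligible` flag stands in for the tabulation criteria.

```python
def estimate_total(units):
    """
    units: iterable of (value, probability, eligible) tuples, where value is
    reported or imputed data, probability is the unit's probability of selection,
    and eligible says whether the unit meets the tabulation criteria.
    """
    # Weight = 1 / probability of selection; sum weighted data over eligible units.
    return sum(value / probability
               for value, probability, eligible in units if eligible)


sample = [
    (100.0, 1.0, True),    # certainty unit: weight 1
    (30.0, 0.125, True),   # weight 8
    (20.0, 0.25, True),    # weight 4
    (50.0, 0.5, False),    # fails tabulation criteria: excluded
]
total = estimate_total(sample)   # 100 + 240 + 80
```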
For industries affected by the change from 1997 to 2002 NAICS, published census-adjusted revenue estimates for 1998 through 2004 from the prior sample are restated on a 2002 NAICS basis, using revenue distributions from the 2002 Economic Census that link the two sets of classification codes. Of particular note, the estimates for Sector 51 (Information) are revised due to the creation of new industries for Internet publishing and broadcasting and Web search portals. For industries not affected by the change from 1997 to 2002 NAICS, there is no need to restate the published census-adjusted revenue estimates from the prior sample.
Revenue estimates for 2005 and subsequent years from the current sample are adjusted to the 2002 Economic Census by linking these estimates to the published census-adjusted estimates from the prior sample, after historical corrections are made to data from the current sample for 2004 and 2005. The linking is performed by multiplying the sample-based revenue estimate for a given detailed industry, which is generally defined by a 6-digit NAICS code, by a ratio. The numerator and denominator of the ratio are as follows:
Total estimates at 2-, 3-, 4-, and 5-digit NAICS levels are computed by summing the adjusted totals for the appropriate detailed industries comprising the aggregate. Year-to-year change estimates are computed using the appropriate adjusted totals for the industry and time period.
Note that estimates for the following Truck Transportation (NAICS 484) data items are produced directly from the sample, without adjustment:
The particular sample used in this survey is one of a large number of samples of the same size that could have been selected using the same design. If all possible samples had been surveyed under the same conditions, an estimate of a population parameter of interest could have been obtained from each sample. For the parameter of interest, estimates derived from the different samples would, in general, differ from each other. Common measures of the variability among these estimates are the sampling variance, the standard error, and the coefficient of variation (CV). The sampling variance is defined as the squared difference, averaged over all possible samples of the same size and design, between the estimator and its average value. The standard error is the square root of the sampling variance. The CV expresses the standard error as a percentage of the estimate to which it refers. For example, an estimate of 200 units that has an estimated standard error of 10 units has an estimated CV of 5 percent. The sampling variance, standard error, and CV of an estimate can be estimated from the selected sample because the sample was selected using probability sampling. Note that measures of sampling variability, such as the standard error and CV, are estimated from the sample and are also subject to sampling variability. (Technically, we should refer to the estimated standard error or the estimated CV of an estimator. However, for the sake of brevity we have omitted this detail.) It is important to note that the standard error and CV only measure sampling variability. They do not measure any systematic biases in the estimates.
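The "all possible samples" definition can be made concrete on a toy population small enough to enumerate. The Python sketch below uses a made-up four-unit population and a simple random sample of size 2 without replacement. It computes the estimated total from every possible sample, then the sampling variance (the average squared deviation from the average estimate), the standard error, and the CV. It also reproduces the 200-unit example from the text. The population values and the helper name `cv_percent` are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of 4 units; we estimate its total from samples of size 2.
population = [10, 20, 30, 60]
N, n = len(population), 2

# Estimated total from every possible sample, each unit weighted by N / n.
estimates = [sum(s) * N / n for s in combinations(population, n)]

average = mean(estimates)                               # average over all samples
variance = mean((e - average) ** 2 for e in estimates)  # sampling variance
std_error = variance ** 0.5                             # standard error
cv = 100 * std_error / average                          # CV, in percent


def cv_percent(estimate, standard_error):
    """CV: the standard error as a percentage of the estimate it refers to."""
    return 100 * standard_error / estimate
```

Here the average of the six possible estimates equals the true population total of 120, so this estimator is unbiased, yet individual samples range from 60 to 180. That spread is what the standard error summarizes.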
We estimate variances for all types of published statistics (totals, ratios, and percent changes) using the method of random groups. To implement the random group method of variance estimation, we assign a random group number to each sampling unit at the time of sample selection. Then, for each tabulation level at which estimates are produced, we compute variance estimates using the assigned random group numbers. We use 16 random groups (G=16) to estimate variances for the Service Annual Survey.
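A common textbook form of the random group estimator treats each group as a 1/G subsample. Its weights are multiplied by G to give a separate estimate of the full total, and the variance is the spread of those G estimates: v = Σ_g (θ_g − θ̄)² / (G(G − 1)). The Python sketch below implements that form. It is an illustration, and the exact variant used for the SAS may differ. It assumes random group numbers are already assigned and that the function names are hypothetical. For a ratio or percent change, one would compute the statistic from each group's totals and pass those G values to `random_group_variance`.

```python
G = 16  # number of random groups used for the SAS


def random_group_estimates(units, groups=G):
    """
    units: iterable of (random_group, weight, value), random_group in 0..groups-1.
    Each random group is treated as a 1/groups subsample, so its weights are
    multiplied by `groups` to give a separate estimate of the full total.
    """
    estimates = [0.0] * groups
    for g, weight, value in units:
        estimates[g] += groups * weight * value
    return estimates


def random_group_variance(estimates):
    """v = sum_g (theta_g - theta_bar)^2 / (G * (G - 1))"""
    k = len(estimates)
    theta_bar = sum(estimates) / k
    return sum((t - theta_bar) ** 2 for t in estimates) / (k * (k - 1))


# Hypothetical sample: 64 units spread evenly over the 16 groups, all with weight 2.
units = [(i % G, 2.0, float(i)) for i in range(64)]
group_totals = random_group_estimates(units)
variance = random_group_variance(group_totals)
std_error = variance ** 0.5
```

For totals, the average of the G group estimates equals the full-sample weighted total (4,032 in this toy example), because each unit appears in exactly one group.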
The Census Bureau recommends that individuals using published estimates incorporate this information into their analyses, as sampling error could affect the conclusions drawn from these estimates.
The estimate from a particular sample and its associated standard error can be used to construct a confidence interval. A confidence interval is a range around a given estimate that has a specified probability of containing the average of the estimates of the parameter derived from all possible samples of the same size and design. Associated with each interval is a percentage of confidence, which is interpreted as follows. Suppose that, for each possible sample, an estimate of a population parameter and its approximate standard error were obtained, and an interval were constructed using a t-statistic with 15 (= G - 1) degrees of freedom. Then:
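Constructing such an interval amounts to taking the estimate plus or minus a t critical value times the standard error. The Python sketch below hard-codes the standard two-sided t-table values for 15 degrees of freedom (1.753 for 90 percent, 2.131 for 95 percent), so it needs no statistics library. The confidence levels shown are examples, not necessarily the ones the Bureau publishes, and the function name is hypothetical.

```python
# Two-sided critical values of Student's t with 15 degrees of freedom (G - 1),
# taken from standard t tables.
T_15 = {0.90: 1.753, 0.95: 2.131}


def confidence_interval(estimate, std_error, level=0.90):
    """Interval of estimate +/- t * standard error, with t on 15 degrees of freedom."""
    t = T_15[level]
    return estimate - t * std_error, estimate + t * std_error


low, high = confidence_interval(200.0, 10.0)   # about (182.47, 217.53)
```

As the text notes, such an interval targets the average over all possible samples, so it does not account for any systematic bias in the estimate.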
A potential source of bias in the estimates is nonresponse. Nonresponse is defined as the inability to obtain all the intended measurements or responses about all selected units. Two types of nonresponse are often distinguished. Unit nonresponse describes the inability to obtain any of the substantive measurements about a sampled unit. In most cases of unit nonresponse, the questionnaire was never returned to the Census Bureau despite several attempts to elicit a response. Item nonresponse occurs either when a question is unanswered or when the response to the question fails computer or analyst edits.
For both unit and item nonresponse, a missing value is replaced by a predicted value obtained from an appropriate model for nonresponse. This procedure is called imputation and uses survey data and administrative data as input. Imputation rates for total revenue for employer firms at the published sector and sub-sector levels are as follows:
48-49 Transportation and Warehousing: 11.4 percent
51 Information: 8.5 percent
523 Securities, Commodity Contracts, and Other Financial Investments and Related Services: 6.4 percent
532 Rental and Leasing Services: 11.2 percent
54 Professional, Scientific, and Technical Services: 13.4 percent
56 Administrative and Support and Waste Management and Remediation Services: 14.2 percent
62 Health Care and Social Assistance: 12.3 percent
71 Arts, Entertainment, and Recreation: 13.6 percent
81 Other Services (except Public Administration): 10.1 percent
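The text does not specify the nonresponse model, so the following is only a hypothetical illustration of the general idea and not the SAS imputation procedure. It uses ratio imputation within an imputation cell: a missing revenue is predicted as the unit's administrative payroll times the weighted revenue-to-payroll ratio of reporting units in the same cell. The sketch also assumes that an imputation rate is the share of weighted total revenue contributed by imputed values. That definition is our assumption, as are the field names and the function name `impute_revenue`.

```python
def impute_revenue(units):
    """
    Hypothetical ratio imputation within one imputation cell.
    units: list of dicts with keys weight, payroll, revenue (None if missing)
    Returns (completed revenues, imputation rate in percent).
    """
    reporters = [u for u in units if u["revenue"] is not None]
    # Weighted revenue-to-payroll ratio among reporting units in the cell.
    ratio = (sum(u["weight"] * u["revenue"] for u in reporters)
             / sum(u["weight"] * u["payroll"] for u in reporters))

    completed, imputed_total, grand_total = [], 0.0, 0.0
    for u in units:
        if u["revenue"] is None:
            value = ratio * u["payroll"]          # predicted value for a missing item
            imputed_total += u["weight"] * value
        else:
            value = u["revenue"]
        grand_total += u["weight"] * value
        completed.append(value)
    # Assumed definition: imputed share of weighted total revenue.
    return completed, 100.0 * imputed_total / grand_total


cell = [
    {"weight": 1, "payroll": 10, "revenue": 40},
    {"weight": 1, "payroll": 20, "revenue": 80},
    {"weight": 2, "payroll": 5, "revenue": None},
]
completed, rate = impute_revenue(cell)
```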