U.S. Department of Commerce

Statistics of U.S. Businesses


Methodology - How the Data are Collected

Universe for Statistics of U.S. Businesses

The Business Register is the Census Bureau’s source of information on employer establishments included in the Statistics of U.S. Businesses (SUSB) program. The Business Register is a multi-relational database that contains a record for each known establishment that is located in the United States or Puerto Rico and has employees. An establishment is a single physical location where business transactions take place and for which payroll and employment records are kept. Groups of one or more establishments under common ownership or control are enterprises. A single-unit enterprise owns or operates only one establishment. A multi-unit enterprise owns or operates two or more establishments. The treatment of establishments on the Business Register differs according to whether the establishment is part of a single-unit or multi-unit enterprise.

A single-unit enterprise’s primary identifier is its Employer Identification Number (EIN). The Internal Revenue Service (IRS) issues the EIN, and the enterprise uses it as an identifier to report its payroll taxes. All employer enterprises are required to have at least one EIN, and only one enterprise can use a given EIN. Because a single-unit enterprise has only one establishment, there is a one-to-one relationship between the enterprise and the EIN. Thus, the enterprise, the EIN, and the establishment all reference the same physical location, and all three terms can be used interchangeably and unambiguously when referring to a single-unit enterprise.

Descriptive information for a single-unit establishment in the SUSB universe, including geographic location, industry classification, payroll, and employment, comes from a variety of administrative record and survey sources. Administrative records filed by EIN are the most common source of this information for single-unit establishments, with updates on geographic location and industry classification coming from Census Bureau surveys when available.

For multi-unit enterprises, however, a different structure connects the enterprise with its establishments via the EIN. Essentially, a multi-unit enterprise is associated with a cluster of one or more EINs, and each EIN is associated with one or more establishments. A multi-unit enterprise consists of at least two establishments. Each enterprise is associated with at least one EIN, and only one enterprise can use a given EIN. However, one multi-unit enterprise may have several EINs. Similarly, there is a one-to-many relationship between EINs and establishments: each EIN can be associated with many establishments, but each establishment is associated with only one EIN. Because of these possible one-to-many relationships, we must distinguish between the enterprise, its EINs, and its establishments. A unique employer unit identification number identifies each establishment owned by a multi-unit enterprise on the Business Register.
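The enterprise, EIN, and establishment relationships described above can be illustrated with a short Python sketch. It is not the Business Register's actual schema: the function names and identifiers are hypothetical, and the code only shows how the cardinality rules (each EIN used by one enterprise, each establishment linked to one EIN) constrain the structure and how a single-unit enterprise differs from a multi-unit one.

```python
from collections import defaultdict

def build_register(links):
    """Index (enterprise_id, ein, establishment_id) triples as
    enterprise -> EIN -> set of establishment IDs, enforcing the rules above:
    an EIN is used by exactly one enterprise, and an establishment is linked
    to exactly one EIN."""
    ein_owner = {}   # EIN -> the single enterprise allowed to use it
    estab_ein = {}   # establishment -> the single EIN it is linked to
    register = defaultdict(lambda: defaultdict(set))
    for enterprise, ein, estab in links:
        if ein_owner.setdefault(ein, enterprise) != enterprise:
            raise ValueError(f"EIN {ein} is used by more than one enterprise")
        if estab_ein.setdefault(estab, ein) != ein:
            raise ValueError(f"establishment {estab} is linked to more than one EIN")
        register[enterprise][ein].add(estab)
    return register

def establishment_count(register, enterprise):
    return sum(len(estabs) for estabs in register[enterprise].values())

def is_single_unit(register, enterprise):
    # A single-unit enterprise has exactly one establishment, hence one EIN.
    return establishment_count(register, enterprise) == 1

# Hypothetical identifiers, for illustration only.
links = [
    ("ENT-1", "11-1111111", "EST-1"),    # single unit: enterprise = EIN = establishment
    ("ENT-2", "22-2222222", "U-0001"),   # multi-unit: one EIN, two establishments
    ("ENT-2", "22-2222222", "U-0002"),
    ("ENT-2", "33-3333333", "U-0003"),   # same enterprise, second EIN
]
register = build_register(links)
print(is_single_unit(register, "ENT-1"), is_single_unit(register, "ENT-2"))  # True False
```

Keying the structure by enterprise and then by EIN mirrors the "cluster of EINs" description, so the one-to-many relationships remain explicit rather than being collapsed into a single identifier.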

Because EIN and establishment are not equivalent for multi-unit enterprises, multi-unit establishment information depends less on administrative record sources. The Census Bureau’s Economic Census (conducted every five years, in years ending in ‘2’ and ‘7’) initially identifies a multi-unit company when a company expands to more than one establishment. Establishments for a multi-unit company are identified through the Economic Census and the annual Company Organization Survey (COS). Geographic location, industry classification, payroll, and employment come primarily from the Economic Census and the COS. EIN-level administrative payroll and employment data are apportioned to the establishment level in cases of nonresponse or for smaller enterprises not selected for the COS.

Businesses operating without an EIN, and businesses with an EIN but without employees, are excluded from the SUSB universe.

A certain amount of undercoverage occurs in the universe, primarily among establishments of multi-unit companies. The Census Bureau does not create a multi-unit company structure in the Business Register for very small employers (fewer than 10 employees) identified in the Economic Census. In addition, the COS is an annual mail survey that includes all multi-unit companies with 250 or more employees. Companies with fewer than 250 employees are selected for the COS only when administrative record sources indicate that the company may be undergoing organizational change by adding or dropping establishments. Establishments of smaller companies may therefore be missed, as may establishments of companies that do not respond to the Economic Census or the COS. The Census Bureau makes a considerable effort to obtain establishment information for large companies because of their importance to the economy. The Census Bureau does not have any estimates of establishment undercoverage. Coverage of payroll and employment is very good because administrative record data are used.
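The two size thresholds above can be expressed as simple rules. The Python sketch below is illustrative only: the 10- and 250-employee cutoffs come from the text, while the boolean `admin_signals_org_change` is a hypothetical stand-in for whatever administrative-record evidence of organizational change the Census Bureau actually uses.

```python
# Thresholds stated in the text.
MIN_MULTI_UNIT_EMPLOYMENT = 10    # no multi-unit structure below this size
COS_CERTAINTY_EMPLOYMENT = 250    # all multi-unit companies at or above this size

def gets_multi_unit_structure(employees: int) -> bool:
    """Whether a multi-unit company identified in the Economic Census
    receives a multi-unit structure on the Business Register."""
    return employees >= MIN_MULTI_UNIT_EMPLOYMENT

def selected_for_cos(is_multi_unit: bool, employees: int,
                     admin_signals_org_change: bool) -> bool:
    """COS selection: large multi-unit companies always; smaller ones only
    when (hypothetical) administrative signals suggest they are adding or
    dropping establishments."""
    if not is_multi_unit:
        return False
    if employees >= COS_CERTAINTY_EMPLOYMENT:
        return True
    return admin_signals_org_change

print(selected_for_cos(True, 400, False))   # True: above the 250-employee cutoff
print(selected_for_cos(True, 120, False))   # False: small and no change signal
```

The second call shows where undercoverage can arise: a small multi-unit company that opens an establishment without triggering an administrative signal is not surveyed that year.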

Industry Classification

Industry classification of business establishments in SUSB is according to the 2007 North American Industry Classification System (NAICS), which includes nearly 1,200 industries. For more information on the 2007 NAICS codes, as well as comparisons between the 2007 and 2002 codes, go to http://www.census.gov/naics/.

The primary source of industry classification is data collected through the Economic Census or other Census Bureau surveys. When these data are not available, the Census Bureau uses a hierarchy of administrative record sources to assign a code, including classifications from the Bureau of Labor Statistics, business birth information, and self-assigned codes from income tax records.
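A source hierarchy of this kind amounts to taking the first available code in priority order. The Python sketch below assumes the order in which the text lists the sources; the text does not state the exact ranking among the administrative sources, and the source keys are hypothetical labels.

```python
from typing import Mapping, Optional, Tuple

# Assumed priority order; the key names are hypothetical labels.
SOURCE_PRIORITY = (
    "census_survey",      # Economic Census or other Census Bureau surveys (primary)
    "bls",                # Bureau of Labor Statistics classification
    "business_birth",     # business birth information
    "tax_self_assigned",  # self-assigned code from income tax records
)

def assign_naics(codes_by_source: Mapping[str, Optional[str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, source) from the highest-priority source that has a code,
    or (None, None) if no source provides one."""
    for source in SOURCE_PRIORITY:
        code = codes_by_source.get(source)
        if code:
            return code, source
    return None, None

print(assign_naics({"bls": "445110", "tax_self_assigned": "722110"}))  # ('445110', 'bls')
```

Returning the source together with the code keeps a record of which tier supplied the classification, which is useful when later reviewing imputed or lower-quality assignments.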

For a small percentage of records, only a partial classification is possible from all sources. For these cases, a complete industry classification is assigned, or imputed, by using a distribution of complete six-digit codes and a randomly assigned number to select a code and preserve the overall distribution of establishments by NAICS. Analysts review the assignments to ensure that anomalies do not occur at the county level. For some multi-unit establishments with a partial classification, a complete code is imputed from another establishment within the same company. The imputation rate for complete codes varies widely during the five-year Economic Census processing cycle, but generally affects small businesses. Completely unclassified records are an even smaller percentage and are tabulated and published separately.
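Selecting a complete code "using a distribution of complete six-digit codes and a randomly assigned number" is a weighted random draw. The Python sketch below assumes that the candidate codes are those six-digit codes consistent with the record's partial code, weighted by their establishment counts. The counts are made up, and the county-level review step is not modeled.

```python
import random
from typing import Mapping

def impute_naics(partial_code: str, distribution: Mapping[str, int],
                 rng: random.Random) -> str:
    """Draw a complete six-digit code that starts with partial_code.
    Each candidate is weighted by its establishment count, so repeated draws
    reproduce the existing distribution in expectation."""
    candidates = [c for c in distribution if len(c) == 6 and c.startswith(partial_code)]
    if not candidates:
        raise ValueError(f"no complete codes under {partial_code!r}")
    weights = [distribution[c] for c in candidates]
    # random.choices maps a uniform random number onto the cumulative weights.
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical establishment counts by complete 2007 NAICS code.
distribution = {"445110": 800, "445120": 150, "722110": 500}
rng = random.Random(0)
print(impute_naics("4451", distribution, rng))
```

Weighting by establishment counts, rather than choosing uniformly among candidates, is what preserves the overall distribution of establishments by NAICS that the text describes.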

Geographic Classification

SUSB classifies an establishment by its physical location. Under the usual definition, an establishment or business is a fixed physical location or permanent structure where some form of business activity is conducted. The Economic Census and the COS request the physical location of each establishment in an enterprise. Administrative record sources also provide physical location addresses. In some cases, the physical location is not available, and the geographic assignment is based on the mailing address. When a business relocates, there may be a significant delay before the Census Bureau receives the updated physical location address, particularly for small businesses.

Data Withheld from Publication

In accordance with U.S. Code, Title 13, Section 9, no data are published that would disclose the operations of an individual employer.

Noise Infusion

Starting with 2007 data, the Statistics of U.S. Businesses program has adopted the noise infusion method of data protection. Noise infusion is a method of disclosure avoidance in which values for each establishment are perturbed prior to table creation by applying a random noise multiplier to the magnitude data (i.e., characteristics such as first-quarter payroll, annual payroll, and number of employees) for each company. Disclosure protection is accomplished in a manner that results in a relatively small change in the vast majority of cell values. For Statistics of U.S. Businesses, each published cell value has an associated noise flag indicating the relative amount of distortion in the cell value resulting from the perturbation of the data for the contributors to the cell. The flag for ‘low noise’ (G) indicates the cell value was changed by less than 2 percent with the application of noise, and the flag for ‘moderate noise’ (H) indicates the value was changed by 2 percent or more but less than 5 percent. Cells that have been changed by 5 percent or more are suppressed from the published tables. Additionally, other cells in the table may be suppressed for additional protection from disclosure or because the quality of the data does not meet publication standards. Although some of these suppressed cells may be derived by subtraction, the results are not official and may differ substantially from the true estimate.
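The flagging rule above, combined with a per-company multiplier, can be sketched in Python. The G/H thresholds and the 5 percent suppression cutoff come from the text. The multiplier distribution (1 + u or 1 − u, with u drawn uniformly from a range bounded away from zero) and its bounds `NOISE_MIN`/`NOISE_MAX` are assumptions for illustration only, since the text does not publish them. Secondary suppression is not modeled.

```python
import random

# Hypothetical bounds on |multiplier - 1|; not published values.
NOISE_MIN, NOISE_MAX = 0.01, 0.10

def draw_multiplier(rng: random.Random) -> float:
    """Assumed form: 1 + u or 1 - u with equal odds, u ~ Uniform(NOISE_MIN, NOISE_MAX)."""
    u = rng.uniform(NOISE_MIN, NOISE_MAX)
    return 1 + u if rng.random() < 0.5 else 1 - u

def noise_flag(true_value: float, noisy_value: float):
    """Return 'G' (< 2% change), 'H' (2% to < 5%), or None (>= 5%: suppress)."""
    if true_value == 0:
        return "G"  # zero times any multiplier is still zero
    change = abs(noisy_value - true_value) / abs(true_value)
    if change < 0.02:
        return "G"
    if change < 0.05:
        return "H"
    return None

def tabulate_cell(records, multipliers):
    """records: (company_id, magnitude) pairs for the establishments in one cell.
    Every establishment of a company is scaled by that company's multiplier."""
    true_total = sum(value for _, value in records)
    noisy_total = sum(value * multipliers[company] for company, value in records)
    flag = noise_flag(true_total, noisy_total)
    return {
        "establishments": len(records),   # counts are released without noise
        "value": noisy_total if flag is not None else None,  # None = suppressed
        "flag": flag,
    }

rng = random.Random(2007)
records = [("A", 900.0), ("A", 100.0), ("B", 50.0)]   # made-up payroll values
multipliers = {company: draw_multiplier(rng) for company in ("A", "B")}
cell = tabulate_cell(records, multipliers)
print(cell["establishments"], cell["flag"])
```

A multiplier bounded away from 1 guarantees that every contributor's value is changed by a minimum amount. In a cell with many contributors, however, multipliers above and below 1 tend to offset each other, which is why most published cells carry a low-noise flag.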

The number of establishments in a particular tabulation cell is not considered a disclosure; therefore, this information may be released without the addition of protective noise. For an introduction to the noise confidentiality protection method, see "Using Noise for Disclosure Limitation of Establishment Tabular Data" by Timothy Evans, Laura Zayatz, and John Slanta in the Journal of Official Statistics (1998).

Reliability of Data

Payroll and employment data are tabulated from administrative records for single-unit enterprises and a combination of administrative records and survey-collected data for multi-unit enterprises. They are not subject to sampling error, but are subject to nonsampling errors, which can be attributed to several sources: inability to identify all cases that should be in the universe; definition and classification difficulties; errors in recording or coding the data obtained; and other errors of coverage, processing, and estimation for missing or misreported data.

The accuracy of these tabulated data is determined by the joint effects of the various nonsampling errors. No direct measurement of these effects has been obtained except for estimation for missing or misreported industry classifications; however, precautionary steps were taken in all phases of the processing to minimize the effects of nonsampling errors.

Employment data are missing from approximately 15 percent of incoming administrative payroll records. For these records, employment is imputed using average wage data for the prior year for the EIN, if available. If these data are not available, an employment figure is imputed based on the average wage for the industry and geographic area. Quarterly payroll is edited by comparing it with reported data from other quarters over a two-year period to identify anomalies and potential misreporting. Suspected missing payroll and extreme values are imputed based on company reporting patterns. The Census Bureau imputes payroll for less than one percent of all incoming administrative payroll records.
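Both steps above can be sketched in Python. The employment imputation divides payroll by an average wage, using the EIN's own prior-year wage when available and the industry-and-area average otherwise. This assumes that both wage figures are per employee per quarter so that the units match. The payroll edit is a hypothetical stand-in: the text does not describe the actual comparison rule, so the median reference and the threshold `ratio` are assumptions.

```python
from statistics import median
from typing import List, Optional, Sequence

def impute_employment(quarterly_payroll: float,
                      ein_prior_year_avg_wage: Optional[float],
                      industry_area_avg_wage: float) -> int:
    """Employment = payroll / average wage per employee, preferring the EIN's
    own prior-year average wage over the industry-and-area average.
    Both wages are assumed to be per employee per quarter."""
    wage = ein_prior_year_avg_wage or industry_area_avg_wage
    return round(quarterly_payroll / wage)

def flag_payroll_quarters(quarters: Sequence[float], ratio: float = 3.0) -> List[int]:
    """Return indices of quarters (e.g. eight quarters = two years) that look
    anomalous against the median of the other quarters: zero (suspected
    missing) or off by more than `ratio` (a hypothetical threshold)."""
    flagged = []
    for i, value in enumerate(quarters):
        reference = median(q for j, q in enumerate(quarters) if j != i)
        if reference <= 0:
            continue  # nothing reliable to compare against
        if value <= 0 or value > reference * ratio or value < reference / ratio:
            flagged.append(i)
    return flagged

print(impute_employment(50_000, 10_000, 12_500))   # 5 (EIN's own wage)
print(impute_employment(50_000, None, 12_500))     # 4 (industry/area fallback)
print(flag_payroll_quarters([100, 100, 100, 0, 100, 100, 100, 1000]))  # [3, 7]
```

Comparing each quarter against the median of the others, rather than the mean, prevents a single extreme quarter from distorting the reference used to judge the remaining quarters.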

Establishment payroll and employment for multi-unit companies are collected through the Economic Census and the COS. Data for companies not included in the COS or not responding to the survey are imputed from administrative record data by taking company-level administrative payroll and employment and breaking them down to the establishment level using best estimates of the size of each establishment in the company. If some establishments have reported payroll and others have not, the breakdown is applied to the difference between the company-level administrative data and the total reported amounts.
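The breakdown described above amounts to allocating a residual in proportion to size estimates. The Python sketch below is illustrative, not the Census Bureau's production method. The `size_weights` input is a hypothetical stand-in for the unspecified "best estimates of the size of each establishment," and clamping a negative residual to zero is an assumption.

```python
from typing import Dict, Mapping, Optional

def apportion(company_admin_total: float,
              reported: Mapping[str, Optional[float]],
              size_weights: Mapping[str, float]) -> Dict[str, float]:
    """Fill in establishment values for non-reporters.
    reported: establishment -> reported value, or None if not reported.
    size_weights: establishment -> estimated relative size (hypothetical)."""
    result = {e: v for e, v in reported.items() if v is not None}
    missing = [e for e, v in reported.items() if v is None]
    if not missing:
        return result
    # Only the part of the company total not already reported is allocated.
    residual = max(company_admin_total - sum(result.values()), 0.0)  # assumption: clamp
    total_weight = sum(size_weights[e] for e in missing)
    for e in missing:
        share = size_weights[e] / total_weight if total_weight else 1 / len(missing)
        result[e] = residual * share
    return result

# Company payroll of 1,000 (made-up units); establishment "a" reported 300.
print(apportion(1000.0, {"a": 300.0, "b": None, "c": None}, {"b": 1.0, "c": 3.0}))
# {'a': 300.0, 'b': 175.0, 'c': 525.0}
```

When no establishment reports, the same function spreads the full company total across all establishments. This corresponds to the nonresponse case, and to companies not selected for the COS.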

Definitions

SUSB definitions are available at http://www.census.gov/econ/susb/definitions.html.


Source: U.S. Census Bureau | Statistics of U.S. Businesses | (301) 763-3321 | Last Revised: May 29, 2012