The US Census Bureau

2002 Economic Census:
Characteristics of Business Owners (CBO)
Characteristics of Businesses (CB)
Methodology



Appendix C. Methodology

SOURCES OF THE DATA

The 2002 Survey of Business Owners (SBO) was conducted by mail. One of two census forms was mailed to a random sample of businesses selected from a list of all firms operating during 2002 with receipts of $1,000 or more, except those classified in the following NAICS industries:

The lists of all firms (or universe) are compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtains electronic files from the Internal Revenue Service (IRS) for all companies filing IRS Form 1040, Schedule C (individual proprietorship or self-employed person); 1065 (partnership); any one of the 1120 corporation tax forms; and 941 (Employer's Quarterly Federal Tax Return). The IRS provides certain identification, classification, and measurement data for businesses filing those forms.

For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2002 Economic Census.

The report forms used to collect information are available at www.census.gov/csd/sbo/index.html.

The SBO is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments that the reporting firm specified under its ownership or control at the end of 2002. Firms were instructed to return their completed report form by mail. Two report form remails were conducted at one-month intervals to all delinquent respondents. A telephone follow-up was conducted to obtain a subset of information from selected firms that failed to return their report form. The returned forms underwent extensive review and computer processing. All reports were geographically coded, data-keyed, and edited. The editing process identified records with significant problems, and the affected firms were contacted to resolve them. Corrections were performed interactively using standard procedures.

The data were then tabulated by NAICS, subjected to further data analysis, and the resulting corrections applied to individual computer records. Corrected tabulations were then produced for the final published reports.

A more detailed examination of census methodology is presented in the History of the 2002 Economic Census at www.census.gov/econ/www/history.html.

INDUSTRY CLASSIFICATION OF FIRMS

The classification for all establishments is based on the North American Industry Classification System, United States, 2002, manual. The kind-of-business or industry classification codes for the SBO are obtained from the 2002 Economic Census. More information on the industry classification codes is included in the Industry Classifications and Relationship to Historical Industry Classifications sections in the introductory text.

SAMPLING AND ESTIMATION METHODOLOGIES

Sampling. To design the 2002 SBO sample, the Census Bureau used the following sources of information to estimate the probability that a business was minority- or women-owned:

These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:

The SBO universe was stratified by state, industry, frame, and whether the company had paid employees in 2002. The Census Bureau selected large companies, including those operating in more than one state, with certainty. These companies were selected based on volume of sales, payroll, or number of paid employees. All certainty cases were sure to be selected and represented only themselves (i.e., had a selection probability of one and a sampling weight of one). The certainty cutoffs varied by sampling stratum, and each stratum was sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
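The selection scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Census Bureau's production procedure: the `receipts` field, the single certainty cutoff, the sampling rate, and the function names are all hypothetical and chosen only for the example.

```python
import random


def systematic_sample(units, rate, rng):
    """Draw a systematic random sample from one stratum.

    Returns (unit, weight) pairs. The weight is the reciprocal of the
    selection probability, that is, 1 / rate.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    interval = 1.0 / rate
    position = rng.uniform(0, interval)  # random start in the first interval
    picks = []
    while position < len(units):
        picks.append((units[int(position)], interval))
        position += interval
    return picks


def sample_stratum(firms, certainty_cutoff, rate, seed=0):
    """Select certainty cases first, then sample the rest systematically.

    Assumption: size is measured by a hypothetical 'receipts' field; the
    text says sales, payroll, or number of paid employees were used.
    """
    rng = random.Random(seed)
    # Certainty cases represent only themselves: probability 1, weight 1.
    certainty = [(f, 1.0) for f in firms if f["receipts"] >= certainty_cutoff]
    remainder = [f for f in firms if f["receipts"] < certainty_cutoff]
    return certainty + systematic_sample(remainder, rate, rng)


if __name__ == "__main__":
    firms = [{"id": i, "receipts": 1000 + i} for i in range(104)]
    sample = sample_stratum(firms, certainty_cutoff=1100, rate=0.25, seed=1)
    print(len(sample), sum(weight for _, weight in sample))
```

In this toy stratum the sampling weights sum to the number of firms in the stratum, which is the property that lets weighted sample counts estimate universe counts.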

A firm selected into the sample was mailed one of two questionnaires. The Census Bureau sent the SBO-1 questionnaire to partnerships and corporations. The businesses were asked to report the percentage of ownership, gender, Hispanic or Latino origin, race, and several characteristic questions (e.g., age, education level) for each of the three largest percentage owners. The SBO-2 questionnaire was used for sole proprietors and self-employed individuals. The businesses were asked essentially the same information as asked on the SBO-1, but limited to two owners.

Treatment of Nonresponse. Approximately 81 percent of the 2.3 million businesses in the SBO sample responded to the survey. Data from the 1997 survey were used for businesses in both the 1997 and 2002 samples. For the remaining nonrespondents, gender, Hispanic or Latino origin, and race were imputed from donor respondents with similar characteristics (state, industry, employment status, size, and sampling frame).
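Donor imputation of this kind can be sketched as follows. The text does not say how a donor was chosen among similar respondents, so the random choice within an exact-match cell is an assumption, as are the field names (`employer`, `size_class`, `frame`, `responded`, `imputed`) and the function name.

```python
import random
from collections import defaultdict

# Matching characteristics named in the text; the key names are hypothetical.
MATCH_KEYS = ("state", "industry", "employer", "size_class", "frame")
IMPUTED_FIELDS = ("gender", "hispanic", "race")


def impute_from_donors(records, seed=0):
    """Return copies of the records with nonrespondents filled from donors.

    A donor is a responding record in the same cell (identical values on
    every MATCH_KEYS field). Nonrespondents with no donor are left as-is.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in records:
        if rec["responded"]:
            donors[tuple(rec[k] for k in MATCH_KEYS)].append(rec)

    completed = []
    for rec in records:
        out = dict(rec)  # copy so the input records are not modified
        if not rec["responded"]:
            pool = donors.get(tuple(rec[k] for k in MATCH_KEYS))
            if pool:
                donor = rng.choice(pool)  # assumption: random donor in cell
                for field in IMPUTED_FIELDS:
                    out[field] = donor[field]
                out["imputed"] = True
        completed.append(out)
    return completed
```

Flagging imputed records, as the sketch does, keeps reported and imputed values distinguishable in later tabulations.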

Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:

Firms equally male-/female-owned were counted and tabulated as a separate category.

Businesses could be tabulated in more than one racial group. This can result because:

  1. the sole owner reported more than one race;
  2. the majority owner reported more than one race;
  3. a majority combination of owners reported more than one race.

The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority owned, the firm would be included in the detailed Asian and Black estimates, but would be counted only once toward the higher-level estimates for all firms.

The sum of the detailed Hispanic or Latino origin may not add to the total because no one Hispanic subgroup (i.e., Mexican, Puerto Rican, Cuban, or Other Spanish/Hispanic/Latino) owned a majority of the firm, but a combination of these subgroups did own a majority. For example, if a firm had two owners each with equal ownership, one responding Puerto Rican and the other responding Cuban, there is no one subgroup with a majority ownership, but the firm is Hispanic-owned. This firm would be tabulated in the Hispanic or Latino estimate, but would not appear in any of the subgroup estimates.

Also, the subgroup detail for both Asians and Native Hawaiians and Other Pacific Islanders may not add to the total, for reasons similar to those explained above.
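The two examples above (the Chinese and Black majority-owned firm, and the firm split equally between a Puerto Rican and a Cuban owner) can be reproduced with a small sketch. The data layout, the group labels, and the function names are assumptions made for illustration, and the counts are unweighted for simplicity.

```python
from collections import defaultdict

MAJORITY = 51.0  # percent of stock or equity, per the definition of ownership


def majority_groups(owners):
    """Return the groups whose owners together hold 51 percent or more.

    `owners` is a list of (percent_owned, groups) pairs, where `groups`
    is the set of every category the owner reported.
    """
    share = defaultdict(float)
    for percent, groups in owners:
        for group in set(groups):
            share[group] += percent
    return {group for group, total in share.items() if total >= MAJORITY}


def tabulate(firms):
    """Count each firm once in 'All firms' and once per majority group."""
    counts = defaultdict(int)
    for owners in firms:
        counts["All firms"] += 1
        for group in majority_groups(owners):
            counts[group] += 1
    return dict(counts)


if __name__ == "__main__":
    # Sole owner reporting two races: counted under Asian, Chinese, and Black.
    firm_a = [(100.0, {"Asian", "Chinese", "Black"})]
    # Two equal owners: Hispanic-owned, but no single subgroup has a majority.
    firm_b = [(50.0, {"Hispanic", "Puerto Rican"}), (50.0, {"Hispanic", "Cuban"})]
    print(tabulate([firm_a, firm_b]))
```

Because a firm can fall in several groups or in a total without any subgroup, the group counts need not add to the "All firms" count.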

In the Characteristics of Businesses and the Characteristics of Business Owners reports, the tabulations of demographic and economic business and owner characteristics included only those firms that returned the survey form and provided the gender, Hispanic or Latino origin, and race for the owner(s) or indicated the firm was publicly held. These tabulations also included the owners who identified with more than one race. For example, an Asian Hispanic male veteran owner would have his information tabulated in each of those four categories. However, such a record was counted only once in the "All owners of respondent firms" line of the publication.

For the tabulations by gender, Hispanic or Latino origin, and race, the data for each firm in the SBO sample were weighted by the reciprocal of the firm’s probability of selection. The data for each owner are inflated using the sampling weight assigned to the owner's corresponding firm record.
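A minimal sketch of this weighting follows, assuming each sampled firm carries its selection probability in a hypothetical `prob` field and that owners are linked to firms by a hypothetical `firm_id`. The function names and the `receipts` field are likewise illustrative.

```python
def sampling_weight(prob):
    """Reciprocal of the probability of selection."""
    if not 0 < prob <= 1:
        raise ValueError("probability of selection must be in (0, 1]")
    return 1.0 / prob


def weighted_firm_estimates(firms):
    """Estimate the number of firms and total receipts in the universe."""
    count = sum(sampling_weight(f["prob"]) for f in firms)
    receipts = sum(sampling_weight(f["prob"]) * f["receipts"] for f in firms)
    return count, receipts


def weighted_owner_count(owners, firms):
    """Each owner is inflated by the weight of the corresponding firm."""
    weight_by_firm = {f["firm_id"]: sampling_weight(f["prob"]) for f in firms}
    return sum(weight_by_firm[o["firm_id"]] for o in owners)


if __name__ == "__main__":
    firms = [
        {"firm_id": "A", "prob": 1.0, "receipts": 5000.0},   # certainty case
        {"firm_id": "B", "prob": 0.25, "receipts": 200.0},
        {"firm_id": "C", "prob": 0.5, "receipts": 100.0},
    ]
    owners = [{"firm_id": "A"}, {"firm_id": "B"}, {"firm_id": "B"}]
    print(weighted_firm_estimates(firms))
    print(weighted_owner_count(owners, firms))
```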

RELIABILITY OF ESTIMATES

The figures shown in this report are, in part, estimated from a sample and will differ from the figures which would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured errors. The following is a description of the sampling and nonsampling errors associated with this tabulation.

Sampling variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error is a measure of the variability among the estimates from all possible samples. The estimated relative standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.

The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average result of all possible samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.
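The coverage statements above can be checked with a small simulation. The sketch assumes that estimates from repeated samples are approximately normally distributed around the average of all possible samples, which the text implies but does not state; the function name and the numeric values are hypothetical.

```python
import random


def coverage(multiplier, true_mean=100.0, std_error=5.0, trials=20000, seed=0):
    """Share of simulated intervals that include the true mean.

    Each trial draws one estimate from a normal distribution (assumption)
    and builds the interval estimate +/- multiplier * std_error.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        estimate = rng.gauss(true_mean, std_error)
        lower = estimate - multiplier * std_error
        upper = estimate + multiplier * std_error
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("1.0 standard errors:", coverage(1.0))
    print("1.6 standard errors:", coverage(1.6))
```

Under the normality assumption the two shares come out close to the 68 percent and 90 percent figures quoted above.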

Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707 or 1,034. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 - 1,654 to 51,707 + 1,654, or the range 50,053 to 53,361.
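The arithmetic in this example can be written as a short function. It is a sketch that rounds at the same steps as the worked example; the function and parameter names are hypothetical.

```python
def interval_from_rse(estimate, rse_percent, multiplier=1.6):
    """Approximate confidence interval from a relative standard error.

    Rounds the standard error and the half-width to whole units, as the
    worked example does, before forming the bounds.
    """
    standard_error = round(estimate * rse_percent / 100.0)
    half_width = round(multiplier * standard_error)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    lower, upper = interval_from_rse(51707, 2)
    print(lower, upper)  # 50053 53361
```

When the tables give a standard error directly, as they do for percentage data, the first step is skipped and the half-width is simply the multiplier times the published standard error.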

For the Characteristics of Businesses and Characteristics of Business Owners reports, much of the data is expressed as percentages with standard errors rather than relative standard errors as indicated above. This saves a step in the construction of the confidence interval as illustrated by the following example.

Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percent. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 + or - 0.64 or the range 76.26 to 77.54.

Nonsampling errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to many sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.

Explicit measures of the effects of these nonsampling errors are not available. However, it is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

COMPARABILITY OF 2002 CB/CBO AND 1992 CBO DATA

Particular care should be taken in comparing 2002 estimates with 1992 estimates because the following changes in survey methodology in 2002 affect comparability:

  1. In 2002, the SBO sample of 2.3 million businesses included all corporations in addition to partnerships and sole proprietorships. Businesses were asked to report the percentage of interest and the gender, Hispanic or Latino origin, and race of up to three individuals with the largest share of ownership; additional owners were not surveyed regarding characteristics. Selected economic and demographic characteristics were also asked of the businesses and business owners. Wording of questions and/or response categories may be new or different from those used in the past. (See 2002 forms SBO-1 and SBO-2 and 1992 forms.) The CB data are presented by business ownership determined by the gender, Hispanic or Latino origin, and race of the person(s) owning majority interest in the business. The CBO data are presented for all interest owners, as well as majority, equal, and nonmajority interest owners. Each owner is classified by their gender, Hispanic or Latino origin, and race. Each owner could self-identify with more than one racial group; therefore it was possible for a business and its owner(s) to be classified and tabulated in more than one racial group.

    In 1992, a sample of 78,000 businesses was selected from the Surveys of Minority- and Women-Owned Business Enterprises (SMOBE/SWOBE) sample of 1.2 million businesses. The SMOBE/SWOBE included only subchapter S corporations, partnerships, and sole proprietorships. Businesses were asked to report the gender, Hispanic or Latino origin, and race of the majority of the owners, as well as the number of owners. Business ownership was determined based on the majority of the number of owners, without regard to percentage of interest owned. Based on the number of owners provided, a CBO questionnaire was then mailed to as many as 10 owners of a business to collect selected economic and demographic characteristics, yielding a sample of approximately 116,000 owners. The subsequent CBO respondents, both the firms and the owner(s), were then considered as belonging to the same gender, Hispanic or Latino origin, and race as that of the business determined from the SMOBE/SWOBE.

  2. In 2002, all estimates were based on firms that responded to the 2002 SBO. A respondent firm is defined as a business that returned the survey form and provided the gender, Hispanic or Latino origin, or race for the owner(s) or indicated that the firm was publicly held. The data for owners of respondent firms exclude businesses which were publicly held. In 1992, responding firms were reweighted to compensate for those businesses which had not returned the survey form.
  3. In 2002, separate estimates for American Indian- and Alaska Native-, Asian-, Native Hawaiian- and Other Pacific Islander-owned firms are provided. However, in 1992, estimates for these businesses were published under the category of “Other minority-owned businesses.”