2002 Economic Census
Appendix C. Methodology
Sources | Industry classifications | Sampling and estimation | Reliability of estimates | Comparability of 2002 and 1992 |
The 2002 Survey of Business Owners (SBO) was conducted by mail. One of two census forms was mailed to a random sample of businesses selected from a list of all firms operating during 2002 with receipts of $1,000 or more, except those classified in the following NAICS industries:
The lists of all firms (or universe) are compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtains electronic files from the Internal Revenue Service (IRS) for all companies filing IRS Form 1040, Schedule C (individual proprietorship or self-employed person); 1065 (partnership); any one of the 1120 corporation tax forms; and 941 (Employer's Quarterly Federal Tax Return). The IRS provides certain identification, classification, and measurement data for businesses filing those forms.
For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2002 Economic Census.
The report forms used to collect information are available at www.census.gov/csd/sbo/index.html.
The SBO is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments that the reporting firm specified under its ownership or control at the end of 2002. Firms were instructed to return their completed report form by mail. Two report form remails were sent at one-month intervals to all delinquent respondents. A telephone follow-up was conducted to obtain a subset of information from selected firms that failed to return their report form. The returned forms underwent extensive review and computer processing. All reports were geographically coded, data-keyed, and edited. The editing process identified records with significant problems, and the firms concerned were contacted to resolve them. Corrections were performed interactively using standard procedures.
The data were then tabulated by NAICS, subjected to further data analysis, and the resulting corrections applied to individual computer records. Corrected tabulations were then produced for the final published reports.
A more detailed examination of census methodology is presented in the History of the 2002 Economic Census at www.census.gov/econ/www/history.html.
The classifications for all establishments are based on the North American Industry Classification System, United States, 2002, manual. The kind-of-business or industry classification codes for the SBO are obtained from the 2002 Economic Census. More information on the industry classification codes is included in the Industry Classifications and Relationship to Historical Industry Classifications sections in the introductory text.
Sampling. To design the 2002 SBO sample, the Census Bureau used the following sources of information to estimate the probability that a business was minority- or women-owned:
These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:
The SBO universe was stratified by state, industry, frame, and whether the company had paid employees in 2002. The Census Bureau selected large companies, including those operating in more than one state, with certainty. These companies were selected based on volume of sales, payroll, or number of paid employees. All certainty cases were sure to be selected and represented only themselves (i.e., had a selection probability of one and a sampling weight of one). The certainty cutoffs varied by sampling stratum, and each stratum was sampled at varying rates, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
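The selection within a single stratum can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Census Bureau's procedure: the function names, the `receipts` field, the use of receipts alone as the certainty criterion, and the single sampling rate are all hypothetical. Firms at or above the cutoff are taken with certainty and a weight of one; the rest are sampled systematically with a random start and weighted by the reciprocal of the sampling rate.

```python
import random


def systematic_sample(units, rate, rng):
    """Take every k-th unit after a random start, where k = 1 / rate."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    interval = 1.0 / rate
    position = rng.random() * interval  # random start in [0, interval)
    picks = []
    while position < len(units):
        picks.append(units[int(position)])
        position += interval
    return picks


def select_stratum(firms, certainty_cutoff, rate, rng):
    """Return (firm, sampling_weight) pairs for one stratum.

    Assumption: 'receipts' is the only size measure used for the cutoff.
    """
    certainty = [f for f in firms if f["receipts"] >= certainty_cutoff]
    remainder = [f for f in firms if f["receipts"] < certainty_cutoff]
    sample = [(f, 1.0) for f in certainty]  # certainty cases represent only themselves
    sample += [(f, 1.0 / rate) for f in systematic_sample(remainder, rate, rng)]
    return sample
```

In this sketch, a stratum of 100 small firms and 3 large firms sampled at a rate of 0.1 yields 13 sampled firms whose weights sum to 103, the size of the stratum.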
A firm selected into the sample was mailed one of two questionnaires. The Census Bureau sent the SBO-1 questionnaire to partnerships and corporations. These businesses were asked to report the percentage of ownership, gender, Hispanic or Latino origin, race, and several other characteristics (e.g., age, education level) for each of the three largest percentage owners. The SBO-2 questionnaire was used for sole proprietors and self-employed individuals. These businesses were asked for essentially the same information as on the SBO-1, but limited to two owners.
Treatment of Nonresponse. Approximately 81 percent of the 2.3 million businesses in the SBO sample responded to the survey. Data from the 1997 survey were used for businesses in both the 1997 and 2002 samples. For the remaining nonrespondents, gender, Hispanic or Latino origin, and race were imputed from donor respondents with similar characteristics (state, industry, employment status, size, and sampling frame).
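Donor-based imputation of this kind can be sketched as below. The text does not say how donors were matched or chosen, so the exact-match cells, the random choice of donor, and all field names here are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical field names for the matching characteristics named in the text.
CELL_KEYS = ("state", "industry", "has_employees", "size_class", "frame")
IMPUTED_FIELDS = ("gender", "hispanic_origin", "race")


def impute_from_donors(respondents, nonrespondents, rng):
    """Copy demographic fields from a randomly chosen donor in the same cell."""
    donors = defaultdict(list)
    for firm in respondents:
        donors[tuple(firm[k] for k in CELL_KEYS)].append(firm)

    results = []
    for firm in nonrespondents:
        pool = donors.get(tuple(firm[k] for k in CELL_KEYS))
        if not pool:
            # No matching donor: leave the record unimputed in this sketch.
            results.append(dict(firm, imputed=False))
            continue
        donor = rng.choice(pool)
        filled = {field: donor[field] for field in IMPUTED_FIELDS}
        results.append(dict(firm, imputed=True, **filled))
    return results
```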
Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:
Firms equally male-/female-owned were counted and tabulated as a separate category.
Businesses could be tabulated in more than one racial group. This can result because:
The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority owned, the firm would be included in the detailed Asian and Black estimates, but would only be counted once toward the higher level all firms' estimates.
The sum of the detailed Hispanic or Latino origin may not add to the total because no one Hispanic subgroup (i.e., Mexican, Puerto Rican, Cuban, or Other Spanish/Hispanic/Latino) owned a majority of the firm, but a combination of these subgroups did own a majority. For example, if a firm had two owners each with equal ownership, one responding Puerto Rican and the other responding Cuban, there is no one subgroup with a majority ownership, but the firm is Hispanic-owned. This firm would be tabulated in the Hispanic or Latino estimate, but would not appear in any of the subgroup estimates.
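The Puerto Rican and Cuban example can be expressed as a small sketch of the 51-percent rule. The function name and the input layout (ownership percentage paired with a Hispanic subgroup, or `None` for a non-Hispanic owner) are hypothetical and chosen only to illustrate the rule stated above.

```python
from collections import defaultdict

MAJORITY_PERCENT = 51.0  # ownership threshold stated in the text


def hispanic_categories(owners):
    """Return the Hispanic or Latino categories in which a firm is tabulated.

    owners: list of (ownership_percent, subgroup) pairs; subgroup is None
    for an owner who is not Hispanic or Latino.
    """
    hispanic_total = 0.0
    by_subgroup = defaultdict(float)
    for percent, subgroup in owners:
        if subgroup is not None:
            hispanic_total += percent
            by_subgroup[subgroup] += percent

    categories = []
    if hispanic_total >= MAJORITY_PERCENT:
        categories.append("Hispanic or Latino")
        categories += [s for s, p in by_subgroup.items() if p >= MAJORITY_PERCENT]
    return categories


print(hispanic_categories([(50, "Puerto Rican"), (50, "Cuban")]))
# prints ['Hispanic or Latino']
```

The firm counts toward the Hispanic or Latino total because the two owners together hold a majority, but neither subgroup reaches 51 percent on its own.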
Also, the subgroup detail for both Asians and Native Hawaiians and Other Pacific Islanders may not add to the total, for reasons similar to those explained above.
In the Characteristics of Businesses and the Characteristics of Business Owners reports, the tabulations of demographic and economic business and owner characteristics included only those firms that returned the survey form and provided the gender, Hispanic or Latino origin, and race for the owner(s) or indicated the firm was publicly held. These tabulations also included the owners who identified with more than one race. For example, an Asian Hispanic male veteran owner would have his information tabulated in each of those four categories. However, such a record was counted only once in the "All owners of respondent firms" line of the publication.
For the tabulations by gender, Hispanic or Latino origin, and race, the data for each firm in the SBO sample were weighted by the reciprocal of the firm’s probability of selection. The data for each owner were inflated using the sampling weight assigned to the owner's corresponding firm record.
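As a minimal sketch of this weighting, an estimated total is the sum of each sampled firm's value multiplied by the reciprocal of its selection probability. The function names and the sample figures below are illustrative assumptions, not values from the survey.

```python
def sampling_weight(selection_probability):
    """Weight is the reciprocal of the firm's probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def weighted_total(records):
    """records: iterable of (selection_probability, value) pairs for sampled firms."""
    return sum(sampling_weight(p) * value for p, value in records)


# A certainty firm (probability 1) counts once; a firm sampled at 1 in 4 counts four times.
sample = [(1.0, 500.0), (0.25, 100.0), (0.1, 20.0)]
estimate = weighted_total(sample)  # 500 + 400 + 200
```

An owner-level tabulation would apply the same weight to each owner record that the owner's firm received.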
The figures shown in this report are, in part, estimated from a sample and will differ from the figures which would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured errors. The following is a description of the sampling and nonsampling errors associated with this tabulation.
Sampling variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error is a measure of the variability among the estimates from all possible samples. The estimated relative standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.
The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then construct interval estimates with a prescribed level of confidence that the interval includes the average result of all samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and estimates calculated from each sample, then:
Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.
Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707 or 1,034. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 - 1,654 to 51,707 + 1,654, or the range 50,053 to 53,361.
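The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical, and rounding the standard error and the margin to whole numbers is an assumption made to mirror the worked example; without the intermediate rounding the bounds differ slightly.

```python
def interval_from_rse(estimate, rse_percent, multiplier=1.6):
    """Approximate 90-percent confidence interval from a relative standard error.

    The standard error and the margin are rounded to whole numbers at each
    step, as in the worked example in the text.
    """
    standard_error = round(estimate * rse_percent / 100.0)
    margin = round(multiplier * standard_error)
    return estimate - margin, estimate + margin


low, high = interval_from_rse(51707, 2)
print(low, high)  # prints 50053 53361
```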
For the Characteristics of Businesses and Characteristics of Business Owners reports, much of the data is expressed as percentages with standard errors rather than relative standard errors as indicated above. This saves a step in the construction of the confidence interval as illustrated by the following example.
Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percent. An approximate 90-percent confidence interval is found by first multiplying the standard error by 1.6 and then adding and subtracting that result from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 plus or minus 0.64, or the range 76.26 percent to 77.54 percent.
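When the standard error is published directly, the interval needs only the multiplication step. A minimal sketch, with a hypothetical function name and results rounded to two decimal places to match the figures above:

```python
def interval_from_standard_error(estimate, standard_error, multiplier=1.6):
    """Approximate 90-percent interval when the standard error is given directly."""
    margin = multiplier * standard_error
    return round(estimate - margin, 2), round(estimate + margin, 2)


low, high = interval_from_standard_error(76.9, 0.4)
print(low, high)  # prints 76.26 77.54
```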
Nonsampling errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to many sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.
Explicit measures of the effects of these nonsampling errors are not available. However, it is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.
Particular care should be taken in comparing estimates from 2002 to 1992 due to the following changes in survey methodology in 2002 which affect comparability: