BTS | Reliability of the 1993 Census Estimates

Bureau of Transportation Statistics (BTS) - Research and Innovative Technology Administration (RITA) - United States Department of Transportation (USDOT, US DOT or DOT)


Reliability of the 1993 Census Estimates

Reliability of the Estimates

An estimate based on a sample survey potentially contains two types of errors: sampling and nonsampling. Sampling errors occur because the estimate is based on a sample, not on the entire universe. Nonsampling errors can be attributed to many sources in the collection and processing of the data. The accuracy of a survey result is affected jointly by the two types of errors. The following is a description of the sampling and nonsampling errors associated with the estimates computed from the 1993 Commodity Flow Survey (CFS).

Measures of Sampling Variability

Because the estimates were based on a sample, exact agreement with the results that would be obtained from a complete census of establishments in the CFS frame using the same enumeration procedure was not expected. However, because each establishment in the Standard Statistical Establishment List (SSEL) in the specified Standard Industrial Classifications (SIC) had a known probability of being selected into the sample, it is possible to estimate the sampling variability of the estimates.

The standard error of the estimate is a measure of the variability among the values of the estimate computed from all possible samples of the same size and design. Thus, it is a measure of the precision with which an estimate from a particular sample approximates the results of a complete enumeration. The coefficient of variation is the standard error of the estimate divided by the value being estimated. It is expressed as a percent. Note that measures of sampling variability, such as the standard error or coefficient of variation, are estimated from the sample and are also subject to sampling variability. Coefficients of variation for number of shipments, dollar value, shipment weight (tons), and ton-miles estimates are shown in tables B-1 through B-7 in this appendix. Standard errors for the corresponding percentage estimates are also shown there.
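The relationship between the standard error and the coefficient of variation can be sketched in a few lines of Python. The function names below are illustrative and are not part of the CFS documentation. Because the true value is unknown in practice, the sketch divides by the published estimate.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the coefficient of variation, expressed as a percent."""
    if estimate == 0:
        raise ValueError("the coefficient of variation is undefined for a zero estimate")
    return 100.0 * standard_error / estimate


def standard_error_from_cv(estimate, cv_percent):
    """Recover the standard error from an estimate and its CV in percent."""
    return estimate * cv_percent / 100.0


# Hypothetical figures: an estimate of 200 with a standard error of 5.
print(coefficient_of_variation(200, 5))
print(standard_error_from_cv(200, 2.5))
```

The second function is the one a reader of the tables typically needs, since the tables publish coefficients of variation rather than standard errors for the level estimates.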

The standard errors and coefficients of variation presented in these tables permit certain confidence statements about the sample estimates. The particular sample used in this survey was one of a large number of samples of the same size that could have been selected using the same design. In about 9 out of 10 (90 percent) of these samples, the estimates would differ from the results of a complete enumeration by less than 1.65 times the standard error of the estimate. In about 19 out of 20 (95 percent) of the samples, the estimates would differ from the result of a complete enumeration by less than twice the standard error of the estimate.
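The multipliers 1.65 and 2 are consistent with treating the sampling distribution of the estimate as approximately normal. That reading is an assumption here, since the text does not state it. Under that assumption, a minimal sketch using Python's standard library shows where the multipliers come from; the function name is illustrative.

```python
from statistics import NormalDist


def normal_multiplier(confidence_level):
    """Two-sided standard normal multiplier for a given confidence level."""
    tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


print(normal_multiplier(0.90))  # close to the 1.65 used in the text
print(normal_multiplier(0.95))  # close to the "twice" used in the text
```

The text's values of 1.65 and 2 are conventional roundings of these multipliers.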

To illustrate the computations involved in the above confidence statements as related to the dollar value estimates, assume that an estimate of shipment value published in table 6 is $10,750 million for a particular commodity and mode of transportation, and that the coefficient of variation for this estimate, as given in appendix A, table B-6, is 1.8 percent, or 0.018. Multiplying $10,750 million by 0.018 yields the standard error, $194 million. Typical practice is to construct a 90- or 95-percent confidence interval. Multiplying $194 million by 1.65 gives $320 million. Therefore, a 90-percent confidence interval is $10,430 million to $11,070 million ($10,750 million plus or minus $320 million). If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of the intervals would contain the figure obtained from a complete enumeration. Similarly, a 95-percent confidence interval is $10,362 million to $11,138 million ($10,750 million plus or minus $388 million).
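The arithmetic in this example can be reproduced with a short Python sketch. The helper names are illustrative. Rounding to whole millions after each step, with halves rounded up, is an assumption made so the output matches the rounded figures quoted above.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value):
    """Round a Decimal to the nearest whole number, with halves rounded up."""
    return value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def interval_from_cv(estimate, cv_percent, multiplier):
    """Return (standard error, margin, lower bound, upper bound).

    Values are rounded to whole units after each step, as in the
    worked example.
    """
    est = Decimal(str(estimate))
    se = round_half_up(est * Decimal(str(cv_percent)) / Decimal(100))
    margin = round_half_up(se * Decimal(str(multiplier)))
    return se, margin, est - margin, est + margin


# Figures from the worked example, in millions of dollars.
print(interval_from_cv(10750, 1.8, 1.65))  # 90-percent interval
print(interval_from_cv(10750, 1.8, 2))     # 95-percent interval
```

With the example's inputs, the first call gives a standard error of 194, a margin of 320, and bounds of 10,430 and 11,070. The second gives a margin of 388 and bounds of 10,362 and 11,138.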

To illustrate the computations involved related to the percentage estimates, assume that the percentage estimate of shipment value published in table 6 is 25 percent for a particular commodity and mode of transportation, and that the standard error of this estimate, as given in appendix A, table B-6, is 2.2 percent, or 0.022. Multiplying 2.2 percent by 1.65 gives 3.6 percent. So a 90-percent confidence interval is 21.4 percent to 28.6 percent (25 percent plus or minus 3.6 percent). If corresponding confidence intervals were constructed for all possible samples of the same size and design, approximately 9 out of 10 (90 percent) of the intervals would contain the figure obtained from a complete enumeration.
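For percentage estimates the tables give the standard error directly, so no conversion from a coefficient of variation is needed. A minimal sketch follows; the function name is illustrative, and rounding to one decimal place is an assumption chosen to match the example.

```python
def interval_from_standard_error(estimate, standard_error, multiplier):
    """Return (lower, upper) bounds, with the margin rounded to one decimal."""
    margin = round(multiplier * standard_error, 1)
    return round(estimate - margin, 1), round(estimate + margin, 1)


# Figures from the worked example, in percent.
print(interval_from_standard_error(25, 2.2, 1.65))
```

Here the standard error is in percentage points, so the margin is added to and subtracted from the 25 percent estimate directly.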

Nonsampling Errors

As calculated for this report, the standard error and coefficient of variation measure sampling errors but do not measure any systematic biases in the data. Bias is the difference, averaged over all possible samples of the same size and design, between the estimate and the true value being estimated.
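In symbols, this definition can be written as follows. The notation is an assumption introduced here, not taken from the report: $\hat{\theta}$ denotes the estimate, $\theta$ the true value, and $E[\cdot]$ the average over all possible samples of the same size and design. The second line is the standard decomposition of mean squared error, which shows one way the two types of error jointly affect accuracy.

```latex
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta

E\!\left[(\hat{\theta} - \theta)^2\right]
  = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2
```

The standard error estimates only the square root of the variance term, so an interval built from it does not account for the bias term.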

In the CFS, as in other surveys, nonsampling errors can be attributed to many sources: (1) inability to obtain information about all cases in the sample, (2) response errors, (3) definitional difficulties, (4) differences in the interpretation of questions, (5) mistakes in coding or recoding the data obtained, and (6) other errors of collection, response, coverage, and estimation. These nonsampling errors also occur in complete censuses.

Some sources of error are specific to the CFS: (1) some respondents may have sampled incorrectly when selecting a sample of their documents; (2) some reporters may have used, but not reported, other units for their measurements (tons instead of pounds, dollars instead of thousands of dollars, etc.); and (3) on any shipment selected for the sample, only the major commodity (by weight) was reported, and secondary commodities within shipments were not recorded. Although unlikely, this last limitation might lead to a net undercoverage of some secondary commodities. These and other problems could yield a bias of undetermined amount in certain estimates.

Another possible source of bias in estimating the number of shipments, value, shipment weight (tons), and ton-miles is the imputation of missing data and of data that fail edit. Any systematic error in the imputation procedure can introduce bias into the estimates.

Although no direct measurement of the biases due to nonsampling error has been obtained, precautionary steps were taken in all phases of the collection, processing, and tabulation of the data in an effort to minimize their influence.

Biases in the published estimates are due in large part to imputing data for nonrespondents and for data which fail edit. The overall imputation rate for the survey was 30 to 40 percent.