Chapter 5
Data Analysis
BTS employs a wide variety of statistical techniques in its work. However, regardless of the techniques used, there are some steps that should be included in any data analysis. This chapter provides general guidance on those steps, and then leaves the choice of analytical tools up to the data analyst performing the work.
This chapter contains standards for planning a data analysis (Section 5.1), calculating estimates and performing inferences (Section 5.2), and documenting the data analysis (Section 5.3). For quick-response projects, compliance with these standards is recommended, but not required.
5.1 Data Analysis Planning
Standard 5.1: Plan before starting a specific data analysis to ensure that the resulting product addresses the needs of BTS customers and that the resources are available to complete the data analysis.
Key Terms: key variable, target audience
Guideline 5.1.1: Criteria for the Conduct of Data Analysis
The data analysis should be relevant, objective, and comprehensive, and it should add value to existing information. To meet these goals, data analysts need to:
- Conduct the data analysis in an objective and policy-neutral manner that focuses on the statistical and economic facts.
- Maintain awareness of subject matter issues so that the data analysis can address topics of interest and importance.
- Consult with subject area specialists about relevant issues, the strengths and weaknesses of data sources, and important references to key topic elements.
- If the data analysis is not comprehensive, indicate what further types of data analysis should be considered and whether BTS plans to do that work.
Guideline 5.1.2: Data Analysis Plan Requirement
Prepare a data analysis plan in the format specified in the BTS Information Product Scoping Paper (BTS 2004) before starting the data analysis.
- Include the purpose of the data analysis, the research question, the target audience, the data sources (including a description and any limitations), the key variables to be used, and the data analysis methods. Also provide target completion dates and an estimate of the resources needed to complete the product.
- Subject matter experts should review the plan to ensure that the proposed data analysis will answer relevant questions. Data analysis experts should review the plan to ensure that appropriate data and methods will be used.
- The data analysis plan must be approved by the designated manager.
Related Information
Bureau of Transportation Statistics (BTS). 2004. BTS Information Product Scoping Paper. Washington, DC.
Approval Date: June 28, 2005
5.2 Statistical Estimation and Inference
Standard 5.2: Estimates and statistical inferences made regarding the data must be based on acceptable statistical practice.
Key Terms: accuracy, bias, bridge estimates, estimates, inference, reliability, robustness, time series, trend, variance
Guideline 5.2.1: Data Analysis Methods
Analyses must use theory and methods justifiable by reference to statistical literature (provided below in “Related Information”) or by mathematical derivation.
- Use appropriate analysis methods for complex sample survey, time series, and geospatial data; otherwise, variance estimates may be seriously biased. Methods that assume independent observations ignore clustering, autocorrelation, and spatial correlation, which are common in these kinds of data.
- If extensive seasonality, irregularities, known special causes, or variation in trends are present in the data, take those into account in the trend analysis.
- Use robust methods if in doubt about the quality of the data (i.e., the quality of the data cleaning) or about the suitability of the data for analysis by standard parametric methods.
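To illustrate why the first point matters, the sketch below compares two standard errors for the same mean from a hypothetical cluster sample. One treats every observation as an independent draw; the other uses only the variation between cluster means (an "ultimate cluster" style estimator). It assumes equal-size clusters, no stratification, equal weights, and a small sampling fraction; real complex designs need the estimation guidance of the originating office. The data and function names are illustrative only.

```python
from statistics import fmean, variance


def naive_se(clusters):
    """SE of the overall mean if every observation is (wrongly) treated
    as an independent draw from a simple random sample."""
    values = [v for cluster in clusters for v in cluster]
    return (variance(values) / len(values)) ** 0.5


def ultimate_cluster_se(clusters):
    """SE of the overall mean from the variation between cluster means.

    Assumes equal-size clusters selected with replacement (or with a small
    sampling fraction), no stratification, and equal weights.
    """
    cluster_means = [fmean(cluster) for cluster in clusters]
    return (variance(cluster_means) / len(cluster_means)) ** 0.5


if __name__ == "__main__":
    # Hypothetical data: 4 sampled clusters with 3 observations each.
    # Observations within a cluster are far more alike than across clusters.
    clusters = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
    naive = naive_se(clusters)
    design_based = ultimate_cluster_se(clusters)
    print(f"naive SE = {naive:.3f}, cluster-based SE = {design_based:.3f}")
    print(f"design effect = {(design_based / naive) ** 2:.2f}")
```

With this strongly clustered data the naive standard error is roughly half the cluster-based one (a design effect of about 3.6), so confidence intervals built on the naive value would be far too narrow. Texts such as Wolter (1985) and Skinner, Holt, and Smith (1989), listed under Related Information, cover design-based variance estimation in depth.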
Guideline 5.2.2: Indicating Uncertainty
Statistical statements should be accompanied by some assessment of the limitations and uncertainty of the results.
- Estimated errors due to statistical sampling or modeling indicate the reliability of the estimate. However, these estimated errors do not account for bias, which may have a greater effect on accuracy and, unlike sampling error, does not decrease as the number of cases increases.
- Analysts must consider data quality issues related to measurement error and missing data. The purpose, design, methods, and quality of processing can all place limitations on the analysis and interpretation of the data. If possible, quantify and eliminate biasing effects. Otherwise, discuss the nature and estimated magnitude of these limitations in the report.
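The point about bias can be made concrete with the standard identity that mean squared error equals variance plus squared bias. The sketch below assumes a sample mean of independent observations with standard deviation `sigma` and a constant systematic bias `bias`; both values are hypothetical inputs chosen for illustration.

```python
import math


def error_components(sigma: float, bias: float, n: int) -> dict:
    """Split the total error of a sample mean into sampling error and bias.

    Assumes n independent observations with standard deviation `sigma` and a
    constant systematic bias `bias` (hypothetical inputs).
    """
    se = sigma / math.sqrt(n)              # sampling error shrinks as n grows
    rmse = math.sqrt(se ** 2 + bias ** 2)  # MSE = variance + bias^2
    return {"n": n, "se": se, "bias": bias, "rmse": rmse}


if __name__ == "__main__":
    for n in (25, 100, 2500, 10000):
        c = error_components(sigma=10.0, bias=2.0, n=n)
        print(f"n={c['n']:>6}  SE={c['se']:.3f}  bias={c['bias']:.3f}  "
              f"RMSE={c['rmse']:.3f}")
```

As `n` grows, the standard error falls toward zero while the root mean squared error levels off at the bias, which is why a small reported standard error alone does not establish that an estimate is accurate.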
Guideline 5.2.3: Inference and Comparisons
Support statistical statements with proper testing and inference procedures.
- Sampling error estimates should accompany any estimates from samples.
- For complex sample designs, the BTS office originating the data should provide guidance on estimation and variance calculation. This guidance should cover the proper use of weights and recommend a maximum coefficient of variation and a minimum cell size for an estimate to be considered usable.
- When making multiple comparisons between subgroups using the same data, include a note with the test results indicating whether the significance criterion (Type I error rate) was adjusted and, if so, the method used.
- Not every statistically significant difference is important. When a comparison shows a statistically significant difference, subject matter expertise is needed to determine whether the difference is important. In the context of the measure and its fluctuation over time, a statistically significant difference may be regarded as unimportant.
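Two of the checks above can be sketched briefly. The first function flags whether an estimate meets a maximum coefficient of variation (CV = standard error / |estimate|) and a minimum cell size; the 30% and 30-case limits are hypothetical placeholders, since the standard leaves the actual values to the originating office. The second applies Holm's step-down adjustment, one common multiple-comparison method that is never less powerful than the Bonferroni correction; it is shown as an example, not as the method BTS requires.

```python
def is_usable(estimate, standard_error, cell_size,
              max_cv=0.30, min_cell_size=30):
    """Flag whether an estimate meets hypothetical publication thresholds.

    The 30% CV and 30-case minimum are placeholders; the originating office
    sets the actual limits for each data source.
    """
    if estimate == 0:
        return False  # the CV is undefined for a zero estimate
    cv = standard_error / abs(estimate)
    return cv <= max_cv and cell_size >= min_cell_size


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted p-values must not decrease as raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four subgroup comparisons.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}  Holm-adjusted p = {q:.3f}  "
              f"significant at 0.05: {q <= 0.05}")
    print(is_usable(estimate=1200.0, standard_error=180.0, cell_size=45))
```

In this example two of the four comparisons that are nominally significant at 0.05 are no longer significant after adjustment; the accompanying note would state that the Holm method was used.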
Guideline 5.2.4: Bridge Estimates
If the scope of data collection changes or part of a historical series is revised, data for both the old and the new series should be published for a suitable overlap period.
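Publishing an overlap period lets users relate the old series to the new one. A minimal sketch of one common use, ratio linking, is shown below: the old series is rescaled by the ratio of the two series' means over the overlap. This is an assumed illustration of how overlap data can be used, not a BTS procedure, and it does not replace publishing both series as the guideline requires. The function name and data are hypothetical.

```python
from statistics import fmean


def link_old_series(old, new):
    """Rescale the old series to the level of the new one using the ratio
    of their means over the overlap period (hypothetical ratio linking).

    `old` and `new` map period -> value. Returns (linked_series, factor).
    """
    overlap = sorted(set(old) & set(new))
    if not overlap:
        raise ValueError("the two series must share at least one period")
    factor = fmean(new[t] for t in overlap) / fmean(old[t] for t in overlap)
    # Rescale only the old-series periods that the new series does not cover.
    linked = {t: v * factor for t, v in old.items() if t not in new}
    linked.update(new)  # new-series values are used wherever they exist
    return dict(sorted(linked.items())), factor


if __name__ == "__main__":
    old = {2001: 100.0, 2002: 110.0, 2003: 120.0, 2004: 130.0}
    new = {2003: 126.0, 2004: 136.5, 2005: 140.0}
    linked, factor = link_old_series(old, new)
    print(f"link factor = {factor:.3f}")
    print(linked)
```

A single ratio assumes the difference between the series is proportional and stable over time; comparing the two series period by period across the overlap is a simple way to check that assumption before relying on a linked series.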
Guideline 5.2.5: Assumptions and Diagnostics
State all statistical assumptions (such as assumptions about data distributions or dependence structure) made during the data analysis.
- Perform diagnostics to detect violations of assumptions, and provide the results of the diagnostics in the report. Plots of the data and of statistical output, such as residuals, are often useful in detecting violations of assumptions.
- For each assumption, discuss the likelihood that the assumption will be violated by small or large amounts and the robustness of the data analysis method to each such violation.
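As a small example of numerical diagnostics that complement residual plots, the sketch below fits a straight line by ordinary least squares and reports the residuals, the Durbin-Watson statistic (a check for first-order autocorrelation in time-ordered data), and points whose residual is large relative to the residual standard error. The scaling ignores leverage, unlike studentized residuals, and the 2.5 cutoff is a hypothetical choice; both are simplifying assumptions.

```python
from math import sqrt
from statistics import fmean


def residual_diagnostics(x, y, threshold=2.5):
    """Fit y = a + b*x by ordinary least squares and return diagnostics.

    - residuals: large values suggest outliers or a misspecified model.
    - durbin_watson: near 2 is consistent with no first-order
      autocorrelation; near 0 or 4 suggests positive or negative
      autocorrelation (meaningful only for time-ordered data).
    - flagged: indices where |residual| / s > threshold, with s the residual
      standard error (a crude scaling that ignores leverage).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)
    s = sqrt(sse / (n - 2))
    dw = (sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse
          if sse > 0 else float("nan"))
    flagged = [i for i, r in enumerate(resid) if s > 0 and abs(r) / s > threshold]
    return {"slope": slope, "intercept": intercept, "residuals": resid,
            "durbin_watson": dw, "flagged": flagged}


if __name__ == "__main__":
    # Hypothetical data: an exact line y = 2x + 1 with one contaminated value.
    x = list(range(1, 11))
    y = [2 * xi + 1 for xi in x]
    y[4] += 10
    d = residual_diagnostics(x, y)
    print(f"Durbin-Watson = {d['durbin_watson']:.2f}, flagged = {d['flagged']}")
```

Here the contaminated fifth point (index 4) is flagged. Numerical checks like these do not replace plots; they make it easier to report diagnostic results in the text.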
Related Information
Agresti, A. 1990. Categorical Data Analysis. New York: Wiley.
Anderson, T.W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York: Wiley.
Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. 1994. Time Series Analysis: Forecasting and Control, 3rd ed. New York: Prentice Hall.
Casella, G. and Berger, R.L. 2001. Statistical Inference, 2nd ed. Belmont, CA: Duxbury Press.
Chatfield, C. 2003. The Analysis of Time Series: An Introduction, 6th ed. New York: Chapman and Hall.
Cleveland, W.S. 1993. Visualizing Data. Summit, NJ: Hobart Press.
Cochran, W.G. 1977. Sampling Techniques, 3rd ed. New York: Wiley.
Cook, R.D. and Weisberg, S. 1999. Applied Regression Including Computing and Graphics. New York: Wiley.
Cressie, N. 1991. Statistics for Spatial Data. New York: Wiley.
Daniel, C. and Wood, F.S. 1980. Fitting Equations to Data. New York: Wiley.
DeGroot, M.H. 1989. Probability and Statistics. Reading, MA: Addison-Wesley.
Diggle, P.J., Liang, K.-Y., and Zeger, S.L. 2000. Analysis of Longitudinal Data. Oxford: Oxford University Press.
Draper, N.R. and Smith, H. 1998. Applied Regression Analysis, 3rd ed. New York: Wiley.
Efron, B. and Tibshirani, R.J. 1994. An Introduction to the Bootstrap. New York: Chapman and Hall.
Fleiss, J.L. 1981. Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. 2005. Robust Statistics: The Approach Based on Influence Functions, rev. ed. New York: Wiley.
Harvey, A.C. 1993. Time Series Models, 2nd ed. Cambridge, MA: MIT Press.
Hicks, C.R., and Turner, K.V. 1999. Fundamental Concepts in the Design of Experiments. Oxford, UK: Oxford University Press.
Hogg, R.V., Craig, A., and McKean, J.W. 2004. Introduction to Mathematical Statistics, 6th ed. New York: Prentice Hall.
Hosmer, D.W., and Lemeshow, S. 1989. Applied Logistic Regression. New York: Wiley.
Huber, P.J. 1981. Robust Statistics. New York: Wiley.
Kelsey, J.L., Whittemore, A.S., Evans, A.S., and Thompson, W.D. 1996. Methods in Observational Epidemiology. New York: Oxford University Press.
Kleinbaum, D.G., Kupper, L.L., and Muller, K.E. 1988. Applied Regression Analysis and Other Multivariable Methods. Boston: PWS-Kent.
Lehmann, E.L. and Romano, J.P. 2005. Testing Statistical Hypotheses, 3rd ed. New York: Springer Verlag.
Lehmann, E.L. and Casella, G. 1998. Theory of Point Estimation, 2nd ed. New York: Springer Verlag.
Little, R.J.A. and Rubin, D. 1987. Statistical Analysis with Missing Data. New York: Wiley.
McCulloch, C.E. and Searle, S.R. 2001. Generalized, Linear, and Mixed Models. New York: Wiley.
Mood, A.M., Graybill, F.A., and Boes, D.C. 1974. Introduction to the Theory of Statistics. New York: McGraw-Hill.
Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed), Sections 4.1 (Developing Estimates and Projections) and 5.2 (Inference and Comparisons). Washington, DC. July 14.
Pankratz, A. 1983. Forecasting with Univariate Box-Jenkins Models. New York: Wiley.
Rao, C.R. 1973. Linear Statistical Inference and Its Applications, 2nd ed. New York: Wiley.
Rohatgi, V.K. 1976. An Introduction to Probability Theory and Mathematical Statistics. New York: Wiley.
__________. 1984. Statistical Inference. New York: Wiley.
Rousseeuw, P.J., and Leroy, A.M. 1987. Robust Regression and Outlier Detection. New York: Wiley.
Särndal, C.-E., Swensson, B., and Wretman, J. 1991. Model Assisted Survey Sampling. New York: Springer Verlag.
Scheffé, H. 1959. Analysis of Variance. New York: Wiley.
Searle, S.R., Casella, G., and McCulloch, C.E. 1992. Variance Components. New York: Wiley.
Seber, G.A.F., and Lee, A.J. 2003. Linear Regression Analysis, 2nd ed. New York: Wiley.
Selvin, S. 1996. Statistical Analysis of Epidemiologic Data. Oxford, UK: Oxford University Press.
Skinner, C., Holt, D., and Smith, T. 1989. Analysis of Complex Surveys. New York: Wiley.
Snedecor, G.W. and Cochran, W.G. 1989. Statistical Methods, 8th ed. Ames, IA: Iowa State University Press.
Tukey, J. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
U.S. Department of Transportation. 2002. The Department of Transportation Information Dissemination Quality Guidelines, Appendix A, Sections 4.3 (Production of Estimates and Projections) and 4.4 (Data Analysis and Interpretation). Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Wolter, K.M. 1985. Introduction to Variance Estimation. New York: Springer Verlag.
Zacks, S. 1971. Theory of Statistical Inference. New York: Wiley.
Approval Date: June 28, 2005
5.3 Data Analysis Documentation
Standard 5.3: Document the methods and models used in data analysis products to help ensure objectivity, utility, transparency, and reproducibility of the estimates and projections.
Key Terms: reproducibility, transparency
Guideline 5.3.1: Documentation Content
The data analysis report must contain details of the methods used during the data analysis, including a description of software used, a discussion of the data analysis assumptions, and key information relevant to obtaining the data analysis results.
- Document all methods, assumptions, diagnostics, and robustness checks. Provide references supporting the methods used in the data analysis, or include in the report a derivation of the theory supporting them.
- Include a statement of the limitations of the data analysis, including coverage and response limitations and statistical variation.
- Archive the data and models used in the data analysis so the estimates can be reproduced.
- Archive supporting technical documentation, such as standard error and significance test calculations, that helps ensure transparency and reproducibility.
- For recurring reports, consider producing a methodological report.
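One lightweight way to support the archiving points is to store, alongside the archived data, a small manifest that records a checksum of each input file, the model settings, any random seed, and the software environment, so a later analyst can confirm they are rerunning the same analysis on the same data. The sketch below is a hypothetical illustration; the field names, file names, and JSON layout are assumptions, not a BTS archive format.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_files, model_spec, out_path, seed=None):
    """Write a JSON record of inputs, model settings, and environment.

    `model_spec` must be JSON-serializable (e.g., a dict of settings).
    """
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "random_seed": seed,
        "model": model_spec,
        "data": [{"file": str(p), "sha256": sha256_of(p)} for p in data_files],
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "example_counts.csv"  # hypothetical input file
        data.write_text("year,count\n2004,100\n2005,104\n")
        m = write_manifest([data], {"method": "OLS trend", "software": "Python"},
                           Path(tmp) / "manifest.json", seed=12345)
        print(json.dumps(m["data"], indent=2))
```

If a recomputed checksum differs from the archived one, the input data have changed and the archived estimates should not be expected to reproduce exactly.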
Related Information
Bureau of Transportation Statistics (BTS). 2005. BTS Statistical Standards Manual, Section 6.8 (Public Documentation). Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of June 10, 2005.
Office of Management and Budget (OMB). 2002. Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies. Federal Register, Vol. 67, No. 36, pp. 8452-8460. Washington, DC. February 22.
__________. 2005. Standards for Statistical Surveys (Proposed), Section 4.1 (Developing Estimates and Projections). Washington, DC. July 14.
Approval Date: June 28, 2005