Chapter 2
Planning and Design of Data Collection Systems
BTS data collection systems must be designed to meet both
internal and external user needs and the agency’s legislative mandates.
This chapter covers the planning and design of data
collection systems, including:
- Establishing data needs and data collection system objectives
(Section 2.1),
- Identifying the data providers (Section 2.2),
- Planning and designing data collection methods to meet data needs and
objectives (Section 2.3), and
- Documenting data collection plans and designs (Section 2.4).
2.1 Objectives and Requirements
Standard 2.1: Planning for a data collection system,
whether it is a new system or a revision of an established system, must
include:
- Consultation
with data users and providers,
- Definition
of data needs and objectives, and
- Choice
of how to meet data requirements.
Key Terms: major data users,
precision
Guideline 2.1.1: Consultation with Data Users and Providers
Develop and update the
data system objectives in partnership with major data users and data providers. Establish a process to consult regularly with
major data users regarding changes in data needs and possible updates to the data
collection system.
- OMB requires publication of a Federal Register notice requesting public
comments on every proposed information collection administered by a
federal agency that would collect data from ten or more persons outside
the federal government within a year.
- Consultations
with data users and providers should be expanded to include other means
for collecting comments and suggestions, such as individual meetings,
focus groups, presentations at conferences and workshops, cognitive
testing, and pretests/pilot tests.
- When
revising an established data collection system, review any previous
evaluation studies for information relating user needs to current system performance.
Guideline 2.1.2: Definition of Data Needs and Objectives
Establish system objectives in clear,
specific terms that identify data user needs and data analysis goals before
initiating data system development.
Modifications required later are often difficult and expensive to
implement. The definition of data needs
should include:
- What data items are needed
and how they will be used,
- The precision level required
for estimates,
- The format, level of
detail, and types of tabulations and outputs, and
- When and how frequently
users need the data.
The final data collection choices will be made in the design
phase (Section 2.3), taking into account constraining factors (e.g., cost,
time, and legal factors) and the quality of available data.
Guideline 2.1.3: Choice of How to Meet Data Requirements
Before beginning detailed planning for the
collection of specific data items, review related studies and data collection
systems. Determine whether all or part
of the required data are already available, or could be more easily obtained by
adding or modifying questions in existing federal data collections.
- If the required
information is not directly available, determine whether it can be derived
or estimated using existing data sources.
- If existing federal data collection systems meet some
but not all of the data requirements, determine whether the existing data
systems can be altered to meet the data requirements through, for example,
an inter-agency agreement.
Related Information
Office of Management and Budget (OMB). 2005. Standards for Statistical Surveys (Proposed),
Section 1.1 (Survey Planning). Washington,
DC. July
14.
Stopher, P. and Jones, P., eds. 2003. Transport Survey Quality and Innovation. Oxford, UK: Pergamon.
U.S. Department of Transportation (DOT). 2002. The
Department of Transportation Information Dissemination Quality Guidelines,
Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Approval Date: August 15, 2005
2.2 Target Population and Sample Design
Standard 2.2: Planning and design
must specify the proposed target population, source for lists of the target
population, and (where applicable) sample design and sample size, accuracy
requirements, and response rate goals.
Key Terms: accuracy, coverage,
frame, response rate, target population
Guideline 2.2.1: Target Population and Frames
Lists of the units in the target population are required to obtain
information from that population. The availability of such lists (also
known as frames) often restricts the choice of data collection method.
When a new frame is needed for a data program, develop and implement a
plan for constructing the frame. The plan should cover:
- Choice
of the target population and the rationale,
- Any exclusions that have been applied to target
and/or frame populations by design,
- Sources of lists
of target population units,
- Identification
and description of other frame files which exist and whether portions of
other frame files will be used to construct a new file,
- When applicable,
a description of any multistage sampling, such as geographic area
sampling, that will be undertaken prior to development of lists of units
and the stages in which the final lists will be developed,
- Methods for
matching and merging population lists, if applicable,
- Data items needed
for units in the frame,
- Anticipated
coverage of the target population by the frame,
- Coverage rates in excess of 95 percent overall
and for each major target population subgroup are desirable.
- Consider using
frame enhancements, such as frame supplementation or dual frame
estimation, to increase coverage.
- If
the anticipated coverage falls below 85 percent, evaluate and document
the potential for bias (OMB 2005).
- Any
estimation techniques used to improve the coverage of estimates, such as
post-stratification procedures,
- Other limitations of the frame including the
timeliness of the frame, and
- Projected
frequency of frame updates.
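As an illustration of the coverage thresholds in this guideline, the following Python sketch classifies an anticipated coverage rate, overall or for a subgroup. The function name, the status labels, and the example counts are hypothetical and are not part of this manual. The sketch also assumes that an estimate of the target population size is available, which may not hold in practice.

```python
def classify_coverage(frame_count: int, target_count: int) -> str:
    """Classify anticipated frame coverage of a target population or subgroup.

    frame_count  -- target population units expected to be on the frame
    target_count -- estimated size of the target population
    The inputs and the returned labels are illustrative assumptions.
    """
    if target_count <= 0 or not 0 <= frame_count <= target_count:
        raise ValueError("need target_count > 0 and 0 <= frame_count <= target_count")
    # Integer comparisons avoid floating-point edge cases at the thresholds.
    if 100 * frame_count > 95 * target_count:
        return "meets_target"           # coverage in excess of 95 percent
    if 100 * frame_count >= 85 * target_count:
        return "consider_enhancements"  # from 85 through 95 percent
    return "evaluate_bias"              # below 85 percent


if __name__ == "__main__":
    # Hypothetical counts: (units on frame, estimated target population).
    subgroups = {"overall": (9_100, 10_000), "small carriers": (1_600, 2_000)}
    for name, (on_frame, total) in subgroups.items():
        print(name, classify_coverage(on_frame, total))
```

Running the check separately for each major subgroup mirrors the guideline's wording that coverage is desirable "overall and for each major target population subgroup."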
Guideline 2.2.2: Sample Design
A 100 percent
data collection may be required by law, necessitated by accuracy requirements,
or justified because it is relatively inexpensive (e.g., the data are readily available). Otherwise, the sample design should include
appropriate sampling methods. Any sample design chosen should ensure the sample
will yield the data required to meet the objectives of the data collection.
- Use probability sampling so that sampling error can be estimated. Any
use of nonprobability sampling methods (e.g., cut-off or model-based
samples) must be justified statistically, and the design must allow
estimation error to be measured.
- The
sample design should include:
- Identification of the
sampling frame and the adequacy of the frame,
- The sampling unit used
(at each stage if multistage design),
- Criteria for stratifying
or clustering,
- Sampling strata,
- Sample size by stratum,
- Expected yield by
stratum,
- Sample selection
procedures,
- The known probability (or
probabilities) of selection,
- Estimated efficiency of
sample design,
- Power analyses to
determine sample sizes and effective sample size for key variables by
reporting domains (where appropriate),
- Response rate goals
(Guideline 4.5.3),
- Estimation and weighting
plan,
- Variance estimation
techniques appropriate to the sample design,
- Expected
precision of estimates for key variables, and
- References
for the sampling methods used.
- For
nonprobability sample designs, include a detailed selection process and
demonstrate that units not in the sample are impartially excluded on
objective grounds.
- Discuss potential
nonsampling errors, including reporting errors, response variance, measurement
bias, nonresponse, imputation error, and errors in processing the data. Indicate steps to be taken to minimize
the effect of these problems on the data.
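One common way to connect a precision requirement to sample size by stratum, expected yield, and selection probabilities is sketched below in Python. It assumes simple random sampling within strata, proportional allocation, and a precision target stated as a margin of error for a proportion at roughly 95 percent confidence (z = 1.96). None of these choices is prescribed by this guideline, and the function names, stratum sizes, and response rate are hypothetical. The finite population correction follows the standard textbook form (see, e.g., Cochran 1977).

```python
import math


def required_completes(margin: float, population: int,
                       p: float = 0.5, z: float = 1.96) -> int:
    """Completed responses needed to estimate a proportion within +/- margin."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for a very large population
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)


def stratum_plan(stratum_sizes: dict[str, int], completes: int,
                 response_rate: float) -> dict[str, dict[str, float]]:
    """Allocate completes proportionally, then inflate for expected nonresponse."""
    total = sum(stratum_sizes.values())
    plan = {}
    for name, size in stratum_sizes.items():
        target = math.ceil(completes * size / total)             # completes wanted
        selected = min(size, math.ceil(target / response_rate))  # units to select
        plan[name] = {
            "selected": selected,
            "expected_yield": selected * response_rate,
            "selection_probability": selected / size,
        }
    return plan


if __name__ == "__main__":
    sizes = {"urban": 6_000, "rural": 4_000}   # hypothetical frame counts
    completes = required_completes(margin=0.05, population=sum(sizes.values()))
    for stratum, row in stratum_plan(sizes, completes, response_rate=0.8).items():
        print(stratum, row)
```

A real design would usually replace proportional allocation with an allocation tuned to stratum variances and costs, and would state precision targets for each key variable and reporting domain.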
Related Information
Bureau of Transportation Statistics. 2004. Confidentiality
Procedures Manual. Washington,
DC.
__________.
2005. BTS Statistical Standards Manual, Section 3.2 (Frame Maintenance
and Updates). Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29,
2005.
Cochran, W.G. 1977. Sampling Techniques, 3rd ed. New York: Wiley.
Office of Management and Budget (OMB). 2004.
Questions and Answers When Designing Surveys for Information
Collection. Washington, DC. December 6.
__________. 2005. Standards
for Statistical Surveys (Proposed), Section 1.2 (Survey Design) and Section
2.1 (Developing Sampling Frames). Washington,
DC.
July 14.
Särndal, C.-E., Swensson, B., and Wretman, J. 1991. Model Assisted Survey Sampling. New York: Springer Verlag.
U.S. Department of Transportation (DOT). 2002. The
Department of Transportation Information Dissemination Quality Guidelines,
Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Wolter, K.M. 1985. Introduction to Variance Estimation. New York: Springer Verlag.
Approval Date: August 15, 2005
2.3 Data Collection Methods
Standard 2.3: The design and
planning for data collection must include:
- The detailed
methods to be used to collect data,
- The data
collection instruments and associated instructions,
- A pretest for
new data collection systems, or existing systems with major revisions, and
- Plans for the
dissemination of major resulting information products to the public.
Key Terms: bias, bridge study,
collection instrument, confidentiality, crosswalk, key variable, measurement
error, response rate
Guideline 2.3.1: Methods of Obtaining Data
The data collection method should be
appropriate to the nature, amount, and complexity of the data requested, the
number of data providers, available resources, and the amount of time
available.
- Determine
the method, or combination of methods, of data collection (e.g., mail,
telephone, Internet, etc.) that is appropriate for the target population
and the objectives of the data program.
The determination should include consideration of the likely effect
of method choice on response rates.
- Establish a data
collection period that allows sufficient
response time for data providers to supply reliable data, including time
to follow up on missing data, and meets the required dissemination
schedule.
- Develop a plan for
confidentiality protection (BTS 2004) during sampling, data collection,
processing, data analysis, and dissemination.
- Develop plans for data
processing, including data editing and imputation (BTS 2005, Chapter
4).
- Plan for quality assurance
during each phase of the data collection process to permit monitoring and
assessing the performance during implementation. Include contingencies to modify the
procedures if critical requirements (e.g., for the response rate) are not
met.
- Establish
a formal training process for persons involved in interviewing, observing,
or reporting data to ensure that the intended procedures are followed.
- If redesigning an existing
data system, analyze and document the potential impact of changes in key
variables or data collection procedures.
- Plan for evaluating data
collection and processing procedures, results, and potential biases.
- Develop general
specifications for an internal project management system for the complete
data collection cycle that identifies critical activities and key
milestones that will be monitored, and the time relationships among them.
Guideline 2.3.2: Instruments and Instructions
Design the data
collection instrument in a manner that maximizes data quality, while minimizing
respondent burden:
- Do not use instrument
formats that are inappropriate for the method of data collection. For example, if using a
self-administered collection instrument, limit skip patterns to ease
navigation.
- Develop clearly written
instructions to help reporters minimize missing data and measurement
error.
- Ensure that data items
are clearly defined in terms the reporters understand, with entries in a
logical sequence and with reasonable visual cues and instrument formatting
(if applicable). Pretest to
identify problems with interpretability.
- Structure the order and
presentation of data items such that responses do not unduly influence
responses to subsequent items.
- Minimize the number of
data calculations and conversions the reporter must make.
- For computer-assisted and
other forms of electronic data collection (using GPS devices, sensors,
etc.):
- Test for validity and
reliability under conditions similar to those of the planned data
collection.
- Develop protocols for the
backup and recovery of data.
- If possible, have
alternate methods of data collection available in case of equipment
failure. Otherwise, develop plans
to impute or adjust for faulty or missing observations.
- Establish protocols that minimize measurement error, such as conducting
response analysis surveys that ensure records exist for data elements
requested for business data collections, establishing recall periods that
are reasonable for personal data collections, and developing computer
systems that ensure Internet data collections function properly.
Guideline 2.3.3: Standard Codes and Classifications
To allow data comparisons across
databases, use standard names, variables, numerical units, codes, and
definitions. Use codes and classifications consistent with the
federal coding standards listed below, if applicable. If a federal coding standard does not exist,
consult with subject area experts to determine if applicable non-federal
standards exist. Provide crosswalk
tables to the federal standard codes for any legacy coding that does not meet
the federal standards. These codes are
updated periodically. Current federal
standard codes include:
- FIPS
Codes. The National Institute of
Standards and Technology (NIST n.d.) maintains Federal Information Processing
Standards (FIPS) required for use in federal information processing in
accordance with OMB Circular A-130.
The following FIPS should be used for coding:
- 5-2, Codes for the Identification of the States,
the District of Columbia and the
Outlying Areas of the United States,
and Associated Areas.
- 6-4, Counties and Equivalent Entities of the
U.S., Its Possessions, and Associated Areas.
- 10-4, Countries, Dependencies, Areas of Special
Sovereignty and Their Principal Administrative Divisions.
- Statistical
Areas. OMB (2005b) defines
Metropolitan Statistical Areas, Micropolitan Statistical Areas, Combined
Statistical Areas, and New England City and Town Areas for use in Federal
statistical activities. These areas, as well as principal cities, are
updated annually to reflect changes in population estimates.
- NAICS
Codes. The North American Industry
Classification System (NAICS) should be used to classify establishments
(U.S. Census Bureau n.d.). NAICS
was developed jointly by the United States,
Canada,
and Mexico
to provide new comparability in statistics about business activity across North
America. (NAICS coding
replaced the U.S. Standard Industrial Classification (SIC) system.)
- SOC
Codes. The Standard Occupational Classification (SOC) system (BLS 2000)
should be used to classify workers into occupational categories for the
purpose of collecting, calculating, or disseminating data.
- Race and
Ethnicity. Classification of race
and ethnicity, as well as methods of collection, should comply with OMB’s
Standards for Maintaining, Collecting, and Presenting Federal Data on Race
and Ethnicity (OMB 2000).
- Aviation. The International Air Transport
Association, an airline industry association, establishes standard codes
for airlines and airport locations (IATA n.d.). The BTS Office of Airline Information also
develops and maintains Aviation Support Tables (BTS n.d.) that provide standard
codes and other information for air carriers (U.S. and foreign), worldwide
airport locations, and for aircraft types and models. The BTS codes do not always agree with
IATA coding.
- Standard
Classification of Transported Goods (SCTG) Reporting System Codes. The SCTG coding system (Statistics
Canada n.d.) was created by the U.S. and Canadian governments, and is used
to address statistical needs regarding the transportation of
products.
- United Nations (UN) Numbers and North American (NA) Numbers. UN numbers
are four-digit numbers used worldwide to identify different hazardous
materials. The UN numbers are developed through the
framework of the United Nations Model Regulations on the Transport of
Dangerous Goods. NA numbers are
assigned by the U.S. and Canada to hazardous materials that have not been
assigned a UN number. The PHMSA
Office of Hazardous Materials Safety (PHMSA n.d.) maintains a consolidated
table of hazardous materials codes and information.
- Injury
Codes. “The International
Classification of Diseases, Ninth Revision, Clinical Modification
(ICD-9-CM)” (NCHS n.d.) is the official system of assigning codes to diagnoses
and procedures associated with hospital utilization in the United
States. The E-codes in this manual
are for injuries. Transportation-related
injuries are coded E800 through E848.
- Human
Factors Codes. The FAA Office of
Aviation Medicine (FAA 2000) uses “The Human Factors Analysis and
Classification System—HFACS.”
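The crosswalk requirement in this guideline can be illustrated with a short Python sketch that recodes legacy state labels to FIPS 5-2 state codes and reports any values that have no mapping. The legacy labels, the function names, and the sample records are hypothetical. The sketch also includes a simple check for whether an ICD-9-CM E-code falls in the transportation range E800 to E848 noted above; the accepted code format (an optional single decimal digit) is an assumption made for illustration.

```python
import re

# Hypothetical legacy state labels mapped to FIPS 5-2 state codes.
LEGACY_STATE_TO_FIPS = {"CALIF": "06", "D.C.": "11", "N.Y.": "36", "TEX": "48"}


def apply_crosswalk(records, crosswalk):
    """Return (recoded, unmatched) for a list of legacy code values.

    recoded   -- standard codes for values found in the crosswalk
    unmatched -- original values with no mapping, kept for manual review
    """
    recoded, unmatched = [], []
    for value in records:
        key = value.strip().upper()
        if key in crosswalk:
            recoded.append(crosswalk[key])
        else:
            unmatched.append(value)
    return recoded, unmatched


# "E" plus three digits, optionally followed by one decimal digit (assumed format).
_ECODE = re.compile(r"^E(\d{3})(?:\.\d)?$")


def is_transport_injury_ecode(code: str) -> bool:
    """True if the E-code falls in the transportation range E800 to E848."""
    match = _ECODE.match(code.strip().upper())
    return bool(match) and 800 <= int(match.group(1)) <= 848


if __name__ == "__main__":
    print(apply_crosswalk(["Calif", "tex", "Ontario"], LEGACY_STATE_TO_FIPS))
    print(is_transport_injury_ecode("E812.0"), is_transport_injury_ecode("E849"))
```

Keeping unmatched values in a separate list, rather than dropping them, makes gaps in the crosswalk visible so they can be documented.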
Guideline 2.3.4: Pretesting
For new data collections or major
revisions of ongoing collections, all components must be pretested so that they
minimize measurement error and function as intended prior to full
implementation.
- One component of
pretesting is a pilot test in which some components of a data collection
can be pretested prior to a field test of the data collection (for
example, using focus groups, cognitive laboratory work, and/or calibration
studies).
- Another
component of pretesting is a field test. Components of a data collection
that cannot be successfully demonstrated through previous work should be
field tested prior to implementation of the full-scale data collection. The design of a field test should
reflect realistic conditions, including those likely to pose difficulties
for the data collection.
Guideline 2.3.5: Proposed Data Analysis and Information Products
Develop a dissemination
agenda that identifies proposed major information products, timing of release,
and their target audiences.
- Proposed data analysis
should identify issues, objectives, and key variables, and be linked to
the questions the data collection was intended to answer.
- Develop adjustment methods,
such as crosswalks and bridge studies, that will be used to preserve trend
analyses and inform users about the impact of changes.
Related Information
Bureau of Labor Statistics (BLS). 2000. Standard
Occupational Classification (SOC) System.
Available at http://www.bls.gov/soc/ as of November 15, 2004.
Bureau of Transportation Statistics (BTS). n.d. Aviation Support Tables. Office of Airline Information: Washington, DC. Available at http://www.transtats.bts.gov/Tables.asp?DB_ID=595&DB_Name=Aviation%20Support%20Tables&DB_Short_Name=Aviation%20Support%20Tables as of July
20, 2005.
__________. 2004. Confidentiality Procedures Manual. Washington, DC.
__________.
2005. BTS Statistical Standards Manual, Chapters 3-6. Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29,
2005.
Energy
Information Administration (EIA). 2002.
EIA Standards Manual, Standard
EIA 2002-5 (Frames Development and Maintenance) and Standard 2002-4
Supplementary Materials, Forms Design Checklist. Washington, DC. Available at http://www.eia.doe.gov/smg/Standard.pdf as of January
25, 2005.
Federal Aviation Administration (FAA). 2000.
The Human Factors Analysis and Classification System—HFACS. DOT/FAA/AM-00/7. Office of Aviation Medicine: Washington,
DC. Available at http://www.hf.faa.gov/Portal/ShowProduct.aspx?ProductID=54 as of June
15, 2005.
International Air Transport Association (IATA). n.d. Airline Coding Directory.
London, UK. Available
at http://www.iata.org/ps/publications/9095.htm as of July 26, 2005.
National Center for Health Statistics (NCHS). n.d. “The International
Classification of Diseases, Ninth Revision, Clinical Modification” (ICD-9-CM). Available at http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm as of June 14, 2005.
National Institute of Standards
and Technology (NIST). n.d. Federal Information Processing Standards Publications. Available at
http://www.itl.nist.gov/fipspubs/index.htm as of November 15, 2004.
Office of Management and Budget (OMB). 2000.
Provisional Guidance on the Implementation of the 1997 Standards for
Federal Data on Race and Ethnicity. Available at
http://www.whitehouse.gov/omb/inforeg/statpolicy.html#dr as of November 15, 2004.
__________.
2004. Questions and Answers When
Designing Surveys for Information Collection.
Washington, DC. December 6.
__________. 2005a. Standards for Statistical
Surveys (Proposed), Section 3.3 (Coding).
Washington, DC. May 19.
__________. 2005b. Update of Statistical
Area Definitions and Guidance on Their Uses. Available at http://www.whitehouse.gov/omb/inforeg/statpolicy.html#ms as of July
15, 2005.
Pipeline and Hazardous Materials Safety
Administration (PHMSA). n.d. “Hazmat Table.” Office of Hazardous Material Safety: Washington, DC. Available at http://www.myregs.com/dotrspa/ as of July
20, 2005.
Presser, S., Rothgeb, J.M., Couper, M.P., Lessler,
J.T., Martin, E., Martin, J., and Singer, E. 2004. Methods for Testing and Evaluating Survey
Questionnaires. New York: Wiley.
Statistics Canada. n.d. Standard Classification of
Transported Goods (SCTG). Ottawa,
Canada. Available at http://www.statcan.ca/english/Subjects/Standard/sctg/sctg-intro.htm as of June
14, 2005.
Stopher, P. and Jones,
P., eds.
2003. Transport Survey Quality and Innovation. Oxford, UK: Pergamon.
Sudman, S., Bradburn, N., and Schwarz, N.
1996.
Thinking about Answers: The
Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.
U.S. Census Bureau. n.d. The North American Industry Classification System (NAICS). Washington, DC. Available at
http://www.census.gov/epcd/www/naics.html as of November 15, 2004.
U.S. Department of Transportation (DOT). 2002. The
Department of Transportation Information Dissemination Quality Guidelines,
Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Approval Date: August 15, 2005
2.4 Documents and Documentation
Standard 2.4: Planning activities
must include the documentation of user needs and design decisions as
well as the preparation of required administrative documents.
Key Terms: coverage, frame,
target population
Guideline 2.4.1: Documentation of Data Needs
After establishing the data needs and
requirements, prepare a detailed technical document that describes the goals and objectives of the data collection, including:
- A summary of the consultations with major data users and
data providers, plus any other sources consulted,
- The
information needs that will be met,
including the desired accuracy, timeliness, and dissemination
format(s) for the data, and
- The
choices made for meeting data needs and their relationship to the
requirements.
Guideline 2.4.2: Target Population and Frames Documentation
Describe the target
populations and associated frames (lists of
population units) in detail. Include a discussion of coverage issues (Guideline 2.2.1).
Guideline 2.4.3: Sample Design Documentation
If sampling is part of the data
collection design, prepare a detailed description of the sample design
(Guideline 2.2.2) and how it will yield the data required to meet the
objectives of the data collection. When
a nonprobabilistic sampling method is employed, the survey design documentation
should include:
- A
discussion of what options were considered and why the final design was
selected,
- An
estimate of the potential bias in the estimates, and
- The
methodology to be used to measure estimation error.
Guideline 2.4.4: Collection and Processing Methodology Documentation
Document the collection design and its connection to the data
requirements (Section 2.3). The
documentation should include the methods of obtaining data, copies of the data
collection instrument and instructions, pretest design and findings, and plans
for disseminating the results of the data collection to the public.
Guideline 2.4.5: Administrative Documents
Comply with the following requirements as
part of the data collection planning and design:
- When planning and design are in their initial stages, prepare a project
plan specifying schedules and resource requirements in the format
specified by BTS management.
- Data
collections (and related activities such as focus groups, cognitive
interviews, pilot studies, field tests, etc.) are all collections of
information subject to the requirements of the Paperwork Reduction Act of
1995 (P.L. 104-13, 44 U.S.C. 3501 et seq.) and OMB’s regulations (5 CFR
Part 1320, Controlling Paperwork Burdens on the Public). OMB approval is
required before the agency may collect information from ten or more
persons outside the Federal government in a twelve-month period. The documentation specified in this
section can all be used in Part B of the submission to OMB (OMB 2004a).
- Projects that require a new IT
investment or significant modification of an existing IT investment must
go through the Capital Planning and Investment Control process.
- Contracts should include language stating that the contractor
shall comply with all standards and guidelines contained in the BTS Statistical Standards Manual and the
BTS Confidentiality Procedures
Manual.
Related Information
Bureau of Transportation Statistics. 2004. Confidentiality
Procedures Manual. Washington,
DC.
__________. 2005. BTS
Statistical Standards Manual. Washington, DC. Available at http://www.bts.gov/programs/statistical_policy_and_research/bts_statistical_standards_manual/index.html as of July 29,
2005.
Office of Management and Budget (OMB). 2004a. Paperwork Reduction Act Submission
(Form OMB 83-I). Washington,
DC. February. Available at http://www.whitehouse.gov/omb/inforeg/83i-fill.pdf as
of June 15,
2005.
__________. 2004b.
Questions and Answers When Designing Surveys for Information Collection. Washington, DC. December 6.
U.S. Department of Transportation (DOT). 2002. The
Department of Transportation Information Dissemination Quality Guidelines,
Chapter 2 (Planning Data Systems). Washington, DC. Available at http://dms.dot.gov/ombfinal092502.pdf as of January 19, 2005.
Approval Date: August 15, 2005