U.S. Census Bureau

Data Quality: Automated Edit/Imputation and Record Linkage

William E. Winkler

KEY WORDS: record linkage, statistical data editing, imputation, data quality

ABSTRACT

Statistical agencies collect data from surveys and create data warehouses by combining data from a variety of sources. To be suitable for analytic purposes, the files must be relatively free of error. Record linkage (Fellegi and Sunter, JASA 1969) is used to identify duplicates within a file or across a set of files. Statistical data editing and imputation (Fellegi and Holt, JASA 1976) are used to locate erroneous values of variables and to fill in missing data. Although these powerful methods were introduced in the statistical literature, the primary means of implementing them have come from computer science and operations research (Winkler, Information Systems 2004a). This paper provides an overview of recent developments.
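As a rough illustration of the Fellegi-Sunter framework cited above, the sketch below scores a candidate record pair by summing log-likelihood-ratio agreement weights over comparison fields and classifying the pair against two thresholds. The field names, the m- and u-probabilities, the thresholds, and the sample records are all hypothetical values chosen for the example; in practice the probabilities are estimated from the files being matched (e.g., via the EM algorithm) and the thresholds are set from the tolerated error rates.

import math

# Hypothetical m-probabilities (agreement given a true match) and
# u-probabilities (agreement given a true nonmatch) for three fields.
FIELDS = {
    "last_name":  {"m": 0.95, "u": 0.01},
    "first_name": {"m": 0.90, "u": 0.02},
    "birth_year": {"m": 0.85, "u": 0.05},
}

UPPER, LOWER = 6.0, -2.0  # hypothetical decision thresholds

def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the pair's agreement pattern."""
    total = 0.0
    for field, p in FIELDS.items():
        if rec_a.get(field) == rec_b.get(field):
            # Agreement contributes a positive weight.
            total += math.log2(p["m"] / p["u"])
        else:
            # Disagreement contributes a negative weight.
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total

def classify(rec_a, rec_b):
    """Fellegi-Sunter decision rule: link, possible link, or non-link."""
    w = match_weight(rec_a, rec_b)
    if w >= UPPER:
        return "link"
    if w <= LOWER:
        return "non-link"
    return "possible link (clerical review)"

if __name__ == "__main__":
    a = {"last_name": "Smith", "first_name": "John", "birth_year": "1970"}
    b = {"last_name": "Smith", "first_name": "Jon",  "birth_year": "1970"}
    print(classify(a, b), round(match_weight(a, b), 2))

In the Fellegi-Sunter theory, the two thresholds are chosen so that the expected false-match and false-nonmatch rates stay within specified bounds, and pairs whose weights fall between the thresholds are referred for clerical review.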

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: July 12, 2006
Last revised: July 12, 2006