U.S. Census Bureau

Automatically Estimating Record Linkage False Match Rates

William E. Winkler

KEY WORDS: EM algorithm, unsupervised and semi-supervised learning

ABSTRACT

This paper provides a mechanism for automatically estimating record linkage false match rates in situations where the true matches are reasonably well separated from the other pairs. The method provides an alternative to the method of Belin and Rubin (JASA 1995) and is applicable in more situations. We provide examples demonstrating why the general problem of error rate estimation (both false match and false nonmatch rates) is likely impossible without training data and exceptionally difficult even in the extremely rare situations in which training data are available.
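The abstract does not spell out the estimation procedure. For orientation only, the following is a minimal sketch of one common unsupervised approach in this area, not the paper's method: fit a two-class (match / nonmatch) latent class model to binary field-agreement patterns with the EM algorithm, assuming fields agree independently given match status, then estimate the false match rate among pairs whose log-likelihood-ratio weight is at or above a cutoff. The function names, the 0/1 agreement coding, the starting values, and the synthetic counts are all illustrative assumptions.

```python
import math
from itertools import product


def _pattern_prob(gamma, probs):
    """Probability of a 0/1 agreement pattern, assuming fields are
    conditionally independent given match status (an assumption)."""
    prob = 1.0
    for g, q in zip(gamma, probs):
        prob *= q if g else 1.0 - q
    return prob


def fit_em(counts, n_fields, iters=500, p=0.5, m0=0.8, u0=0.2):
    """Hypothetical helper: EM for a two-class latent class model.

    counts maps each agreement pattern (tuple of 0/1) to its number of pairs.
    Returns (p, m, u): estimated match proportion, and per-field agreement
    probabilities among matches (m) and among nonmatches (u).
    """
    m = [m0] * n_fields
    u = [u0] * n_fields
    total = sum(counts.values())
    for _ in range(iters):
        # E-step: posterior probability that each pattern comes from a match.
        post = {}
        for gamma in counts:
            a = p * _pattern_prob(gamma, m)
            b = (1.0 - p) * _pattern_prob(gamma, u)
            post[gamma] = a / (a + b)
        # M-step: re-estimate parameters as posterior-weighted proportions.
        w_match = sum(c * post[g] for g, c in counts.items())
        w_non = total - w_match
        p = w_match / total
        m = [sum(c * post[g] for g, c in counts.items() if g[k]) / w_match
             for k in range(n_fields)]
        u = [sum(c * (1.0 - post[g]) for g, c in counts.items() if g[k]) / w_non
             for k in range(n_fields)]
    return p, m, u


def estimated_false_match_rate(counts, p, m, u, cutoff):
    """Hypothetical helper: model-based false match rate among pairs whose
    weight log(m(gamma)/u(gamma)) is at or above cutoff."""
    linked = 0.0
    false = 0.0
    for gamma, c in counts.items():
        pm, pu = _pattern_prob(gamma, m), _pattern_prob(gamma, u)
        if math.log(pm / pu) >= cutoff:
            linked += c
            # Expected number of nonmatches among these linked pairs.
            false += c * (1.0 - p) * pu / (p * pm + (1.0 - p) * pu)
    return false / linked if linked else 0.0


# Synthetic expected counts from an assumed "true" model (illustrative only).
K = 4
true_p, true_m, true_u = 0.1, [0.9] * K, [0.1] * K
counts = {g: 10000.0 * (true_p * _pattern_prob(g, true_m)
                        + (1 - true_p) * _pattern_prob(g, true_u))
          for g in product((0, 1), repeat=K)}

p, m, u = fit_em(counts, K)
fmr_low = estimated_false_match_rate(counts, p, m, u, cutoff=1.0)
fmr_high = estimated_false_match_rate(counts, p, m, u, cutoff=5.0)
print(round(p, 3), round(fmr_low, 4), round(fmr_high, 4))
```

Raising the cutoff keeps only higher-weight patterns, so the estimated rate cannot increase. Such an estimate is only as good as the fitted model: if matches and nonmatches are not well separated, or the independence assumption fails, the posterior probabilities and hence the estimated rate can be far off.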

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: June 13, 2007
Last revised: June 13, 2007