The U.S. Census Bureau

Methods for Record Linkage and Bayesian Networks

William E. Winkler

KEY WORDS: likelihood ratio, Bayesian Nets, EM Algorithm

ABSTRACT

Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When no data are missing in the identifying fields and training data are available, both can efficiently estimate the parameters of interest. When missing data are present, the EM algorithm can be used for parameter estimation in Bayesian networks when there are training data (Friedman 1997) and in record linkage when there are no training data (unsupervised learning). EM and MCMC methods can be used to estimate error rates automatically in some record linkage situations (Belin and Rubin 1995, Larsen and Rubin 2001).
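
As an illustration of the unsupervised case mentioned above, the following is a minimal sketch of EM estimation for a Fellegi-Sunter style model. It is not taken from the paper. It assumes binary agree/disagree comparisons on each identifying field, conditional independence of the fields given match status (the naive Bayes structure), and synthetic data. The names simulate_patterns, em_fellegi_sunter, and match_weight, and all parameter values, are hypothetical and chosen only for this example.

```python
import math
import random
from collections import Counter


def simulate_patterns(n, p, m, u, seed=0):
    """Draw n synthetic agreement patterns (hypothetical data, not from the paper).

    p is the proportion of true matches among record pairs; m[j] and u[j] are
    the probabilities that field j agrees for matches and nonmatches.
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        patterns.append(tuple(int(rng.random() < q) for q in probs))
    return patterns


def em_fellegi_sunter(patterns, n_iter=200, p=0.1, m0=0.9, u0=0.1, eps=1e-6):
    """Estimate (p, m, u) by EM without training data.

    Assumes conditional independence of fields given match status.
    """
    counts = Counter(patterns)          # work on distinct patterns only
    k = len(next(iter(counts)))
    n = sum(counts.values())
    m, u = [m0] * k, [u0] * k

    def clamp(v):
        # Keep probabilities away from 0 and 1 so later ratios stay finite.
        return min(max(v, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each pattern comes from a match.
        post = {}
        for x in counts:
            lm, lu = p, 1.0 - p
            for j in range(k):
                lm *= m[j] if x[j] else 1.0 - m[j]
                lu *= u[j] if x[j] else 1.0 - u[j]
            post[x] = lm / (lm + lu)

        # M-step: re-estimate parameters from expected counts.
        n_match = sum(c * post[x] for x, c in counts.items())
        n_non = n - n_match
        p = clamp(n_match / n)
        for j in range(k):
            agree_m = sum(c * post[x] for x, c in counts.items() if x[j])
            agree_u = sum(c * (1.0 - post[x]) for x, c in counts.items() if x[j])
            m[j] = clamp(agree_m / n_match)
            u[j] = clamp(agree_u / n_non)
    return p, m, u


def match_weight(x, m, u):
    """Log2 likelihood ratio for one agreement pattern x."""
    w = 0.0
    for xj, mj, uj in zip(x, m, u):
        w += math.log2(mj / uj) if xj else math.log2((1.0 - mj) / (1.0 - uj))
    return w


# Illustrative run on synthetic pairs; the true values below are assumptions.
pairs = simulate_patterns(5000, 0.1, [0.95, 0.9, 0.85, 0.9], [0.05, 0.1, 0.2, 0.1], seed=1)
p_hat, m_hat, u_hat = em_fellegi_sunter(pairs)
print("estimated match proportion:", round(p_hat, 3))
print("weight, all fields agree:", round(match_weight((1, 1, 1, 1), m_hat, u_hat), 2))
```

In this sketch the likelihood ratio from the key words appears as the summed log2 weight: pairs with large positive weights would be designated links and pairs with large negative weights nonlinks. Estimating the error rates of such a rule is the harder problem that the cited EM and MCMC work addresses, and it is not attempted here.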

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: November 5, 2002
Last revised: November 6, 2002