The U.S. Census Bureau

Improving EM Algorithm Estimates for Record Linkage Parameters

William E. Yancey

KEY WORDS: record linkage, EM algorithm


The EM algorithm can be used to estimate the conditional probabilities of matching field patterns in the Fellegi-Sunter model of record linkage. The algorithm is based on a latent class model for the record pairs in which one of the classes is the set of true matches. If the number of true match pairs in the data set is too small, the EM algorithm cannot detect the correct latent class. We consider methods for enriching the density of matches in the set of examined record pairs in order to obtain improved EM algorithm estimates of the record linkage conditional probability parameters.
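
As an illustration of the kind of estimation the abstract describes, the following is a minimal sketch of EM for a two-class latent class model over binary field agreement patterns. It is not taken from the report: the conditional independence assumption across fields, the function names (em_two_class, simulate_pairs), the starting values, and the simulated parameters are all assumptions chosen for illustration. Here p stands for the proportion of matches among the record pairs, m[j] for the probability that field j agrees given a match, and u[j] for the probability that field j agrees given a non-match.

```python
import random
from collections import Counter


def simulate_pairs(n, p, m, u, rng):
    """Simulate n record pairs and count their 0/1 agreement patterns.

    Hypothetical helper: each pair is a match with probability p, and
    field j agrees with probability m[j] (match) or u[j] (non-match).
    """
    counts = Counter()
    for _ in range(n):
        probs = m if rng.random() < p else u
        counts[tuple(int(rng.random() < q) for q in probs)] += 1
    return counts


def em_two_class(pattern_counts, n_iter=200, p=0.1, m_init=0.9, u_init=0.1,
                 eps=1e-6):
    """EM for a two-class model, assuming fields are conditionally independent.

    pattern_counts maps each agreement pattern (tuple of 0/1) to its count.
    Returns the estimates (p, m, u).
    """
    def clamp(x):
        # Keep probabilities away from 0 and 1 to avoid division by zero.
        return min(max(x, eps), 1.0 - eps)

    k = len(next(iter(pattern_counts)))
    m = [m_init] * k
    u = [u_init] * k
    n = sum(pattern_counts.values())

    for _ in range(n_iter):
        # E-step: posterior probability that a pair with pattern g is a match.
        post = {}
        for g in pattern_counts:
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(g):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            post[g] = like_m / (like_m + like_u)

        # M-step: re-estimate p, m, u from the expected class memberships.
        n_match = sum(c * post[g] for g, c in pattern_counts.items())
        n_non = n - n_match
        m = [clamp(sum(c * post[g] * g[j]
                       for g, c in pattern_counts.items()) / n_match)
             for j in range(k)]
        u = [clamp(sum(c * (1.0 - post[g]) * g[j]
                       for g, c in pattern_counts.items()) / n_non)
             for j in range(k)]
        p = clamp(n_match / n)

    return p, m, u


# Simulated data with made-up parameters (not from the report).
rng = random.Random(0)
true_p = 0.2
true_m = [0.95, 0.90, 0.85, 0.90]
true_u = [0.10, 0.05, 0.20, 0.10]
counts = simulate_pairs(20000, true_p, true_m, true_u, rng)

p_hat, m_hat, u_hat = em_two_class(counts)
print("estimated match proportion:", round(p_hat, 3))
print("estimated m:", [round(x, 3) for x in m_hat])
print("estimated u:", [round(x, 3) for x in u_hat])
```

The sketch works on pattern counts rather than individual pairs, since with a handful of binary fields there are only a few distinct patterns. Lowering true_p in the simulation is one way to explore the situation the abstract raises, in which matches are too sparse for the latent class to be detected.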


Source: U.S. Census Bureau, Statistical Research Division

Created: February 18, 2004
Last revised: February 20, 2004