The U.S. Census Bureau

USING THE EM ALGORITHM FOR WEIGHT COMPUTATION IN THE FELLEGI-SUNTER MODEL OF RECORD LINKAGE

William E. Winkler, Bureau of the Census

KEY WORDS: decision rule, error rate

ABSTRACT

Let A×B be the product space of two sets A and B, which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A×B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) provided a linkage rule that is optimal in the sense that it minimizes the set of possible links. The optimality depends on knowledge of certain joint inclusion probabilities that are used in a crucial likelihood ratio. In applying the record linkage model, assumptions are often made that allow estimation of weights that are a function of the joint inclusion probabilities. If the assumptions are not met, then a linkage procedure using estimates computed under those assumptions may not be optimal. This paper describes a method for estimating weights using the EM Algorithm under less restrictive assumptions. The weight computation automatically incorporates a Bayesian adjustment based on file characteristics.
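As a rough illustration of the ideas in the abstract, the following Python sketch runs the EM algorithm on binary agreement patterns and converts the estimates into log-likelihood-ratio weights. It is not the method of this paper: it assumes conditional independence of the field comparisons given match status, which is the kind of restrictive assumption the paper sets out to relax, and it omits the Bayesian adjustment. All names (`em_weights`, `match_weight`, `classify`), the starting values, the base-2 logarithm, the thresholds, and the synthetic counts are assumptions made for the example.

```python
import math
from itertools import product

EPS = 1e-9  # keeps estimated probabilities strictly inside (0, 1)


def clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def pattern_prob(gamma, probs):
    """P(gamma | class) when the field comparisons are independent."""
    out = 1.0
    for agree, q in zip(gamma, probs):
        out *= q if agree else 1.0 - q
    return out


def em_weights(patterns, counts, p=0.1, n_iter=500):
    """Estimate p = P(match), m[i] = P(agree on i | match) and
    u[i] = P(agree on i | nonmatch) from counts of agreement patterns."""
    k = len(patterns[0])
    m = [0.9] * k  # hypothetical starting values
    u = [0.1] * k
    total = float(sum(counts))
    for _ in range(n_iter):
        # E-step: posterior probability that each pattern comes from a match.
        g = []
        for gamma in patterns:
            pm = p * pattern_prob(gamma, m)
            pu = (1.0 - p) * pattern_prob(gamma, u)
            g.append(pm / (pm + pu))
        # M-step: re-estimate the parameters from the expected counts.
        match_mass = sum(gj * c for gj, c in zip(g, counts))
        non_mass = total - match_mass
        p = clamp(match_mass / total)
        for i in range(k):
            agree_m = sum(gj * c for gj, c, gam in zip(g, counts, patterns) if gam[i])
            agree_u = sum((1.0 - gj) * c for gj, c, gam in zip(g, counts, patterns) if gam[i])
            m[i] = clamp(agree_m / match_mass)
            u[i] = clamp(agree_u / non_mass)
    return p, m, u


def match_weight(gamma, m, u):
    """Log (base 2) of the likelihood ratio P(gamma | match) / P(gamma | nonmatch)."""
    return math.log2(pattern_prob(gamma, m) / pattern_prob(gamma, u))


def classify(weight, upper, lower):
    """Three-way decision: link, possible link, or nonlink."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "nonlink"
    return "possible link"


# Synthetic data: expected pattern counts under made-up "true" parameters.
true_p, true_m, true_u = 0.2, [0.95, 0.90, 0.85], [0.10, 0.20, 0.15]
patterns = list(product((0, 1), repeat=3))
counts = [
    10000 * (true_p * pattern_prob(g, true_m) + (1 - true_p) * pattern_prob(g, true_u))
    for g in patterns
]

p_hat, m_hat, u_hat = em_weights(patterns, counts)
for gamma in patterns:
    w = match_weight(gamma, m_hat, u_hat)
    print(gamma, round(w, 2), classify(w, upper=3.0, lower=-3.0))
```

In practice the upper and lower thresholds would be chosen from the fixed bounds on the error rates rather than set by hand as they are here.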

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: October 4, 2000
Last revised: October 5, 2000