Census Bureau

Approximate String Comparison and its Effect on an Advanced Record Linkage System

Edward H. Porter and William E. Winkler, Bureau of the Census

KEY WORDS: string comparator, bigram, assignment algorithm, EM algorithm, latent class.


This paper examines several string comparison methods for handling typographical error, models their relationship to the likelihood ratio used in the Fellegi-Sunter decision rule, and shows how they improve matching performance.