The U.S. Census Bureau

Evaluating String Comparator Performance for Record Linkage

William E. Yancey



We compare variations of string comparators based on the Jaro-Winkler and edit distance comparators. We apply the comparators to Census data to determine which better separate matches from non-matches. We first compare their classification ability with a receiver operating characteristic (ROC) curve analysis, and then directly compare the record linkage results produced by two candidate comparators.
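For readers unfamiliar with the two comparator families, the sketch below shows textbook versions of each and one common way to summarize a ROC curve. It is a minimal illustration under stated assumptions, not the specific variations evaluated in this report. The Jaro-Winkler part uses Winkler's usual prefix weight p = 0.1 with the common prefix capped at 4 characters. The edit distance comparator is assumed to be Levenshtein distance turned into a similarity by dividing by the longer string's length. The ROC summary is the area under the curve, computed as the probability that a randomly chosen true match scores higher than a randomly chosen non-match, with ties counted as one half. All function names (`jaro`, `jaro_winkler`, `levenshtein`, `edit_similarity`, `roc_auc`) are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (assumed form, not the report's variant)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as common only if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count common characters that appear in a different order; t is half that count.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed p=0.1, cap 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def roc_auc(match_scores, nonmatch_scores) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    wins = 0.0
    for s in match_scores:
        for t in nonmatch_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(match_scores) * len(nonmatch_scores))


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))     # 0.961
    print(round(edit_similarity("MARTHA", "MARHTA"), 3))  # 0.667
```

The example pair shows why the two families can rank pairs differently: a single transposition costs Jaro-Winkler relatively little, while Levenshtein counts it as two edits. A higher `roc_auc` over labeled match and non-match pairs indicates a comparator whose scores separate the two classes more cleanly.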


Source: U.S. Census Bureau, Statistical Research Division

Created: June 13, 2005
Last revised: June 13, 2005