The U.S. Census Bureau

An Adaptive String Comparator for Record Linkage

William E. Yancey

KEY WORDS:

ABSTRACT

We develop a string comparator based on edit distance that uses variable edit-step costs derived from training data. Using first and last name data from census files, we compare the performance of this string comparator with that of one without variable edit-step costs and with that of the Jaro-Winkler string comparator, which is the standard comparator in the Census Bureau's record linkage software.
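As a rough illustration of the idea of variable edit-step costs, the following minimal sketch computes an edit distance in which the cost of substituting one character for another can depend on the character pair. It is not the comparator developed in the paper: the function names, the fixed insertion/deletion cost, the normalization by the longer string length, and the example cost table are all assumptions chosen for illustration, and the costs here are set by hand rather than derived from training data.

```python
def weighted_edit_distance(s, t, sub_cost=None, indel_cost=1.0):
    """Edit distance where substitution cost may vary by character pair.

    sub_cost is a hypothetical lookup {(a, b): cost}; pairs that are
    not listed fall back to a cost of 1.0.
    """
    sub_cost = sub_cost or {}
    m, n = len(s), len(t)
    # d[i][j] holds the minimal cost of turning s[:i] into t[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = s[i - 1], t[j - 1]
            step = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            d[i][j] = min(
                d[i - 1][j] + indel_cost,   # delete a from s
                d[i][j - 1] + indel_cost,   # insert b into s
                d[i - 1][j - 1] + step,     # match or substitute
            )
    return d[m][n]


def similarity(s, t, sub_cost=None):
    """Map the distance to a score in [0, 1] (assumed normalization;
    stays in range as long as every step cost is at most 1.0)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t, sub_cost) / longest


# Hypothetical cost table: treat I -> Y as a cheap substitution.
example_costs = {("I", "Y"): 0.25}
print(weighted_edit_distance("SMITH", "SMYTH"))                 # unit costs
print(weighted_edit_distance("SMITH", "SMYTH", example_costs))  # variable costs
```

With unit costs the two spellings differ by one full edit step; with the illustrative cost table the same substitution is charged less, so the pair scores as more similar.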

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: February 25, 2004
Last revised: March 4, 2004