NIST

Jaro-Winkler

(algorithm)

Definition: A measure of similarity between two strings. The Jaro measure is the weighted sum of the percentage of matched characters from each string and the percentage of transposed characters. Winkler increased this measure for strings whose initial characters match. He then rescaled it by a piecewise function whose intervals and weights depend on the type of string (first name, last name, street, etc.).
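The commonly cited form of the two measures is written out below. Here m is the number of matching characters: characters that are equal and lie no more than ⌊max(|s1|, |s2|)/2⌋ − 1 positions apart. t is half the number of matched characters that appear in a different order. ℓ is the length of the common prefix, capped at 4, and p is a scaling factor, conventionally 0.1. These symbols and constants are the usual textbook conventions and are assumed here, not taken from this entry. Winkler's piecewise rescaling by string type is not shown.

```latex
d_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
d_w = d_j + \ell\, p\,(1 - d_j)
```

For example, with MARTHA and MARHTA, all six characters match (m = 6). The swapped pair T/H gives t = 1. So d_j = (1 + 1 + 5/6)/3 = 17/18 ≈ 0.944. With the shared prefix MAR (ℓ = 3) and p = 0.1, d_w = 17/18 + 0.3 · (1/18) ≈ 0.961.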

Generalization (I am a kind of ...)
string matching with errors.

See also Levenshtein distance, phonetic coding.

Note: For "piecewise function", see the definition in MathWorld or answers from Dr. Math.

Author: PEB

Implementation

Cohen, Ravikumar, and Fienberg have an implementation in their SecondString (Java) package.
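As a language-neutral illustration, here is a minimal Python sketch of the commonly used Jaro and Jaro-Winkler scores. It is not a port of SecondString. The function names `jaro` and `jaro_winkler`, the prefix weight `p=0.1`, and the 4-character prefix cap are conventional choices assumed for this example. Winkler's piecewise rescaling by string type is deliberately omitted.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters "match" only if they are at most this far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Walk the matched characters of both strings in order; each position
    # where they differ is half of a transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the common prefix (assumed p=0.1, cap 4)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For example, `round(jaro_winkler("MARTHA", "MARHTA"), 4)` gives 0.9611. The matching loop is the straightforward O(|s1| · window) scan. That keeps the sketch short, but it is not tuned for long strings.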

More information

Winkler and Thibaudeau paper abstract (HTML) and full paper (PDF).

William E. Winkler and Yves Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C., 1991.
Matthew A. Jaro, UNIMATCH: A Record Linkage System: User's Manual, Technical Report, U.S. Bureau of the Census, Washington, D.C., 1976.
Matthew A. Jaro, Advances in Record-linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, 84(406):414-420, 1989.



If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.

Entry modified 14 August 2008.

Cite this as:
Paul E. Black, "Jaro-Winkler", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 14 August 2008. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/jaroWinkler.html
