
Jaro-Winkler

(algorithm)

Definition: A measure of similarity between two strings. The Jaro measure is a weighted sum of the percentage of matched characters in each string and a term accounting for transposed characters. Winkler increased this measure for strings whose initial characters match, then rescaled it by a piecewise function whose intervals and weights depend on the type of string (first name, last name, street, etc.).
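
The definition above can be illustrated with a minimal sketch in Python. It assumes the commonly cited simplified form: equal weights of 1/3 in the Jaro sum, a matching window of half the longer string's length minus one, and a prefix boost with scale 0.1 over at most 4 initial characters. These constants and the function names are assumptions made for illustration; the sketch does not include the piecewise rescaling by string type, and it is not the SecondString or Census Bureau implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters match only if they are equal and no farther apart
    # than this window (assumed: half the longer length, minus one).
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    # Equal weights of 1/3 are an assumption; the sum may be weighted otherwise.
    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity increased for matching initial characters.

    prefix_scale and max_prefix are assumed defaults; their product should
    not exceed 1 so that the result stays in [0, 1].
    """
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In "MARTHA" and "MARHTA" all six characters match and one pair (T, H) is transposed, so the Jaro value is (1 + 1 + 5/6) / 3. The shared prefix "MAR" then raises the score by 3 × 0.1 of the remaining distance to 1.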

Generalization (I am a kind of ...)
string matching with errors.

See also Levenshtein distance, phonetic coding.

Note: For "piecewise function", see the definition in MathWorld or answers from Dr. Math.

Author: PEB

Implementation

Cohen, Ravikumar, and Fienberg have an implementation in their SecondString (Java) package.

More information

William E. Winkler and Yves Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C., 1991. The abstract (HTML) and full paper (PDF).
Matthew A. Jaro, UNIMATCH: A Record Linkage System: User's Manual, Technical Report, U.S. Bureau of the Census, Washington, D.C., 1976.
Matthew A. Jaro, Advances in Record-linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, 84(406):414-420, 1989.



Entry modified 26 December 2012.

Cite this as:
Paul E. Black, "Jaro-Winkler", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 26 December 2012. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/jaroWinkler.html
