The Discrimination Power of Dependency Structures
Yves Thibaudeau
RR-92/08, 6/30/92
ABSTRACT
A record-linkage process brings together records from two files into pairs, one record
from each file, for the purpose of comparison. Each record represents an individual. A pair
has "matched pair" status if the two records represent the same individual, and "unmatched
pair" status otherwise.
The record-linkage process is governed by an underlying probabilistic process. A record-linkage
rule infers the status of each pair of records based on the value of the comparison. The pair is
declared a "link" if the inferred status is that of a matched pair, and it is declared a "non-link"
if the inferred status is that of an unmatched pair. The discrimination power of a record-linkage
rule is the capacity of the rule to designate a maximum number of matched pairs as links, while
keeping the rate of unmatched pairs designated as links to a minimum. In most of the existing
literature, it is assumed that the underlying probabilistic process is an instance of the conditional
independence latent class model. However, in many situations, this assumption is false. The
paper introduces more general models.
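
As an illustration of the conditional independence assumption described above, the following is a minimal sketch, not the method of the paper. It assumes three binary comparison fields, hypothetical agreement probabilities (M_PROBS, U_PROBS), and a likelihood-ratio rule with an arbitrary threshold; all names and values are invented for the example. Under conditional independence, the probability of a comparison pattern given the pair's status is the product of the per-field probabilities.

```python
import math

# Hypothetical per-field agreement probabilities (assumed values, not from the paper).
M_PROBS = [0.9, 0.8, 0.7]  # P(field agrees | matched pair)
U_PROBS = [0.1, 0.2, 0.3]  # P(field agrees | unmatched pair)


def pattern_probability(pattern, probs):
    """Probability of a 0/1 agreement pattern when fields are
    conditionally independent given the status of the pair."""
    p = 1.0
    for agrees, q in zip(pattern, probs):
        p *= q if agrees else (1.0 - q)
    return p


def log_likelihood_ratio(pattern):
    """Log of P(pattern | matched) / P(pattern | unmatched)."""
    m = pattern_probability(pattern, M_PROBS)
    u = pattern_probability(pattern, U_PROBS)
    return math.log(m / u)


def classify(pattern, threshold=0.0):
    """Declare a link when the log-likelihood ratio exceeds the
    (assumed) threshold, and a non-link otherwise."""
    return "link" if log_likelihood_ratio(pattern) > threshold else "non-link"


if __name__ == "__main__":
    for pattern in [(1, 1, 1), (1, 0, 0), (0, 0, 0)]:
        print(pattern, classify(pattern))
```

Raising the threshold lowers the rate of unmatched pairs declared links at the cost of missing some matched pairs, which is the trade-off that the discrimination power of a rule summarizes. When the fields are in fact dependent given the status, the product form above misstates the pattern probabilities.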