Chapter 12: Statistical and Other Issues

This chapter provides an overview of some of the statistical issues involved in making DNA-based identifications of victims of a mass fatality incident. Because both mathematics and policy should be considered when determining statistical thresholds for making an identification, a single statistical approach may not be sufficient for every mass fatality disaster. Issues to consider when setting policy for a mass fatality response include, for example, the condition of the remains and the existence and reliability of samples. Appendix I contains an extensive reference list that may assist laboratory managers, policymakers, or public officials who want a more in-depth understanding of the use of statistics in making DNA-based identifications.

Michael Conneally

In the field of human genetic research, genotypes of relatives have been used to reconstruct a partial or total genotype for the purpose of gene mapping. Experience gained in this area proved invaluable in helping to identify World Trade Center (WTC) victims.

When dealing with statistical issues, including the statistical threshold necessary to make a DNA-based identification of a victim’s remains, it is important that the identification policy for a particular mass fatality response effort be consistent with the goals of the effort. Decisions about how many and which loci to type, the statistical thresholds, and the use of outside laboratories and consultants should be made quickly (see chapter 4, Major Decisions). For example, in the World Trade Center (WTC) identification effort, the Kinship and Data Analysis Panel (KADAP) endorsed the decision of the New York City Office of the Chief Medical Examiner (OCME) to use the 13 standard Combined DNA Index System (CODIS) core short tandem repeat (STR) loci, as well as the Amelogenin sex-typing locus, all of which are used in forensic laboratories throughout the United States. Chapter 11, Sample Analysis, has more on “mini-STRs,” single nucleotide polymorphisms, and mitochondrial DNA sequencing, which also were used in the WTC identification effort.

It is important to note that the identification of WTC victims did not require the creation of any new statistical approach. In fact, the statistical approach recommended by the KADAP and used by the OCME was based on two well-established methods.

The first method, known as “direct matching,” assesses the probability that a DNA profile from a victim’s remains and a profile developed from a personal item known to belong to a missing individual would match by chance. The direct matching method is similar to that used in forensic genetic testing, which estimates the strength of a match between a DNA profile from biological evidence and a profile obtained from a known reference sample. Direct matching was used in approximately two-thirds of the WTC DNA identifications.
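As a rough illustration of how the strength of a direct match might be quantified, the sketch below multiplies per-locus genotype frequencies into a random match probability and takes its reciprocal as a likelihood ratio. It assumes Hardy-Weinberg proportions and independence between loci, omits the population-substructure corrections applied in casework, and uses made-up allele frequencies. The function names are hypothetical, and nothing here is presented as the OCME procedure.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Expected genotype frequency under Hardy-Weinberg proportions.

    One allele frequency (p) means a homozygote: p * p.
    Two allele frequencies (p, q) mean a heterozygote: 2 * p * q.
    """
    return p * p if q is None else 2.0 * p * q


def random_match_probability(genotypes):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequency(*g) for g in genotypes)


# Hypothetical three-locus profile (illustrative frequencies, not real data):
# heterozygote, homozygote, heterozygote.
example_profile = [(0.10, 0.20), (0.25,), (0.05, 0.15)]

rmp = random_match_probability(example_profile)  # approximately 3.75e-5
direct_match_lr = 1.0 / rmp                      # chance match is about 1 in 26,667
```

With a full 13-locus profile the same product becomes far smaller, which is why a complete direct match can carry very strong statistical weight.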

The second method, called “indirect matching,” uses formal genetic kinship analysis, in which the DNA profile from a victim’s remains is compared with those of biological relatives in a known kindred (i.e., a “family tree” or pedigree). Also called “kinship analysis,” this approach is similar to that used for parentage assignment in paternity testing, nursery mix-up resolution, immigration cases, and probate disputes. Kinship analysis was necessary in about one-third of the DNA-based identifications of WTC victims.

The theories and practices of statistical analyses in making DNA-based identifications are well developed and well documented. Before any DNA testing is performed, the required “posterior probability” (that is, the level of confidence needed to make an identification) should be established. The posterior probability is derived from the product of two components: a “prior probability” and a “likelihood ratio.” Strictly, the multiplication is carried out on odds: the posterior odds equal the prior odds multiplied by the likelihood ratio, and the posterior probability follows from the posterior odds.

Prior probability is the chance, before any DNA evidence is considered, that a given remains sample belongs to a particular individual; typically, it is based only on the estimated number of reported missing persons (RM), which can change over the course of the identification process. In the WTC identification effort, for example, the number of RMs was originally much higher (as many as 5,000) than the final estimate of approximately 2,750, which was reached after multiple reports, multiple nicknames, and other victim data were reconciled.

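Likelihood ratio is the strength of the DNA evidence favoring identification: it compares how probable the observed DNA profiles are if the remains belong to the particular individual with how probable they are if the remains belong to someone else.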
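The relationship among prior probability, likelihood ratio, and posterior probability can be sketched in a few lines of code. The example below assumes a prior of 1/RM (one plausible reading of a prior "based only on" the number of reported missing persons), applies Bayes' theorem in odds form, and then inverts it to find the likelihood ratio needed to reach a chosen threshold. The 0.999 threshold and the function names are illustrative assumptions, not values or procedures taken from the WTC effort.

```python
def posterior_probability(likelihood_ratio, num_missing):
    """Posterior probability of identity, assuming a prior of 1 / num_missing."""
    if likelihood_ratio <= 0 or num_missing < 1:
        raise ValueError("likelihood_ratio must be > 0 and num_missing >= 1")
    if num_missing == 1:
        return 1.0  # only one candidate, so the prior is already certain
    prior_odds = 1.0 / (num_missing - 1)       # (1/RM) / (1 - 1/RM)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def required_likelihood_ratio(threshold, num_missing):
    """Smallest likelihood ratio that reaches the posterior threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return threshold / (1.0 - threshold) * (num_missing - 1)


# Hypothetical threshold of 0.999, evaluated at the two RM estimates in the text.
lr_needed_final = required_likelihood_ratio(0.999, 2750)  # about 2.75 million
lr_needed_early = required_likelihood_ratio(0.999, 5000)  # about 4.99 million
```

Under these assumptions, the larger early estimate of missing persons nearly doubles the likelihood ratio needed to reach the same threshold, which illustrates why a changing RM figure matters to identification policy.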