Matching and Statistics Software

There are two basic approaches to DNA matching: (1) direct matching, and (2) kinship matching. Direct matching compares two DNA profiles to determine whether they come from the same source (“individual”). Sophisticated direct matching algorithms consider allelic dropout for nuclear DNA and heteroplasmy for mtDNA. Kinship matching, on the other hand, uses DNA profiles to identify biological relationships among individuals. Kinship matching should consider both allelic dropout (nuclear DNA) and mutations (nuclear and mtDNA).
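The two comparisons can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production matcher. It assumes a hypothetical profile representation: each STR locus name maps to the set of alleles observed at that locus. The function names `direct_match` and `parent_child_consistent` are illustrative. Allelic dropout is modeled only as "alleles may be missing from the questioned sample, but none may be added," and mutation is modeled only as a fixed number of tolerated inconsistent loci. Heteroplasmy and any statistical weighting are not modeled.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: STR locus name -> set of observed allele designations.
Profile = Dict[str, FrozenSet[str]]


def direct_match(reference: Profile, questioned: Profile) -> bool:
    """True if every locus typed in both profiles is consistent with one source.

    Crude dropout model (an assumption): the questioned sample may be missing
    alleles present in the reference, but may not show any extra allele.
    """
    shared = reference.keys() & questioned.keys()
    if not shared:
        return False  # nothing to compare
    return all(questioned[locus] <= reference[locus] for locus in shared)


def parent_child_consistent(parent: Profile, child: Profile, max_mismatches: int = 1) -> bool:
    """True if parent and child share at least one allele at all but
    `max_mismatches` of their common loci; the tolerance stands in for mutation."""
    shared = parent.keys() & child.keys()
    if not shared:
        return False
    mismatches = sum(1 for locus in shared if not (parent[locus] & child[locus]))
    return mismatches <= max_mismatches


if __name__ == "__main__":
    reference = {"D8S1179": frozenset({"13", "14"}), "TH01": frozenset({"6", "9.3"})}
    # Degraded remains: allele 14 at D8S1179 has dropped out.
    remains = {"D8S1179": frozenset({"13"}), "TH01": frozenset({"6", "9.3"})}
    print(direct_match(reference, remains))  # True
```

A check like this only screens for consistency. Whether a consistent pair counts as an identification is a separate statistical question.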

Exhibit 17 shows how mass fatality incident data may be searched.

One of the primary tools for making DNA identifications is “matching software.” Currently, the most widely used forensic DNA matching software in the United States is the FBI’s Combined DNA Index System (CODIS). However, CODIS is built on the design principle that matches are rare, independent events. In a mass fatality incident, matches are neither rare nor independent of one another. A laboratory director should therefore understand the limitations of CODIS before relying on it in a mass fatality incident response.

CODIS is designed to rapidly search crime-scene DNA profiles against each other and against DNA profiles of known individuals. One assumption built into CODIS is that each profile will match only a tiny fraction (usually one or none) of the profiles in the database. In a criminal case, which is what CODIS is primarily designed to handle, the DNA profile obtained from a piece of evidence might not match any of the million-plus convicted-offender DNA profiles in the database. This can happen simply because the source of the evidence has never been convicted of a crime that mandated collection of a DNA sample.

In a mass fatality incident, however, each recovered set of human remains is likely to match several other samples, including other remains or personal items. CODIS can correctly identify every one of these matches through pairwise comparisons, but it does not group related matches together. It is therefore less useful when the goal is to assemble all potential matches across time and space. That said, CODIS has a standard data file format for reporting STR data, and this common “.cmf” format was used in the WTC identification analyses.
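The aggregation step described above can be illustrated with a union-find (disjoint-set) structure. This structure merges pairwise hits into groups of samples presumed to share one source. This is a generic technique shown as a sketch. It is not a description of how CODIS or the WTC software worked, and the function name `group_matches` and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_matches(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Merge pairwise direct matches (A~B, B~C, ...) into groups of samples
    that trace back to one presumed source, using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups: Dict[str, List[str]] = defaultdict(list)
    for sample in list(parent):
        groups[find(sample)].append(sample)
    return [sorted(members) for members in groups.values()]


if __name__ == "__main__":
    hits = [("R-001", "R-017"), ("R-017", "toothbrush-A"), ("R-042", "R-108")]
    print(group_matches(hits))
    # [['R-001', 'R-017', 'toothbrush-A'], ['R-042', 'R-108']]
```

Two design points are worth noting. First, grouping is transitive, so a single false pairwise match merges two people's remains into one group. Each assembled group still needs review. Second, only direct matches belong in this input, because a kinship hit links a group to a family rather than indicating a shared source.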

Once a potential direct or kinship match is identified, the laboratory must determine its statistical significance. This is typically done by calculating a random match probability (or its reciprocal, a likelihood ratio) for a direct match and a likelihood ratio for a kinship match. To declare a match an identification, the resulting statistic must meet a threshold value predefined for direct or kinship matches. These identification thresholds are set according to the number of victims, the biological relationships among the victims, and the nature of the incident. Setting them was a major focus of the KADAP and is addressed in chapter 12, Statistical and Other Issues.
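The following is a minimal sketch of the textbook calculations behind these statistics. It assumes Hardy-Weinberg equilibrium, independent loci (the product rule), and no subpopulation (θ) correction, dropout, or mutation. The allele frequencies are hypothetical, and the function names are illustrative. Neither reflects the KADAP's models or thresholds.

```python
import math
from typing import Dict, Tuple

Genotype = Tuple[str, str]
Freqs = Dict[str, Dict[str, float]]  # locus -> allele -> population frequency


def genotype_frequency(g: Genotype, p: Dict[str, float]) -> float:
    """Hardy-Weinberg genotype frequency: p^2 (homozygote) or 2pq (heterozygote)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]


def direct_lr(profile: Dict[str, Genotype], freqs: Freqs) -> float:
    """Direct-match likelihood ratio = 1 / random match probability (product rule)."""
    rmp = math.prod(genotype_frequency(g, freqs[locus]) for locus, g in profile.items())
    return 1.0 / rmp


def parent_child_lr(parent: Genotype, child: Genotype, p: Dict[str, float]) -> float:
    """Single-locus LR for 'parent of child' vs 'unrelated', ignoring mutation.

    Numerator: the tested parent passes each of its alleles with probability 1/2,
    and the untested parent supplies the child's other allele at its population
    frequency. A result of 0 means an exclusion at this locus (or a mutation).
    """
    a, b = child
    n_a, n_b = parent.count(a), parent.count(b)
    if a == b:
        numerator = (n_a / 2) * p[a]
    else:
        numerator = (n_a / 2) * p[b] + (n_b / 2) * p[a]
    return numerator / genotype_frequency(child, p)


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    freqs = {"D8S1179": {"13": 0.05, "14": 0.10}, "TH01": {"6": 0.25, "9.3": 0.30}}
    remains = {"D8S1179": ("13", "14"), "TH01": ("6", "9.3")}
    mother = {"D8S1179": ("13", "15"), "TH01": ("7", "9.3")}
    print(round(direct_lr(remains, freqs), 1))  # 666.7
    # Per-locus kinship LRs multiply across independent loci.
    print(math.prod(parent_child_lr(mother[k], remains[k], freqs[k]) for k in remains))
```

Either statistic would then be compared against the predefined identification threshold for its match type.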

Finally, the laboratory may elect to factor nongenetic data into the identification process. For example, human remains recovered from the WTC were catalogued based on their physical location within a two-dimensional grid superimposed on the disaster site. Such data can be useful when incomplete DNA profiles yield likelihood ratios that fall short of the identification thresholds.
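One generic way to combine nongenetic information with a DNA likelihood ratio is Bayes' theorem in odds form: posterior odds = prior odds × LR. The sketch below treats location data as something that narrows the prior. This is an illustrative assumption; the text does not say the WTC analyses converted grid locations into prior probabilities. The likelihood ratio, victim count, and candidate count below are hypothetical.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds x likelihood ratio."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    lr_partial_profile = 1_000  # hypothetical LR from an incomplete profile
    # Uniform prior over a hypothetical 3,000 victims (genetic evidence alone).
    print(posterior_probability(lr_partial_profile, 1 / 3000))  # about 0.25
    # Hypothetical prior narrowed to 50 candidates linked to the same grid area.
    print(posterior_probability(lr_partial_profile, 1 / 50))  # about 0.95
```

The same likelihood ratio supports a much stronger conclusion when the prior is narrower. For that reason, any nongenetic prior would need its own documented justification.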