In prenatal screening for cystic fibrosis, the aim is to identify couples in which both the mother and her partner have identifiable cystic fibrosis mutations. Their offspring have a 1 in 4 risk of having cystic fibrosis and definitive diagnostic testing is available. The DNA test results are qualitative (e.g., a specific mutation is reported as present or absent).
One additional consideration might be that laboratories perform differently when testing proficiency testing samples than when testing clinical samples on a routine basis. Performance might be worse because the sample is handled outside of the laboratory routine. Alternatively, performance might be better because extra attention is paid to obtaining a reliable result. Future analyses should aim to provide reliable method- and, possibly, mutation-specific analytic performance estimates. One approach for collecting such data might include the following steps:
Ideally, this blinded sample set would be available to manufacturers as part of the pre-market approval process, with the understanding that multiple laboratories using these commercial reagents would be asked by the manufacturer to analyze portions of the sample set independently. This initial assay validation process is distinct from assay control samples that are discussed later (Question 11).
Appropriate sample size for determining analytic specificity can be derived by choosing an acceptable target specificity and an acceptable lower limit that should be excluded in the 95 percent confidence interval. The higher the specificity chosen and the tighter the confidence interval, the larger is the sample size that will be necessary to provide a definitive answer. For example, if a laboratory chose a target specificity of 98 percent and wanted to rule out a specificity of 90 percent, it would need to correctly identify at least 49 of 50 known negative samples (estimated using the binomial distribution). On the other hand, a target specificity of 99.5 percent and a desire to rule out a specificity of 98 percent would require correctly identifying at least 398 of 400 known negative samples. The determination of even higher analytic specificity with tighter confidence intervals may not be economically feasible for an individual laboratory. However, this could be attained by a consortium of laboratories using the same methodology, or by a manufacturer that forms a consortium of laboratories using its reagents.
Appropriate sample size for determining the analytic sensitivity (detection rate) could be derived using similar analyses. If a laboratory chose a target sensitivity of 95 percent and wanted to rule out a sensitivity of 80 percent, it would need to correctly identify at least 38 of 40 chromosomes with known mutations. A higher sensitivity estimate of 98 percent that rules out a rate of 95 percent would require the correct identification of at least 196 of 200 chromosomes with known mutations. If mutation-specific detection rates are desired, each would need the same number of challenges. Again, however, this may not be feasible for individual laboratories but may be possible for a consortium or manufacturer, especially for the more common mutations.
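The sample-size logic in the two paragraphs above can be checked with a one-sided exact binomial test: a design of "at least k of n correct" rules out a lower performance bound p when the chance of scoring that well under p is below 5 percent. A minimal sketch (the function name is mine, and the one-sided 5 percent criterion is an assumption chosen to mirror the 95 percent confidence framing; the original figures may rest on a slightly different interval method):

```python
from math import comb

def binom_sf_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of getting at least
    k correct results by luck if the true per-sample accuracy is only p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Specificity designs from the text
print(binom_sf_ge(49, 50, 0.90))    # 49/50 correct vs. a true 90% -- below 0.05
print(binom_sf_ge(398, 400, 0.98))  # 398/400 correct vs. a true 98% -- below 0.05

# Sensitivity designs from the text
print(binom_sf_ge(38, 40, 0.80))    # 38/40 chromosomes vs. a true 80% -- below 0.05
print(binom_sf_ge(196, 200, 0.95))  # 196/200 chromosomes vs. a true 95% -- below 0.05
```

In each case the tail probability falls below 0.05, so observing the stated number of correct results makes the lower performance figure statistically implausible.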
The analytic performance (analytic sensitivity and specificity) could then be determined for each methodology, along with an estimate of between-laboratory, within-method variability. Further, estimates could be made for specific racial/ethnic groups, based on the mutation-specific performance and the frequency of each mutation within that group. Overall, the analytic performance for laboratories in the United States could be estimated, given the mix of methodologies for established screening laboratories. All of these analyses could be done using a 2x2 table, and all rates could be accompanied by 95 percent confidence intervals (CI). Published method comparisons focus on technical errors in the analytic phase and usually do not deal with the pre- and post-analytic phases of the laboratory testing process.
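As a concrete illustration of the 2x2 computation described above, consider a hypothetical method evaluated against known samples (the counts below are invented for illustration and do not come from the survey):

```python
# Hypothetical 2x2 table for one methodology (illustrative counts only)
#                 mutation present   mutation absent
# test positive         TP = 95            FP = 2
# test negative         FN = 5             TN = 398
TP, FP, FN, TN = 95, 2, 5, 398

sensitivity = TP / (TP + FN)  # detection rate among mutation-bearing chromosomes
specificity = TN / (TN + FP)  # correct negatives among mutation-free chromosomes

print(f"analytic sensitivity: {sensitivity:.1%}")  # 95.0%
print(f"analytic specificity: {specificity:.1%}")  # 99.5%
```

Each rate would then be reported with its 95 percent confidence interval, as in the tables that follow.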
The ACMG/CAP external proficiency testing scheme
Background and definitions As part of ACMG/CAP external proficiency testing in the United States, purified DNA from established cell lines (derived from human cells with known mutations; http://locus.umdnj.edu/nigms/qc/dnaqc.html) is distributed to enrolled laboratories. The majority of these laboratories are likely to be providing clinical services, but reagent manufacturers and research laboratories also participate. In late 2001, there were 45 participants reporting cystic fibrosis results. A false positive result occurs when the laboratory reports finding a mutation in the sample when none is present. A false negative result occurs when a laboratory reports no mutation, but a mutation for which it tests is, in fact, present in the sample. A third type of error occurs when the laboratory correctly identifies that a mutation is present but reports the wrong mutation (e.g., a laboratory that is able to separately identify delF508 and delI507 reports finding delF508 when only the delI507 mutation is present). All three types of errors are included in the analysis and encompass all three phases of testing.
The present analysis, which utilizes the ACMG/CAP data, initially examines the rates of these three types of errors independently, by chromosome (e.g., the results on one chromosome are counted separately from the results reported for the other).
Gap in Knowledge: How should the finding of a wrong mutation influence computation of the analytic performance? The relationship between the third type of error (wrong mutation) and analytic performance has not yet been formally addressed. In this document, a wrong mutation will be considered an incorrect result, since this type of error could cause harm. For example, diagnostic testing in the fetus might target the mutations reported in the couple and fail to identify the correct mutation in the fetus. Also, family members would not receive correct information. Further, a wrong mutation finding will be treated as a false positive in this document. Confirmatory testing of positive results will provide the opportunity to correct this type of error.
Error rates for the ACMG/CAP external proficiency testing scheme Table 2-1 shows the number of alleles tested and the results from the ACMG/CAP Molecular Genetics Laboratory (MGL) Survey from 1996 to 2001. Overall, 3.0 percent (95 percent CI 2.4 to 3.9%) of the alleles were incorrectly identified; 2,131 of 2,198 chromosomes (97.0 percent) were correctly identified (95 percent CI 96.1 to 97.6%). Appendix A contains a complete listing of the sample challenges, the responses along with the type of error (e.g., false positive), and any other adjustments made during the analysis (e.g., laboratory did not test for a mutation included in the challenge). More errors (56) occurred between 1996 and 1998 than between 1999 and 2001 (11). However, the composition of challenges in the earlier time period explains much of this excess and is taken into account in analyses presented later in this section.
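The overall figure quoted above (2,131 of 2,198, 97.0 percent, 95 percent CI 96.1 to 97.6%) can be reproduced with a standard score-based interval. A sketch using the Wilson interval (an assumption on my part; the report may have used an exact binomial interval, which gives nearly identical limits at this sample size):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """95% Wilson score interval for a binomial proportion x/n."""
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(2131, 2198)
print(f"{2131/2198:.1%} correct, 95% CI {lo:.1%} to {hi:.1%}")  # 97.0%, 96.1% to 97.6%
```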
Table 2-1. CFTR Mutation Testing: Results of the ACMG/CAP MGL Survey
Year | Labs | Alleles Tested | Correct (%) | Incorrect (%) | False Positive (%) | False Negative (%) | Wrong Mutation (%)
-----|------|----------------|-------------|---------------|--------------------|--------------------|-------------------
1996 | 47 | 282 | 267 (96.5) | 15 (3.5) | 2 (0.7) | 2 (0.7) | 11 (3.9)
1997 | 46 | 276 | 245 (89.5) | 31 (10.5) | 6 (2.2) | 7 (2.5) | 18 (6.5)
1998 | 51 | 306 | 296 (96.7) | 10 (3.3) | 0 (0.0) | 10 (3.3) | 0 (0.0)
1999 | 43 | 342 | 341 (99.7) | 1 (0.3) | 0 (0.0) | 0 (0.0) | 1 (0.3)
2000 | 41 | 458 | 452 (98.7) | 6 (1.3) | 0 (0.0) | 2 (0.4) | 4 (0.9)
2001 | 45 | 534 | 528 (99.2) | 4 (0.8) | 2 (0.4) | 1 (0.2) | 1 (0.2)
All | | 2198 | 2131 (97.0) | 67 (3.0) | 10 (0.5) | 22 (1.0) | 34 (1.6)
Table 2-2 makes use of the ACMG/CAP MGL Survey data (Appendix A) to compute a preliminary estimate of analytic sensitivity and specificity. The apparent improvement in performance over time may be real, or due to differences in the types of challenges. For example, no wild/wild challenges were included prior to 2000, while 8 of 12 challenges since then were wild/wild. Because of the small numbers, it is not possible to stratify the results by methodology or to provide separate estimates of performance for most of the mutations tested.
Table 2-2. Analytic Performance for Identifying All Cystic Fibrosis Mutations According to Data from the ACMG/CAP Molecular Genetics Survey
Year | Analytic Sensitivity % (95% CI) | Analytic Specificity % (95% CI)
-----|---------------------------------|--------------------------------
1996 | 98.9 (96.1-99.9) | 87.1 (79.0-93.0)
1997 | 96.0 (91.8-98.4) | 76.7 (67.3-84.5)
1998 | 96.5 (93.6-98.3) | 100.0 (83.9-100)
1999 | 100.0 (98.3-100) | 99.2 (95.8-99.9)
2000 | 97.4 (90.8-99.7) | 99.0 (97.3-99.7)
2001 | 99.4 (96.7-99.9) | 99.2 (97.6-99.8)
All | 97.9 (96.9-98.7) | 95.8 (94.4-96.9)
Complicating factors in interpreting these results An additional aim of these external challenges was education. For that reason, it may not be appropriate to use these data to determine analytic performance without taking the design of these exercises into account. For example, 14 percent (3/21) of the challenges required that participating laboratories distinguish between the delI507 and delF508 mutations. All of these challenges occurred in the first three years of the survey. The delI507 mutation occurs in fewer than 1 in 2,500 non-Hispanic Caucasians tested (1 percent of 1/25). This rare and difficult laboratory circumstance is emphasized because of the educational and laboratory-improvement focus of the ACMG/CAP MGL Survey. An additional complicating feature arises because it is not always clear whether some 'false negatives' might be due to laboratories not testing for the mutation. The present analysis attempts to take this into account (Appendix A). Finally, the opportunity for a laboratory to identify a wrong mutation is considerably greater in proficiency testing exercises than in practice, because mutations are far more frequent in the challenge samples. For that reason, the rate of wrong mutations in proficiency testing needs to be adjusted downward to simulate performance in routine clinical practice.
A more reliable approach to estimating analytic sensitivity and specificity It is possible to recompute the previous analysis using only challenges that do not involve delI507. Separate estimates can then be computed for the four challenges involving delI507. These two stratified estimates of analytic performance are shown in Table 2-3, along with the summary estimate from Table 2-2. The analytic specificity for identifying the delI507 mutation is poorer than for the other mutations. The sensitivity is actually better, since some mutation was reported in every instance where a delI507 mutation was present. A better estimate of the overall performance to be expected in routine practice is obtained when challenges involving the delI507 mutation are not counted (the bolded row in Table 2-3).
Table 2-3. Analytic Performance With and Without delI507 Mutation Challenges Based on the ACMG/CAP Molecular Genetics Survey Data
Challenge Set | Alleles | Analytic Sensitivity % (95% CI) | Analytic Specificity % (95% CI)
--------------|---------|---------------------------------|--------------------------------
All mutations (Table 2-2) | 2198 | 97.9 (96.9-98.7)1 | 95.8 (94.4-96.9)
**All but delI507** | 1940 | 97.9 (96.8-98.7) | 98.4 (97.3-99.1)2
delI507 only | 258 | 100 (97.1-100) | 79.1 (71.2-85.6)
1 95 percent CI
2 A more reliable estimate of analytic specificity is provided later in this section.
Table 2-4 shows the analytic performance estimates by year for challenges without delI507. No trend toward improvement in analytic sensitivity is evident, and the overall rate of 97.9 percent appears reasonable. The lower and upper confidence limits could be taken to model the most pessimistic (96.8 percent) and optimistic (98.7 percent) estimates of analytic sensitivity. A standardized mutation panel is now becoming widely adopted as a result of ACMG recommendations (Grody WW, 2001). Manufacturers are now marketing reagents (under the rule for Analyte Specific Reagents, or ASRs) that have been subjected to good manufacturing practices, and analytic performance may improve as a consequence. The present analysis establishes a 'baseline' estimate of analytic sensitivity and specificity against which to assess that possibility.
Analytic specificity is more difficult to interpret. Thirteen of 15 errors occurred during one distribution (1997-B). Some of these might be explained by sample mix-up, but at least half appear not to be due to this cause. The European Concerted Action on Cystic Fibrosis reported that commercial kits were found to have problems identifying G551D and R553X. The majority of errors in the 1997 ACMG/CAP survey occurred when challenging these two mutations.
Table 2-4. Analytic Performance for Cystic Fibrosis Mutations According to Data from the ACMG/CAP Molecular Genetics Survey (Excluding delI507 Mutation Challenges)
Year | Analytic Sensitivity % (95% CI) | Analytic Specificity % (95% CI)
-----|---------------------------------|--------------------------------
1996 | 98.5 (94.8-99.8) | 98.1 (89.9-99.9)
1997 | 96.1 (91.1-98.7) | 82.5 (70.1-91.3)
1998 | 96.5 (93.6-98.3) | 100 (83.9-100)
1999 | 100 (98.3-100) | 99.2 (95.8-99.9)
2000 | 95.3 (84.2-99.4) | 100 (98.9-100)
2001 | 99.4 (96.7-99.9) | 99.2 (97.6-99.8)
All but delI507 (Table 2-3) | 97.9 (96.8-98.7) | 98.41 (97.3-99.1)
1 A more reliable estimate of analytic specificity is provided later in this section.
A final estimate for analytic specificity As stated earlier, the false positive rate (1 - specificity) used in this analysis comprises two types of errors: false positive results and wrong mutations. A 'false positive' can occur whenever a detectable mutation is not present, a common situation in screening. A 'wrong mutation' can only occur when a mutation is present, a relatively uncommon situation in screening but a common one among proficiency testing samples. There have been a total of 949 mutation challenges and 922 wild-type challenges (after excluding all delI507 samples). Thus, a mutation being tested for is present in about 50 percent of the chromosomes. In contrast, only about 1.8 percent of chromosomes in the general pregnancy population will have an identifiable mutation (1/25 non-Hispanic Caucasians are carriers, and about 90 percent of mutations on the mutated chromosome can be detected). For this reason, the rate of wrong mutations must be 'discounted' by a factor of about 28 (50/1.8). Thus, although Table 2-1 shows a ratio of 10 false positive results to 34 wrong mutations, the expected ratio in the general population would be more like 10 false positives to 1 or 2 wrong mutations (34/28). After samples that included delI507 have been removed and the rate of 'wrong mutation' in the general population has been taken into account, the revised estimate of analytic specificity is 99.4% (95 percent CI 98.7 to 99.8%).
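The discounting arithmetic in the paragraph above can be laid out step by step; every input is a figure quoted in the text:

```python
# Figures quoted in the text (delI507 challenges excluded)
pt_mutation_challenges = 949
pt_wild_challenges = 922
carrier_rate = 1 / 25      # non-Hispanic Caucasian carrier frequency
detection_rate = 0.90      # fraction of mutations the panel can detect
wrong_mutation_count = 34  # wrong-mutation errors observed in PT (Table 2-1)

# Fraction of chromosomes carrying a detectable mutation in each setting
pt_fraction = pt_mutation_challenges / (pt_mutation_challenges + pt_wild_challenges)
pop_fraction = (carrier_rate / 2) * detection_rate   # per chromosome, about 1.8%

discount = pt_fraction / pop_fraction                # about 28
expected_wrong = wrong_mutation_count / discount     # about 1.2, i.e., "1 or 2"

print(f"discount factor: {discount:.0f}")
print(f"expected wrong mutations in routine practice: {expected_wrong:.1f}")
```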
Gap in Knowledge: Method- and mutation-specific analytic performance estimates
Tables 2-2 through 2-4 present the best available data for estimating analytic performance. These analyses should not be interpreted as being complete or robust. For example, the problems identified by the delI507/delF508 challenges are method-specific, but no attempt is made in this report to analyze laboratory performance by specific method. The results here are for the mix of methodologies presently being used in the United States and, as such, represent the average laboratory performance a clinician might expect when ordering such testing. To generate more reliable analytic performance estimates, large numbers of specimens with known genotypes will need to be run using specific methodologies. For example, Gasparini et al. (1999) used the PCR/OLA methodology to identify 114 newborns with a mutation; all of these were subsequently confirmed by DNA sequencing. Although this rules out false positives, it does not provide an estimate of analytic sensitivity, since only a small random subset of negative results was similarly sequenced and the possibility of false negative results exists. Until more refined performance estimates are available, the existing information is useful in estimating clinical performance.
Gap in Knowledge: Analytic performance estimates are available for only a small number of mutations.
Only a small number of mutations (10) have been subjected to external proficiency testing (delF508, delI507, G542X, 621+1G>T, G85E, W1282X, G551D, R553X, 1717-1G>T, and R117H). The majority of the mutations in the recommended panel have not been subjected to external proficiency testing. This is an important consideration because performance may vary according to laboratory methodology.
Gap in Knowledge: Analytic performance and mutation panel size. It is possible that analytic performance will differ, depending on the numbers of mutations tested, even when the same methodology is employed. Panels utilizing a higher number of mutations might be more robust because of automation or, conversely, the larger number of analytic steps might be more prone to errors.
Sensitivity and specificity by person rather than by chromosome
It is possible to compute analytic sensitivity and specificity according to whether a person's genotype has been correctly classified, rather than whether an individual chromosome has been correctly classified. That is, the genotype is correct or incorrect when detectable mutations are present (analytic sensitivity) or the genotype is correct or incorrect when no detectable mutations are present (analytic specificity). Table 2-5 shows the results of this analytic approach, stratified by the year that proficiency testing results were obtained. All three samples containing a delI507 mutation have been removed from the analysis. According to these data, the overall estimate for analytic sensitivity is 95.9% (95 percent CI 93.3 to 97.1%). This is lower than shown in Table 2-4 (97.9 percent), where the analysis is by chromosome rather than by person. When the analysis is performed by person, wrong mutations are included in the computation of analytic sensitivity. Once the eight instances of wrong mutations are accounted for, analytic sensitivity is corrected upward to 97.2 percent. This estimate is now similar to that found when the analysis was by chromosome. Table 2-5 also shows an analytic specificity of 99.7% (95 percent CI 98.4 to 99.9%), consistent with that found in Table 2-4 (99.4 percent).
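The upward correction described above is simple arithmetic on the Table 2-5 counts; a sketch:

```python
correct = 607          # genotypes correctly identified (detectable mutation present)
total = 633            # genotype challenges with a detectable mutation
wrong_mutation = 8     # wrong-mutation calls counted as misses in the raw rate

raw = correct / total
corrected = (correct + wrong_mutation) / total

print(f"raw analytic sensitivity:       {raw:.1%}")        # 95.9%
print(f"corrected analytic sensitivity: {corrected:.1%}")  # 97.2%
```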
Table 2-5. Analytic Sensitivity and Specificity based on the ACMG/CAP MGL Survey, Classified According to Whether a Person's Genotype is Correctly Identified
Detectable mutation present

Year | Genotype Correct (%) | Genotype Incorrect (%) | Total
-----|----------------------|------------------------|------
1996 | 91 (96.8) | 3 (3.2) | 94
1997 | 83 (90.2) | 9 (9.8) | 92
1998 | 143 (93.5) | 10 (6.5) | 153
1999 | 171 (99.4) | 1 (0.6) | 172
2000 | 32 (97.0) | 1 (3.0) | 33
2001 | 87 (97.8) | 2 (2.2) | 89
Analytic Sensitivity | 607 (95.9) | 26 (4.1) | 633

Detectable mutation not present

Year | Genotype Correct (%) | Genotype Incorrect (%) | Total
-----|----------------------|------------------------|------
1996 | 3 (100) | 0 (0.0) | 3
1997 | 9 (100) | 0 (0.0) | 9
1998 | 1 (100) | 0 (0.0) | 1
1999 | 2 (100) | 0 (0.0) | 2
2000 | 155 (100) | 0 (0.0) | 155
2001 | 171 (99.4) | 1 (0.6) | 172
Analytic Specificity | 341 (99.7) | 1 (0.3) | 342
External proficiency testing in Europe
Results of the proficiency testing survey conducted by the European Concerted Action for Cystic Fibrosis. Table 2-6 shows the results of that study. Because that study's report did not distinguish between false positive, false negative and incorrect mutations, it is not possible to compute an analytic sensitivity or specificity. However, the overall rate of 2.8 percent incorrectly classified chromosomes (95 percent CI 2.4 to 3.4%) is similar to the overall 3.0 percent error rate found in the ACMG/CAP survey reported earlier in this section. This study also reported that 48 percent of 114 participants had correct responses for all challenges. Another 39 percent committed one error, while 2 percent failed all challenges.
Interpretation of the results. This survey also attempted to determine the cause of errors, including sample contamination and clerical errors. In general, laboratories would have been able to correct their false positive results, if their policy had been to reanalyze samples with positive results. This indicates that the original sample was neither contaminated nor incorrectly labeled. Clerical errors/reporting mistakes/incorrect interpretations were estimated to be responsible for 90 percent of the errors. The error rate was not associated with the numbers of samples processed by the laboratory.
Table 2-6. Survey Results from the European Concerted Action for Cystic Fibrosis, According to Whether the Chromosome was Correctly Classified
Year | Chromosomes | Correct (%) | Incorrect (%)
-----|-------------|-------------|--------------
1996 | 1632 | 1569 (96.1) | 63 (3.9)
1997 | 1740 | 1691 (97.2) | 49 (2.8)
1998 | 1908 | 1872 (98.1) | 36 (1.9)
All | 5280 | 5132 (97.2) | 148 (2.8)
Comparing error rates for DNA-based cystic fibrosis testing with biochemical testing for Down syndrome
A similar proficiency testing program (Survey FP) for maternal serum Down syndrome markers serves as one source for comparing error rates in non-DNA testing. In that survey (jointly sponsored by the Foundation for Blood Research and CAP), participating laboratories are asked to measure three biochemical markers, to combine these measurements with a pre-assigned maternal age, and then calculate a Down syndrome risk. Five challenges are distributed, three times each year. The proportion of laboratories with one or more outlying Down syndrome risk estimates on a given distribution is routinely reported to all participants each year (FBR/CAP FP Survey Participant Summary Report, 2000, FP-C). This proportion has remained relatively constant between 1998 and 2000 at about 5 percent. Assuming that the laboratory will have only one (or two) of the five risks classified as being an outlier, the actual error rate per sample distributed is closer to 1 or 2 percent. This is similar to the error rate for the ACMG/CAP MGL survey found in Table 2-1. This analysis is limited to data prior to 2001, since a problem with sample preparation was identified in 2001 and corrected in 2002.
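The step from a 5 percent per-distribution outlier rate to a roughly 1 percent per-sample rate can be made explicit by assuming errors on the five challenges are independent (that independence is an assumption of this sketch, not something stated in the survey report):

```python
p_any_outlier = 0.05   # fraction of labs with >= 1 outlying risk per distribution
n_challenges = 5       # samples per distribution

# If each sample independently errs with probability p, then
# 1 - (1 - p)**5 = 0.05  =>  p = 1 - (1 - 0.05)**(1/5)
p_per_sample = 1 - (1 - p_any_outlier) ** (1 / n_challenges)

print(f"implied per-sample error rate: {p_per_sample:.1%}")  # about 1%
```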
References
Appendix A. Data used to calculate analytic sensitivity and specificity
Table 2-7. Computations for the ACMG/CAP Proficiency Testing Surveys
Response and commentary of the CAP/ACMG Biochemical and Molecular Genetics Resource Committee