Data Extraction Strategy
A data extraction form was used (see Appendix 2 of assessment report) to record details of study design, methods, participants, interventions, testing procedures, outcomes and follow-up. Two reviewers extracted data independently. Differences that could not be resolved through discussion were referred to an arbiter. Reviewers were not blinded to the names of study authors, institutions or publications.
Quality Assessment Strategy
The methodological quality of the diagnostic studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool developed by the National Health Service (NHS) Centre for Reviews and Dissemination (see Appendix 3 of assessment report). The tool did not incorporate a quality score but was a structured list of 12 questions, covering areas such as spectrum and verification bias, with each question answered 'Yes', 'No' or 'Unclear'. Two reviewers independently assessed the quality of the included studies. Any differences that could not be resolved through discussion were referred to an arbiter.
The prognostic studies were assessed using the Downs and Black checklist (see Appendix 4 of assessment report). The checklist is designed to assess the quality of both randomised and non-randomised studies (including cohort studies). Question 27 (study power) was omitted as studies with fewer than 100 participants were excluded. The adapted checklist, therefore, contained 26 questions, covering the following subscales:
- Reporting (ten questions)
- External validity (three questions)
- Internal validity - bias (seven questions)
- Internal validity - confounding (six questions)
An overall score and a score for each subscale were calculated. A list of principal confounders and possible adverse events was developed (see Appendix 5 of assessment report) to provide supplementary information to questions 5 and 8 of the checklist. The maximum achievable scores within each subscale were: reporting (11; one reporting question, on the distribution of principal confounders, is scored from 0 to 2, which is why ten questions yield a maximum of 11), external validity (3), internal validity - bias (7) and internal validity - confounding (6), giving an overall maximum achievable score of 27.
Synthesis of Diagnostic Studies
Diagnostic performance indices (sensitivity, specificity, accuracy, predictive values, and likelihood ratios) were extracted and recalculated for each study for both tests (single photon emission computed tomography (SPECT) versus coronary angiography (CA) and stress electrocardiogram (ECG) versus CA) and 2x2 contingency tables of true positive, false positive, false negative and true negative were generated. For studies with missing data (e.g. studies reporting only sensitivity and specificity values) an attempt was made to reconstruct the contingency tables from the data available in the published reports. This proved to be feasible only when the total number of participants, sensitivity, specificity, and accuracy were provided or when the total number of participants, sensitivity, specificity, positive and negative likelihood ratios were known.
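When total participants, sensitivity, specificity and accuracy are all reported, the table can be recovered algebraically from the defining identities. A minimal sketch in Python (function and variable names are illustrative, not taken from the report's Appendix 6):

```python
def reconstruct_2x2(n, sens, spec, acc):
    """Recover (TP, FP, FN, TN) from total participants and the
    published sensitivity, specificity and accuracy.

    Uses sens = TP/(TP+FN), spec = TN/(TN+FP) and
    acc = (TP+TN)/n; solving acc*n = sens*d + spec*(n-d)
    gives the number of diseased participants d.
    """
    d = (acc - spec) * n / (sens - spec)  # diseased participants
    tp = sens * d                         # true positives
    tn = spec * (n - d)                   # true negatives
    return tp, (n - d) - tn, d - tp, tn   # TP, FP, FN, TN
```

Because published sensitivity, specificity and accuracy are rounded, the recovered cells are in general non-integer, which is exactly the complication discussed in the report.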
Details of the mathematical formulae applied are given in Appendix 6 of the assessment report. Use of the formulae was not always straightforward because in many cases they yielded non-integer values of true positives, false positives, false negatives and true negatives, usually because published values of sensitivity and specificity were given to only two decimal places. In most cases it was possible to find integer values for the contingency tables that reproduced the published sensitivity and specificity using the formulae described above. For a minority of comparisons, however, no exact match could be found. For example, for the Santana-Boado study the chosen integer values for the 2x2 table for the SPECT versus CA comparison yielded a sensitivity of 0.917, which rounds to 0.92, whereas the reported value was 0.91. In these cases the data providing the closest match to the published values were used, as the differences were small and the discrepancies were most likely caused by rounding.
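The search for the closest-matching integer table can be automated by scanning candidate cell counts; a hedged sketch (the report does not describe an automated search, and the diseased/healthy split used below is illustrative, not the actual Santana-Boado data):

```python
def closest_integer_table(n_diseased, n_healthy, sens_pub, spec_pub):
    """Pick integer TP and TN so that TP/n_diseased and TN/n_healthy
    are as close as possible to the published (rounded) sensitivity
    and specificity."""
    tp = min(range(n_diseased + 1),
             key=lambda t: abs(t / n_diseased - sens_pub))
    tn = min(range(n_healthy + 1),
             key=lambda t: abs(t / n_healthy - spec_pub))
    # return the full table as TP, FP, FN, TN
    return tp, n_healthy - tn, n_diseased - tp, tn
```

With, say, 12 diseased participants (an arbitrary illustration), the closest achievable sensitivity to a published 0.91 is 11/12 ≈ 0.917 — exactly the kind of near-miss attributed above to rounding.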
For the statistical analysis of studies of diagnostic performance the methods suggested by Midgette and colleagues were applied (see Figure 3.1 of the assessment report). They first advocate plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) and calculating Spearman's rank correlation coefficient. A large positive correlation indicates that calculation of a summary receiver operating characteristic (ROC) curve is desirable. In the absence of a positive correlation, heterogeneity between true and false positive rates is tested using a chi-squared test (or an extension of Fisher's exact test if the numbers are too small). If the data are homogeneous it is reasonable to conduct meta-analyses of sensitivities and specificities. Conversely, when data are heterogeneous and not positively correlated a statistical summary is not recommended.
Summary ROC curves for SPECT versus CA and stress ECG versus CA were considered when a positive correlation between the true and false positive rates was found and when a sufficient number of studies was available for each comparison. A ROC curve for a test with high discriminatory power should yield a "path" close to the top-left corner of the plot, indicating that it provides a high true positive rate and a low false positive rate. It is commonly used to describe how different test cut-off points affect the trade-off between sensitivity and specificity.
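The report does not state which fitting method was used for the summary ROC curves; the Moses-Littenberg regression sketched below is one widely used approach and is offered purely as an illustration of how such a curve can be constructed from study-level 2x2 tables:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sroc_curve(tables, fpr_grid):
    """Moses-Littenberg summary ROC sketch: fit D = a + b*S by
    ordinary least squares, where D = logit(TPR) - logit(FPR) and
    S = logit(TPR) + logit(FPR), then map each FPR in fpr_grid to
    a summary TPR."""
    d, s = [], []
    for tp, fp, fn, tn in tables:
        tpr = (tp + 0.5) / (tp + fn + 1)   # continuity correction
        fpr = (fp + 0.5) / (fp + tn + 1)
        d.append(logit(tpr) - logit(fpr))
        s.append(logit(tpr) + logit(fpr))
    n = len(d)
    b = (sum(si * di for si, di in zip(s, d)) - sum(s) * sum(d) / n) / (
        sum(si * si for si in s) - sum(s) ** 2 / n)
    a = (sum(d) - b * sum(s)) / n
    # invert D = a + b*S to get logit(TPR) as a function of logit(FPR)
    curve = []
    for fpr in fpr_grid:
        lt = (a + (1 + b) * logit(fpr)) / (1 - b)
        curve.append((fpr, 1 / (1 + math.exp(-lt))))
    return curve
```

A curve hugging the top-left corner of the plot, as described above, corresponds to a large intercept a in this parameterisation.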
If appropriate, it was planned to calculate pooled estimates of sensitivity and specificity and their confidence intervals for both SPECT and stress ECG for each comparison. These are averages of the sensitivities and specificities weighted by the inverse of the variance of each study. Studies for which 2x2 table information could not be obtained could not be included in this analysis.
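A minimal sketch of such inverse-variance pooling (the report's exact weighting is not reproduced here; this version assumes the binomial variance p(1-p)/n and a normal-approximation confidence interval):

```python
def pooled_proportion(props, ns):
    """Inverse-variance weighted average of per-study proportions
    (e.g. sensitivities) with a 95% normal-approximation interval.
    Each study's variance is taken as p*(1-p)/n, so proportions of
    exactly 0 or 1 would need a continuity correction first."""
    weights = [n / (p * (1 - p)) for p, n in zip(props, ns)]  # 1/var
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)
```

Note how the weighting pulls the pooled value towards studies whose proportions are more precisely estimated (larger n, or p further from 0.5).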
In addition, meta-analyses of positive and negative likelihood ratios were conducted where appropriate. Likelihood ratios express how much more likely a given test result is in a patient with the target disorder than in one without it. For instance, a positive likelihood ratio of 10 means that a positive test result is 10 times as likely to occur in patients with the disease under investigation (i.e. coronary artery disease [CAD]) as in healthy subjects. A likelihood ratio of one means that the test result provides no diagnostic information and does not change the probability of the target condition. Likelihood ratios below one indicate a decrease in the probability of the target condition (the smaller the likelihood ratio, the greater the decrease). As likelihood ratios are identical in construction to risk ratios, meta-analyses of positive and negative likelihood ratios were conducted using a random effects model and treated as meta-analyses of risk ratios.
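The ratio construction and the random-effects pooling can be sketched as follows (the report does not name its exact estimator; the DerSimonian-Laird estimator on the log scale used here is one standard choice for ratio measures, and the log-LR variance is the usual delta-method approximation):

```python
import math

def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens / (1 - spec), (1 - sens) / spec  # LR+, LR-

def var_log_lr_pos(tp, fp, fn, tn):
    """Delta-method variance of ln(LR+)."""
    return 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)

def pool_lr_random_effects(lrs, variances):
    """DerSimonian-Laird random-effects pooling on the log scale,
    as for risk ratios; returns the pooled LR on the natural scale."""
    y = [math.log(lr) for lr in lrs]
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    return math.exp(sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star))
```

Working on the log scale is what makes the "treated as risk ratios" analogy concrete: both are ratios whose sampling distribution is approximately normal after the log transform.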