The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey
John P.A. Ioannidis and Thomas A. Trikalinos
CMAJ 2007 April;176(8):1091-1096

From the Tufts University School of Medicine (Ioannidis, Trikalinos), Boston, Mass., and the Clinical Trials and Evidence-Based Medicine Unit and Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, and the Biomedical Research Institute, Foundation for Research and Technology — Hellas (Ioannidis), Ioannina, Greece.

Background

Statistical tests for funnel-plot asymmetry are common in meta-analyses. Inappropriate application can generate misleading inferences about publication bias. We aimed to measure, in a survey of meta-analyses, how frequently the application of these tests would be inappropriate or not meaningful.

Methods

We evaluated all meta-analyses of binary outcomes with >= 3 studies in the Cochrane Database of Systematic Reviews (2003, issue 2). A separate, restricted analysis was confined to the largest meta-analysis in each of the review articles. In each meta-analysis, we assessed whether criteria to apply asymmetry tests were met: no significant heterogeneity, I2 < 50%, >= 10 studies (with statistically significant results in at least 1) and ratio of the maximal to minimal variance across studies > 4. We performed a correlation and 2 regression asymmetry tests and evaluated their concordance. Finally, we sampled 60 meta-analyses from print journals in 2005 that cited use of the standard regression test.

Results

A total of 366 of 6873 (5%) and 98 of 846 meta-analyses (12%) in the wider and restricted Cochrane data set, respectively, would have qualified for use of asymmetry tests. Asymmetry test results were significant in 7%–18% of the meta-analyses. Concordance between the 3 tests was modest (estimated κ 0.33–0.66). Of the 60 journal meta-analyses, 7 (12%) would qualify for asymmetry tests; all 11 claims for identification of publication bias were made in the face of large and significant heterogeneity.

Interpretation

Statistical conditions for employing asymmetry tests for publication bias are absent from most meta-analyses; yet, in medical journals these tests are performed often and interpreted erroneously.

Publication bias, the selective publication of studies based on whether results are “positive” or not, is a major threat to the validity of clinical research (1–4). This bias can distort the totality of the available evidence on a research question, which leads to misleading inferences in reviews and meta-analyses. Without up-front study registration, however, this bias is difficult to identify after the fact (5). Many tests have therefore been proposed to help identify publication bias (6).

The most common approaches try to investigate the presence of asymmetry in (inverted) funnel plots (7–10). A funnel plot shows the relation between study effect size and its precision. The premise is that small studies are more likely to remain unpublished if their results are nonsignificant or unfavourable, whereas larger studies get published regardless. This leads to funnel-plot asymmetry. Although visual inspection of funnel plots is unreliable (11,12), statistical tests can be used to quantify the asymmetry (7–10). These tests have become popular: one relevant article (8) has been cited more than 1000 times.

The limitations of these tests have been documented for some time. Begg and Mazumdar (7) mentioned in 1994 that the false-positive rates of their popular rank-correlation test were too low. In 2000, Sterne and colleagues (13) showed in a simulation study that the regression method described by Egger and associates (8) was more powerful than the rank-correlation test, although the power of either method was low for meta-analyses of 10 or fewer trials. False-positive results were found to be a major concern in the presence of heterogeneity (13,14). To reduce the problem, a modified regression test was developed (10), and several other tests were proposed (6,15). Because the tests differ in their assumptions and statistical properties, discordant results can be expected when different tests are applied.

There are situations when the use of these tests is clearly inappropriate, and others where their use is futile or meaningless. Application of these tests with few studies is not wrong, but has low statistical power. Application in the presence of heterogeneity is more clearly inappropriate, and may lead to false-positive claims for publication bias (14,16,17). When all available studies are equally large (i.e., have similar precision), the tests are not meaningful. Finally, it makes no sense to evaluate whether studies with significant results are preferentially published when none with significant results have been published.

Methods

We used issue 2, 2003, of the Cochrane Database of Systematic Reviews (n = 1669 reviews). We imported into Stata software all meta-analyses that had binary outcomes and numerical 2 × 2 table information available (n = 12 709) (18). We did not consider studies where no patients in either arm of the study had an event, or all patients in both arms had an event; this eliminated 906 meta-analyses. Zero counts in one arm only were handled in the calculations via the addition of 0.5 to all data cells, which allowed an odds ratio to be calculated without distorting the data appreciably. Meta-analysis data sets were further scrutinized for similarity. When numbers of studies, patients and events were all the same and summary results were identical (to 7 digits of accuracy), the meta-analyses were considered to contain duplicate data sets and only one of them was retained: similarity checks eliminated 761 duplicate meta-analyses. We also excluded meta-analyses where only 2 studies were available (n = 4169), which makes correlation and regression diagnostics impossible to calculate. Thus, our analysis of the wider Cochrane data set included data from 6873 meta-analyses.
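The handling of zero counts described above can be sketched for a single study as follows. This is a minimal illustration, not the Stata code used in the survey; the function name `log_odds_ratio` and its arguments are hypothetical, and the standard error is assumed to come from the usual large-sample formula (square root of the sum of the reciprocal cell counts), which the text does not state explicitly.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log odds ratio, standard error) for one 2 x 2 table, or None if excluded."""
    a, b = events_trt, n_trt - events_trt   # treatment arm: events, non-events
    c, d = events_ctl, n_ctl - events_ctl   # control arm: events, non-events
    # No events in either arm, or events in every patient of both arms:
    # such studies were not considered.
    if (a == 0 and c == 0) or (b == 0 and d == 0):
        return None
    # Any remaining zero cell: add 0.5 to all four cells.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    # Assumed large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se
```

For example, `log_odds_ratio(0, 10, 5, 10)` applies the correction and returns a finite estimate, whereas `log_odds_ratio(0, 10, 0, 10)` returns `None`.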

The data sets of these meta-analyses are not necessarily independent. Within the same systematic review, different outcomes, contrasts and analyses may be correlated. To minimize correlation, we created a separate, more restricted data set for which we selected one meta-analysis, the one with the largest number of studies, per systematic review. When the largest number of studies was equal in 2 or more of the meta-analyses, we chose the one with the largest number of subjects; if that number was also equal, we chose the one with the largest number of events. The problem of inappropriateness of the asymmetry tests due to limited number of studies was thereby minimized in this analysis of the restricted Cochrane data set of data from 846 meta-analyses.
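The selection rule for the restricted data set (most studies, then most subjects, then most events) amounts to a lexicographic comparison. The sketch below illustrates it under assumed names: the `MetaAnalysis` record and `pick_largest_per_review` are hypothetical and do not come from the original analysis.

```python
from collections import namedtuple

# Hypothetical record for one meta-analysis within a systematic review.
MetaAnalysis = namedtuple(
    "MetaAnalysis", "review_id label n_studies n_subjects n_events"
)

def pick_largest_per_review(meta_analyses):
    """Keep one meta-analysis per review: most studies, ties broken by subjects, then events."""
    best = {}
    for ma in meta_analyses:
        current = best.get(ma.review_id)
        # Tuples compare element by element, which matches the tie-breaking order.
        key = (ma.n_studies, ma.n_subjects, ma.n_events)
        if current is None or key > (
            current.n_studies, current.n_subjects, current.n_events
        ):
            best[ma.review_id] = ma
    return best
```

If two meta-analyses in a review tie on all three counts, this sketch keeps the first one encountered; the text does not say how such a case was resolved.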

For each eligible meta-analysis, we evaluated 4 aspects that bear on whether applying an asymmetry test may be meaningful or appropriate. Between-study heterogeneity was tested with the χ2-based Q statistic and considered significant for p < 0.10 (2-tailed);(19) the extent of between-study heterogeneity was measured with the I2 statistic and considered large for values of 50% or more (20). The number of included studies was noted; 10 or more was considered sufficient. To see if the difference in precision of the largest and the smallest study was sufficiently large (ratio of extreme values of variances > 4), we noted the ratio of the maximal versus minimal variance (the square of the standard error of estimates) across the included studies. Finally, we recorded whether at least one study had found formally statistically significant results (p < 0.05).
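The 4 criteria can be expressed as a simple check, sketched below. The function names are hypothetical, the Q statistic is assumed to use fixed-effect inverse-variance weights, and I2 is taken as (Q − df)/Q, truncated at zero. The p-value for Q (from a χ2 distribution with k − 1 degrees of freedom) and the per-study p-values are assumed to be supplied by the caller, for instance from a statistics library.

```python
def heterogeneity(effects, std_errors):
    """Return (Q, degrees of freedom, I2 in percent) with inverse-variance weights."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

def variance_ratio(std_errors):
    """Ratio of the maximal to the minimal variance across studies."""
    variances = [se ** 2 for se in std_errors]
    return max(variances) / min(variances)

def meets_criteria(effects, std_errors, q_pvalue, study_pvalues):
    """Apply the thresholds described in the text to one meta-analysis."""
    _, _, i2 = heterogeneity(effects, std_errors)
    return (
        q_pvalue >= 0.10                            # no significant heterogeneity
        and i2 < 50                                 # I2 below 50%
        and len(effects) >= 10                      # at least 10 studies
        and any(p < 0.05 for p in study_pvalues)    # at least 1 significant study
        and variance_ratio(std_errors) > 4          # sufficient spread in precision
    )
```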

Some debate about the extent to which criteria need be fulfilled for asymmetry tests to be meaningful or appropriate is unavoidable. The thresholds listed above are not very demanding, based on the properties of the tests. Results of analyses with alternative, even more lenient criteria are illustrated in Venn diagrams of the 4 overlapping criteria.

The odds ratio was used as the metric of choice for all the meta-analyses. We documented the degree of overlap of the criteria described above and the number of meta-analyses that would qualify, based not only upon each criterion but also on combinations thereof.

We evaluated each meta-analysis by means of 3 asymmetry tests: the 2 most popular tests in the literature (the Begg–Mazumdar τ rank-correlation coefficient (7) and the standard regression test of the standardized effect size [i.e., the natural logarithm of the odds ratio divided by its standard error] against its precision [the inverse of the standard error] (8)) and a new variant, a modified version of the regression test, which has a lower false-positive rate (10). For all tests, statistical significance was claimed for p < 0.10 (2-tailed) (7,8,10). We estimated inferences on the basis of these 3 tests in the entire data sets and in the subsets of meta-analyses fulfilling the appropriateness criteria already described. Pairwise concordance between the 3 tests was assessed with the κ statistic (21).
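As an illustration of the standard regression test described above, the sketch below regresses the standardized effect size on precision by ordinary least squares and reports the intercept, whose departure from zero is what the test examines. The function name `egger_intercept` is hypothetical, and the sketch stops at the t statistic; the 2-tailed p-value would be obtained from a t distribution with k − 2 degrees of freedom. It assumes at least 3 studies that differ in precision and do not fall exactly on a line.

```python
import math

def egger_intercept(log_ors, std_errors):
    """Return (intercept, SE of intercept, t statistic, degrees of freedom)."""
    z = [y / se for y, se in zip(log_ors, std_errors)]  # standardized effect size
    x = [1 / se for se in std_errors]                   # precision
    k = len(z)
    mean_x, mean_z = sum(x) / k, sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
    intercept = mean_z - slope * mean_x
    # Residual variance of the fitted line, on k - 2 degrees of freedom.
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = k - 2
    sigma2 = resid_ss / df
    se_intercept = math.sqrt(sigma2 * (1 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, df
```

In a symmetric funnel plot the fitted line is expected to pass close to the origin, so an intercept that is large relative to its standard error signals asymmetry.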

The Cochrane Handbook for Systematic Reviews of Interventions (16) has taken a critical stance toward the use of these tests. RevMan, the Cochrane Library meta-analysis software, does not include any options for running them, and their use in the Cochrane Library is limited (22). We therefore used a sample of meta-analyses in printed journals to examine whether these tests are used inappropriately in practice. We examined papers published in 2005 that cited the most common reference for the standard regression test (8), the asymmetry test most commonly used in the current literature. We screened citations in sequential order (as indexed in the Science Citation Index) until we identified 60 meta-analyses in which asymmetry testing had been employed. The 60 meta-analyses examined were within 24 published articles. Although we focused on the standard regression test (8), we also recorded results from the other 2 tests whenever such data were reported. We examined whether these 60 meta-analyses fulfilled the criteria that we set, what they found, and how they interpreted the application of the test.

Results

In terms of fulfillment of criteria, the most common feasibility problem we encountered in both of our Cochrane data-set analyses was too low a number of studies, with three-quarters or more of the meta-analyses examining fewer than 10 studies (Table 1). Lack of significant studies was also a major issue: in the wider and restricted data sets, about half and a third of the meta-analyses, respectively, included no studies with statistically significant results; about a fifth and a quarter had significant or large between-study heterogeneity; and nearly a quarter and a fifth had a ratio of extreme values of variances of 4 or less. Only 366 (5%) of the meta-analyses in the wider Cochrane data set and 98 (12%) of those in the restricted Cochrane data set fulfilled all 4 of the original criteria (Figure 1, left).

TABLE 1: Statistical characteristics of meta-analyses according to wide or restricted extractions from the Cochrane database

FIGURE 1: Venn diagrams showing the overlap of the subsets of meta-analyses according to our chosen criteria (diagrams to the left: ≥ 1 study with statistically significant results; ≥ 10 studies in the meta-analysis; I2 < 50% with nonsignificant Q; ratio of extreme study variances > 4). For comparison, results when a set of very lenient criteria (right: ≥ 1 significant study; ≥ 5 studies; I2 < 50% regardless of Q; extreme study variances > 2) is used are also depicted. Each set of criteria is likewise shown for our wider data set of meta-analyses (upper diagrams: n = 6873) and for the restricted data set of 1 meta-analysis per systematic review (lower diagrams: n = 846). Shading indicates categories in which substantially more studies met criteria.

Results of the 3 tests showed statistically significant asymmetry in few meta-analyses (Table 2); overall, in the 2 data sets, rates of significant signals (i.e., statistically significant results) varied between 7% and 18%. They tended to be smallest for the correlation test and highest for the unmodified standard regression test, but did not much differ between the 2 data sets. When the data sets were split according to whether meta-analyses met the criteria for applying asymmetry tests or not, significant signals were more prevalent in the meta-analyses that fulfilled the criteria than in those that did not. Nevertheless, even in the former group, the rates of signals varied from 14% to 24%.

TABLE 2:

The 3 asymmetry tests had modest concordance across the entire data sets (Table 2, Figure 2); results were largely similar across the wider and restricted Cochrane data sets. Overall, 3% and 4% of the meta-analyses, respectively, gave a significant signal with all 3 tests. In 19% and 22% of the meta-analyses, a result from at least 1 of the 3 tests was significant. Estimated κ values fell generally below 0.5 (range 0.33–0.45) for the concordance of the correlation test with either of the regression diagnostics, and were somewhat higher (0.64–0.66) for concordance between the unmodified and modified regression diagnostics. When analyses were limited to meta-analyses that fulfilled the criteria for asymmetry tests, concordance slightly improved between the correlation and the regression diagnostics (estimated κ 0.39–0.60) and worsened slightly between the unmodified and modified regression diagnostics (estimated κ 0.57–0.59).
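Pairwise concordance of this kind can be illustrated with Cohen's κ for 2 tests that each give a yes/no signal per meta-analysis. The sketch below is a generic implementation with a hypothetical function name; it is not the code used for the survey, and it is undefined when chance agreement equals 1 (for example, when neither test ever gives a signal).

```python
def cohen_kappa(signals_a, signals_b):
    """Cohen's kappa for two equally long sequences of True/False test signals."""
    n = len(signals_a)
    a = [bool(s) for s in signals_a]
    b = [bool(s) for s in signals_b]
    # Observed agreement: both tests significant or both nonsignificant.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each test's marginal signal rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why estimates of 0.33 to 0.66 are described as modest.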

FIGURE 2: Venn diagrams disclosing modest concordance in the application of the 3 funnel-plot asymmetry tests to statistically significant results in the wider data set of 6873 meta-analyses (left) and in the restricted data set of 846 meta-analyses (right). Data inside the circles refer only to meta-analyses with significant results with the corresponding test (p < 0.10).

Of the 60 meta-analyses that stated their use of the regression test within the 24 print articles, use of the test was meaningful or appropriate in 7 of the meta-analyses (12%, 95% confidence interval 5%–23%). Of the 24 articles, 6 had at least one meta-analysis where use of the test was appropriate. Twenty-six meta-analyses had significant heterogeneity (all with I2 > 50%), and another 4 had I2 > 50% without statistically significant heterogeneity. Twenty-six meta-analyses were of fewer than 10 studies. Eighteen meta-analyses included no significant studies; 3 had ratios of extreme variances ≤ 4. Four of the 24 articles also reported rank correlation test results (with similar inferences). Another cited the regression test when what had actually been performed were rank correlation tests. One other article apparently used a regression test based on sample size, a different test than the one that was cited.
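The confidence interval around the proportion 7 of 60 can be approximated with an exact binomial interval. The text does not state which interval method was used, so the Clopper–Pearson approach below is an assumption, and the function names are hypothetical. The limits are found by bisection on the binomial distribution using only the Python standard library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for x successes out of n."""
    def bisect(f):
        # f is increasing in p with f(0) < 0 < f(1); find its root.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper
```

Calling `clopper_pearson(7, 60)` gives limits that can be compared with the reported interval of 5% to 23%.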

All 24 articles claimed that the tests were done to estimate publication bias, with a single exception: an article that clarified that the authors tested for “small-study bias, of which publication bias is one potential cause.” Eleven meta-analyses (18%) claimed that there was evidence for publication bias, whereas the other 49 stated that they found no such evidence. All meta-analyses that claimed to have detected publication bias were found to have between-study heterogeneity that was large and statistically significant.

Interpretation

In most meta-analyses, the application of funnel-plot asymmetry tests to detect publication bias is inappropriate or not meaningful. We found a major problem to be lack of a sufficient number of studies; lack of studies with significant results and the presence of heterogeneity were also common issues. In a smaller proportion of meta-analyses, differences in the magnitude of the smallest versus the largest studies were negligible.

When each of the 3 asymmetry (“publication bias”) tests was applied, we found a minority of the examined meta-analyses to have a positive signal. About a fifth of the meta-analyses gave a signal with any of the 3 tests; 3%–4% gave consistent signals for asymmetry with all diagnostics. In the absence of a criterion standard about the presence of publication bias, it is impossible to decide whether these figures were low because the tests we examined were underpowered or because publication bias is uncommon. Moreover, concordance among the 3 tests was modest. Automatic and undocumented use of these tests may lead to unreliable inferences.

A survey of 60 recently published meta-analyses from 24 published reports that had cited use of the standard regression test (8) revealed that most had used the test inappropriately. With one exception, all these articles misleadingly equated the results of these tests with the presence or absence of publication bias, ignoring numerous other causes that may underlie differences between small and larger studies (8). Moreover, all signals for publication bias occurred in meta-analyses with large, significant between-study heterogeneity. It is also disquieting that 82% of the meta-analyses were assumed to have no publication bias simply because of a “negative” asymmetry test result.

When these diagnostics give significant signals, this does not necessarily mean that publication bias is present. This applies even when the meta-analyses fulfill all of the 4 eligibility criteria that we considered. In the absence of a prospective registry of studies, publication bias cannot be proven or excluded, because a criterion standard is lacking.

The 4 criteria we used are merely technical and conceptual prerequisites. Even if statistical prerequisites are met, the conceptual assumptions may sometimes not hold. Very large sample size,(11) increased attention to the research question and heightened interest in contradicting previous publications with extreme opposite results may contribute as much or more than statistical significance to dictating publication in selected cases or in entire scientific fields (23).

We used the Cochrane Database of Systematic Reviews because it is by far the largest compilation of meta-analyses. The composition of this database may differ from that of the totality of meta-analyses published (22,24,25). Despite some uneven emphasis on specific diseases in the evolving Cochrane Database of Systematic Reviews,(26) this database is likely to be less selective compared with the meta-analyses that appear in the medical journal literature. Meta-analyses published in printed medical journals are larger but also more likely to have large heterogeneity, because they also include a greater share of nonrandomized studies. In the journal literature, the percentage of meta-analyses where asymmetry tests are applied inappropriately is therefore also very high.

There can be some subjectivity about thresholds for a definition of when a statistical test is meaningful or appropriate. Our criteria tended toward the lenient; use of even more lenient criteria would increase the proportion of appropriateness, but not to very high percentages (Figure 1).

Publication bias is compounded by additional biases that pertain to selective outcome reporting(27,28) and “significance-chasing”(29) in the data published. It would be misleading to claim that all these problems can be addressed with asymmetry tests. Occasionally, in a meta-analysis of many studies, the retrieval of unpublished data may “correct” a funnel-plot asymmetry (30). However, we should caution that, when unpublished data exist, only a portion might possibly be retrievable; so, it is unknown what would happen if data from all studies could be retrieved. Whenever both unpublished and published information is available, the results of these 2 types of evidence should be compared. Nevertheless, as has been stressed repeatedly, prospective registration of clinical studies and of their analyses and outcomes (5,31) may be the only means to properly address publication bias.

In conclusion, meta-analysts should refrain from inappropriate or unmeaningful application of funnel-plot asymmetry tests. Readers should not be misled that publication bias has been documented or excluded according to inappropriate use or interpretation of funnel plots.

Footnotes

This article has been peer reviewed.

Contributors: John Ioannidis originated the study concept and wrote the protocol and manuscript, with input and critical revisions by Thomas Trikalinos. John Ioannidis evaluated the meta-analyses published in printed journals; Thomas Trikalinos performed all the statistical analyses. Both authors interpreted the data from their analyses, and approved the final version of the article for publication.

Competing interests: None declared.

Correspondence to: Dr. John Ioannidis, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, 45 110 Ioannina, Greece; fax +30 26510 97867; jioannid@cc.uoi.gr

References

1. Dickersin K, Min YI. Publication bias: the problem that won't go away. Ann N Y Acad Sci 1993;703:135-46.
2. Easterbrook PJ, Berlin JA, Gopalan R, et al. Publication bias in clinical research. Lancet 1991;337:867-72.
3. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 1998;279:281-6.
4. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-5.
5. DeAngelis CD, Drazen JM, Frizelle FA, et al; International Committee of Medical Journal Editors. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. JAMA 2004;292:1363-4.
6. Rothstein HR, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis: prevention, assessment and adjustments. Sussex: John Wiley and Sons; 2005.
7. Begg CB, Mazumdar M. Operating characteristics of a rank correlation test for publication bias. Biometrics 1994;50:1088-101.
8. Egger M, Davey Smith G, Schneider M, et al. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629-34.
9. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001;54:1046-55.
10. Harbord RM, Egger M, Sterne JA. A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med 2006;25:3443-57.
11. Tang JL, Liu JL. Misleading funnel plot for detection of bias in meta-analysis. J Clin Epidemiol 2000;53:477-84.
12. Terrin N, Schmid CH, Lau J. In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 2005;58:894-901.
13. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119-29.
14. Terrin N, Schmid CH, Lau J, et al. Adjusting for publication bias in the presence of heterogeneity. Stat Med 2003;22:2113-26.
15. Macaskill P, Walter SD, Irwig L. A comparison of methods to detect publication bias in meta-analysis. Stat Med 2001;20:641-54.
16. The Cochrane Collaboration. Cochrane handbook for systematic reviews of interventions. Available: http://www.cochrane.org/resources/handbook (accessed 2007 Feb 8).
17. Ioannidis JP. Differentiating biases from genuine heterogeneity: distinguishing artifactual from substantive effects. In: Rothstein HR, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis: prevention, assessment and adjustments. Sussex: John Wiley and Sons; 2005. p. 287-302.
18. Ioannidis JP, Trikalinos TA, Zintzaras E. Extreme between-study homogeneity in meta-analyses could offer useful insights. J Clin Epidemiol 2006;59:1023-32.
19. Lau J, Ioannidis JP, Schmid CH. Quantitative synthesis in systematic reviews. Ann Intern Med 1997;127:820-6.
20. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539-58.
21. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
22. Palma S, Delgado-Rodriguez M. Assessment of publication bias in meta-analyses of cardiovascular diseases. J Epidemiol Community Health 2005;59:864-9.
23. Ioannidis JP, Trikalinos TA. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 2005;58:543-9.
24. Jadad AR, Cook DJ, Jones A, et al. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA 1998;280:278-80.
25. Shea B, Moher D, Graham I, et al. A comparison of the quality of Cochrane reviews and systematic reviews published in paper-based journals. Eval Health Prof 2002;25:116-29.
26. Mallett S, Clarke M. How many Cochrane reviews are needed to cover existing evidence on the effects of health care interventions? ACP J Club 2003;139:A11.
27. Chan AW, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ 2005;330:753.
28. Chan AW, Krleza-Jeric K, Schmid I, et al. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ 2004;171:735-40.
29. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:e124.
30. Kyzas PA, Loizou KT, Ioannidis JP. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 2005;97:1043-55.
31. Sim I, Detmer DE. Beyond trial registration: a global trial bank for clinical trial reporting. PLoS Med 2005;2:e365.