HuGENet Publications

Concordance of functional in vitro data and epidemiological associations in complex disease genetics
John P. A. Ioannidis, MD 1,2,3 and Fotini K. Kavvoura, MD1
Genetics in Medicine 2006 September;8(9):583-593

From the (1) Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece; (2) Biomedical Research Institute, Foundation for Research and Technology - Hellas, Ioannina, Greece; and (3) Tufts University School of Medicine, Boston, Massachusetts.

Abstract

Purpose: We aimed to assess whether epidemiological evidence on genetic associations for complex diseases concords with in vitro functional data.

Methods: We examined 36 studies on bi-allelic markers and 23 studies on haplotypes where investigators had addressed both epidemiological associations and the functional effect of the same gene variants in luciferase reporter systems in vitro.

Results: There was no correlation between epidemiological odds ratios and luciferase activity ratios (-0.09, P = 0.60). Luciferase activity ratios could not tell whether a probed epidemiologic association would be significant or not (area under receiver operating characteristics curve, 0.52). Luciferase results usually were qualitatively similar across cell lines and experimental conditions, with some exceptions. A luciferase activity ratio of 1.44 adequately separated statistically significant from non-significant functional differences (area under receiver operating characteristics curve, 0.95). Binary and continuous disease outcomes usually gave concordant results; other in vitro methods, in particular EMSA, agreed with luciferase results. Selective reporting and use of different variants and contrasts between functional and epidemiological analyses were common in these studies.

Conclusions: In vitro biological data and epidemiology provide independent lines of evidence on complex diseases. We provide suggestions for improving the design and reporting of studies addressing both in vitro and epidemiological effects.

Variability in the human genome amounts to over 5 million polymorphisms, but only a fraction of them has biological and clinical significance.(1-3) Documentation of functional relevance may lead to better insights about various biological pathways and complex disease outcomes. Moreover, epidemiological investigations are further strengthened if gene variants with population-level phenotype associations are also shown to have functional relevance.(3,4)

There are many methods for assessing and establishing the functional relevance of genetic variants.(3) For a few metabolic candidate genes, assays have long been available to measure enzymatic activity, but this applies to only a minority of genes in the current discovery-oriented era. Gene variants that entirely abrogate protein expression or function are also a very small minority. However, for most genes and variants, it is a challenge to evaluate their impact on gene transcription, let alone protein levels and activity. A large component of the genetic variability that impacts on phenotypes and complex diseases may reflect regulatory variation in the human genome.(5) Such variation seems to be very extensive across different nonhuman genomes (6-9) and the same may apply to humans. (8,10)

However, making sense of the epidemiological and clinical meaning of this variation remains a major challenge. A large number of in vitro, ex vivo, and in vivo functional assays are available. While newer technologies still emerge,(10) luciferase reporter systems have been the most popular method for establishing in vitro the functional significance of polymorphisms to date,(3,11,12) especially for variants in regulatory regions (e.g., promoter or enhancer regions). Luciferase reporter systems use constructs that contain segments of a genetic region of interest along with the luciferase gene. These constructs are transfected in cell lines. Experiments can be performed with segments containing different genetic variants. The transcriptional efficiency of the different variants is then measured through luciferase activity. Important questions may be posed. Do these in vitro functional effects correspond to the presence or not of postulated gene-disease associations? Do stronger functional effects of different gene variants correspond also to stronger gene-disease associations? Are the in vitro results consistent across different cell lines and experimental protocols?

Here we evaluated a systematic sample of studies where investigators had reported a probed epidemiologic association of a common disease phenotype and concurrently examined the differential effects of this polymorphism in transfected cell lines with luciferase gene reporter systems. We estimated the empirical concordance between epidemiological and functional biological data, and aimed to obtain insight on how the conduct, reporting and interpretation of studies addressing both functional and epidemiological data could be improved.


Materials and Methods

Identification of eligible studies

Eligible studies for this analysis were retrieved from PubMed using the combination of luciferase and polymorphism. We screened the articles retrieved as of June 2005 for studies that presented both epidemiological data from cases and controls (with and without a disease, or with and without a disease outcome) and data on the activity of the same genetic variant of interest based on a luciferase assay. We excluded studies where only epidemiological data or only functional data were presented in the article. We also excluded articles with non-original data, non-English language articles, and articles describing very rare variants occurring in <1% of the control population. To maximize consistency, for epidemiological data we focused on case-control studies of unrelated subjects (including studies with other designs, e.g., cohorts, where case and control status could be inferred) and excluded the sparse available data on family-based designs.

In order to further achieve standardization of the data to be analyzed, we created two datasets of eligible studies. The first dataset focused on bi-allelic polymorphisms and on dichotomous outcomes (including continuous traits, if categorized upfront into two groups by the authors). This dataset included also data on haplotypes of several polymorphisms, whenever there was complete linkage disequilibrium and thus only two haplotypes were available. In this first dataset it would be possible to estimate consistently an odds ratio for the gene-disease association and a luciferase activity ratio for the comparison of the two alleles and perform a quantitative comparison of these data.

The second dataset included all studies where luciferase experiments had been performed with haplotypes of two or more different gene variants (not in perfect linkage disequilibrium). This second dataset allowed us to extend a qualitative comparison of the epidemiological and functional inferences to haplotypes, since haplotype analyses have recently become the standard in population genetics.(13,14)

Of the 342 electronically retrieved items, 201 were excluded upon reading the title and abstract, as it was clear that they did not have original data with both luciferase experiments and epidemiological associations in human populations. Of the remaining 141 articles, 80 were excluded as they did not fulfill eligibility criteria, 5 could not be retrieved in full text for further scrutiny, and 56 were eligible for the analysis (36 in the first dataset of bi-allelic markers, 23 in the second dataset of haplotypes [3 articles were common in both datasets]).


Analyzed data

From each eligible article, we extracted data on the authors, year of publication, and the genetic variant(s) or haplotypes of interest where both epidemiologic and functional data were available.

For the bi-allelic marker dataset, we also recorded the 2-by-2 table for cases and controls at the allele level for each eligible genetic variant and outcome of interest, and odds ratios were estimated for each 2-by-2 table. When different case-control samples were available from populations of similar ethnic descent, data were merged to obtain a single 2-by-2 table, while data from populations of different ethnic descent or significantly different allele frequencies in their control groups were combined by the Mantel-Haenszel method (15) (a Mantel-Haenszel synthesis was performed also in one study that addressed two types of cancer with separate case and control samples). The odds ratios for the epidemiological association were expressed consistently to show the association of the disease/outcome with the minor allele. We recorded whether this odds ratio was formally statistically significant (P < 0.05) or not and whether the original authors had claimed a significant epidemiological association based on any allele- or genotype-based contrast in the entire population or subgroups thereof. Discrepancies in the level of statistical significance were noted along with their reasons.
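The allele-level odds ratio and the Mantel-Haenszel synthesis described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS procedure: the function names and all allele counts are hypothetical, and the confidence interval uses the standard log-based (Woolf) approximation, which is an assumption about how significance might be judged.

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d):
    """Allele-level odds ratio for one 2-by-2 table.

    a: minor alleles in cases,    b: major alleles in cases,
    c: minor alleles in controls, d: major alleles in controls.
    """
    return (a * d) / (b * c)

def woolf_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% CI from the standard error of the log odds ratio."""
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = log(odds_ratio(a, b, c, d))
    return exp(log_or - z * se), exp(log_or + z * se)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata.

    tables: iterable of (a, b, c, d) tuples, one per stratum
    (e.g., one per population of different ethnic descent).
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical allele counts, for illustration only.
single = (20, 80, 10, 90)
print(odds_ratio(*single))                  # 2.25
print(woolf_confidence_interval(*single))
print(mantel_haenszel_or([single, (30, 70, 25, 75)]))
```

Merging samples of similar descent into one table corresponds to summing the four cells before calling `odds_ratio`, whereas `mantel_haenszel_or` keeps the strata separate and weights each by its size.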

For the haplotypes dataset, we recorded whether an analysis had been performed considering all haplotypes with frequency of at least 1% in the study population and if so whether there were formally statistically significant differences. We also noted whether the original authors had claimed a significant epidemiological association based on any allele-, genotype- or haplotype-based contrast in the entire population or subgroups thereof.

For each probed association, we also recorded the data on luciferase experiments. For the bi-allelic marker dataset, we recorded the ratio of luciferase activity with the minor versus major allele construct under baseline conditions as well as whether the difference between the two alleles was formally statistically significant (P < 0.05). When more than one cell type was used, data were recorded separately for each cell type. When data were also provided with various co-stimulation conditions or changed plasmid constructs, these were also recorded separately for each experimental condition. For the haplotype dataset, we similarly recorded the haplotypes, cell lines, and experimental conditions assessed and whether the functional differences were formally statistically significant or not when all tested haplotypes were considered. When P-values were not given for an analysis involving all tested haplotypes, we performed an analysis of variance using the presented mean values and standard deviations.
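A one-way analysis of variance can be computed from published summary statistics alone, which is what makes the last step above feasible. The sketch below assumes that each haplotype construct is reported as a mean, a standard deviation, and a number of experiments; the function name and the example values are hypothetical. It returns the F statistic and its degrees of freedom; converting F to a P-value requires the F distribution from a statistics package and is left out here.

```python
def anova_from_summary(groups):
    """One-way ANOVA from per-group (mean, sd, n) summaries.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-group sum of squares: recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical relative luciferase activities for three haplotype constructs.
haplotypes = [(1.0, 0.1, 3), (1.5, 0.1, 3), (2.0, 0.1, 3)]
print(anova_from_summary(haplotypes))
```

The calculation presumes that the reported standard deviations are sample standard deviations across independent experiments; as noted later in the article, many studies were unclear on this point.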

Assessment of continuous traits is far less common than assessment of binary phenotypes. Nevertheless, for all eligible studies, we also examined whether any additional continuous phenotypes had been evaluated representing the disease under study and whether inferences were similar to those obtained using the binary disease outcomes.

Finally, we recorded information on whether any additional in vitro assays had been used to establish functional differences between gene variants or haplotypes, and if so, what the results had been. The sparse in vivo and ex vivo data were also recorded.

All data were extracted independently by two investigators and discrepancies were resolved with discussion. Consensus was reached on all items.


Analyses

In the bi-allelic marker dataset, we examined whether there was any correlation between epidemiological odds ratios and luciferase activity ratios. Data were analyzed either using the minor allele's data as the numerator for both odds ratios and luciferase ratios; or coining both odds ratios and luciferase ratios to be ≥1, a probably biased analysis that forces the biological signal to square with the direction of the epidemiological signal. Analyses were performed using nonparametric Spearman's correlation coefficients (secondary analyses used the parametric Pearson correlation coefficient with both metrics log-transformed).
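The two correlation analyses can be illustrated with a short, dependency-free sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks and the secondary analysis as the Pearson correlation of log-transformed values. All function names and the example ratios are hypothetical, and P-values are omitted.

```python
from math import log, sqrt

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def log_pearson(x, y):
    """Pearson correlation after log-transforming both ratio metrics."""
    return pearson([log(v) for v in x], [log(v) for v in y])

# Hypothetical odds ratios and luciferase activity ratios (minor vs. major allele).
odds_ratios = [1.2, 0.8, 2.0, 1.5]
luciferase_ratios = [0.9, 1.1, 0.7, 0.8]
print(spearman(odds_ratios, luciferase_ratios))
print(log_pearson(odds_ratios, luciferase_ratios))
```

Log transformation matters for the parametric analysis because ratios are asymmetric around 1 (a ratio of 0.5 and a ratio of 2 represent effects of the same magnitude), whereas the rank-based coefficient is unaffected by any monotone transformation.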

We also examined whether the absolute values of luciferase activity ratios can tell whether the respective probed epidemiological association would be statistically significant or not; and whether the absolute values of luciferase ratios can tell whether they are also statistically significant or not. We estimated the luciferase activity ratio that would yield a minimum of 90% sensitivity and calculated the respective specificity. All luciferase ratios were coined as ≥1 for these analyses. Analyses were based on receiver operating characteristics curves that plot the sensitivity against the specificity for various cut-offs of absolute luciferase activity ratios. Areas under the ROC curves were estimated. An area of 0.5 shows total lack of concordance (no diagnostic ability) and an area of 1.0 shows perfect concordance (perfect diagnostic ability).
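The ROC analysis can be sketched as below, under stated assumptions. Ratios are first folded to be ≥1, the area under the curve is computed through its equivalence with the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counting one half), and the cut-off is the highest threshold that still reaches the required sensitivity. The function names, the example ratios, and the significance labels are hypothetical.

```python
def fold_ratio(ratio):
    """Express a ratio as high- versus low-activity allele, so it is always >= 1."""
    return ratio if ratio >= 1 else 1 / ratio

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def cutoff_for_sensitivity(scores, labels, min_sensitivity=0.90):
    """Highest cut-off (score >= cut-off counts as positive) reaching min_sensitivity.

    Returns (cutoff, sensitivity, specificity).
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity = sum(p >= cutoff for p in positives) / len(positives)
        if sensitivity >= min_sensitivity:
            specificity = sum(n < cutoff for n in negatives) / len(negatives)
            return cutoff, sensitivity, specificity
    raise ValueError("no cut-off reaches the requested sensitivity")

# Hypothetical luciferase ratios and whether each difference was significant.
raw_ratios = [0.5, 3.0, 1.1, 0.9, 1.8]
significant = [True, True, False, False, True]
folded = [fold_ratio(r) for r in raw_ratios]
print(roc_auc(folded, significant))
print(cutoff_for_sensitivity(folded, significant))
```

The same functions apply to both questions posed above: the labels can mark either whether the epidemiological association was significant or whether the luciferase difference itself was significant.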

We used analysis of variance to estimate whether variability in luciferase activity ratios was larger between different gene variants or between different cell types and experimental conditions for the same gene variant.

In the haplotypes dataset, we examined whether luciferase and epidemiological inferences agreed or not in the presence of formal statistical significance.

For both datasets, we recorded whether different cell lines or experimental conditions gave luciferase activity ratio estimates that differed in their level of statistical significance. Finally, we examined the concordance of other functional assays that had been used as compared with luciferase results and epidemiological association results.

All analyses were conducted in SPSS 12.0 (SPSS Inc., Chicago, IL) and reported P-values are 2-tailed.


Results

Bi-allelic markers

Of the 36 evaluated bi-allelic polymorphisms(16-51) (Table 1), 28 were located in the 5′-flanking region, 5 were exonic, 2 were intronic, and 1 lay in the 3′-untranslated region. A wide variety of disease phenotypes were probed.

TABLE 1: Studies addressing concurrently epidemiological associations and luciferase experiments on the same alleles for bi-allelic markers

For 29 of the 36 cases, the investigators claimed the presence of a statistically significant epidemiological association (Table 1 and Appendix 1). However, in 8 of the 29 claimed associations, there was no formal statistical significance for the contrast of the two alleles, when all data were analyzed. Significant associations had been based on selected genotype contrasts, often with peculiar choices (e.g., a contrast of both homozygote groups combined vs. heterozygotes) without further justification; or on exploratory subgroup analyses based on age or racial descent, although the results in the selected isolated subgroups did not differ beyond chance compared to the other subjects.(52) In one study, a significant association was seen only in a selected genotype contrast, for the subgroup of younger people, further limited to the sub-subgroup of those carrying a specific genotype of another gene.(37) Based on a priori definitions in our protocol, we considered these eight associations as not formally significant, since they were clearly post hoc explorations. Moreover the direct equivalent of luciferase assays would be allele-based comparisons, since the transfection constructs use alleles.

Luciferase activity ratios versus genetic odds ratios in bi-allelic markers

There was no correlation between the observed luciferase activity ratio and the observed odds ratio in the epidemiological case-control association analysis. Across the 36 topics, the Spearman correlation coefficient was -0.09 (P = 0.60, Pearson correlation coefficient 0.04, P = 0.83) when we considered the geometrical mean of the luciferase activity ratios of different cell lines with baseline experimental conditions and the allele-level odds ratio (Figure 1). When data from different cell lines on the same gene variant were considered as separate data points, the Spearman correlation coefficient was -0.27 (P = 0.06, Pearson correlation coefficient 0.19, P = 0.28), suggesting a small trend for smaller luciferase activity ratios with larger epidemiological effects.

FIGURE 1: Lack of correlation between the observed odds ratio in the case-control epidemiological study and the luciferase activity ratio for the same gene variant. Odds ratios pertain to allele-level estimates for the effect of the minor allele. For direct analogy, the luciferase activity ratio pertains to the activity of the construct containing the minor allele versus the construct with the major allele. Only the baseline experimental conditions for luciferase assays are considered here. When many different cell lines were tested, we used the geometrical mean of the luciferase activity ratios across cell lines. Two outliers are not shown.
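As a brief illustration of why a geometric rather than arithmetic mean suits ratio data pooled across cell lines, consider the sketch below. The function name and the example ratios are hypothetical.

```python
from math import exp, log

def geometric_mean(ratios):
    """Geometric mean: the exponential of the mean of the log ratios."""
    return exp(sum(log(r) for r in ratios) / len(ratios))

# Hypothetical minor/major luciferase activity ratios in two cell lines:
# doubling in one line and halving in the other cancel out on the log scale.
print(geometric_mean([2.0, 0.5]))
```

An arithmetic mean of the same two values would be 1.25, wrongly suggesting a net increase in activity for the minor allele.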

We also performed an analysis where all odds ratios and all luciferase activity ratios (geometric means for several cell lines) were also coined to be ≥1. This analysis assumes that the allele that increases the risk of a disease phenotype may either increase or decrease mRNA levels and both increase and decrease count as evidence of biological function that is concordant with the epidemiological effect. Thus, the analysis forces the results toward concordance. Even with this analysis, the correlation coefficient was only 0.24 and not statistically significant (P = 0.17).

Luciferase activity ratios also had no diagnostic ability for telling whether the respective epidemiological study would show a statistically significant association (P < 0.05) or not. The area under the ROC curve was 0.52 (Figure 2).

FIGURE 2: Receiver operating characteristic (ROC) curve for luciferase activity ratios as a diagnostic test for determining whether the respective epidemiological association would be statistically significant (P < 0.05) or not. The diagonal shows total lack of diagnostic information (no concordance at all) and the observed data are hovering around this diagonal with area under the curve 0.52 (P = 0.82). To achieve a sensitivity of 91%, the specificity is only 26%. Only the baseline experimental conditions for luciferase assays have been considered. Luciferase ratios have been consistently coined to be ≥1, so as to always show the difference between the high- versus low-activity allele. When different cell lines were tested, these have been entered separately in the calculations. Analyses using the geometrical mean of the luciferase activity ratios across different cell lines on the same gene variant yield similar results (area under the curve 0.60, P = 0.31, not shown).

The number of total luciferase experiments and replicates varied from 1 to 39 (Appendix Table 2), but many studies were unclear whether they reported on the number of independent experiments or number of replicates of the same experiment.


Variability across luciferase assay experimental conditions for bi-allelic markers

Across all 99 available datasets (Appendix Table 2), we found that the variation due to different experimental conditions accounted for < 8% of the total variation based on analysis of variance. For 19 gene variants, experiments had been done with two or more different cell lines and/or various experimental conditions. In 12 of them, all cell lines and experimental conditions yielded the same conclusions (always statistically significant differences between the two alleles or always nonstatistically significant differences). For five gene variants there were no significant differences for constructs bearing the two different alleles at baseline conditions, but differences emerged upon stimulation with various substances; one gene variant had opposite effects in different cell lines; and for one gene variant the differential effect of the minor allele was seen only on infected, but not uninfected cell lines. Despite these modest differences, statistically significant luciferase activity ratios in opposite direction were seen for only one gene variant.
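One way to apportion variation between gene variants and between experimental conditions for the same variant is a sum-of-squares decomposition. The source does not specify the exact model or scale used, so the sketch below is an assumption: it treats each gene variant as a group of log luciferase ratios and reports the share of total variation that lies within groups. The function name and the example values are hypothetical.

```python
def within_variant_fraction(log_ratios_by_variant):
    """Share of total sum of squares found within variants.

    log_ratios_by_variant: dict mapping a variant label to the list of
    log luciferase ratios measured across cell lines and conditions.
    """
    all_values = [v for values in log_ratios_by_variant.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    ss_within = 0.0
    for values in log_ratios_by_variant.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)
    return ss_within / ss_total

# Hypothetical log ratios: conditions barely matter compared with the variant.
example = {
    "variant_A": [0.10, 0.12, 0.08],
    "variant_B": [0.90, 0.95, 0.85],
}
print(within_variant_fraction(example))
```

A small fraction indicates that most of the variation in luciferase ratios reflects which gene variant was tested rather than which cell line or stimulation condition was used.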

The absolute value of the luciferase activity ratio could tell with high accuracy whether it would also be formally statistically significant (P < 0.05) or not - the area under the ROC curve was 0.95. Using a ratio cut-off of 1.44 for the high versus low activity allele had a sensitivity of 91% and specificity of 94% for identifying formally statistically significant differences in function.


Associations involving haplotypes

Twenty-three studies(16,35,45,53-72) performed luciferase experiments using constructs with haplotypes and also addressed epidemiological associations (Table 2). Of the 23 evaluated haplotypes (Table 2), 19 were entirely in the 5′-flanking region, 3 also included intronic or coding regions, and one was intronic.

TABLE 2: Studies addressing concurrently epidemiological associations and luciferase experiments involving haplotypes

Overall, the inferences of epidemiological and luciferase analyses agreed in terms of whether there were statistically significant effects or not in six studies, and disagreed in five studies, while agreement varied in two studies (different results depending on whether allele- or genotype-based analyses were done; or depending on the disease outcome considered). In the remaining 10 studies, the investigators did not perform epidemiological analyses using the haplotypes examined in the luciferase experiments (N = 7 studies) or reported only on specific haplotype contrasts, without considering all haplotypes in the epidemiological analyses (N = 3 studies).

In nine studies, the investigators performed luciferase experiments on selected haplotypes only, and in six of these the selected haplotypes were not chosen with strict preference to the ones that were more common in the study population. In another three studies, the investigators tested in luciferase experiments haplotypes that were nonexistent in the study population (frequency = 0%).

In 10 studies, luciferase experiments were performed with two cell lines and the results were consistent in terms of whether overall statistical significance was present or not in 9 of them (both significant N = 7, both non-significant N = 2, discordant N = 1). In another study, 5 cell lines were evaluated and results agreed in terms of statistical significance with 4 of the 5 cell lines. However, with one exception, in all studies where several cell lines found statistically significant results, the highest luciferase activity was seen for different haplotypes across different cell lines.

In three studies, luciferase experiments were also done with different stimulation conditions. In one study, the results were similar with stimulated and unstimulated conditions,(55) while in the other two studies the same order of activity was seen across haplotypes, but the results became formally significant, while they were non-significant with unstimulated conditions.(68,70)


Continuous disease outcomes

Four of the 36 studies with bi-allelic markers and binary outcomes also evaluated association analyses for continuous traits that would represent the disease under study. Binary and continuous traits usually gave concordant inferences. VKORC1 -1639G>A was significantly associated both with binary-categorized warfarin sensitivity and with the dose of warfarin required.(51) Clock 3111 T>C was not significantly associated with either evening sleep preference or with the τ value for sleep.(38) UGT1A1 -3263 T>G was significantly associated with the risk of binary-categorized hyperbilirubinemia and was also significantly related with the levels of bilirubin in the control group.(46) Finally, IL-8 -251 A>T was significantly associated with the risk of gastric cancer and was also significantly associated with the antral atrophy and metaplasia score, although the latter was seen only in the younger subjects.(36)

Of the 23 studies evaluating haplotypes, one found a statistically significant association between RANTES promoter and a continuous outcome (CD-4 cell depletion), but this was not the same as the binary outcome examined in that same study (HIV infection) for which there was no significant association.(59) One other study of asthma also evaluated the continuous outcomes of forced expiratory volume at one second and bronchial hyper-responsiveness score, but tested only single markers (not haplotypes) for these outcomes.(6) Another study also addressed associations with serum IgE levels, but tested different haplotypes than those tested in the luciferase experiments.(57)


Other in vitro functional assays

In 11 studies, investigators examined also binding signals in electrophoretic mobility shift assays (EMSAs); seven of these studies had addressed in luciferase experiments single bi-allelic markers, three had addressed haplotypes, and one had addressed both. All 11 investigations claimed differences in binding affinity, but only two of them tried to quantify the difference in the signal intensity (described as 1.5-fold (24) and 1.8-fold (31) intensity difference), while the other 9 studies gave qualitative data on whether the signal was weaker, stronger, absent, or different with one of the two alleles.(16,25,37,41,42,49,54,69,71) In one study, three different cell lines were tested and results differed qualitatively across cell lines.(71)

There was modest concordance at best with the epidemiological data. Formally statistically significant epidemiological associations were seen in seven (16,25,31,41,54,69,71) of the 11 investigations.

The results of EMSAs were generally consistent with the inferences of luciferase assays. However, in the three studies where the respective luciferase assays had examined haplotypes, EMSAs did not examine all the polymorphisms involved in the luciferase-tested haplotypes; therefore the full correspondence of the results is difficult. Among the studies of bi-allelic markers, in two investigations (24,25) the luciferase assays did not show consistently significant differences between the two alleles except under special conditions.

Sparse data on other reporter constructs (one study(33)) and real time PCR quantification of mRNA in vitro (three studies (18,33,55)) showed consistent inferences with the respective luciferase data, but agreed with epidemiological inferences only in two (18,55) of the three studies.

Discussion

In the appraised sample of investigations, luciferase results could not tell whether the respective epidemiological association would be formally statistically significant or not. Moreover, larger luciferase activity ratios did not correlate with stronger epidemiological effects. Luciferase activity ratios tended to be qualitatively similar across cell lines and experimental conditions, but exceptions did occur. The available comparative data on other outcomes and functional assays suggested that binary and continuous disease outcomes usually gave concordant results; other in vitro methods, in particular EMSA, agreed with luciferase results.

There is no consensus in the literature on what constitutes a large enough luciferase activity ratio.(5) In theory, very small differences may become formally statistically significant, if many experiments are performed. Conversely, quite large differences may be dismissed as non-significant, if only one or few experiments are performed. Luciferase studies should explicitly describe how many independent experiments were performed and how many replicates were done in each experiment; this information was often difficult to decipher in the analyzed studies. A sufficient number of experiments is needed, since luciferase assays have some unavoidable variance. Nevertheless, in the assembled database a cutoff of 1.44 adequately differentiated significant from non-significant luciferase activity ratios. Efforts need to be made to standardize further functional assays and their interpretation across laboratories. Our finding does not necessarily mean that ratios as low as 1.5 are always biologically important. Such values are very low compared to what is typically seen for the effects of mutations in monogenetic disorders, but for multigenetic effects, relatively small differences should not be dismissed lightly.

In our analysis, we focused on in vitro functional data. Information on in vivo and ex vivo functional assays in the analyzed studies was very limited, but it suggested that there was modest to good concordance with epidemiological data (Appendix Table 3). Jais has conducted a far more comprehensive review of gene expression in healthy versus diseased tissues for genetic variants involved in replicated genetic associations.(4) This evaluation concluded that many epidemiological associations are accompanied by significant differences in tissue gene expression. As with our in vitro data, the absolute differences in biological signals were modest at best. It is reasonable to expect that biological effects measured ex vivo are likely to be closer to the epidemiological associations than in vitro functional effects. However, obtaining such ex vivo data is more difficult.

We observed some common problems in the literature that we analyzed. First, results were often selectively reported for particular genetic contrasts, variants, haplotypes, or population subgroups. Second, some studies used different genetic variants and contrasts in epidemiological versus functional analyses and thus these lines of evidence were not directly comparable. Third, luciferase experiments were often performed only for selected haplotypes, not necessarily the most frequent ones. As haplotype analyses have now become the norm for investigations of human variation, these design and reporting problems can create confusion and spurious claims. Some investigators may have reported preferentially their best data(73,74) and may have strived to show that there is concordance between their epidemiological and biological data.(75) Thus, if anything, published data may be biased in favor of agreement between epidemiological and functional data. However, we found little concordance.

Most studies did not evaluate more than one functional assay. We should acknowledge that luciferase assays are one of many possible functional assays. Generalization across assays should be made cautiously. Different functional assays may provide complementary insights. Their results should not be forced to fit with those of other assays or clinical data using spurious contrasts and analyses. There is a continuum between binary disease categorizations, continuous traits, in vivo functional measurements, ex vivo experiments, and in vitro functional data. This continuum should be examined without preconceptions on whether results should agree across these different experimental levels. Comprehensive, comparable analyses with no selection bias in reporting should allow maximizing our insight about the credibility of postulated gene-disease associations and their biological background. Table 3 summarizes some suggestions on how to achieve this goal based on the empirical data that we examined. Moreover, it should be anticipated that in contrast to monogenetic disorders where functional approaches show large effects in line with very high odds ratios, for multigenetic heritability due to common genetic variation, both functional and epidemiological effects are likely to be very modest, and need careful design and optimal measurements.

TABLE 3: Considerations for studies addressing both epidemiological and functional effects of genetic variation

Some caveats should be discussed. First, our evaluation used a convenience sample of studies that involved both functional and epidemiological data. It would be impractical or even impossible to identify all studies that have performed both types of research. We simply used a systematic sample that would be large enough to answer our questions appropriately. Moreover, for some of these gene variants and associations, other investigators may have performed independent studies. However, we wanted to see whether there is concordance under what are, in theory, the most favorable circumstances, i.e., in the hands of the same team performing the epidemiological and biological analyses. This caveat reinforces our basic observation of lack of agreement.

We also found that the luciferase results were relatively robust to different experimental conditions. Selective reporting of best results is less likely to be a problem here. While discrepancies between epidemiology and biology may have been seen as unattractive to publish, several authors seemed to dwell with interest on the differential luciferase assay results obtained with different conditions and tried to build complex biological explanations around them.(19,32,36) However, while exceptions did occur, different cell lines usually gave largely similar inferences. Differences were more common with different stimulation conditions; for haplotype analyses, the exact order of haplotypes in terms of luciferase activity varied across cell lines and experimental conditions, but full reversal of the order with different conditions was seen in only one study. Interpretation of such differences should be cautious. It is difficult to reproduce in an in vitro system the exact biological milieu that leads to a complex disease phenotype.(5) The same applies to more recently developed functional assays,(76-78) and their reproducibility needs to be empirically evaluated across many studies, as we did for the luciferase reporter systems.

Functional gene variants are very common,(79) especially among promoter polymorphisms.(80,81) However, linking them to specific postulated associations for pinpointed phenotypes is difficult. The lack of concordance between epidemiological and luciferase data may be due to many reasons. The epidemiological associations may not be accurate and may not even be replicated.(82-84) Even for well-documented functional variants, altered gene expression may have a different impact on the risk for different diseases, and often it is not possible to guess which disease would be most relevant for each functional variant. Alternatively, the luciferase experiments may not be capturing the biological effect, which may even involve a pathway other than transcription. For markers in linkage disequilibrium with the true functional variant, the luciferase assays may or may not capture the transcriptional effect, depending on whether the true functional variant is also included in the construct and whether the linkage disequilibrium is strong or weak. Therefore, it should not be very surprising that these two lines of epidemiological and in vitro evidence provide largely independent information. Investigators in complex disease genetics should approach epidemiological and biological lines of evidence without any preconception or prejudice about their concordance. These lines of experimentation provide complementary evidence that needs to be carefully integrated rather than forced to fit.
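As a rough illustration of how concordance between the two lines of evidence can be quantified, the following minimal Python sketch computes a rank correlation between epidemiological odds ratios and luciferase activity ratios, and the area under the ROC curve for luciferase ratios as a predictor of epidemiological significance. It is a sketch under stated assumptions, not the analysis code used in this study: the input values are invented for illustration, the choice of Spearman correlation on log ratios is an assumption, and the function names are hypothetical.

```python
from math import log


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive has a higher score than a randomly chosen negative
    (ties count as one half)."""
    positives = [s for s, flag in zip(scores, labels) if flag]
    negatives = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical values for six variants (NOT data from the reviewed studies).
odds_ratios = [1.8, 0.7, 1.1, 2.4, 0.9, 1.3]
luciferase_ratios = [1.2, 1.9, 0.6, 1.1, 2.3, 0.8]
epi_significant = [True, False, False, True, False, True]

# Ratios are compared on the log scale so that 0.5 and 2.0 are symmetric.
rho = spearman([log(v) for v in odds_ratios],
               [log(v) for v in luciferase_ratios])

# Assumption: the size of the functional effect is scored as |log ratio|.
area = auc([abs(log(v)) for v in luciferase_ratios], epi_significant)

print(f"Spearman rho = {rho:.2f}, AUC = {area:.2f}")
```

A correlation near zero together with an area near 0.5 would correspond to the situation described above, in which the two lines of evidence carry largely independent information.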


Acknowledgements

This work was funded by a PENED grant from the General Secretariat for Research and Technology, Greece and the European Commission. Contributions: JPAI had the original idea and drafted the first protocol. Both authors worked on the final protocol, both performed data extraction and both analyzed the data and interpreted the results. JPAI wrote the manuscript and both authors worked on finalizing the manuscript.


References

  1. Glazier AM, Nadeau JH, Aitman TJ. Finding genes that underlie complex traits. Science 2002;298:2345-2349.
  2. Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151-157.
  3. Rebbeck TR, Spitz M, Wu X. Assessing the function of genetic variants in candidate gene association studies. Nat Rev Genet 2004;5:589-597.
  4. Jais P. How frequent is altered gene expression among susceptibility genes to human complex disorders? Genet Med 2005;7:83-96.
  5. Stranger BE, Dermitzakis ET. The genetics of regulatory variation in the human genome. Hum Genomics 2005;2:126-131.
  6. Brem RB, Yvert G, Clinton R, Kruglyak L. Genetic dissection of transcriptional regulation in budding yeast. Science 2002;296:752-755.
  7. Ronald J, Brem RB, Whittle J, Kruglyak L. Local regulatory variation in Saccharomyces cerevisiae. PLoS Genet 2005;1:E25.
  8. Schadt EE, Monks SA, Drake TA, Lusis AJ, et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 2003;422:297-302.
  9. Storey JD, Akey JM, Kruglyak L. Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol 2005;3:E267.
  10. Stranger BE, Forrest MS, Clark AG, Minichiello MJ, et al. Genome-wide associations of gene expression variation in humans. PLoS Genet 2005;1:E78.
  11. Nordeen SK. Luciferase reporter gene vectors for analysis of promoters and enhancers. Biotechniques 1988;6:454-458.
  12. Olansky L, Welling C, Giddings S, Adler S, et al. A variant insulin promoter in non-insulin-dependent diabetes mellitus. J Clin Invest 1992;89:1596-1602.
  13. The International HapMap Project. Nature 2003;426:789-796.
  14. The International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299-1320.
  15. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959;22:719-748.
  16. An P, Nelson GW, Wang L, Donfield S, et al. Modulating influence on HIV/AIDS by interacting RANTES gene variants. Proc Natl Acad Sci U S A 2002;99:10002-10007.
  17. Cao H, van der Veer E, Ban MR, Donfield S, et al. Promoter polymorphism in PCK1 (phosphoenolpyruvate carboxykinase gene) associated with type 2 diabetes mellitus. J Clin Endocrinol Metab 2004;89:898-903.
  18. Cattaruzza M, Guzik TJ, Slodowski W, Pelvan A, et al. Shear stress insensitivity of endothelial nitric oxide synthase expression as a genetic risk factor for coronary heart disease. Circ Res 2004;95:841-847.
  19. Donn R, Alourfi Z, De Benedetti F, Meazza C, et al. Mutation screening of the macrophage migration inhibitory factor gene: positive association of a functional polymorphism of macrophage migration inhibitory factor with juvenile idiopathic arthritis. Arthritis Rheum 2002;46:2402-2409.
  20. Fishman D, Faulds G, Jeffery R, Mohamed-Ali V, et al. The effect of novel polymorphisms in the interleukin-6 (IL-6) gene on IL-6 transcription and plasma IL-6 levels, and an association with systemic-onset juvenile chronic arthritis. J Clin Invest 1998;102:1369-1376.
  21. Gonzalez P, Diez-Juan A, Coto E, Alvarez V, et al. A single-nucleotide polymorphism in the human p27kip1 gene (-838C>A) affects basal promoter activity and the risk of myocardial infarction. BMC Biol 2004;2:5.
  22. Helbig KJ, George J, Beard MR. A novel I-TAC promoter polymorphic variant is functional in the presence of replicating HCV in vitro. J Clin Virol 2005;32:137-143.
  23. Horikawa Y, Yamasaki T, Nakajima H, Shingu R, et al. Identification of a novel variant in the phosphoenolpyruvate carboxykinase gene promoter in Japanese patients with type 2 diabetes. Horm Metab Res 2003;35:308-312.
  24. Juliger S, Bongartz M, Luty AJ, Kremsner PG, et al. Functional analysis of a promoter variant of the gene encoding the interferon-gamma receptor chain I. Immunogenetics 2003;54:675-680.
  25. Karban AS, Okazaki T, Panhuysen CI, Gallegos T, et al. Functional annotation of a novel NFKB1 promoter polymorphism that increases risk for ulcerative colitis. Hum Mol Genet 2004;13:35-45.
  26. Kim YS, Kang D, Kwon DY, Park WY, et al. Uteroglobin gene polymorphisms affect the progression of immunoglobulin A nephropathy by modulating the level of uteroglobin expression. Pharmacogenetics 2001;11:299-305.
  27. Li YH, Chen CH, Yeh PS, Lin HJ, et al. Functional mutation in the promoter region of thrombomodulin gene in relation to carotid atherosclerosis. Atherosclerosis 2001;154:713-719.
  28. Li Q, Athan ES, Wei M, Yuan E, et al. TP73 allelic expression in human brain and allele frequencies in Alzheimer's disease. BMC Med Genet 2004;5:14.
  29. Luedecking EK, DeKosky ST, Mehdi H, Ganguli M, et al. Analysis of genetic polymorphisms in the transforming growth factor-beta1 gene and the risk of Alzheimer's disease. Hum Genet 2000;106:565-569.
  30. Mori H, Okazawa H, Iwamoto K, Maeda E, et al. A polymorphism in the 5′ untranslated region and a Met229->Leu variant in exon 5 of the human UCP1 gene are associated with susceptibility to type II diabetes mellitus. Diabetologia 2001;44:373-376.
  31. Morris BJ, Markus A, Glenn CL, Adams DJ, et al. Association of a functional inducible nitric oxide synthase promoter variant with complications in type 2 diabetes. J Mol Med 2002;80:96-104.
  32. Nakamura S, Kugiyama K, Sugiyama S, Miyamoto S, et al. Polymorphism in the 5′-flanking region of human glutamate-cysteine ligase modifier subunit gene is associated with myocardial infarction. Circulation 2002;105:2968-2973.
  33. Neve B, Fernandez-Zapico ME, Ashkenazi-Katalan V, Dina C, et al. Role of transcription factor KLF11 and its diabetes-associated gene variants in pancreatic beta cell function. Proc Natl Acad Sci U S A 2005;102:4807-4812.
  34. Niesler B, Flohr T, Nothen MM, Fischer C, et al. Association between the 5′ UTR variant C178T of the serotonin receptor gene HTR3A and bipolar affective disorder. Pharmacogenetics 2001;11:471-475.
  35. Noguchi E, Nishimura F, Fukai H, Kim J, et al. An association study of asthma and total serum immunoglobin E levels for Toll-like receptor polymorphisms in a Japanese population. Clin Exp Allergy 2004;34:177-183.
  36. Ohyauchi M, Imatani A, Yonechi M, Asano N, et al. The polymorphism interleukin 8-251 A/T influences the susceptibility of Helicobacter pylori related gastric diseases in the Japanese population. Gut 2005;54:330-335.
  37. Riazanskaia N, Lukiw WJ, Grigorenko A, Korovaitseva G, et al. Regulatory region variability in the human presenilin-2 (PSEN2) gene: potential contribution to the gene activity and risk for AD. Mol Psychiatry 2002;7:891-898.
  38. Robilliard DL, Archer SN, Arendt J, Lockley SW, et al. The 3111 Clock gene polymorphism is not associated with sleep and circadian rhythmicity in phenotypically characterized human subjects. J Sleep Res 2002;11:305-312.
  39. Rusin M, Zientek H, Krzesniak M, Malusecka E, et al. Intronic polymorphism (1541-1542delGT) of the constitutive heat shock protein 70 gene has functional significance and shows evidence of association with lung cancer risk. Mol Carcinog 2004;39:155-163.
  40. Sasaki Y, Ihara K, Matsuura N, Kohno H, et al. Identification of a novel type 1 diabetes susceptibility gene, T-bet. Hum Genet 2004;115:177-184.
  41. Shin Y, Kim IJ, Kang HC, Park JH, et al. The E-cadherin -347G->GA promoter polymorphism and its effect on transcriptional regulation. Carcinogenesis 2004;25:895-899.
  42. Shin Y, Kim IJ, Kang HC, Park JH, et al. A functional polymorphism (-347 G->GA) in the E-cadherin gene is associated with colorectal cancer. Carcinogenesis 2004;25:2173-2176.
  43. Spurdle AB, Goodwin B, Hodgson E, Hopper JL, et al. The CYP3A4*1B polymorphism has no functional significance and is not associated with risk of breast or ovarian cancer. Pharmacogenetics 2002;12:355-366.
  44. Spurlock G, Heils A, Holmans P, Williams J, et al. A family based association study of T102C polymorphism in 5HT2A and schizophrenia plus identification of new polymorphisms in the promoter. Mol Psychiatry 1998;3:42-49.
  45. Su K, Wu J, Edberg JC, Li X, et al. A promoter haplotype of the immunoreceptor tyrosine-based inhibitory motif-bearing FcgammaRIIb alters receptor expression and associates with autoimmunity. I. Regulatory FCGR2B polymorphisms and their association with systemic lupus erythematosus. J Immunol 2004;172:7186-7191.
  46. Sugatani J, Yamakawa K, Yoshinari K, Machida T, et al. Identification of a defect in the UGT1A1 gene promoter and its association with hyperbilirubinemia. Biochem Biophys Res Commun 2002;292:492-497.
  47. Sugawara F, Yamada Y, Watanabe R, Ban N, et al. The role of the TSC-22 (-396) A/G variant in the development of diabetic nephropathy. Diabetes Res Clin Pract 2003;60:191-197.
  48. Tsunemi Y, Komine M, Sekiya T, Saeki H, et al. The -431C>T polymorphism of thymus and activation-regulated chemokine increases the promoter activity but is not associated with susceptibility to atopic dermatitis in Japanese patients. Exp Dermatol 2004;13:715-719.
  49. Wu J, Metz C, Xu X, Abe R, et al. A novel polymorphic CAAT/enhancer-binding protein beta element in the FasL gene promoter alters Fas ligand expression: a candidate background gene in African American systemic lupus erythematosus patients. J Immunol 2003;170:132-138.
  50. Yang B, Cross DF, Ollerenshaw M, Millward BA, et al. Polymorphisms of the vascular endothelial growth factor and susceptibility to diabetic microvascular complications in patients with type 1 diabetes mellitus. J Diabetes Complications 2003;17:1-6.
  51. Yuan HY, Chen JJ, Lee MT, Wung JC, et al. A novel functional VKORC1 promoter polymorphism is associated with inter-individual and inter-ethnic differences in warfarin sensitivity. Hum Mol Genet 2005;14:1745-1751.
  52. Ioannidis JP, Ntzani EE, Trikalinos TA. 'Racial' differences in genetic effects for complex diseases. Nat Genet 2004;36:1312-1318.
  53. Arinami T, Gao M, Hamaguchi H, Toru M. A functional polymorphism in the promoter region of the dopamine D2 receptor gene is associated with schizophrenia. Hum Mol Genet 1997;6:577-582.
  54. Fitze G, Appelt H, Konig IR, Gorgens H, et al. Functional haplotypes of the RET proto-oncogene promoter are associated with Hirschsprung disease (HSCR). Hum Mol Genet 2003;12:3207-3214.
  55. Giedraitis V, He B, Huang WX, Hillert J. Cloning and mutation analysis of the human IL-18 promoter: a possible role of polymorphisms in expression regulation. J Neuroimmunol 2001;112:146-152.
  56. Herrmann SM, Funke-Kaiser H, Schmidt-Petersen K, Nicaud V, et al. Characterization of polymorphic structure of cathepsin G gene: role in cardiovascular and cerebrovascular diseases. Arterioscler Thromb Vasc Biol 2001;21:1538-1543.
  57. Howard TD, Postma DS, Hawkins GA, Koppelman GH, et al. Fine mapping of an IgE-controlling gene on chromosome 2q: Analysis of CTLA4 and CD28. J Allergy Clin Immunol 2002;110:743-751.
  58. Kwok JB, Teber ET, Loy C, Hallupp M, et al. Tau haplotypes regulate transcription and are associated with Parkinson's disease. Ann Neurol 2004;55:329-334.
  59. Liu H, Chao D, Nakayama EE, Taguchi H, et al. Polymorphism in RANTES chemokine promoter affects HIV-1 disease progression. Proc Natl Acad Sci U S A 1999;96:4581-4585.
  60. Maruyama H, Toji H, Harrington CR, Sasaki K, et al. Lack of an association of estrogen receptor alpha gene polymorphisms and transcriptional activity with Alzheimer disease. Arch Neurol 2000;57:236-240.
  61. Meyer J, Saam W, Mossner R, Cangir O, et al. Evolutionary conserved microsatellites in the promoter region of the 5-hydroxytryptamine receptor 2C gene (HTR2C) are not associated with bipolar disorder in females. J Neural Transm 2002;109:939-946.
  62. Ono K, Goto Y, Takagi S, Baba S, et al. A promoter variant of the heme oxygenase-1 gene may reduce the incidence of ischemic heart disease in Japanese. Atherosclerosis 2004;173:315-319.
  63. Rosenzweig SD, Schaffer AA, Ding L, Sullivan R, et al. Interferon-gamma receptor 1 promoter polymorphisms: population distribution and functional implications. Clin Immunol 2004;112:113-119.
  64. Sayers I, Barton S, Rorke S, Beghe B, et al. Allelic association and functional studies of promoter polymorphism in the leukotriene C4 synthase gene (LTC4S) in asthma. Thorax 2003;58:417-424.
  65. Sayers I, Barton S, Rorke S, Sawyer J, et al. Promoter polymorphism in the 5-lipoxygenase (ALOX5) and 5-lipoxygenase-activating protein (ALOX5AP) genes and asthma susceptibility in a Caucasian population. Clin Exp Allergy 2003;33:1103-1110.
  66. Spence JP, Liang T, Eriksson CJ, Taylor RE, et al. Evaluation of aldehyde dehydrogenase 1 promoter polymorphisms identified in human populations. Alcohol Clin Exp Res 2003;27:1389-1394.
  67. Taniguchi K, Yang P, Jett J, Bass E, et al. Polymorphisms in the promoter region of the neutrophil elastase gene are associated with lung cancer development. Clin Cancer Res 2002;8:1115-1120.
  68. Torisu H, Kusuhara K, Kira R, Bassuny WM, et al. Functional MxA promoter polymorphism associated with subacute sclerosing panencephalitis. Neurology 2004;62:457-460.
  69. Wang H, Parry S, Macones G, Sammel MD, et al. Functionally significant SNP MMP8 promoter haplotypes and preterm premature rupture of membranes (PPROM). Hum Mol Genet 2004;13:2659-2669.
  70. Wu YR, Wang CK, Chen CM, Hsu Y, et al. Analysis of heat-shock protein 70 gene polymorphisms and the risk of Parkinson's disease. Hum Genet 2004;114:236-241.
  71. Yu C, Zhou Y, Miao X, Xiong P, et al. Functional haplotypes in the promoter of matrix metalloproteinase-2 predict risk of the occurrence and metastasis of esophageal cancer. Cancer Res 2004;64:7622-7628.
  72. Yuan X, Yamada K, Ishiyama-Shigemoto S, Koyama W, et al. Identification of polymorphic loci in the promoter region of the serotonin 5-HT2C receptor gene and their association with obesity and type II diabetes. Diabetologia 2000;43:373-376.
  73. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, et al. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457-2465.
  74. Kyzas PA, Loizou KT, Ioannidis JP. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 2005;97:1043-1055.
  75. Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001;2:91-99.
  76. Knight JC, Keating BJ, Rockett KA, Kwiatkowski DP. In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat Genet 2003;33:469-475.
  77. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res 2001;11:863-874.
  78. Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30:3894-3900.
  79. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004;306:636-640.
  80. Buckland PR, Coleman SL, Hoogendoorn B, Guy C, et al. A high proportion of chromosome 21 promoter polymorphisms influence transcriptional activity. Gene Expr 2004;11:233-239.
  81. Hoogendoorn B, Coleman SL, Guy CA, Smith K, et al. Functional analysis of human promoter polymorphisms. Hum Mol Genet 2003;12:2249-2254.
  82. Ioannidis JP. Genetic associations: false or true? Trends Mol Med 2003;9:135-138.
  83. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:E124.
  84. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet 2001;29:306-309.
Page last reviewed: December 5, 2007 (archived document)
Content Source: National Office of Public Health Genomics
  Last Updated February 20, 2008