 

Executive Summary of the NCI Think Tank on Variation in Susceptibility to Cancers

 

At present, we assume that cancer develops in a particular person because of her/his unique genetic composition and exposures, and the complex interplay among them.  Intensive studies of strongly cancer-prone families inheriting the same mutant allele of a cancer gene illustrate that more than just predisposing genes confer and modify risk of developing cancer.  Many parameters of the malignancy – severity, timing, therapeutic response, and even manifestation of the cancer itself – may differ markedly among cancer-prone siblings and between generations.  Technological advances enable us to identify many of the familial and non-familial genes and aberrant processes that confer some of the susceptibility to cancer, and to assess the probability that certain environmental exposures contribute to the likelihood of disease.  However, we require a fuller exposition of the genetic architecture of cancers as complex traits if we are to define population and individual cancer risk more accurately.  Genetic architecture is the fusion of the effects of interactions among genes and environmental perturbants; it is thus more than the sum of those individual factors.  To achieve significant advances in insight about cancer risk, we must propose and test new theoretical models of genetic architecture and use experimental model systems to test the resulting hypotheses about individual susceptibility and resistance factors, their interactions, and their manifestations.  This will require that we redirect many of our ongoing epidemiologic and cancer modeling efforts toward coordinated activities that play to the strengths of each research community and its unique contribution to the collective effort.

 

Beyond epidemiologists and cancer modelers, this scientific redirection means broadening the intellectual inputs to include computational biologists, mathematical modelers, and evolutionary biologists, among others.  Encouraging and sustaining the new interactions requires access to human specimens and data, development of new analytical strategies to tease apart the contributors to genetic architecture and new algorithms to model genetic and non-genetic interactions, interdisciplinary training in computational analysis and modeling, and computer systems capable of rapid high-level data analyses.

 

Vision and Overall Goal:  To understand the complex interactions of genetic and environmental influences; to predict the risks of cancers for individuals and populations; to enhance prevention, perfect diagnosis, target treatment, improve prognosis prediction, and greatly reduce the burden of cancers.

 

To achieve these goals, the scientific community must: 

·        Integrate studies of experimental model systems, human populations, and computational models of molecular signatures in normal and dysregulated states relevant to particular features of cancer susceptibility.

·        Improve cancer-related phenotyping (i.e., the analysis of molecular and systemic changes in cancer progression) by applying emerging sensor, imaging, nano- and molecular technologies to individual humans and animals, biological specimens, cells, extracellular matrix components, and cell-cell interactions.

·        Identify sources of individual variation in cancer-related biological processes, both genetic and non-genetic, and model the genetic-environmental and stochastic interactions.

·        Develop and apply technologies to investigate genetic variation broadly, including alleles, SNPs, haplotypes, and epigenetic alterations of nuclear and mitochondrial genomes, miRNAs, highly repetitive sequences, transposons, gene copy number polymorphisms and other evolutionary molecular features.

 

Roadblocks and Challenges that currently hamper progress in understanding cancer susceptibility were identified.  Roadblocks include the cost and limited scope of current high-throughput technologies for the molecular characterization of tissues and tumors and the genotyping of individuals; the inadequate scope of systemic clinical measures of cancer initiation and progression; the difficulty of assessing environmental exposures relevant to cancer development; the lack of an adequately large human cohort for prospective studies of susceptibility, along with the necessary infrastructure and biological repositories; the need for long-standing informed consent valid for multiple studies; and the serious lack of infrastructure and requirements for data sharing and standardization.  The major challenges are the development of new statistical and computational approaches adequate to deal with the level of complexity believed to exist in cancer susceptibility, involving multiple genes and environmental factors interacting in complex, non-linear ways; the integration of human studies with those in experimental systems to optimize understanding of carcinogenic mechanisms; and the development of interdisciplinary training and research environments that incorporate the perspectives of biologists and computational scientists.

 

 

Introduction

 

The overall goal of defining cancer susceptibility in terms of the interaction of genetic variation and environmental factors is an ambitious one that cannot be achieved with a single line of investigation.  It will require integrating studies in experimental model systems, human populations, and computational models.  Studies are required not only on frank malignancies, but on normal physiology and dysregulated states along the continuum from the earliest phase of initiation through metastasis.  Several specific requirements were identified:

 

1.  Improve cancer-related phenotyping (i.e., the analysis of molecular and systemic changes in cancer progression) by applying emerging sensor, imaging, nano- and molecular technologies to individual humans and animals, biological specimens, cells, extracellular matrix components, and cell-cell interactions.

 

Cancers are a very heterogeneous collection of diseases; correlating phenotype with genotype depends on a detailed knowledge of the clinical manifestations of each cancer type.  Susceptibility studies are designed to find and test associations between genotype and a specific cancer phenotype.  Early studies of tumor classification through the use of DNA microarrays, as in NCI’s Director’s Challenge program, suggest that molecular phenotyping is a powerful approach to stratify cancers in ways not possible using histopathology and the few previously available molecular markers of cancer behavior.  Such disease stratification will sharpen the ability to study susceptibility.

 

Although microarray studies are useful, and allow researchers to ascertain the expression level of thousands of genes simultaneously, expression analysis alone does not meet the need for a complete phenotyping program.  Techniques to expose epigenetic changes and to define the serum and cellular proteome are the next frontiers.  Studies of epigenetic change are important because DNA sequence is not the only heritable change in cells and individuals.  Proteomics is important because mRNA levels correlate imperfectly with protein levels and proteins are subject to many post-translational modifications that strongly influence their functions.  Metabolomics will become increasingly important as the field of cancer phenotyping matures.  Moreover, there is a need to evaluate not only absolute levels of genes and gene products, but their rates of change.

 

Studies of normal tissues and the tumor microenvironment further demonstrate that it is insufficient to determine the phenotype of the cancer cell; the properties of tissues and alterations of those properties as tumors emerge are what is most relevant.  The behavior of a cancer cell is influenced by the cells and molecules with which it has contact; fibroblasts, infiltrating lymphoid and myeloid cells, endothelial cells, the extracellular matrix, and soluble molecules, such as growth factors, cytokines, chemokines, nutrients, and oxygen, all influence tumor phenotypes.

 

Characterizing tumor phenotypes at this level will benefit many areas of cancer research, and such studies are proceeding.  However, it is worthwhile emphasizing how critical these studies are for susceptibility research.

 

2.  Examine all cancer-related biological processes for sources of individual variation, both genetic and non-genetic, and model the genetic-environmental and stochastic interactions of genetic networks and cancer phenotypes.

                                

Susceptibility to cancer results from both genetic and non-genetic individual variation within a human population.  Genetic variation is commonly equated with polymorphisms in germline DNA sequence, but this is an over-simplification.  Other potentially relevant genetic complexities include imprinting, X-chromosome inactivation, mitochondrial DNA, segmental inversions, deletion or alteration of regulatory elements, transposable ("jumping") elements, and trinucleotide repeats.  Although susceptibility to many cancers is influenced by multiple genes, a recent study suggests that genetic variation is a minor component of cancer susceptibility, contributing perhaps 25% of the risk.  The rest is attributable to a wide range of environmental factors, including smoking, diet, hormonal status, chemical exposures, microbial flora, and history of inflammation.

 

Environmental factors add unsolved methodological complexities to the evaluation of cancer susceptibility.  Genotypes can be determined with high precision, but current ways of quantitating environmental exposures generally do not capture all the information needed for a large study.  Dietary histories, for example, are notoriously inaccurate.  Biomarkers of most chemical exposures are lacking.  Microbial flora and inflammation are increasingly recognized as factors in cancer development, but no consensus exists on which factors to measure or how to evaluate their impact.

 

With so many challenging variables to deal with in susceptibility studies, it is tempting to make the simplifying assumption that the contribution of each susceptibility factor is independent.  However, the available evidence indicates that this would be a mistake.  In Epistasis and the Evolutionary Process (2000), Templeton states that, using proper computational methods and algorithms, one finds complex interactions (epistasis) more often than additive, single-factor effects.  In studying susceptibility, it is necessary to ask complex questions and develop complex models.  A shift away from single gene or single environmental factor analysis to multi-gene, multi-factor analysis is required, but current analytical and statistical tools are inadequate to support such a shift. 
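As a concrete illustration of that shift, the following minimal sketch (in Python with numpy and scikit-learn; the genotypes, penetrance, and effect sizes are simulated for illustration, not drawn from any study) shows how an additive, single-factor analysis can understate a two-locus epistatic effect that a model with an explicit interaction term recovers.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
g1 = rng.integers(0, 3, n)                      # simulated genotype at locus 1 (minor-allele count)
g2 = rng.integers(0, 3, n)                      # simulated genotype at locus 2
logit = -2.0 + 1.5 * (g1 >= 1) * (g2 >= 1)      # hypothetical risk only when both loci carry a variant
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

designs = {
    "additive (main effects only)": np.column_stack([g1, g2]),
    "with g1 x g2 interaction term": np.column_stack([g1, g2, g1 * g2]),
}
for label, X in designs.items():
    model = LogisticRegression().fit(X, y)
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"{label}: in-sample AUC = {auc:.3f}")   # the interaction model typically fits markedly better

On data of this kind, the main-effects-only fit captures only part of the signal, which is exactly the loss that single-factor analyses risk in the presence of epistasis.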

 

3.  Develop and apply technologies to investigate genetic variation broadly, including alleles, SNPs, haplotypes, epigenetic alterations of nuclear and mitochondrial genomes, miRNA, and other factors as they are identified.

 

A few relatively high-throughput technologies exist to evaluate some types of genetic variation, but improvements in existing technologies and the development of new ones are required.  Genome-scale sequencing efforts mean that complete genome sequences for many individuals will soon be available.  One spin-off of this sequencing work has been the development of methods for whole-genome scanning using microsatellites, and now single-nucleotide polymorphisms (SNPs).  SNP typing is rapidly becoming the technology of choice for genotyping as costs fall, and its value will rise further with additional cost reductions and the development of analysis techniques that can handle the masses of data generated.  The HapMap project leads the way in identifying and validating human SNPs, and in organizing them into haplotypes to maximize the information content of SNP mapping.  Similar efforts are underway for model organisms.
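At its simplest, the genome-wide scan such technologies enable reduces to one genotype-by-status test per SNP.  The sketch below (Python with numpy and scipy; all genotype counts are simulated) shows that basic unit of analysis and why multiple-testing correction becomes an immediate concern.

import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
n_snps, n_cases, n_controls = 1000, 500, 500
genotype_freqs = [0.64, 0.32, 0.04]             # assumed frequencies of 0/1/2 copies of the minor allele

p_values = []
for _ in range(n_snps):
    case_counts = rng.multinomial(n_cases, genotype_freqs)
    control_counts = rng.multinomial(n_controls, genotype_freqs)
    table = np.column_stack([case_counts, control_counts])   # 3 x 2 genotype-by-status table
    chi2, p, dof, expected = chi2_contingency(table)
    p_values.append(p)

print("smallest nominal p-value:", min(p_values))
print("Bonferroni threshold for this scan:", 0.05 / n_snps)   # real SNP panels make the denominator far larger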

 

However, more progress is needed in other areas of genetic analysis.  High-throughput assessment of epigenetic variation is at least a decade behind that of DNA sequence variation, and new techniques are required to interrogate the genomic architecture for regulatory features.  With the discovery of potential new sources of genetic variation, such as regulatory microRNAs, the need for technology development continues to increase.

 

Specific Recommendations:

 

1.  Given the scale of data sets and complex biological systems, create new computational, mathematical, and statistical models and analytical tools that embrace, rather than simplify or ignore, the complexity of variation in susceptibility and resistance to cancers.  Support large-scale coordination of studies with calibration, interoperability, data federation, and standardization of data formats, natural language processing tools, and dataset query tools.

 

Current analytical tools and approaches are inadequate to deal with the complexities inherent in assessing susceptibility (see Requirement 2, above).  Orders of magnitude more data, and more types of data, will need to be acquired, assembled, and analyzed, and knowledge extracted from them.  This will require development of new approaches to statistical analysis, integration of modeling efforts into the field, and changes in how data are collected, annotated, and stored.

 

Analytical Approaches- The currently accepted picture of cancer susceptibility, involving multiple genetic and environmental factors acting together in a non-additive fashion, cannot be adequately tested experimentally because of limitations in statistical methods and computational resources.  These limitations force the introduction of simplifying assumptions that are likely to be invalid; as a result, potentially critical variables are ignored or oversimplified.  The analytical methods that need to be developed involve combinatorial problems.  Epidemiological analysis is generally restricted to a linear modeling paradigm, but such statistical methods are unsuitable for the amount and complexity of data that will be needed to test realistic hypotheses.  It is necessary to shift from parametric methods to those that deal more successfully with complex interactions.  Significant computational resources are required to deal with high-dimensional data sets; although computing power is increasing, new algorithms and strategies are needed to solve these problems and model complex systems.  The required methods must accommodate non-linearity and non-additivity and draw on applied biostatistics, a field that moves forward slowly.  To anticipate the requirements of research five to ten years in the future, work is needed now on how statistical methods and design algorithms are applied.
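A rough calculation of the combinatorics makes plain why exhaustive interaction analysis outruns conventional statistical practice.  The short sketch below (Python; the SNP count is an arbitrary, hypothetical panel size) simply counts the tests implied by pairwise and three-way scans.

from math import comb

m = 500_000                                     # hypothetical number of genotyped SNPs
for k in (1, 2, 3):
    n_tests = comb(m, k)                        # number of distinct k-way combinations to test
    print(f"{k}-way tests: {n_tests:.3e}  (Bonferroni threshold if all were tested: {0.05 / n_tests:.1e})")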

 

Modeling- While simple problems can be analyzed directly, problems as complex as cancer susceptibility must be computationally modeled to understand what data needs to be collected and what analytical approaches are likely to be successful.  Modeling a biological process at early stages of a project can provide information about the amount of data needed to represent the process accurately, and ensure that enough data is collected to make analyses feasible.  Modeling can also help to determine what types of interactions among variables are most likely to affect the outcome, and focus computing power on the most productive problems.
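A minimal example of this kind of early-stage modeling (Python with numpy and statsmodels; the effect size, baseline risk, and candidate sample sizes are all assumptions chosen for illustration) is a simulation that estimates the power to detect a two-locus interaction before any data are collected.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def interaction_power(n, effect=0.5, reps=200, alpha=0.05):
    """Estimate power to detect a hypothetical gene-gene interaction at sample size n."""
    hits = 0
    for _ in range(reps):
        g1 = rng.integers(0, 3, n)
        g2 = rng.integers(0, 3, n)
        logit = -1.0 + effect * g1 * g2                      # assumed interaction-only effect
        y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
        X = sm.add_constant(np.column_stack([g1, g2, g1 * g2]).astype(float))
        fit = sm.Logit(y, X).fit(disp=0)
        if fit.pvalues[-1] < alpha:                          # test on the interaction term
            hits += 1
    return hits / reps

for n in (200, 500, 1000):
    print(n, interaction_power(n))                           # guides how much data the design must collect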

 

Models and experiments are refined in an iterative process that requires regular interaction among model developers, model simulators, and biologists.  The proper integration of modeling into laboratory studies or epidemiologic projects requires building mutually beneficial working relationships with experts in the quantitative disciplines in which modelers are typically trained.  Because the empirical approach relies heavily on statistics, empirical scientists and statisticians have learned to communicate well.  Similarly productive communication between biologists and computational scientists is less common, limiting the development of modeling.

 

Data Issues:   Standardization, Calibration, Test Validation, and Integration

Standardized nomenclature, controlled vocabularies, and natural language processing are needed.  Many data sets cannot be pooled because they are not standardized.  Data collection must be standardized – especially in the large-scale cohort (see Recommendation 3, below) – and data sets integrated so that comparisons can be made.  One approach to standardization and facile data sharing is to aggregate the necessary technologies in one place.  This might involve a consortium of many academic institutions using the same standards, so that collection could be correlated with sample analysis.  Even modeling should be standardized, so that investigators have access to several well-defined pathways to work with.
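As one small, concrete example of what standardization means in practice, the sketch below (Python; the field names and allowed terms are invented placeholders, not an established controlled vocabulary) checks incoming records against a shared vocabulary before they are pooled.

# Hypothetical controlled vocabulary; real efforts would use community-agreed terms.
CONTROLLED_VOCAB = {
    "specimen_type": {"serum", "plasma", "urine", "lymphoblastoid_cell_line"},
    "smoking_status": {"never", "former", "current", "unknown"},
}

def validate_record(record):
    """Return (field, value) pairs that violate the controlled vocabulary."""
    return [
        (field, record.get(field))
        for field, allowed in CONTROLLED_VOCAB.items()
        if record.get(field) not in allowed
    ]

record = {"specimen_type": "Serum", "smoking_status": "current"}
print(validate_record(record))    # "Serum" is rejected: case and spelling drift are exactly what pooling cannot tolerate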

 

Calibration is critical for pooling of some data types, but it is currently difficult because groups do not use the same controls, even where controls are possible.  In addition, in some fields such as structural and functional MRI, no calibration exists to permit cross-comparisons among instruments, so no valid comparisons are possible.

 

Test validation is needed, especially for the most fundamental data types.  If thousands of SNPs are to be typed on DNA samples, the procedure must be validated.  The Think Tank participants recommend obtaining a uniform set of data using lymphoblastoid cells from a control population for this process.
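A basic validation metric of the kind envisioned here is genotype concordance between replicate runs on the same control DNA; the following sketch (Python; the calls are invented toy data) illustrates the calculation.

def concordance(calls_run1, calls_run2, no_call="NN"):
    """Fraction of SNPs called identically in two runs, ignoring no-calls."""
    shared = [(a, b) for a, b in zip(calls_run1, calls_run2) if a != no_call and b != no_call]
    if not shared:
        return float("nan")
    return sum(a == b for a, b in shared) / len(shared)

run1 = ["AA", "AG", "GG", "NN", "AG"]   # toy genotype calls from a lymphoblastoid control sample
run2 = ["AA", "AG", "AG", "GG", "AG"]   # the same SNPs re-typed in a second run
print(concordance(run1, run2))          # 0.75 here; a validated pipeline would be expected to score far higher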

 

Finally, mechanisms are needed to integrate new data into what already exists.  Scientific communities need to pool information in ways that maximize the use of the resources.  If a large-scale cohort is established, it will pose particular challenges of data integration, because the database is likely to be a federated system assembled by linking datasets at many separate locations.  The community’s computer development effort needs to build integrated models that can maintain security.

 

2.  Create a common and widely available repository with existing resources of well-selected, maintained, and annotated human biological specimens (immortalized lymphoblastoid cell lines, serum, plasma, urine, etc.) for genetic, epigenetic, proteomic, and metabolomic analyses of early-onset cancer cases, including those with positive family histories.  Create common databases.  Develop a corresponding resource from a special elderly subpopulation to study resistance to cancer development in individuals with high lifetime exposure to carcinogens.  Require data deposition in freely accessible databases, with resource sharing of repository materials.

 

Developments in analytical techniques, modeling, and data collection and storage, all described above, will become especially useful when applied to large, highly informative population studies.  The most ambitious such proposal is described in the next section, but there was general agreement at the Think Tank that at least two studies hold special promise in the short term.  The first is a study of early onset cancer patients and their families, to look for particularly influential susceptibility genes.

 

The second study would focus on cancer resistance, the opposite side of susceptibility.  The carcinogenic potential of smoking is very high, but some heavy smokers live to be 100 years of age, and eighty-five percent of smokers do not develop lung cancer.  Family studies should be developed based on unusual resistance to cancer.  For example, Peter Shields and others have cohorts of 90-year-old smokers from Veterans’ Hospitals; these cohorts would be a valuable addition to a large cohort study.  Some individuals develop multiple cancers, but live a long life because their tumors do not metastasize.  The interacting genes that confer apparent resistance to metastasis should be identified.  Susceptibility or resistance to progression or metastasis should receive the same attention as susceptibility or resistance to initiation.

 

These two studies are important on their scientific merits, but they also provide an opportunity to test developments that need to be applied to all studies in cancer susceptibility.  In addition, they can serve to pilot high-quality repositories for biological samples, strategies for dealing with privacy issues, and ways to mandate data and resource sharing.  

 

Biological Resources/Repositories

For these studies and the large cohort study described below, a repository for human specimens is a necessity.  A common resource of biological specimens should be developed.  This should include a standard set for iterative studies, with enough samples for test training sets of data.  Repositories containing material useful in the study of cancer susceptibility exist throughout the world, but a variety of practical impediments hamper their optimal use by the scientific and clinical communities.  Beyond the existing repositories of samples from “common” types of cancer and populations, susceptibility studies require cell lines from families to investigate familial gene variation.  The success of ongoing efforts by the NCI to optimize the large-scale and efficient acquisition, storage, annotation, and distribution of biological samples is critical to progress in cancer susceptibility research.

 

Resource Sharing

It is important to capitalize on existing resources, including large datasets and large specimen stores, many of which have not yet been analyzed.  Investigators could contribute these samples for analysis, and retain access to the data for a reasonable period of time (e.g., six months) before the data enters the public domain.  NCI could facilitate such resource sharing by paying for processing the material.

  

Privacy, Sample identification, and Patient Concerns

New mandates for patient privacy pose challenges for the study of cancer susceptibility.  It is critical to retain some identification with genetic information and to keep family data together, yet current requirements to de-identify data impede progress in assessing gene-environment interactions.  Informed consent is also a potential problem.  Durable informed consent must be obtained so that samples are approved for varied, long-term uses.  These issues generated considerable discussion at the Think Tank, but the participants were optimistic that, given sufficient attention, these problems could be dealt with.

 

Data Sharing

Little, if any, epidemiological data relevant to cancer susceptibility is freely available for computer downloading.  In addition, there are hundreds of linkage studies in human genetics from which conclusions have been published, but the underlying raw data remain unavailable.  As a result, experimentalists have no way to compare their data to that of other labs, and computer scientists developing algorithms useful for cancer research have no way to test them, short of entering into formal collaborations.  The NIH or NCI could, and should, exert pressure for release into the public domain of raw data underlying published analyses.  In a number of large-scale research projects funded by NCI or elsewhere at NIH, data release is required, and sometimes even facilitated.  This practice needs to be much more widespread, and reasonable schedules for data release need to be negotiated with input from the scientific community.  In coordination with funding agencies and professional societies, journals can facilitate data release by requiring release of underlying data as a condition of publishing the conclusions.        

                             

Datasets often remain unavailable because they are difficult and expensive to produce, because they can be reused by the original investigator in new studies, and because no incentives exist for sharing them.  No credit is given in university promotion and tenure decisions for releasing data helpful to others in the field.  In fact, even collaborating with others to publish analyses of aggregated datasets results in multi-author publications for which most participants receive little credit.  The physics research community had no choice but to work together when experiments became exorbitantly expensive, and it has succeeded in changing the culture so that collaborations have intrinsic value.  This must happen in the cancer susceptibility research community if important problems are to be addressed.

 

3.  Develop a trans-NIH multiple-disease-oriented framework for initiation of a very large-scale human population study, with multiple ethnic groups, environmental exposures, and family histories.  Ensure widespread access to specimens, cell lines, and data.

 

A central discussion point of the Cancer Susceptibility Think Tank was the need to establish a million-person cohort from which standardized samples and data could be obtained and shared among research groups interested in complex diseases like cancer.  The rationale and logistics underlying such a large project, and its relationship to existing cohorts, elicited extensive discussion.  Among the considerations:

 

Why a million-person cohort?

Statisticians working on the proposal for the American Gene and Environment Study concluded that a cohort of this size should be adequate for analysis of gene-gene interactions of the type anticipated to play a role in cancer susceptibility.  An article by Francis Collins in Science noted the need for a large-scale longitudinal human cohort in which tissue samples are taken before and after disease develops – not only after the disease has spread – so that the available technology can be used to best effect.  Case-control studies typically yield biased information because they are based on the surviving subset of people with that disease.  Few studies have collected long-term human tissue samples.
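Simple arithmetic also makes the case for scale.  In the sketch below (Python; the incidence and genotype frequencies are illustrative assumptions, not figures from the proposal), even a million-person cohort yields only a handful of incident cases carrying a specific two-locus combination.

cohort_size = 1_000_000
incidence = 0.005            # assumed cumulative incidence of a given cancer over follow-up
carrier_freq_locus1 = 0.10   # assumed carrier frequency at the first locus
carrier_freq_locus2 = 0.10   # assumed carrier frequency at the second locus

expected_cases = cohort_size * incidence
cases_with_both = expected_cases * carrier_freq_locus1 * carrier_freq_locus2
print(f"expected cases: {expected_cases:.0f}")
print(f"cases carrying both variants: {cases_with_both:.0f}")
# ~5,000 cases, but only ~50 informative ones; a cohort one-tenth the size would leave ~5.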

 

Designing such a large population study is obviously challenging: deciding what to analyze and how to obtain a dataset large enough that the signal-to-noise ratio is robust in the face of real human variation.  The difficulties are compounded by the high cost of the undertaking, which necessitates that the cohort be applicable to research on a broad spectrum of diseases.  The studies must be very long-term; thus the information and biological samples obtained, and the way in which they are stored, must be carefully chosen to preclude their obsolescence.  Population characteristics can change dramatically in a short time, as illustrated by variations in colorectal cancer rates with environment and time.

 

Building the cohort:  Recruitment and Time-frame

A trans-NIH multi-disease–oriented cohort should build on existing cohort studies, and pilot novel strategies for database construction and recruitment.  It is critical to define the minimum data set needed, and to make a centralized database attractive to the cohort consortium.  Specimen resources should also be centralized, or at least federated, to ensure access to them.  It is also critical to put in place a mechanism for obtaining available tissue when a participant develops cancer and has a biopsy or surgery.

 

A million-person cohort will reach maximal utility only after 5-10 years, so recruitment should start as soon as possible.  It is reasonable to anticipate that technology will advance, and that sequencing and genotyping will become much easier to do.  The 5-year goal should be based on optimizing currently available technologies, while the 10-year goal should assume the availability of high-throughput techniques applicable to large populations.

 

Specimen Collection, Standardization, Data Analysis, and Accessibility

When investing in such a large study, one important step is to determine whether the current methods of defining diet and other environmental exposures are adequate to capture all the information that may be of value.  Exposure measurements must be refined.  It will also be critical to know the genetics of cohort participants, from the germline to somatic genetics and disease. 

 

Specimen collection should include fresh tissue, benign as well as malignant, associated with a minimum data set containing residency history and ethnicity.  A federated (virtual) facility would store and distribute tissue and data.  The “million cohort” database would be accessible to all.  Collection of tissue would be enabled, as would international interactions.  A grid system should be set up at NIH in which all data from the “million cohort” would be included, with enough computer power to do the modeling and analysis.  The level of quality control must give assurance of standardization and comparability of the data.

 

4.  Exploit animal models for identification and validation of cancer phenotypes and susceptibility and resistance genes.  Foster comparative genomic and phenotypic studies of human and rodent (mouse and rat) cancer susceptibility and resistance.

 

Many laboratories use inbred and outbred mouse and rat strains to disclose the genomic loci that are associated with complex traits, such as tobacco carcinogenesis, addiction to alcohol, tobacco and other drugs, diet-induced obesity, and hypertension, among others.  There are usually several dozen or more implicated loci, none of which has a major effect by itself.  Comparative genomics reveals that the majority of the implicated loci are conserved across three species – human, rat, and mouse – inspiring confidence that murine species can be informative for the genetic and environmental determinants of human disease susceptibility.  The genomes of other species have also recently been sequenced.  In particular, championship bloodlines of dogs display inherent susceptibility to a variety of cancers, and the exceptional documentation of these bloodlines will enable cross-comparisons between human populations and animals that share the same environment.  An expanding list of publications on the bases of susceptibility and resistance to cancer and other diseases attests to the fact that cross-species comparisons are not only useful but also comparatively tractable approaches to defining complex traits.
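The simplest computational form of such locus mapping is a single-marker scan across strains.  The sketch below (Python with numpy and scipy; the strain genotypes and phenotype are simulated, with one marker made causal by construction) compares phenotype means between strains carrying alternative alleles at each marker.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_strains, n_markers = 40, 200
genotypes = rng.integers(0, 2, size=(n_strains, n_markers))            # 0/1 allele per strain per marker
phenotype = 10 + 2.0 * genotypes[:, 17] + rng.normal(0, 1, n_strains)  # marker 17 is causal by construction

p_values = np.array([
    ttest_ind(phenotype[genotypes[:, m] == 0], phenotype[genotypes[:, m] == 1]).pvalue
    for m in range(n_markers)
])
print("best marker:", int(p_values.argmin()), " p =", float(p_values.min()))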

 

Researchers are more optimistic about understanding genotype-to-phenotype mapping in animal models than in human studies.  It is possible to generate animals in which the activity of several loci is diminished or increased at the same time, and to have a strain with a very acute phenotype within a few months.  An unbiased set of phenotyping strategies can be applied to the strains to expose novel disease manifestations that may be overlooked in human population studies.  In addition to unaltered inbred or outbred mice, mice that are genetically altered to be sensitive or resistant to carcinogens are available.  These strains can be bred to others that differ in sensitivity to identify which of the many contributing genes is dominant.  Most wild mouse strains have a low incidence of cancer, and studies of them indicate that they carry a number of dominant resistance genes.  These are only a few examples of existing mouse resources and approaches that can be applied to the problem of understanding cancer susceptibility.

 

Several years ago, mouse models were designed to map simple, single-gene–based traits; new mouse models are designed specifically for analysis of complex traits and the contributions of environmental effects.  An integrative approach using newer mouse models to illuminate results obtained with the million person cohort and to experimentally model the resulting hypotheses would be particularly powerful.  Even with the best methods to model gene-gene or genetic-environment interactions, statistical patterns that emerge from the mathematical models require experimental validation in animals.  This may require knockout mice for evaluating multi-genotype variations.  Not all the experimental techniques are in place yet, nor have mice been engineered to examine all the genotype interactions, but increased efforts to employ integrative human/animal/computational strategies are likely to pay substantial dividends in the study of human cancer susceptibility. 

 

5.  Establish a trans-NIH initiative for the development of common mouse resources for the analysis of genetic and environmental effects on development of cancer and other complex trait diseases.  Facilitate multi-level, multi-lab studies of specimens after environmental exposures for gene-environment interactions for the development of computational models representing the complex genetic and non-genetic networks associated with disease susceptibility and progression.

 

Using wild-derived strains and outbred populations, it is possible to expand genetic variation far beyond that available in standard inbred strains.  Recombinant inbred (RI) lines are a powerful tool in the search for susceptibility genes, and, even more importantly, for the conduct of a systems biology approach to the correlation of genotype and phenotype.  A number of Think Tank participants expressed considerable enthusiasm for the creation of a large, trans-NIH resource of RI strains to facilitate study of cancer susceptibility and other similarly complex genetic traits, such as diet-induced obesity.  Several trans-NIH groups are evaluating common mouse resources to understand complex human diseases, and there are international efforts underway to coordinate the development and deployment of these resources.  The precise nature of common mouse resources for the analysis of complex gene-environment interactions in human diseases still requires consensus-building within the relevant research community.  These efforts are the subjects of a series of workshops currently in development.

 

6.  Radically modify training programs to provide immersive interdisciplinary learning environments for biologists, mathematicians, computer scientists, statisticians, epidemiologists, and clinical investigators.

 

The field of cancer susceptibility needs investigators who can understand and apply high-level computational and modeling techniques in analyzing multiple genetic and environmental interactions.  Think Tank participants identified several approaches for meeting this need.

 

Develop cross-training, interdisciplinary programs

Cross-training in biology and computational science is challenging.  Most of the participants expressed a preference for educating scientists with programming, statistical genetics, or mathematical modeling backgrounds about biology, rather than teaching biologists the mathematical and computational skills.  Thus, it is important to identify, at the undergraduate level, students with good quantitative skills who are comfortable with statistical concepts, so that mathematics becomes a fundamental component of their broad education.  It is also important to teach biologists and epidemiologists how to generate and apply models.  Too often, data that are analyzed are not translated into a model, or the models that are created only verify the original assumptions and add nothing new.

 

A major change is needed in graduate training to prepare scientists in quantitative and biological areas, but successful prototypes exist.  For example, an innovative Ph.D. program exists for mathematicians, physicists, and computer scientists to learn biology as well as use their technical skills, and for biologists and chemists to spend most of their time learning how to model.  The field of neuroscience is unusual in that it allows researchers to return at any stage of their careers for cross-disciplinary training.  Similar opportunities would be of value in cancer research.

  

Provide grant support for interdisciplinary education

Most NIH training grant programs limit students’ opportunities to pursue interdisciplinary work or learn computational methods, yet students should be rewarded for seeking training in other areas.  Competitive grants and awards designated for that purpose would encourage pursuit of interdisciplinary education.  A few NIH grants, awarded to an institution or an individual, do require interdisciplinary training, with two mentors and work in two labs in different areas.  Expansion of these programs is strongly encouraged, could be done immediately, and would have an impact within ten years.  Anything that increases cross-disciplinary training is an opportunity with a good payoff.  Such training could be encouraged by NCI/NIH and implemented without additional funds – just reconfiguration of the present priorities.

 

Support summer programs and/or workshops

Although interdisciplinary training is hard to find within an institution, some institutions convene special summer programs.  NCI or NIH could support a program of intensive lectures to introduce young people to concepts such as model building.  A program might involve 50-75 students, staff support, and facilities that can accommodate animals, cells, human material, and computational work.  Students could be familiarized with enough mathematics to feel comfortable, and high-end lecturers could be brought in, as is the practice at courses at Woods Hole and Cold Spring Harbor.  Courses of this type have not yet been implemented to provide interdisciplinary training for mathematicians and cancer biologists.  Another possibility for productive interaction would be a genetic analysis workshop, in which the participants could work with a shared dataset and apply their own methods to analyze the data.

 

Increase access to hardware and large-scale resources  

Students occasionally have problems with access to computers, but their main problem is access to higher-level analyses.  Students who cannot parallelize their analyses may take months to learn the algorithms and, in the meantime, cannot run those analyses on the higher-powered machines.  Within a few years there will be more computers with novel architectures; there is an immediate need to train biomedical scientists to use these new computing technologies and to provide more parallelization training, so that the next generation of scientists will be able to move onto the new systems.  Grants to purchase instrumentation to be used for training would encourage institutions to provide training.
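The parallelization skill in question is often no more exotic than distributing independent per-SNP tests across cores.  The sketch below (Python standard library plus numpy and scipy; the contingency tables are simulated) shows the pattern students would need to master before scaling it to a cluster or grid.

import numpy as np
from multiprocessing import Pool
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4)
tables = [rng.integers(5, 50, size=(3, 2)) for _ in range(10_000)]   # one simulated genotype-by-status table per SNP

def p_value_for(table):
    return chi2_contingency(table)[1]                                # each test is independent of the others

if __name__ == "__main__":
    with Pool() as pool:                                             # spread the embarrassingly parallel scan over cores
        p_values = pool.map(p_value_for, tables)
    print("tests run:", len(p_values), " smallest p:", min(p_values))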

  

Encourage collaborative career development

At many universities, junior faculty members who publish with other people receive less credit than for a solo publication; this strongly discourages collaborations and interactions across disciplines.  Yet within academia, there are a few models like Biostatistics, in which people build careers with collaborative studies and still advance the core of their own discipline.  The work-style of NIH-supported individual investigator communities is generally counter to the kind of collaborations needed to make real progress.  NCI should investigate and implement incentives of recognition as well as financial rewards to encourage necessary culture change.

 

Biomedical scientists tend to utilize computer scientists as technicians, so computational scientists hesitate to get involved in collaborations in this area.  When they are recognized as a critical intellectual partner, computer scientists contribute novel ideas to the design and execution of the project, and analysis of the outcome.  It also helps to have a critical mass of computational people involved in biological research, so that they have a community to interact with on theoretical and development issues.


Specific Recommendations for the NCI: 

 

Realizing that the goal for cancer susceptibility research set out in the Executive Summary will be achievable only with advances in basic knowledge about cancer and genetic variation, and with the introduction of new tools and approaches, the following are recommended.

 

Requirements to achieve the goal:

·        Integrate studies of experimental model systems, human populations, and computational models of molecular signatures in normal and dysregulated states relevant to particular features of cancer susceptibility.

·        Improve cancer-related phenotyping (i.e., the analysis of molecular and systemic changes in cancer progression) by applying emerging sensor, imaging, nano- and molecular technologies to individual humans and animals, biological specimens, cells, extracellular matrix components, and cell-cell interactions.

·        Identify sources of individual variation in cancer-related biological processes, both genetic and non-genetic, and model the genetic-environmental and stochastic interactions.

·        Develop and apply technologies to investigate genetic variation broadly, including alleles, SNPs, haplotypes, and epigenetic alterations of nuclear and mitochondrial genomes, miRNA, highly repetitive sequences, transposons, gene copy number polymorphisms and other evolutionary molecular features.

Specific recommendations:

·        Given the scale of datasets and complex biological systems, foster the creation of new computational, mathematical, and statistical models and analytical tools that embrace, rather than simplify or ignore the complexity of variation in susceptibility and resistance to cancers.  Support large-scale coordination of studies with calibration, interoperability, data federation, and standardization of data formats, natural language processing tools, and dataset query tools.

·        Create a common and widely available repository with existing resources of well-selected, maintained,  and annotated human biological specimens (immortalized lymphoblastoid cell lines, serum, plasma) for genetic, epigenetic, proteomic, and metabolomic analyses of early-onset cancer cases, including those with positive family histories.  Create common databases.  Develop a corresponding resource from a special elderly subpopulation to study resistance to cancer development in individuals with high lifetime exposure to carcinogens.  Require data deposition in freely accessible databases, and resource sharing of repository materials.

·        Develop a trans-NIH multiple-disease–oriented framework for initiation of a very large-scale human population study, with multiple ethnic groups, environmental exposures, and family histories.  Ensure widespread access to specimens, cell lines, and data.

·        Exploit existing animal models for identification and validation of cancer phenotypes and susceptibility and resistance genes.  Foster comparative genomic and phenotypic studies of cancer susceptibility and resistance in humans, rodents (mouse and rat), and other mammals. 

·        Establish a trans-NIH initiative for the development of common mouse resources for the analysis of genetic and environmental effects on development of cancer and other complex trait diseases.  Facilitate multi-level, multi-laboratory studies of specimens after environmental exposures for gene-environment interactions for the development of computational models representing the complex genetic and non-genetic networks associated with disease susceptibility and progression.

·        Radically modify training programs to provide immersive interdisciplinary learning environments for biologists, mathematicians, computer scientists, statisticians, epidemiologists, and clinical investigators.