National Cancer Institute : Plans & Priorities for Cancer Research

Building the Nation's Cancer Research Capacity

Developing Bioinformatics for Cancer Research

Goal

Create a cancer informatics infrastructure to support and integrate the full spectrum of cancer investigations.

On this page:

The Challenge
Progress Toward Meeting the Challenge
2004 Plan and Budget Increase Request

Bioinformatics: Linchpin of Translational Science

The Challenge

Bioinformatics is the art and science of electronically representing and integrating biomedical information in a way that makes it accessible and usable across the various fields of cancer research. Each discipline involved in the immense complex of cancer research generates volumes of data and research findings, all contributing vital threads of insight that bioinformatics specialists must weave together into a useful tapestry of knowledge. The challenge is to use electronic standards to devise systems that allow different sets of data to "talk" to one another. Such systems make it possible for scientists to use this new in silico biology to generate hypotheses; conduct virtual experiments using large collections of data from multiple sources; and create, manage, and communicate massive amounts of new knowledge in practical timeframes. In that way, it becomes possible to build computer models of biological structures and processes that represent the combined knowledge of many kinds of experts.

The challenge of integrating separate scientific vocabularies and insight is daunting because of the vastness and rapid evolution of the data. New models and tools are needed to allow scientists to bridge language, integrate concepts and information, and enable complex analysis. In this way, information systems will be a vital, dynamic tool in the hands of cancer researchers.

To address these challenges, NCI must pioneer a versatile informatics infrastructure that will provide the cancer research community with a framework to capture and use the rich data being generated through numerous projects and initiatives. To accomplish this, we will need to:

Establish strategic partnerships with commercial, academic, and other governmental groups engaged in bioinformatics research and development. Through these partnerships we will broaden our base of components, compatible tools, data, and infrastructure.
Help incubate the next generation of infrastructure, applications, and insights to support cancer research.
Expand training and support for scientists pioneering the application of bioinformatics in cancer research.

Progress Toward Meeting the Challenge

Constructing a Cancer Knowledge Resource
Collecting and Sharing Cancer Research Data
Building a Network of Interoperable Analytic Tools and Data Sources
Fostering Application of In Silico Biomedicine in Cancer Research and Care

Constructing a Cancer Knowledge Resource

For cancer bioinformatics to work, it is necessary to classify, locate, and distribute units of cancer-related information. Such a unit might be the description of a gene or of a process, the email address of a principal investigator, or a reference to a published paper. NCI is developing and deploying a bioinformatics infrastructure to support the capture, redistribution, and integration of diverse data generated in its allied research fields. This infrastructure tackles the disparate nature of these large data collections through the production of a knowledge "stack," an analogy to earlier days in computing when items in computer memory and the processing queue were conceptualized as being organized in stacks. Data processed through the knowledge stack are transformed into information components, which can then be combined to generate knowledge. Our system is called the cancer Common Ontologic Reference Environment or caCORE.

The caCORE is composed of three interacting layers. At its foundation is NCI's Enterprise Vocabulary Services (EVS), which organize and translate the distinct but overlapping vocabularies of disparate scientific projects via a common vocabulary. When it comes to describing genes, proteins, biological systems, and processes from the perspectives of the chemist, the physicist, the molecular biologist, the immunologist, the mathematician, and the geneticist, the likelihood that one thing will have many names, or that two different things will have the same name, grows exponentially. For example, one research paper may refer to "breast cancer." Another may describe "malignant carcinoma of the mammary gland." By translating one phrase into the other, a computer can search and find both of these articles.
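The translation step described above can be illustrated with a minimal sketch. It is not EVS itself: the concept identifier, the synonym table, and the function names are all hypothetical, chosen only to show how mapping two phrases to one shared concept lets a search on either phrase find both papers.

```python
from typing import Dict, List, Set

# Hypothetical synonym table: each phrase maps to an invented concept ID.
# A real vocabulary service would hold far more terms and richer relations.
SYNONYMS: Dict[str, str] = {
    "breast cancer": "CONCEPT_0001",
    "malignant carcinoma of the mammary gland": "CONCEPT_0001",
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_concepts(text: str) -> Set[str]:
    """Return the concept IDs of every known phrase that occurs in the text."""
    norm = normalize(text)
    return {cid for phrase, cid in SYNONYMS.items() if phrase in norm}


def search(papers: Dict[str, str], query: str) -> List[str]:
    """Return titles of papers that share at least one concept with the query."""
    wanted = find_concepts(query)
    return [title for title, abstract in papers.items()
            if wanted & find_concepts(abstract)]


# Invented example abstracts, for illustration only.
PAPERS: Dict[str, str] = {
    "Paper A": "A cohort study of breast cancer outcomes.",
    "Paper B": "Histology of malignant carcinoma of the mammary gland.",
    "Paper C": "Serum protein patterns in ovarian cancer.",
}

if __name__ == "__main__":
    print(search(PAPERS, "breast cancer"))
```

Searching on either phrase returns both Paper A and Paper B, because the comparison is made on concept identifiers rather than on the literal wording.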

The middle tier of caCORE contains standard ways of collecting scientific data, called Common Data Elements (CDEs), that conform to the international standards used by other Federal organizations. This keystone effort will ensure that all data from NCI-supported programs, whether from clinical trials, animal model programs, basic research, or other disciplines, can be easily shared.

The top layer of caCORE is composed of models of information, the cancer Bioinformatics Infrastructure Objects (caBIO). This tier is being constructed to mimic the strategy nature uses to build complex systems. Biomedical concepts are captured as simple components. Complexity is generated by joining these components in combination. The objects capture the expertise of the different disciplines that constitute cancer research, allowing the knowledge to be shared through computer tools. The current collection of objects captures knowledge in the areas of genomics, genetics, animal models, and clinical trials. caBIO is built with open source software, making it available to scientists without restriction.
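The idea of building complexity from simple components can be sketched in a few lines. The classes below are not the caBIO object model; the class names, fields, and all sample values are hypothetical and serve only to show how small objects, once linked, can answer a cross-disciplinary question.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """A simple component: one gene, identified by a placeholder symbol."""
    symbol: str


@dataclass(frozen=True)
class Agent:
    """A therapeutic agent, linked to the genes it is assumed to target."""
    name: str
    targets: Tuple[Gene, ...]


@dataclass(frozen=True)
class ClinicalTrial:
    """A clinical trial, linked to the agents it evaluates."""
    trial_id: str
    agents: Tuple[Agent, ...]


def trials_for_gene(gene: Gene, trials: List[ClinicalTrial]) -> List[str]:
    """Follow the links trial -> agent -> gene to find relevant trials."""
    return [t.trial_id for t in trials
            if any(gene in a.targets for a in t.agents)]


# Invented placeholder data, for illustration only.
gene_a = Gene("GENE_A")
gene_b = Gene("GENE_B")
agent_1 = Agent("agent-1", (gene_a,))
agent_2 = Agent("agent-2", (gene_a, gene_b))
TRIALS = [
    ClinicalTrial("TRIAL-001", (agent_1,)),
    ClinicalTrial("TRIAL-002", (agent_2,)),
]

if __name__ == "__main__":
    print(trials_for_gene(gene_b, TRIALS))
```

Each class on its own is trivial; the useful query emerges only from the links between them, which is the design principle the paragraph above describes.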

Collecting and Sharing Cancer Research Data

The data and findings produced through NCI initiatives are valuable not only to the investigators who generate them, but also to other cancer researchers. They form a foundation on which those researchers can build future studies and a resource for "in silico" biomedical experiments. Some examples follow.

We have constructed a repository for the mouse models developed by the cancer research community through the Mouse Models of Human Cancers Consortium. The repository is called the Cancer Models Database (CMD). It contains models that can be used to further explore cancer's origins and to test new therapies. The CMD allows researchers to share the insights they have gained in their investigations. It also permits additional investigators to extend the work of the model generators, building directly on the base they have constructed and accelerating the rate of discovery.
We have created a repository called the Gene Expression Database Portal (GEDP) to give researchers integrated access to tumor genetic and molecular taxonomy data generated by initiatives such as the NCI Director's Challenge: Toward a Molecular Classification of Cancer and the Mouse Models of Human Cancers Consortium.
We continue to expand the collection of publicly accessible genomic data generated by the Cancer Genome Anatomy Project (CGAP). New tools permit investigators to examine the large collections of data showing how various genes behave in different types of cancer.
Great strides continue to be made in expanding the use of state-of-the-art informatics tools to facilitate the conduct of clinical trials. The Cancer Therapy Evaluation Program (CTEP) is prototyping informatics tools that will facilitate much broader enrollment in cancer clinical trials. In combination with the NCI Center for Bioinformatics, CTEP is deploying infrastructure that is part of caCORE to facilitate sharing and integration of clinical trials information.

Building a Network of Interoperable Analytic Tools and Data Sources

NCI has begun the process of establishing a network of bioinformatics tools and data that interact to capture the innovation of the cancer research community and facilitate the use and re-use of its valuable resources, such as tissue repositories.

A first step in building the network has been the construction of Internet portals through which various network and consortia communities can share their tools and data. Portals serve as one-stop-shops where a community can electronically interact, exchange, and share resources.

NCI is working with the Association of American Cancer Institutes to build a clinical trials network that distributes information, available to all, on those trials that are conducted at participating Cancer Centers. The initial goal of this effort will be to develop infrastructure that allows Cancer Centers to communicate the clinical trials underway at their institutions and to permit patients to search all participating Cancer Centers for trials that meet their clinical and personal needs.
NCI is leading the Interoperable Informatics Infrastructure Consortium, a group of industry, academic, and government organizations that aims to develop interoperable infrastructure for biomedicine.

Fostering Application of In Silico Biomedicine in Cancer Research and Care

Early applications of in silico biomedicine in cancer research supported by NCI confirm the promise of this exciting new scientific approach. These efforts demonstrate that basic science research information can be translated to guide the development of new therapeutics. They also show that bioinformatics can be used to refine diagnosis and improve screening. For example:

Facilitating Discovery through Integration Tools

The true promise of the application of information technology lies in its ability to integrate information among research disciplines. To explore this opportunity, NCI has undertaken the Cancer Molecular Analysis Project (CMAP) to facilitate the identification and evaluation of molecular targets in cancer by integrating comprehensive molecular characterizations of cancer. The CMAP application permits investigators to discover molecular targets, assess their validity and interaction with other targets, screen for possible toxicity, determine if there are therapeutic agents that can act on this target, and determine whether there are clinical trials evaluating these agents. CMAP currently draws data from a multitude of resources and both the data and infrastructure are publicly accessible.

Improving Cancer Diagnosis and Screening through Bioinformatics

Recent successes demonstrate the immense potential for the use of bioinformatics tools, specifically artificial intelligence, to improve cancer diagnosis and screening.

Effectively and safely treating pediatric acute lymphoblastic leukemia (ALL) requires detection of subtle distinctions among several subtypes. Pinpointing the subtype usually requires the combined efforts of a hematologist/oncologist, pathologist, and cytogeneticist. NCI-supported researchers have used artificial intelligence to analyze samples from patients whose diagnosis had already been determined. Their new test is 95 percent accurate. Moreover, as a byproduct of their research, they identified a new subtype of ALL.
NCI scientists have analyzed microarrays using an artificial intelligence technology called neural networks to distinguish among members of a family of childhood tumors that includes neuroblastoma, rhabdomyosarcoma, non-Hodgkin lymphoma, and Ewing tumors. The results of this kind of analysis hold promise for enabling healthcare providers to select appropriate treatment options and determine possible outcomes for patients.
NCI scientists have found that patterns of proteins found in patients' blood serum may reflect the presence of disease. NCI scientists have used serum proteins to detect ovarian cancer, even at early stages. The research, a joint effort involving the Food and Drug Administration, the NCI, and a private company, unites two exciting disciplines: proteomics, the study of the proteins inside cells, and artificial intelligence computer programs. Scientists were able to "train" the computer to distinguish between patterns of small proteins found in the blood of cancer patients and those of people not known to have the cancer. The artificial intelligence program identified exquisitely subtle differences in the patterns that may distinguish between women with ovarian cancer and women with non-cancerous conditions.

The Plan - Developing Bioinformatics for Cancer Research

Objectives, Milestones, and Funding Increases Required for Fiscal Year 2004

1.	Expand NCI's core informatics infrastructure to support and integrate NCI-supported basic, clinical, translational, and population research initiatives.	$49.0 M
2.	Create a community matrix of interoperable data sources, analytic tools, and computational resources that provide an extensible plug-and-play informatics capability for the cancer research community.	$21.5 M
3.	Expand the capacity of cancer research institutions to perform interdisciplinary informatics research.	$11.5 M
4.	Support bioinformatics training for both experienced and new scientists.	$3.0 M
Management and Support		$3.0 M
Total		$88.0 M
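The request figures in the table above, together with the milestone amounts listed under each objective later on this page, can be cross-checked with a short script. This is only a consistency sketch: the variable and function names are illustrative, and the amounts (in millions of dollars) are copied from this page.

```python
from decimal import Decimal
from typing import Dict, List

# Milestone amounts per objective, in millions of dollars, as listed on this page.
MILESTONES: Dict[str, List[str]] = {
    "Objective 1": ["29.0", "5.0", "5.0", "10.0"],
    "Objective 2": ["15.0", "6.5"],
    "Objective 3": ["5.0", "6.5"],
    "Objective 4": ["1.5", "1.5"],
}
MANAGEMENT_AND_SUPPORT = Decimal("3.0")
STATED_TOTAL = Decimal("88.0")


def subtotal(amounts: List[str]) -> Decimal:
    """Sum one objective's milestones exactly, using decimal arithmetic."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


SUBTOTALS: Dict[str, Decimal] = {
    name: subtotal(amounts) for name, amounts in MILESTONES.items()
}
GRAND_TOTAL = sum(SUBTOTALS.values(), Decimal("0")) + MANAGEMENT_AND_SUPPORT

if __name__ == "__main__":
    for name, amount in SUBTOTALS.items():
        print(f"{name}: ${amount} M")
    print(f"Total: ${GRAND_TOTAL} M (stated: ${STATED_TOTAL} M)")
```

The four objective subtotals plus Management and Support add up to the stated $88.0 M request.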

Objective 1: Expand NCI's core informatics infrastructure to support and integrate NCI-supported basic, clinical, translational, and population research initiatives.
Provide additional support and enhance integration of data and development of tools emanating from NCI's Extraordinary Opportunities. Facilitate information exchange within and between NCI-supported research initiatives. Provide support for SPOREs and Cancer Center efforts.	$29.0 M
Establish a toolbox of open-source informatics applications and services based on a common set of operating principles and standards that support NCI's diverse cancer research activities.	$5.0 M
Expand the research infrastructure that uses the NCI "knowledge stack," assembling common vocabulary, standard data elements, and information models to further the exchange of all types of cancer information and data among the cancer community.	$5.0 M
Expand information technology-based support services to enhance planning, execution, and communication of the wide-ranging research portfolio supported by NCI.	$10.0 M
TOTAL	$49.0 M

Objective 2: Create a community matrix of interoperable data sources, analytic tools, and computational resources that provide an extensible plug-and-play informatics capability for the cancer research community.
Enable other organizations' information systems to work seamlessly with ours. Establish a minimum of five academic, government, and commercial strategic partnerships in a research park setting where all partners can work together to address bioinformatics questions.	$15.0 M
Use a minimum of 20 investigator-based awards that build on the NCI informatics core to 1) deploy resources to the cancer research community to serve as the foundation for additional infrastructure and 2) facilitate rapid deployment of related new research initiatives.	$6.5 M
TOTAL	$21.5 M

Objective 3: Expand the capacity of cancer research institutions to perform interdisciplinary informatics research.
Establish a network of bioinformatics research centers to work with and through the NIH Biomedical Information Science and Technology Initiative, using a novel interdisciplinary management team to select and coordinate the centers.	$5.0 M
Expand institutional infrastructure by providing supplements to NCI-supported research organizations, supporting the growing need of investigator-initiated research to access state-of-the-art biocomputing tools and data.	$6.5 M
TOTAL	$11.5 M

Objective 4: Support bioinformatics training for both experienced and new scientists.
Recruit new scientists through 20 development awards.	$1.5 M
Cross-train experienced scientists in a variety of life science specialties through transition awards.	$1.5 M
TOTAL	$3.0 M

Bioinformatics: Linchpin of Translational Science

Translating results from the laboratory bench to bedside delivery of patient care, and communicating the lessons learned back to the bench, goes far beyond the high-speed calculation, the once-unimaginably complex modeling, and the massive storage and retrieval used for individual studies in individual institutions. It demands a whole system of linkages across laboratories, clinics, disciplines, and organizations, from purely research to regulatory, to journalistic, to advocacy, and beyond. At the National Cancer Institute, we recognize that bench-to-bedside translation is conceptually inseparable from informatics. Our success with one will continue to reciprocally drive our success with the other.

Developing the bioinformatics infrastructure requires that we build greater power and compatibility at several levels of focus, and at several points in the process, building pioneering systems that, in working together, make it easier for the scientists who use them to work together as well. The codes and logic that form the language for integrating these systems can then be standardized. In many cases, the standards themselves must be created, tested, and agreed upon.

Research highlights in this section illustrate applications of these prototype informatics systems. They range from the micro level, discovering meaningful patterns in bits of genetic material, through the macro level, monitoring clinical trial activity, to the broadest level, tracking and comparing activity so that we can know how well our funding is meeting the full range of research requirements.
