Building the Nation's Cancer Research Capacity
Developing Bioinformatics for Cancer Research
The Challenge
Bioinformatics is the art and science of electronically representing and integrating biomedical information in a way that makes it accessible and usable across the various fields of cancer research. Each discipline involved in the immense complex of cancer research generates volumes of data and research findings, all contributing vital threads of insight that bioinformatics specialists must weave together into a useful tapestry of knowledge. The challenge is to use electronic standards to devise systems that allow different sets of data to "talk" to one another. Such systems make it possible for scientists to use this new in silico biology to generate hypotheses; conduct virtual experiments using large collections of data from multiple sources; and create, manage, and communicate massive amounts of new knowledge in practical timeframes. In that way, it becomes possible to build computer models of biological structures and processes that represent the combined knowledge of many kinds of experts.
The challenge of integrating separate scientific vocabularies and insight is daunting because of the vastness and rapid evolution of the data. New models and tools are needed to allow scientists to bridge language, integrate concepts and information, and enable complex analysis. In this way, information systems will be a vital, dynamic tool in the hands of cancer researchers.
To address these challenges, NCI must pioneer a versatile informatics infrastructure that will provide the cancer research community with a framework to capture and use the rich data being generated through numerous projects and initiatives. To accomplish this, we will need to:
Progress Toward Meeting the Challenge
Constructing a Cancer Knowledge Resource
For cancer bioinformatics to work, it is necessary to classify, locate, and distribute units of cancer-related information. Such a unit might be the description of a gene or of a process, the email address of a principal investigator, or a reference to a published paper.
NCI is developing and deploying a bioinformatics infrastructure to support the capture, redistribution, and integration of diverse data generated in its allied research fields. This infrastructure tackles the disparate nature of these large data collections through the production of a knowledge "stack," an analogy to earlier days in computing when items in computer memory and the processing queue were conceptualized as being organized in stacks. Data processed through the knowledge stack are transformed into information components, which can then be combined to generate knowledge.
Our system is called the cancer Common Ontologic Reference Environment, or caCORE. The caCORE is composed of three interacting layers. At its foundation is NCI's Enterprise Vocabulary Services (EVS), which organize and translate the distinct but overlapping vocabularies of disparate scientific projects via a common vocabulary. When it comes to describing genes, proteins, biological systems, and processes from the perspectives of the chemist, the physicist, the molecular biologist, the immunologist, the mathematician, and the geneticist, the likelihood that one thing will have many names, or that two different things will have the same name, grows exponentially. For example, one research paper may refer to "breast cancer." Another may describe "malignant carcinoma of the mammary gland."
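The naming problem in this example can be illustrated with a minimal sketch, assuming a simple lookup table that maps each synonym to one shared concept. This is not the EVS interface or its data model: the concept identifier, the function names, and the three sample articles are all hypothetical and invented here for illustration.

```python
"""Hypothetical sketch of synonym-to-concept lookup; not the EVS API."""

# Assumed concept identifier and synonym list, invented for illustration.
CONCEPT_SYNONYMS = {
    "CONCEPT_BREAST_CANCER": [
        "breast cancer",
        "malignant carcinoma of the mammary gland",
    ],
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_term_index(concept_synonyms):
    """Map every normalized synonym to its concept identifier."""
    index = {}
    for concept_id, terms in concept_synonyms.items():
        for term in terms:
            index[normalize(term)] = concept_id
    return index


def find_articles(query, articles, term_index):
    """Return titles of articles tagged with any synonym of the query's concept."""
    concept_id = term_index.get(normalize(query))
    if concept_id is None:
        # The query is not in the shared vocabulary, so nothing can be matched.
        return []
    return [
        title
        for title, terms in articles.items()
        if any(term_index.get(normalize(t)) == concept_id for t in terms)
    ]


TERM_INDEX = build_term_index(CONCEPT_SYNONYMS)

# Made-up articles; the first two describe the same disease in different words.
ARTICLES = {
    "Paper A": ["breast cancer"],
    "Paper B": ["Malignant carcinoma of the mammary gland"],
    "Paper C": ["melanoma"],
}

print(find_articles("breast cancer", ARTICLES, TERM_INDEX))
```

In this sketch a query for either phrase resolves to the same concept identifier, so both Paper A and Paper B are returned regardless of which wording the searcher uses.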
By translating one phrase into the other, a computer search can find both of these articles.
The middle tier of caCORE contains standard ways of collecting scientific data, or Common Data Elements (CDEs), that conform to the international standards used by other Federal organizations. This keystone effort will ensure that all data from NCI-supported programs, whether from clinical trials, animal model programs, basic research, or other disciplines, can be easily shared.
The top layer of caCORE is composed of models of information, the cancer Bioinformatics Infrastructure Objects (caBIO). This tier is being constructed to mimic the strategy nature uses to build complex systems. Biomedical concepts are captured as simple components, and complexity is generated by joining combinations of these components. The objects capture the expertise of the different disciplines that constitute cancer research, allowing the knowledge to be shared through computer tools. The current collection of objects captures knowledge in the areas of genomics, genetics, animal models, and clinical trials. caBIO is built with open source software, making it available to scientists without restriction.
Collecting and Sharing Cancer Research Data
The data and findings produced through NCI initiatives are valuable not only to the investigators who generate them, but also to other cancer researchers. These results form a foundation on which other researchers can build future studies and a resource for "in silico" biomedical experiments. Some examples follow.
Building a Network of Interoperable Analytic Tools and Data Sources
NCI has begun the process of establishing a network of bioinformatics tools and data that interact to capture the innovation of the cancer research community and facilitate the use and re-use of its valuable resources, such as tissue repositories. A first step in building the network has been the construction of Internet portals through which various network and consortia communities can share their tools and data. Portals serve as one-stop shops where a community can interact electronically and exchange and share resources.
Fostering Application of In Silico Biomedicine in Cancer Research and Care
Early applications of in silico biomedicine in cancer research supported by NCI confirm the promise of this exciting new scientific approach. These efforts demonstrate that basic science research information can be translated to guide the development of new therapeutics. They also show that bioinformatics can be used to refine diagnosis and improve screening. For example:
Facilitating Discovery through Integration Tools
The true promise of the application of information technology lies in its ability to integrate information among research disciplines. To explore this opportunity, NCI has undertaken the Cancer Molecular Analysis Project (CMAP) to facilitate the identification and evaluation of molecular targets in cancer by integrating comprehensive molecular characterizations of cancer. The CMAP application permits investigators to discover molecular targets, assess their validity and interaction with other targets, screen for possible toxicity, determine if there are therapeutic agents that can act on this target, and determine whether there are clinical trials evaluating these agents. CMAP currently draws data from a multitude of resources and both the data and infrastructure are publicly accessible.
Improving Cancer Diagnosis and Screening through Bioinformatics
Recent successes demonstrate the immense potential for the use of bioinformatics tools, specifically artificial intelligence, to improve cancer diagnosis and screening.
The Plan - Developing Bioinformatics for Cancer Research
Objectives, Milestones, and Funding Increases Required for Fiscal Year 2004
Bioinformatics: Linchpin of Translational Science
Translating results from the laboratory bench to bedside delivery of patient care, and communicating the lessons learned back to the bench, goes far beyond the high-speed calculation, the once-unimaginably complex modeling, and the massive storage and retrieval used for individual studies in individual institutions. It demands a whole system of linkages across laboratories, clinics, disciplines, and organizations, from purely research to regulatory, to journalistic, to advocacy, and beyond. At the National Cancer Institute, we recognize that bench-to-bedside translation is conceptually inseparable from informatics. Our success with one will continue to reciprocally drive our success with the other.
Developing the bioinformatics infrastructure requires that we build greater power and compatibility at several levels of focus, and at several points in the process, building pioneering systems that, in working together, make it easier for the scientists who use them to work together as well. The codes and logic that form the language for integrating these systems can then be standardized. In many cases, the standards themselves must be created, tested, and agreed upon. Research highlights in this section illustrate application of these prototype informatics systems, ranging from the micro level, for discovering meaningful patterns in bits of genetic material, to the macro level, for monitoring clinical trial activity, to the most broadly focused, for tracking and comparing activity, so that we can know how well our funding is meeting the full range of research requirements.