www.cancer.gov | National Cancer Institute | National Human Genome Research Institute

caBIG™ Tools

caBIG™ offers a suite of software development toolkits, applications, database technologies, and Web-based applications to meet a wide range of needs for the cancer research community. Below is a partial list of such caBIG™ tools. For a more complete list, or to access these tools, please visit https://caBIG.nci.nih.gov/inventory.

CLINICAL TRIAL MANAGEMENT SYSTEMS

Challenge
Clinical development is the most expensive and time-consuming stage in bringing new therapeutics to market. The process costs an estimated $800 million or more per product, and designing a clinical trial, running it, compiling its results, and submitting them to regulators takes many years. At present, there is no standardized, uniform system for clinical trial management and no central registry of all cancer-related clinical trials. Moreover, only a small number of cancer patients actually participate in clinical trials of experimental therapies. "Legacy," or paper-based, information systems for clinical trials are slow and inefficient for collecting, analyzing, and sharing data, resulting in unnecessary delays, duplicated effort, human errors in data entry, and missed opportunities for data mining and secondary use of the data.

Applications

  • Cancer Central Clinical Database (C3D): This application is a library of standardized templates, including electronic Case Report Forms (eCRFs) for collecting the data required by specific clinical protocols. Templates available through C3D can be tailored for reuse across multiple studies. Authorized investigators can review the data in real time, pose queries, and analyze results. The database uses standardized vocabularies and common data elements to manage medical information throughout the course of a clinical trial.
  • Cancer Central Clinical Patient Registry (C3PR): This tool tracks data about an individual patient across clinical trials and at multiple sites.
  • caMatch: This tool provides a system for matching eligible patients to clinical trials. An online version of this tool is currently available for breast cancer trials in the San Francisco Bay Area.
    Please visit https://www.breastcancertrials.org/bct/ for more information about this program.

Benefits

  • Reduced human error
  • Consistent format, language, and structure throughout the entire life cycle of the trial, enabling reliability and comparability of results
  • Better molecular sub-grouping of patients to improve clinical outcomes
  • More efficient and systemized administrative process, reducing costs
  • Access to a larger potential pool of patients, enabling faster enrollment

INTEGRATIVE CANCER RESEARCH

Challenge
Modern molecular-based research is generating vast amounts of complex data that must be collected, aggregated, and correlated in order to identify biological pathways and genetic signatures. Currently, there is no centralized, easy-to-use informatics infrastructure in place to handle this large and growing volume of information. Tools that can synthesize large amounts of data from disparate sources and present it in a visual format enable cancer researchers to spend more time innovating and less time collating numbers and connecting data sources.

Applications

  • caARRAY: This software consists of a microarray database repository that can be accessed for data analysis through visualization tools, enabling researchers to "see" connections and correlations between genes and proteins.
  • caWorkbench: This tool provides a platform for the integrated analysis of a variety of biomedical data, including microarrays, sequences, pathways, ontologies, and transcription factors.
  • Cancer Molecular Pages: This tool provides an automatically annotated catalog of proteins that are of special interest to cancer research. The catalog integrates data from curated databases (such as GenBank), computer-generated annotations (such as predicted 3-D structure and similarity to proteins with related sequences), and annotations contributed by specialists on each protein. The tool also provides network interfaces for connecting the catalog to other online databases, and a set of Web-based visualization tools.
  • GeneConnect: As genomic analysis tools have proliferated, so have databases that share no common identifiers (the serial numbers that name each object in a database). For example, a particular gene might be known by three different identifiers in three different databases. GeneConnect is a caBIG™ genomic identifier mapping service that will interlink such identifiers across hundreds of online databases.
  • Magellan: Research data sets are not only large and complex but also extremely variable in structure, because of the diverse systems that generate them. Magellan provides a common structure and context that bridges diverse data sets so that knowledge can more easily be gleaned from the information. Users can also upload their own metadata (information about the research data) to impose a more refined structure and context on the relevant fields in those data sets. Finally, Magellan provides an interface that lets researchers apply their own analytical software algorithms to the data.
  • Proteomics Laboratory Information Management System (LIMS): This software tool tracks the laboratory processes relevant to two-dimensional gel electrophoresis and is designed to support new data types as they emerge.
  • Q5: This algorithm supports probabilistic disease classification, distinguishing cancer cells from normal cells based on protein expression.
  • Quantitative Pathway Analysis in Cancer (QPACA): This tool is a pathway modeling and analysis system that supports exploration of quantitative biological data in the context of a pathway description.
  • Transcript Annotation Prioritization and Screening System (TrAPSS): This system includes several tools for researchers searching for the mutations that cause a defect or disease.
  • Visual and Statistical Data Analyzer (VISDA): This tool performs statistical analysis of multivariate data sets. It includes cluster modeling (grouping the data elements in a set according to a particular concept, such as therapeutic response) and visualization tools that provide a graphical view of these analyses.
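To make the identifier-mapping problem that GeneConnect addresses more concrete, here is a minimal sketch. It is not GeneConnect's implementation or API: it assumes that mappings arrive as pairs of (database, identifier) records known to name the same gene, and it uses a union-find (disjoint-set) structure so that any one identifier can be resolved to all of its known equivalents. The class name, database names, and identifiers are hypothetical placeholders.

```python
class IdentifierMap:
    """Hypothetical sketch: groups (database, identifier) pairs that name the same gene."""

    def __init__(self):
        # Maps each (database, identifier) key to its parent in the union-find forest.
        self._parent = {}

    def _find(self, key):
        # Register unseen keys as their own group, then follow parents to the root.
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every key visited on the way directly at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def link(self, a, b):
        """Record that identifiers a and b refer to the same gene."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, key):
        """Return every known identifier in the same group as key, including key itself."""
        root = self._find(key)
        return sorted(k for k in list(self._parent) if self._find(k) == root)


# Usage: one gene known by three different identifiers in three hypothetical databases.
ids = IdentifierMap()
ids.link(("DatabaseA", "A-001"), ("DatabaseB", "B-17"))
ids.link(("DatabaseB", "B-17"), ("DatabaseC", "C-9042"))
print(ids.equivalents(("DatabaseC", "C-9042")))
# prints [('DatabaseA', 'A-001'), ('DatabaseB', 'B-17'), ('DatabaseC', 'C-9042')]
```

Union-find keeps each merge and lookup close to constant time, so the approach scales as more databases are linked. It does assume every mapping is a true equivalence, which real identifier sets do not always satisfy.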

Benefits

  • Access to, and integrated analysis of, data from diverse sources
  • Increased efficiency in analyzing and visualizing results
  • Accelerated discovery of molecular signatures

IN VIVO IMAGING

Challenge
There is currently no system in place for sharing or archiving images among clinicians and researchers, for validating clinical research hypotheses, or for facilitating the development of new diagnostic, prognostic, or therapeutic algorithms. As a result, there is no "reference source" that embodies the community’s collective experience and knowledge.

Application

  • caIMAGE: This Web portal stores cancer images, enabling researchers and scientists to retrieve and submit images and image annotations, including species, organ, tissue, and diagnosis.

Related Resource

  • National Cancer Imaging Archive: This in vivo image repository provides the cancer research community, industry, and academia with access to image archives that can be used for many purposes, including aiding the development and validation of analytical software tools that support lesion detection and classification, accelerated diagnostic imaging decisions, and quantitative imaging assessment of drug response.

Benefits

  • Digitized format enables information to be merged with other data sources
  • Improved clinical decision support: researchers can see a patient’s response to treatment more quickly, accurately, and objectively

TISSUE BANKS AND PATHOLOGY TOOLS

Challenge
Access to large quantities of high-quality, clinically annotated human biospecimens is widely recognized as critical to genomics-based research. Yet at present there is a lack of standardized, integrated systems for collecting, processing, archiving, and disseminating biospecimens and associated clinical data, either within or across institutions. As a result, discovery research does not progress as rapidly as it could. While policies, conflicting regulations, and other non-technical issues contribute to this problem, the absence of readily available software supporting the highest standards of biospecimen and data collection is considered one of the biggest hurdles, and addressing it is thus a key focus for caBIG™.

Applications

The caTISSUE Suite consists of:

  • caTIES (Cancer Text Information Extraction System): Vast amounts of data (such as patient diagnoses, treatments, and treatment outcomes) are tied to pathology records and biospecimens archived in biorepositories. While these biorepositories have existed for decades, their biospecimens can now be analyzed using modern genomic and proteomic technologies. As such, they are a valuable resource that can be mined to correlate disease state with molecular profiles, thereby identifying molecular patterns of disease. The caTIES software tool captures that pathology data electronically: it extracts information from pathology reports and places it in structured electronic formats, so that different researchers and institutions can catalog their biospecimens in a standardized way. The tool also removes patient-identifying information from the data to protect patient privacy and confidentiality.
  • caTISSUE core: Modern biorepositories face a significant logistical problem in collecting, managing, processing, and distributing the millions of biospecimens in their custody, which come in many formats (e.g., fixed tissue, frozen tissue, cell lines, and DNA, RNA, or protein derived from them). This software provides a standard application for biorepositories to handle and track biospecimens and to conduct the administrative tasks essential to biorepository operations.
  • caTISSUE Clinical Annotation Engine: The true value of biospecimens in modern biomedical research is realized only if the donor's clinical information accompanies the biospecimen. These data must be electronic, detailed, and standardized to be useful to scientists who use data mining and statistical analysis to correlate clinical descriptions (including outcomes) with molecular data. This software enables biorepositories to centralize, standardize, and protect, for research purposes, clinical data collected from the many other sources in a modern medical center, such as basic medical records, drug treatments, surgery, radiology, tumor registries, and pathology laboratories.

Benefits

  • Expanded access to high-quality, clinically annotated specimens, in numbers sufficient for statistically meaningful molecular research, available from a network of connected research facilities
  • Ethically based chain of trust that ensures patient privacy and confidentiality, in compliance with human research subject and patient protection regulations

National Cancer Institute | National Institutes of Health | Department of Health and Human Services | FirstGov.gov