Infrastructure: The caBIG Informatics Backbone

The caBIG™ informatics backbone is based on open source software products developed by caBIG™ participants and the NCI Center for Bioinformatics (NCICB). caGrid, BRIDG, caDSR, EVS and the caCORE SDK provide a common data management and application development framework which, together with a robust service-oriented infrastructure, helps streamline bioinformatics development and research throughout the cancer community. The components of the infrastructure support the development and utilization of semantically interoperable data systems, ensuring that biomedical research data deployed within this framework is consistent and comparable. The cancer Bioinformatics Objects model (caBIO) was the initial data system built using this infrastructure and provides a template for building semantically interoperable systems.

	caGrid The goal of the cancer Biomedical Informatics Grid (caBIG™) is to develop applications and the underlying systems architecture that connect data, tools, scientists and organizations in an open federated environment. To meet this goal, caBIG™ will bring together data from many diverse data sources. caGrid is the underlying service-oriented infrastructure for caBIG™. caGrid enables numerous complex usage scenarios, but its basic technical goals are to: enable universal mechanisms for providing interoperable programmatic access to data and analytics in caBIG™, create a self-described infrastructure wherein the structure and semantics of data can be programmatically determined, and provide a powerful means by which services available in caBIG™ can be programmatically discovered and leveraged. caGrid implements grid technologies and methodologies that enable local organizations to retain ultimate control over access and management. The caGrid 0.5 test bed infrastructure was released in September 2005 and included the initial set of software tools for realizing the goals of caBIG™. caGrid 1.0, released in December 2006, provided the implementation of the required core services, toolkits and wizards for the development and deployment of community-provided services, Application Programming Interfaces (APIs) for building client applications, and reference implementations of applications and services available in the production grid. caGrid 1.1, released in September 2007, included important security enhancements based on the security policies and procedures drafted by the caBIG™ Security Working Group. The latest release, caGrid 1.2, represents the continued enhancement of the caGrid Enterprise Architecture. In addition to bug fixes and several new services, it exemplifies efforts toward closer integration within the caGrid Core infrastructure and with other caBIG™ components.
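The discovery goal described for caGrid can be pictured with a small sketch. This is only an illustration of the idea that services advertise what their data means and clients search on that meaning; the class names, concept codes and URLs below are hypothetical and are not the caGrid API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a registry where services advertise the concepts
// their data covers, so that clients can discover them programmatically.
final class DiscoverySketch {
    private final List<ServiceDescription> advertised = new ArrayList<>();

    // A service provider registers a self-description.
    void advertise(ServiceDescription service) {
        advertised.add(service);
    }

    // A client asks for every service whose metadata mentions a concept.
    List<String> discoverByConcept(String conceptCode) {
        List<String> urls = new ArrayList<>();
        for (ServiceDescription service : advertised) {
            if (service.conceptCodes().contains(conceptCode)) {
                urls.add(service.url());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        DiscoverySketch index = new DiscoverySketch();
        // Placeholder URLs and concept codes, for illustration only.
        index.advertise(new ServiceDescription("https://example.org/geneService", Set.of("EX-0001")));
        index.advertise(new ServiceDescription("https://example.org/trialService", Set.of("EX-0002")));
        System.out.println(index.discoverByConcept("EX-0001"));
    }
}

// Self-description of one service: where it lives and what its data means.
record ServiceDescription(String url, Set<String> conceptCodes) {}
```

Because the lookup works on advertised metadata rather than on a fixed list of endpoints, each organization in this sketch keeps control of its own service and only publishes a description of it.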
	BRIDG The Biomedical Research Integrated Domain Group (BRIDG) project is a collaborative effort of stakeholders from the Clinical Data Interchange Standards Consortium (CDISC), the HL7 Regulated Clinical Research Information Management Technical Committee (RCRIM TC), the National Cancer Institute (NCI), and the US Food and Drug Administration (FDA) to produce a shared view of the dynamic and static semantics that collectively define a shared domain of interest, i.e., the domain of clinical and pre-clinical protocol-driven research and its associated regulatory artifacts.

	Clinical Trials Object Data System The Clinical Trials Object Data System (CTODS) has been developed within the cancer Biomedical Informatics Grid (caBIG™) program to enable the exchange of de-identified clinical trials data across multiple systems while supporting syntactic and semantic interoperability. CTODS provides a single, unified set of Application Programming Interfaces (APIs) that can access clinical data from multiple data sources. The software was created using the cancer Common Ontology Research Environment (caCORE) Software Development Kit (SDK) to provide an initial caBIG “silver” compliant system; this system can be deployed to caGrid, qualifying it for caBIG “gold” status. Also, the system includes and extends security and authorization features offered by the NCICB Common Security Module (CSM).

	cancer Bioinformatics Infrastructure Objects (caBIO) An information model of cancer biomedical objects was created to facilitate the communication and integration of information from the various initiatives supported by caBIG™ and NCI. The information model is described in UML and semantically annotated with description logic concepts from EVS. The metadata describing the caBIO domain model is then registered in the caCORE metadata repository (caDSR; see below) and implemented using Java 2 Enterprise Edition (J2EE). Reuse of the caBIO model is supported by a process in which application developers create UML domain models, annotate these models using the same description logic concepts from EVS, and register them in the common metadata repository, caDSR. caCORE provides services for accessing caBIO data. The caBIO Java package is part of the caCORE client.jar and can be downloaded and used locally to retrieve data from NCI servers using Java-RMI. Alternatively, caBIO object data can be accessed using SOAP-XML or simple HTTP APIs from any programming environment.

	caCORE caCORE is the open source group of software products developed by the NCI Center for Bioinformatics (NCICB). By providing a common data management and application development framework, caCORE helps streamline informatics development throughout the cancer community. The components of caCORE support the development and utilization of semantically interoperable data systems, ensuring that biomedical research data deployed within this framework is consistent and comparable. caCORE version 3.2 is available as of December 22, 2006. All caCORE resources, including the EVS vocabulary and caDSR metadata content, are dynamically accessible through web services and application programming interfaces (APIs). This feature sets the stage for full realization of the caCORE vision: consistency, clarity, and comparability of biomedical research information. The caCORE 3.2 Release Notes describe what's new in version 3.2. Detailed information on caCORE architecture and content can be found in the caCORE Technical Guide. The major components of the caCORE infrastructure are: caGrid (see above), EVS, caDSR and the caCORE SDK. The caBIO application provides the initial set of biomedical informatics objects upon which caBIG™ continues to expand.

		cancer Data Standards Repository (caDSR) NCI supports a broad initiative to standardize the metadata used for cancer research in the form of common data descriptors. These Common Data Elements (CDEs) are developed by caBIG™ participants and various NCI-sponsored organizations, then centrally stored and managed at NCI in the Cancer Data Standards Repository (caDSR), an ISO 11179 compliant metadata registry with extensions to support the caBIG™ community. The caDSR has web interfaces for developing content and, like the caBIO service, can be accessed using the caCORE client.jar, SOAP-XML web services or simple HTTP APIs from any programming environment.
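As a rough illustration of what a Common Data Element looks like in an ISO 11179 style registry, the sketch below pairs a data element concept (an object class plus a property) with a value domain. All type names, the identifier and the concept codes are hypothetical placeholders; this is not the caDSR data model or its API.

```java
import java.util.List;

// Hypothetical sketch of an ISO 11179 style data element.
final class CdeSketch {
    // Builds one illustrative element; the identifier and codes are placeholders.
    static CommonDataElement example() {
        DataElementConcept concept = new DataElementConcept(
                new ObjectClass("Patient", "EX-0001"),
                new Property("Gender", "EX-0002"));
        ValueDomain domain = new ValueDomain(
                "Gender Code", "CHARACTER", List.of("Female", "Male", "Unknown"));
        return new CommonDataElement("0000000", concept, domain);
    }

    public static void main(String[] args) {
        CommonDataElement cde = example();
        System.out.println(cde.concept().label() + " accepts Female: " + cde.accepts("Female"));
    }
}

// What is being described (the thing and its characteristic).
record ObjectClass(String name, String conceptCode) {}
record Property(String name, String conceptCode) {}
record DataElementConcept(ObjectClass objectClass, Property property) {
    String label() {
        return objectClass.name() + " " + property.name();
    }
}

// How values are represented and which values are allowed.
record ValueDomain(String name, String datatype, List<String> permissibleValues) {
    boolean permits(String value) {
        // An empty list models a non-enumerated value domain.
        return permissibleValues.isEmpty() || permissibleValues.contains(value);
    }
}

// A data element ties the meaning (concept) to a representation (value domain).
record CommonDataElement(String publicId, DataElementConcept concept, ValueDomain valueDomain) {
    boolean accepts(String value) {
        return valueDomain.permits(value);
    }
}
```

Separating the concept from the value domain is what lets two systems agree that they are recording the same thing even when they must also agree, separately, on the permitted values.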
		NCI Enterprise Vocabulary Services (EVS) At the foundation of the "standards stack" and semantic interoperability is controlled vocabulary. The NCI is meeting this need through a diverse collection of Enterprise Vocabulary Services (EVS). Standard vocabularies are developed for a variety of settings in the life sciences. We also work with vendors to create and improve tools for vocabulary development and curation. The NCI EVS is a collaborative effort of the Center for Bioinformatics and the NCI Office of Communications. The NCI Thesaurus, a biomedical thesaurus created specifically to meet the needs of the NCI, is produced by the NCI EVS project. The NCI Thesaurus is created using description logic concepts that describe the terminology needed for cancer clinical trials and biomedical research. The NCI Thesaurus is provided under an open content license. The EVS Project also produces the NCI Metathesaurus, which is based on NLM's Unified Medical Language System Metathesaurus supplemented with additional cancer-centric vocabulary. In addition, the EVS Project provides NCI with licenses for MedDRA, SNOMED, ICD-O-3, and other proprietary vocabularies.
		caCORE Software Development Kit (SDK) The caCORE Software Development Kit (SDK), developed by NCICB, is a set of tools to aid in the design and creation of a “caCORE-like” software system. “caCORE-like” means the software system is “semantically integrated”: all exposed API elements have runtime-accessible metadata defining the meaning of the elements using controlled terminology, which is also available at runtime. To achieve this integration, “caCORE-like” systems follow certain design practices, including UML modeling, n-tier architecture with open APIs, controlled vocabularies, and registered metadata. The SDK utilizes three tools to aid in the construction of “caCORE-like” systems: the *Semantic Integration Workbench (SIW)* supplies a set of user interfaces to assist in creating a semantically interoperable system by annotating the system's UML model represented in XMI. Description logic concepts are associated with each element and used when registering the model in the caDSR metadata repository. SIW runs as a Java Webstart application launched from http://cadsrsiw.nci.nih.gov; the *Code Generator* takes the same XMI representation of the UML model and creates a functional Java software system using Java JET technology. This software system runs in an appropriate web services container such as Apache Tomcat; and finally, the *UML Loader* takes the semantically annotated XMI file produced with SIW and transforms it into registered caDSR metadata, programmatically leveraging the semantic annotation to recognize and reuse existing content. The term “semantically annotated APIs” refers to this combination: semantically annotated XMI representations of the system's UML model, transformed and harmonized into caDSR registered metadata, serve as the basis for the generated APIs. NCICB loads semantically annotated XMI files for users that have created systems via the caCORE SDK. Complete documentation can be found at the caCORE SDK link above.
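The idea of API elements that carry runtime-accessible metadata can be sketched with a plain Java annotation read back through reflection. This is an illustration only, under the assumption that an annotation is an acceptable stand-in for registered metadata: the `Concept` annotation, the `Gene` class and the codes are hypothetical, and the SDK itself keeps this metadata in caDSR rather than in source annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: API elements tagged with concept codes that a client
// can inspect at runtime to learn what each element means.
final class SemanticMetadataSketch {
    // Collects the concept code of a class and of each annotated method.
    static Map<String, String> describe(Class<?> type) {
        Map<String, String> codes = new TreeMap<>();
        Concept onType = type.getAnnotation(Concept.class);
        if (onType != null) {
            codes.put(type.getSimpleName(), onType.code());
        }
        for (Method method : type.getDeclaredMethods()) {
            Concept onMethod = method.getAnnotation(Concept.class);
            if (onMethod != null) {
                codes.put(type.getSimpleName() + "." + method.getName(), onMethod.code());
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        describe(Gene.class).forEach((element, code) -> System.out.println(element + " -> " + code));
    }
}

// RUNTIME retention keeps the annotation visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Concept {
    String code();
}

// Placeholder codes, not verified controlled-vocabulary identifiers.
@Concept(code = "EX-0001")
final class Gene {
    private final String symbol;

    Gene(String symbol) {
        this.symbol = symbol;
    }

    @Concept(code = "EX-0002")
    String getSymbol() {
        return symbol;
    }
}
```

A client that receives a `Gene` object in this sketch can resolve each code against a vocabulary service to find out what the class and its attribute mean, without relying on their names.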

Infrastructure Projects

LexBIG
The LexBIG vocabulary services represent a comprehensive set of software and services to load, publish, and access vocabulary. Cancer Centers can use the LexBIG package to install NCI Thesaurus and NCI Metathesaurus content that can be queried via a rich application programming interface (API). LexBIG services can be used in numerous applications wherever vocabulary content is needed. LexBIG vocabulary services provide caBIG™ with:

A flexible implementation for vocabulary storage and persistence, allowing for alternative mechanisms without impacting client applications or end users.
Standard tooling for load and distribution of vocabulary content. This includes, but is not limited to, support of standardized representations such as UMLS Rich Release Format (RRF), the OWL web ontology language, and Open Biomedical Ontologies (OBO).
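The first point above, swapping the storage mechanism without affecting clients, is the familiar pattern of coding against an interface. A minimal sketch follows; the interface, class names and concept codes are hypothetical and are not taken from the LexBIG API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: client code depends only on the VocabularyStore
// interface, so the persistence mechanism behind it can be replaced.
final class VocabularySketch {
    // Client logic: identical no matter which store implementation is supplied.
    static String preferredName(VocabularyStore store, String code) {
        return store.lookup(code).orElse("(unknown concept)");
    }

    public static void main(String[] args) {
        VocabularyStore store = new InMemoryStore();
        store.load("EX-0001", "Gene"); // placeholder code, for illustration
        System.out.println(preferredName(store, "EX-0001"));
    }
}

// The contract that clients see.
interface VocabularyStore {
    void load(String code, String preferredName);

    Optional<String> lookup(String code);
}

// One possible backing store; a database-backed class could replace it
// without any change to VocabularySketch.preferredName.
final class InMemoryStore implements VocabularyStore {
    private final Map<String, String> terms = new HashMap<>();

    @Override
    public void load(String code, String preferredName) {
        terms.put(code, preferredName);
    }

    @Override
    public Optional<String> lookup(String code) {
        return Optional.ofNullable(terms.get(code));
    }
}
```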

Vocabulary Development
The caBIG™ community is actively involved in developing standard terminology for use by researchers and clinicians. This terminology is formally described using description logic concepts that are programmatically accessible, so they can be used to develop, implement and discover interoperable services and data, and to reveal relationships in data that were previously hidden. Subject matter experts from within our community focus on specific vocabulary needs and work together with NCI lexicographers and vocabulary experts to expand the EVS infrastructure to meet the needs of the community.

Mouse Human Anatomy Mapping Ontology
The Mouse-Human Anatomy Project (MHAP) provides a mapping and harmonization of human and mouse anatomical descriptors as they are currently used for murine and human models by Mouse Genome Informatics and the NCI Thesaurus. This ontology will facilitate closer integration of human and mouse cancer data, promote the use of the mouse as a model for cancer research, and accelerate translation of basic research discoveries into new clinical therapies.
Cancer Nutrition Ontology
The purpose of the Cancer Nutrition Ontology project is to provide a publicly available, unified set of nutrition vocabularies, based on external standards, that provides vocabulary and conceptual uniformity across applications. The need for a nutrition ontology is driven by studies that search for nutritional factors that alter the risk of getting cancer. Clinical trials such as SELECT (Selenium, Vitamin E and prostate cancer) study chemopreventive agents and primary or adjuvant therapeutic agents. Development of this ontology involved gathering input from numerous experts and sources including USDA, InFoods, NCI Office of Dietary Supplements, IUPAC, University of Hawaii and others.

An open source platform
The NCICB has made a commitment to open source software and open systems. We make most of our own caCORE technologies available under an open source licensing mechanism, and we use open source technologies wherever practical. caCORE is powered by a number of Apache projects, including HTTP Server, Tomcat, Ant, Struts, OJB, Xalan, Xerces, XML-RPC, Jakarta Commons, and POI. We use Zope and Plone for web site management and JUnit for unit testing.

last modified 05-07-2008 03:21 PM
