National Cancer Institute   U.S. National Institutes of Health www.cancer.gov
 

caBIG Core Concepts

A number of concepts are vital to understanding and connecting with caBIG™. They are introduced briefly here.

Interoperability

Interoperability is the ability of information systems to both access and appropriately use data from a remote data resource. Interoperability is a key goal of the caBIG™ initiative.

There are two equally important requirements for interoperability.

  • Systems must “understand” the data exchanged. This “understanding” requires shared data models which, in turn, depend on standard (controlled) vocabularies and Common Data Elements (CDEs). This is called semantic interoperability.
  • Systems must be able to exchange data through shared interfaces. This is called syntactic interoperability.


caBIG™ Compatibility

A “caBIG™ compatible” tool is one that can interoperate with other tools in the caBIG™ program.

To determine compatibility, four specific areas are assessed:

  • Information/Data Models
  • Controlled Vocabularies
  • Common Data Elements (CDE)
  • Programming and Messaging Interfaces

The caBIG™ community recognizes that there can be differing degrees of interoperability between systems; these are classified into four maturity levels: Legacy, Bronze, Silver, and Gold.

caCORE Training - The purpose of caCORE Training is to teach members of the NCICB and caBIG™ community how the caDSR fits into the caCORE Infrastructure, the role of the caCORE Infrastructure in caBIG™ compatibility, and how to use caDSR tools to search for, retrieve, analyze, and curate caDSR common data elements (CDEs). Tools covered include the CDE Browser, UML Model Browser, Curation Tool, Form Builder Tool, Sentinel Tool, caCORE Software Development Kit (SDK), and the Semantic Integration Workbench (SIW).


Information/Data Models

Information Models are developed to represent the interfaces of a system. Also called data models, they describe the relationships between the common data elements (CDEs) in a domain.

For example, an information model might specify that a patient can have multiple specimens, but that each specimen can have only one storage location.

caBIG™ uses Unified Modeling Language (UML) class diagrams to create data models.
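
The patient and specimen example can be made concrete with a small sketch. The Python classes below are only an illustration of the multiplicities such a model expresses (a patient with many specimens, each specimen with exactly one storage location). The class names, attribute names, and sample values are all hypothetical and are not taken from any caBIG™ model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageLocation:
    label: str  # hypothetical attribute, e.g. a freezer and shelf name


@dataclass
class Specimen:
    specimen_id: str
    # Exactly one storage location per specimen (multiplicity 1).
    storage_location: StorageLocation


@dataclass
class Patient:
    patient_id: str
    # Zero or more specimens per patient (multiplicity 0..*).
    specimens: List[Specimen] = field(default_factory=list)


# Illustrative data: one patient with two specimens in the same location.
freezer = StorageLocation(label="Freezer A, shelf 2")
patient = Patient(patient_id="P-001")
patient.specimens.append(Specimen("S-001", freezer))
patient.specimens.append(Specimen("S-002", freezer))
```

In a UML class diagram the same constraints would appear as association multiplicities between the Patient, Specimen, and StorageLocation classes.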


Controlled Vocabularies

A controlled vocabulary is an agreed-upon set of standard terminology. For example, a controlled vocabulary might define “study” as “a detailed critical inspection” (rather than, say, “a room devoted to literary pursuits”).

Controlled vocabularies are one of three elements needed to provide shared meaning to data across systems. (The other two are Common Data Elements and Information/Data Models.) If we both use the same controlled vocabulary, the terms in my system will match those in yours.
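
As a rough sketch of how a shared vocabulary lets terms match across systems, the Python below maps each system's local labels onto common concept codes. Everything here is invented for illustration: the codes (prefixed `DEMO:`) are not NCI Thesaurus codes, and the local terms and function name are assumptions.

```python
from typing import Optional

# Hypothetical controlled vocabulary: concept code -> preferred term.
# The DEMO: codes are made up and do not come from any real terminology.
VOCABULARY = {
    "DEMO:0001": "Study",
    "DEMO:0002": "Specimen",
}

# Each system maps its own local labels onto the shared concept codes.
SYSTEM_A_TERMS = {"trial": "DEMO:0001", "sample": "DEMO:0002"}
SYSTEM_B_TERMS = {"protocol": "DEMO:0001", "biospecimen": "DEMO:0002"}


def shared_term(term_a: str, term_b: str) -> Optional[str]:
    """Return the preferred term if both local labels map to one concept."""
    code_a = SYSTEM_A_TERMS.get(term_a)
    code_b = SYSTEM_B_TERMS.get(term_b)
    if code_a is not None and code_a == code_b:
        return VOCABULARY[code_a]
    return None
```

Under these assumptions, `shared_term("sample", "biospecimen")` resolves both labels to the same concept, while `shared_term("trial", "biospecimen")` finds no match.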

One source for caBIG™ vocabularies is the NCI Thesaurus, an expanding, dynamic repository of over 60,000 terms.


Common Data Elements (CDE)

To provide shared meaning, there must be agreement on how the data will be represented. Common Data Elements (CDEs) use shared vocabularies and standard values and formats to define how data are to be collected. Controlled vocabularies provide a shared meaning, whereas CDEs provide a structure to that shared meaning.

Every CDE definition includes both data concepts (from a shared vocabulary) and information on how the data should be represented. For example, "Social Security Number" is defined as "a unique number issued by the United States Social Security Administration" and the format of this number is ###-##-####.
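
The Social Security Number example pairs a definition with a representation. A minimal Python sketch of that pairing follows. The `CommonDataElement` class and its fields are hypothetical and do not reflect the caDSR data structure. The regular expression simply encodes the ###-##-#### format and checks nothing beyond the digit layout.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    name: str        # the data concept
    definition: str  # its agreed-upon meaning
    pattern: str     # regular expression describing the value format

    def is_valid(self, value: str) -> bool:
        """Return True if the whole value matches the declared format."""
        return re.fullmatch(self.pattern, value) is not None


# ###-##-#### expressed as a regular expression.
SSN = CommonDataElement(
    name="Social Security Number",
    definition=(
        "A unique number issued by the United States "
        "Social Security Administration"
    ),
    pattern=r"\d{3}-\d{2}-\d{4}",
)
```

With this sketch, `SSN.is_valid("123-45-6789")` accepts the value, while `SSN.is_valid("123456789")` rejects it because the separators are missing.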

CDEs are described in a centralized electronic resource called the Cancer Data Standards Repository (caDSR) and can be searched via the CDE Browser.


Application Programming Interfaces (API)

Computer programs access resources from other programs through programming and messaging interfaces, also called Application Programming Interfaces (APIs). Agreement upon standards for these interfaces is necessary for interoperability.

The programming and messaging interface addresses the access part of interoperability. APIs are generated from the Information Model.


caGrid

caGrid is the underlying network architecture and platform that provides the basis for connectivity of caBIG™ tools. For example, it provides:

  • A "yellow pages" service for advertising and discovering data and analytical services connected to the Grid.
  • A security infrastructure allowing the owner of a tool or data to control who is allowed access to it.
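
The two capabilities listed above can be sketched as a toy in-memory registry. This is not the caGrid API: the `ServiceRegistry` class, its method names, and the sample services and users are all hypothetical, chosen only to show the idea of advertising a service, discovering it by the concept it exposes, and letting its owner decide who may access it.

```python
from typing import Dict, List, Set


class ServiceRegistry:
    """Toy 'yellow pages' with owner-controlled access lists."""

    def __init__(self) -> None:
        self._concepts: Dict[str, Set[str]] = {}
        self._allowed: Dict[str, Set[str]] = {}

    def advertise(self, service: str, concepts: Set[str],
                  allowed_users: Set[str]) -> None:
        """Register a service, the concepts it exposes, and who may use it."""
        self._concepts[service] = set(concepts)
        self._allowed[service] = set(allowed_users)

    def discover(self, concept: str) -> List[str]:
        """Return the names of services exposing the concept, sorted."""
        return sorted(
            name for name, concepts in self._concepts.items()
            if concept in concepts
        )

    def can_access(self, service: str, user: str) -> bool:
        """Return True only if the owner listed this user for the service."""
        return user in self._allowed.get(service, set())


# Illustrative usage with made-up service and user names.
registry = ServiceRegistry()
registry.advertise("tissue-bank", {"Specimen"}, {"alice"})
registry.advertise("trial-data", {"Study", "Specimen"}, {"alice", "bob"})
```

Here `registry.discover("Specimen")` finds both services, but `registry.can_access("tissue-bank", "bob")` is false because the owner of that service did not list bob.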

The work done to develop a semantically rich API, based on an annotated information model and the metadata generated from that model, allows for the advertisement, discovery, and querying of data services on caGrid.

Visit the caGrid Website.

last modified 08-28-2008 02:46 PM