caGrid 1.0
The goal of the cancer Biomedical Informatics Grid (caBIG™) is to develop applications and the underlying systems architecture that connect data, tools, scientists and organizations in an open, federated environment. In meeting this goal, caBIG will necessarily bring together data from many diverse sources. The underlying service-oriented infrastructure for caBIG is caGrid. The first public version of caGrid (0.5) was released on September 9, 2005. caGrid 1.0 culminates the development of the federated infrastructure and will more fully support the needs of the cancer research community. caGrid defines two types of "grid services" that can be registered as nodes on the grid: Data Services and Analytical Services. caGrid provides a standard infrastructure that lets bioinformaticians advertise their services through common metadata defined in Unified Modeling Language (UML) domain information models. Users can access these grid services and their data programmatically, using locally managed access control policies and strongly typed data objects in XML format. The caGrid infrastructure also provides strong semantic specification by binding to description logic terminology concepts, which users can exploit to discover new and interesting scientific information through semantically aware searches.
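The idea of semantically aware discovery can be illustrated with a minimal sketch: each service advertises the classes of its domain model together with the terminology concept codes bound to them, and a client filters services by concept rather than by name. The `ServiceMetadata` record, the `discover` function and the concept codes (`C100`, `C200`, `C300`) are hypothetical placeholders, not the caGrid metadata schema or the EVS code system.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ServiceMetadata:
    """Hypothetical, simplified view of the metadata a grid service advertises."""
    name: str
    kind: str  # "data" or "analytical"
    uml_classes: set = field(default_factory=set)    # classes from the domain model
    concept_codes: set = field(default_factory=set)  # terminology concepts bound to those classes


def discover(registry, concept_code, kind=None):
    """Return the sorted names of services whose metadata is bound to
    concept_code, optionally restricted to one kind of service."""
    return sorted(
        s.name
        for s in registry
        if concept_code in s.concept_codes and (kind is None or s.kind == kind)
    )


# Placeholder registry contents; the names and codes are illustrative only.
registry = [
    ServiceMetadata("GeneDataService", "data", {"Gene"}, {"C100"}),
    ServiceMetadata("ExpressionAnalysis", "analytical", {"Gene", "Microarray"}, {"C100", "C200"}),
    ServiceMetadata("ImageService", "data", {"Image"}, {"C300"}),
]

print(discover(registry, "C100"))               # ['ExpressionAnalysis', 'GeneDataService']
print(discover(registry, "C100", kind="data"))  # ['GeneDataService']
```

Searching by concept code instead of by service name is what lets a client find a service it did not previously know about, as long as that service's model is bound to the same concept.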
Software and Documentation Links
- caGrid 1.0 Installer Instructions
- caGrid 1.0 Installer: install caGrid 1.0
- caGrid 1.0 Source: download the caGrid 1.0 source code
- caGrid 1.0 Users Guide
- caGrid 1.0 Programmers Guide
- caGrid 1.0 Release Notes
- NCICB Download Site
- caGrid wiki
Project Site
- caGrid 1.0 GForge Project Page
- caGrid 1.0 GForge File Release Site
- caGrid 1.0 GForge Document Release Site
caGrid 1.0 Portal
You can launch the caGrid 1.0 Portal, which is part of the caGrid 1.0 release. The portal should be your starting point for monitoring and discovering the services available in caGrid.
The tool provides a visual display of the services on the caGrid infrastructure and of the institutions participating in the caBIG program.
caGrid 1.0 Browser - Early Preview
The caGrid 1.0 browser is a web-based application that allows users to discover advertised caBIG grid resources and to query those resources for data of interest.
The tool uses caGrid 1.0 grid APIs to browse advertised services; to discover services based on their metadata, on objects registered in the Cancer Data Standards Repository (caDSR) and on concepts from the Enterprise Vocabulary Service (EVS); and to query the deployed services using the caBIG XML query language.
Users can access the browser with their existing NCI user accounts (user name and password). Users without an NCI account can request one through the tool; account requests will be approved in accordance with the caGrid security policies. The Security Working Group will determine the appropriate policies for registering users.
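Querying a data service means sending it a structured XML document that names a target object type and the constraints its attributes must satisfy. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`. The element names (`Query`, `Target`, `Attribute`), the `predicate` values and the class name `example.domain.Gene` are illustrative assumptions; they are not the normative caBIG query language schema.

```python
import xml.etree.ElementTree as ET


def build_query(target_class, attribute, predicate, value):
    """Build a simple object query: return instances of target_class whose
    attribute satisfies predicate against value.

    Element and attribute names are illustrative only, not the normative
    caBIG query language schema."""
    root = ET.Element("Query")
    target = ET.SubElement(root, "Target", {"name": target_class})
    ET.SubElement(
        target,
        "Attribute",
        {"name": attribute, "predicate": predicate, "value": value},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical query: genes whose symbol starts with "BRCA".
query_xml = build_query("example.domain.Gene", "symbol", "LIKE", "BRCA%")
print(query_xml)
```

Building the query as a document tree instead of by string concatenation keeps attribute values such as `BRCA%` correctly escaped.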
Project Details
The caGrid 1.0 team consists of members from the following organizations:
- Ohio State University - Biomedical Informatics Department - Provided Overall Technical Leadership
- University of Chicago/Argonne National Laboratory
- Duke Comprehensive Cancer Center
- ScenPro, Inc
- SemanticBits, LLC
- Science Applications International Corporation (SAIC)
- Booz Allen Hamilton - Provided Program Management
- National Cancer Institute Center for Bioinformatics (NCICB) - Provided Government Oversight
A significant number of enhancements have been incorporated into the caGrid 1.0 infrastructure. Highlights include:
- Migration of the underlying service infrastructure to the standard Web Services Resource Framework (WSRF) specification
- A complete overhaul of the federated security infrastructure to satisfy caBIG security needs, incorporating many of the recommendations made in the caBIG™ Security White Paper Technology Evaluation
- New workflow capabilities to enable orchestration of services using industry standard Business Process Execution Language (BPEL)
- New Federated Query Processing (FQP) capability built in collaboration with the Cancer Translational Research Informatics Platform (caTRIP) project, a caBIG funded project
- Performance and scalability improvements to the services, achieved by implementing specifications such as WS-Enumeration in the underlying Globus Toolkit infrastructure
- Grid-wide object identifier support, provided by integrating with The Handle System® service from the Corporation for National Research Initiatives
- Extensive enhancements to the metadata infrastructure, including standard grid service APIs to the Global Model Exchange (GME), Cancer Data Standards Repository (caDSR) and Enterprise Vocabulary Service (EVS)
- Tighter integration with NCICB components used by caBIG-funded projects, including the Common Security Module (CSM) and the caCORE Software Development Kit (SDK)
- Development of an extensive automated system testing framework to validate the components of the infrastructure
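Enumeration-style specifications such as WS-Enumeration improve scalability because a client pulls a large result set in bounded batches through a server-held context, rather than receiving everything in one response. The sketch below models that pull pattern in plain Python. `EnumerationContext`, `pull`, `fetch_all` and `max_elements` are hypothetical names used for illustration; this is not the Globus Toolkit or WS-Enumeration API.

```python
from itertools import islice


class EnumerationContext:
    """Hypothetical server-side cursor over a large result set.
    Clients pull results in bounded batches instead of receiving
    the whole set in a single response."""

    def __init__(self, results):
        self._it = iter(results)
        self.finished = False

    def pull(self, max_elements):
        """Return up to max_elements further results; mark the context
        finished once a batch comes back short (including empty)."""
        if max_elements < 1:
            raise ValueError("max_elements must be at least 1")
        batch = list(islice(self._it, max_elements))
        if len(batch) < max_elements:
            self.finished = True
        return batch


def fetch_all(context, max_elements=100):
    """Client loop: keep pulling batches until the context reports the end."""
    results = []
    while not context.finished:
        results.extend(context.pull(max_elements))
    return results


results = fetch_all(EnumerationContext(range(250)), max_elements=100)
print(len(results))  # 250, retrieved in batches of 100, 100 and 50
```

The batch size bounds the memory and message size of each exchange on both sides, which is where the scalability benefit comes from.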
In addition to the highlights above, the caGrid 1.0 infrastructure includes the following tools:
Introduce Toolkit: a service creation toolkit built by the caGrid team. It supports easy development and deployment of caBIG-compatible, grid-enabled data and analytical services, and it reduces the need for service developers to manage the low-level details of the WSRF specification and of integration with the Globus Toolkit.
Grid Authentication and Authorization of Reliably Distributed Services (GAARDS): provides services and tools for grid-wide administration and security enforcement for services deployed on the caGrid infrastructure. GAARDS consists of the following security components:
- Dorian: allows for the provision and management of user accounts, providing an integration point between external security domains and the grid.
- Grid Grouper: provides a group-based authorization solution, wherein grid services and applications enforce authorization policy based upon group memberships defined and managed at the grid level.
- Grid Trust Services (GTS): provides a mechanism for maintaining and provisioning a federated trust fabric of certificate authorities in caGrid
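Group-based authorization, as described for Grid Grouper, separates two concerns: group memberships are managed once at the grid level, and each service's policy only names the groups it trusts. The sketch below models that split in memory. `GroupDirectory`, `authorize`, the group name `example:researchers` and the identity strings are hypothetical; they do not reflect the Grid Grouper API or real caGrid identities.

```python
class GroupDirectory:
    """Hypothetical grid-level group store mapping group name -> member identities."""

    def __init__(self):
        self._members = {}

    def add_member(self, group, identity):
        self._members.setdefault(group, set()).add(identity)

    def remove_member(self, group, identity):
        self._members.get(group, set()).discard(identity)

    def is_member(self, group, identity):
        return identity in self._members.get(group, set())


def authorize(directory, identity, allowed_groups):
    """Service-side check: grant access if the caller belongs to any
    group named in the service's policy."""
    return any(directory.is_member(g, identity) for g in allowed_groups)


directory = GroupDirectory()
directory.add_member("example:researchers", "/O=Example/CN=alice")
policy = {"example:researchers"}  # the service's policy names groups, not users

print(authorize(directory, "/O=Example/CN=alice", policy))  # True
print(authorize(directory, "/O=Example/CN=bob", policy))    # False
```

Because the policy names groups rather than users, revoking Alice's membership in the directory removes her access to every service that trusts that group, without touching any service's policy.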
caGrid 1.0 Portal: provides a visual display of the services running on the infrastructure. The portal provides:
- A geographic map of nodes running on the caGrid infrastructure
- caBIG participating institution/service provider information
- Dynamic status updates of grid services
Reference Implementations and Early Adopters
As part of the caGrid 1.0 infrastructure release, the following projects have been working with the caGrid development team and are at various stages of completing their grid enablement process:
- GeneConnect - Extensible informatics platform that integrates diverse data types and supports interoperable analytic tools - Washington University
- GridImage - Grid application for viewing and evaluating images - Ohio State University
- caBIO - Cancer Bioinformatic Data Service - NCICB
- caArray - Microarray Data Services - NCICB
- caTRIP - Grid application that ties together disparate data resources in a metadata-driven fashion - Duke Comprehensive Cancer Center
- GenePattern - Analysis workflow tool developed to support multidisciplinary genomic research programs - Broad Institute
- geWorkBench - Grid enabled platform for integrated genomics - Columbia University
- Bioconductor - Analytical service for gene expression and other high-throughput analysis in molecular biology - Fred Hutchinson Cancer Research Center
External Technologies Used by caGrid
caGrid 1.0 leverages the following existing technologies:
- Globus Toolkit: provides the core grid infrastructure and supports service deployment, service registry, invocation and secure communication - From the Globus Alliance
- Mobius GME: provides a grid repository for the XML Schemas of strongly typed objects transferred on caGrid - From Ohio State University
- Cancer Data Standards Repository (caDSR): provides a repository for Common Data Elements and UML models - From the National Cancer Institute Center for Bioinformatics
- Enterprise Vocabulary Services (EVS): provides controlled vocabularies - From the National Cancer Institute Center for Bioinformatics
- ActiveBPEL™: provides an open source workflow engine that implements the Business Process Execution Language standard - From Active Endpoints, Inc.
- The Handle System®: a general-purpose distributed information system that provides efficient, extensible, and secure identifier and resolution services for use on caGrid - From the Corporation for National Research Initiatives
- Grouper: provides the ability to manage group information across integrated applications and repositories - From Internet2
User Information
Subscribe to the caGrid Users Listserv
Contacts
Michael Keller - caBIG Architecture Workspace Lead
Scott Oster - caGrid Lead Architect - Ohio State University
Krishnakant (Avinash) Shanbhag - Director, Core Infrastructure Engineering - NCICB