
Streamlining Genomics Research
Columbia University Leads Development of Integrated Analysis Platform

The December 2007 issue of Science magazine labeled human genetic variation the "breakthrough of the year." The resulting flood of data holds great promise for illuminating disease, but such rapid discoveries present significant analytical challenges for biologists trying to make sense of the complexity and volume of research data. Tasks such as juggling multiple analysis programs on the desktop, manually combining data from multiple sources, or managing computationally intensive programs locally can be daunting without the proper tools and infrastructure at your fingertips.

These challenges inspired the National Cancer Institute to support the development of geWorkbench, a desktop bioinformatics platform that enables users to pull together analysis and visualization tools to create a customized integrative genomics solution. This integrated platform provides the following benefits:

  • Use of one, unified graphical interface
  • Ability to combine data quickly from multiple datasets
  • Use of analysis and visualization tools for gene expression, sequences, pathways, and other biomedical data
  • Access to a vast array of computational analysis and visualization tools, such as t-test, hierarchical clustering, self-organizing maps, regulatory network reconstruction, cellular network visualization, BLAST searches, and pattern/motif discovery

Accessing analysis resources through geWorkbench and caGrid

At the Center for Computational Biology and Bioinformatics (C2B2), an interdepartmental center at Columbia University, Drs. Aris Floratos and Andrea Califano are leading the effort to establish geWorkbench as a portal to all caGrid-enabled services.

By connecting geWorkbench to caGrid, the underlying infrastructure of caBIG™, Floratos and Califano believe that Columbia will achieve a higher level of analytical power by tapping into analytical services from around the globe—essentially a "World Wide Web" of biomedical resources.

"geWorkbench allows you to access resources that are not directly integrated on your desktop or on your local systems. It basically implements the notion of ‘gridification’ where your desktop can actually control a vast array of resources that may be distributed all over the world," Califano explains.

The C2B2 team uses case studies to test how data sets can be analyzed using Grid services. In one example, B-cells obtained from patients with different types of leukemias and lymphomas were analyzed using microarrays (Basso et al., 2005). To study the gene expression profiles of these cells, a researcher can use geWorkbench to access analytical and visualization tools. To search for tools beyond those found locally in geWorkbench, a researcher can click on the option to search for services on the Grid (Figure 1). In this illustration, the Grid service "Hierarchical Clustering" is retrieved, and an analysis of the data is performed (Figure 2).

"These services happen behind the scenes: geWorkbench will interact silently with the appropriate components of the caGrid infrastructure to collect the necessary information, invoke a Grid service, and retrieve the analysis results. The user never gets exposed to any of the complexities of the Grid," Floratos explains.

For example, a researcher using geWorkbench can reverse engineer a regulatory network using powerful, albeit remote, computational resources from the Grid, while using the limited local power of the desktop to identify the genes in the network that are differentially expressed in a particular cellular phenotype. Once specific services have been "registered" with the platform, these processes happen seamlessly, and the researcher does not have to worry about what is happening locally versus what is happening on the Grid.
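
The idea of "registering" services so that local and Grid analyses are invoked the same way can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the geWorkbench or caGrid API: the `ServiceRegistry` class, its `register`, `location_of`, and `run` methods, and both analysis functions are hypothetical names, and the "grid" service is simulated by an ordinary local function rather than a real remote call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: none of these names come from geWorkbench or caGrid.
Analysis = Callable[[List[float]], float]


class ServiceRegistry:
    """Maps an analysis name to a callable plus a label saying where it runs."""

    def __init__(self) -> None:
        self._services: Dict[str, Tuple[str, Analysis]] = {}

    def register(self, name: str, location: str, func: Analysis) -> None:
        if location not in ("local", "grid"):
            raise ValueError("location must be 'local' or 'grid'")
        self._services[name] = (location, func)

    def location_of(self, name: str) -> str:
        return self._services[name][0]

    def run(self, name: str, data: List[float]) -> float:
        # The caller makes the same call whether the service is local or remote.
        _location, func = self._services[name]
        return func(data)


def local_mean(values: List[float]) -> float:
    """A cheap analysis that would run on the desktop."""
    return sum(values) / len(values)


def simulated_grid_max(values: List[float]) -> float:
    """Stand-in for a remote service; a real client would send a network request."""
    return max(values)


registry = ServiceRegistry()
registry.register("mean", "local", local_mean)
registry.register("max", "grid", simulated_grid_max)

expression = [2.0, 4.0, 9.0]  # made-up expression values
mean_result = registry.run("mean", expression)
max_result = registry.run("max", expression)
```

The point of the design is that `run` hides the location of the service: swapping a local implementation for a remote one would change only the registration, not the calling code.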


Figure 1: geWorkbench users can easily choose to view "Local" or "Grid" services. The URL for "Hierarchical Clustering" through Grid Services is retrieved. (Courtesy: Aris Floratos)


Expanding geWorkbench on the Grid

The value of geWorkbench is expected to grow as more institutions connect to the Grid and start sharing improved and alternative data management applications. Tools for integrating structural information are also in development and will provide the first environment where structural, functional, and genomic data can be analyzed seamlessly.

Currently, more than 40 analytical programs have been developed for the geWorkbench framework, and developers predict that this number will grow in the coming months.

"At present we are working actively towards integrating tools for protein structure prediction," Floratos explains. "It is our goal to caGrid-enable many of the advanced analytical tools developed by investigators at C2B2 so that researchers can build increasingly more efficient and more robust analytics platforms through geWorkbench, as well as integrate and cross-interoperate with many other useful tools developed by the research community."

Examples of community tools that have been made interoperable with geWorkbench include GenePattern, an advanced platform for genomic data analysis developed at the Broad Institute, and Cytoscape, a network visualization and analysis platform developed by a consortium of academic institutions.


Figure 2: The hierarchical clustering analysis performed using geWorkbench Grid Services allows researchers to group genes by expression levels in samples from different cancers—one step towards developing cancer biomarkers. (Courtesy: Aris Floratos)
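
To make the idea of grouping genes by expression level concrete, here is a minimal sketch of agglomerative hierarchical clustering in plain Python. It is not the implementation behind the geWorkbench Grid service: the choice of Euclidean distance and average linkage is an assumption made for illustration, the gene names and expression values are made up, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Made-up expression values (one list of per-sample values per gene).
profiles: Dict[str, List[float]] = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [1.1, 1.0, 1.0],
    "geneC": [5.0, 5.2, 4.9],
    "geneD": [5.1, 4.8, 5.0],
}


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1: Tuple[str, ...], c2: Tuple[str, ...],
                    data: Dict[str, List[float]]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[g], data[h]) for g in c1 for h in c2)
    return total / (len(c1) * len(c2))


def cluster(data: Dict[str, List[float]],
            n_clusters: int) -> List[Tuple[str, ...]]:
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters: List[Tuple[str, ...]] = [(name,) for name in sorted(data)]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


groups = cluster(profiles, 2)
```

With these toy values, the two low-expression genes end up in one group and the two high-expression genes in the other. A production analysis would normally use an established library and keep the full merge tree (dendrogram) rather than stopping at a fixed number of clusters.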

Learn more:

For more information on geWorkbench, please visit: https://cabig.nci.nih.gov/tools/geWorkbench/?searchterm=geWorkbench.

 
