
GenePattern: Integrated Genomics

Genomic data analysis is one of the most complex and vital tasks in today’s biomedical research. Researchers working to solve the molecular mysteries of diseases like cancer are now assisted by powerful yet intuitive software tools. GenePattern, produced by the Broad Institute and now integrated into caBIG™, is one such software platform.

Widely acclaimed as a powerful platform for multidisciplinary genomic analysis, GenePattern enables researchers to conduct complex, multi-step gene expression analyses more efficiently and to reproduce research methodologies, a critical part of successful investigations. The software is also designed so that researchers can customize it themselves to incorporate data sets and additional software applications. As of May 2007, there were over 4,500 registered users of GenePattern, which has been freely available from the Broad Institute since January 2004.

GenePattern developers have been working as part of the caBIG™ Integrative Cancer Research (ICR) Workspace to ensure integration of GenePattern with caGrid. caGrid is the underlying service-oriented architecture that enables seamless data access, exchange and consumption by key software applications in caBIG™. caGrid manages and securely shares information through common vocabularies so that different research tools can "talk" to each other.

Several GenePattern modules are currently accessible through caGrid, and more modules are expected to be added soon. The currently available modules enable researchers to do the following:

  1. Preprocess Dataset: apply the statistical transformations of microarray data that commonly precede analysis in other modules.
  2. Consensus Clustering: provide a computational analysis of the most stable and consistent way to group similar genes together.
  3. Comparative Marker Selection: determine genes that can discriminate between distinct classes of samples or phenotypes.
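
To make the third task more concrete, the following is a minimal sketch of marker selection in Python. It is not GenePattern's implementation. It assumes a simple signal-to-noise score (difference of class means divided by the sum of class standard deviations), and the gene names, expression values, and function names are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Difference of class means divided by the sum of class standard deviations."""
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        return 0.0
    return (mean(class_a) - mean(class_b)) / spread


def rank_markers(expression, labels):
    """Rank genes by absolute signal-to-noise score, strongest first.

    expression: dict mapping gene name -> list of values, one per sample
    labels: list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: three genes, six samples (three per class).
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "GENE_C": [7.0, 6.5, 7.2, 3.0, 3.4, 2.9],
}
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_markers(expression, labels)
```

In this toy data, GENE_A and GENE_C differ strongly between the two classes and rank ahead of GENE_B, whose values are nearly the same in both. A real analysis would also assess significance, for example with permutation tests, which this sketch omits.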

To learn more about the GenePattern analytic services available via caGrid, read the GenePattern Grid Extensions End User Manual.

Unique Features

Ted Liefeld, Senior Software Architect for the Broad Institute’s Cancer Program, is one of the developers of GenePattern. According to Liefeld, "GenePattern has been very well received in the cancer community." Liefeld, who is also the lead for the Analysis Services and Best Practices Working Group within the ICR Workspace of caBIG™, added, "GenePattern makes it much easier for the cancer research community to integrate a lot of tools and multiple data sets into a single environment without requiring a software engineer."

One feature unique to GenePattern is the ability to track the different steps that a researcher takes when analyzing a set of data. These computational and research steps, known as a "pipeline," are stored along with their parameters on the GenePattern server. Researchers with access to the dataset can request that the same pipeline be repeated, combined with other pipelines, or both. Reproducibility, an essential feature of many successful research projects, is therefore easy to achieve with GenePattern. In addition, the ability to string pipelines together opens the door to new discoveries by enabling the exploration of increasingly complex hypotheses.
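
The pipeline idea described above can be illustrated with a short Python sketch. It is a simplified model and not the GenePattern API: the `Step` and `Pipeline` classes, the `then` method, and the two stand-in modules are all hypothetical. It shows only the core idea of recording each step with its parameters so that the same analysis can be replayed or chained with another.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Step:
    module: str                                   # name of the analysis module
    params: Dict[str, Any] = field(default_factory=dict)  # its recorded parameters


@dataclass
class Pipeline:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, data, modules):
        """Feed the data through each recorded step, in order."""
        for step in self.steps:
            data = modules[step.module](data, **step.params)
        return data

    def then(self, other):
        """Return a new pipeline that runs this one, then the other."""
        return Pipeline(f"{self.name}+{other.name}", self.steps + other.steps)


# Hypothetical stand-in modules; real modules would be far more involved.
def threshold(values, floor):
    return [max(v, floor) for v in values]


def scale(values, factor):
    return [v * factor for v in values]


MODULES = {"threshold": threshold, "scale": scale}

clean = Pipeline("clean", [Step("threshold", {"floor": 10})])
rescale = Pipeline("rescale", [Step("scale", {"factor": 2})])
combined = clean.then(rescale)

# Running the same recorded pipeline twice on the same input gives the same result.
first = combined.run([5, 20, 8], MODULES)
second = combined.run([5, 20, 8], MODULES)
```

Because the parameters travel with the steps, anyone holding the pipeline record and the input data can repeat the analysis without re-entering settings by hand.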

For more information, or to download GenePattern, see the Broad Institute's GenePattern website.

Accelerating Discoveries

Investigators performing cutting-edge cancer research projects at the Broad Institute have made use of the many features of GenePattern. Jun Lu, Ph.D., at the Broad Institute’s Cancer Program, is the co-lead author of a study of novel means of creating low-cost cancer diagnostic tests through microRNA (Lu et al., Nature 2005). According to Dr. Lu, GenePattern, which is an essential tool for his study, "is a good integrated solution for bioinformatics. The output from one analysis can be easily loaded as input for another, without the hassle of file format compatibility issues. Both translate into time saving. GenePattern dramatically lowers the energy barrier for a biologist to get into bioinformatics analysis." Dr. Lu, who has as many as 30 research collaborations, finds that the ease of use of GenePattern allows him to send data to his collaborators who can download the GenePattern software and perform data analyses themselves, thus saving time.

Ben Ebert, M.D., also at the Broad Institute, has used GenePattern to classify cancer types according to gene expression (Ebert and Golub, Blood 2004), to analyze groups of genes in leukemia and lung cancer (Tamayo et al., PNAS 2007), and to identify a gene expression signature that predicts response to lenalidomide, a new cancer drug (paper in submission). He finds GenePattern easy to use and remarks, "the key is that I am able to do analyses that would otherwise require writing my own computer code."

"Many other uses for GenePattern are being or will be put into effect soon," said ICR Workspace lead Subha Madhavan. "One example is the National Institutes of Health’s (NIH) The Cancer Genome Atlas (TCGA), which has already begun to use two modules from GenePattern to analyze single nucleotide polymorphism (SNP) array datasets from glioblastoma samples."

Future of GenePattern on caGrid

GenePattern is well established in the biomedical research community as a valuable gene expression analysis tool. Now, through caBIG™, GenePattern will see even wider and more powerful usage. Researchers can seamlessly connect datasets, such as those available through caArray (a caGrid-enabled microarray data repository), to GenePattern modules for analysis. As more datasets are contributed across the grid, genomic analyses conducted with tools like GenePattern will draw on larger samples and gain statistical power, accelerating the pace of life-saving discoveries.
