News & Highlights

Bayesian biomarker identification based on marker-expression proteomics data.

Bhattacharjee M, Botting CH, Sillanpää MJ.
Genomics (2008) doi:10.1016/j.ygeno.2008.06.006

Summary

Following the 2005 Cold Spring Harbor - Banbury Center CFS Computational Challenge (C3) Workshop, CDC provided data sets from the Wichita in-hospital clinical study to Duke University for use in the Sixth International Conference for the Critical Assessment of Microarray Data Analysis (CAMDA 2006).  Duke University founded CAMDA to provide a forum for critically assessing the techniques used in microarray data mining.  CAMDA aims to establish the state of the art in microarray data mining, to identify progress, and to highlight directions for future effort.  It takes a community-wide experiment approach in which the scientific community analyzes the same standard data sets.  Researchers worldwide are invited to take the CAMDA challenge, and those whose results are accepted are invited to give a 25-minute oral presentation.  CAMDA 2006 was the first to use a single common challenge data set, which contained all clinical, gene expression, SNP, and proteomics data from the Wichita clinical study.

To date, 10 peer-reviewed publications have resulted from the CAMDA challenge.  This publication from Dr. Bhattacharjee’s group at the University of St Andrews, Fife, Scotland, used Bayesian statistical theory to attempt to identify diagnostic markers for CFS.  Their approach is unique because it combined information from microarrays (gene expression), genetic markers (SNP data), and data on proteins in the blood (SELDI-TOF) in a mathematical modeling approach to construct theoretical correlation networks based on gene activity levels.  The findings are what would be expected if persistent inflammation were involved in CFS.  This paper did not identify diagnostic markers, but it is extremely important because it explores aspects of the analytic technique in detail.

Abstract

We are studying variable selection in multiple regression models in which molecular markers and/or gene expression measurements, as well as intensity measurements from protein spectra, serve as predictors for the outcome variable (i.e., trait or disease state).  Finding genetic biomarkers and searching for genetic epidemiological factors can be formulated as a statistical problem of variable selection, in which a small number of trait-associated predictors are identified from a large set of candidates.  We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS).  CFS is a complex disease in several respects, e.g., it is difficult to diagnose and difficult to quantify.  To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data.  We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals.  The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS.  The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and for measures of response to therapy.  In general, we use Bayesian hierarchical modeling and Markov Chain Monte Carlo computation for this purpose.
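
For readers unfamiliar with the idea, the formulation in the abstract (picking a few trait-associated predictors out of many candidates with Bayesian modeling and MCMC) can be illustrated with a minimal sketch. This is not the authors' model or code. It is a generic spike-and-slab Gibbs sampler for linear regression, and it assumes a known noise variance, a fixed prior inclusion probability, and a normal slab prior. The function name `spike_slab_gibbs`, its parameters, and the simulated data are all hypothetical; no CFS data are used.

```python
import numpy as np


def spike_slab_gibbs(X, y, n_iter=1000, burn_in=200, pi=0.1,
                     tau2=1.0, sigma2=1.0, seed=0):
    """Illustrative spike-and-slab Gibbs sampler (hypothetical helper).

    Assumed model: y = X @ beta + noise, noise ~ N(0, sigma2), sigma2 known.
    Each beta_j is exactly 0 with prior probability 1 - pi, otherwise it is
    drawn from N(0, tau2). Returns the posterior inclusion probability of
    each predictor, estimated from the post-burn-in draws.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_predictors = X.shape[1]
    beta = np.zeros(n_predictors)
    included_count = np.zeros(n_predictors)
    col_ss = (X ** 2).sum(axis=0)          # x_j' x_j for every column
    resid = y - X @ beta                   # current residual
    log_prior_odds = np.log(pi / (1.0 - pi))

    for it in range(n_iter):
        for j in range(n_predictors):
            # Residual with predictor j's current contribution added back.
            resid += X[:, j] * beta[j]
            z = X[:, j] @ resid
            # Conditional posterior of beta_j given that it is included.
            v = 1.0 / (col_ss[j] / sigma2 + 1.0 / tau2)
            m = v * z / sigma2
            # Log posterior odds of inclusion, with beta_j integrated out.
            log_odds = log_prior_odds + 0.5 * np.log(v / tau2) + 0.5 * m * m / v
            prob = 1.0 / (1.0 + np.exp(-log_odds))
            if rng.random() < prob:
                beta[j] = rng.normal(m, np.sqrt(v))
            else:
                beta[j] = 0.0
            resid -= X[:, j] * beta[j]
        if it >= burn_in:
            included_count += (beta != 0.0)

    return included_count / (n_iter - burn_in)


# Usage on simulated data: only the first two of 20 predictors affect y.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(100, 20))
y_demo = 2.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + demo_rng.normal(size=100)
pip_demo = spike_slab_gibbs(X_demo, y_demo)
print(np.round(pip_demo, 2))
```

Predictors with a posterior inclusion probability near 1 are the candidates such a sketch would flag as trait-associated. A hierarchical model of the kind the abstract mentions would typically also place priors on quantities that are fixed here, such as the noise variance and the inclusion probability.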

Page last modified on January 14, 2009



“Safer Healthier People”
Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA 30333, USA
Tel: 404-639-3311  /  Public Inquiries: (404) 639-3534  /  (800) 311-3435