News & Highlights

Improved model-based, platform-independent feature extraction for mass spectrometry.

Noy K, Fasulo C.
Bioinformatics 2007;23:2528-2535 doi:10.1093/bioinformatics/btm385.

Summary

Following the 2005 Cold Spring Harbor - Banbury Center CFS Computational Challenge (C3) Workshop, CDC provided data sets from the Wichita in-hospital clinical study to Duke University for use in the Sixth International Conference for the Critical Assessment of Microarray Data Analysis (CAMDA 2006). Duke University founded CAMDA to provide a forum for critically assessing the techniques used in microarray data mining. CAMDA's aim is to establish the state of the art in microarray data mining, to identify progress, and to highlight the direction for future effort. CAMDA takes a community-wide experiment approach in which the scientific community analyzes the same standard data sets. Researchers worldwide are invited to take the CAMDA challenge, and those whose results are accepted are invited to give a 25-minute oral presentation. The 2006 CAMDA was the first to use a single common challenge data set, which contained all clinical, gene expression, SNP, and proteomics data from the Wichita clinical study.

To date, 10 peer-reviewed publications have resulted from the CAMDA challenge. This publication was a collaborative effort between Siemens Corporate Research, Princeton, New Jersey, and Ben Gurion University of the Negev, Beer Sheva, Israel. The authors evaluated mass spectrometry (SELDI) data from the Wichita in-hospital study and developed mathematical methods to prepare the data for interpretation.

Abstract

Motivation: Mass spectrometry (MS) is increasingly being used for biomedical research. The typical analysis of MS data consists of several steps. Feature extraction is a crucial step since subsequent analyses are performed only on the detected features. Current methodologies applied to low-resolution MS, in which features are peaks or wavelet functions, are parameter-sensitive and inaccurate in the sense that peaks and wavelet functions do not directly correspond to the underlying molecules under observation. In high resolution MS, the model-based approach is more appealing as it can provide a better representation of the MS signals by incorporating information about peak shapes and isotopic distributions. Current model-based techniques are computationally expensive; various algorithms have been proposed to improve the computational efficiency of this paradigm. However, these methods cannot deal well with overlapping features, especially when they are merged to create one broad peak. In addition, no method has been proven to perform well across different MS platforms.

Results: We suggest a new model-based approach to feature extraction in which spectra are decomposed into a mixture of distributions derived from peptide models. By incorporating kernel-based smoothing and perceptual similarity for matching distributions, our statistical framework improves existing methodologies in terms of computational efficiency and the accuracy of the results. Our model is parameterized by physical properties and is therefore applicable to different MS instruments and settings. We validate our approach on simulated data, and show that the performance is higher than commonly used tools on real high- and low-resolution MS, and MS/MS data sets.
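
The general idea of decomposing a spectrum into a mixture of model-derived distributions can be illustrated with a small sketch. This is not the authors' algorithm: it assumes plain Gaussian peak shapes instead of peptide isotopic models, a fixed list of candidate peak centers, and a generic multiplicative-update fit for non-negative least squares. The kernel-based smoothing and perceptual-similarity matching mentioned in the abstract are not shown. All function names, the m/z grid, and the peak parameters are hypothetical.

```python
import math


def gaussian_peak(mz, center, sigma):
    """Unit-height Gaussian peak shape (an assumed stand-in for a peptide model)."""
    return math.exp(-0.5 * ((mz - center) / sigma) ** 2)


def build_templates(grid, centers, sigma):
    """Return one template (intensities over the m/z grid) per candidate center."""
    return [[gaussian_peak(mz, c, sigma) for mz in grid] for c in centers]


def fit_mixture_weights(spectrum, templates, n_iter=2000):
    """Estimate non-negative mixture weights by multiplicative updates.

    Minimizes the squared error between the spectrum and the weighted sum
    of templates while keeping every weight non-negative.
    """
    k = len(templates)
    # Precompute b = A^T y and the Gram matrix G = A^T A once.
    b = [sum(t * y for t, y in zip(tmpl, spectrum)) for tmpl in templates]
    gram = [[sum(p * q for p, q in zip(ti, tj)) for tj in templates]
            for ti in templates]
    w = [1.0] * k  # strictly positive starting point
    for _ in range(n_iter):
        gw = [sum(gram[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * b[i] / gw[i] if gw[i] > 0 else 0.0 for i in range(k)]
    return w


# Hypothetical example: two heavily overlapping peaks plus two empty candidates.
grid = [98.0 + 0.02 * i for i in range(301)]   # m/z values from 98 to 104
centers = [99.0, 100.0, 100.6, 102.0]          # candidate peak positions
sigma = 0.4                                    # assumed common peak width
templates = build_templates(grid, centers, sigma)

# Simulated noiseless spectrum: heights 3.0 and 1.5 at the two middle centers.
spectrum = [3.0 * a + 1.5 * b for a, b in zip(templates[1], templates[2])]

weights = fit_mixture_weights(spectrum, templates)
print([round(w, 2) for w in weights])
```

On this noiseless simulated input, the fitted weights should come out close to 3.0 and 1.5 for the two overlapping peaks and near zero for the unused candidates. The point of the sketch is that fitting peak models jointly can separate overlapping signals that simple peak picking would tend to report as one feature.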

Page last modified on October 24, 2007



“Safer Healthier People”
Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA 30333, USA
Tel: 404-639-3311  /  Public Inquiries: (404) 639-3534  /  (800) 311-3435