News & Highlights

Comparison of normalization methods for surface-enhanced laser desorption and ionization (SELDI) time-of-flight (TOF) mass spectrometry data.

Meuleman W, Engwegen JYMN, Gast MCW, Beijnen JH, Reinders MJT, Wessels LFA.
BMC Bioinformatics 2008; 9:88 doi:10.1186/1471-2105-9-88
The complete electronic version of this article is available at http://www.biomedcentral.com/1471-2105/9/88

Summary

Following the 2005 Cold Spring Harbor - Banbury Center CFS Computational Challenge (C3) Workshop, CDC provided data sets from the Wichita in-hospital clinical study to Duke University for use in the Sixth International Conference for the Critical Assessment of Microarray Data Analysis (CAMDA 2006). Duke University founded CAMDA to provide a forum for critically assessing different techniques used in microarray data mining. CAMDA aims to establish the state of the art in microarray data mining, to identify progress, and to highlight directions for future effort. It takes a community-wide experiment approach, in which the scientific community analyzes the same standard data sets. Researchers worldwide are invited to take the CAMDA challenge, and those whose results are accepted are invited to give a 25-minute oral presentation. The 2006 CAMDA was the first to use a single common challenge data set, which contained all clinical, gene expression, SNP, and proteomics data from the Wichita clinical study.

To date, 10 peer-reviewed publications have resulted from the CAMDA challenge. This publication, from the Information and Communication Theory Group at Delft University of Technology, The Netherlands, used SELDI-TOF data from the Wichita study and describes the first systematic comparison of a variety of methods for preparing these complex data for analysis.

Abstract

Background: Mass spectrometry for biological data analysis is an active field of research, providing an efficient way of high-throughput proteome screening. A popular variant of mass spectrometry is SELDI, which is often used to measure sample populations with the goal of developing (clinical) classifiers. Unfortunately, not only is the data resulting from such measurements quite noisy, variance between replicate measurements of the same sample can be high as well. Normalization of spectra can greatly reduce the effect of this technical variance and further improve the quality and interpretability of the data. However, it is unclear which normalization method yields the most informative result.

Results: In this paper, we describe the first systematic comparison of a wide range of normalization methods, using two objectives that should be met by a good method. These objectives are minimization of inter-spectra variance and maximization of signal with respect to class separation. The former is assessed using an estimation of the coefficient of variation, the latter using the classification performance of three types of classifiers on real-world datasets representing two-class diagnostic problems. To obtain a maximally robust evaluation of a normalization method, both objectives are evaluated over multiple datasets and multiple configurations of baseline correction and peak detection methods. Results are assessed for statistical significance and visualized to reveal the performance of each normalization method, in particular with respect to using no normalization. The normalization methods described have been implemented in the freely available MASDA R-package.
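The first objective, low variance between replicate spectra, can be illustrated with a short sketch. The abstract does not state how the coefficient of variation was estimated, so the function below is only one plausible reading: it assumes replicate spectra have already been aligned to a common set of peak positions, computes the CV (sample standard deviation divided by mean) at each position, and averages the results. The name `mean_cv` and the input layout are hypothetical and are not taken from the paper or from the MASDA package.

```python
from statistics import mean, stdev


def mean_cv(replicates):
    """Average coefficient of variation across peak positions.

    `replicates` is a list of equal-length intensity lists, one per
    replicate spectrum of the same sample (assumed layout, at least
    two replicates). Positions whose mean intensity is not positive
    are skipped because the CV is undefined there.
    """
    cvs = []
    # zip(*replicates) walks the spectra position by position.
    for intensities in zip(*replicates):
        m = mean(intensities)
        if m > 0:
            cvs.append(stdev(intensities) / m)
    if not cvs:
        raise ValueError("no position with a positive mean intensity")
    return mean(cvs)
```

Under this reading, a lower value after normalization than before indicates that the method has reduced technical variance between replicates.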

Conclusion: In the general case, normalization of mass spectra is beneficial to the quality of data. The majority of methods we compared performed significantly better than the case in which no normalization was used. We have shown that normalization methods that scale spectra by a factor based on the dispersion (e.g., standard deviation) of the data clearly outperform those where a factor based on the central location (e.g., mean) is used. Additional improvements in performance are obtained when these factors are estimated locally, using a sliding window within spectra, instead of globally, over full spectra. The underperforming category of methods using a globally estimated factor based on the central location of the data includes the method used by the majority of SELDI users.
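The two distinctions drawn in the conclusion (location versus dispersion, global versus local) can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the paper's or the MASDA package's implementation: the function names, the use of the population standard deviation as the dispersion measure, the `half_width` window parameter, and the handling of zero factors are all choices made here for the example.

```python
from statistics import mean, pstdev


def scale_global(spectrum, factor_fn):
    """Divide every intensity by one factor estimated over the full spectrum."""
    factor = factor_fn(spectrum)
    if factor == 0:
        raise ValueError("scaling factor is zero")
    return [x / factor for x in spectrum]


def scale_local(spectrum, factor_fn, half_width):
    """Divide each intensity by a factor estimated in a sliding window.

    The window spans `half_width` points on either side of the current
    point and is truncated at the ends of the spectrum.
    """
    n = len(spectrum)
    scaled = []
    for i, x in enumerate(spectrum):
        window = spectrum[max(0, i - half_width):min(n, i + half_width + 1)]
        factor = factor_fn(window)
        # Assumption: leave the point unchanged when the window gives no scale.
        scaled.append(x / factor if factor else x)
    return scaled
```

Passing `mean` as `factor_fn` gives a factor based on central location, while passing `pstdev` gives a factor based on dispersion. For example, `scale_global(spectrum, mean)` corresponds to the globally estimated, location-based category, and `scale_local(spectrum, pstdev, half_width=50)` to the locally estimated, dispersion-based category; the window size of 50 is arbitrary.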

Page last modified on October 24, 2007


“Safer Healthier People”
Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA 30333, USA
Tel: 404-639-3311  /  Public Inquiries: (404) 639-3534  /  (800) 311-3435