Mouse Initiative Bibliography

Listed below are accepted and in-press publications resulting from the work of the two consortia of the NCI Mouse Proteomic Technologies Initiative. These citations are also included in the main Scientific Bibliography of the Clinical Proteomic Technologies for Cancer website and designated with a "mouse models" symbol.


2006

Biomarkers for cancer screening, diagnosis, and treatment: a systems approach.
Hartwell L, Mankoff D, Paulovich A, Ramsey S, and Swisher E.
Nature Biotechnology.
2006 August.

Biomarkers measured in a variety of patient samples, including blood, tissue, urine and cerebrospinal fluid, are used in a diverse array of clinical settings. Although many successful biomarkers have been developed to date, advances in genetics and proteomics promise to usher in a new era of abundant, informative biomarkers that could transform the application of molecular biology to human disease. The application of biomarkers to cancer is leading the way because of the unique association of genomic changes in cancer cells with the disease process. Consequently, DNA-based biomarkers are already becoming incorporated into routine patient management and are providing lessons on the value added by appropriate diagnostic tests. Moreover, cancer management illustrates the complexity of the disease process, which can potentially be distinguished through appropriate biomarkers applied to different individuals, different types of disease, the progression of disease states and the multi-step nature of cancer treatment.

Scenarios for the use of biomarker-based diagnostics for cancer include the following: risk assessment, noninvasive screening for early-stage disease, detection and localization, disease stratification and prognosis, response to therapy and, for those in remission, screening for disease recurrence. Cost and potential morbidity increase as we progress along this continuum. Our goals in applying diagnostic tests are (i) to identify persons harboring potentially life-threatening cancers at the earliest stage possible, (ii) to avoid false-positive tests and diagnosing of cancers that would otherwise not threaten a person's well-being to avoid psychological stress and unnecessary treatments, and (iii) to minimize the overall cost of the program. It is unlikely, however, that any single test will perfectly meet all of these goals.

A statistical method for chromatographic alignment of LC-MS data.
Wang P, Coram M, Tang H, Fitzgibbon M, Zhang H, Yi E, Aebersold R, and McIntosh M.
Biostatistics.
2006 August 2.

Integrated liquid-chromatography mass-spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-to-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments.

In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS data sets are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time-warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.

Adenomatous polyposis coli (APC) is required for normal development of skin and thymus.
Kuraguchi M, Wang X, Bronson R, Rothenberg R, Ohene-Baah N, Lund J, Kucherlapati M, Maas R, and Kucherlapati R.
PLOS Genetics.
2006 July 28.

The tumor suppressor gene Apc (adenomatous polyposis coli) is a member of the Wnt signaling pathway that is involved in development and tumorigenesis. Heterozygous knockout mice for Apc have a tumor predisposition phenotype and homozygosity leads to embryonic lethality. To understand the role of Apc in development we generated a floxed allele. These mice were mated with a strain carrying Cre recombinase under the control of the human Keratin 14 (K14) promoter, which is active in basal cells of epidermis and other stratified epithelia. Mice homozygous for the floxed allele that also carry the K14-cre transgene were viable but had stunted growth and died before weaning. Histological and immunochemical examinations revealed that K14-cre mediated Apc loss resulted in aberrant growth in many ectodermally derived squamous epithelia including hair follicles, teeth and oral and corneal epithelia. In addition, squamous metaplasia was observed in various epithelial-derived tissues including the thymus. The aberrant growth of hair follicles and other appendages as well as the thymic abnormalities in K14-cre; Apc(CKO/CKO) mice suggest that the Apc gene is crucial in embryonic cells to specify epithelial cell fates in organs that require epithelial-mesenchymal interactions for their development.

General framework for developing and evaluating database scoring algorithms using the TANDEM search engine.
MacLean B, Eng J, Beavis R, and McIntosh M.
Bioinformatics.
2006 July 28.

MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Supplementary materials, including source code for the scoring functions, are available from http://proteomics.fhcrc.org.
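
The idea of selecting a scoring function at run time can be illustrated with a small registry. This is only a sketch in Python (the software described above is written in C++), and every name in it (`SCORERS`, `register`, `shared_peaks`, `shared_fraction`, `score`) is hypothetical; it does not reproduce TANDEM's interface or either of the scoring functions supplied with the paper.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (observed m/z, theoretical m/z, tolerance) -> score.
ScoreFn = Callable[[List[float], List[float], float], float]

# Registry mapping a scorer name to its implementation.
SCORERS: Dict[str, ScoreFn] = {}


def register(name: str):
    """Decorator that adds a scoring function to the registry under `name`."""
    def wrap(fn: ScoreFn) -> ScoreFn:
        SCORERS[name] = fn
        return fn
    return wrap


@register("shared_peaks")
def shared_peak_count(observed, theoretical, tol):
    """Count theoretical peaks with at least one observed peak within `tol`."""
    return float(sum(any(abs(o - t) <= tol for o in observed) for t in theoretical))


@register("shared_fraction")
def shared_peak_fraction(observed, theoretical, tol):
    """Shared peak count divided by the number of theoretical peaks."""
    if not theoretical:
        return 0.0
    return shared_peak_count(observed, theoretical, tol) / len(theoretical)


def score(name, observed, theoretical, tol=0.5):
    """Look up a scorer by name at run time and apply it."""
    return SCORERS[name](observed, theoretical, tol)


# Toy fragment lists (made-up values, for illustration only).
observed = [100.0, 200.1, 300.0]
theoretical = [100.2, 200.0, 400.0]
for name in sorted(SCORERS):
    print(name, score(name, observed, theoretical))
```

The calling code stays unchanged when a new scorer is registered, which is the property the abstract describes for the single search engine binary.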

A reagent resource to identify proteins and peptides of interest to the cancer community: A workshop report.
Haab B, Paulovich A, Anderson N, Clark A, Downing G, Hermjakob H, Labaer J, and Uhlen M.
Molecular and Cellular Proteomics.
2006 July 24.

On the basis of discussions with representatives from all sectors of the cancer research community, the NCI recognizes the immense opportunities to apply proteomic technologies to further cancer research. Validated and well-characterized affinity capture reagents (e.g., antibodies, aptamers, affibodies) will play a key role in proteomic research platforms for the prevention, early detection, treatment, and monitoring of cancer. To discuss ways to develop new resources and optimize current opportunities in this area, the National Cancer Institute (NCI) convened the "Proteomic Technologies Reagents Resource Workshop" in Chicago, IL on December 12-13, 2005. The workshop brought together leading scientists in proteomic research to discuss model systems for evaluating and delivering resources for reagents to support mass spectrometry (MS) and affinity capture platforms. Speakers discussed issues and identified action items related to an overall vision for and proposed models for a shared proteomics reagents resource, applications of affinity capture methods in cancer research, quality control and validation of affinity capture reagents, considerations for target selection, and construction of a reagents database. The meeting also featured presentations and discussion from leading private-sector investigators on state-of-the-art technologies and capabilities to meet the user community's needs. This workshop was developed as a component of the NCI's Clinical Proteomics Technologies Initiative for Cancer (CPTI), a coordinated initiative that includes the establishment of reagent resources for the scientific community. This workshop report explores various approaches to develop a framework that will most effectively fulfill the needs of the NCI and the cancer research community.

Analysis of Acrylamide Labeled Serum Proteins by LC-MS/MS.
Faca V, Coram M, Phanstiel D, Glukhova V, Zhang Q, Fitzgibbon M, McIntosh M, and Hanash S.
Journal of Proteome Research.
2006 July 13.

Isotopic labeling of cysteine residues with acrylamide was previously utilized for relative quantitation of proteins by MALDI-TOF. Here, we explored and compared the application of deuterated and (13)C isotopes of acrylamide for quantitative proteomic analysis using LC-MS/MS and high-resolution FTICR mass spectrometry. The method was applied to human serum samples that were immunodepleted of abundant proteins. Our results show reliable quantitation of proteins across an abundance range that spans 5 orders of magnitude based on ion intensities and known protein concentration in plasma. The use of (13)C isotope of acrylamide had a slightly greater advantage relative to deuterated acrylamide, because of shifts in elution of deuterated acrylamide relative to its corresponding nondeuterated compound by reversed-phase chromatography. Overall, the use of acrylamide for differentially labeling intact proteins in complex mixtures, in combination with LC-MS/MS, provides a robust method for quantitative analysis of complex proteomes.

Quality control metrics for LC-MS feature detection tools demonstrated on Saccharomyces cerevisiae proteomic profiles.
Piening B, Wang P, Bangur C, Whiteaker J, Zhang H, Feng L-C, Keane J, Eng J, Tang H, Prakash A, McIntosh M, and Paulovich A.
Journal of Proteome Research.
2006 July; 5(7): 1527-1534.

Quantitative proteomic profiling using liquid chromatography-mass spectrometry is emerging as an important tool for biomarker discovery, prompting development of algorithms for high-throughput peptide feature detection in complex samples. However, neither annotated standard data sets nor quality control metrics currently exist for assessing the validity of feature detection algorithms. We propose a quality control metric, Mass Deviance, for assessing the accuracy of feature detection tools. Because the Mass Deviance metric is derived from the natural distribution of peptide masses, it is machine- and proteome-independent and enables assessment of feature detection tools in the absence of completely annotated data sets. We validate the use of Mass Deviance with a second, independent metric that is based on isotopic distributions, demonstrating that we can use Mass Deviance to identify aberrant features with high accuracy. We then demonstrate the use of independent metrics in tandem as a robust way to evaluate the performance of peptide feature detection algorithms. This work is done on complex LC-MS profiles of Saccharomyces cerevisiae, which present a significant challenge to peptide feature detection algorithms.

Mass Spectrometry-Based Study of the Plasma Proteome in a Mouse Intestinal Tumor Model.
Hung K, Kho A, Sarracino D, Georgeon R, Krastins B, Forrester S, Haab B, Kohane I, and Kucherlapati R.
Journal of Proteome Research.
2006 June 27.

Early detection of cancer can greatly improve prognosis. Identification of proteins or peptides in the circulation, at different stages of cancer, would greatly enhance treatment decisions. Mass spectrometry (MS) is emerging as a powerful tool to identify proteins from complex mixtures such as plasma that may help identify novel sets of markers that may be associated with the presence of tumors. To examine this feature we have used a genetically modified mouse model, Apc(Min), which develops intestinal tumors with 100% penetrance. Utilizing liquid chromatography-tandem mass spectrometry (LC-MS/MS), we identified total plasma proteome (TPP) and plasma glycoproteome (PGP) profiles in tumor-bearing mice. Principal component analysis (PCA) and agglomerative hierarchical clustering analysis revealed that these protein profiles can be used to distinguish between tumor-bearing Apc(Min) and wild-type control mice. Leave-one-out cross-validation analysis established that global TPP and global PGP profiles can be used to correctly predict tumor-bearing animals in 17/19 (89%) and 19/19 (100%) of cases, respectively. Furthermore, leave-one-out cross-validation analysis confirmed that the significant differentially expressed proteins from both the TPP and the PGP were able to correctly predict tumor-bearing animals in 19/19 (100%) of cases. A subset of these proteins was independently validated by antibody microarrays using detection by two-color rolling circle amplification (TC-RCA). Analysis of the significant differentially expressed proteins indicated that some might derive from the stroma or the host response. These studies suggest that mass spectrometry-based approaches to examine the plasma proteome may prove to be a valuable method for determining the presence of intestinal tumors.
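
For readers unfamiliar with leave-one-out cross-validation, the following sketch shows the general procedure: each sample is held out once, a classifier is fit to the rest, and the held-out sample is predicted. The abstract does not name the classifier used, so the nearest-centroid rule here is an assumption chosen for brevity, and the two-feature toy data are invented for illustration. The last lines only re-derive the percentages quoted above from their fractions.

```python
def centroid(rows):
    """Mean of each feature across a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_accuracy(samples, labels):
    """Return (number correct, number of samples) under leave-one-out CV
    with a nearest-centroid classifier (an assumed, illustrative choice)."""
    correct = 0
    for i in range(len(samples)):
        # Train on every sample except the i-th.
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            lab: centroid([s for s, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        # Predict the held-out sample from the nearest class centroid.
        predicted = min(centroids, key=lambda lab: sq_dist(samples[i], centroids[lab]))
        correct += predicted == labels[i]
    return correct, len(samples)


# Invented, well-separated toy profiles (two features per animal).
samples = [[5.0, 1.0], [5.5, 1.2], [4.8, 0.9], [1.0, 3.0], [1.2, 3.3], [0.9, 2.8]]
labels = ["tumor", "tumor", "tumor", "wild-type", "wild-type", "wild-type"]
print(loocv_accuracy(samples, labels))

# Percentages quoted in the abstract, rounded to whole numbers.
print(round(100 * 17 / 19), round(100 * 19 / 19))  # 89 100
```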

Compression of LC/MS Proteomic Data.
Miguel A, Keane J, Whiteaker J, Zhang H, and Paulovich A.
Proceedings of the 19th IEEE International Symposium on Computer-Based Medical Systems.
Conference held 2006 June 22-23; 925-930.

The unrelenting growth of mass spectrometry (MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. The data for this study was derived from peptides of hand-mixed protein samples passed through a high performance liquid chromatography system (HPLC) and an electrospray ionization time-of-flight (ESI-TOF) mass spectrometer. Several lossless data compression methods were applied and yielded up to a 25:1 compression ratio relative to the original files containing base64 encoding of the data.
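
A compression ratio "relative to base64" can be measured as sketched below. The abstract does not say which lossless methods were tested, so the use of zlib is an assumption, as are the 32-bit big-endian packing and the synthetic, mostly-zero intensity array; the ratio printed for this toy input says nothing about the ratios reported in the paper.

```python
import base64
import struct
import zlib


def encode_base64(values):
    """Pack floats as 32-bit big-endian values and base64-encode the bytes
    (an assumed stand-in for how peak data is stored in the original files)."""
    raw = struct.pack(f">{len(values)}f", *values)
    return base64.b64encode(raw)


def compress(b64_text):
    """Decode the base64 text and losslessly compress the raw bytes with zlib."""
    return zlib.compress(base64.b64decode(b64_text), 9)


def restore(blob):
    """Invert `compress`: decompress and re-encode as base64."""
    return base64.b64encode(zlib.decompress(blob))


# Synthetic sparse spectrum: mostly zero intensity with two peaks.
intensities = [0.0] * 1000
intensities[120] = 123.5
intensities[640] = 1024.0

b64_text = encode_base64(intensities)
blob = compress(b64_text)
ratio = len(b64_text) / len(blob)
print(f"{len(b64_text)} base64 bytes -> {len(blob)} compressed bytes ({ratio:.1f}:1)")
```

Because the compression step is lossless, `restore(blob)` returns exactly the original base64 text.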

A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS.
Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin CW, Chen J, Goodlett D, Whiteaker J, Paulovich A, and McIntosh M.
Bioinformatics.
2006 June 9.

MOTIVATION: Comparing two or more complex protein mixtures using liquid chromatography mass spectrometry (LC-MS) requires multiple analysis steps to locate and quantitate natural peptides within a single experiment and to align and normalize findings across multiple experiments. RESULTS: We describe msInspect, an open-source application comprising algorithms and visualization tools for the analysis of multiple LC-MS experimental measurements. The platform integrates novel algorithms for detecting signatures of natural peptides within a single LC-MS measurement and combines multiple experimental measurements into a peptide array, which may then be mined using analysis tools traditionally applied to genomic array analysis. The platform supports quantitation by both label-free and isotopic labeling approaches. The software implementation has been designed so that many key components may be easily replaced, making it useful as a workbench for integrating other novel algorithms developed by a growing research community. AVAILABILITY: The msInspect software is distributed freely under an Apache 2.0 license. The software as well as a Zip file with all peptide feature files and scripts needed to generate the tables and figures in this article are available at http://proteomics.fhcrc.org/.

Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics.
Fermin D, Allen B, Blackwell T, Menon R, Adamski M, Xu Y, Ulintz P, Omenn GS, and States D.
Genome Biology.
2006 May; 7(4): R35.

Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. RESULTS: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. CONCLUSION: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
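
A six-frame translation reads a DNA sequence in three offsets on the forward strand and three on the reverse complement. The sketch below illustrates the concept with the standard genetic code; it is not the pipeline used in the paper, and the function names and the short test sequence are invented for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate complete codons starting at `offset`; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(dna):
    """Return the six reading-frame translations keyed +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


frames = six_frame("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Stop codons appear as `*`, so candidate open reading frames are the stretches between them in each frame.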

Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study.
States D, Omenn G, Blackwell T, Fermin D, Eng J, Speicher D, and Hanash S.
Nature Biotechnology.
2006 March; 24(3): 333-338.

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously non-annotated gene sequences.

Computational Proteomics Analysis System (CPAS): An extensible open source analytic system for evaluating and publishing proteomic data and high-throughput biological experiments.
Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, MacLean B, Lin C, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whiteaker J, States D, Hanash S, Paulovich A, and McIntosh M.
Journal of Proteome Research.
2006 January-February; 5(1): 112-121.

The open-source Computational Proteomics Analysis System (CPAS) contains an entire data analysis and management pipeline for Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) proteomics, including experiment annotation, protein database searching and sequence management, and mining LC-MS/MS peptide and protein identifications. CPAS architecture and features, such as a general experiment annotation component, installation software, and data security management, make it useful for collaborative projects across geographical locations and for proteomics laboratories without substantial computational support.

Normalization regarding non-random missing values in high-throughput mass spectrometry data.
Wang P, Tang H, Zhang H, Whiteaker J, Paulovich A, and McIntosh M.
Proceedings of the Pacific Symposium on Biocomputing.
Conference held 2006 January 3-7; 11: 315-326.

We propose a two-step normalization procedure for high-throughput mass spectrometry (MS) data, which is a necessary step in biomarker clustering or classification. First, a global normalization step is used to remove sources of systematic variation between MS profiles due to, for instance, varying amounts of sample degradation over time. A probability model is then used to investigate the intensity-dependent missing events and provides possible substitutions for the missing values. We illustrate the performance of the method with an LC-MS data set of synthetic protein mixtures.