CPTC Scientific Publications

Scientific Bibliography

Rodriguez H.
International Summit on Proteomics Data Release and Sharing Policy.
J. Proteome Res.; 2008 Nov; 7(11) pp 4609 - 4609; (Editorial) [Epub 2008 Oct 7]

No abstract available.

Qiu J, Choi G, Li L, Wang H, Pitteri SJ, Pereira-Faca SR, Krasnoselsky AL, Randolph TW, Omenn GS, Edelstein C, Barnett MJ, Thornquist MD, Goodman GE, Brenner DE, Feng Z, Hanash SM.
Occurrence of Autoantibodies to Annexin I, 14-3-3 Theta and LAMR1 in Prediagnostic Lung Cancer Sera.
J Clin Oncol. 2008 Sep 15 [Epub ahead of print]

PURPOSE: We have implemented a high throughput platform for quantitative analysis of serum autoantibodies, which we have applied to lung cancer for discovery of novel antigens and for validation in prediagnostic sera of autoantibodies to antigens previously defined based on analysis of sera collected at the time of diagnosis. MATERIALS AND METHODS: Proteins from human lung adenocarcinoma cell line A549 lysates were subjected to extensive fractionation. The resulting 1,824 fractions were spotted in duplicate on nitrocellulose-coated slides. The microarrays produced were used in a blinded validation study to determine whether annexin I, PGP9.5, and 14-3-3 theta antigens previously found to be targets of autoantibodies in newly diagnosed patients with lung cancer are associated with autoantibodies in sera collected at the presymptomatic stage and to determine whether additional antigens may be identified in prediagnostic sera. Individual sera collected from 85 patients within 1 year before a diagnosis of lung cancer and 85 matched controls from the Carotene and Retinol Efficacy Trial (CARET) cohort were hybridized to individual microarrays. RESULTS: We present evidence for the occurrence in lung cancer sera of autoantibodies to annexin I, 14-3-3 theta, and a novel lung cancer antigen, LAMR1, which precede onset of symptoms and diagnosis. CONCLUSION: Our findings suggest potential utility of an approach to diagnosis of lung cancer before onset of symptoms that includes screening for autoantibodies to defined antigens.

Burgess EF, Ham AJ, Tabb DL, Billheimer DL, Roth BJ, Chang SS, Cookson MS, Hinton TJ, Cheek KL, Hill S, Pietenpol JA.
Prostate cancer serum biomarker discovery through proteomic analysis of alpha-2 macroglobulin protein complexes.
Proteomics-Clin. Applic. 2008 Sep; 2(9): 1223-1233

Alpha-2 macroglobulin (A2M) functions as a universal protease inhibitor in serum and is capable of binding various cytokines and growth factors. In this study, we investigated if immunoaffinity enrichment and proteomic analysis of A2M protein complexes from human serum could improve detection of biologically relevant and novel candidate protein biomarkers in prostate cancer. Serum samples from six patients with androgen-independent, metastatic prostate cancer and six control patients without malignancy were analyzed by immunoaffinity enrichment of A2M protein complexes and MS identification of associated proteins. Known A2M substrates were reproducibly identified from patient serum in both cohorts, as well as proteins previously undetected in human serum. One example is heat shock protein 90 alpha (HSP90α), which was identified only in the serum of cancer patients in this study. Using an ELISA, the presence of HSP90α in human serum was validated on expanded test cohorts and found to exist in higher median serum concentrations in prostate cancer (n = 18) relative to control (n = 13) patients (median concentrations 50.7 versus 27.6 ng/mL, respectively, p = 0.001). Our results demonstrate the technical feasibility of this approach and support the analysis of A2M protein complexes for proteomic-based serum biomarker discovery.

Tabb DL, Ma ZQ, Martin DB, Ham AJL, Chambers MC.
DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring.
J Proteome Res. 2008 Sep;7(9):3838-46 [Epub 2008 Jul 17]

In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.

Belov ME, Clowers BH, Prior DC, Danielson WF 3rd, Liyu AV, Petritis BO, Smith RD.
Dynamically multiplexed ion mobility time-of-flight mass spectrometry.
Anal Chem. 2008 Aug 1;80(15):5873-83 [Epub 2008 Jun 18]

Ion mobility spectrometry-time-of-flight mass spectrometry (IMS-TOFMS) has been increasingly used in analysis of complex biological samples. A major challenge is to transform IMS-TOFMS to a high-sensitivity, high-throughput platform, for example, for proteomics applications. In this work, we have developed and integrated three advanced technologies, including efficient ion accumulation in an ion funnel trap prior to IMS separation, multiplexing (MP) of ion packet introduction into the IMS drift tube, and signal detection with an analog-to-digital converter, into the IMS-TOFMS system for the high-throughput analysis of highly complex proteolytic digests of, for example, blood plasma. To better address variable sample complexity, we have developed and rigorously evaluated a novel dynamic MP approach that ensures correlation of the analyzer performance with an ion source function and provides the improved dynamic range and sensitivity throughout the experiment. The MP IMS-TOFMS instrument has been shown to reliably detect peptides at a concentration of 1 nM in the presence of a highly complex matrix, as well as to provide a 3 orders of magnitude dynamic range and a mass measurement accuracy of better than 5 ppm. When matched against human blood plasma database, the detected IMS-TOF features were found to yield approximately 700 unique peptide identifications at a false discovery rate (FDR) of approximately 7.5%. Accounting for IMS information gave rise to a projected FDR of approximately 4%. Signal reproducibility was found to be greater than 80%, while the variations in the number of unique peptide identifications were <15%. A single sample analysis was completed in 15 min that constitutes almost 1 order of magnitude improvement compared to a more conventional LC-MS approach.

Zheng C, Li C, Higginbotham JN, Franklin JL, Tabb DL, Graves-Deal R, Hill S, Cheek K, Lapierre LA, Goldenring JR, Ham AJL, Coffey RJ.
Use of Fluorescence-Activated Vesicle Sorting for Isolation of Naked2-Associated, Basolaterally-Targeted Exocytic Vesicles for Proteomic Analysis.
Mol. Cell. Proteomics. 2008 May 25; 7:1651-1667.

By interacting with the cytoplasmic tail of a Golgi-processed form of transforming growth factor-α (TGFα), Naked2 coats TGFα-containing exocytic vesicles and directs them to the basolateral corner of polarized epithelial cells where the vesicles dock and fuse in a Naked2 myristoylation-dependent manner. These TGFα-containing Naked2-associated vesicles are not directed to the subapical Sec6/8 exocyst complex as has been reported for other basolateral cargo, and thus they appear to represent a distinct set of basolaterally targeted vesicles. To identify constituents of these vesicles, we exploited our finding that myristoylation-deficient Naked2 G2A vesicles are unable to fuse at the plasma membrane. Isolation of a population of myristoylation-deficient, green fluorescent protein-tagged G2A Naked2-associated vesicles was achieved by biochemical enrichment followed by flow cytometric fluorescence-activated vesicle sorting. The protein content of these plasma membrane de-enriched, flow-sorted fluorescent G2A Naked2 vesicles was determined by LC/LC-MS/MS analysis. Three independent isolations were performed, and 389 proteins were found in all three sets of G2A Naked2 vesicles. Rab10 and myosin IIA were identified as core machinery, and Na⁺/K⁺-ATPase α1 was identified as an additional cargo within these vesicles. As an initial validation step, we confirmed their presence and that of three additional proteins tested (annexin A1, annexin A2, and IQGAP1) in wild-type Naked2 vesicles. To our knowledge, this is the first large scale protein characterization of a population of basolaterally targeted exocytic vesicles and supports the use of fluorescence-activated vesicle sorting as a useful tool for isolation of cellular organelles for comprehensive proteomics analysis.

Xu P, Peng J.
Characterization of polyubiquitin chain structure by middle-down mass spectrometry.
Anal Chem. 2008 May 1;80(9):3438-44. [Epub 2008 Mar 20]

Ubiquitin (Ub) is a 76 amino acid polypeptide that modifies a wide range of proteins in the form of monomers or polymers, and the functional consequence of ubiquitination is modulated by the length and topologies of polyUb chains. Whereas polyUb chains are usually analyzed by full trypsin digestion and mass spectrometry (MS), we present here a middle-down strategy to characterize the structure of polyUb chains by high-resolution MS. Under optimized conditions, native folded polyUb is partially trypsinized exclusively at the R74 residue, generating a large Ub fragment (1-74 residues termed UbR74) and its ubiquitinated form with a diglycine tag (UbR74-GG). The molar ratio between UbR74 and UbR74-GG reflects the length of homogeneous polyUb chains (i.e., 1:1 for the dimer, 1:2 for the trimer, 1:3 for the tetramer, and so on). Moreover, lysine residues in ubiquitin used for chain linkages are detectable by MS/MS and MS/MS/MS of large GG-tagged Ub fragments. The strategy was validated using a number of ubiquitin polymers, including K48-linked human di-Ub, K63-linked human tetra-Ub, as well as His-tagged polyUb chains purified from yeast under native conditions. The potential of this strategy to analyze polyUb chains with mixed linkages (e.g., forked chains) is also discussed. Together, this middle-down MS strategy provides a novel complementary method for studying the length and linkages of complex polyUb chain structures.
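
As a worked illustration of the ratio rule quoted in the abstract above (1:1 for the dimer, 1:2 for the trimer, 1:3 for the tetramer), the length n of a homogeneous polyUb chain follows from n = 1 + [UbR74-GG]/[UbR74]. The sketch below is not taken from the paper: the function name and the example amounts are hypothetical, and it assumes a single homogeneous chain population.

```python
def chain_length(ubr74: float, ubr74_gg: float) -> float:
    """Estimate polyUb chain length from the UbR74 : UbR74-GG molar ratio.

    Assumes a homogeneous chain: one unmodified Ub (UbR74) per chain and
    one GG-tagged Ub (UbR74-GG) for each remaining unit.
    """
    if ubr74 <= 0:
        raise ValueError("UbR74 amount must be positive")
    return 1 + ubr74_gg / ubr74


# Hypothetical molar amounts in arbitrary units.
for gg in (1.0, 2.0, 3.0):
    print(f"UbR74:UbR74-GG = 1:{gg:g} -> chain length {chain_length(1.0, gg):g}")
```

A non-integer result would suggest a mixture of chain lengths, which this simple ratio cannot resolve.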

Clowers BH, Belov ME, Prior DC, Danielson WF 3rd, Ibrahim Y, Smith RD.
Pseudorandom sequence modifications for ion mobility orthogonal time-of-flight mass spectrometry.
Anal Chem. 2008 Apr 1;80(7):2464-73. [Epub 2008 Mar 1]

Due to the inherently low duty cycle of ion mobility spectrometry (IMS) experiments that sample from continuous ion sources, a range of experimental advances have been developed to maximize ion utilization efficiency. The use of ion trapping and accumulation approaches prior to the ion mobility drift tube has demonstrated significant gains over discrete sampling from continuous sources but has traditionally relied upon signal averaging (SA) to attain analytically useful signal-to-noise ratios (SNR). Multiplexed (MP) techniques based upon the Hadamard transform offer an alternative experimental approach by which ion utilization efficiency can be elevated from ~1 to ~50%. Recently, our research group demonstrated a unique multiplexed ion mobility time-of-flight (MP-IMS-TOF) approach that incorporates ion trapping and can extend ion utilization efficiency beyond 50%. However, the spectral reconstruction of the multiplexed signal using this experimental approach requires the use of sample-specific weighting designs. Such general weighting designs have been shown to significantly enhance ion utilization efficiency using this MP technique, but cannot be universally applied. By modifying both the ion trapping and the pseudorandom sequence (PRS) used for the MP experiment, we have eliminated the need for complex weighting matrices. For both simple and complex mixtures, SNR enhancements of up to 13 were routinely observed as compared to the SA-IMS-TOF approach. In addition, this new class of PRS provides a 2-fold enhancement in the number of ion gate pulses per unit time compared to the traditional HT-IMS experiment.

Tracy MB, Chen H, Weaver DM, Malyarenko DI, Sasinowski M, Cazares LH, Drake RR, Semmes OJ, Tracy ER, Cooke WE.
Precision enhancement of MALDI-TOF MS using high resolution peak detection and label-free alignment.
Proteomics. 2008 Apr;8(8):1530-8

We have developed an automated procedure for aligning peaks in multiple TOF spectra that eliminates common timing errors and small variations in spectrometer output. Our method incorporates high-resolution peak detection, re-binning, and robust linear data fitting in the time domain. This procedure aligns label-free (uncalibrated) peaks to minimize the variation in each peak's location from one spectrum to the next, while maintaining a high number of degrees of freedom. We apply our method to replicate pooled-serum spectra from multiple laboratories and increase peak precision (t/σ_t) to values limited only by small random errors (with σ_t less than one time count in 89 out of 91 instances, 13 peaks in seven datasets). The resulting high precision allowed for an order of magnitude improvement in peak m/z reproducibility. We show that the CV for m/z is 0.01% (100 ppm) for 12 out of the 13 peaks that were observed in all datasets between 2995 and 9297 Da.

Peng J.
Evaluation of proteomic strategies for analyzing ubiquitinated proteins.
BMB Rep. 2008 Mar 31;41(3):177-83.

Ubiquitin is an essential, highly-conserved small regulatory protein in eukaryotic cells. It covalently modifies a wide variety of targeted proteins in the forms of monomer and polymers, altering the conformation and binding properties of the proteins and thus regulating proteasomal delivery, protein activities and localization. Mass spectrometry has emerged as an indispensable tool for in-depth characterization of protein ubiquitination. Ubiquitinated proteins in cell lysates are usually enriched by affinity chromatography and subsequently analyzed by mass spectrometry for identification and quantification. Ubiquitin-conjugated amino acid residues can be determined by unique mass shift caused by the modification. Moreover, the complex structure of polyubiquitin chains on substrates can be dissected by bottom-up and middle-down mass spectrometric approaches, revealing potential novel functions of polyubiquitin linkages. Here I review the advances and caveats of these strategies, emphasizing caution in the validation of ubiquitinated proteins and in the interpretation of raw data.

Page JS, Tang K, Kelly RT, Smith RD.
Subambient pressure ionization with nanoelectrospray source and interface for improved sensitivity in mass spectrometry.
Anal Chem. 2008 Mar 1;80(5):1800-5. [Epub 2008 Feb 1]

A nanoelectrospray ionization mass spectrometry (ESI-MS) source and interface has been designed that enables efficient ion production and transmission in a 30 Torr pressure environment using solvents compatible with typical reversed-phase liquid chromatography (RPLC) separations. In this design, the electrospray emitter is located inside the mass spectrometer in the same region as an electrodynamic ion funnel. This avoids the use of a conductance limiting ion inlet, as required by a conventional atmospheric pressure ESI source, and allows more efficient ion transmission to the mass analyzer. The new subambient pressure ionization with nanoelectrospray (SPIN) source improves instrument sensitivity and enables new electrospray interface designs, including the use of multi-emitter approaches. Performance of the SPIN source was evaluated by electrospraying standard solutions at 300 nL/min and comparing results with those obtained from a standard atmospheric pressure ESI source that used a heated capillary inlet. This initial study demonstrated an ~5-fold improvement in sensitivity when the SPIN source was used compared to a standard atmospheric pressure ESI source. The importance of desolvation was also investigated by electrospraying at different flow rates, which showed that the ion funnel provided an effective desolvation region to aid the creation of gas-phase analyte ions.

Villanueva J, Nazarian A, Lawlor K, Yi SS, Robbins RJ, Tempst P.
A sequence-specific exopeptidase activity test (SSEAT) for "functional" biomarker discovery.
Mol Cell Proteomics. 2008 Mar;7(3):509-18. [Epub 2007 Nov 6]

One form of functional proteomics entails profiling of genuine activities, as opposed to surrogates of activity or active "states," in a complex biological matrix: for example, tracking enzyme-catalyzed changes, in real time, ranging from simple modifications to complex anabolic or catabolic reactions. Here we present a test to compare defined exoprotease activities within individual proteomes of two or more groups of biological samples. It tracks degradation of artificial substrates, under strictly controlled conditions, using semiautomated MALDI-TOF mass spectrometric analysis of the resulting patterns. Each fragment is quantitated by comparison with double labeled, non-degradable internal standards (all-D-amino acid peptides) spiked into the samples at the same time as the substrates to reflect adsorptive and processing-related losses. The full array of metabolites is then quantitated (coefficients of variation of 6.3–14.3% over five replicates) and subjected to multivariate statistical analysis. Using this approach, we tested serum samples of 48 metastatic thyroid cancer patients and 48 healthy controls, with selected peptide substrates taken from earlier standard peptidomics screens (i.e. the "discovery" phase), and obtained class predictions with 94% sensitivity and 90% specificity without prior feature selection (24 features). The test all but eliminates reproducibility problems related to sample collection, storage, and handling as well as to possible variability in endogenous peptide precursor levels because of hemostatic alterations in cancer patients.

Wan J, Kang S, Tang C, Yan J, Ren Y, Liu J, Gao X, Banerjee A, Ellis LB, Li T.
Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection.
Nucleic Acids Res. 2008 Mar;36(4):e22. [Epub 2008 Jan 30]

Meta-predictors make predictions by organizing and processing the predictions produced by several other predictors in a defined problem domain. A proficient meta-predictor not only offers better predicting performance than the individual predictors from which it is constructed, but it also relieves experimental researchers from making difficult judgments when faced with conflicting results made by multiple prediction programs. As increasing numbers of predicting programs are being developed in a large number of fields of life sciences, there is an urgent need for effective meta-prediction strategies to be investigated. We compiled four unbiased phosphorylation site datasets, each for one of the four major serine/threonine (S/T) protein kinase families: CDK, CK2, PKA and PKC. Using these datasets, we examined several meta-predicting strategies with 15 phosphorylation site predictors from six predicting programs: GPS, KinasePhos, NetPhosK, PPSP, PredPhospho and Scansite. Meta-predictors constructed with a generalized weighted voting meta-predicting strategy with parameters determined by restricted grid search possess the best performance, exceeding that of all individual predictors in predicting phosphorylation sites of all four kinase families. Our results demonstrate a useful decision-making tool for analysing the predictions of the various S/T phosphorylation site predictors. An implementation of these meta-predictors is available on the web at: http://MetaPred.umn.edu/MetaPredPS/
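
The idea of weighted voting with grid-searched parameters can be sketched generically as follows. This is an illustrative toy, not the MetaPredPS implementation: the function names, the weight grid, the thresholds and the data are all hypothetical, and the search shown is a plain exhaustive grid rather than the paper's restricted variant, whose details the abstract does not give.

```python
from itertools import product


def weighted_vote(votes, weights, threshold):
    """Return True when the weighted sum of predictor votes reaches the threshold."""
    return sum(w * v for w, v in zip(weights, votes)) >= threshold


def grid_search(samples, labels, weight_grid, thresholds):
    """Pick the weights and threshold with the highest training accuracy.

    Every combination of grid values is tried (exhaustive search over a
    small, hypothetical grid).
    """
    best_weights, best_threshold, best_acc = None, None, -1.0
    n_predictors = len(samples[0])
    for weights in product(weight_grid, repeat=n_predictors):
        for threshold in thresholds:
            correct = sum(
                weighted_vote(votes, weights, threshold) == label
                for votes, label in zip(samples, labels)
            )
            acc = correct / len(labels)
            if acc > best_acc:
                best_weights, best_threshold, best_acc = weights, threshold, acc
    return best_weights, best_threshold, best_acc


# Toy data: each row holds the 0/1 votes of three hypothetical predictors
# for one candidate site; labels mark true phosphorylation sites.
samples = [(1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 1, 0)]
labels = [True, False, True, False]
weights, threshold, acc = grid_search(samples, labels, (0.0, 0.5, 1.0), (0.5, 1.0))
print(weights, threshold, acc)
```

In practice the weights would be tuned on held-out data rather than on the training set used here.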

Fenselau C, Havey C, Teerakulkittipong N, Swatkoski S, Laine O, Edwards N.
Identification of beta-lactamase in antibiotic-resistant Bacillus cereus spores.
Appl Environ Microbiol. 2008 Feb;74(3):904-6. [Epub 2007 Dec 7]

β-Lactamase type I is reported for the first time to occur in the sporulated form in a penicillin-resistant Bacillus species. The enzyme was readily characterized from the B. cereus 5/B line (ATCC 13061) by mass spectrometry and two-dimensional gel electrophoresis.

Robinson S, Niles RK, Witkowska HE, Rittenbach KJ, Nichols RJ, Sargent JA, Dixon SE, Prakobphol A, Hall SC, Fisher SJ, Hardt M.
A mass spectrometry-based strategy for detecting and characterizing endogenous proteinase activities in complex biological samples.
Proteomics. 2008 Feb;8(3):435-45

Endogenous proteinases in biological fluids such as human saliva produce a rich peptide repertoire that reflects a unique combination of enzymes, substrates, and inhibitors/activators. Accordingly, this subproteome is an interesting source of biomarkers for disease processes that either directly or indirectly involve proteolysis. However, the relevant proteinases, typically very low abundance molecules, are difficult to classify and identify. We hypothesized that a sensitive technique for monitoring accumulated peptide products in an unbiased, global manner would be very useful for detecting and profiling proteolytic activities in complex biological samples. Building on the longstanding use of 18O isotope-based approaches for the classification of proteolytic and other enzymatic processes we devised a new method for evaluating endogenous proteinases. Specifically, we showed that upon ex vivo incubation endogenous proteinases in human parotid saliva introduced 18O from isotopically enriched water into the C-terminal carboxylic groups of their peptide products. Subsequent peptide sequence determination and inhibitor profiling enabled the detection of discrete subsets of proteolytic products that were generated by different enzymes. As a proof-of-principle we used one of these fingerprints to identify the relevant activity as tissue kallikrein. We termed this technique PALeO. Our results suggest that PALeO is a rapid and highly sensitive method for globally assessing proteinase activities in complex biological samples.

Swatkoski S, Gutierrez P, Wynne C, Petrov A, Dinman JD, Edwards N, Fenselau C.
Evaluation of microwave-accelerated residue-specific acid cleavage for proteomic applications.
J Proteome Res. 2008 Feb;7(2):579-86. [Epub 2008 Jan 12]

Microwave-accelerated proteolysis using acetic acid has been shown to occur specifically on either or both sides of aspartic acid residues. This chemical cleavage has been applied to ovalbumin and several model peptides to test the effect on some of the more common post-translational modifications. No oxidation of methionine or cysteine was observed; however, hydrolysis of phosphate groups proceeds at a detectable rate. Acid cleavage was also extended to the yeast ribosome model proteome, where it provided information on 74% of that proteome. Aspartic acid occurs across the proteome with approximately half the frequency of the combined occurrence of the trypsin residues lysine and arginine, and implications of this are considered.

Shen C, Wang Z, Shankar G, Zhang X, Li L.
A hierarchical statistical model to assess the confidence of peptides and proteins inferred from tandem mass spectrometry.
Bioinformatics. 2008 Jan 15;24(2):202-8. [Epub 2007 Nov 17]

MOTIVATION: Statistical evaluation of the confidence of peptide and protein identifications made by tandem mass spectrometry is a critical component for appropriately interpreting the experimental data and conducting downstream analysis. Although many approaches have been developed to assign confidence measure from different perspectives, a unified statistical framework that integrates the uncertainty of peptides and proteins is still missing. RESULTS: We developed a hierarchical statistical model (HSM) that jointly models the uncertainty of the identified peptides and proteins and can be applied to any scoring system. With data sets of a standard mixture and the yeast proteome, we demonstrate that the HSM offers a reliable or at least conservative false discovery rate (FDR) estimate for peptide and protein identifications. The probability measure of HSM also offers a powerful discriminating score for peptide identification.

Alves P, Arnold RJ, Clemmer DE, Li Y, Reilly JP, Sheng Q, Tang H, Xun Z, Zeng R, Radivojac P.
Fast and accurate identification of semi-tryptic peptides in shotgun proteomics.
Bioinformatics. 2008 Jan 1;24(1):102-9. [Epub 2007 Nov 22]

MOTIVATION: One of the major problems in shotgun proteomics is the low peptide coverage when analyzing complex protein samples. Identifying more peptides, e.g. non-tryptic peptides, may increase the peptide coverage and improve protein identification and/or quantification that are based on the peptide identification results. Searching for all potential non-tryptic peptides is, however, time consuming for shotgun proteomics data from complex samples, and poses a challenge for a routine data analysis. RESULTS: We hypothesize that non-tryptic peptides are mainly created from the truncation of regular tryptic peptides before separation. We introduce the notion of truncatability of a tryptic peptide, i.e. the probability of the peptide to be identified in its truncated form, and build a predictor to estimate a peptide's truncatability from its sequence. We show that our predictions achieve useful accuracy, with the area under the ROC curve from 76% to 87%, and can be used to filter the sequence database for identifying truncated peptides. After filtering, only a limited number of tryptic peptides with the highest truncatability are retained for non-tryptic peptide searching. By applying this method to identification of semi-tryptic peptides, we show that a significant number of such peptides can be identified within a searching time comparable to that of tryptic peptide identification.

Kelly RT, Page JS, Zhao R, Qian WJ, Mottaz HM, Tang K, Smith RD.
Capillary-based multi nanoelectrospray emitters: improvements in ion transmission efficiency and implementation with capillary reversed-phase LC-ESI-MS.
Anal Chem. 2008 Jan 1;80(1):143-9. [Epub 2007 Nov 29]

We describe the coupling of liquid chromatography (LC) separations with mass spectrometry (MS) using nanoelectrospray ionization (nano-ESI) multiemitters. The array of 19 emitters reduced the flow rate delivered to each emitter, allowing the enhanced sensitivity that is characteristic of nano-ESI to be extended to higher flow rate separations. The signal for tryptic fragments from proteins spiked into a human plasma sample increased 11-fold on average when the multiemitters were employed, due to increased ionization efficiency and improved ion transfer efficiency through a newly designed heated multicapillary MS inlet. Additionally, the LC peak signal-to-noise ratio increased ~7-fold when the multiemitter configuration was used. The low dead volume of the emitter arrays preserved peak shape and resolution for robust capillary LC separations using total flow rates of 2 µL/min.

Choi H, Ghosh D, Nesvizhskii AI.
Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling.
J Proteome Res. 2008 Jan;7(1):286-92. [Epub 2007 Dec 14]

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
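
For orientation, the simplest use of decoy identifications is a counting estimate of the false discovery rate, sketched below. This is not the mixture modeling described in the abstract above; it assumes target and decoy databases of equal size searched separately, and the function name and scores are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (decoy hits) / (target hits).

    Assumes equally sized target and decoy databases searched separately,
    so the decoy count approximates the number of false target hits.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


# Hypothetical search scores (higher is better).
target_scores = [5.0, 4.0, 3.0, 2.0, 1.0]
decoy_scores = [3.0, 1.0, 0.5]
print(decoy_fdr(target_scores, decoy_scores, threshold=2.0))
```

Concatenated target-decoy searches use a slightly different estimator, so the formula should be matched to how the search was actually run.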

Choi H, Nesvizhskii AI.
Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics.
J Proteome Res. 2008 Jan;7(1):254-65. [Epub 2007 Dec 27]

Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.

Choi H, Nesvizhskii AI.
False discovery rates and related statistical concepts in mass spectrometry-based proteomics.
J Proteome Res. 2008 Jan;7(1):47-50. [Epub 2007 Dec 8]

Development of statistical methods for assessing the significance of peptide assignments to tandem mass spectra obtained using database searching remains an important problem. In the past several years, several different approaches have emerged, including the concept of expectation values, target-decoy strategy, and the probability mixture modeling approach of PeptideProphet. In this work, we provide a background on statistical significance analysis in the field of mass spectrometry-based proteomics, and present our perspective on the current and future developments in this area.

Gong W, Zhou D, Ren Y, Wang Y, Zuo Z, Shen Y, Xiao F, Zhu Q, Hong A, Zhou X, Gao X, Li T.
PepCyber:P~PEP: a database of human protein protein interactions mediated by phosphoprotein-binding domains.
Nucleic Acids Res. 2008 Jan;36(Database issue):D679-83. [Epub 2007 Dec 26]

Phosphoprotein-binding domains (PPBDs) mediate many important cellular and molecular processes. Ten PPBDs have been known to exist in the human proteome, namely, 14-3-3, BRCT, C2, FHA, MH2, PBD, PTB, SH2, WD-40 and WW. PepCyber:PPEP is a newly constructed database specialized in documenting human PPBD-containing proteins and PPBD-mediated interactions. Our motivation is to provide the research community with a rich information source emphasizing the reported, experimentally validated data for specific PPBD–PPEP interactions. This information is not only useful for designing, comparing and validating the relevant experiments, but it also serves as a knowledge-base for computationally constructing systems signaling pathways and networks. PepCyber:PPEP is accessible through the URL, http://www.pepcyber.org/PPEP/. The current release of the database contains 7044 PPBD-mediated interactions involving 337 PPBD-containing proteins and 1123 substrate proteins.

McLerran DF, Feng Z, Semmes OJ, Cazares L, Randolph TW.
Signal detection in high-resolution mass spectrometry data.
J Proteome Res. 2008 Jan;7(1):276-85.


Mass spectrometry data from high-resolution time-of-flight instruments often contain a vast number of noninformative background-ion peaks whose signal is similar to that of peptide peaks. Consequently, seeking peptide signal in these spectra based on a signal-to-noise ratio will remove signal peaks as well as noise. This work characterizes the background as a precursor to seeking peptide-related features. Robust-regression methods are used to estimate distributions for null (background) peak intensities and locations. Defining signal peaks as outliers with respect to these distributions leads to more precision in detecting the isotopic envelope of peaks from low-abundance peptides in high-resolution spectra

Mueller LN, Brusniak M, Mani DR, Aebersold R.
An Assessment of Software Solutions for the Analysis of Mass Spectrometry Based Quantitative Proteomics Data.
J Proteome Res. 2008 Jan;7(1):51-61. [Epub 2008 Jan 4]


Over the past decade, a series of experimental strategies for mass spectrometry based quantitative proteomics and corresponding computational methodology for the processing of the resulting data have been generated. We provide here an overview of the main quantification principles and available software solutions for the analysis of data generated by liquid chromatography coupled to mass spectrometry (LC-MS). Three conceptually different methods to perform quantitative LC-MS experiments have been introduced. In the first, quantification is achieved by spectral counting, in the second via differential stable isotopic labeling, and in the third by using the ion current in label-free LC-MS measurements. We discuss here advantages and challenges of each quantification approach and assess available software solutions with respect to their instrument compatibility and processing functionality. This review therefore serves as a starting point for researchers to choose an appropriate software solution for quantitative proteomic experiments based on their experimental and analytical requirements

Searle BC, Turner M, Nesvizhskii AI.
Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies.
J Proteome Res. 2008 Jan;7(1):245-53.


Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using SEQUEST, Mascot, and X! Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool
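The idea of merging per-engine peptide probabilities can be sketched as follows. This is a simplified stand-in, not the framework described in the abstract: it assumes the engines are conditionally independent, that each probability is a posterior computed under the same prior, and it omits the expectation maximization step entirely. The function name, the `prior` parameter, and the `eps` clamp are hypothetical.

```python
def combine_probabilities(probabilities, prior=0.5, eps=1e-9):
    """Combine per-engine posterior probabilities with a naive Bayes update.

    Each probability is turned into a likelihood ratio relative to the shared
    prior; the ratios are multiplied onto the prior odds (independence assumed).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds
    for p in probabilities:
        p = min(max(p, eps), 1.0 - eps)  # keep the odds finite at 0 or 1
        posterior_odds *= (p / (1.0 - p)) / prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

With a single engine the function returns that engine's probability unchanged, and agreement between engines pushes the combined value above either input when both exceed the prior.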

Tabb DL.
What's driving false discovery rates?
J Proteome Res. 2008 Jan;7(1):45-6. [Epub 2007 Dec 15]


The “Paris Guidelines” have begun the process of standardizing reporting for proteomics. New bioinformatics tools have improved the process for estimating error rates of peptide identifications. This perspective seeks to consider these advances in the context of proteomics’ short history. As increasing numbers of proteomics papers come from biologists rather than technologists, developing consensus standards for estimating error will be increasingly necessary. Standardizing this assessment should be welcomed as a reflection of the growing impact of proteomic technologies

Ulintz PJ, Bodenmiller B, Andrews PC, Aebersold R, Nesvizhskii AI.
Investigating MS²/MS³ matching statistics: a model for coupling consecutive stage mass spectrometry data for increased peptide identification confidence.
Mol Cell Proteomics. 2008 Jan;7(1):71-87. [Epub 2007 Sep 13]


Improvements in ion trap instrumentation have made n-dimensional mass spectrometry more practical. The overall goal of the study was to describe a model for making use of MS² and MS³ information in mass spectrometry experiments. We present a statistical model for adjusting peptide identification probabilities based on the combined information obtained by coupling peptide assignments of consecutive MS² and MS³ spectra. Using two data sets, a mixture of known proteins and a complex phosphopeptide-enriched sample, we demonstrate an increase in discriminating power of the adjusted probabilities compared with models using MS² or MS³ data only. This work also addresses the overall value of generating MS³ data as compared with an MS²-only approach with a focus on the analysis of phosphopeptide data

Feng J, Wong KY, Lynch GC, Gao X, Pettitt BM.
Peptide conformations for a microarray surface-tethered epitope of the tumor suppressor p53.
J Phys Chem B. 2007 Dec 13;111(49):13797-806. [Epub 2007 Nov 16]


Peptides or proteins near surfaces exhibit different structural properties from those present in a homogeneous solution, and these differences give rise to varied biological activity. Therefore, understanding the detailed molecular structure of these molecules tethered to a surface is important for interpreting the performance of the various microarrays based on the activities of the immobilized peptides or proteins. We performed molecular dynamics simulations of a pentapeptide, RHSVV, an epitope of the tumor suppressor protein p53, tethered via a spacer on a functionalized silica surface and free in solution, to study their structural and conformational differences. These calculations allowed analyses of the peptide-surface interactions, the sequence orientations, and the translational motions of the peptide on the surface to be performed. Conformational similarities are found among dominant structures of the tethered and free peptide. In the peptide microarray simulations, the peptide fluctuates between a parallel and tilted orientation driven in part by the hydrophobic interactions between the nonpolar peptide residues and the methyl-terminated silica surface. The perpendicular movement of the peptide relative to the surface is also restricted due to the hydrophobic nature of the microarray surface. With regard to structures available for recognition and binding, we find that similar conformations to those found in solution are available to the peptide tethered to the surface, but with a shifted equilibrium constant. Comparisons with experimental results show important implications of this for peptide microarray design and assays

Kini HK, Walton SP.
In vitro binding of single-stranded RNA by human Dicer.
FEBS Lett. 2007 Dec 11;581(29):5611-6. [Epub 2007 Nov 20]


While Dicer alone has been shown to form stable complexes with double-stranded RNAs and short interfering RNAs, its interactions with single-stranded RNAs (ssRNAs) have not been characterized. Here, we show that recombinant human Dicer alone can bind 21-nt ssRNAs in vitro, independent of their sequence and structure. We also demonstrate that Dicer binds ssRNAs having a 5'-phosphate with greater affinity versus those with a 5'-hydroxyl. In addition, 3'-biotinylated ssRNAs are bound by Dicer with lower affinity than 3'-hydroxyl ssRNAs. The stability of ssRNA-Dicer complexes was found to depend on divalent cations. Together, our results suggest a role for the PAZ domain of Dicer in binding ssRNAs and may indicate roles for Dicer in cellular function beyond those currently known

Keshishian H, Addona T, Burgess M, Kuhn E, Carr SA.
Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution.
Mol Cell Proteomics. 2007 Dec;6(12):2212-29. [Epub 2007 Oct]


Biomarker discovery produces lists of candidate markers whose presence and level must be subsequently verified in serum or plasma. Verification represents a paradigm shift from unbiased discovery approaches to targeted, hypothesis-driven methods and relies upon specific, quantitative assays optimized for the selective detection of target proteins. Many protein biomarkers of clinical currency are present at or below the nanogram/milliliter range in plasma and have been inaccessible to date by MS-based methods. Using multiple reaction monitoring coupled with stable isotope dilution mass spectrometry, we describe here the development of quantitative, multiplexed assays for six proteins in plasma that achieve limits of quantitation in the 1–10 ng/ml range with percent coefficients of variation from 3 to 15% without immunoaffinity enrichment of either proteins or peptides. Sample processing methods with sufficient throughput, recovery, and reproducibility to enable robust detection and quantitation of candidate biomarker proteins were developed and optimized by addition of exogenous proteins to immunoaffinity depleted plasma from a healthy donor. Quantitative multiple reaction monitoring assays were designed and optimized for signature peptides derived from the test proteins. Based upon calibration curves using known concentrations of spiked protein in plasma, we determined that each target protein had at least one signature peptide with a limit of quantitation in the 1–10 ng/ml range and linearity typically over 2 orders of magnitude in the measurement range of interest. Limits of detection were frequently in the high picogram/milliliter range. These levels of assay performance represent up to a 1000-fold improvement compared with direct analysis of proteins in plasma by MS and were achieved by simple, robust sample processing involving abundant protein depletion and minimal fractionation by strong cation exchange chromatography at the peptide level prior to LC-multiple reaction monitoring/MS. The methods presented here provide a solid basis for developing quantitative MS-based assays of low level proteins in blood.

Wang J, Gutierrez P, Edwards N, Fenselau C.
Integration of ¹⁸O labeling and solution isoelectric focusing in a shotgun analysis of mitochondrial proteins.
J Proteome Res. 2007 Dec;6(12):4601-7. [Epub 2007 Nov 10]


Forward and reverse ¹⁸O labeling are integrated with solution isoelectric focusing and capillary LC-tandem mass spectrometry to evaluate a new strategy for quantitative proteomics and to study abundance changes in mitochondrial proteins associated with drug resistance in MCF-7 human cancer cells. Galectin-3 binding protein, which is involved in apoptosis, was detected only in the resistant cell line, as a result of reverse labeling. Among 278 proteins identified, 12 were detected with abundances altered at least 2-fold

Gatlin-Bunai CL, Cazares LH, Cooke WE, Semmes OJ, Malyarenko DI.
Optimization of MALDI-TOF MS detection for enhanced sensitivity of affinity-captured proteins spanning a 100 kDa mass range.
J Proteome Res. 2007 Nov;6(11):4517-24. [Epub 2007 Oct 5]


Analysis of complex biological samples by MALDI-TOF mass spectrometry has been generally limited to the detection of low-mass protein (or protein fragment) peaks. We have extended the mass range of MALDI-TOF high-sensitivity detection by an order of magnitude through the combined optimization of instrument parameters, data processing, and sample preparation procedures for affinity capture. WCX, C3, and IMAC magnetic beads were determined to be complementary and most favorable for broad mass range protein profiling. Key instrument parameters for extending mass range included adjustment of the ADC offset and preamplifier filter values of the TOF detector. Data processing was improved by a combination of constant and quadratic down-sampling, preceded by exponential baseline subtraction, to increase sensitivity of signal peaks. This enhancement in broad mass range detection of protein signals will be of direct benefit in MS expression profiling studies requiring full linear range mass detection

Padliya ND, Garrett WM, Campbell KB, Tabb DL, Cooper B.
Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.
Proteomics. 2007 Nov;7(21):3932-42.


LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens

Swatkoski S, Gutierrez P, Ginter J, Petrov A, Dinman JD, Edwards N, Fenselau C.
Integration of residue specific acid cleavage into proteomic workflows.
J Proteome Res. 2007 Nov;6(11):4525-7. [Epub 2007 Sep]


Microwave-accelerated proteolysis using acetic acid has been shown to occur specifically on either or both sides of aspartate residues. This chemical cleavage is applied to the yeast ribosome proteome to evaluate its suitability for incorporation into high-throughput automated workflows. Peptide product mixtures were analyzed using either an HPLC-ESI-LTQ-Orbitrap or an HPLC-MALDI-TOF2. The peptides were readily identified, using MASCOT with a modified enzyme rule, and provided information about 73% of the proteome. Implications are considered of the extended length and the presence of multiple basic residues in these peptides

Ibrahim Y, Belov ME, Tolmachev AV, Prior DC, Smith RD.
Ion funnel trap interface for orthogonal time-of-flight mass spectrometry.
Anal Chem. 2007 Oct 15;79(20):7845-52. [Epub 2007 Sep 13]


A combined electrodynamic ion funnel and ion trap coupled to an orthogonal acceleration (oa)-time-of-flight mass spectrometer was developed and characterized. The ion trap was incorporated through the use of added terminal electrodynamic ion funnel electrodes enabling control over the axial dc gradient in the trap section. The ion trap operates efficiently at a pressure of ~1 Torr, and measurements indicate a maximum charge capacity of ~3 × 10⁷ charges. An order of magnitude increase in sensitivity was observed in the analysis of low-concentration peptide mixtures with orthogonal acceleration (oa)-time-of-flight mass spectrometry (oa-TOF MS) in the trapping mode as compared to the continuous regime. A signal increase in the trapping mode was accompanied by reduction in the chemical background, due to more efficient desolvation of, for example, solvent related clusters. Controlling the ion trap ejection time was found to result in efficient removal of singly charged species and improved signal-to-noise ratio (S/N) for the multiply charged analytes.

Nesvizhskii AI, Vitek O, Aebersold R.
Analysis and validation of proteomic data generated by tandem mass spectrometry.
Nat Methods. 2007 Oct;4(10):787-97.


The analysis of the large amount of data generated in mass spectrometry–based proteomics experiments represents a significant challenge and is currently a bottleneck in many proteomics projects. In this review we discuss critical issues related to data processing and analysis in proteomics and describe available methods and tools. We place special emphasis on the elaboration of results that are supported by sound statistical arguments

Witze ES, Old WM, Resing KA, Ahn NG.
Mapping protein post-translational modifications with mass spectrometry.
Nat Methods. 2007 Oct;4(10):798-806.


Post-translational modifications of proteins control many biological processes, and examining their diversity is critical for understanding mechanisms of cell regulation. Mass spectrometry is a fundamental tool for detecting and mapping covalent modifications and quantifying their changes. Modern approaches have made large-scale experiments possible, screening complex mixtures of proteins for alterations in chemical modifications. By profiling protein chemistries, biologists can gain deeper insight into biological control. The aim of this review is to introduce biologists to current strategies in mass spectrometry–based proteomics that are used to characterize protein post-translational modifications, noting strengths and shortcomings of various approaches.

Wu X, Tseng CW, Edwards N.
HMMatch: peptide identification by spectral matching of tandem mass spectra using hidden Markov models.
J Comput Biol. 2007 Oct;14(8):1025-43.


Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed

Page JS, Kelly RT, Tang K, Smith RD.
Ionization and transmission efficiency in an electrospray ionization-mass spectrometry interface.
J Am Soc Mass Spectrom. 2007 Sep;18(9):1582-90. [Epub 2007 Jun 2]


The ionization and transmission efficiencies of an electrospray ionization (ESI) interface were investigated to advance the understanding of how these factors affect mass spectrometry (MS) sensitivity. In addition, the effects of the ES emitter distance to the inlet, solution flow rate, and inlet temperature were characterized. Quantitative measurements of ES current loss throughout the ESI interface were accomplished by electrically isolating the front surface of the interface from the inner wall of the heated inlet capillary, enabling losses on the two surfaces to be distinguished. In addition, the ES current lost to the front surface of the ESI interface was spatially profiled with a linear array of 340-µm-diameter electrodes placed adjacent to the inlet capillary entrance. Current transmitted as gas-phase ions was differentiated from charged droplets and solvent clusters by measuring sensitivity with a single quadrupole mass spectrometer. The study revealed a large sampling efficiency into the inlet capillary (>90% at an emitter distance of 1 mm), a global rather than a local gas dynamic effect on the shape of the ES plume resulting from the gas flow conductance limit of the inlet capillary, a large (>80%) loss of analyte ions after transmission through the inlet arising from incomplete desolvation at a solution flow rate of 1.0 µL/min, and a decrease in analyte ions peak intensity at lower temperatures, despite a large increase in ES current transmission efficiency

Zhang B, Chambers MC, Tabb DL.
Proteomic parsimony through bipartite graph analysis improves accuracy and transparency.
J Proteome Res. 2007 Sep;6(9):3549-57. [Epub 2007 Aug 4]


Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/
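The parsimony step described in this abstract can be sketched on a small peptide-protein mapping. The sketch below is an assumption-laden illustration, not the IDPicker implementation: it groups proteins that share an identical peptide set and then applies a greedy set-cover heuristic, which approximates but does not guarantee a minimal list. The function name and the example data are hypothetical.

```python
def minimal_protein_list(protein_to_peptides):
    """Greedy parsimony over a bipartite peptide-protein mapping.

    protein_to_peptides: dict mapping protein name -> iterable of peptides.
    Returns a list of protein groups (tuples) that together explain every peptide.
    """
    # Proteins with identical peptide evidence are indistinguishable: group them.
    groups = {}
    for protein in sorted(protein_to_peptides):
        peptides = frozenset(protein_to_peptides[protein])
        groups.setdefault(peptides, []).append(protein)

    uncovered = set()
    for peptides in groups:
        uncovered |= peptides

    chosen = []
    while uncovered:
        # Pick the group explaining the most still-unexplained peptides;
        # max() keeps the first maximal group, so ties follow sorted name order.
        best = max(groups, key=lambda peptides: len(peptides & uncovered))
        chosen.append(tuple(groups[best]))
        uncovered -= best
    return chosen


# Hypothetical mapping, for illustration only.
example_mapping = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b"},
    "P3": {"c", "d"},
    "P4": {"c", "d"},
}
```

Here P2 is dropped because P1 already explains its peptides, while P3 and P4 are reported as one group because the evidence cannot tell them apart.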

Liu J, Kang S, Tang C, Ellis LB, Li T.
Meta-prediction of protein subcellular localization with reduced voting.
Nucleic Acids Res. 2007;35(15):e96. [Epub 2007 Aug 1]


Meta-prediction seeks to harness the combined strengths of multiple predicting programs with the hope of achieving predicting performance surpassing that of all existing predictors in a defined problem domain. We investigated meta-prediction for the four-compartment eukaryotic subcellular localization problem. We compiled an unbiased subcellular localization dataset of 1693 nuclear, cytoplasmic, mitochondrial and extracellular animal proteins from Swiss-Prot 50.2. Using this dataset, we assessed the predicting performance of 12 predictors from eight independent subcellular localization predicting programs: ELSPred, LOCtree, PLOC, Proteome Analyst, PSORT, PSORT II, SubLoc and WoLF PSORT. Gorodkin correlation coefficient (GCC) was one of the performance measures. Proteome Analyst is the best individual subcellular localization predictor tested in this four-compartment prediction problem, with GCC = 0.811. A reduced voting strategy eliminating six of the 12 predictors yields a meta-predictor (RAW-RAG-6) with GCC = 0.856, substantially better than all tested individual subcellular localization predictors (P = 8.2 × 10⁻⁶, Fisher's Z-transformation test). The improvement in performance persists when the meta-predictor is tested with data not used in its development. This and similar voting strategies, when properly applied, are expected to produce meta-predictors with outstanding performance in other life sciences problem domains.
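Voting over a reduced set of predictors can be sketched as a plurality vote. This is a minimal illustration under stated assumptions, not the RAW-RAG-6 procedure: the abstract does not give the voting or tie-breaking rules, so the sketch breaks ties in favor of the predictor listed first. The function name and the predictor names are hypothetical.

```python
from collections import Counter


def reduced_vote(predictions, selected):
    """Plurality vote over a chosen subset of predictors.

    predictions: dict mapping predictor name -> predicted compartment.
    selected: ordered list of predictor names allowed to vote.
    Ties go to the label cast earliest in `selected` (an assumed rule).
    """
    labels = [predictions[name] for name in selected]
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


# Hypothetical predictor outputs for one protein.
example_predictions = {
    "pred_a": "nuclear",
    "pred_b": "cytoplasmic",
    "pred_c": "nuclear",
    "pred_d": "mitochondrial",
}
```

Choosing which predictors go into `selected` is the "reduction" step; in practice that choice would be tuned on held-out data.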

Hu J, He X, Baggerly KA, Coombes KR, Hennessy BT, Mills GB.
Non-parametric quantification of protein lysate arrays.
Bioinformatics. 2007 Aug 1;23(15):1986-94. [Epub 2007 Jun]


MOTIVATION: Proteins play a crucial role in biological activity, so much can be learned from measuring protein expression and post-translational modification quantitatively. The reverse-phase protein lysate arrays allow us to quantify the relative expression levels of a protein in many different cellular samples simultaneously. Existing approaches to quantify protein arrays use parametric response curves fit to dilution series data. The results can be biased when the parametric function does not fit the data.

RESULTS: We propose a non-parametric approach which adapts to any monotone response curve. The non-parametric approach is shown to be promising via both simulation and real data studies; it reduces the bias due to model misspecification and protects against outliers in the data. The non-parametric approach enables more reliable quantification of protein lysate arrays.

AVAILABILITY: Code to implement the proposed method in the statistical package R is available at: http://odin.mdacc.tmc.edu/jhu/lysatearray-analysis/
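One common way to fit a monotone response curve without choosing a parametric form is isotonic regression by the pool-adjacent-violators algorithm. The abstract does not say which estimator the authors use, so the sketch below is a swapped-in illustration of the general idea rather than their method (their R code is at the URL above). The function name is hypothetical, and the input is assumed to be intensities already ordered by increasing concentration.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    values: measurements ordered by increasing dose or concentration.
    Returns fitted values of the same length that never decrease.
    """
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

Because the fit only enforces ordering, a single aberrant spot is averaged with its neighbors instead of bending a whole parametric curve.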

Peng IX, Shiea J, Ogorzalek Loo RR, Loo JA.
Electrospray-assisted laser desorption/ionization and tandem mass spectrometry of peptides and proteins.
Rapid Commun Mass Spectrom. 2007 Aug 20;21(16):2541-6. [Epub 2007 Jul 17]


We have constructed an electrospray-assisted laser desorption/ionization (ELDI) source which utilizes a nitrogen laser pulse to desorb intact molecules from matrix-containing sample solution droplets, followed by electrospray ionization (ESI) post-ionization. The ELDI source is coupled to a quadrupole ion trap mass spectrometer and allows sampling under ambient conditions. Preliminary data showed that ELDI produces ESI-like multiply charged peptides and proteins up to 29 kDa carbonic anhydrase and 66 kDa bovine albumin from single-protein solutions, as well as from complex digest mixtures. The generated multiply charged polypeptides enable efficient tandem mass spectrometric (MS/MS)-based peptide sequencing. ELDI-MS/MS of protein digests and small intact proteins was performed both by collisionally activated dissociation (CAD) and by nozzle-skimmer dissociation (NSD). ELDI-MS/MS may be a useful tool for protein sequencing analysis and top-down proteomics study, and may complement matrix-assisted laser desorption/ionization (MALDI)-based measurements

Kelly RT, Page JS, Tang K, Smith RD.
Array of chemically etched fused-silica emitters for improving the sensitivity and quantitation of electrospray ionization mass spectrometry.
Anal Chem. 2007 Jun 1;79(11):4192-8. [Epub 2007 May 2]


An array of emitters has been developed for increasing the sensitivity of electrospray ionization mass spectrometry (ESI-MS). The linear array consists of 19 chemically etched fused-silica capillaries arranged with 500 µm (center-to-center) spacing. The multiemitter device has a low dead volume to facilitate coupling to capillary liquid chromatography (LC) separations. The high aspect ratio of the emitters enables operation at flow rates as low as 20 nL/min/emitter, effectively extending the benefits of nanoelectrospray to higher flow rate analyses. To accommodate the larger ion current produced by the emitter array, a multicapillary inlet to the mass spectrometer was also constructed. The inlet, which matched the dimensions of the emitter array, preserved ion transmission efficiency. Standard reserpine solutions of varying concentration were electrosprayed at 1 µL/min using the multiemitter/multi-inlet combination, and the results were compared to those from a standard, single-emitter configuration. A 9-fold sensitivity enhancement was observed for the multiemitter relative to the single emitter. A bovine serum albumin tryptic digest was also analyzed, and a sensitivity increase ranging from 2.4- to 12.3-fold for the detected tryptic peptides resulted; the varying response was attributed to reduced ion suppression under the nanoESI conditions afforded by the emitter array. An equimolar mixture of leucine enkephalin and maltopentaose was studied to verify that ion suppression is indeed reduced for the multiplexed ESI (multi-ESI) array relative to a single emitter over a range of flow rates.

Fenyo D, Phinney BS, Beavis RC.
Determining the overall merit of protein identification data sets: rho-diagrams and rho-scores.
J Proteome Res. 2007 May;6(5):1997-2004. [Epub 2007 Mar 31]


This paper described a simple heuristic method for determining the merit of a set of peptide sequence assignments made using tandem mass spectra. The method involved comparing a prediction based on the known stochastic behavior of a sequence assignment algorithm with the assignments generated from a particular data set. A particular formulation of this comparison was defined through the construction of a plot of the data, the rho-diagram, as well as a parameter derived from this plot, the rho-score. This plot and parameter were shown to be able to readily characterize the relative quality of a set of peptide sequence assignments and to allow the straightforward determination of probability threshold values for the interpretation of proteomics data. This plot is independent of the algorithm or scoring scheme used to estimate the statistical significance of a set of experimental results; rather, it can be used as an objective test of the correctness of those estimates. The rho-score can also be used as a parameter to evaluate the relative merit of protein identifications, such as those made across proteome species taxonomic categories

Edwards NJ.
Novel peptide identification from tandem mass spectra using ESTs and sequence database compression.
Mol Syst Biol. 2007;3:102. [Epub 2007 Apr 17]


Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Traditional search engines, which match peptide sequences with tandem mass spectra to identify the samples' proteins, use protein sequence databases to suggest peptide candidates for consideration. Although the acquisition of tandem mass spectra is not biased toward well-understood protein isoforms, this computational strategy is failing to identify peptides from alternative splicing and coding SNP protein isoforms despite the acquisition of good-quality tandem mass spectra. We propose, instead, that expressed sequence tags (ESTs) be searched. Ordinarily, such a strategy would be computationally infeasible due to the size of EST sequence databases; however, we show that a sophisticated sequence database compression strategy, applied to human ESTs, reduces the sequence database size approximately 35-fold. Once compressed, our EST sequence database is comparable in size to other commonly used protein sequence databases, making routine EST searching feasible. We demonstrate that our EST sequence database enables the discovery of novel peptides in a variety of public data sets

Tabb DL, Fernando CG, Chambers MC.
MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis.
J Proteome Res. 2007 Feb;6(2):654-61. [Epub 2007 Jan 18]


Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code is available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/
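The multivariate hypergeometric distribution behind this kind of scoring can be written down directly. The sketch below only evaluates the probability mass function for a given split of matches across peak classes; how MyriMatch defines its intensity classes and turns the probability into a score is not reproduced here. The function name and the class sizes in the usage note are hypothetical.

```python
from math import comb


def multivariate_hypergeom_pmf(class_sizes, class_hits):
    """Probability of drawing exactly class_hits[i] items from each class.

    class_sizes: number of items in each class (e.g. peaks per intensity class).
    class_hits: number of items drawn from each class (e.g. matched peaks).
    Draws are without replacement, all items equally likely.
    """
    if len(class_sizes) != len(class_hits):
        raise ValueError("class_sizes and class_hits must have the same length")
    total = sum(class_sizes)
    drawn = sum(class_hits)
    numerator = 1
    for size, hits in zip(class_sizes, class_hits):
        numerator *= comb(size, hits)  # comb() is 0 when hits > size
    return numerator / comb(total, drawn)
```

For example, with classes of 2, 3 and 5 items, `multivariate_hypergeom_pmf([2, 3, 5], [1, 1, 0])` evaluates 2 × 3 × 1 / 45. A match concentrated in the small, intense class is less likely by chance than the same number of matches spread over the large, weak class, which is why stratifying by intensity can sharpen discrimination.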

Zhu Q, Hong A, Sheng N, Zhang X, Matejko A, Jun K-Y, Srivannavit O, Gulari E, Gao X, Zhou X.
Paraflo™ Biochip for Nucleic Acid and Protein Analysis.
Methods in Molecular Biology. Microarrays. Vol. 2, Applications and data analysis (2nd Ed.). Ed. Rampal JB, The Humana Press Inc.


No abstract available.