PLoS Biol. 2009 April; 7(4): e1000096.
Published online 2009 April 28. doi: 10.1371/journal.pbio.1000096.
PMCID: PMC2672614
Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins
Pingzhao Hu,#1 Sarath Chandra Janga,#1,2 Mohan Babu,#1 J. Javier Díaz-Mejía,#1,3 Gareth Butland,#1¤ Wenhong Yang,1 Oxana Pogoutse,1 Xinghua Guo,1 Sadhna Phanse,1 Peter Wong,1 Shamanta Chandran,1 Constantine Christopoulos,1 Anaies Nazarians-Armavil,1 Negin Karimi Nasseri,1 Gabriel Musso,1 Mehrab Ali,1 Nazila Nazemof,4 Veronika Eroukova,4 Ashkan Golshani,4 Alberto Paccanaro,5 Jack F Greenblatt,1 Gabriel Moreno-Hagelsieb,3* and Andrew Emili1*
1 Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
2 Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom
3 Department of Biology, Wilfrid Laurier University, Waterloo, Ontario, Canada
4 Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, Canada
5 Department of Computer Science, Royal Holloway, University of London, Egham, United Kingdom
Andre Levchenko, Academic Editor
Johns Hopkins University, United States of America
#Contributed equally.
* To whom correspondence should be addressed. E-mail: gmoreno/at/wlu.ca (GM-H); E-mail: andrew.emili/at/utoronto.ca (AE)
Received October 21, 2008; Accepted March 16, 2009.
Abstract
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium, covering virtually the entire proteome, of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Author Summary
One goal of modern biology is to chart groups of proteins that act together to perform biological processes via direct and indirect interactions. Such groupings are sometimes called functional modules. The types of protein interactions within modules include physical interactions that generate protein complexes and biochemical associations that make up metabolic pathways. We combined proteomic and bioinformatic tools to decipher, with high confidence, a large number of protein interactions, complexes, and functional modules. In addition, by exploring the topology of the resulting interaction networks, we successfully predicted specific biological roles for a number of proteins with previously unknown functions, and identified some potential drug targets. Although our work is focused on E. coli, our phylogenetic projections suggest that a considerable fraction of our observations and predictions can be extrapolated to many other bacterial taxa. As all the data derived from this study are publicly available, others may build on our work for further hypothesis-driven studies of gene function discovery.