Why GEBA?

A Genomic Encyclopedia for Bacteria and Archaea | Why GEBA? | Pilot Project

Genome sequencing has revolutionized our understanding of microorganisms and the role they play in important processes, including pathogenesis; energy production; bioremediation; global nutrient cycles; and the origins, evolution, and diversity of life. Currently, there are more than 1000 complete or nearly complete genome sequences of microbes available. These have been generated both from small-scale projects focused on specific scientific questions and from large-scale projects attempting to sequence genomes in a more coordinated manner (e.g., the Fungal Genome Initiative, the NIAID Pathogen Sequencing Program, the NHGRI’s Human Gut Microbiome Program, the Moore Foundation’s Marine Microbial Genome Sequencing Program, and others). Together these small- and large-scale projects have produced genome sequences from organisms with a wide diversity of phenotypes, including pathogens, extremophiles, endosymbionts, gut commensals, nitrogen fixers, carbon fixers, and others. Analysis of these data in turn has provided many fundamental insights into biological processes carried out or influenced by microbes.

However, there is a glaring gap in microbial genome sequence availability, one that recent advances in environmental genomics have made especially apparent. The currently available genome sequences show a highly biased phylogenetic distribution compared to the extent of microbial diversity known today. This bias has resulted in a major gap in our knowledge of microbial genome complexity and in our understanding of the evolution, physiology, and metabolic capacity of microbes. This is surprising given that there are systematic efforts to sequence genomes from diverse groups of animals (e.g., the NHGRI programs), plants, and fungi. Although there have been small efforts in this arena for microbes (e.g., eight genomes from novel branches are being sequenced as part of an NSF Tree of Life project), there is no systematic effort. We therefore believe there is a strong need for a large-scale, systematic effort to sequence genomes that fill in the genomic gaps in the tree of life.

This phylogenomic approach will be of great value in multiple areas of public and general scientific interest. The potential benefits include (a) improved identification of protein families and orthology groups across species, which will improve annotation of other microbial genomes, (b) improved phylogenetic anchoring of metagenomic data, (c) gene discovery (which tends to be maximized by selecting phylogenetically novel organisms), (d) a better understanding of the processes underlying the evolutionary diversification of microbes (e.g., lateral gene transfer and gene duplication), (e) a better understanding of the classification and evolutionary history of microbial species, and (f) improved correlations of phenotype and genotype in microbes.