Astrobiology: Life in the Universe

Astrobiology Science and Technology for Exploring Planets (ASTEP)


  1. Exploration of Sequence Space and the Evolution of the Genetic Code

    PI: Gogarten, Peter

    Proteins consist of linear sequences of amino acids. These sequences can be thought of as inhabiting a space consisting of the set of all possible protein sequences given the 20 amino acids used in the genetic code. Similar sequences are closer to each other in this protein space; evolution can be thought of as an exploratory walk connecting neighboring sequences. Current protein space is so large that no imaginable process can even approach exploring it thoroughly. Did an earlier, simpler genetic code allow a more thorough exploration of protein space? What was the nature of this primitive genetic code, and how did it expand to the modern schema? We propose two approaches to investigate the mechanism and importance of the expansion of the genetic code during early evolution.
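
    As a rough illustration of the scale involved, the size of sequence space and the notion of "neighboring" sequences can be sketched in a few lines of Python. The function names, the use of Hamming distance as the measure of closeness, and the reduced alphabet size of 4 are assumptions made for this example only; the abstract does not specify a distance measure or the size of any earlier code.

```python
import math


def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


def log10_space_size(alphabet_size: int, length: int) -> float:
    """Order of magnitude (log10) of the sequence space size."""
    return length * math.log10(alphabet_size)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ.

    Assumption: Hamming distance stands in for 'closeness' in protein space.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def one_step_neighbors(alphabet_size: int, length: int) -> int:
    """Number of sequences exactly one substitution away from a given one."""
    return length * (alphabet_size - 1)


if __name__ == "__main__":
    # Compare the modern 20-letter alphabet with a hypothetical 4-letter one
    # for a protein of 100 residues.
    for alphabet in (20, 4):
        exponent = log10_space_size(alphabet, 100)
        print(f"alphabet={alphabet}: about 10^{exponent:.1f} sequences")
```

    Under these assumptions, a smaller alphabet shrinks the space by many orders of magnitude for the same protein length, which is the sense in which a simpler code could have permitted a more thorough exploration.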

    Our first approach is a novel methodology called Compositional Stratigraphy. We will analyze existing proteins to illuminate the order in which amino acids were added to the genetic code. Amino acid positions that came under purifying selection early in the evolution of life, and have remained under it since, will be conserved in all or most homologous versions of the protein. Clearly, if tryptophan is a late addition to the genetic code, it could not have been under purifying selection at a given sequence position before it was present in the code. Our analysis will focus on genes that arose early in evolution and show a high degree of sequence conservation. One group of proteins selected for study is the ribosomal proteins, which were arguably among the first proteins to evolve. The cenancestral ribosome (i.e., the common ancestor of all extant ribosomes) existed after the code was already fully developed. However, compared with the ribosomes ancestral to specific domains of life, the cenancestral ribosome spent a larger part of its history under purifying selection while the genetic code was still simpler and evolving. The fraction of early amino acids among positions conserved from the cenancestral ribosome is therefore expected to be higher than among positions that came under purifying selection more recently. Using ancient paralogous sets of genes, we can reach back even further, to a time before the most recent common ancestor. The identity of conserved residues in these ancient proteins provides an untapped window into the history of the expansion of the genetic code.
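
    The core measurement described above, the fraction of early amino acids among conserved positions, might be sketched as follows. This is a minimal illustration and not the project's actual pipeline: the function names, the conservation threshold, the toy alignment, and in particular the set of "early" amino acids (here G, A, D, V) are placeholder assumptions, since determining that set is the aim of the analysis itself.

```python
from collections import Counter

# Placeholder assumption: an illustrative set of putatively early amino acids.
HYPOTHETICAL_EARLY = set("GADV")


def conserved_columns(alignment, min_fraction=1.0):
    """Return (column index, residue) for conserved alignment columns.

    A column counts as conserved when its most common residue is not a gap
    and occurs in at least `min_fraction` of the aligned sequences.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            conserved.append((i, residue))
    return conserved


def early_fraction(conserved, early_set=HYPOTHETICAL_EARLY):
    """Fraction of conserved positions occupied by 'early' amino acids.

    Returns None when there are no conserved positions to evaluate.
    """
    if not conserved:
        return None
    hits = sum(1 for _, residue in conserved if residue in early_set)
    return hits / len(conserved)


if __name__ == "__main__":
    # Toy alignment of three hypothetical homologs (not real data).
    toy_alignment = ["GAVWK", "GAVWR", "GAVYK"]
    strict = conserved_columns(toy_alignment, min_fraction=1.0)
    print(strict, early_fraction(strict))
```

    In practice, one would presumably compute this fraction separately for positions conserved across all domains of life and for positions conserved only within a single domain, and then compare the two values.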

    Second, we will explore the distribution of proteins in sequence space. Existing protein folds are too divergent for standard alignments to reveal their common ancestry, which makes it difficult to study the relationships between them in deep evolutionary history. Using random oligopeptide sequence motifs as the dimensions of protein space addresses this problem by extending the concept of sequence space to sequences that cannot be reliably aligned. This approach allows us to investigate the global organization of protein sequence space in such a way that the impact of the expansion of the genetic code can potentially be detected and analyzed.
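
    One way to picture motifs serving as dimensions is to give each protein a coordinate per motif and then compare proteins by their coordinate vectors, with no alignment required. The sketch below assumes exact-match, overlapping occurrence counts as coordinates and Euclidean distance between vectors; the abstract does not state how motifs are matched or how distances are computed, so these choices, along with all function names and the example sequences, are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_motifs(n, k, seed=0):
    """Draw n distinct random oligopeptide motifs of length k."""
    if n > len(AMINO_ACIDS) ** k:
        raise ValueError("more motifs requested than exist for this length")
    rng = random.Random(seed)
    motifs = set()
    while len(motifs) < n:
        motifs.add("".join(rng.choice(AMINO_ACIDS) for _ in range(k)))
    return sorted(motifs)


def count_occurrences(sequence, motif):
    """Count overlapping exact occurrences of a motif in a sequence."""
    k = len(motif)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif
    )


def embed(sequence, motifs):
    """Map a sequence to a vector with one coordinate per motif."""
    return [count_occurrences(sequence, motif) for motif in motifs]


def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    motifs = random_motifs(n=50, k=2, seed=42)
    # Hypothetical sequences of different lengths; no alignment is needed.
    a = embed("MKVLAAGIVGLKSTAE", motifs)
    b = embed("MKILGAGVVGLRSAE", motifs)
    print(euclidean(a, b))
```

    Because every sequence receives a vector of the same length regardless of its own length or fold, sequences that cannot be aligned can still be placed in a common space under these assumptions.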
