Genome Informatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


119. Ribosomal RNA Alignment Using Stochastic Context Free Grammars 

Michael P.S. Brown 
University of California at Santa Cruz, Santa Cruz, California 
mpbrown@cse.ucsc.edu 

I present a method for aligning ribosomal RNA using Stochastic Context-Free Grammars (SCFG's), a well-principled probabilistic approach that models pairwise interactions in a computationally efficient manner. I show that this method outperforms several other alignment methods. It has applications in areas such as phylogenetic tree reconstruction. A webserver is located at http://www.cse.ucsc.edu/research/compbio/ssurrna.html.

SCFG's have been used previously for modeling structures such as tRNA (Sakakibara94, Eddy+Durbin94) and have been demonstrated to have the highest specificity of any method (Lowe97). This performance comes from their ability to model pairwise interactions as well as their probabilistic foundations, which allow specific estimation of parameters such as gap and mutation costs. Unfortunately, SCFG's carry a relatively high computational cost, O(n^3), where n is the length of the sequence. Previous work has reduced this cost by preprocessing databases with a fast approximate method and presenting only likely strings to the SCFG for further processing (Lowe97). I extend this idea in a new direction using Hidden Markov Models (HMM's).
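
To make the O(n^3) cost concrete, here is a minimal sketch of the CYK (Viterbi) recursion for a toy single-nonterminal RNA grammar. The grammar (S -> x S y | x S | S x | S S | empty), every probability value, and the function name viterbi_logprob are illustrative assumptions, not the model used in this work. The cubic cost comes from the split point k of the bifurcation rule, which is scanned inside the two loops over subsequence endpoints.

```python
import math

# Hypothetical rule probabilities for a toy grammar (assumed, not from the abstract):
#   S -> x S y (pair) | x S (left) | S x (right) | S S (bif) | empty (end)
RULE_LOGP = {
    "pair": math.log(0.4),
    "left": math.log(0.2),
    "right": math.log(0.2),
    "bif": math.log(0.1),
    "end": math.log(0.1),
}
SINGLE_LOGP = math.log(0.25)  # uniform emission for an unpaired base
# Assumed pair emissions: 6 canonical/wobble pairs share 0.96, 10 others share 0.04.
CANONICAL = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
PAIR_LOGP = {pair: math.log(0.16) for pair in CANONICAL}
MISMATCH_LOGP = math.log(0.004)


def viterbi_logprob(seq):
    """Log probability of the best parse of seq under the toy grammar."""
    n = len(seq)
    # best[i][j] scores the half-open subsequence seq[i:j].
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = RULE_LOGP["end"]
    for span in range(1, n + 1):              # O(n) subsequence lengths
        for i in range(0, n - span + 1):      # O(n) start positions
            j = i + span
            cands = [
                RULE_LOGP["left"] + SINGLE_LOGP + best[i + 1][j],
                RULE_LOGP["right"] + SINGLE_LOGP + best[i][j - 1],
            ]
            if span >= 2:
                pair = PAIR_LOGP.get((seq[i], seq[j - 1]), MISMATCH_LOGP)
                cands.append(RULE_LOGP["pair"] + pair + best[i + 1][j - 1])
            for k in range(i + 1, j):         # O(n) split points: the cubic term
                cands.append(RULE_LOGP["bif"] + best[i][k] + best[k][j])
            best[i][j] = max(cands)
    return best[0][n]
```

Calling viterbi_logprob("GGGAAACCC") fills roughly n^2/2 table cells, each of which scans up to n split points. For a small subunit rRNA of well over a thousand bases, that inner scan is what makes an unconstrained computation expensive.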

HMM's are used not only to preprocess the database but also to constrain the SCFG computation in a principled way using posterior decodings. These constraints allow large molecules such as rRNA to be analyzed with the full power of complex SCFG models in a reasonable amount of time. Using the Ribosomal Database Project alignment of small subunit rRNA as a gauge (Maidak97), I analyze several methods for RNA structure prediction and show that SCFG's have the highest specificity and generalization capabilities.
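
The abstract does not specify how the posterior decodings are turned into constraints, so the following is only one plausible reading, sketched under stated assumptions. It runs the forward-backward algorithm on a tiny two-state HMM and keeps, for each sequence position, the states whose posterior probability exceeds a threshold. A downstream SCFG dynamic program could then skip any cell that falls outside these allowed sets. The two-state model, its parameters, the threshold, and the function names posterior_decode and allowed_states are all hypothetical.

```python
def posterior_decode(obs, init, trans, emit):
    """Return post[t][s] = P(state s at position t | obs) via scaled forward-backward."""
    n, num_states = len(obs), len(init)
    fwd = [[0.0] * num_states for _ in range(n)]
    scale = [0.0] * n
    for s in range(num_states):
        fwd[0][s] = init[s] * emit[s][obs[0]]
    scale[0] = sum(fwd[0])
    fwd[0] = [v / scale[0] for v in fwd[0]]
    for t in range(1, n):
        for s in range(num_states):
            incoming = sum(fwd[t - 1][r] * trans[r][s] for r in range(num_states))
            fwd[t][s] = emit[s][obs[t]] * incoming
        scale[t] = sum(fwd[t])
        fwd[t] = [v / scale[t] for v in fwd[t]]
    # Backward pass, reusing the forward scaling factors so fwd * bwd is the posterior.
    bwd = [[1.0] * num_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(num_states):
            total = sum(
                trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                for r in range(num_states)
            )
            bwd[t][s] = total / scale[t + 1]
    return [[fwd[t][s] * bwd[t][s] for s in range(num_states)] for t in range(n)]


def allowed_states(post, threshold):
    """For each position, keep the states whose posterior is at least threshold."""
    return [{s for s, p in enumerate(row) if p >= threshold} for row in post]


# Hypothetical two-state model: state 0 favors G/C, state 1 favors A/U.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [
    {"A": 0.1, "C": 0.4, "G": 0.4, "U": 0.1},
    {"A": 0.4, "C": 0.1, "G": 0.1, "U": 0.4},
]

posteriors = posterior_decode("GGGGAAAA", INIT, TRANS, EMIT)
constraints = allowed_states(posteriors, threshold=0.05)
```

The design point is that the HMM pass is cheap, linear in sequence length for a fixed model, while the pruning it licenses applies to the cubic SCFG computation. A lower threshold keeps more states and prunes less aggressively.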

Alignment of ribosomal RNA is important for several reasons. Historically, rRNA was used by Carl Woese to relate all organisms and reconstruct the tree of life (Woese77). More recently, Norman Pace pointed to an opportunity for an environmental genome survey in which rRNA is gathered from the environment to provide a sequence-based snapshot of microbial biodiversity (Pace97).

To relate organisms based on their biosequence identity, a multiple sequence alignment is necessary. Indeed, alignment is critical to correct phylogenetic tree reconstruction (Morrison97). Current methods of computing this alignment combine computer alignment with human fine-tuning (O'Brien98). This leads to a computational bottleneck, as evidenced by the large number of unaligned rRNA sequences in the Ribosomal Database Project. Full analysis of wide-scale environmental biodiversity projects will exacerbate this problem.

Stochastic Context-Free Grammars provide an automatic method for determining RNA alignment using a well-principled probabilistic model that accounts for pairwise interactions in a computationally efficient manner. SCFG's perform better than other methods and have several important application areas, including phylogenetic tree reconstruction.

----- 

(Sakakibara94) Y. Sakakibara et al. Nucleic Acids Research. (22)5112-5120. (1994).
(Eddy+Durbin94) S.R. Eddy and R. Durbin. Nucleic Acids Research. (22)2079-2088. (1994).
(Lowe97) T. Lowe and S. Eddy. Nucleic Acids Research. (25)955-964. (1997).
(Maidak97) B.L. Maidak et al. Nucleic Acids Research. (25)109-111. (1997).
(Woese77) C.R. Woese and G.E. Fox. Proc. Natl. Acad. Sci. USA. (74)5088. (1977).
(Pace97) N.R. Pace. Science. (276)734-740. (1997).
(Morrison97) D. Morrison and J. Ellis. Mol. Biol. Evol. (14)428-441. (1997).
(O'Brien98) E. O'Brien et al. Bioinformatics. (14)332-341. (1998).


 