Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000, Santa Fe, NM


70. Identification of Novel Functional RNA Genes in Genomic DNA Sequences

S.R. Holbrook, C. Mayor, and I. Dubchak

Physical Biosciences Division and National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

ildubchak@lbl.gov

Locating functional RNA genes in genomic sequences is much more difficult than assigning open reading frames (ORFs) as potential protein-coding genes. To date, functional RNA genes have been identified only by homology.

Our initial approach to locating novel RNA genes rested on the premise that all stable, functional RNAs share common structural elements and that sequences corresponding to these elements occur preferentially in RNA genes. These elements include tetraloops, uridine turns, tetraloop receptors, adenosine platforms, and a high percentage of double-helical base pairing. We also used the free energy of folding as a structural parameter representing double helicity in RNA sequences. Because the frequency of RNA structural elements alone cannot be expected to positively identify non-RNA sequences, we identified additional sequence preferences based on global sequence descriptors (previously applied to protein fold prediction) to discriminate RNA genes from non-RNA genes. These descriptors include composition, distribution, and transition parameters.
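The composition, transition, and distribution descriptors can be illustrated with a short Python sketch. It assumes the descriptors are computed directly over the four nucleotides A, C, G, and U (T is mapped to U); the abstract does not state how the alphabet was grouped, and all function names here are hypothetical. Composition is the fraction of each nucleotide, transition is the fraction of adjacent positions that switch between the two members of each unordered nucleotide pair, and distribution gives the relative positions of the first, 25%, 50%, 75%, and last occurrence of each nucleotide, following the descriptor scheme used for protein fold prediction.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGU"


def normalize(seq: str) -> str:
    """Uppercase the window and map DNA thymine (T) to uracil (U)."""
    seq = seq.upper().replace("T", "U")
    if len(seq) < 2:
        raise ValueError("sequence window must contain at least 2 nucleotides")
    return seq


def composition(seq: str) -> dict:
    """Fraction of the window made up of each nucleotide."""
    seq = normalize(seq)
    return {b: seq.count(b) / len(seq) for b in NUCLEOTIDES}


def transition(seq: str) -> dict:
    """Fraction of adjacent positions that switch between the two members of
    each unordered nucleotide pair (key 'AC' counts both A->C and C->A)."""
    seq = normalize(seq)
    steps = list(zip(seq, seq[1:]))
    return {
        a + b: sum(1 for x, y in steps if {x, y} == {a, b}) / len(steps)
        for a, b in combinations(NUCLEOTIDES, 2)
    }


def distribution(seq: str) -> dict:
    """Relative positions (0, 1] of the first, 25%, 50%, 75% and last
    occurrence of each nucleotide; all zeros if the nucleotide is absent."""
    seq = normalize(seq)
    result = {}
    for b in NUCLEOTIDES:
        pos = [i + 1 for i, x in enumerate(seq) if x == b]  # 1-based positions
        if not pos:
            result[b] = [0.0] * 5
            continue
        k = len(pos)
        quartiles = [pos[math.ceil(q * k) - 1] for q in (0.25, 0.5, 0.75)]
        picks = [pos[0]] + quartiles + [pos[-1]]
        result[b] = [p / len(seq) for p in picks]
    return result


def descriptor_vector(seq: str) -> list:
    """Concatenate composition (4), transition (6) and distribution (20)
    descriptors into one 30-value vector."""
    c, t, d = composition(seq), transition(seq), distribution(seq)
    return list(c.values()) + list(t.values()) + [v for b in NUCLEOTIDES for v in d[b]]
```

Each window yields a fixed-length vector of 30 numbers regardless of its length, which is the kind of input a neural network classifier needs.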

A total of 610 E. coli sequence windows (305 from RNA genes and 305 from non-assigned regions) were used to calculate the descriptors and train neural networks. To optimize prediction, we used a voting procedure in which a prediction was accepted only when both types of networks made it. The accuracy of RNA gene prediction using different combinations of global and structural parameters was estimated by cross-validation. Similarly, we trained neural networks to recognize RNA genes in other species.
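The voting rule and the cross-validation estimate can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say what distinguishes the two network types, so `net_a` and `net_b` are simply two independently trained classifiers; "accepted only when predicted by both" is read here as calling a window an RNA gene only when both classifiers predict RNA; and `toy_train`, which thresholds one feature per classifier, is a hypothetical stand-in for neural-network training.

```python
import random
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], bool]  # True means "RNA gene"


def consensus(net_a: Classifier, net_b: Classifier, x: Features) -> bool:
    """Call a window an RNA gene only when both networks predict RNA."""
    return bool(net_a(x)) and bool(net_b(x))


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def toy_train(xs, ys) -> Tuple[Classifier, Classifier]:
    """Hypothetical stand-in for neural-network training: each classifier
    thresholds one feature at the midpoint between the two class means."""
    def fit(j: int) -> Classifier:
        pos = [x[j] for x, y in zip(xs, ys) if y]
        neg = [x[j] for x, y in zip(xs, ys) if not y]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: x[j] > cut
    return fit(0), fit(1)


def cross_validate(xs, ys, train=toy_train, k: int = 5) -> float:
    """Mean accuracy of the consensus rule over k held-out folds."""
    accuracies = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        tr = [i for i in range(len(xs)) if i not in held_out]
        net_a, net_b = train([xs[i] for i in tr], [ys[i] for i in tr])
        correct = sum(consensus(net_a, net_b, xs[i]) == ys[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)
```

Requiring both classifiers to agree tends to lower the false-positive rate at the cost of missing windows that only one network recognizes, which is why the choice of feature combinations is worth checking by cross-validation.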

Using the trained neural networks, we have predicted putative RNA genes in the complete genomes of E. coli, M. genitalium, M. pneumoniae, and P. horikoshii. The weights from the trained networks are now used in a public web server that allows users to make predictions on their own sequences. We plan to expand the number of organisms in our server's database to include other bacteria, lower eukaryotes such as yeast, and ultimately human.
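Applying a trained classifier to a complete genome amounts to scanning it window by window. The sketch below assumes a fixed window size and step (the default values are hypothetical; the abstract gives none) and merges overlapping positive windows into candidate regions; `is_rna` is a placeholder for a trained network.

```python
from typing import Callable, Iterator, List, Tuple


def windows(genome: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs with 0-based start positions."""
    for start in range(0, len(genome) - size + 1, step):
        yield start, genome[start:start + size]


def scan(genome: str, is_rna: Callable[[str], bool],
         size: int = 100, step: int = 50) -> List[Tuple[int, int]]:
    """Return merged (start, end) regions (0-based, end exclusive) covered by
    windows that the classifier calls RNA."""
    regions: List[Tuple[int, int]] = []
    for start, win in windows(genome, size, step):
        if not is_rna(win):
            continue
        end = start + size
        if regions and start <= regions[-1][1]:
            # Overlaps or abuts the previous hit: extend that region.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

Windows that would run past the end of the sequence are skipped, so a real genome scan would also need to handle the sequence tail and the reverse-complement strand.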

 


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.