DOE Genomes
Human Genome Project Information  Genomics:GTL  DOE Microbial Genomics  home
-
HGP Home
Archive Edition

logo

DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

Introduction to the Workshop
URLs Provided by Attendees

Abstracts
Mapping
Informatics
Sequencing
Instrumentation
Ethical, Legal, and Social Issues
Infrastructure

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

Combinatorial Methods for Gene Recognition

Mikhail Gelfand [1,2], Pavel Pevzner [2], Michael Roytberg [3]
[1]Institute of Protein Research, Russian Academy of Sciences
[2]Department of Computer Science and Engineering, The Pennsylvania State University
[3]Institute of Mathematical Biology, Russian Academy of Sciences

One of the main problems in prediction of complete genes in large-scale DNA sequencing projects is the combinatorial explosion of the number of potential exon-intron structures. The standard dynamic programming cannot be applied here, because there are no a priori reasons to assume the linearity of the structure-scoring functional [Gelfand, 1994]. We have suggested an algorithm that constructs the Pareto-optimal set of exon structures which is guaranteed to contain the best structure for any scoring functional and contains no unnecessary structures [Gelfand and Roytberg, 1993].

The pilot version of the GREAT (Gene Recognition and Exon Assembly Tool) software, which uses the simplest site recognition and codon usage objective functions, has been tested on an independent set of genes and benchmarked against GRAIL-2. The results were comparable: GREAT demonstrated higher sensitivity but smaller specificity than GRAIL-2 [Gelfand et al., 1994]. From this perspective it might be useful to combine the strengths of both approaches in a hybrid system.

Another issue we address is related to on-line database search for gene recognition. Existence of introns makes it necessary to use gapped matches, leading to new combinatorial problems [Pevzner and Waterman, 1994]. On the other hand, combination of exon assembly and similarity search approaches leads to the problem of finding the best local similarity between a set of candidate exon-intron structures and a sequence from the database. An efficient algorithm for this problem, related to approximate matching of regular expressions, has been constructed [Rubinov et al., 1994].

Further development will be directed towards combining of on-line database search and exon assembly, development of new coding potentials accounting to genome inhomogeneity, improvement of the structure scoring schemes, backtrack assignment of prediction certainty to individual exons, and creation of programs for analysis of large sequence fragments containing multiple genes.

The work is supported by DOE under the grant DE-FG02-94ER61919.

Gelfand M.S., Roytberg M.A. (1993) A dynamic programming algorithm for prediction of the exon-intron structure. BioSystems 30, 173-182.
Gelfand M.S., Podolsky L.I., Astakhova T.V., Roytberg M.A. (1994) Recognition of genes in human DNA sequences. Technical Report CSE-94-059, The Pennsylvania State University.
Gelfand M.S. (1994) Prediction of function in DNA sequence analysis. Technical Report CSE-94-060, The Pennsylvania State University.
Pevzner P.A., Waterman M.S. (1994) Multiple filtration and approximate pattern matching. Algorithmica (to appear).
Rubinov A.R., Gelfand M.S., Pevzner P.A. (1994) Exon assembly and fast database search (in preparation).

Send the url of this page to a friend


Last modified: Wednesday, October 29, 2003

Home * Contacts * Disclaimer

Base URL: www.ornl.gov/hgmis

Office of Science Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program