![]() |
|
![]() |
![]() |
|
Archive Edition | |
Sponsored
by the U.S. Department of
Energy Human Genome Program
|
Santa Fe, New Mexico, November 13-17, 1994
Introduction to the Workshop
The electronic form of this document may be cited in the following style: Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected. |
Combinatorial Methods for Gene RecognitionMikhail Gelfand [1,2], Pavel Pevzner [2], Michael Roytberg [3][1]Institute of Protein Research, Russian Academy of Sciences [2]Department of Computer Science and Engineering, The Pennsylvania State University [3]Institute of Mathematical Biology, Russian Academy of Sciences One of the main problems in prediction of complete genes in large-scale DNA sequencing projects is the combinatorial explosion of the number of potential exon-intron structures. The standard dynamic programming cannot be applied here, because there are no a priori reasons to assume the linearity of the structure-scoring functional [Gelfand, 1994]. We have suggested an algorithm that constructs the Pareto-optimal set of exon structures which is guaranteed to contain the best structure for any scoring functional and contains no unnecessary structures [Gelfand and Roytberg, 1993]. The pilot version of the GREAT (Gene Recognition and Exon Assembly Tool) software, which uses the simplest site recognition and codon usage objective functions, has been tested on an independent set of genes and benchmarked against GRAIL-2. The results were comparable: GREAT demonstrated higher sensitivity but smaller specificity than GRAIL-2 [Gelfand et al., 1994]. From this perspective it might be useful to combine the strengths of both approaches in a hybrid system. Another issue we address is related to on-line database search for gene recognition. Existence of introns makes it necessary to use gapped matches, leading to new combinatorial problems [Pevzner and Waterman, 1994]. On the other hand, combination of exon assembly and similarity search approaches leads to the problem of finding the best local similarity between a set of candidate exon-intron structures and a sequence from the database. An efficient algorithm for this problem, related to approximate matching of regular expressions, has been constructed [Rubinov et al., 1994]. Further development will be directed towards combining of on-line database search and exon assembly, development of new coding potentials accounting to genome inhomogeneity, improvement of the structure scoring schemes, backtrack assignment of prediction certainty to individual exons, and creation of programs for analysis of large sequence fragments containing multiple genes. The work is supported by DOE under the grant DE-FG02-94ER61919. Gelfand M.S., Roytberg M.A. (1993) A dynamic programming algorithm for prediction of the exon-intron structure. BioSystems 30, 173-182.
|
Send the url of this page to a friend
Last modified: Wednesday, October 29, 2003
Home * Contacts * Disclaimer
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy
Office of Science, Office
of Biological and Environmental Research, Human
Genome Program