Archive Edition
Sponsored by the U.S. Department of Energy Human Genome Program
Santa Fe, New Mexico, November 13-17, 1994
Introduction to the Workshop
The electronic form of this document may be cited in the following style: Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
The Estimation, Encoding, and Robust Utilization of Uncertainty Information in DNA Sequences

John R. Hartman [1], Dianne M. Marsh [1], and Jeffrey S. Chamberlain [2]

[1] Computational Biosciences, Inc.; P.O. Box 2090; Ann Arbor, MI 48106-2090
[2] Department of Human Genetics, The University of Michigan Medical School; Ann Arbor, MI 48109-0618

Among the stated goals of the Human Genome Project are dramatic improvements in DNA sequencing technologies and corresponding reductions in cost per finished base. As these goals are realized, genome sequencing is becoming a more automated, production-oriented activity, and similar economies will be demanded in the process of assuring and documenting the quality of the data produced. In an environment where a thorough, expert manual validation of new sequence data may often be prohibitively expensive, consumers of sequence data would benefit greatly if the quality of base calls were provided in databases along with the calls themselves. For this to be practical, uncertainty information must be both generated and utilized in an automatic and unobtrusive manner. In the current research, (a) algorithms and heuristics for the estimation of base identity probability vectors (IPVs) from sequencing gel lane traces are being implemented and evaluated, (b) alternative schemes for the compact storage of this information are being explored, and (c) contig assembly software will be prototyped that utilizes such information at the input and generates statistically consistent representations of finished contigs at the output. To these ends, longitudinal data from a sequencing project at the University of Michigan Human Genome Center, ranging from raw ABI 373A trace files through corrected contig assemblies, will be used to construct a quantitative model from which IPVs can be estimated automatically.
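As an illustration of the idea, and not the authors' implementation, the sketch below assumes that an IPV is a four-element probability vector over A, C, G, and T for a single sequence position. The function names, the one-byte-per-base encoding, the probability floor, and the independence assumption used to combine overlapping reads are all hypothetical choices made for this example; the abstract does not specify any of them.

```python
BASES = "ACGT"  # assumed ordering of IPV elements


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def encode_ipv(ipv, levels=255):
    """Quantize an IPV to one byte per base (hypothetical compact encoding)."""
    return bytes(round(p * levels) for p in ipv)


def decode_ipv(packed, levels=255):
    """Recover an approximate IPV from its byte encoding."""
    return normalize([b / levels for b in packed])


def combine_ipvs(ipvs, floor=1e-6):
    """Combine IPVs for one position from several reads.

    Assumes the reads are independent and the prior over bases is uniform,
    so the combined vector is the renormalized element-wise product.
    The floor keeps a single zero entry from vetoing a base outright.
    """
    combined = [1.0] * len(BASES)
    for ipv in ipvs:
        combined = [c * max(p, floor) for c, p in zip(combined, ipv)]
    return normalize(combined)


def call_base(ipv):
    """Return the most probable base and its estimated error probability."""
    best = max(range(len(BASES)), key=lambda i: ipv[i])
    return BASES[best], 1.0 - ipv[best]


# Two overlapping reads that both favor A at the same position.
read1 = [0.70, 0.10, 0.10, 0.10]
read2 = [0.60, 0.20, 0.10, 0.10]

consensus = combine_ipvs([read1, read2])
base, error = call_base(consensus)
roundtrip = decode_ipv(encode_ipv(read1))
```

In this sketch, two reads that agree produce a consensus with a lower estimated error than either read alone, which is the kind of statistically consistent output the contig assembly step is meant to generate. The byte encoding trades a small loss of precision for a fixed four bytes per position.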
The success of this work will promote improvements in the robustness and reliability of sequence data while reducing its cost through longer usable fragment reads and greater validation efficiency. The results will be used to extend X/Gene, a comprehensive sequence analysis package supporting distributed processing on Unix networks. Thus enhanced, X/Gene will include facilities for automatically estimating, storing, disseminating, and robustly utilizing uncertainty information in a broad range of sequence analysis applications.

This work is supported by the U.S. Department of Energy under SBIR (Phase I) Grant DE-FG02-94ER81729.

1. Churchill, G.A. and Waterman, M.S. (1992). The Accuracy of DNA Sequences: Estimating Sequence Quality. Genomics 14, 89-98.
Last modified: Wednesday, October 29, 2003
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program