DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

The Estimation, Encoding, and Robust Utilization of Uncertainty Information in DNA Sequences

John R. Hartman [1], Dianne M. Marsh [1], and Jeffrey S. Chamberlain [2]
[1]Computational Biosciences, Inc.; P.O. Box 2090; Ann Arbor, MI 48106-2090
[2]Department of Human Genetics, The University of Michigan Medical School; Ann Arbor, MI 48109-0618

Among the stated goals of the Human Genome Project are dramatic improvements in DNA sequencing technologies and corresponding reductions in cost per finished base. As these goals are realized, genome sequencing is becoming a more automated, production-oriented activity, and similar economies will be demanded in the process of assuring and documenting the quality of the data produced. In an environment where a thorough, expert manual validation of new sequence data may often be prohibitively expensive, it would be a great benefit to consumers of sequence data if the quality of base calls were provided in databases with the calls themselves. For this to be practical, uncertainty information must be both generated and utilized in an automatic and unobtrusive manner.

In the current research, (a) algorithms and heuristics for estimating base identity probability vectors (IPVs) from sequencing gel lane traces are being implemented and evaluated, (b) alternative schemes for the compact storage of this information are being explored, and (c) contig assembly software will be prototyped that accepts such information as input and generates statistically consistent representations of finished contigs as output. To these ends, longitudinal data from a sequencing project at the University of Michigan Human Genome Center, ranging from raw ABI 373A trace files through corrected contig assemblies, will be used to construct a quantitative model from which IPVs can be estimated automatically. If successful, this work will promote improvements in the robustness and reliability of sequence data while reducing its cost through longer usable fragment reads and greater validation efficiency.
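As an illustration only, the three ideas above (a per-base IPV, a compact stored form, and a consensus that stays statistically consistent) might be sketched as follows. Nothing here is taken from the project itself: the four-byte encoding, the function names, and the assumption that overlapping reads are independent with a uniform prior over bases are all hypothetical choices made for the sketch.

```python
BASES = "ACGT"


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def encode_ipv(ipv):
    """Hypothetical compact form: one byte per base, four bytes per position."""
    return bytes(round(p * 255) for p in normalize(ipv))


def decode_ipv(packed):
    """Recover an approximate IPV from its four-byte form."""
    return normalize(list(packed))


def combine(ipvs, floor=1e-6):
    """Merge the IPVs of reads aligned at one position.

    Assumes the reads are independent and the prior over bases is uniform,
    so the combined vector is the normalized elementwise product.
    The floor keeps one over-confident zero from vetoing a base outright.
    """
    product = [1.0] * len(BASES)
    for ipv in ipvs:
        for i, p in enumerate(ipv):
            product[i] *= max(p, floor)
    return normalize(product)


def call_base(ipv):
    """Return the most probable base and the estimated probability of error."""
    best = max(range(len(BASES)), key=lambda i: ipv[i])
    return BASES[best], 1.0 - ipv[best]


# Two overlapping reads that both favor A at the same contig position.
read_1 = [0.7, 0.1, 0.1, 0.1]
read_2 = [0.6, 0.2, 0.1, 0.1]
consensus = combine([decode_ipv(encode_ipv(read_1)), decode_ipv(encode_ipv(read_2))])
base, error = call_base(consensus)
print(base, round(error, 3))
```

In this sketch two reads that agree yield a consensus more confident than either read alone, and the consensus is itself an IPV, so it can be stored and passed to later analysis in the same form as the input.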

The results of this work will be used to extend X/Gene, a comprehensive sequence analysis package supporting distributed processing on Unix networks. Thus enhanced, it will include facilities for automatically estimating, storing, disseminating, and robustly utilizing uncertainty information in a broad range of sequence analysis applications.

This work is supported by the U.S. Department of Energy under SBIR (Phase I) Grant DE-FG02-94ER81729.

Last modified: Wednesday, October 29, 2003
