Human Genome News Archive Edition
Sponsored by the U.S. Department of Energy Human Genome Program
Sequence-database challenges and features of the new GSDB schema were among topics addressed by Michael Cinkosky (NCGR), who observed that the GSDB staff's responsibility is to help the community keep the database complete, accurate, and up to date. (See New WWW Tools.) The new system will support direct client-server inserts and updates; entry versioning, which retains all versions of public entries; and third-party annotation, in which the core entry belongs to the original author. Entries are recast so that almost all data can link to other databases. Efforts are directed toward achieving "anonymous interoperability," with GSDB as one component in a biology-wide database federation.
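The idea of entry versioning, in which an update adds a new version rather than overwriting the old one, can be illustrated with a minimal sketch. This is not the GSDB schema or its client interface; the class name `VersionedStore`, its methods, and the accession string are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy in-memory store that retains every version of each entry.

    Hypothetical illustration only; not the GSDB schema or API.
    """

    def __init__(self) -> None:
        # Maps an accession to the ordered list of its versions (oldest first).
        self._history: Dict[str, List[str]] = {}

    def insert(self, accession: str, sequence: str) -> int:
        """Create a new entry and return its version number (always 1)."""
        if accession in self._history:
            raise KeyError(f"entry {accession!r} already exists")
        self._history[accession] = [sequence]
        return 1

    def update(self, accession: str, sequence: str) -> int:
        """Append a new version; earlier versions stay retrievable."""
        versions = self._history[accession]  # KeyError if entry is unknown
        versions.append(sequence)
        return len(versions)

    def get(self, accession: str, version: Optional[int] = None) -> str:
        """Return the requested version (1-based), or the latest if omitted."""
        versions = self._history[accession]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise IndexError(f"no version {version} for {accession!r}")
        return versions[version - 1]


store = VersionedStore()
store.insert("HYPOTHETICAL001", "ACGT")
store.update("HYPOTHETICAL001", "ACGA")
latest = store.get("HYPOTHETICAL001")
first = store.get("HYPOTHETICAL001", version=1)
```

In this sketch `latest` is the updated sequence while `first` still returns the original, which is the property the report attributes to entry versioning.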
GSDB staff are designing a sequence editor as an interactive tool that lets sequencing laboratories view and edit large regions having complex annotation; design goals include intuitive, online graphical editing. The editor will be freely distributable and capable of being integrated with other software. Anyone interested in participating in the design process can obtain the prototypes that serve as a basis for discussion (http://www.ncgr.org).
Chris Fields (NCGR) outlined the emerging informatics challenges guiding NCGR's strategic planning and offered some ideas on a productive new direction for genome informatics. A key technical challenge is the diversity of data applications, which will require the connection of genome data with information generated from various disciplines and maintained in different databases. Fields observed that the community will be forced to reduce data-maintenance costs by moving from centralized data banks and databases to interoperable data resources joined by the Internet.
An even greater challenge, Fields continued, will be the need for precise description of the same biological data at successive levels of complexity. He said computer science has developed conceptual tools, including the Virtual Machine, for describing the precise context in which data occurs. The bioinformatics community should consider using these tools to interrelate such data.
Fragment assembly of shotgun-sequencing data involves deciding which overlaps or melds to use in reconstructing the original strand. Eugene Myers (University of Arizona) described an algorithm that greatly simplifies the problem by first identifying all overlaps and melds that must occur in an optimal solution. For highly repetitive target DNA sequences, he further proposed use of a maximum-likelihood estimator based on the two-sided Kolmogorov-Smirnov statistic.
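The two-sided Kolmogorov-Smirnov statistic itself is standard: it is the largest absolute gap between an empirical cumulative distribution and a reference distribution. The sketch below computes it for a set of positions against a uniform reference. Applying it to fragment start positions along a target of known length is an assumption made here for illustration; the report does not say how Myers's estimator uses the statistic, and the function name `ks_two_sided_uniform` is hypothetical.

```python
from typing import Sequence


def ks_two_sided_uniform(starts: Sequence[float], length: float) -> float:
    """Two-sided KS statistic of `starts` against Uniform(0, length).

    Assumption for illustration: `starts` are fragment start positions
    and `length` is the length of the target sequence.
    """
    if not starts:
        raise ValueError("need at least one position")
    if length <= 0:
        raise ValueError("length must be positive")

    xs = sorted(starts)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Reference CDF of Uniform(0, length), clamped to [0, 1].
        f = min(max(x / length, 0.0), 1.0)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


evenly_spread = ks_two_sided_uniform([0.25, 0.5, 0.75], 1.0)
clustered = ks_two_sided_uniform([0.1, 0.1, 0.1], 1.0)
```

Positions spread evenly over the target give a smaller statistic than positions piled up at one location, so under the stated assumption a layout that collapses repeats onto one another would score as less consistent with uniform sampling.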
Uberbacher presented the new GRAIL 1A, which is designed to process large files of cDNAs, ESTs, or genomic fragments to predict coding regions, search databases, and produce a summary report. GRAIL-ET, a new technology that will be useful for analyzing very low-pass sequence, detects errors in coding sequences using a coding-recognition and dynamic-programming method. With a 1% indel error rate, the system found 94% of the exons (89% of the gene message after the model was made). GRAIL and genQuest can be accessed by e-mail server (GRAIL@ornl.gov and Q@ornl.gov, respectively), through graphical client tools obtained by ftp at arthur.epm.ornl.gov, or via Mosaic (http://compbio.ornl.gov/Grail-1.3/). GRAIL has been licensed to ApoCom, Inc., for use by researchers in proprietary pharmaceutical and biotechnology companies who cannot use the Internet because of data-security concerns. Uberbacher's group is also supporting mouse-human mapping research at ORNL and has constructed an ACEDB implementation containing mapping and phenotype data.
Participants look forward to the next DOE Contractor-Grantee Workshop, scheduled for January 28-February 1, 1996.
The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v6n5).
Last modified: Monday, December 15, 2003
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program