Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM



83. A Visual Data-Flow Editor Capable of Integrating Data Analysis and Database Querying

Dong-Guk Shin (1), Ravi Nori (2), Rich Landers (2), and Wally Grajewski (2)

(1) Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-3155 and (2) CyberConnect EZ, LLC, Storrs, CT 06268

shin@engr.uconn.edu

Determining and mapping sequence variations or polymorphisms between homologous genomic regions requires access to genomic data from different sources and the use of many data analysis and visualization programs. Software is needed that lets genome scientists automate tedious, repetitive data handling, database querying, and analysis tasks. Our approach is to develop a data-flow editing environment in which genome scientists with minimal computer training can easily describe data analysis tasks. Using the tool, scientists organize and coordinate individual data retrieval tasks across different data sources and combine them with data analysis tasks to answer biologically significant questions.

Phase I aimed to develop prototype software that demonstrates the feasibility of full-scale development of a data-flow editing environment in which genome scientists with minimal computer training can freely describe interactions between data access and data analysis. The feasibility study is based on a working scenario: determining homology relationships between known DNA sequences from one species and unknown sequences from a taxonomically related species.
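The working scenario can be pictured as a small data-flow graph: two retrieval nodes feed a comparison node, whose output feeds a filter. The following is a minimal sketch only, not the prototype described in this abstract. The in-memory "databases", the node names, the k-mer Jaccard score (a crude stand-in for a real homology search), and the 0.5 threshold are all assumptions made for illustration. It uses the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory stand-ins for two remote sequence databases.
KNOWN_DB = {"geneA": "ATGGCGTACGTTAGC", "geneB": "ATGCCCTTTGGGAAA"}
UNKNOWN_DB = {"clone1": "ATGGCGTACGTAAGC", "clone2": "TTTTTTTTTTTTTTT"}


def kmers(seq, k=4):
    """Return the set of overlapping length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of k-mer sets: a crude proxy, not an alignment."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


# Each node maps to (function, names of upstream nodes supplying its arguments).
FLOW = {
    "known": (lambda: KNOWN_DB, []),
    "unknown": (lambda: UNKNOWN_DB, []),
    "compare": (
        lambda known, unknown: {
            (kid, uid): similarity(kseq, useq)
            for kid, kseq in known.items()
            for uid, useq in unknown.items()
        },
        ["known", "unknown"],
    ),
    "candidates": (
        # Assumed threshold of 0.5; keep only pairs scoring at or above it.
        lambda scores: sorted(pair for pair, s in scores.items() if s >= 0.5),
        ["compare"],
    ),
}


def run(flow):
    """Execute nodes in dependency order, passing upstream results downstream."""
    order = TopologicalSorter({name: deps for name, (_, deps) in flow.items()})
    results = {}
    for name in order.static_order():
        func, deps = flow[name]
        results[name] = func(*(results[d] for d in deps))
    return results


if __name__ == "__main__":
    print(run(FLOW)["candidates"])
```

With this toy data, only the geneA/clone1 pair passes the threshold. In a visual editor, the user would presumably draw the equivalent of FLOW rather than write it, and each node would wrap a real database query or analysis program.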

Software of this kind is expected to be immediately usable in molecular biology and the pharmaceutical industry, both of which are becoming more computationally intensive. Because data-flow management problems are not unique to computational biology, the software is also expected to be useful in many other data- and computation-intensive areas, e.g., physics, chemistry, engineering, and finance.

The proposed software will enable scientists to automate repetitive analysis tasks involving the enormous amount of DNA sequence data that must be analyzed to understand its implications for biological and environmental processes. Without such a tool, the difficulties of conducting these large-scale data analysis projects could be insurmountable, given the magnitude of data available and the variety of analysis techniques involved.

This work was supported in part by the DOE SBIR Phase I Grant No. DE-FG02-99ER82773.

 

 


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.