DOE Genomes
Human Genome News Archive Edition

Vol.9, No.3   July 1998

DOE, NIH Discuss Informatics Goals

(See more detailed, personal notes on the meeting by DOE staff member Daniel Drell.)

Since the beginning of the Human Genome Project, informatics has been widely regarded as one of the project's most important elements. The vast quantity and wide variety of generated information dictate the use of computational tools for data collection, management, storage, organization, access, and analyses.

On April 2-3, the DOE and NIH human genome programs convened a workshop in Herndon, Virginia, to identify informatics needs and goals for the next 5 years. Attending were 46 invited informatics and genomics experts and 17 agency staff from DOE, the NIH National Human Genome Research Institute (NHGRI), the NIH National Institute of General Medical Sciences, and the National Science Foundation (NSF).

Both DOE and NHGRI support the philosophy that the needs of data users are foremost and must drive the goals of genome informatics. At the meeting, the wide-ranging viewpoints of large sequencing centers, smaller specialized groups, biotechnology industry users, researchers exploring comparative and functional genomics, and medical geneticists were presented (see medicine and genome data).

Not all uses for these data can be anticipated today, so structural flexibility must be built into current and planned databases that support the genome project. Additionally, because knowledge will grow over time, curating the data (correcting them and adding new functional and useful links, or annotation) must be done on a continuous basis.

Meeting attendees identified priorities and made suggestions and policy recommendations on these and other issues.

Priorities and Issues
Major priorities identified by the group included the development of a reference genome map and sequence database and databases of individual variation and functional expression. Sequence data should be continuous and annotated, linked to maps, and structured to allow all conceivable data-supported queries. Data should be updated and curated by editors. The variation database should be organized according to population and individual genotype and haplotype; it should include or link to information on individual phenotypic variation. Functional expression databases should include such pathway and regulatory data as in the databases WIT, KEGG, and EcoCyc.
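One way to picture the variation database described above is as a store of per-individual records grouped by population. The following is only a minimal sketch, not a design proposed at the workshop: the names Individual, VariationStore, allele_counts, and phenotype_ref, as well as the marker and population labels, are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Individual:
    """Hypothetical record for one individual in a variation database."""
    individual_id: str
    population: str
    # marker name -> the two observed alleles (the genotype at that marker)
    genotypes: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # each haplotype is an ordered tuple of alleles along one chromosome
    haplotypes: List[Tuple[str, ...]] = field(default_factory=list)
    # optional link to phenotype information held elsewhere
    phenotype_ref: Optional[str] = None


class VariationStore:
    """Groups individuals by population so queries can be scoped to one."""

    def __init__(self) -> None:
        self._by_population = defaultdict(list)

    def add(self, person: Individual) -> None:
        self._by_population[person.population].append(person)

    def allele_counts(self, population: str, marker: str) -> Counter:
        """Count alleles at one marker across one population."""
        counts: Counter = Counter()
        for person in self._by_population.get(population, []):
            counts.update(person.genotypes.get(marker, ()))
        return counts
```

Keeping phenotype information behind a reference rather than inside the record mirrors the "include or link to" wording above; a real system would need to decide which of the two applies.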

Standardization. Current data are highly heterogeneous in format, organization, quality, and content. This is not surprising, given the wide diversity of genome-research investigators who are generating the data. An identified priority is to comprehensively capture raw, summary, or processed data in standard, well-structured formats using controlled vocabularies. Additionally, databases must be integrated and linked.
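As a rough illustration of what capturing data "in standard, well-structured formats using controlled vocabularies" might involve, the sketch below checks a submitted record against a small set of allowed terms. The field names, the "molecule" vocabulary, and the validate_record function are assumptions made for this example; only the raw/summary/processed distinction comes from the text.

```python
# Hypothetical controlled vocabularies: field name -> allowed terms.
CONTROLLED_VOCABULARY = {
    "data_level": {"raw", "summary", "processed"},
    "molecule": {"DNA", "RNA", "protein"},  # assumed for illustration
}

# Hypothetical set of fields every submitted record must carry.
REQUIRED_FIELDS = ("record_id", "data_level", "molecule")


def validate_record(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
    for name, allowed in CONTROLLED_VOCABULARY.items():
        if name in record and record[name] not in allowed:
            problems.append(
                f"{name}={record[name]!r} not in controlled vocabulary"
            )
    return problems
```

Returning a list of problems instead of raising on the first one lets a submitter fix a record in a single pass.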

Intelligent consensus standards should be defined and implemented by academia, government, and industry working together. Today, industry standards are very distinct from the few that exist in the genome project. The Object Management Group, now composed largely of industry representatives, also should involve personnel from academia and government. Explicit object definitions and access methods are needed desperately. Component-oriented software standards would promote systems integration, interoperability, flexibility, and responsiveness to change (adaptability). A balance is needed, however, between maintaining standards and allowing change and flexibility.
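The call for explicit object definitions and access methods can be sketched as a shared interface that independent databases implement. This is an assumption-laden illustration, not an Object Management Group specification: SequenceSource, InMemorySource, first_hit, and the accession strings are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Mapping, Optional


class SequenceSource(ABC):
    """Hypothetical access interface that any sequence database could implement."""

    @abstractmethod
    def fetch(self, accession: str) -> Optional[str]:
        """Return the sequence for an accession, or None if it is unknown."""


class InMemorySource(SequenceSource):
    """Toy component standing in for a real database."""

    def __init__(self, records: Mapping[str, str]) -> None:
        self._records = dict(records)

    def fetch(self, accession: str) -> Optional[str]:
        return self._records.get(accession)


def first_hit(sources: Iterable[SequenceSource], accession: str) -> Optional[str]:
    """Query each component in turn through the shared interface."""
    for source in sources:
        sequence = source.fetch(accession)
        if sequence is not None:
            return sequence
    return None
```

Because callers depend only on the interface, a component can be replaced or added without changing the code that queries it, which is the kind of interoperability and adaptability the paragraph describes.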

Tools. Tools to relieve the data-finishing bottleneck in sequencing are critical; other tools are needed for production, research, access, annotation, data capture, functional genomics, and data mining. A Web site that collects and annotates these tools would be very useful.

Availability of Underlying Data, Especially for Individual Genotypes. Given the expense of phenotyping, the ability to see ABI traces and check on the possible association with a particular single-nucleotide polymorphism would be valuable. ABI traces are not necessary for the reference sequence because questionable regions can be resequenced.

Annotation. Automated annotation analyses should use clearly defined standard operating procedures, consistent application, and sufficient documentation for a more detailed understanding of particular chromosome regions. Automated annotation is a way to generate intelligent hypotheses about sequence functions and must be regarded critically, even as overall annotation improves with time. For this reason, human participation in the annotation process is still vitally important for getting the most out of genomic information.

Quality Checks. Attendees suggested regular checks of database quality. Users are frustrated by incorrect data and the unwillingness or inability of database providers to correct these mistakes. Official editors who curate information could resolve errors and improve data quality. Successful quality assessment at sequence centers serves as a model.

Training and Environment Issues. NSF science and technology centers are models for needed genome informatics centers. Three to five such centers were proposed to facilitate interactions among various disciplines and the training of students.

Policy Recommendations

  • Open competition should be used for most database and informatics needs.
  • No single database can be expected to do everything for everybody; users, however, should feel that they are interacting with only one entity. Data submission should be uniform.
  • Existing frameworks such as database schema and submission tools should be used where possible.
  • Model-organism databases should continue to be supported.
  • Raw data should be captured to the maximum extent possible before the information is irretrievable.
  • Investments should be made in optimizing and exporting software tools from genome centers.

[Daniel Drell (301/903-4742, daniel.drell@oer.doe.gov) and Lisa Brooks (301/496-7531, lisa_brooks@nih.gov)]


The electronic form of the newsletter may be cited in the following style:
Human Genome Program, U.S. Department of Energy, Human Genome News (v9n3).

Last modified: Wednesday, October 29, 2003

Base URL: www.ornl.gov/hgmis

Office of Science Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program