Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop VII
80. Genome Annotation Data Management and Data Administration: Developing Summary Results for User Navigation, Genome Research, Improved Data Processing, and Quality Metrics
Jay R. Snoddy, Miriam Land, Sheryl Martin, Morey Parang, Inna Volker, Denise Schmoyer, Manesh Shah, Sergey Petrov, Edward C. Uberbacher, and the Genome Annotation Consortium
Summary reports of the genome annotation data, and the underlying data management required to generate them, are being constructed. A goal is to create these reports from a robust, queryable, and scalable data management system (see abstract of Petrov et al.). Some of these summary reports will be available as online HTML documents. These summaries can help improve four primary areas: user navigation, genome research, data processing, and quality metrics.
Several general observations can be made now from the current snapshot of the data; details from a later snapshot will be presented at the Oakland workshop. There are 7 to 10 times more predicted gene models (counting both GRAIL-EXP and Genscan gene models) than gene models annotated in GenBank. The majority of the predicted GRAIL-EXP genes have one or more ESTs that are used in the gene modeling. A third to half of the gene models that predict putative protein sequences have a reasonable BLAST hit to known proteins in Swiss-Prot (BLAST with an E-value ≤ 1.0e-4). By this BLAST hit criterion, about 3 times more predicted genes appear to have good homolog candidates than there are annotated genes in the GenBank archival record.
By the time of the Oakland workshop, we hope to display several online summary reports that demonstrate the current state of genome annotation for genomes, chromosomes, contigs, and sequenced clones. These reports should provide users with the results of the different but integrated data-management and processing steps that we employ in genome annotation. We would be interested in suggestions for other reports or queries that others may find useful.
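As an illustration of the BLAST hit criterion quoted above, the following is a minimal sketch of how the fraction of gene models with a Swiss-Prot hit at E-value ≤ 1.0e-4 might be tallied for a summary report. It is not the consortium's pipeline. It assumes the BLAST results are available as 12-column tab-delimited text (query ID in the first column, E-value in the eleventh), and the function names `models_with_hits` and `fraction_with_hits` are hypothetical.

```python
from typing import Iterable, Set

# Threshold quoted in the abstract; everything else here is an assumption.
EVALUE_CUTOFF = 1.0e-4


def models_with_hits(tabular_lines: Iterable[str],
                     cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return IDs of query gene models with at least one hit at or below cutoff.

    Assumes 12-column tab-delimited BLAST output: column 1 is the query ID
    and column 11 is the E-value. Blank, comment, and short lines are skipped.
    """
    hits: Set[str] = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= cutoff:
            hits.add(query_id)
    return hits


def fraction_with_hits(all_model_ids: Iterable[str],
                       tabular_lines: Iterable[str],
                       cutoff: float = EVALUE_CUTOFF) -> float:
    """Fraction of the given gene models that have a qualifying BLAST hit."""
    all_ids = set(all_model_ids)
    if not all_ids:
        return 0.0
    return len(models_with_hits(tabular_lines, cutoff) & all_ids) / len(all_ids)
```

Called with the full list of predicted gene-model IDs and the lines of a BLAST result file, `fraction_with_hits` returns a single number that could feed a per-contig or per-chromosome summary row.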