Genome.gov - National Human Genome Research Institute, National Institutes of Health

5 U41 U41HG04269

A Data Coordinating Center for modENCODE

Principal Investigator: Lincoln Stein
COLD SPRING HARBOR LABORATORY
1 BUNGTOWN ROAD, PO BOX 100

Project Period: 05/04/2007 - 03/31/2011

Abstract (from grant application):

DESCRIPTION: The modENCODE project is a key sequel to the sequencing of the fly and worm genomes, and will have an enormous impact on our understanding of biological processes in all higher eukaryotes, including humans. To manage the diverse, large-scale datasets that modENCODE will produce, we propose to create a data coordinating center (DCC) that will track the data, integrate it with other information sources, and make it available to the research community in a timely and open fashion. This proposal brings together four groups with highly relevant backgrounds. The Micklem group, through its work on the InterMine system and the FlyMine database, has extensive experience in integrating diverse types of data into high-performance data mining systems. The Stein and Lewis groups bring to the project an intimate familiarity with the C. elegans and D. melanogaster genomes, their reagents, and their research communities, and through their work with the WormBase and FlyBase databases they are well positioned to liaise with those model organism databases (MODs). The Kent group is responsible for the DCC for the human ENCODE pilot project and has extensive practical knowledge of developing and managing projects of this sort. We will assemble a team of three data managers, stationed at CSHL and at Berkeley, with backgrounds in the bioinformatics of C. elegans and/or D. melanogaster. The managers will work with their contacts at the data provider sites to determine data file formats, milestones, and quality control procedures for their datasets. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO. Data providers will upload their datasets to a staging server, where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will QC the data before approving its transfer to the production database.
Data will be integrated in the production database using InterMine and released from there to the public on a monthly schedule. Researchers will be able to access the data through the GBrowse genome browser, through bulk downloads, and through complex queries and reports mediated by InterMine and the BioMart data warehousing system. All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD) project, human ENCODE, and other sources. Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database. In addition, during the last year of the project we will dedicate a significant part of one data manager's effort to transferring data from modENCODE into the MODs.




