NIH Launches dbGaP, a Database of Genome Wide Association Studies
The National Library of Medicine (NLM), part of the National Institutes
of Health (NIH), announces the introduction of dbGaP, a new database
designed to archive and distribute data from genome wide association
(GWA) studies. GWA studies explore the association between specific
genes (genotype information) and observable traits, such as blood
pressure and weight, or the presence or absence of a disease or
condition (phenotype information). Connecting phenotype and genotype
data provides information about the genes that may be involved
in a disease process or condition, which can be critical for better
understanding the disease and for developing new diagnostic methods
and treatments.
dbGaP, the database of Genotype and Phenotype, will for the first
time provide a central location for interested parties to see all
study documentation and to view summaries of the measured variables
in an organized and searchable web format. The database will also
provide pre-computed analyses of the level of statistical association
between genes and selected phenotypes. Genotype data are obtained
by using high-throughput genotyping arrays to test subjects’ DNA
for single nucleotide polymorphisms (SNPs), areas of the genome
that have been found to vary among humans.
The database was developed and will be managed by the National
Center for Biotechnology Information (www.ncbi.nlm.nih.gov),
a division of NLM. dbGaP is located at the website http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap.
The initial release of dbGaP contains data on two studies: the
Age-Related Eye Diseases Study (AREDS), a 600-subject, multicenter,
case-controlled, prospective study of the clinical course of age-related
macular degeneration and age-related cataracts that was supported
by the National Eye Institute (www.nei.nih.gov);
and the National Institute of Neurological Disorders and Stroke
(www.ninds.nih.gov) Parkinsonism
Study, a case-controlled study that gathered DNA, cell line samples
and detailed phenotypic data on 2,573 subjects. NEI and NINDS worked
closely with NCBI in placing data from the two studies in dbGaP.
"The availability of AREDS data in this database, which can be
accessed free of charge, signals a whole new way of conducting
vision research," said Paul Sieving, M.D., Ph.D., director of NEI. "Having
this information widely available will help researchers better
understand gene-based eye diseases, will likely speed development
of effective therapies, and, thereby, will prove to be a worthwhile
investment for the taxpayers who funded this important medical
research."
Danilo Tagle, Ph.D., a program director for NINDS's neurogenetics
program, commented: "The launch of dbGaP addresses the critical
need for sharing of genotype and phenotype information coming from
genome wide association studies. The large collection of DNA samples
and well-described clinical information from these studies, and
subsequent genotyping analyses, are strategic investments by the
institute that will surely pay huge returns. They will continue
to pay dividends as other groups access dbGaP to do meta-analyses
of GWA datasets."
In order to protect research participant privacy, all studies
in dbGaP will have two levels of access: open and controlled. The
open-access data, which can be browsed online or downloaded from
dbGaP without prior permission or authorization, generally will
include all the study documents, such as the protocol and questionnaires,
as well as summary data for each measured phenotype variable and
for genotype results. Preauthorization will be required to gain
access to the phenotype and genotype results for each individual;
these individual-level data will be coded so as to protect the identity
of study participants. The AREDS and NINDS individual-level data
are expected to be available in several weeks, when the dbGaP authorization
system is put in place.
For AREDS and the NINDS Parkinsonism Study, pre-computed analyses
of the associations between phenotypic variables and genotypes
will be provided in the unrestricted part of the database. The
policy on providing access to pre-computed associations will be
determined on a study-by-study basis by the NIH institute overseeing
each study wishing to be included in dbGaP. In some cases, the
pre-computed association analyses may be provided only in the controlled-access
portion of the database, or they may be held in the controlled-access
portion for the duration of a publication embargo and then moved
to the open-access section.
"The dbGaP project marks a new milestone in data sharing," said
NLM Director Donald A. B. Lindberg, M.D. "Researchers, students
and the public will have access to a level of study detail that
was not previously available and to genotype-phenotype associations
that should provide a wealth of hypothesis-generating leads," he
said. "These data will be linked to related literature in PubMed
and molecular data in other NCBI databases, thereby enhancing the
research process."
NCBI expects to add database enhancements and a number of additional
studies over the coming year. GWA studies that will be added encompass
a broad range of disease areas and study models. The studies focus
on heart disease, women’s health, neurological disorders,
neuropsychiatric disorders, diabetes, and environmental factors
in disease. The Framingham SHARe Study, for instance, will provide
data from the landmark Framingham Heart Study, which is funded
by the National Heart, Lung, and Blood Institute (http://www.nhlbi.nih.gov/).
Blood samples from approximately 7,000 of the study subjects are
being genotyped, and those data will be linked to the numerous types
of phenotype data collected in the study.
Data from the Genetic Association Information Network (GAIN),
a public-private partnership, also will be added to dbGaP. The
project is being led by the Foundation for NIH (FNIH), with participation
and/or funding from Pfizer, Affymetrix, Perlegen Sciences, Abbott,
the Broad Institute of MIT and Harvard, and NIH. Private donors
have contributed $26 million to help fund GAIN, which provides
for genotyping DNA samples from participants in clinical studies
that have already been conducted. In October, FNIH selected an initial
group of six studies to fund.
Development of dbGaP involved the participation of many NIH institutes.
The effort was led by NCBI's Information Engineering Branch, headed
by Branch Chief James Ostell, Ph.D. "dbGaP links together the fruits
of the world's recent investment in sequencing the human genome
with our decades-long investment in clinical research," Dr. Ostell
said. "Correlating that information and making it widely available
is a key step in providing researchers with the data they need
to understand disease and attempt to develop cures. The potential
scientific advances from this information represent a payoff not
only for the taxpayers who financed genomic and clinical research,
but for the patients who participated in clinical trials in hopes
of furthering public health."
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research
in computational biology, develops software tools for analyzing
molecular and genomic data, and disseminates biomedical information, all
for the better understanding of processes affecting human health
and disease. NCBI is a division of the National Library of Medicine
(http://www.nlm.nih.gov/)
at the NIH.
The National Institutes of Health (NIH) — The Nation's
Medical Research Agency — includes 27 Institutes and
Centers and is a component of the U.S. Department of Health and
Human Services. It is the primary federal agency for conducting
and supporting basic, clinical and translational medical research,
and it investigates the causes, treatments, and cures for both
common and rare diseases. For more information about NIH and
its programs, visit www.nih.gov.