GenBank Celebrates 25 Years of Service with Two-Day Conference
Leading Scientists Will Discuss the DNA Database at April 7-8 Meeting
For a quarter century, GenBank has helped advance scientific discovery
worldwide. Established by the National Institutes of Health (NIH)
in 1982, the database of nucleic acid sequences is one of the key
tools that scientists use to conduct biomedical and biological research.
Since its creation, GenBank has grown at an exponential rate, doubling
in size about every 18 months. In celebration of this vital resource
and its contribution to science over the last 25 years, the National
Center for Biotechnology Information, National Library of Medicine
(NLM), NIH, is holding a two-day conference on GenBank.
The conference will take place April 7-8, 2008, at the Natcher
Conference Center on the main NIH campus in Bethesda, Maryland.
For details on the meeting, see the conference Web site at http://www.tech-res.com/GenBank25.
The conference is open to the public and will also be available
via live and archived webcast; the April 7 proceedings can be viewed
at http://www.videocast.nih.gov/summary.asp?live=6670 and the April
8 proceedings at http://www.videocast.nih.gov/summary.asp?live=6671.
The conference will bring together a slate of world-renowned scientists
in molecular biology, genetics, bioinformatics and other areas
to discuss GenBank's applications, the discoveries it has enabled,
its history, and future directions. Speakers include Rich Roberts,
Ph.D., a Nobel Prize winner for his discoveries of split genes,
and currently Chief Scientific Officer at New England Biolabs;
Sydney Brenner, Ph.D., a Nobel Prize winner for his work on genetic
regulation of organ development and programmed cell death, and
currently a professor at the Salk Institute; Francis Collins, M.D.,
Ph.D., who led the Human Genome Project and is Director of NIH's
National Human Genome Research Institute; and Craig Venter, Ph.D.,
who led the private-sector effort to sequence the human genome
and is President of the J. Craig Venter Institute. More than a
dozen other eminent scientists will be speaking; the full list
of presenters can be viewed at the GenBank conference Web site.
"Each day, researchers across the world submit tens of thousands
of sequences to GenBank and collaborating databases in Europe and
Japan," said Donald A. B. Lindberg, M.D., Director of the National
Library of Medicine. "Because of these contributions, GenBank has
become an essential tool for molecular biology. The National Library
of Medicine is proud to partner with the research community in
making this valuable resource available."
Rich Roberts, Ph.D., Chief Scientific Officer at New England Biolabs,
commented, "GenBank has provided a foundation upon which much of
contemporary biology is now based. It is becoming almost impossible
to conceive of any serious biological study of a new organism that
does not begin with the determination of its DNA sequence, which
of course must be stored in GenBank." Roberts, one of the early
proponents of the database, added, "The availability of this wealth
of sequence information in a single repository is something we
could only dream about in 1979 at the Rockefeller Conference that
led to its creation and which we could not imagine being without
today."
GenBank History
When scientists first began sequencing proteins and DNA, it was
an expensive and time-consuming process, so researchers usually
limited their sequencing to the genes and proteins in which they
had a particular interest. A small number of groups began collecting
sequence data and sometimes made comparisons that led to serendipitous
discoveries, for example, that two proteins were evolutionarily
related.
By the late 1970s, a consensus was emerging about the need for an
international computer database of nucleic acid sequence data.
In particular, a 1979 workshop sponsored by the National Science
Foundation and held at Rockefeller University resulted in a call
for such a database and development of analysis tools. NIH held
a series of workshops over the following two years to define the project
and subsequently issued a request for proposals. In 1982, NIH awarded
a five-year contract for the nucleic acid sequence database to
the private firm of Bolt, Beranek and Newman with a subcontract
to Los Alamos National Laboratory, marking the official beginning
of GenBank.
A significant leap forward in analysis tools came shortly thereafter.
In early 1983, two NIH researchers (John Wilbur, M.D., Ph.D., and
David Lipman, M.D.) published an algorithm that could search data
banks for sequences similar to a query sequence in just 2 or 3 minutes.
This made routine sequence comparison practical for researchers and
markedly accelerated the science. Further advances in analysis tools
followed, such as the 1990 introduction of BLAST (Basic Local Alignment
Search Tool), which can search GenBank for similar sequences in mere seconds.
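The published search methods are far more sophisticated, but the basic idea behind word-based similarity search, of the kind the 1983 method and BLAST build on, can be illustrated with a toy example: index every short word (k-mer) in each database sequence, look up the words of a query, and count how many hits fall on the same diagonal (the same offset between query and record). The sketch below is a simplified illustration, not the Wilbur-Lipman algorithm or BLAST itself; the function names, the word length k = 4 and the sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, word) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(database, k):
    """Map each k-mer to the (record_id, offset) positions where it occurs."""
    index = defaultdict(list)
    for record_id, seq in database.items():
        for offset, word in kmers(seq, k):
            index[word].append((record_id, offset))
    return index


def search(query, index, k):
    """Rank records by their best count of shared k-mers on one diagonal."""
    # A diagonal is (record offset - query offset). Word hits on the same
    # diagonal line up without gaps, so many hits there suggest a similar region.
    diagonal_hits = defaultdict(int)
    for q_offset, word in kmers(query, k):
        for record_id, r_offset in index.get(word, []):
            diagonal_hits[(record_id, r_offset - q_offset)] += 1

    # Keep each record's best-scoring diagonal.
    best = {}
    for (record_id, _diagonal), count in diagonal_hits.items():
        best[record_id] = max(best.get(record_id, 0), count)

    # Highest score first; ties broken alphabetically by record id.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy records, not real GenBank entries.
    database = {
        "seqA": "ACGTACGTTAGCCGATAGGCTA",
        "seqB": "TTTTGGGGCCCCAAAATTTTGG",
        "seqC": "GGCCGATAGGCTAACGT",
    }
    k = 4
    index = build_index(database, k)
    print(search("TAGCCGATAGG", index, k))  # [('seqA', 8), ('seqC', 6)]
```

Here the query occurs intact in seqA, so all eight of its 4-letter words hit on one diagonal, while seqC shares a shorter stretch and seqB shares nothing. Real tools add refinements such as scoring matrices, gapped extension and statistical significance estimates; this sketch only ranks records by raw word hits.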
Shortly after GenBank was established, discussions began with
the European Molecular Biology Laboratory (EMBL), which had set up
its own data bank. Within a couple of years GenBank and EMBL were
collaborating, and by the mid-1980s, the DNA Data Bank of Japan
(DDBJ) had joined in. The three groups now exchange data daily under
what is known as the International Nucleotide Sequence Database
Collaboration (INSDC).
Growth of the databases was further stimulated by scientific journals,
which began requiring authors to obtain accession numbers from GenBank,
EMBL or DDBJ for articles that included sequences.
In 1987, NIH awarded a second five-year contract, this time to
the firm of IntelliGenetics with a subcontract to Los Alamos National
Laboratory. When the contract ended in 1992, GenBank was moved
to the National Center for Biotechnology Information (NCBI), a
division of NIH's National Library of Medicine that was established
in 1988 under the leadership of BLAST co-developer David Lipman.
Today, GenBank continues to be operated by NCBI, which has integrated
it with dozens of other biological databases (such as genome maps
and protein structures), as well as the scientific literature (via
its PubMed and PubMed Central databases) and tools for analysis.
Improvements in sequencing technologies and falling sequencing
costs are producing massive increases in the quantity of data,
in turn driving exponential growth in GenBank, which
currently contains data on about 110 million sequences and 200
billion base pairs.
"GenBank has been a critical research tool, enabling much of the
progress that has been made over the last two decades in understanding
biological function and genetics," said Lipman, Director of NCBI
and a speaker at the GenBank conference. "The value of the database
will only expand as it continues to grow, new computational tools
are introduced, and the data are further integrated with other
relevant data."
GenBank Basics
What does it contain? GenBank is a comprehensive database of publicly
available annotated nucleotide sequences for more than 240,000
named organisms. The sequences include messenger RNA segments with
coding regions, segments of genomic DNA with a single gene or multiple
genes, and entire genomes. The number of base pairs in GenBank
doubles about every 18 months; currently the database includes
approximately 110 million sequences and 200 billion base pairs.
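The 18-month doubling time quoted above can be written as a simple exponential growth model. This is an idealized back-of-the-envelope sketch that assumes the doubling time stayed constant at exactly 1.5 years, which GenBank's real growth only approximates; N_0 and T_d are symbols introduced here for the base-pair count at a chosen starting point and the doubling time, not reported figures.

```latex
N(t) = N_0 \cdot 2^{t/T_d}, \qquad T_d \approx 1.5\ \text{years}

\frac{N(25)}{N_0} = 2^{25/1.5} = 2^{16.\overline{6}} \approx 1.0 \times 10^{5}

\frac{N(t+1)}{N(t)} = 2^{1/1.5} = 2^{2/3} \approx 1.59
```

Under that idealization, 25 years of growth would multiply the number of base pairs roughly 100,000-fold, which corresponds to growth of about 59% per year.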
Where do the data come from? GenBank is an archive of primary
sequence data submitted by the researchers who do the sequencing,
mostly individual labs and large-scale sequencing projects. GenBank
exchanges data daily with its two partners in the International
Nucleotide Sequence Database Collaboration (INSDC): the European
Bioinformatics Institute (EBI), which maintains the EMBL nucleotide
sequence database, and the DNA Data Bank of Japan (DDBJ).
What is GenBank's relationship to the Human Genome Project? Initiated
in 1990, the Human Genome Project was a 13-year effort coordinated
by the U.S. Department of Energy and NIH that aimed, among other
things, to determine the sequence of the roughly 3 billion chemical base
pairs that make up human DNA. The sequence data were submitted
to GenBank as they were generated.
What sorts of discoveries have been made using GenBank? Analyses
of GenBank sequences are an indispensable and regular part of the
process of characterizing gene functions, which underlies many
important advances in science and medicine. GenBank has also proven
invaluable in identifying disease. In one example, from November 2005,
GenBank helped identify the first polio case in the U.S. since
1999. A state health laboratory in Minnesota had isolated an unknown
virus from a child in an Amish community who was thought to be
suffering from an intestinal virus. When the laboratory determined
the virus's genetic sequence and searched it against the sequences
in GenBank, it found not only that the virus was a poliovirus, but
that it specifically matched the strain used in the Sabin oral vaccine.
More recently, scientists investigating the die-off of honeybees
(colony collapse disorder) compared sequences from affected beehives
against GenBank and found a strong correlation with Israeli
acute paralysis virus.
Where can I learn more? A good place to start is the homepage
for GenBank, at http://www.ncbi.nlm.nih.gov/Genbank.
Established in 1988 as a national resource for molecular biology
information, NCBI creates public databases, conducts research in
computational biology, develops software tools for analyzing molecular
and genomic data, and disseminates biomedical information, all
for the better understanding of processes affecting human health
and disease. NCBI is a division of the National Library of Medicine
at the NIH. For more information, visit http://www.ncbi.nlm.nih.gov/.
The National Library of Medicine is the world's largest library
of the health sciences. It is located on the NIH campus in Bethesda,
Maryland. For more information, visit the Web site at http://www.nlm.nih.gov/.
The National Human Genome Research Institute (NHGRI) led the National
Institutes of Health's (NIH) contribution to the International
Human Genome Project, which had as its primary goal the sequencing
of the human genome. This project was successfully completed in
April 2003. NHGRI's mission has since expanded to encompass
a broad range of studies aimed at understanding the structure and
function of the human genome and its role in health and disease.
Additional information about NHGRI can be found at its Web site,
www.genome.gov.
The National Institutes of Health (NIH), the Nation's Medical
Research Agency, includes 27 Institutes and Centers and is a
component of the U.S. Department of Health and
Human Services. It is the primary federal agency for conducting
and supporting basic, clinical and translational medical research,
and it investigates the causes, treatments, and cures for both
common and rare diseases. For more information about NIH and
its programs, visit www.nih.gov.