February 16-17, 1999 Summary
Prepared by Raju Kucherlapati and David Valle, Co-Chairs
The
Human Genome Project has stimulated increasing interest in genome biology for
a number of model organisms as the utility of genomic technologies and resources,
such as cDNA and genomic sequences, is rapidly being realized. Several groups
have advocated that the NIH support the generation of genomic and genetic resources
for a number of non-mammalian model organisms. In 1997, the National Cancer Institute
(NCI) convened a small panel to discuss the use of non-mammalian model organisms
to facilitate the study of human cancer. Among this panel's recommendations was
the development of the infrastructure (genetic and genomic resources and technologies)
needed to facilitate basic research in those model organisms important for cancer
research. This panel outlined
a series of specific recommendations for five model organisms: S. cerevisiae
(yeast); C. elegans (roundworm); D. melanogaster (fruit fly);
D. rerio (zebrafish) and X. laevis (Xenopus). The list of recommendations
can be found at: nci_nmm_report.html. Similarly,
in 1997 the zebrafish community presented the NIH with a list of genomic resources
needed for their research efforts. Because of the costs of such large-scale projects
and the shared interest of many Institutes and Centers in supporting the development
of these genomic resources, the NIH is facing the challenge of providing resources
for studying non-mammalian model organisms. To
address this challenge, the NIH convened a workshop for the purpose of evaluating
the current status of genomic resource development for the non-mammalian model
organisms already undergoing genomic analysis, identifying additional resource
needs for these organisms and considering what additional model organisms might
be suitable for similar development. Approximately 80 scientists, together with
an equal number of staff from the NIH, as well as representatives from several
other governmental agencies, including the National Science Foundation, the Department
of Energy and the U.S. Department
of Agriculture, met on February 16-17, 1999, on the NIH campus in Bethesda, MD.
This workshop was designed to have a broad group of scientists provide input on
this subject, and it was recognized that these discussions likely would be followed
by more focused dialogs. One of the desired outcomes of the workshop was the generation
of ideas as to how both the NIH and the relevant research communities could move
forward in this area in the future. The
workshop focused primarily on establishing priority needs for the five model organisms
identified by the NCI panel. Approximately 8-10 investigators from each of these
communities were present at the workshop. Scientists working on a diverse set
of other organisms were also in attendance. There were five breakout session groups,
each of which focused its discussion predominantly on the resource needs of one
of the five major organisms, and a prioritized list of recommendations was developed.
A summary of these recommendations is presented at the end of this Executive Summary;
individual reports are presented in the sections "Breakout
Group Reports" and "Recommendations
for Additional Selected Model Organisms". Presentations were also made
on nine additional non-mammalian model organisms. These organisms were chosen
primarily because they are the subject of a significant level of NIH-supported
research. Scientists representing the additional models made recommendations about
priority needs for those other organisms and participated in a general discussion
of the value of model organisms beyond those that are already well studied. For all
of the organisms that were discussed, a table featuring the major characteristics
was compiled by participants and can be found in the section, "Tables
Summarizing the Features of Selected Non-Mammalian Model Organisms".
The workshop was very successful
in enhancing communication across many lines. Representatives from each of the
five communities, as well as several of the others, put a great deal of effort
into canvassing their communities prior to the workshop to develop a consensus
on research needs. This was especially striking for the Xenopus and G. gallus
(chicken) researchers who had not previously considered their resource needs
as a community. Remarkably, approximately 100 researchers representing the chicken
community submitted a proposal for a chicken genome project at the time of the
workshop. At another level, there was considerable interaction between the different
communities. Those groups representing
organisms for which there is already a large amount of experience with structural
and functional genomics, especially S. cerevisiae and C. elegans,
conveyed lessons that they had learned. In addition, representatives from the
zebrafish community, which had recently come together as a cohesive group to work
on the generation of genetic and genomic resources, met with Xenopus investigators
to discuss the lessons learned from launching the zebrafish genome project. Lastly,
several of the groups are planning follow-up meetings that will focus specifically
on the genomic resource needs for their model organism.
MAJOR RECOMMENDATIONS
A number of common themes and issues were identified:
Sequencing
Genomic Sequences for Primary
Model Organisms. The
different breakout groups considered genomic sequencing, progress and needs. The
genomic sequence of the yeast S. cerevisiae has been completed. More than
99% of the C. elegans sequence is complete, and the group recommended
closing the remaining gaps in the sequence of this organism as a top priority.
Completion of the sequence of D. melanogaster may occur within the next
year, depending on the success of the collaboration between researchers at the
University of California, Berkeley and Celera Genomics. In
any event, the sequence should be completed by 2001. The group recommended that
a project to sequence the zebrafish genome be initiated with the goal of completing
this sequence by the end of 2008. The Xenopus community did not consider genomic
sequencing a high priority at this time.
Comparative Genomic Sequences. Comparative
sequencing of genomes has proved to be a good predictor of gene structure and
functionally important transcriptional regulatory regions. Identification of conserved
regulatory regions may make it possible to assemble regulatory cascades by searching
whole genome sequence for conserved transcription factor binding sites. The limited
data from C. elegans and C. briggsae have shown the power of this
approach. The complete sequence
of C. briggsae and ultimately another more distantly related nematode,
perhaps a parasite, would provide powerful tools for biologists. Since the genomic
sequence of D. melanogaster will possibly be completed within a year, limited
sequencing of a related species (e.g., D. virilis) would provide valuable
information about functionally important sequences.
EST Sequences. Expressed sequence tags (ESTs), short sequences of cDNA clones,
have proved extremely useful for a variety of research applications. For example,
human ESTs have been extraordinarily useful for identification of human genes
based on homology to genes identified in model organisms. Similarly, ESTs have
been useful for gene identification in genomic sequence in a region of interest
in positional cloning efforts or in regions surrounding an insert in insertional
mutagenesis studies. For Xenopus and zebrafish, assembly of a large set of EST
sequences was considered a higher priority than genomic sequencing.
High cost is a general concern relevant to all large-scale sequencing efforts. There
are few laboratories where the cost and efficiency of sequencing are such that
the above recommendations can be implemented immediately. Thus, efforts should
be made to improve sequencing technology, reduce the cost and increase the opportunity
for more groups to become efficient in large-scale sequencing.
Full-length cDNA Clones and Sequences
All the breakout groups felt that availability of a fully representational, complete
set of sequenced full-length cDNA clones would be an important resource. Such
a unigene set of full-length cDNAs would be useful for confirming the expression
of predicted genes and determining patterns of alternative pre-mRNA splicing;
monitoring changes in genome-wide patterns of transcription using, for example,
microarrays; systematic RNA-mediated interference (RNAi); two-hybrid analysis;
and in vitro synthesis of protein products to be used for functional biochemical
experiments. Therefore, efforts
to generate such sets of clones and sequences should be given a top priority.
While the availability of full-length cDNAs holds great promise for many types
of experiments, the technology for systematically and efficiently isolating full-length
cDNAs needs further development. Support for improving this critical technology
should be continued.
cDNA Microarrays
The
availability of the complete genome sequence and, in particular, identification
of all transcription units, is revolutionizing the study of yeast biology. Similar
consequences are expected for the study of other model organisms as their sequence
becomes known. The development of microarray technology is playing a central role
in this revolution, greatly facilitating and expanding functional analysis of
genes and genomes. Currently, however, microarray technology is not widely available
because it is not easily transferable, requires a high initial investment, and
methods to quantify and interpret the results are just beginning to be developed.
To make this technology as robust
and broadly available as possible, it was recommended that NIH provide additional
resources to enhance the dissemination of the technology, especially to academic
researchers; to generate analytic tools to interpret the results; and to create
sets of standard controls to allow comparison of results between experiments
and laboratories.
Genome-wide Gene Knock-outs
The
status of the technology to obtain genetic inactivation or modification of genes
differs widely for each of the organisms. Efficient technologies to generate such
gene modification are still needed for zebrafish and C. elegans, for example.
A genome-wide effort for modification of genes in yeast is underway, and a smaller
scale project for D. melanogaster is in progress. Genome-wide
knockouts should be developed as a central resource that is then made readily
available to the community.
Databases
The
availability of easily accessible, up-to-date, public databases is essential for
storage, utilization and manipulation of the large amounts of genomic and genetic
data that are being generated. To promote accessibility and interaction between
model organism communities, it was considered of high importance that the databases
for each of the model organisms have similar formats. Some of the features considered
important for all databases included: effective links to the databases of other
organisms; curated pathways (e.g., for metabolic and signal transduction pathways); curated
and cross-referenced expression array data and methods for sorting existing array
data; a phenotype-based search engine; image data for protein localization; and
expansion to include new features, such as polymorphism data and unpublished information
on mutant phenotypes. For all five organisms, databases at varying degrees of
development are available, but the need for them to be significantly enhanced
was recognized. Therefore, each group recommended an increase in support for the
databases. Similarly, support to develop public databases for other models will
be critical.
Centralized Resources and Their Distribution
To
enable research on model organisms, several vital resources were identified. One
of these is a stock center for each organism. Currently, individual laboratories
are unable to store and distribute all the mutants they identify due to lack of
space and funds for maintenance. Existing stock centers are also facing the same
problem. The anticipated increase in the number of mutants that will be generated
will require that the capacity for storing stocks and the funds for maintenance
of these stocks be increased. Another resource that will be useful is a set of
commonly used vectors, as is access to genomic and cDNA clones and libraries.
It is necessary to provide adequate funds for individual research laboratories
and large centers to
store and distribute these key
molecular reagents.
Cost Estimates
The participants
made approximate yearly total cost estimates to implement the recommendations.
These numbers were estimated at the time of the meeting and may not reflect the
actual costs of these resources accurately. These cost estimates can be found
in the breakout session reports.
GENERAL DISCUSSION OF THE VALUE OF MODEL ORGANISMS
Beyond
the five main models that were considered by the breakout groups, the other major
focus of the workshop was consideration of additional model organisms. What follows
is a summary of this discussion:
Model
organisms serve biomedical research in several ways. First, they exemplify intrinsically
interesting biology. Investigators interested in a particular biological question
utilize an appropriate model organism as their experimental system. Although medical
concerns may not have figured into the formulation of the question, the answer
sometimes has great medical relevance. Examples of this serendipitous process
include the discovery of the role of mismatch repair genes in familial cancer
syndromes based on work in E. coli and S. cerevisiae; the elucidation
of apoptosis as a common mechanism in neurodegenerative disease based on work
in C. elegans, and the realization of the importance of hedgehog signaling
in human developmental defects first worked out in D. melanogaster and
in D. rerio. Second, investigators interested in studying a particular
human problem may find that it is easier to approach using a model system. The
recent explosion in our knowledge of the genes involved in genetic disorders of
peroxisome biogenesis and function, based on their initial identification in yeast,
is a good example. Third, model organisms serve as models for models. Currently,
the genome-wide approaches to functional genomics being developed in S. cerevisiae
serve as a model for investigators working in C. elegans, D. melanogaster
and other model systems.
For
the purpose of this meeting, five model organisms were designated as "major" on
the basis of their phylogeny, their experimental history, the size of their investigator
community and the magnitude of their contribution to the sum of our biomedical
knowledge. But this handful of organisms does not begin to encompass the biological
diversity and experimental advantages of the millions of species comprising the
35 phyla of extant animal life. Thus, the organizers of the meeting felt it was
important to consider additional model organisms in terms of the biological properties
they best exemplify, their experimental utility and their value in providing a
more complete sampling of phylogenetic and biologic diversity. Information was
assembled and briefly presented on nine of these (summarized below).
Chlamydomonas (C. reinhardtii). A unicellular organism with prominent chloroplasts,
flagellum and basal bodies, Chlamydomonas has a 100 Mb genome and typically exists
as a haploid organism although it is possible to construct diploids. Flagella
and the closely related cilia are vital for many human cells and tissues including
ciliated epithelia and sperm. About 10% of the Chlamydomonas genome is estimated
to encode proteins necessary for flagellar structure and function; 34 of these
have already been cloned. There is a well-developed investigator community and
a stock center. EST sequences from organisms at certain stages of the cell cycle
and a physical map with a BAC contig are top priorities of the Chlamydomonas community.
Tetrahymena (T. thermophila). A ciliated unicellular organism with interesting
nuclear dimorphism: a transcriptionally inactive diploid micronucleus and a transcriptionally
active, ~200 Mb macronucleus with ~250 chromosomes each ~1 Mb in size. Tetrahymena
undergoes homologous recombination allowing facile gene disruption or replacement
and research in Tetrahymena has led to the identification of self-splicing introns,
telomere structure and identification of telomerase and telomerase RNA. Top priorities
of the Tetrahymena community include support for a pilot project to explore direct
shotgun sequencing of ~10% of the macronuclear genome. This would provide insight
into genome organization and identify a set of genes to manipulate and characterize
as models of human counterparts. This pilot would also speed the development of
the technology for construction of high-resolution maps, cloning by complementation,
insertional mutagenesis and the development of highly engineered strains. Additionally,
funds for an annual course to train biologists interested in using Tetrahymena
would enhance its value as a model.
Dictyostelium (D. discoideum). A free-living amoeba that undergoes aggregation and
differentiation into a simple multi-cellular organism, Dictyostelium is a powerful
model for the molecular genetics of phagocytosis, cytokinesis, cell/cell interactions
and signal transduction pathways. A Dictyostelium genome project is underway with
~20% of the 34 Mb genome completed and a collection of ~10,000 ESTs. Relatively
modest additional funds to support the finishing steps of the genome project and
to support enhancement of a central database and stock storage and distribution
center would greatly enhance the value of Dictyostelium as a model. Resources
to be developed in the future include cDNA arrays.
Fission yeast (S. pombe). Fission yeast is a simple unicellular eukaryote readily
amenable to genetic manipulation with stable haploid and diploid forms, homologous
recombination and thousands of mutants and hundreds of genes already in hand.
A genome project is underway with ~75% of the 14 Mb genome complete. S. pombe
has proved to be a valuable model for the elucidation of cell cycle regulation
and other vital cellular processes. The ancestors of S. cerevisiae and
S. pombe diverged ~500-100 Myr ago; this evolutionary separation makes
comparison of their genes a powerful tool for identification and analysis of human
genes. Additional funds to complete and annotate the genome sequence, develop
DNA arrays and genome wide mutagenesis would enhance the value of S. pombe
as a model. In particular the synergism afforded by adding S. pombe to
the list of models with completed genome sequences will be substantial in helping
to decipher gene identification and function in the human sequence.
Neurospora (N. crassa). Neurospora and the filamentous fungi, in general, have
been important model organisms for some time. Beadle and Tatum used Neurospora
to develop the one gene/one enzyme hypothesis and more recently it has been used
for studies of a wide variety of cellular processes many of which are possible
in yeast. Homologous recombination, high frequency transformation and more than
1,000 identified mutants enhance its usefulness as a genetic model. Genome analysis
is in progress with funds committed for sequencing ~30% of the genome and ongoing
EST projects that have identified about 40% of the estimated 13,000 genes. Funds
to continue these genomic studies plus to develop DNA arrays would increase the
usefulness of Neurospora as a model.
Aplysia (A. californica). A mollusc with simple, learned behaviors, Aplysia
provides a useful model for neuronal interactions, synaptic plasticity, physiology
and the study of memory. The cell/cell connections are easy to map and electrophysiology
is facilitated by the very large neuronal size. Genetic study of Aplysia is only
minimally developed with ~225 identified cDNAs. The accumulation of EST sequences,
support for a central database and assembly of cDNA arrays would greatly increase
the value of Aplysia as a model and would facilitate comparisons of genes important
for neuronal function and behavior in other models such as Drosophila and mouse.
Sea urchin (S. purpuratus). A deuterostome metazoan, sea urchin
has been a powerful model for gene regulation and early embryogenesis and as a
representative of a non-vertebrate metazoan phylum. A modest EST collection is
available and an urchin genome project is underway with private funding. Additional
support for these projects would provide much improved access to an important
segment of the biologic panoply.
Fugu (F. rubripes). The pufferfish, Fugu, is a model vertebrate genome
characterized by a relative lack of repetitive DNA, small introns and dense gene
packing so that there is ~1 gene/6-7 kb in a total genome of about 400 Mb. This
characteristic plus its position in phylogeny makes sequencing the Fugu genome
an efficient way to identify and characterize vertebrate genes. About 0.5 Mb of
Fugu genomic sequence has been determined and is 21% coding sequence with conservation
of synteny with mammalian genomes. Additional support for a Fugu genome project
would provide information useful for gene identification in the human genome.
Chicken (G. gallus). Chicken has been a productive model for experimental
embryology. A variety of methods have been developed to study and manipulate embryogenesis
in ovo. These have led to important contributions to our understanding of limb
development, neurogenesis, body axis development, somite formation and other aspects
of embryogenesis. These developmental studies and transfer of their results to
human systems would be greatly improved by the generation of genetic resources
including a robust chick EST database, a physical map of the chicken genome, support
for a chicken mutant repository and development of a web-based database. A proposal
for a chicken genome project was presented at the workshop.
Conclusions of model organism discussion. Each of these organisms has advantages for the
study of certain biological processes and, at least for some, significant genomic
resources are already being developed. Additionally, it was recognized that availability
of genomic information on multiple models has a synergistic value for research
in human genetics. For nearly all of these additional models, a relatively small
investment would greatly enhance their genomic resources and value as experimental
systems. Of particular interest was the development of EST sequences, genome sequencing
and databases to collate the information and make it accessible to the entire
community of biomedical scientists. Understanding our evolutionary history will
provide enormous insight into development, gene function and the role of genetic
variation in human disease. Additional consideration of how the current collection
of model organisms represents phylogeny should be made. The possible value of
an insect model in addition to Drosophila and of an ascidian model to represent
early chordates should be explored. Additional discussion will be required to
select specific models and prioritize resources.