Proposal to Sequence the Populus Genome

Stan Wullschleger and Jerry Tuskan (Oak Ridge National Laboratory) and Toby Bradshaw (University of Washington)

1. The importance of the genome to biomedical and biological research

Forest trees are the dominant life form in many ecosystems and account for more than 90% of the Earth's terrestrial biomass. Managed and unmanaged forests throughout the United States, and indeed the world, provide recreational and environmental benefits such as carbon sequestration, renewable energy supplies, watershed protection, improved air quality, biodiversity, and habitat for endangered species. Despite the importance of forest trees to natural ecosystems and the world economy, however, little is known about their biology compared with the detailed information available for crop plants and model organisms such as Arabidopsis. Because traditional genetic approaches in forestry are limited by the large size, long generation interval, and outcrossing mating system of most trees, the forest science community would benefit greatly from a comprehensive genomics research program. It is therefore important that a forest tree genome be sequenced, giving forest tree biologists the resources to begin a large-scale, thorough analysis of the genes that control useful traits. Such an analysis would help answer basic science questions, foster the development of improved plant materials for the forest products industry, and ultimately allow the selection of novel phenotypes relevant to the energy mission of the Department of Energy.

The genus Populus (including poplars, cottonwoods and aspens) is especially well suited to serve as the model genome for trees for the following reasons:

 

 

2. The importance of the genome to DOE's mission and stated goals

The Department of Energy, through its demonstrated success in the Human Genome Project and various microbial sequencing activities, is in a strong position to champion a Populus genome initiative. Such an effort, focused on a woody perennial, would bring together strengths in molecular biology, computational biology, bioinformatics, global climate change, carbon management, and bioremediation, and would thereby draw upon the considerable expertise of multiple programs within DOE's Office of Science. Within research sponsored by the Office of Biological and Environmental Research (BER), Populus species including cottonwood, hybrid poplar, and aspen are already used in activities ranging from carbon sequestration research, to free-air CO2 enrichment (FACE) studies, to the development of fast-growing trees as a renewable bioenergy resource (EE/RE). A sequencing effort in trees could, in principle, also be applied to phytoremediation, in which trees such as poplars are used to remediate hazardous waste sites (http://www.anl.gov/OPA/Frontiers2000/d3ee.html). The information derived from a genome sequencing effort would therefore benefit existing projects within DOE and open the door to many other opportunities to use woody plants in pursuing questions of interest to DOE's energy mission.

3. The size and interest of the research community

At the ninth annual Forest Tree Genome Workshop (14 January 2001), held in conjunction with the Plant and Animal Genome Meeting, a session was organized to explore soliciting support for sequencing the genome of a forest tree. Support among attendees was high, and the session organizers established a website to promote an international, collaborative, public effort to determine the complete DNA sequence of the poplar (Populus) genome.

http://poplar2.cfr.washington.edu/popseq/popseqsig.htm

It was felt that, before the forest science community could solicit the political support needed to begin a poplar genome sequencing project, it would be vital for forest tree biologists to express their enthusiasm for such a project. So far, more than 70 scientists have indicated via the website that a poplar genome sequencing effort should be a top priority of forest genomics research and that they wish to see this activity proceed as quickly as possible toward the ultimate goal of determining the complete sequence of the Populus genome.

4. Resources available to complement the sequence

The genus Populus, one of two genera in the family Salicaceae, first appears in the fossil record approximately 56 million years ago. Today the genus comprises five sections and 30 to 40 species, and it is circumpolar in the Northern Hemisphere. Approximately eight species are indigenous to North America, four of which are commercially important. All members of the genus have a genome contained on 19 nearly identical metacentric chromosomes, a nuclear DNA content of 2C = 1.2 pg, and an estimated genetic map length of ca. 2500 cM. The majority of species are dioecious, with male and female flowers borne on separate trees. Interspecific hybridization is readily achieved among species within a single section and between species in certain other sections. Selection and hybridization began among the aspens in the mid-1950s, followed by selection and breeding in cottonwoods in the 1960s. Heterosis, or hybrid vigor, is common among interspecific hybrids. Relatively large multi-generation pedigrees exist for the commercially valuable species and their hybrids.
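The cytological and genetic-map figures above can be combined into a rough estimate of haploid genome size and of the average physical distance per centimorgan. The sketch below is a back-of-envelope calculation, not a measurement: it assumes the widely used conversion of about 978 Mb of DNA per picogram and a uniform recombination rate across the genome, neither of which is stated in this proposal. The function names are illustrative only.

```python
# Back-of-envelope conversion of the cytological and genetic-map figures
# quoted in the text. PG_TO_MB (about 978 Mb of DNA per pg) is an assumed,
# commonly used conversion factor, not a value taken from this proposal.
PG_TO_MB = 978.0


def haploid_genome_mb(two_c_pg: float) -> float:
    """Estimate the haploid (1C) genome size in Mb from a 2C DNA content in pg."""
    one_c_pg = two_c_pg / 2.0
    return one_c_pg * PG_TO_MB


def kb_per_cm(genome_mb: float, map_length_cm: float) -> float:
    """Average physical distance per centimorgan in kb.

    Assumes recombination is spread uniformly along the genome, which is
    only a first approximation.
    """
    return genome_mb * 1000.0 / map_length_cm


if __name__ == "__main__":
    genome = haploid_genome_mb(1.2)    # 2C = 1.2 pg, from the text
    ratio = kb_per_cm(genome, 2500.0)  # ca. 2500 cM, from the text
    print(f"1C genome size: ~{genome:.0f} Mb")
    print(f"Average physical distance: ~{ratio:.0f} kb per cM")
```

Under these assumptions, 2C = 1.2 pg corresponds to a haploid genome of roughly 587 Mb and an average of about 235 kb per cM; local values will differ wherever recombination is suppressed or elevated.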

The most commonly produced pedigrees are from P. trichocarpa x P. deltoides, P. deltoides x P. nigra, P. grandidentata x P. alba, and P. tremuloides x P. tremula, so these species are logical choices for a genome sequencing effort. The largest F1 pedigree (more than 2000 progeny) is between the P. trichocarpa female '383-2499' and the P. deltoides male 'ILL129'. Nuclear DNA from the female, '383-2499', was used to construct a 10x BAC library that has been distributed around the world; this library was originally constructed by scientists at Texas A&M University. In addition, more than 150 simple sequence repeats (SSRs) have been identified from this female and placed on a genetic map. They are available from:

http://poplar2.cfr.washington.edu/pmgc/SSR/

Eight of the BACs have also been partially sequenced using a shotgun cloning approach. Approximately 1.5 Mb of total genomic DNA from P. trichocarpa '93-968' has been sequenced in shotgun cloning experiments; SSRs occur roughly once every 1 kb, and their length and form vary greatly. More than 5000 xylem-specific ESTs from P. tremula x P. tremuloides are available from:

http://www.biochem.kth.se/PopulusDB/index.html
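The "10x" coverage of the BAC library described above can be read in probabilistic terms. A minimal sketch, assuming clones are sampled at random from the genome (the Clarke-Carbon model, which this proposal does not itself cite), estimates the chance that any given locus is present in at least one clone, and the coverage needed to reach a chosen probability. The function names are hypothetical and chosen for illustration.

```python
import math

# Clarke-Carbon approximation (assumed model): for a library with c-fold
# genome coverage built from randomly sampled clones, the probability that
# a given locus appears in at least one clone is about 1 - exp(-c).


def prob_locus_represented(coverage: float) -> float:
    """Probability that a given locus is in at least one clone at this coverage."""
    return 1.0 - math.exp(-coverage)


def coverage_for_probability(p: float) -> float:
    """Fold coverage needed so that a locus is represented with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    p10 = prob_locus_represented(10.0)  # the 10x library mentioned in the text
    print(f"10x library: P(locus represented) ~ {p10:.5f}")
    c99 = coverage_for_probability(0.99)
    print(f"Coverage needed for 99% representation: ~{c99:.1f}x")
```

Under this idealized model, a 10x library would contain a given locus with a probability of about 0.99995, whereas roughly 4.6x coverage would suffice for 99%. Real libraries depart from random sampling because of cloning bias, so these numbers should be treated as idealized expectations rather than guarantees.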

Preliminary efforts to clone genes from other genotypes suggest that sequence homology is high among all members of the genus (more than 90%) and that Populus contains multiple copies, typically two, of genes that are present as single copies in other plant species such as Arabidopsis. We therefore recommend that the female clone '383-2499' be used as the reference genotype in the genome sequencing effort for the genus as a whole. This P. trichocarpa (black cottonwood) female was originally collected along the Nisqually River in Washington and is replicated in field and nursery plantings currently maintained by Dr. Toby Bradshaw and colleagues at the University of Washington and Washington State University.

5. Other funding support possible for the genome sequencing project

This project would have clear appeal to the USDA. In addition, Marco Marra and his group at the University of British Columbia (UBC) are currently engaged in a BAC-end sequencing project, and several EST projects are in progress.

6. Indications on how the genome sequence would be used

Once a draft sequence of the Populus genome is available, steps would be taken to annotate it. This work would be done in collaboration with JGI and ORNL, with the participation of the forest genetics community. We are fortunate to have access to Dendrome, a web-based collection of forest tree genome databases and other forest genetic information resources serving the international forest genetics community.

http://dendrome.ucdavis.edu/index.html

Dendrome is a project of the Institute of Forest Genetics, Pacific Southwest Research Station, USDA Forest Service. It is part of a larger collaborative effort to construct genome databases for major crop and forest species; the USDA-ARS Center for Bioinformatics at Cornell University, New York, maintains this collection of crop databases as part of its ARS Genome Database Resource.

The primary genome database of Dendrome is TreeGenes, which includes genetic maps, DNA sequences, germplasm data, and other related information for a large number of forest tree species, including Populus. The same data can also be found on the Dendrome server under Genome Resources or at the USDA-ARS Center for Bioinformatics and Comparative Genomics.