Analysis Results:  Functional Genomics

15

The Celltech/MRI ENU Mutagenesis Program for Identifying Genes Controlling Immune Function in the Mouse

M. Brunkow[1] (Mary.Brunkow@sea.celltechgroup.com), M. Appleby[1], K. Staehling-Hampton[1], J. Gilchrist[2], P. Charmley[1], F. Ramsdell[1], J. Bouck[1], T. Britschgi[1], A. Snell[1], T. Howard[1], M. McEuen[1], P. Tang[1], S. Proll[1], B. Paeper[1], P. Tittel[1], G. Carlson[2], and R. Schatzman[1]

[1]Celltech R&D, Inc., Bothell, WA 98021; and [2]McLaughlin Research Institute, Great Falls, MT 59405

A major focus of the Celltech/MRI Mutagenesis Consortium is the use of ENU mutagenesis in mice to identify novel, clinically relevant targets in the areas of lymphocyte biology, inflammation and autoimmunity. The approach involves a three-generation recessive screen, and we are focusing on phenotypes which mimic desired clinical responses (e.g., suppressed inflammatory response), thus improving our chances of directly identifying relevant therapeutic targets. We have implemented a number of in vitro screens including activation of T- and B-cells, as well as T-dependent and T-independent inflammatory responses. These are carried out on peripheral blood lymphocytes, and have the advantage of being relatively high throughput and requiring only small volumes of blood, thus affording us the opportunity to perform a number of challenges on a single sample. The in vitro screens have been coupled with a complementary set of more complex in vivo screens based on classic pharmacologic models of inflammation and immune response (e.g., graft-versus-host response and colitis). In the past 2-1/2 years, over 100 phenodeviants have been identified and entered into the mapping process. We have developed an integrated laboratory / informatic pipeline which enables rapid identification of a candidate interval and the genes contained within, as well as efficient tracking of gene testing results, capturing both exon sequence and gene expression data. The development and utilization of informatics tools has proven critical in the effective management of the program. Another important aspect of the program is the ongoing process of new screen development to ensure as broad an interrogation of the immune system as possible. Specific lessons learned from the mutations identified so far will be discussed in more detail.



16

Nucleotide- or Amino Acid-Coded Mass Tagging for Functional Genomics and Proteomics

Sheng Gu, Songqin Pan, Tom Hunter, Haining Zhu, Fadi Abdi, John Engen, E. Morton Bradbury, and Xian Chen (chen_xian@lanl.gov)

Bioscience Division, Los Alamos National Laboratory

Mass spectrometry (MS) is a promising tool for rapid, accurate, and sensitive analyses in both functional genomics and proteomics, but critical advances are needed to further increase its specificity and accuracy for large-scale analyses of biomolecules at the genomic level. To address these issues, with systems biology in mind, we have developed a novel MS-based technique of mass tagging with stable isotopes for post-genomic studies. Our strategy of nucleotide- or amino acid-specific mass tagging in DNA or protein molecules provides a much more sensitive and accurate way of molecular labeling than radiological or chemical labeling. In addition to the mass-to-charge ratio (m/z) in MS spectra, the use of these stable-isotope labels for tagging biological molecules in a sequence-specific way has dramatically enhanced the specificity, accuracy, sensitivity, and throughput of MS-based technology for functional genomics and proteomics analyses.

We have extended the applications of our mass tagging technology to quantitative proteomics, de novo peptide sequencing, direct detection of post-translational modifications and low-abundance membrane proteins, and protein-protein interactions. As a practical case, we have systematically investigated the differential protein expression involved in p53-induced apoptosis of cancer cells using our quantitative mass tagging strategy, which will be generally applicable to quantitative proteomics of any diseased cells. A dozen papers describing our technology have been published in leading journals.



65

cis-Regulatory Discovery using Comparative Genomics

Tristan De Buysscher (tristan@caltech.edu), Nora Mullaney, and Barbara Wold

Caltech

A major aspect of gene function is its regulation: when, where, and how much protein is generated from the gene. The binding of regulatory proteins to small patterns of DNA around a gene, called cis-regulatory elements, to enhance or reduce expression has long been studied. However, only a small number of these control elements are known, and for comparatively few genes. Recent advances in efficient transgenics and the growing availability of large-scale genomic sequence from two or more species allow for an aggressive computational search for candidate cis-regulatory elements. For this purpose I developed a simple sequence comparison algorithm that was implemented in the context of an interactive sequence viewer, Family Relations (Brown 2002). It is used to highlight preferentially conserved (and therefore potentially functional) non-coding genomic sequence blocks around an orthologous gene in moderately diverged genomes.

A new sequence tool, Mussa, has been developed; it uses a transitivity algorithm to combine two-way analyses into arbitrarily large N-way analyses. The resolving power increases dramatically with the addition of sequence from more species. Once candidate regions are found, they can now be assayed relatively quickly using lentiviral transgenesis (Luis 2002). The optimal number of species to use and their evolutionary divergence still need to be examined, as does the addition of quantitative tools to provide an overview of conservation on a genomic scale.
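The idea of combining two-way analyses by transitivity can be sketched as follows. This is a minimal illustration under stated assumptions, not the Mussa implementation: the function names, the ungapped fixed-width window comparison, and the choice to chain only adjacent sequence pairs are all hypothetical. A window in sequence 1 is kept as an N-way candidate only if it matches a window in sequence 2 that in turn matches a window in sequence 3, and so on.

```python
def window_matches(seq_a, seq_b, window=10, min_identical=8):
    """Two-way analysis: return (i, j) start positions whose ungapped
    windows of length `window` share at least `min_identical` bases."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            identical = sum(x == y for x, y in zip(win_a, win_b))
            if identical >= min_identical:
                hits.add((i, j))
    return hits


def n_way_matches(seqs, window=10, min_identical=8):
    """Chain adjacent two-way hits into N-way paths.

    Each returned tuple holds one window start per sequence, such that
    every consecutive pair of windows passed the two-way threshold."""
    paths = [(i,) for i in range(len(seqs[0]) - window + 1)]
    for k in range(len(seqs) - 1):
        hits = window_matches(seqs[k], seqs[k + 1], window, min_identical)
        by_left = {}
        for i, j in hits:
            by_left.setdefault(i, []).append(j)
        # Extend a path only if its last window has a partner in the next sequence.
        paths = [p + (j,) for p in paths for j in sorted(by_left.get(p[-1], []))]
    return paths


if __name__ == "__main__":
    # Toy sequences sharing one 7-base block at different offsets.
    toy = ["AAAAGATTACACCCC", "GGGATTACAGGGG", "TTGATTACATT"]
    print(n_way_matches(toy, window=7, min_identical=7))
```

Because every added sequence must extend an existing path, chance two-way matches tend to drop out as more species are included, which is consistent with the gain in resolving power described above.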



17

Understanding the Biology of Brucella melitensis from Genome to Proteomes

Vito G. DelVecchio[1] (vimb@aol.com), Cesar V. Mujer[1], Mary Ann Wagner[1], Michel Eschenbrenner[1], Sue Hagius[2], and Phil Elzer[2]

[1]Institute of Molecular Biology and Medicine, University of Scranton, Scranton, PA 18510-4625; and [2]Department of Veterinary Science, Louisiana State University AgCenter, Baton Rouge, LA 70803

Brucellae are pathogenic gram-negative bacteria that cause brucellosis, a chronic infectious disease in humans characterized by undulant fever, arthritic pain, and neurological disorders. Brucellosis frequently causes abortion and sterility in domesticated animals such as cattle, sheep, and goats. Based on pathogenicity and host specificity, six species are recognized within this genus: B. abortus, B. canis, B. melitensis, B. neotomae, B. ovis, and B. suis. In addition, strains from marine mammals have been isolated and are tentatively referred to as members of B. maris. The genome of B. melitensis has been sequenced, annotated, and analyzed. It consists of two circular chromosomes of 2,117,144 bp and 1,177,787 bp that are predicted to encode 3198 ORFs. Sequence analysis confirmed that B. melitensis has the ability to survive and grow in aerobic, microaerophilic or anaerobic conditions. Although typical virulence factors and pathogenicity islands are absent, adhesins, invasin, and hemolysins are present.

Genomic data cannot designate which theoretical ORFs are active and thus cannot provide a definitive description of the ultimate biological potential of an organism. To obtain a functional overview, a global proteomics study of the laboratory-grown virulent strain 16M was initiated. So far, 937 proteins representing 269 ORFs have been identified using 2-D gel electrophoresis and peptide mass fingerprinting. The two circular chromosomes of B. melitensis are functionally active, and the locations of ORFs identified at the protein level are evenly distributed in each chromosome. A comparison of the strain 16M proteome with that of Rev1 revealed significant differences in the expression of several proteins affecting iron metabolism, sugar binding, protein biosynthesis and lipid degradation. To complement these MALDI-TOF studies, SELDI-TOF was used to further pinpoint the differences between strains Rev1 and 16M. In general, genomic and proteomic information will eventually result in better insight into biomarker discovery, rapid identification and diagnostics, and will aid future vaccine development.



18

Finishing of Human Chromosome 16 Reveals Extensive Segmental Duplications

Norman Doggett[1] (doggett@lanl.gov), Cliff Han[1], Mark Mundt[1], Gary Xie[1], Robert Sutherland[1], David Bruce[1], Levy Ulanovsky[1], Jane Grimwood[2], Jeremy Schmutz[2], Susan Lucas[3], Laurie Gordon[3], Joel Martin[3], and JGI Staff[3]

[1]Center for Human Genome Studies, Los Alamos National Laboratory; [2]Stanford Human Genome Center; and [3]DOE Joint Genome Institute

The minimal tiling path of sequenced clones covering chromosome 16 consists of 636 BACs (169 Caltech and 467 RP11), 75 cosmids, 9 PACs, 5 YAC-derived subclones (including subcloned cosmids from half YACs for each telomere), 4 P1 clones, 3 fosmids and 3 PCR fragments. These provide essentially complete coverage of the ~79 Mb of euchromatin. 685 clones are currently finished (93.5%) and the remainder are active in finishing (32 exist as phase 2 ordered accessions). There are currently 8 clone gaps. Four of the clone gaps are small, with a total combined size of less than 100 kb. Four clone gaps occur in complex segmental duplication regions and are estimated to be small but have not been reliably sized. We have discovered a high level of intrachromosomal duplications during the mapping and sequencing of this chromosome. To help us overcome the complexities of assembling the correct sequence over the most complex of these segmental duplications, we have drafted over 400 additional BAC, cosmid and fosmid clones specifically targeted at duplications and finished close to 100 redundant clones. These efforts allowed us to produce sequence contigs representing a single haplotype across many segmental duplications. We find that 7.8 Mb (~10% of the chromosome) consists of intrachromosomal duplicated sequence. This is significantly higher than the estimate of 3.4% made by the public consortium effort, based on the analysis of the draft sequence of the human genome. Intrachromosomal duplications occur in 109 duplication blocks along the chromosome. The largest of these segmental duplications are 520,022, 423,731, and 424,145 bp, and these contain many smaller duplications. Many duplications contain known and predicted genes. The Polycystic Kidney Disease 1 (PKD1) gene, for example, is duplicated or partially duplicated as 5 copies on chromosome 16. 
The nuclear pore complex interacting protein (NPIP) gene is copied 23 times on chromosome 16 and displays greater sequence divergence in its exons than in its introns, which provides an indication of positive selection acting at this locus. We will present further detailed sequence and evolutionary analysis of the complete set of intrachromosomal duplications.

Supported by the U.S. DOE under contract No. W-7405-ENG-36.



19

The Molecular Basis for Metabolic and Energetic Diversity

Timothy Donohue[1] (tdonohue@bact.wisc.edu), Jeremy Edwards[2], Mark Gomelsky[3], Jonathan Hosler[4], Samuel Kaplan[5], and William Margolin[5]

[1]Bacteriology Department, University of Wisconsin-Madison; [2]Chemical Engineering Department, University of Delaware; [3]Department of Molecular Biology, University of Wyoming; [4]Department of Biochemistry, University of Mississippi Medical Center; and [5]Department of Microbiology and Medical Genetics, University of Texas Medical School at Houston

Our long-term goal is to engineer microbial cells with enhanced metabolic capabilities. As a first step, this team of scientists and engineers seeks to acquire a thorough understanding of energy-generating processes and genetic regulatory networks of the photosynthetic bacterium, Rhodobacter sphaeroides. The ability to capitalize on the metabolic activities of this versatile bacterium was increased by the completion of the R. sphaeroides genome sequence at the DOE-supported Joint Genome Institute. The R. sphaeroides Genomes to Life Consortium is deciphering important energy-generating activities of this bacterium and studying the assembly and operation of energy generating machines. The long term goals of these efforts are to acquire the information needed to design microbial machines that degrade toxic compounds, remove greenhouse gases, or synthesize biodegradable polymers with increased efficiency. At the March 2003 workshop, we will provide a progress report on our analysis of the metabolic capabilities of this facultative microorganism.

In particular, we will report on activities in the following areas. 1. The identification of proteins that are central to growth via respiration and the utilization of solar energy by photosynthesis. 2. The formulation of a first generation metabolic map and new software tools that will aid future analysis of the pathways and regulatory networks of this bacterium. 3. Microscopic imaging techniques that will allow us to visualize the organization of the photosynthetic apparatus and the assembly of key bioenergetic molecular machines. In this poster, we hope to illustrate why this cross-disciplinary, systems approach can provide new insights into fundamental aspects of energy generation by this facultative photosynthetic organism.



20

Biomarker Discovery for Brucella melitensis Wild Type and Vaccine Strains using SELDI-MS Technology

Michel Eschenbrenner[1] (eschenbrenm2@scranton.edu), Mary Ann Wagner[1], Frank Estock[1], Cesar V. Mujer[1], Sue Hagius[2], Philip Elzer[2], and Vito G. DelVecchio[1]

[1]Institute of Molecular Biology and Medicine, The University of Scranton, Scranton, Pennsylvania 18510; and [2]Department of Veterinary Science, Louisiana State University AgCenter, Baton Rouge, Louisiana 70803

The Gram-negative bacteria, Brucella, are responsible for brucellosis, a zoonotic disease afflicting various domesticated animals and humans. Their importance as potential bioterrorism agents creates the need for a quick and efficient identification system. Surface-Enhanced Laser Desorption/Ionization (SELDI) technology allows selective protein capture from crude extracts. SELDI-MS was used to compare the wild type 16M with the vaccine strain Rev 1. The proteins were bound on different protein chips, and their respective spectra were compared. Seven putative biomarkers, with molecular masses ranging from 6.5 to 85.2 kDa, were identified for each strain. Two protein peaks were specifically detected in Rev 1 using normal phase chips and five protein peak differences were observed between 16M and Rev 1 using weak cation exchange chips.



21

Multi-Species Comparative Sequence Analysis of a 365 kb Interval on Human Chromosome 21 Surrounding SIM2

Kelly A. Frazer[1] (kelly_frazer@perlegen.com), Kazutoyo Osoegawa[2], Mark F. Doherty[1], Michael Jenn[1], Xiyin Chen[1], Pieter J. de Jong[2], and David R. Cox[1]

[1]Perlegen Sciences, 2021 Stierlin Court, Mountain View, CA 95051; and [2]Children’s Hospital and Research Center, Oakland, CA 94609

The rate of evolution varies widely in different regions of a genome within a species as well as for orthologous sequences between species. Thus, when performing cross-species sequence comparisons it is not possible to choose a standard threshold criterion of “functional” conservation that is applicable across the entire human genome for distinguishing sequences that are conserved due to constraints from those that are conserved because of shared ancestry. We previously performed a three-way comparative analysis of human, mouse and dog DNA across a 6-Mb 21q22 region. This study suggested that comparing the sequences of multiple species is a powerful empiric means of distinguishing actively conserved sequences from sequences conserved due to shared ancestry. We have expanded this study to identify and compare the distribution of conserved human-horse, human-cow, human-pig, human-dog, human-cat, and human-mouse elements within a 365-kb interval in human 21q surrounding the single-minded 2 (SIM2) gene.

High-density arrays representing 365 kb of human chromosome 21 sequence were hybridized with orthologous horse, cow, pig, dog, cat, and mouse DNA to identify evolutionarily conserved human sequences. Approximately 15.8% (57,482 bp) of the human sequence analyzed was identified as conserved, of which ~28.3% (16,258 bp) is found in humans and only one of the six mammalian species, ~45.5% (26,157 bp) is found in humans and between two and five of the mammalian species, and ~26.2% (15,067 bp) is found in humans and all six of the mammalian species analyzed. These data suggest: 1. A significant fraction of the human DNA sequences that are evolutionarily conserved will not be identified by human-mouse sequence comparisons. 2. A comprehensive comparative analysis of the human genome for the identification of functional elements will require that it be compared with the genomic sequences of multiple mammals.
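The three conservation categories above can be reproduced with a short tally. The sketch below is illustrative only: the per-element input format and the function names are assumptions, not the authors' array-analysis pipeline. The base-pair totals are the ones reported in the abstract, and the code simply checks that they sum to the conserved total and recomputes the category shares.

```python
from collections import Counter

SPECIES = ("horse", "cow", "pig", "dog", "cat", "mouse")


def conservation_class(detected_in):
    """Bin a human element by the number of non-human species in which it is conserved."""
    n = len(set(detected_in) & set(SPECIES))
    if n == 0:
        return "not conserved"
    if n == 1:
        return "one species"
    if n < len(SPECIES):
        return "two to five species"
    return "all six species"


def bp_by_class(elements):
    """Sum element lengths (bp) per class; `elements` is (length_bp, species) pairs."""
    totals = Counter()
    for length_bp, detected_in in elements:
        totals[conservation_class(detected_in)] += length_bp
    return dict(totals)


def shares(bp_totals):
    """Percentage of conserved sequence in each class, rounded to one decimal."""
    total = sum(bp_totals.values())
    return {name: round(100 * bp / total, 1) for name, bp in bp_totals.items()}


# Totals reported in the abstract (bp of conserved human sequence per class).
reported_bp = {
    "one species": 16258,
    "two to five species": 26157,
    "all six species": 15067,
}

if __name__ == "__main__":
    print(sum(reported_bp.values()))  # total conserved bp
    print(shares(reported_bp))
```

Only the "all six species" class and part of the other two classes include mouse, which is the arithmetic behind the first conclusion: a human-mouse comparison alone cannot recover every conserved element.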



22

The Microbial Proteome Project: A Database of Microbial Protein Expression in the Context of Genome Analysis

Carol S. Giometti (csgiometti@anl.gov) and Gyorgy Babnigg

Biosciences Division, Argonne National Laboratory

Using complete genome sequences to predict the proteins expressed by a cell does not provide an accurate assessment of the relative abundance of proteins under different environmental conditions. In addition, genome sequences do not define the subcellular location, biomolecular and cofactor interactions, or covalent modifications of proteins that are critical to their function. Therefore, analysis of the protein components actually produced by cells (i.e., the proteome) in the context of genome sequence is essential to understanding the regulation of protein expression. As the number of complete microbial genome sequences increases, vast amounts of genome and proteome information are being generated. In parallel with the proteome analysis of numerous microbial systems, we are developing methods for managing and interfacing the diverse data types generated by both genome and proteome studies as part of Argonne’s Microbial Proteome Project. The goal is to provide users with a highly interactive database that contains proteome information in the context of genome sequence in formats conducive to data interrogations pertinent to biological questions. To achieve that goal, we are developing and maintaining three World Wide Web-based databases: Proteomes2, ProteomeWeb, and GelBank. The Proteomes2 database (http://proteomes2.bio.anl.gov) is a password-protected site that provides DOE project collaborators with access to the experimental details for approximately 1,000 samples from seven different microbes (Shewanella oneidensis, Geobacter sulfurreducens, Prochlorococcus marinus, Methanococcus jannaschii, Pyrococcus furiosus, Rhodopseudomonas palustris, and Deinococcus radiodurans) and links each sample with multiple protein patterns. 
ProteomeWeb (http://ProteomeWeb.anl.gov) is an interactive public site that provides the identification of expressed microbial proteins, links to genome sequence information, tools for mining the proteome data, and links to metabolic pathways. GelBank currently includes the complete genome sequences of approximately 90 microbes and is designed to allow queries of proteome information. The database is currently populated with protein expression patterns from the Argonne Microbial Proteomics studies and will accept data input from outside users interested in sharing and comparing proteome experimental results.

This research is funded by the United States Department of Energy, Office of Biological and Environmental Research, under Contract No. W-31-109-ENG-38.



23

Comparative Mapping and Sequencing of Syntenically Homologous Segments of Human Chromosome 19 Across Multiple Vertebrate Species Including Chicken

L.A. Gordon[1] (gordon2@llnl.gov), M. Tran-Gyamfi[1], R. Nandkeshwar[1], M. Groza[1], M. Christensen[1], E. Fields[1], P. Butler[1], M. Wagner[1], I. Ovcharenko[2], A. Aerts[3], K. Kadner[3], J. Smith[4], R. Crooijmans[5], M. Groenen[5], S. Lucas[3], and L. Stubbs[1]

[1]Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore CA; [2]Lawrence Berkeley National Laboratory, Berkeley, CA; [3]D.O.E. Joint Genome Institute, Walnut Creek, CA; [4]Department of Genomics and Bioinformatics, Roslin Institute, Roslin, U.K.; and [5]Department of Animal Sciences, Wageningen Agricultural University, Wageningen, The Netherlands

Cross-species comparison of syntenically homologous, conserved sequence provides biologically relevant insight into the human genome and evolutionary processes. It facilitates the identification of low-copy or rarely expressed genes, signals the presence of otherwise difficult to detect non-coding regulatory elements, and sheds light on ancestral genome organization, lineage-specific chromosomal rearrangements and mechanisms of gene evolution. We previously mapped and sequenced human chromosome 19 (HSA19) - related homology segments in mouse (Genomics 74:129-141, 2001; Science 293:104-111, 2001). While comparisons between the two mammalian species are proving extraordinarily helpful, biological understanding is substantially enhanced by comparing sequences from additional reference species at informative evolutionary distances. To this end we have mapped and are sequencing HSA19-related regions from a third, evolutionarily more distant vertebrate, the chicken.

As homology breakpoints had not been previously detailed in chicken, we designed overgo and PCR probes wherever protein-translated HSA19 gene sequences identified well-conserved (60-95%) chicken ESTs. Probes for over 100 gene loci were hybridized successfully to three BAC libraries, one from Gallus domesticus and two from Gallus gallus. Clones identified by hybridization were restriction digested and assembled into maps to assess clonal integrity and overlap, facilitate contig extension, identify homology breaks and generate efficient sequencing tiling paths. Contigs homologous to HSA19p13.3 and p13.1 located on chicken chromosome 28 (GGA28), as well as islands of HSA19q homology scattered throughout the chicken genome, have been successfully characterized and 170 clones submitted for sequencing at the JGI. Preliminary analyses of 80 clones yield sequence-based identification of additional syntenic orthologs and provide high-resolution detail of homology segment breakpoints and rearrangements.

As expected, chicken sequence exhibits much higher levels of conservation relative to mouse and human than, for instance, that of the evolutionarily more remote puffer fish, Fugu rubripes, recently sequenced at the JGI (Science 297:1301-1310, 2002). While linkage groups as a whole are well conserved in chicken, interruptions and rearrangements in synteny at the level of gene-to-gene resolution are pervasive. Comparisons of homology breakpoints between the three species suggest presumptive ancestral genome arrangements; in at least one case mouse and chicken share gene order that is not preserved in human, while in other cases disruptions in synteny can be attributed to breaks and rearrangements in mouse. These data are facilitating the annotation of HSA19 while shedding intriguing light on the mechanisms that drive genome evolution and vertebrate speciation.

This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.



24

Isolation of DNA Binding Proteins from Nuclear Extracts by Biomolecular Interaction Analysis (BIA)-Based Ligand Fishing

Christopher A. Hack[1] (cahack@lbl.gov), Michael Murphy[1], Shirin Fuller[1], Lior Pachter[2], Dario Boffelli[1],[3], Sharon Doyle[1], Paul Richardson[1], and Eddy Rubin[1],[3]

[1]U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA; [2]Department of Mathematics, University of California, Berkeley, CA 94720, USA; and [3]Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Genome sequencing efforts are producing large quantities of data, but proper interpretation of these data is difficult without some understanding of the proteins encoded by the genomic DNA and the cis-regulatory elements which oversee their expression. Of particular interest are transcription factor proteins that regulate gene expression and the cis-regulatory elements to which they bind. The identification of transcription factors and their cis-regulatory binding sites will facilitate more comprehensive analysis of specific protein expression patterns, adding key information to the process of decoding the genome. We have developed an assay that uses Biomolecular Interaction Analysis (BIA)-based ligand fishing as a means to characterize the interactions of transcription factors that bind DNA enhancer elements, with the goal of identifying isolated protein binders by downstream analysis. The Surface Plasmon Resonance (SPR) biosensor in a BIA instrument allows for real-time monitoring of interactions between proteins and their binding partners. Double-stranded DNA oligonucleotides matching sequence in regions immediately upstream of the human apolipoprotein A (apoA) promoter were captured in turn on the surface of BIA sensor chips. Nuclear extracts from human liver cell line HepG2 were passed over the captured oligonucleotides, and the interaction of DNA-binding proteins was monitored with the SPR biosensor. (Oligonucleotides and nuclear extracts were prepared as described by D. Boffelli et al., Science, Vol. 299, pp. 1391-4, (2003).) Protein binding to highly conserved regions of DNA was observed to be significantly stronger than binding to regions of DNA that showed greater sequence divergence between hominoids and Old World monkeys. In addition, the BIA process is non-destructive, allowing recovery of bound proteins for downstream analysis and identification by silver-stained PAGE and/or mass spectrometry. 
Protein bands were observed from samples eluted from conserved regions of DNA and analyzed by silver-stained PAGE, indicating the presence of one or more transcription factors specific to binding sites on the conserved oligonucleotides. Such specific bands were not observed on PAGE analysis of samples eluted from non-conserved regions, further supporting the hypothesis that the conserved regions of genomic DNA are functionally important as regulators of gene expression. Further work to characterize the eluted transcription factors by mass spectrometry is in progress.

This work was performed under the auspices of the U.S. Department of Energy, Office of Biological and Environmental Research, by the University of California, under Contracts No. W-7405-Eng-48, No. DE-AC03-76SF00098, and No. W-7405-ENG-36.



25

Differential Expansion of Homologous Zinc Finger Gene Clusters Located on Human Chromosome 19q13.2 and Mouse Chromosome 7 (Mmu7)

Aaron T. Hamilton[1] (hamilton28@llnl.gov), Mark Shannon[1],[3], Laurie Gordon[1],[2], Elbert Branscomb[2], and Lisa Stubbs[1],[2]

[1]Biology and Biotechnology Research Program (BBRP), Lawrence Livermore National Laboratory; [2]DOE Joint Genome Institute; and [3]Applied Biosystems

In comparative studies on regions of the mouse genome that are syntenically homologous to human chromosome 19, we discovered one conserved region of HSA19q13.2 and mouse chromosome 7 (Mmu7) that contained a cluster of zinc finger genes (encoding transcriptional regulators) including multiple probable orthologous pairs of genes. However, there appear to be very different duplication histories in the expansion of the cluster, resulting in unequal numbers of zinc-finger genes (21 human, 10 mouse) through differential duplication of ancestral genes such that strictly orthologous relationships have not been maintained. For example, one human gene (ZNF235) is related to six mouse genes which have apparently arisen by duplication events since the divergence of the two lineages, while a single mouse ZNF gene (Zfp61) remains as the single mouse “homolog” for ten recently duplicated human genes. We have developed a hypothesis to explain the phylogenetic history of this gene cluster and have studied the divergence of expression patterns for the duplicated genes. Planned experimental manipulation of individual zinc-finger (ZNF) gene expression will reveal potential target genes that may be regulated by the ZNF genes in the cluster, allowing comparisons between recently duplicated ZNF genes and also an inter-species assessment of the functional conservation of orthologs. We have also begun to investigate how differences in expression patterns between paralogous zinc-finger genes are reflected in diverging structures of the duplicated regulatory elements of the ZNF genes. Because mammalian genomes contain hundreds of zinc-finger transcriptional regulators, many of which are of the same KRAB-ZNF type as those in the cluster we surveyed, such changes in gene number and expression patterns have implications for the study of mechanisms for tissue-specific gene regulation and for the analysis of the origin of genetic diversity on which natural selection acts. 
As this region is sequenced for other species the data for the cluster in each will be added to the comparative analysis. An overview and progress reports on these aspects of the project will be presented.



26

Genome Construction and Analysis in Rhodobacter sphaeroides 2.4.1

Samuel Kaplan (Samuel.Kaplan@uth.tmc.edu), Madhusudan Choudhary, Ronald C. Mackenzie, Jung Hyeob Roh, and William E. Smith

Microbiology & Molecular Genetics, University of Texas Health Science Center at Houston

Rhodobacter sphaeroides 2.4.1 is a free-living facultative photosynthetic member of the α-3 Proteobacteria. This organism is capable of displaying a diverse array of growth modes, reflecting its very substantial metabolic potential. Our interest in this organism has focused on its ability to transition from aerobic to anaerobic photosynthetic growth and on the structure and function of its complex genome.

The J.G.I. completed the high-throughput genome sequence of R. sphaeroides 2.4.1 in October of 2001, resulting in 195 contigs, and these, together with our own genome “skimming” project of chromosome II, enabled us to provide the complete physical assembly of chromosome I (C-I) and chromosome II (C-II), which will be presented. Using the genome sequence data, and taking into consideration the third position bias of this high G+C organism (68.81%), we and members of the DOE-sponsored R. sphaeroides Microbial Cell Project Team, together with the Affymetrix Corp., constructed a Gene Chip. The “Chip” consists of probe sets for 4292 ORFs, 47 rRNA and tRNA genes, and 394 intergenic regions, which for the most part have been reassigned as ORFs following genome assembly.

Analysis of the transcriptome in our laboratory has proceeded along several parallel lines involving analyses of transcriptome expression under standard growth conditions, and employing the use of mutant organisms known to possess alterations in gene expression. In order to validate the results of the Gene Chip experiments we have developed standardized protocols for RNA isolation and cDNA development. We have performed all experiments in triplicate and have routinely obtained R values (Pearson Coefficient) of 0.980 or better. We have followed the ratios for the expression of all of the ribosomal proteins from each replicate to each other replicate, which is predicted to be 1.00 and which is revealed to be on average 0.983.
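The two reproducibility checks described above (pairwise Pearson coefficients between replicates, and replicate-to-replicate expression ratios for a reference gene set) can be sketched in a few lines. This is a minimal illustration with hypothetical function names and input layout, where each replicate is a list of signal values in the same probe-set order; it is not the laboratory's actual analysis code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def replicate_agreement(replicates):
    """Pearson coefficient for every pair of replicates, keyed by (i, j)."""
    return {
        (i, j): pearson(replicates[i], replicates[j])
        for i, j in combinations(range(len(replicates)), 2)
    }


def mean_ratio(rep_a, rep_b, gene_indices):
    """Average replicate-to-replicate signal ratio over a chosen gene set,
    for example the ribosomal protein genes (expected value close to 1.00)."""
    ratios = [rep_a[g] / rep_b[g] for g in gene_indices]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up signal values for three replicates of five probe sets.
    reps = [
        [120.0, 850.0, 40.0, 300.0, 15.0],
        [118.0, 870.0, 42.0, 290.0, 16.0],
        [125.0, 840.0, 39.0, 310.0, 14.0],
    ]
    for pair, r in replicate_agreement(reps).items():
        print(pair, round(r, 3))
    print(round(mean_ratio(reps[0], reps[1], [0, 1, 3]), 3))
```

With triplicate experiments this yields three pairwise coefficients, each of which would be compared against a threshold such as the 0.980 quoted above.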

Because of our long-term interest in the aerobic to photosynthetic transition, we have focused the first of our transcriptome analyses on the expression of genes involved in photosynthesis, genes involved in taxis and flagellar assembly for which there are numerous duplicate and triplicate representatives, as well as generalized gene expression showing patterns of change under these growth conditions. We shall provide both the data derived from each of these sets of experiments as well as summary results which are more easily viewed and reveal for the first time a global picture of gene expression in specific, multi-dimensional regulatory systems of R. sphaeroides.

This work has been supported by the DOE Grant OBER DE-FG02-01ER63232 and USPHS Grant GM15590.



27

Characterization of an Imprinted Domain Located in Human Chromosome 19q13.4/ Proximal Mouse Chromosome 7

Joomyeong Kim (kim16@llnl.gov), Anne Bergmann, Angela Kollhoff, and Lisa Stubbs

Genomics Division, Biology and Biotechnology Research Program, L-441, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94551

For a subset of mammalian autosomal genes, the two parental alleles are not functionally equivalent due to genomic imprinting. Imprinting involves inactivation of one allele, depending upon the parental origin. In early studies, we located one imprinted gene, Peg3 (paternally expressed gene 3), to human chromosome 19q13.4. We have isolated and characterized 5 additional imprinted genes from the 1-Mb genomic intervals surrounding human and mouse PEG3, including Zim1 (imprinted Zinc-finger gene 1), Zim2, Zim3, Usp29 (Ubiquitin-specific processing protease 29), and Znf264. We are currently studying the potential regulatory mechanism controlling the imprinting and expression of these six genes using comparative genomics approaches. Based on our preliminary results, we predict that one region, the region surrounding the first exon of Peg3, might be responsible for the imprinting of the whole domain. Sequence comparison of the regions derived from human, mouse and cow revealed the presence of one evolutionarily conserved sequence motif that is repeated multiple times within the first intron of Peg3 in all three mammals. DNA mobility shift and chromatin immunoprecipitation (ChIP) assays clearly demonstrated that this motif is an in vivo binding site for the Gli-type transcription factor YY1. The Peg3 YY1-binding sites are methylated only on the maternal chromosome in vivo, and ChIP assays confirmed that YY1 binds specifically to the unmethylated paternal allele of the gene. Promoter, enhancer and insulator assays with deletion constructs of sequence surrounding the YY1-binding sites indicate that the region functions as a methylation-sensitive insulator that may influence the imprinted expression of Peg3 and neighboring genes. Our current study is the first report demonstrating the involvement of YY1 in methylation-sensitive insulator activity and suggests a potential role of this highly conserved protein in mammalian genomic imprinting.



28

Comparative Analysis of Syntenic Genomic Sequences

Jonathan E. Moore[1] and James A. Lake[1],[2] (lake@mbi.ucla.edu)

[1]Molecular Biology Institute, University of California, Los Angeles; and [2]Departments of Molecular, Cell, and Developmental Biology, and Human Genetics, University of California, Los Angeles

Comparative analyses of genomic sequence hold great promise for the identification of genes, their structure, and various regulatory elements. We have developed a finder for genes and putative regulatory elements that utilizes a method called pattern filtering. Pattern filtering optimally filters the evolutionary signals of the conserved functional elements from the stochastic noise of mutation, allowing the reliable determination of biological elements. In tests of pattern filtering’s ability to predict coding regions in the 200-kb CD4 regions of human and mouse, our methods achieve a correlation coefficient per nucleotide of 98.6%, well above that of any gene-finder of which we are aware. In addition, our methods reveal conserved regions which do not code for proteins, which are assumed to be regulatory elements or genes for untranslated RNAs. We are applying these methods to syntenic sequences in order to identify novel genetic and functional elements.
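For readers unfamiliar with per-nucleotide accuracy measures, the sketch below computes the correlation coefficient commonly used to score gene-finders at the nucleotide level (the same quantity as the Matthews correlation coefficient). It is an assumption that this is the exact statistic behind the 98.6% figure; the function name `nucleotide_cc` and the toy labels are hypothetical.

```python
from math import sqrt


def nucleotide_cc(predicted, actual):
    """Correlation coefficient between predicted and annotated coding labels.

    Both arguments are equal-length sequences of 0/1 flags, one per
    nucleotide, where 1 means "coding".
    """
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        raise ValueError("correlation is undefined when a class is empty")
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy example: 1 = coding nucleotide, 0 = non-coding nucleotide.
    annotated = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    print(round(nucleotide_cc(predicted, annotated), 3))
```

A value of 1 means every nucleotide was labeled correctly, 0 is what random labeling would give, and the measure penalizes both missed coding bases and false coding calls.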



29

Proteomic Profiles of Rhodopseudomonas palustris

Nathan C. Verberkmoes[1], Caroline S. Harwood[2], Loren J. Hauser[1], Dale A. Pelletier[3], and Frank W. Larimer[3] (larimerfw@ornl.gov)

[1]Graduate School of Genome Science and Technology, University of Tennessee, Oak Ridge, TN; [2]Department of Microbiology, University of Iowa, Iowa City, IA; and the [3]Center for Molecular and Cellular Systems, Oak Ridge National Laboratory, Oak Ridge, TN

We recently described (VerBerkmoes et al., J. Proteome Research 1:239-252, 2002) a comprehensive method for proteome analysis that integrates both intact protein measurement (“top-down”) and proteolytic fragment characterization (“bottom-up”) mass spectrometric approaches, capitalizing on the unique capabilities of each method. This approach is being applied to proteomic profiling of the anoxygenic photobacterium Rhodopseudomonas palustris. Multiple physiological states, i.e., aerobic heterotrophic growth, anaerobic heterotrophic growth, anaerobic photoheterotrophic growth, and anaerobic phototrophic growth, are being profiled. In addition, mutants defective in major assembly and regulatory processes are being profiled. The proteomic profiles are also being used to enhance the annotation of the genome: a significant number of “genes of unknown function” have been authenticated, and their cellular localization and physiological response are now known. Over 25% of the proteins profiled represent the “unknown” class.



30

Molecular Comparisons of Gene Homologs in Primates

N. Kouprina, V. N. Noskov, J. C. Barrett, and V. Larionov (larionov@mail.nih.gov)

Laboratory of Biosystems and Cancer, National Cancer Institute, NIH, Bethesda, MD 20892

Transformation-Associated Recombination (TAR) cloning allows selective isolation of a desired chromosomal region or gene from complex genomes. The method exploits a high level of recombination between homologous DNA sequences during transformation in the yeast Saccharomyces cerevisiae. We investigated the effect of nonhomology on the efficiency of gene capture and found that up to 15% DNA divergence did not prevent efficient gene isolation. Such tolerance to DNA divergence greatly expands the potential applications of TAR cloning for comparative genomics. We efficiently and accurately isolated primate gene homologs using a TAR vector containing human gene targeting sequences. Complete copies of the breast cancer gene BRCA1 (80 kb) and of ASPM (70 kb), a major determinant of cerebral cortical size, were isolated from chimpanzee, gorilla, orangutan and rhesus macaque genomes, sequenced and compared to the corresponding human DNA sequences. These comparisons allowed us to follow the evolution of the genes in great apes and to explain the high frequency of intragenic rearrangements in BRCA1 in the human population. Because the entire isolation procedure of a gene homolog from several primates could be accomplished in approximately 2 weeks, TAR cloning is a powerful tool for comparative genomics.



31

Elucidating the Role of Two Mammalian Telomerase-Associated Protein Components in vivo—TERT and VPARP

Yie Liu[1] (liuy3@ornl.gov), Bryan E. Snow[2], Wen Zhou[3], Natalie Erdmann[2], Karuna Chourey[1], Marla Gomez[1], Murray O. Robinson[3], and Lea Harrington[2]

[1]Functional Genomics Group, Life Sciences Division, Oak Ridge National Laboratory, TN 37831-6445; [2]Ontario Cancer Institute/Amgen Institute, Department of Medical Biophysics, University of Toronto, 620 University Avenue, Toronto, Ontario M5G 2C1 Canada; and [3]Amgen Inc., 1840 DeHavilland Drive, Thousand Oaks, CA 91320

Telomeres are DNA-protein complexes localized on the end of each chromosome, the function of which is to cap and protect chromosomes against degradation or fusion. Telomeres thus play an essential role in the control of genomic stability. Although telomeres are lost during the aging process in most human somatic cells, telomeres are maintained in germ line cells due to the expression of telomerase, which catalyzes the addition of telomeres and replenishes telomere loss during cell division. Eukaryotic telomerase contains a telomerase reverse transcriptase (TERT) and an RNA template component that together comprise its catalytic core; several other associated factors, of which only a few have been identified, are also known to be essential. We used a gene targeting approach to generate embryonic stem cells and mice lacking TERT or one of the telomerase-associated proteins, VPARP, in order to determine the role of these proteins in vivo.

ES cells lacking mTert lose telomerase activity and show progressive telomere shortening, leading to end-to-end fusions and genetic instability. ES cells heterozygous for mTert knockouts also showed a progressive loss of telomeric DNA; however, despite an average telomere length similar to mTert null ES cells, no genetic instability was observed and a minimal amount of telomeric DNA can be detected at all chromosome ends. Taken together with previous studies, these findings suggest it is the presence of a subset of critically short chromosome ends, and not a shorter average telomere length per se, that heralds the onset of genetic instability.

VPARP-deficient mice are viable and fertile. Furthermore, there is no detectable change in telomerase activity or telomere length in early passages of Vparp-deficient ES cells and tissues from early generation deficient mice. Since VPARP is also localized to the mitotic spindle, we examined microtubule and spindle architecture, chromosome stability and chromosome segregation in VPARP deficient mice. These data will also be presented.



32

Noncoding Deletion Present in Van Buchem Patients Removes Essential Regulatory Elements Required for Bone-Specific Expression of BMP-Antagonist Sclerostin

Gabriela G. Loots[1] (ggloots@lbl.gov), Michaela Kneissel[2], Mary Brunkow[3], Jessie Chang[1], Dmitriy Ovcharenko[1], Ingrid Plajzer-Frick[1], Veena Afzal[1], and Edward M. Rubin[4]

[1]Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; [2]Novartis Pharma, Basel, Switzerland; [3]Celltech Inc., Bothell, WA, USA; and [4]DOE, Joint Genome Institute, Walnut Creek, CA, USA

Sclerosteosis is a generalized progressive bone overgrowth disorder due to the loss of function of the SOST gene product sclerostin. Van Buchem disease is a similar skeletal disorder characterized by milder sclerosteosis-like phenotypes and is associated with the presence of a 52 kb deletion (VBDel) located ~35kb downstream of the SOST transcript and ~10kb upstream of the MEOX1 gene on human chromosome 17p21. Human-Mouse comparative sequence analysis revealed several highly conserved noncoding elements present in the VBDel, suggesting that Van Buchem disease is caused by the removal of essential SOST-specific regulatory elements. Using in vitro BAC-recombination techniques we have engineered a ~160kb human BAC containing the SOST and MEOX1 transcripts by removing the ~52kb intergenic region absent in patients suffering from Van Buchem Disease. We have generated several lines of transgenic animals carrying either the wildtype human SOST BAC or the VBDel modified BAC. Following the expression pattern of the endogenous sost mouse gene, we have investigated the expression pattern of the human transgenes in these two types of transgenic animals. Similar to the murine SOST expression, the human SOST transcript from the wildtype BAC is predominantly expressed in the mineralized bones of fetal, neonatal and adult mice, as well as in the apical ectodermal ridge of the developing embryo. Transgenic animals carrying the modified VBDel BAC fail to express the human SOST transcript in mineralized bone, while the embryonic expression of this transgene is unaffected. Using comparative sequence analysis and transient transgenic technology we have tested all the evolutionarily conserved noncoding elements present in the VBDel for the potential to drive expression in vitro and in vivo. Our findings suggest that Van Buchem disease is caused by a regulatory mutation that diminishes osteoblast-specific expression of the BMP-antagonist sclerostin.



33

Evolutionary Analysis of Enzymatic Functions and Metabolic Pathways

N. Maltsev, E. Marland (marland@mcs.anl.gov), A. Rodrigez, D. Sulakhe, R. Krishnamurthy, L. Ulrich, and P. Anumula

Mathematics and Computer Science Division, Argonne National Laboratory

The bioinformatics group at Argonne National Laboratory is developing WIT3, an integrated computational environment for high-throughput analysis of genomes, metabolic reconstructions and evolutionary analyses of metabolic networks. It includes the following components: a) databases containing sequence, metabolic, and chemical data; b) GADU, an automated pipeline with a scalable backend for high-throughput analysis of genomes, which utilizes distributed computing technology (Globus) together with DOE Science Grid and ANL computational resources for analysis of biological data; c) a rule-based knowledge base for evolutionary analysis of enzymes; and d) tools and algorithms for analysis of protein families developed by our group (e.g. PhyloBlocks, PhE-B, SVMMER). Analysis of 106 prokaryotic genomes is available via WIT3.



34

Analysis of Novel Deinococcus radiodurans Mutants following Whole Genome Transcriptome Analysis

Vera Yu. Matrosova[1] (vmatrosova@usuhs.mil), Marina V. Omelchenko[1] (omelchen@ncbi.nlm.nih), Amudhan Venkateswaran[1], Min Zhai[1], Mathias Hess[1], Elena K. Gaidamakova[1], Kira S. Makarova[2], Jizhong Zhou[3], and Michael J. Daly[1] (mdaly@usuhs.mil)

[1]Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20814, Tel: 301-295-3750; [2]National Center for Biotechnology Information, NIH, Bethesda, MD; and [3]Oak Ridge National Laboratory, Oak Ridge, TN

Deinococcus radiodurans R1 (DEIRA) is a Gram-positive aerobic bacterium with an extraordinary resistance to ionizing radiation. Molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery using DNA microarrays covering ~94% of its predicted genes. At least at one time point during DEIRA recovery, 832 genes (28% of the genome) were induced and 451 genes (15%) were repressed two-fold or greater. All genes were classified according to general expression patterns. Genes induced in the early phase of recovery (displaying a recA-profile) included those involved in DNA replication, repair, recombination, cell wall metabolism, cellular transport, and many encoding uncharacterized proteins. To test if uncharacterized genes implicated by transcriptional profiling contribute to its resistance phenotype, DEIRA mutants were constructed and characterized.
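The two-fold criterion used above can be illustrated with a short sketch. It assumes that each gene has one expression ratio (irradiated relative to control) per recovery time point; the function names, gene labels and ratios are hypothetical and are not taken from the study.

```python
def classify_gene(ratios, fold=2.0):
    """Return the labels a gene earns across its recovery time points.

    A gene is "induced" if any ratio is >= fold and "repressed" if any
    ratio is <= 1/fold, so a single gene can carry both labels.
    """
    if fold <= 1:
        raise ValueError("fold threshold must be greater than 1")
    labels = set()
    for ratio in ratios:
        if ratio <= 0:
            raise ValueError("expression ratios must be positive")
        if ratio >= fold:
            labels.add("induced")
        if ratio <= 1.0 / fold:
            labels.add("repressed")
    return labels


def summarize(profiles, fold=2.0):
    """Count genes induced and repressed at least at one time point."""
    induced = sum(1 for r in profiles.values() if "induced" in classify_gene(r, fold))
    repressed = sum(1 for r in profiles.values() if "repressed" in classify_gene(r, fold))
    return induced, repressed


if __name__ == "__main__":
    # Made-up ratios at early, middle and late recovery for three genes.
    profiles = {
        "geneA": [4.1, 2.3, 1.1],
        "geneB": [0.9, 0.4, 0.8],
        "geneC": [1.2, 1.0, 0.9],
    }
    print(summarize(profiles))
```

Dividing each count by the total number of genes on the array gives percentages comparable to the 28% and 15% quoted in the text.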



35

Functional Annotation of Human Genes by Gene-Driven Chemical Mutagenesis in Mice

E. J. Michaud[1],[2] (michaudejiii@ornl.gov), C. T. Culiat[1],[2], Z. Liu[3], K. Krylova[3], F. W. Larimer[1], K. T. Cain[1], D. J. Carpenter[1], L. L. Easter[1], C. M. Foster[1], A. W. Gardner[1], K. J. Houser[1], L. A. Hughes[1], M. Kerley[1], T.-Y. S. Lu[1], R. E. Olszewski[1], I. Pinn[1], G. D. Shaw[1], S. G. Shinpock[1], A. M. Wymore[1], M. L. York[1], E. J. Baker[1], J. R. Snoddy[1], D. K. Johnson[1],[2], and E. M. Rinchik[1],[2],[4]

[1]Life Sciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831; [2]The University of Tennessee Oak Ridge National Laboratory Graduate School of Genome Science and Technology, Oak Ridge, TN 37830; [3]SpectruMedix, 2124 Old Gatesburg Road, State College, PA 16803; and [4]Department of Biochemistry, Cellular, and Molecular Biology, University of Tennessee, Knoxville, TN 37996

The availability of the complete DNA sequence of the mouse genome, coupled with the development of high-throughput methods for rapid detection of single-nucleotide polymorphisms (SNPs), has made it practical to consider genome-wide, gene (sequence)-driven approaches to mouse germline mutagenesis. Such gene-driven strategies allow one to perform whole-genome mutagenesis, and then screen for alterations in any pre-selected gene(s). To complement embryonic stem-cell-based gene-driven mutagenesis resources, such as gene-trap libraries and banks of N-ethyl-N-nitrosourea (ENU)-mutagenized ES cells, we have been generating a cryopreserved bank of DNA, tissues (for RNAs and proteins), and sperm from 4,000 C57BL/6JRn mice that each carry a unique load of paternally induced ENU mutations. This ORNL Cryopreserved Mutant Mouse Bank (CMMB) is a source of induced, heritable SNPs in both regulatory regions and coding sequences of virtually every gene in the genome. High-throughput Temperature Gradient Capillary Electrophoresis (TGCE) is used to identify mutations by heteroduplex analysis in pre-selected genes in the CMMB DNA panel, and mutant stocks will be recovered by in vitro fertilization or intracytoplasmic sperm injection from the parallel bank of frozen sperm. Thus, the CMMB will provide mouse models of a wide range of altered proteins for phenotypic, gene/protein-network, and structural biology-type analyses. We will present progress on (i) production of the 4,000-member CMMB (now completed); (ii) methods used for mutation screening by high-throughput TGCE; (iii) our current estimate of the per-base-pair mutation frequency in the CMMB; and (iv) reconstitution of mutant stocks.

Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, and by the Office of Biological and Environmental Research, U.S. DOE, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.



36

The Genome of Marine Synechococcus sp. Strain WH8102

Brian Palenik[1] (bpalenik@ucsd.edu), Bianca Brahamsha[1], Jay McCarren[1], Eric Allen[1], Eric Webb[5], John Waterbury[5], Fred Partensky[4], Alexis Dufresne[4], Frank Larimer[2], Miriam Land[2], Ian Paulsen[3], and Patrick Chain[6]

[1]Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA; [2]Oak Ridge National Laboratory, Oak Ridge, TN; [3]The Institute for Genomic Research, Rockville, MD; [4]Station Biologique, Roscoff, France; [5]Woods Hole Oceanographic Institution, Woods Hole, MA; and [6]Lawrence Livermore National Laboratory, Livermore, CA

Cyanobacteria in the open oceans are major contributors to carbon fixation on a global scale. The sequencing and analysis of the genome of marine Synechococcus sp. strain WH8102 shows for the first time that these organisms are highly adapted to their oligotrophic marine environment, with relatively small compact genomes and reduced regulatory machinery. WH8102, for example, utilizes more sodium-dependent transporters than a model freshwater cyanobacterium. It also appears to have adopted strategies for conserving limited iron stores by using nickel and cobalt in some enzymes. In contrast to other marine cyanobacteria, however, WH8102 appears to be more of a generalist, possibly due to its novel ability among cyanobacteria to swim toward nutrient patches. This microorganism is predicted to transport dissolved organic nitrogen (DON) and phosphorus (DOP) sources that are likely present but have been largely ignored to date in phosphorus and nitrogen cycling of oligotrophic environments. The genome of WH8102 appears to have been greatly influenced by horizontal gene transfer, likely through phages. The genetic material contributed by horizontal gene transfer appears to include multiple glycosyltransferases. These may help the cell change its surface glycosylation and thus evade detection by grazers and/or phages. Horizontal gene transfer may have also contributed the genetic material that was used to develop the novel form of swimming motility seen in this strain and closely related cyanobacteria.



37

Genomes to Proteomes to Life: Application of New Technologies for Comprehensive, Quantitative and High Throughput Microbial Proteomics

Richard D. Smith (rds@pnl.gov), James K. Fredrickson, Mary S. Lipton, David G. Camp, Gordon A. Anderson, Ljiljana Pasa-Tolic, Ronald J. Moore, Margie F. Romine, Yufeng Shen, Yuri A. Gorby, and Harold R. Udseth

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352

At present our understanding of biological processes is substantially incomplete; e.g. we do not know with good confidence all the biomolecular players in even the most studied pathways and networks in microbial systems. It is clear that many important signal transduction proteins will be present only at very low levels (~ hundreds of copies per cell) and will provide extreme challenges for current characterization methods. There is also a growing recognition of the limitations associated with gene expression (e.g. cDNA array) measurements. Increasing evidence indicates that the correlation between gene expression and protein abundances can be low, and that the correlation between gene expression and gene function is even lower. Thus, global protein characterization (proteomic) studies actually complement gene expression measurements.

Successes in genome sequencing efforts have provided an informatic foundation for high throughput proteomic measurements to broadly identify large numbers of proteins and their modification states with high confidence, as well as to measure their abundances. The challenges associated with making useful comprehensive proteomic measurements include identifying and quantifying large sets of proteins that have relative abundances spanning more than six orders of magnitude, that vary broadly in chemical and physical properties, that have transient and low levels of modifications, and that are subject to endogenous proteolytic processing. Additionally, proteomic measurements should not be significantly biased against e.g. membrane, large or small proteins. A related need is the ability to rapidly and reliably characterize protein interactions with other biomolecules, particularly their multi-protein complexes. The combined information on protein complexes and the changes observed from global proteome measurements in response to a variety of perturbations is essential for the development of detailed computational models for microbial systems and the eventual capability for predicting their response e.g. to environmental changes and mutations.

We report on development and application of new technologies for global proteome measurements that are orders of magnitude more sensitive and faster than existing technologies. The approaches are based upon the combination of nano-scale ultra-high pressure capillary liquid chromatography separations and high accuracy mass measurements using Fourier transform ion cyclotron resonance (FTICR) mass spectrometry. Combined, these techniques enable the use of highly specific peptide ‘accurate mass and time’ (AMT) tags. This new approach avoids the throughput limitations associated with other mass spectrometric technologies using tandem mass spectrometry (MS/MS), and thus enables fundamentally greater throughput and sensitivity for proteome measurements. Additional new developments have also significantly extended the dynamic range of measurements to approximately six orders of magnitude and are now providing the capability for proteomic studies from very small cell populations, and even single cells. A significant challenge for these studies is the immense quantities of data that must be managed and effectively processed and analyzed in order to be useful. Thus, a key component of our program involves the development of the informatic tools necessary to make the data more broadly available and for extracting knowledge and new biological insights from complex data sets.
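The idea behind AMT tags, identifying a peptide from its accurately measured mass and its elution time rather than from an MS/MS spectrum, can be sketched as a simple lookup. Everything specific here is an assumption made for illustration: the function name `match_amt`, the use of a normalized elution time between 0 and 1, the tolerance values and the tag entries are hypothetical and do not describe the actual PNNL software.

```python
def match_amt(observed_mass, observed_net, tags, ppm_tol=5.0, net_tol=0.05):
    """Return names of database tags consistent with one observed feature.

    tags is an iterable of (name, mass_in_Da, normalized_elution_time).
    A tag matches when the mass error is within ppm_tol parts per million
    and the elution time differs by no more than net_tol.
    """
    hits = []
    for name, mass, net in tags:
        ppm_error = abs(observed_mass - mass) / mass * 1e6
        if ppm_error <= ppm_tol and abs(observed_net - net) <= net_tol:
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical tag database: (peptide label, monoisotopic mass, elution time).
    tag_db = [
        ("peptide_A", 1000.000, 0.30),
        ("peptide_B", 1000.020, 0.30),  # about 20 ppm away from peptide_A
        ("peptide_C", 1000.001, 0.80),  # same mass range, elutes much later
    ]
    print(match_amt(1000.001, 0.31, tag_db))
```

The sketch shows why both dimensions matter: peptide_B is rejected on mass alone, while peptide_C has a nearly identical mass and is separated only by its elution time.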

The development of this new technology is proceeding in concert with its applications to a number of microbial systems (initially Shewanella oneidensis MR1, Deinococcus radiodurans R1, and Rhodopseudomonas palustris) in collaboration with leading experts on each organism. This research is providing the first comprehensive information on the nature of expressed proteins by these systems and how they respond to mutations in the organism or perturbations to its environment. Initial studies applying these approaches have demonstrated the capability for automated high-confidence protein identifications, broad and unbiased proteome coverage, and the capability for exploiting stable-isotope (e.g. ¹⁵N) labeling methods to obtain high precision relative protein abundance measurements from microbial cultures. These initial efforts have demonstrated the most complete protein coverage yet obtained for a number of microorganisms, and have begun revealing new biological understandings.

Finally, it is projected that the AMT tag approach can also be extended to the characterization of the proteomes of much more complex microbial communities.

This research is supported by the Office of Biological and Environmental Research of the U.S. Department of Energy. Pacific Northwest National Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute through Contract No. DE-AC06-76RLO 1830.



38

An Integrated Approach to Functional Annotation of Mammalian Genomic Sequence

Lisa Stubbs[2] (stubbs5@llnl.gov), Xiaochen Lu[1], Joomyeong Kim[2], Aaron Hamilton[1], Nagarajan Lakshmanan[1], Sha Hammond[1], Eddie Wehri[1], Matt Groza[1], Thomas Gulham[1], Mary Tran[2], Tim Harsch[1], Laurie Gordon[2], and Art Kobayashi[2]

[1]Genome Biology Division, Lawrence Livermore National Laboratory and [2]D.O.E. Joint Genome Institute, 7000 East Avenue, L-441, Livermore CA 94550

With the availability of finished human sequence, high quality draft sequence from mouse, rat, and Fugu, and the genomes of vertebrates from other evolutionary branches on the way, we are now in possession of powerful tools for a full functional description of all the genes, regulatory sequences and other functional elements in the human genome. Comparative alignment with sequences from divergent vertebrate genomes has proved to be a powerful tool for distilling out that small fraction of the human genome with critical, evolutionarily conserved functions. The differences between related genomes can also be very revealing, especially when they can be linked to species-specific aspects of biology. We are focused on both the conservation and change of protein-coding genes and the regulatory networks that control their transcription in vertebrate evolution.

Effectively mining conserved elements from alignments of multiple, complex genomes is itself a daunting task, but one for which excellent computational tools have been developed in recent years. Once similarities and differences have been distilled from complex genomes computationally, however, the task of confirming predictions about the functions of these sequences remains a significant experimental challenge. To meet that challenge, we have begun to assemble a suite of experimental tools to test functional predictions regarding conserved human sequences in a high-throughput manner. We have focused on testing the validity of predicted genes and regulatory elements in human chromosome 19 (HSA19), an especially gene-rich chromosome that has recently been finished by JGI teams (J. Grimwood et al., in preparation). To add extra depth to HSA19 genome comparisons, we are generating sequence from related regions of the chicken genome, which because of its position in the evolutionary tree permits particularly informative comparisons.

We are integrating verification of predicted gene structures and the functional testing of candidate regulatory sequences in cell culture with high-throughput methods for determining gene expression in sectioned mouse and human tissues. Our goal is to produce a fully annotated version of HSA19 sequence with all transcription units, promoters and enhancers verified experimentally, with novel genes archived as full-length sequences in expression vectors, with cell-type specific expression patterns determined in both human and mouse, and with lineage-specific conservation and change in coding and non-coding elements fully documented for future study. This project is integrated with related studies ongoing in the laboratory of Barbara Wold, California Institute of Technology, involving a major effort to integrate HSA19 genes into global regulatory networks using microarray expression technology, to develop tools for automated analysis of in situ images, and to test regulatory elements in vivo using high-throughput transgenic methods. Although these studies are focused specifically on the 60 Mb and 1400 genes of HSA19, the tools we are developing should be extrapolated easily to functional annotation of any complex genome. [Related abstracts including details of specific aspects of this project and closely integrated programs will also be presented; see abstracts by L. Gordon et al.; S. Hammond, N. Lakshmanan et al.; A. Hamilton et al.; and J. Kim et al.].

This work was performed under the auspices of the U. S. Department of Energy, Office of Biological and Environmental Research by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.



39

Comparative in silico Proteomes of Two Brucella Species

Mary Ann Wagner (wagnerm2@uofs.edu), Michel Eschenbrenner, Frank Estock, Cesar V. Mujer, and Vito G. DelVecchio

University of Scranton, Institute of Molecular Biology and Medicine, Scranton, PA 18510

Recent publications announcing completion of both the Brucella melitensis 16M and B. suis 1330 genomes provided invaluable information regarding the genomic potential of these important pathogens. As two whole genomes have been sequenced and annotated, researchers are now able to perform in silico comparative genomic analysis of these two organisms. Various in-house and web-based software packages were used to compile theoretical values of physical properties for all proteins of each organism, as determined by the primary annotations for each genome. These properties include pI, Mr, presence of secretory signal sequences, presence of transmembrane domains and grand average of hydropathy (GRAVY) scores. Theoretical parameters were evaluated with respect to species and chromosome of origin. A related member of the alpha proteobacteria, Agrobacterium tumefaciens C58, was subjected to the same analyses. Chromosome II-encoded proteins of both B. melitensis and B. suis were found to have a greater average GRAVY score than those of chromosome I, indicating that chromosome II-encoded proteins are generally more hydrophobic than those of chromosome I. Thus, proteins arising from the two different chromosomes have different overall physical characteristics, and monochromosomic origin of both replicons is therefore not favored. In addition to genomic characteristics, such as GC content, it is suggested that overall characteristics of potential proteins may also provide a means to evaluate possible horizontal gene transfer events of large, protein-coding segments of DNA.
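The GRAVY score mentioned above is the sum of the Kyte-Doolittle hydropathy values of all residues in a protein divided by its length, with more positive values indicating a more hydrophobic protein. The sketch below is a minimal illustration of that calculation and of a per-chromosome average; the function names and the short example sequences are hypothetical, and the abstract does not state which software packages produced the reported values.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy for one protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    try:
        total = sum(KYTE_DOOLITTLE[residue] for residue in sequence)
    except KeyError as exc:
        raise ValueError(f"non-standard residue: {exc.args[0]}") from None
    return total / len(sequence)


def mean_gravy(sequences):
    """Average GRAVY score over a collection of protein sequences."""
    scores = [gravy(seq) for seq in sequences]
    if not scores:
        raise ValueError("no sequences supplied")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented peptide fragments standing in for two sets of encoded proteins.
    set_one = ["MKRDEQNS", "MSTDKEGR"]
    set_two = ["MLIVAFLG", "MVLAIKFC"]
    print(round(mean_gravy(set_one), 3), round(mean_gravy(set_two), 3))
```

Comparing `mean_gravy` for the proteins encoded on each chromosome is the kind of comparison the text describes for chromosomes I and II.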



40

Signatures for the Detection, Identification and Characterization of Microbial Pathogens

P. Scott White (scott_white@lanl.gov), Lance Green, Murray Wolinsky, Tom Brettin, David Torney, and John Nolan

Los Alamos National Laboratory

The need for robust nucleic acid-based signatures has intensified with the recent focus on the development of tools for biothreat reduction. The availability of whole genome sequence data from pathogens and their neighbors makes it possible to develop signatures using a comparative genomics approach with appropriate levels of resolution and power of exclusion for each typing application.

Single nucleotide polymorphisms, or SNPs, are an abundant source of variation that can be used as signatures, and are amenable to a wide variety of scoring methods and platforms. Furthermore, the use of DNA sequence variation as signatures allows for technology-independent scoring and databases.

We will describe a DNA signature design pipeline, currently under development, that makes use of whole genome sequence data. Using comparative genomics tools, targets for candidate signatures are determined, sequence data from the appropriate samples are collected, and phylogenetic analyses then distill the signature to a highly informative subset of the total genetic variation discovered.
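The idea of reducing discovered variation to an informative subset can be sketched as follows. This is not the authors' pipeline: the phylogenetic analysis is replaced here by a simple greedy selection that keeps SNP positions until every pair of strains is distinguished. The input is assumed to be a pre-computed alignment of equal-length sequences, and all names are hypothetical.

```python
from itertools import combinations


def snp_positions(alignment):
    """Return alignment columns where unambiguous bases differ among strains.

    alignment: dict mapping strain name -> aligned sequence (equal lengths).
    Columns containing gaps or ambiguity codes are skipped.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")
    positions = []
    for i in range(length):
        column = {alignment[name][i].upper() for name in names}
        if column <= set("ACGT") and len(column) > 1:
            positions.append(i)
    return positions


def informative_subset(alignment, positions):
    """Greedily choose SNPs that separate the most unresolved strain pairs.

    Returns (chosen_positions, unresolved_pairs). A greedy stand-in for the
    phylogenetic distillation step, used for illustration only.
    """
    unresolved = set(combinations(sorted(alignment), 2))
    remaining = list(positions)
    chosen = []

    def separated(pos):
        return {(a, b) for a, b in unresolved
                if alignment[a][pos] != alignment[b][pos]}

    while unresolved and remaining:
        best = max(remaining, key=lambda p: len(separated(p)))
        gained = separated(best)
        if not gained:
            break  # no remaining SNP distinguishes any unresolved pair
        chosen.append(best)
        unresolved -= gained
        remaining.remove(best)
    return chosen, unresolved
```

Any strain pairs left in the second return value cannot be told apart by the SNPs supplied, which would signal that more sequence data are needed for that typing application.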

We will also describe a high throughput SNP scoring capability that we have recently developed. The method combines robust SNP scoring assays with a flow cytometry platform (i.e., no electrophoresis), and provides rapid scoring of numerous SNPs simultaneously (via multiplexing), with very high serial throughput rates. We will show examples of signature and assay design and implementation using Bacillus anthracis and influenza virus sequences.

By combining carefully designed DNA/RNA sequence-based signatures with rapid typing, it is possible to address many of the current and future surveillance, forensic, and clinical diagnostic needs.



41

Oligonucleotide-Directed Single Base DNA Alterations in Mouse Embryonic Stem Cells

Kyonggeun Yoon[1] (kyonggeun.yoon@mail.tju.edu), O. Igoucheva[1], V. Alexeev[1], and E. A. Pierce[2]

[1]Department of Dermatology and Cutaneous Biology, Jefferson Medical College; and [2]F.M. Kirby Center for Molecular Ophthalmology, University of Pennsylvania School of Medicine

We have investigated the use of single-stranded oligodeoxynucleotides (ODN) to introduce specific single-base alterations into endogenous genes in mouse ES cells. The primary advantage of this approach is the ability to introduce a specific base change into a gene of interest in a single step. We have recently demonstrated that ODN can be used to introduce targeted single base changes into the genomic DNA of mouse ES cells at a frequency of approximately 0.01%. If oligonucleotides are to be used for gene targeting, how can the approach be made more practical? Low rates of homologous recombination, on the order of 10⁻⁵, were overcome by the ingenious use of selectable markers in gene targeting vectors. However, it has been difficult to devise a general selection strategy, because the positive and negative selections used in gene targeting vectors cannot be incorporated into ODN. We hypothesized that cells competent in ODN-mediated alteration of one gene might also be competent in alteration of another gene. Based on this concept, we developed a selection strategy to identify cells that have undergone a gene modification by the use of two ODNs, one targeting a gene of interest and the other targeting a defective selectable marker gene that manifests a phenotypic change upon gene alteration. Our results indicate that if two oligonucleotides are present within the nucleus of a “repair-competent” cell, then dual targeting events could possibly occur with a relatively high frequency. Thus, the absolute frequency remains at the same level, but the probability of finding cells with the desired gene alteration is increased by first selecting cells according to the phenotypic change. Such selected ES cells could in turn be used to create accurate mouse models of inherited diseases.
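The reasoning behind the dual-ODN selection can be made concrete with a small calculation. The model below is an illustrative assumption, not the authors' analysis: only a fraction of cells is taken to be "repair-competent", and within those cells each ODN is assumed to alter its target independently with the same probability. The overall frequency of about 0.01% comes from the abstract; the competent fraction is a hypothetical parameter.

```python
def coselection_enrichment(overall_frequency, competent_fraction):
    """Estimate the benefit of selecting on a marker alteration first.

    overall_frequency:  fraction of all cells carrying the desired alteration
                        (about 1e-4 in the abstract).
    competent_fraction: assumed fraction of repair-competent cells
                        (hypothetical; not reported in the abstract).

    Returns (frequency_among_selected, fold_enrichment).
    """
    if not 0 < competent_fraction <= 1:
        raise ValueError("competent_fraction must be in (0, 1]")
    if not 0 <= overall_frequency <= competent_fraction:
        raise ValueError("overall_frequency cannot exceed competent_fraction")
    # Marker-positive cells are all competent under this model, so the
    # chance that the second gene is also altered is the per-competent-cell
    # rate rather than the population-wide rate.
    frequency_among_selected = overall_frequency / competent_fraction
    fold_enrichment = 1.0 / competent_fraction
    return frequency_among_selected, fold_enrichment
```

Under this model the absolute number of correctly altered cells is unchanged; selection only shrinks the pool that must be screened, and the gain grows as the assumed competent fraction becomes smaller.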



42

Microarray-Based Functional Analysis of the Radiation-Resistant Bacterium, Deinococcus radiodurans

Jizhong Zhou[1] (zhouj@ornl.gov), Yongqing Liu[1], Dorothea Thompson[1], and Michael Daly[2]

[1]Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831; and [2]Department of Pathology, Uniformed Services University of the Health Sciences, Bethesda, MD 20814

Deinococcus radiodurans (DEIRA) is a bacterium best known for its extreme resistance to the lethal effects of ionizing radiation, but the molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery using DNA microarrays covering ~94% of its predicted genes. At least at one time point during DEIRA recovery, 832 genes (28% of the total predicted genes) were induced and 451 genes (15%) were repressed two-fold or greater. The expression patterns of the majority of the induced genes resemble the previously characterized expression profile of recA following irradiation. DEIRA recA, which is central to genomic restoration following irradiation, is substantially up-regulated upon DNA damage (early phase) and down-regulated before the onset of exponential growth (late phase). Many other genes were expressed later in recovery, displaying a growth-related pattern of induction. Genes induced during the early phase of recovery included those involved in DNA replication, repair, recombination, cell wall metabolism, cellular transport, and many encoding uncharacterized proteins. Most striking was the observation that metabolic functions, in particular, appear to play crucial roles in DEIRA’s recovery from acute radiation. Collectively, the microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network that involves the regulation of multiple cellular functions. Components of this network include a predicted novel ATP-dependent DNA ligase, which appears to functionally replace the repressed NAD-dependent DNA ligase, and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.
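The two-fold criterion applied across the recovery time course can be sketched as below. The input layout (one irradiated-to-control expression ratio per time point for each gene) and the function name are assumptions made for illustration; the abstract does not describe the normalization or statistics actually used.

```python
def classify_genes(ratios, threshold=2.0):
    """Flag genes induced or repressed at one or more time points.

    ratios: dict mapping gene name -> list of expression ratios
            (irradiated / control), one per recovery time point.
    A gene is induced if any ratio >= threshold and repressed if any
    ratio <= 1 / threshold. A gene can appear in both sets if it rises
    at one time point and falls at another.
    """
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    induced, repressed = set(), set()
    for gene, values in ratios.items():
        if any(v >= threshold for v in values):
            induced.add(gene)
        if any(0 < v <= 1.0 / threshold for v in values):
            repressed.add(gene)
    return induced, repressed
```

Counting the members of each set and dividing by the number of predicted genes gives percentages of the kind reported above.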
