Function and cDNA Resources Abstracts

DOE Human Genome Program
Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM



95. The I.M.A.G.E. Consortium: Progress Toward a Complete Set of Human Genes

Christa Prange, Peg Folta, Tim Harsch, Genevieve Johnson, Tom Kuczmarksi, Bernadette Lato, Leeanne Mila, David Nelson, and Anthony Carrano

Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, CA 94550

prange1@llnl.gov

The I.M.A.G.E. Consortium is the largest publicly available collection of cDNAs, containing approximately three million clones. cDNAs are currently derived from five different species, with an emphasis on sampling from both normal and abnormal human and mouse tissue types at a variety of developmental stages. As a collaborative effort between the National Cancer Institute (NCI) and various academic groups, the Cancer Genome Anatomy Project employs EST sequence data to characterize normal, pre-cancerous, and cancerous cell types. We have also recently begun arraying cDNAs from full-length enriched libraries as part of the Mammalian Gene Collection, a collaborative effort between the I.M.A.G.E. Consortium, the National Institutes of Health, the National Center for Biotechnology Information, and many academic groups.

Another goal of the I.M.A.G.E. Consortium is to provide web-based software to aid in the analysis of clones derived from I.M.A.G.E. libraries. Re-arrayed sets of clones representing specific target genes will be chosen based on this clustering analysis, and made available for use by the community.

Further information about the I.M.A.G.E. Consortium is available by email (info@image.llnl.gov) or through the WWW.

This work was performed under the auspices of the U.S. Department of Energy, Office of Health and Environmental Research (OHER) by Lawrence Livermore National Laboratory under contract no. W-7405-Eng-48.


96. Analysis of Uncharacterized Human cDNAs which Encode Large Proteins in Brain

M. Oishi, T. Nagase, R. Kikuno, M. Hirosawa, and O. Ohara

Department of Human Gene Research, Kazusa DNA Research Institute, Kisarazu, Chiba, Japan

The aim of the Kazusa cDNA project, which was initiated five years ago, is to accumulate sequence and other information on unidentified full-length human cDNA clones that encode large proteins in brain. Our ultimate goal is to characterize brain functions on a molecular basis and to identify genes responsible for serious brain disorders such as schizophrenia and bipolar disorder. From human brain cDNA libraries, whose contents are more diverse than those of other tissues, we have been focusing on large cDNA clones (more than 4 kb in size), mainly because (1) genes known to be responsible for diseases tend to be large and (2) large cDNAs have, for various reasons, been left out of worldwide cDNA characterization efforts. After initial screening for clones that produce large proteins in vitro, selected clones are subjected to complete sequencing, analysis of expression patterns among major tissues and brain subregions, and determination of their chromosomal locations. To date, characterization of more than 1200 cDNA clones has been completed, and the information is accessible through the database of Human Unidentified Gene-Encoded large proteins (the HUGE protein database).


97. Novel Approaches to Facilitate Gene Discovery and the Development of a Non-Redundant Arrayed Collection of Full-Length cDNAs

Sergey Malchenko1, Brian Berger1, Vera Da Costa Soares1, Maria De Fatima Bonaldo1, and Marcelo Bento Soares1,2

Departments of 1Pediatrics and 2Physiology and Biophysics, The University of Iowa, Iowa City, IA 52242

bento-soares@uiowa.edu

(A) Gene Discovery. Serial subtraction of normalized cDNA libraries has proven a powerful way to expedite gene discovery in large-scale EST programs. This strategy enabled us to generate large non-redundant collections of rat (41,000), mouse brain (22,000), and human (15,700) cDNAs within a two-year time frame, with minimal sequencing effort. This process, however, can only facilitate the identification of mRNAs that are represented in starting libraries. Since the number of primary recombinants needed to guarantee representation of rare mRNAs exceeds the number typically attained in standard libraries, it is anticipated that a fraction of such transcripts will not be represented. Furthermore, mRNAs whose expression is limited to a small number of cells within a tissue may also not be appropriately represented in a bulk tissue library regardless of their level of expression. To address this problem, we developed a method aimed at the cloning of mRNAs that are either under-represented or not represented in standard normalized libraries. We have applied this procedure to construct a mouse hippocampus cDNA library significantly enriched for rare mRNAs. Enrichment was documented by analysis of over 1,000 ESTs as well as by Southern hybridization of library DNA with a number of cDNA probes that were under-represented in the non-normalized or normalized mouse hippocampus libraries.

(B) Development of non-redundant collections of full-length cDNAs. Full-length-enriched libraries have been and continue to be constructed and made available to the Mammalian Gene Collection program, a trans-NIH initiative to generate high accuracy sequence of large numbers of full-length cDNAs. However, given that enrichments are typically of the order of 50%, some screening strategy is necessary for en masse selection of the full-length clones in these libraries. We are developing novel methods and strategies to address this problem with the goal of generating comprehensive non-redundant collections of arrayed full-length cDNAs.


98. From EST to High Quality cDNA: The BDGP Pipeline for the Construction of Drosophila cDNA Resources

Mark Stapleton1, Damon Harvey, Peter Brokstein, and Gerald M. Rubin

Berkeley Drosophila Genome Project-University of California, Berkeley, CA and 1Lawrence Berkeley National Laboratory, Berkeley, CA

staple@bdgp.lbl.gov

The Drosophila Genome Center's future goals are centered around functional genomics. By taking advantage of the Drosophila genomic sequence, we intend to develop tools and technologies for answering biological questions in a high-throughput environment. Our first step in this direction is to create a publicly available unigene set of Drosophila cDNAs and sequence them to high quality.

We have finished the second stage of creating a set of Drosophila cDNAs. The first stage consisted of sequencing greater than 80,000 5' ESTs and was finished March 19, 1999. For the second stage, these ESTs were then clustered based on their 5' ends to reduce redundancy, which resulted in a set of 12,198 clusters. The clone extending most 5' in each cluster has been selected and rearrayed. We have sequenced the 5' and 3' ends of these clones to verify their identity and to further collapse the set on the basis of their 3' identities. We also determined the length of the cDNA insert in each clone so that optimal sequencing strategies can be applied to specific size ranges. Finally, we have performed pilot experiments for full-length sequencing utilizing transposon-based methods that resulted in the completion of 283 high quality cDNAs.
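The selection step described above (one clone per cluster, chosen because it extends furthest 5') can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout, the clone and cluster identifiers, and the use of a single aligned 5' coordinate per EST are hypothetical and are not taken from the BDGP pipeline itself.

```python
def pick_most_5prime(ests):
    """Pick, for each cluster, the clone whose EST starts furthest 5'.

    `ests` is an iterable of (clone_id, cluster_id, start) tuples, where
    `start` is a hypothetical aligned 5' coordinate (smaller = further 5').
    Ties keep the first clone encountered.
    """
    best = {}
    for clone_id, cluster_id, start in ests:
        current = best.get(cluster_id)
        if current is None or start < current[1]:
            best[cluster_id] = (clone_id, start)
    return {cluster: clone for cluster, (clone, _) in best.items()}


# Toy input with made-up identifiers.
ests = [
    ("cloneA", "cluster1", 120),
    ("cloneB", "cluster1", 35),
    ("cloneC", "cluster2", 10),
    ("cloneD", "cluster2", 10),
]
selected = pick_most_5prime(ests)
```

The selected clones would then be the ones rearrayed and end-sequenced; collapsing the set further by 3' identity would be a second pass of the same kind.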


99. The RIKEN Mouse Full-Length cDNA Encyclopedia

Piero Carninci, Kazuhiro Shibata, Masayoshi Itoh, Hideaki Konno, Jun Kawai, Yuko Shibata, Yuichi Sugahara, T. Endo, Y. Ozawa, Yoshifumi Fukunishi, Atsushi Yoshiki, M. Kusakabe, Masami Muramatsu, Yasushi Okazaki, and Yoshihide Hayashizaki

Genome Science Laboratory, RIKEN, Tsukuba Life Science Center, 3-1-1 Koyadai, Japan

carninci@rtc.riken.go.jp

We report the ongoing efforts to prepare the mouse full-length cDNA Encyclopedia. As of the submission of this abstract, we have constructed more than 150 full-length cDNA libraries. Libraries were prepared with the CAP trapper full-length cDNA selection method coupled to the trehalose-thermoactivated reverse transcriptase, in order to clone long full-length cDNAs. Additionally, in most libraries, cDNAs were normalized and subtracted to isolate rarely expressed full-length cDNAs. To date, 690,130 successful sequencing reactions from 5'-end validated libraries have been clustered into 100,907 3' end sequences, which contain at least half of all mouse full-length cDNAs. We have also established technology and mRNA resources to collect the majority of the remaining mouse expressed sequences.

We will discuss library preparation and tissue selection strategy, the quality of the cDNA libraries in terms of complexity and full-length cDNA content, and the coverage of mouse genes by our clones.


100. Tissue Gene Expression Profiling Using RIKEN Full-Length Mouse 20K cDNA Microarray

Yasushi Okazaki1,2, Rika Miki1,2,3, Yosuke Mizuno1,2,3, Yasuhiro Tomaru1, Kouji Kadota1,2,4, Piero Carninci1, Kazuhiro Shibata1,2, Masayoshi Itoh1,2, Yasuhiro Ozawa1, Jun Kawai1,2, Hideaki Konno1,2, Yoshifumi Fukunishi1,2, Toshinori Kusumi1, Hitoshi Goto1,5, Hiroyuki Nitanda1,5, Yohei Hamaguchi1,6, Itaru Nishiduka1,6, Masami Muramatsu1,2, Atsushi Yoshiki7, Moriaki Kusakabe7, Joseph DeRisi8, Vishy Iyer9, Michael Eisen9, Patrick O. Brown9, and Yoshihide Hayashizaki1,2,3

1Laboratory for Genome Exploration Research Project, Genomic Sciences Center (GSC) and Genome Science Laboratory, Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), Koyadai, Japan; 2CREST, Japan Science and Technology Corporation (JST); 3Tsukuba University; 4University of Tokyo; 5Tohoku University, Sendai, Japan; 6Yokohama City University; 7Experimental Animal Research Division, Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), Koyadai, Japan; 8University of California; and 9Stanford University, Stanford, CA

yosihide@rtc.riken.go.jp

The goal of the Genome Science Laboratory of RIKEN is to clone the largest possible number of full-length mouse cDNAs and to sequence them in two phases. The first phase is to classify the cDNAs, and the second is to complete full-length sequencing and functional annotation. We have developed two original methods to construct full-length cDNAs efficiently: the "cap-trapper," which preferentially recognizes the cap site of mRNA, and the "trehalose-thermoactivated reverse transcriptase (RT)," which allows the RT reaction to proceed at a higher temperature (60 °C). We have constructed over 80 libraries from embryonic tissues at different developmental stages and from adult tissues in order to ensure the greatest possible coverage of the expressed mRNA.

More than 200,000 successful sequencing passes have been performed using two tools developed in house: a high-throughput plasmid preparation system and the RISA 384 capillary sequencer. Most sequencing was performed from the 3' end in order to select individual cDNAs. We have selected more than 65,000 different cDNAs.

Using these sets of RIKEN full-length cDNAs, we have established gene expression microarrays containing a 20K set of unique mouse genes represented by RIKEN full-length cDNAs (http://genome.rtc.riken.go.jp). This set has been used to profile the expression patterns of various adult and embryonic tissues. Target DNAs were PCR-amplified and printed on poly-L-lysine-coated glass slides. Target DNAs were blocked with an excess of Cot-1 DNA. Probes were labeled with two-color fluorescent dyes using random primers and reverse transcriptase. Normalization was achieved using a global normalization method. We have also developed a program to filter out noise. Each experiment was performed twice, and the reproducible results were extracted and clustered. We will present a large data set showing the spatial and temporal expression patterns in mice. These mouse full-length 20K cDNA microarrays are widely applicable to global expression profiling of normal and diseased states in mice.
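The abstract does not say which global normalization or reproducibility criterion was used, so the following is only a minimal sketch of one common choice: centering the log2 two-color ratios of each array on their array-wide median, then keeping spots whose normalized values agree between the two replicate experiments. The function names, the toy intensities, and the 0.5 log2-unit tolerance are all assumptions made for illustration.

```python
import math
from statistics import median


def global_normalize(red, green):
    """Return log2(red/green) per spot, shifted so the array median is 0."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)
    return [x - shift for x in log_ratios]


def reproducible(rep1, rep2, tolerance=0.5):
    """Indices of spots whose normalized log ratios differ by at most
    `tolerance` (an assumed threshold) between two replicates."""
    return [i for i, (a, b) in enumerate(zip(rep1, rep2))
            if abs(a - b) <= tolerance]


# Toy intensities for five spots in two replicate hybridizations.
rep1 = global_normalize([200, 400, 800, 1600, 100], [100, 200, 400, 200, 100])
rep2 = global_normalize([100, 100, 100, 400, 400], [100, 100, 100, 100, 100])
kept = reproducible(rep1, rep2)
```

In this toy case the first array has an overall two-fold dye bias that the median shift removes, and the last spot is discarded because the replicates disagree. Only the spots in `kept` would go on to clustering.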


101. The Molecular Genetics of DNA Repair in Drosophila

K.C. Burtis, R.S. Hawley, C. Boulton, K. Hollis, A. Laurencon, and D. Milliken

Section of Molecular and Cellular Biology, University of California at Davis, Davis, CA 95616

shawley@netcom.com

Screening for new repair-deficient mutants:

To date, we have screened approximately 12,100 of 12,500 available lines of EMS-induced mutations on the 2nd and 3rd chromosomes from the Zuker collection. Thus far we have both identified and confirmed by retest approximately 60 lines that display significant sensitivity to one or more mutagens. We are currently assigning the newly isolated mutants to complementation groups as well as testing them for allelism with existing 2nd and 3rd chromosome mus mutations.

Characterization of existing mutagen-sensitive mutations:

We are in the process of carefully mapping the existing collection of mutagen-sensitive mutations. In several cases we have refined the map positions down to small genetic intervals and are testing P element insertions in the region for their ability to complement these mutations. In most cases, however, we are still in the process of positioning the mutant to within a numbered unit on the polytene chromosome. We have also continued our molecular and genetic characterization of two Drosophila ATM homologs. For one of these genes, mei-41, we have completed a synthetic lethal screen and begun a genetic fine structure analysis of existing mutant alleles.

Microarrays:

Glass slide microarrays have been produced that include over 10,000 Drosophila cDNAs. The cDNAs are derived from the Berkeley Drosophila Genome Project Unigene set, as well as from a set of testes-specific cDNAs generated by Dr. Brian Oliver at the NIDDK. We will report the results of our initial array experiments examining changes in gene expression resulting from exposure of Drosophila to various doses of ionizing radiation.

Genomics:

A comprehensive summary of Drosophila homologs of approximately 90 known DNA repair genes will be presented. This summary is based on analysis of the complete sequence of the Drosophila genome developed by Celera Genomics in collaboration with the Berkeley Drosophila Genome Project. Also integrated into this analysis is our current data regarding the association of these sequences with extant Drosophila mutagen-sensitive mutations.


102. The Tennessee Mouse Genome Consortium

D. K. Johnson1, D. R. Miller1, J. Snoddy1, B. A. Berven1, and E. M. Rinchik1,2

1Life Sciences Division, Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831-8077 and 2Department of Biochemistry, Cellular, and Molecular Biology, The University of Tennessee, Knoxville, TN 37996

k29@ornl.gov

In order to maximize our capabilities for screening mice for a wide variety of mutant phenotypes, we have joined with institutions across the state of Tennessee to form the Tennessee Mouse Genome Consortium (TMGC). Our goal is to combine and exploit the clinical and academic expertise resident at Oak Ridge National Laboratory, the University of Tennessee, Vanderbilt University, the University of Memphis, St. Jude's Children's Hospital, and Meharry Medical College to induce and analyze genetic mutations that alter development, behavior, biochemistry, and morphology in mice. Each institution has confirmed its commitment to the goals of the Consortium by signing a Memorandum of Cooperation, by agreeing to a set of scientific, administrative, and veterinary principles governing institutional interactions, and by providing start-up funding for investigators to develop analytical methods and tools that can contribute to TMGC research projects.

In addition to the broad-based screening supported by state-wide expertise in many fields, the factors that distinguish the TMGC are ORNL's unique history in mouse genetics/mutagenesis and in the design of genetic screens, as well as ORNL's strength in bioinformatics and computational biology. Our genetics strategy is designed to produce multiple mice that may express new recessive mutations in a visually-identifiable "test class", which permits multi-site screening, screening for innately variable phenotypes, and screening in an aged, test-class colony.

Pilot screens for mutations in the central nervous system have so far identified fifteen potential mutants in about 450 pedigrees screened from mutagenesis experiments targeting two regions of mouse chromosome 7 (see abstract by Rinchik, et al.). The infrastructure provided by the TMGC provides a basis for the pooling of our expertise in joint proposals to federal and non-federal sponsors for long-range support for this unified, large-scale effort to develop mouse models as community resources for human genetics research.


103. Designing Genetic Reagents to Facilitate the Mutagenesis and Functional Analysis of the Mouse Genome

Edward J. Michaud1,2, Qing G. von Arnim1,2, Carmen M. Foster1,4, Yun You1,2, Dabney K. Johnson1,2, and Eugene M. Rinchik1,2,3

1Mammalian Genetics and Development Section, Life Sciences Division, Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831-8077; 2University of Tennessee - Oak Ridge National Laboratory Graduate Program in Genome Science and Technology; 3Department of Biochemistry, Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996-0845; and 4Department of Pathology, College of Veterinary Medicine, University of Tennessee, Knoxville, TN 37901-1071

michaudej@bio.ornl.gov

Analysis of the molecular, cellular, and organismal consequences of induced and spontaneous mutations in mouse genes provides insight into the roles that genes play in human biological systems and disease. The complete DNA sequences of the human and mouse genomes will soon be available, and strategies are being developed to annotate the physical maps with gene-function maps. For many years, ORNL has used a phenotype-driven chromosome-region mutagenesis strategy in the mouse to map gene function in pre-selected segments of the genome. Currently, we are applying this strategy to approximately 8% of the mouse genome (see abstract by Rinchik et al.). In collaboration with the Joint Genome Institute (JGI), we are also conducting molecular, genomic, transcriptional, and DNA-sequence analyses of our mutagenized regions in order to integrate the genetic mutation maps with the transcript maps (see abstract by Johnson et al.).

This project forms the third component of the ORNL mutagenesis program: designing genetic reagents to facilitate regional-mutagenesis and functional-genomics analyses in additional portions of the genome. Our regional-mutagenesis strategy is based on having visibly marked (altered coat color, for example) chromosomal deletions or inversions in order to perform the mutagenesis and gene-function mapping in the most cost-effective, high-throughput, user-friendly, and error-free manner. However, the genetic reagents that facilitate these regional-mutagenesis screens are currently available for only a limited portion of the mouse genome. We are employing embryonic stem-cell strategies to design marked chromosomal alterations in large, gene-rich regions of mouse chromosomes that show conserved synteny with portions of the human genome being mapped and sequenced by the JGI. These reagents will facilitate additional mutagenesis screens in the mouse for the purpose of annotating human DNA sequence information with the whole-organism biological functions of genes. Our initial focus is on the proximal 23 cM of mouse Chromosome 7 (human 19q homology) and a 16 cM region of proximal mouse Chromosome 11 (human 5q homology). Efforts are also under way to complement the chromosome-region mutagenesis program by developing an integrated, systems-biological approach to analyzing complex multigenic traits in mice (see abstracts by Doktycz et al., and Snoddy et al.).


104. Mouse Genetics and Mutagenesis for Functional Genomics: Phenotype-Driven Regional Mutagenesis and Genomics at the Oak Ridge National Laboratory

E. M. Rinchik1,2, D. A. Carpenter1, E. J. Michaud1, Y. You1, P. R. Hunsicker1, L. B. Russell1, D. R. Miller1, M. L. Klebig2, and D. K. Johnson1

1Life Sciences Division, Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831-8077 and 2Department of Biochemistry, Cellular, and Molecular Biology, University of Tennessee, Knoxville, TN 37996

rinchikem@ornl.gov

A major goal of the mouse-genetics program at ORNL is to apply our experience in chemical germ-cell mutagenesis, mutation recovery and propagation, and broad-based phenotype screening to create a large, user-friendly mouse-mutation resource that can be used by the wider biological community for functional annotation of human DNA sequence. Our current overall program expands previous work that molecularly characterized regions of mouse Chromosome (Chr) 7 while also recovering N-ethyl-N-nitrosourea (ENU)-induced, recessive single-gene mutations. For example, in one screen of a ~5-cM Chr-7 region (human 11p and 15q homologies), simple phenotype-screening criteria ascertained 19 new mutations in 1218 gametes, and, recently, broadly based phenotype screening has yielded seven additional heritable mutations (including two subtle behavioral ones) in ~450 additional gametes, with another 13 subtle variants undergoing heritability testing. All mutations are being placed, by a simple set of genetic complementation crosses with overlapping deletions, into the rich DNA-sequence and expression map evolving for this region.
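As a rough worked example of the recovery rates implied by these figures, the per-gamete frequencies can be computed directly. The sketch below uses only the numbers quoted above; treating "~450" as exactly 450 gametes, and counting all 13 untested variants as heritable to get an upper bound, are assumptions made for illustration.

```python
def per_gamete_frequency(mutations, gametes):
    """Fraction of screened gametes that carried a recovered mutation."""
    return mutations / gametes


# Simple phenotype-screening criteria: 19 mutations in 1218 gametes.
simple = per_gamete_frequency(19, 1218)

# Broad-based screening: 7 confirmed heritable mutations in ~450 gametes
# (450 is taken as exact here, which is an assumption).
broad_confirmed = per_gamete_frequency(7, 450)

# Upper bound if all 13 variants still under test proved heritable.
broad_upper = per_gamete_frequency(7 + 13, 450)

print(f"simple screen:    1 mutation per {1 / simple:.0f} gametes")
print(f"broad, confirmed: 1 mutation per {1 / broad_confirmed:.0f} gametes")
print(f"broad, upper:     1 mutation per {1 / broad_upper:.1f} gametes")
```

On these numbers the confirmed yield of the broad-based screen is close to that of the earlier screen, and it would rise substantially if most of the variants under test prove heritable.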

Mutations within two additional regions [mid-Chr 7 (human 15q homology), and mid-to-distal Chr 15 (human 8q, 22q, and 12q homologies)] are being recovered using dominantly and recessively marked inversion chromosomes in three-generation screens, which allows easy detection and low-cost maintenance of chromosomally "pre-mapped" deleterious recessive mutations without any molecular genotyping. In parallel, deletions are being developed in embryonic stem cells for use as finer-mapping and gene-identification reagents. Our experimental design also provides for the generation of multiple mutant test-class mice of a singular genotype for comprehensive multi-site phenotype screening (e.g., across the Tennessee Mouse Genome Consortium) and for establishment of aging colonies to be screened for later-onset recessive phenotypes. It also provides a facile means for placing any mutation on a number of inbred genetic backgrounds to analyze modifier effects in genetic-network analyses. We estimate that approximately 8-10% of the genome will be covered by our screens in the near term, with even wider coverage possible as additional genetic reagents are created.


105. Defining Complex Genetic Pathways with Gene-Expression Microarrays

M. J. Doktycz1, B. H. Jones2, C. T. Culiat2, P. R. Hoyt1, B. W. Harker1, R. E. Barry4, D. D. Schmoyer3, S. Petrov3, E. M. Rinchik2,5, K. L. Beattie1, J. R. Snoddy3, and E. J. Michaud2

1Biochemistry and Biophysics Section, 2Mammalian Genetics and Development Section, and 3Computational Biosciences Section, Life Sciences Division, and 4Robotics and Process Systems Division, Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831 and 5Department of Biochemistry, Cellular, and Molecular Biology, University of Tennessee, Knoxville, TN 37996

okz@ornl.gov

A primary goal of functional genomics is to understand the molecular mechanisms underlying complex interactions among genetically controlled biochemical pathways and the effects of environmental exposures and aging. The complete DNA sequences of the human and mouse genomes will soon be available, including the sequences of the estimated 100,000 genes present in each of these mammals. Even now there are over 893,000 mouse expressed sequence tags (ESTs) present in databases. The availability of these EST reagents, combined with recent advances in analytical technologies and bioinformatics tools, is making a dramatic impact on our comprehension of complex genetic pathways. We are exploiting these EST reagents for determining the components of genetic pathways in a single organ system, the skin. Gene-expression profiles are being determined for anonymous skin ESTs, as well as ESTs from genes with known roles in skin development, differentiation, apoptosis, DNA repair, cancer, pigmentation, and skin and hair morphology. Gene expression is being examined during normal growth and differentiation processes, and compared to expression patterns elicited in response to genetic mutations or environmental exposures. To this end, we are combining three areas of expertise at ORNL (i.e., mouse molecular genetics, analytical technologies and instrumentation, and bioinformatics) to develop an integrated-systems approach for defining gene function in genetic networks. Custom instruments, combining reagent-jets with precision movement stages, have been developed for the high-throughput production of high-density microarrays. Automated procedures have been developed using commercial liquid handling systems for the preparation of tissue-specific cDNA probes, and for the parallel processing of 96 cell or tissue samples into fluorescently labeled cDNA targets for hybridization to microarrays.

Integration of these various instruments, tissue samples, cDNA clones, microarrays, and expression data will be accomplished with the aid of several inter-operating bioinformatics tools. Three bioinformation-system modules are being developed: (1) to track mice, tissues, and molecular samples; (2) to analyze the results of gene-expression arrays; and (3) to perform biologically meaningful reduction of data (e.g., by cluster analysis) and linking of the expression results to other databases containing structural and functional information (see abstract by Snoddy et al.). An important goal of this project is to make these data available to the scientific community through the web. These efforts complement the chromosome-region mutagenesis program at ORNL (see abstracts by Johnson et al., Michaud et al., and Rinchik et al.) by developing an integrated, systems-biological approach to analyzing complex multigenic traits in mice.


106. Genome-Wide Expression Analysis Proves that Distinct Sets of Genes Participate in Cardiac Hypertrophy and the Regression of Hypertrophy

Carl Friddle, James Bristow, Teiichiro Koga, and Edward M. Rubin

Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

EMRubin@lbl.gov

Cardiac hypertrophy is a significant risk factor for cardiac failure, affecting 15% of the adult population and 50% of those with hypertension. Single gene disorders account for a small fraction of these cases. Prior studies have identified a limited set of genes that play important roles in the onset, progression, and regression of cardiac hypertrophy. These studies have primarily focused on genes known to function in the heart. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40%, respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points during cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes and pathways known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.
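The gene counts above can be cross-checked with a short sketch. The stage labels and the `classify` helper are hypothetical, and the number of genes altered in both phases is not stated in the abstract; it is inferred here on the assumption that each of the 55 genes changed during induction, regression, or both.

```python
from collections import Counter


def classify(altered_in_induction, altered_in_regression):
    """Assign a gene to a stage category from two boolean flags."""
    if altered_in_induction and altered_in_regression:
        return "both"
    if altered_in_induction:
        return "induction_only"
    if altered_in_regression:
        return "regression_only"
    return "unchanged"


known, novel = 25, 30            # genes previously / not previously associated
total = known + novel            # 55 genes with reproducible changes
induction_only, regression_only = 32, 8
both = total - induction_only - regression_only  # inferred, not stated

# Rebuild per-gene flags from the counts and tally them by category.
flags = ([(True, False)] * induction_only
         + [(False, True)] * regression_only
         + [(True, True)] * both)
counts = Counter(classify(i, r) for i, r in flags)
```

Under that assumption, the remaining genes are those altered in both phases, which is the overlap between the induction and regression sets.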


107. Ribozyme Gene Vector Libraries Identify Putative Tumor Suppressor Genes

Qi-Xiang Li, Eric Marcusson, Joan Robbins, Mark Leavitt, Flossie Wong-Staal, and Jack R. Barber

Immusol, Inc., San Diego, CA 92121 and University of California, San Diego, CA

barber@immusol.com

We have developed a method for gene identification, based on analysis of cellular function, that allows the specific, directed identification and cloning of many genes. We have created viral vectors containing a highly complex library of ribozyme (Rz) genes that can be stably and efficiently introduced into mammalian cells. By utilizing methodologies that enable selection of cells that have undergone a phenotypic change as a result of a specific Rz, we can isolate Rzs that inactivate genes associated with that phenotype. Once specific Rzs are selected and verified, their target recognition binding sequences can be used as tags to identify and clone the corresponding target genes.

We have used this approach to identify three novel tumor suppressor genes. We will present evidence that validates the role of these genes in the process of malignant transformation. Furthermore, we have used RNA expression profiling to begin the functional dissection of the pathways that these genes are involved in.


108. New Vectors for TAR Cloning and Retrofitting of Mammalian Genes

Maxim Y. Koriabine, Gregory G. Solomon, Lois A. Annab, J. Carl Barrett, and Vladimir L. Larionov

Laboratory of Molecular Genetics and Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709

koriabi1@niehs.nih.gov

The recent development of the TAR (Transformation-Associated Recombination) cloning strategy for the selective isolation of specific regions and genes from complex genomes greatly advanced YAC cloning technology. Over the last two years the new technique has been successfully applied to the isolation of different genes and specific regions from human and mouse genomes. In this study we describe the construction of a second generation of TAR cloning vectors, pVC604 (HIS3-CEN6-pBR), pVC604-A (HIS3-CEN6-pBR-Alu) and pVC604-B (HIS3-CEN6-pBR-B1), for gene isolation from human and mouse genomes. The new vectors greatly simplify replacement of targeting sequences and subsequent physical analysis of the cloned material. In order to help mobilize the DNA inserts in YACs for a variety of studies, we have also designed a set of vectors that retrofit YACs with different mammalian selectable markers and permit their transfer into E. coli cells as circular YAC/BACs. The following vectors were constructed: BRV1-N [BAC-URA3-Neomycin phosphotransferase (Neo)], BRV2-H [BAC-URA3-Hygromycin phosphotransferase (Hyg)], BRV3-B [BAC-URA3-Blasticidin S deaminase (BSD)], BRV4-G [BAC-URA3-xanthine-guanine phosphoribosyl transferase (gpt)] and BRV5-C [BAC-URA3-Cytidine deaminase (codA)]. In this study we have shown that, using these vectors, YACs up to ~700 kb can be efficiently converted to YAC/BACs with mammalian selectable markers by in vivo recombination in yeast. We also show evidence that circular YAC/BACs of up to 300 kb can subsequently be transferred into E. coli cells by electroporation for further DNA isolation. The YAC retrofitting method is simple and opens the possibility of using the YACs generated by TAR cloning for structural and functional studies.


109. Defining the Minimal Length of Sequence Homology Required for Selective Gene Isolation by TAR Cloning

Vladimir Noskov, Maxim Koriabine, Greg Solomon, Natalay Kouprina, J. Carl Barrett, Lisa Stubbs1, and Vladimir Larionov

Laboratory of Molecular Genetics and Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709 and 1Human Genome Center, Lawrence Livermore National Laboratory, Livermore, CA 95616

noskov@niehs.nih.gov

Using the recently developed TAR cloning technique, it is possible to directly isolate specific chromosomal regions and genes from complex genomes as linear or circular YACs. Over the last two years the new technique has been successfully applied to the isolation of different genes and specific regions of human and mouse genomes. In this study we investigated the minimal length of sequence homology required for gene isolation by TAR cloning, using the Tg.AC transgene as a model. The Tg.AC transgene unit consists of a zeta-globin promoter fused to the v-Ha-ras structural gene with a terminal simian virus 40 (SV40) polyadenylation signal sequence. We constructed a set of radial TAR cloning vectors containing the B1 repeat and SV40-specific hooks of different sizes (from 800 bp to 20 bp). With a vector containing an 800 bp hook, cloning of Tg.AC transgene sequences from the mouse genome was highly specific: one in fifty yeast transformants contained a YAC with the Tg.AC transgene. The same yield of positive clones was observed when the length of homology was reduced to 60 bp. Therefore the minimal length of a unique sequence required for gene isolation is only 2 times larger than the minimal size of homology required for spontaneous mitotic recombination in yeast. This observation greatly facilitates selection of hooks for isolation of specific regions as well as construction of TAR vectors, because the hooks can be synthesized as oligonucleotides instead of being isolated as genomic fragments.


110. Contamination of BAC Clones by E. coli IS186 Insertion Elements

Owatha L. Tatum, Andrew W. Womack, Mark O. Mundt, and Norman A. Doggett

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

doggett@lanl.gov

The E. coli insertion element IS186 is a 1343 bp transposable element which is present at three to four copies in the E. coli genome. The transposon is flanked by a 23 bp inverted repeat and has been shown to insert preferentially into GC-rich targets. We have discovered 8 BAC and P1 clones from several widely used human genomic libraries which have a single copy of this insertion element included in the finished GenBank submission. These clones were sequenced at six different sequencing centers and in no case was the insertion element annotated as being derived from E. coli. The average G+C content of a 100 bp window on either side of the insertion site in all clones is very high (75.8%), and the insertion sites appear to lie within CpG islands. In two of the 8 cases the insertion site is flanked by G+C-rich SVA repeat elements (a retroviral LTR class of repeat). Earlier studies of IS186 insertions into plasmids have shown that target duplications of 8 to 12 bp occur at the insertion site. We looked for evidence of target duplication in a BAC clone containing this insertion by comparison with the finished sequence of a cosmid clone which overlapped the insertion site. This proved that the insertion of IS186 caused a 10 base pair duplication of the human sequence surrounding the insertion site. Ten base pair duplications surrounding the insertion site were also found in the finished sequence of all other clones. To determine whether IS186 insertions occurred during library construction or propagation, we performed PCR and sequencing experiments on several isolates of each clone. We found that RPCI-11 BAC clones sent directly from the Roswell Park resource did not contain insertion elements, providing strong evidence that IS186 insertions most likely occur during subsequent propagation of BACs. We estimate the frequency of this insertion in finished clones to be approximately 1 in 1000, but the actual frequency could be much higher if this element has been removed from some finished sequences prior to submission.
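
The two sequence checks described in this abstract, the G+C content of a window flanking the insertion and the search for a short direct repeat on either side of the element, can be sketched in Python. This is a minimal illustration and not the authors' analysis pipeline: the function names, the coordinate convention, and the toy sequences (which are neither IS186 nor human sequence) are assumptions made for the example.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flanking_gc(clone, ins_start, ins_end, window=100):
    """G+C fraction of `window` bases on either side of an inserted element.

    ins_start and ins_end are 0-based, end-exclusive coordinates of the
    element within the clone sequence (a hypothetical convention).
    """
    left = clone[max(0, ins_start - window):ins_start]
    right = clone[ins_end:ins_end + window]
    return gc_fraction(left + right)


def target_duplication(clone, ins_start, ins_end, max_len=12):
    """Longest direct repeat (up to max_len bases) immediately flanking the element.

    Compares the bases just before the element with the bases just after it
    and returns the repeat, or an empty string if none is found.
    """
    for k in range(max_len, 0, -1):
        left = clone[max(0, ins_start - k):ins_start]
        right = clone[ins_end:ins_end + k]
        if len(left) == k and left == right:
            return left
    return ""


# Toy data only: a made-up 10 bp target site duplicated around a made-up element.
tsd = "GCGGCCGGCG"
element = "ATTATTATTA"
clone = "CCGG" + tsd + element + tsd + "CCTT"
ins_start = len("CCGG") + len(tsd)
ins_end = ins_start + len(element)

print(target_duplication(clone, ins_start, ins_end))
print(flanking_gc(clone, ins_start, ins_end, window=10))
```

In practice the element coordinates would come from aligning the clone against an IS186 reference sequence, which this sketch leaves out.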

Supported by the US DOE.


111. Developing General Methods to Select Phage Antibodies Against Gene Products

Peter Pavlik1, Robert Siegal1, Daniele Sblattero2, Vittorio Verzillo1,2, Roberto Marzari3, Jianlong Lou2, Jim Marks4, and Andrew Bradbury1,2

1Los Alamos National Laboratory, Los Alamos, NM 87545; 2SISSA, Trieste, Italy; 3University of Trieste, Trieste, Italy; and 4University of California San Francisco, San Francisco, CA

amb@lanl.gov

Phage display offers the possibility of selecting polypeptides (and the genes which encode them) from libraries of 10^10 or more different polypeptides on the basis of their abilities to bind target proteins and subdomains. This diversity far surpasses the estimated number of total genes in the human genome. The application of this technology to the Human Genome Project will powerfully accomplish a central goal: the derivation of ligands that recognize protein products of all human genes, such ligands being either antibodies, or protein fragments.

Where the recognition ligands derived from this relatively new technology are antibody binding regions (single chain Fv) they can be employed in the same way as traditional antibodies. As such, they can play essential roles in assigning gene function, including the characterization of spatiotemporal patterns of protein expression and the elucidation of protein-protein interactions. Where the recognition ligands are protein fragments, they can be considered to be potential protein-interaction partners for the immobilized polypeptide and so a starting point for further biochemical studies.

This project has concentrated on trying to find a general way to isolate antibodies against gene products, preferably starting from gene sequence and using peptides to avoid the need for cloning and expression, although high throughput methods to select against recombinant products have also been developed.

Selection of antibodies against recombinant proteins has been reduced to the microtitre format. A comparison of the antibodies selected using this protocol with the standard selection procedure shows that the antibodies selected are on the whole different, although there is some overlap. Should gene products be available this is a very efficient way to select antibodies in a high throughput format.

In addition, selection on peptide surrogates of gene products has also been attempted. 192 scanning peptides corresponding to overlapping parts of four different proteins have been synthesised on microtitre pins and used to select phage antibodies. Some of the selected antibodies are able to recognise the full length protein. An analysis of the peptides which select antibodies recognising the full length protein has allowed us to develop an algorithm to predict which peptides are more likely to select useful antibodies.
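
As an illustration of what a set of overlapping scanning peptides looks like, the sketch below tiles a protein sequence with fixed-length windows. The peptide length, the offset between peptides, the function name, and the toy sequence are all assumptions; the abstract does not state the values used for the 192 peptides.

```python
def scanning_peptides(protein, length=15, step=5):
    """Return overlapping peptides of `length` residues, one every `step` residues.

    A final peptide anchored at the C-terminus is appended when the regular
    grid of start positions does not reach the end of the protein.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(protein) <= length:
        return [protein]
    starts = list(range(0, len(protein) - length + 1, step))
    last = len(protein) - length
    if starts[-1] != last:
        starts.append(last)
    return [protein[s:s + length] for s in starts]


# Toy sequence: the 20 standard amino acids in alphabetical one-letter order.
for peptide in scanning_peptides("ACDEFGHIKLMNPQRSTVWY", length=8, step=4):
    print(peptide)
```

A smaller step gives more overlap between neighbouring peptides at the cost of more pins per protein.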


112. Search and Identification of Proteins that Bind Specifically to the Satellite DNAs

Ivan B. Lobov and Olga I. Podgornaya

Institute of Cytology RAS, 4 Tikhoretskii Ave., 194064 St. Petersburg, Russia

ivan_lobov@hotmail.com

In the nucleus, chromosomes and individual chromosome domains are arranged in a non-random fashion. This organization is cell type-specific and undergoes rearrangements under conditions that alter gene expression. Tandemly organized, transcriptionally silent non-coding sequences, satellite DNAs (satDNAs), are localized in gene-poor heterochromatic chromosome regions. In the interphase nucleus, these regions have a tendency to fuse, forming large chromocenters in a cell type-specific manner. Euchromatic regions can also associate with chromocenters, which causes gene silencing as a result of the transcriptional repression effect of heterochromatin.

Heterochromatin properties are mediated by proteins specifically associated with satDNAs that, however, are poorly characterized. To further our understanding of the role that satDNAs play in the genome, we undertook a search for proteins that bind specifically to satDNAs of mouse and human. Chromosomal DNA is anchored to the nuclear matrix (NM) or scaffold at specific sites called M/SARs (for Matrix or Scaffold Attachment Regions) and via large blocks of satDNAs. We used an electrophoretic mobility shift assay to reveal an NM DNA-binding protein specific for the mouse major satDNA. We have developed a reliable approach for mild non-denaturing extraction of NM proteins that are generally insoluble under physiological conditions. The main DNA-binding protein revealed in these experiments was identified as a mouse homologue of SAF-A, an M/SAR-binding protein. We have also found that in interphase nuclei SAF-A predominantly decorates and covers heterochromatic areas.

Using a Southwestern assay we have also identified four abundant DNA-binding proteins (p150, p120, p83 and p66) in nuclei and NM preparations. These proteins bound specifically to mouse major satDNA and to a fragment of alphoid satDNA from locus alpha21-II of human chromosome 21. p120 and p66 were identified as SAF-A and lamin B, respectively. p150 and p83 are apparently identical to SAF-B and ARBP, well-characterized M/SAR-binding proteins. Using an electrophoretic assay and computer modeling of DNA structure we have found that the proteins prefer intrinsically bent DNA fragments over straight ones. Thus, despite the lack of sequence homology, different satDNAs share structural features that might serve as a recognition signal for DNA-binding proteins of the NM.

Our data raise the possibility that different M/SAR-binding proteins can bind specifically to certain subsets of satDNAs of different species. The ability of NM proteins to recognize both M/SARs and satDNAs might serve as a general mechanism of gene association with heterochromatin.


113. Diversity in the Proteome: Homologous DNA Replicase Genes Use Alternatives of Transcriptional Slippage or Translational Frameshifting for Gene Expression

Norma M. Wills, Bente Larsen, Chad Nelson, John F. Atkins, and Raymond F. Gesteland

Department of Human Genetics, University of Utah, 15 N. 2030 East Room 7410, Salt Lake City, UT 84112-5330

nwills@genetics.utah.edu

A newly discovered contributor to the complexity of the proteome stems from generation of multiple RNAs from a single gene by transcriptional slippage. In the Thermus thermophilus dnaX gene, transcriptional slippage on a run of nine T residues results in a mixture of mRNAs differing in the number of A residues. Standard translation of a subpopulation of mRNAs yields the full-length tau protein while another subpopulation produces the shortened gamma protein. Transcriptional slippage was implicated by determining the masses of PCR products spanning the run of A residues using mass spectrometry. With genomic DNA as the PCR template, the predominant signals correspond to molecules containing nine A/Ts. The pattern is strikingly different using reverse-transcribed mRNA as template. There are multiple signals corresponding to molecules containing 8-18 A/Ts showing heterogeneity in the mRNA population transcribed from the single dnaX gene.

This method of dnaX gene expression in Thermus thermophilus differs markedly from dnaX expression in E. coli where two analogous proteins are produced from a single dnaX gene by ribosomal frameshifting. Standard translation of the homogeneous mRNA population produces the full-length tau protein. Approximately 50% of the time, ribosomes shift to the -1 reading frame at a specific sequence, A AAA AAG, stimulated by signals in the mRNA and produce the shortened gamma protein. It is surprising that two rather similar dnaX sequences lead to very different modes of expression in the two organisms. The global importance of these and other alternative mechanisms of gene expression will be revealed by proteome analysis now underway.
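
The layout of the shift site relative to the reading frame can be made concrete with a short sketch. It assumes the simplest picture of -1 frameshifting, in which the two tRNAs positioned on AAA AAG slip back one base so that decoding resumes with the last base of the heptamer. The function names are hypothetical, and the toy ORF is not the real dnaX sequence.

```python
SLIPPERY_HEPTAMER = "AAAAAAG"  # the A AAA AAG site named in the abstract


def find_frameshift_sites(orf, motif=SLIPPERY_HEPTAMER):
    """Return 0-based start positions where `motif` sits as A AAA AAG in frame 0.

    `orf` is read in frame 0 from its first base. A hit counts only when the
    motif starts on the third base of a codon, so that its last six bases
    form two complete frame-0 codons.
    """
    orf = orf.upper().replace("U", "T")
    hits = []
    pos = orf.find(motif)
    while pos != -1:
        if pos % 3 == 2:
            hits.append(pos)
        pos = orf.find(motif, pos + 1)
    return hits


def minus_one_codons(orf, site):
    """Codons read downstream of a heptamer starting at `site` after a -1 shift.

    After the shift the first six bases of the heptamer are decoded, and the
    next codon starts at its seventh base (index site + 6).
    """
    orf = orf.upper().replace("U", "T")
    start = site + 6
    return [orf[i:i + 3] for i in range(start, len(orf) - 2, 3)]


# Toy ORF (not real dnaX): ATG GCA GCA AAA AAG AGT GAC CC
toy_orf = "ATGGCAGCAAAAAAGAGTGACCC"
for site in find_frameshift_sites(toy_orf):
    print(site, minus_one_codons(toy_orf, site))
```

In this toy sequence the shifted frame runs into a TGA stop codon after one further codon, which illustrates how a frameshift can produce a shortened product from the same mRNA.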


114. The Transcriptional Program of Gametogenesis in Budding Yeast

Ira Herskowitz

Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143-0448

ira@cgl.ucsf.edu

Gametogenesis in yeast is the process whereby diploid cells of the a/alpha cell type undergo meiosis and form spores. Sporulation is initiated only when two conditions are met: cells are of the appropriate cell type (a/alpha), and cells receive the appropriate environmental stimulus (nutritional starvation). Under these conditions, a developmental program is initiated in which the following events occur: chromosomes are duplicated; then the homologous chromosomes align and recombine with each other. After successful alignment and recombination, the homologous chromosomes are separated from each other (the first meiotic division). Next, the sister chromatids are separated from each other (the second meiotic division). Finally, the separated sets of chromosomes (each a haploid set) are wrapped up in spores. The end result of sporulation is production of four haploid spores encased in a single sac.


148. Fluorescent-based Protein Kinase and Phosphatase Assays

Edward M. Davis and Wafaa Mahmoud

SymBiotech Incorporated, Wallingford, CT 06492

symbio@snet.net

With funding from a DOE SBIR Phase I grant, Number DE-FG02-99ER82901, SymBiotech is developing fluorometric-based assays for phosphorylated proteins and peptides to follow kinase and phosphatase activity. The assays employ a new reagent that eliminates the need to use radioisotopes or costly monoclonal antibodies and opens new opportunities for developing sensitive and safe diagnostic assays.

The fluorescence of pepsin (a monophosphorylated protein) was determined after treating it with different concentrations of the phospho-detecting reagent. Results showed that 3.6 nanomoles of phosphoprotein yields a maximum fluorescence of about 200 fluorescent units when treated with 1 mg/mL of reagent.

To follow CAM kinase activity, an HPLC-based assay was developed as the gold standard. A fluorescent-based kinase assay is now under development. Preliminary results show that the fluorescent reagent reacts with the CAM kinase-specific peptide substrate (Autocamtide-2, Alexis Biochemicals). Work now focuses on following CAM kinase activity by monitoring the degree of peptide phosphorylation using the fluorescent-based assay.


149. Probe design on genomic level for DNA oligo microarrays

Li, F. and Stormo, G.D.

Dept of Genetics, Washington University Medical School, St. Louis, MO 63110

stormo@genetics.wustl.edu

We have designed a program to predict optimal oligo-probes for each gene in an entire genome. The criteria used for optimality are maximizing the minimum number of mismatches to every other gene, finding an appropriate T_m for the correct gene match, and maximizing the reduction in T_m for the mismatched genes. We have used the program to predict a set of optimal probes for each of several model systems that have complete genome sequences and put them in a publicly available database: ural.wustl.edu/~lif/probe.pl

The program is also available from the authors for use on any other sequence sets.
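
The first two optimality criteria can be illustrated with a small sketch. It is not the authors' program: it scans fixed-length windows on one strand only, scores each candidate by its minimum Hamming distance to any window of the other genes, and estimates T_m with the Wallace rule (2 degrees per A/T, 4 degrees per G/C), a rough approximation for short oligos that is assumed here because the abstract does not say how T_m is computed. The third criterion, the reduction in T_m for mismatched genes, is left out. All names, parameters, and sequences are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def min_mismatches(probe, other):
    """Fewest mismatches between probe and any equal-length window of other."""
    k = len(probe)
    if len(other) < k:
        return k  # no full-length window exists; treat as fully mismatched
    return min(
        sum(a != b for a, b in zip(probe, other[i:i + k]))
        for i in range(len(other) - k + 1)
    )


def best_probe(target, others, k=8, tm_range=(20, 28)):
    """Pick the k-mer of target whose closest off-target window is farthest away.

    Returns (probe, min_mismatches_to_other_genes, tm), or None if no window
    of the target falls inside tm_range.
    """
    best = None
    for i in range(len(target) - k + 1):
        probe = target[i:i + k]
        tm = wallace_tm(probe)
        if not tm_range[0] <= tm <= tm_range[1]:
            continue
        score = min((min_mismatches(probe, o) for o in others), default=k)
        if best is None or score > best[1]:
            best = (probe, score, tm)
    return best


# Toy sequences only, far shorter than real genes or probes.
print(best_probe("AAAACCCCGGGG", ["AAAACCCCTTTT"], k=4, tm_range=(8, 16)))
```

A genome-scale version would also need to check the reverse complement strand and would use an indexed search rather than this exhaustive comparison.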


155. The Tree of Life: The Origin of Universal Scaling Laws in Biology from Molecules, Genes, and Cells to Whales

Geoffrey B. West1, James H. Brown2, Brian Enquist3, and William H. Woodruff4

Los Alamos National Laboratory and Santa Fe Institute1, University of New Mexico and Santa Fe Institute2, University of California, Santa Barbara3, and Los Alamos National Laboratory4

gbw@lanl.gov

Although life is the most complex system known, many of its attributes satisfy remarkably simple universal scaling laws. For example, metabolic rate scales as mass to the 3/4 power, ranging from the largest organisms (whales and sequoias) to the very smallest unicellular microbes, even extending down through mitochondria to the molecular level of the respiratory complex - an astounding 27 orders of magnitude. Many other such allometric scaling laws are known which relate lengths (such as the radius of the aorta, genome size) and time-scales (heart-rate, lifespan, reaction rates) to mass. These are typically power laws whose exponents are simple multiples of 1/4. The phenomenology of these observations will be reviewed and a general model presented that explains quantitatively their origin and universality. It is based on the fundamental observation that, at all scales, life is sustained by the transport of resources and information through space-filling fractal-like hierarchical branching networks whose terminal units are invariant. Assuming that natural selection has led to network systems which minimize energy dissipated, or, alternatively, to the area of interface with the resource environment being maximized, the origin of quarter-power scaling for a myriad of observables for diverse biological systems can be explained. A general argument will be presented augmented by detailed analyses of the mammalian circulatory and plant vascular systems, for which complete quantitative descriptions can be derived. The extension of these ideas to growth, aging and possibly genomics will be discussed.
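
The arithmetic behind a quarter-power law can be checked with a few lines of Python. The sketch assumes the generic form Y = Y0 * M^b with b = 3/4 for metabolic rate, as stated in the abstract; the function names are hypothetical and the data in the fitting example are synthetic values generated from the law itself, not measurements.

```python
import math


def scaled_ratio(mass_ratio, exponent=0.75):
    """Factor by which Y = Y0 * M**exponent changes when M changes by mass_ratio."""
    return mass_ratio ** exponent


def fit_exponent(masses, values):
    """Least-squares slope of log(value) against log(mass).

    For data that follow a power law exactly, the slope equals the exponent.
    """
    xs = [math.log(m) for m in masses]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# A 16-fold increase in mass gives an 8-fold increase in metabolic rate,
# so the rate per unit mass falls by a factor of 2 (exponent -1/4).
print(scaled_ratio(16))
print(scaled_ratio(16, exponent=-0.25))

# Synthetic data generated from M**0.75; the fit recovers the exponent.
masses = [1.0, 10.0, 100.0, 1000.0]
rates = [m ** 0.75 for m in masses]
print(fit_exponent(masses, rates))
```

On a log-log plot such a law is a straight line, which is why the exponent can be read off as a slope.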


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.