Mapping Abstracts

DOE Human Genome Program
Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM



46. Analysis of WUSTL's Human BAC Fingerprint Database

R. Sutherland, M. Mundt, and N. Doggett

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

rds@lanl.gov

We have used the LANL human chromosome 16 BAC contig data to evaluate the Washington University (WU) Genome Sequencing Center Human BAC Fingerprint Database.

WU has fingerprinted 162,272 RPCI-11 BACs and assembled them into 12,549 contigs.

LANL has identified 4085 BACs using 1106 overgos and STSs from the 16 q-arm. The 16 q-arm is 45 Mb and represents 1.35% of the human genome.

For this exercise, only BACs from sections 1 and 2 of the RPCI-11 library were considered. For these sections, there are 125,979 WU BACs within contigs and 3530 LANL-mapped BACs.

The first analysis is a straight set-to-set comparison of the two data sets to see which WU contigs can be linked to the 16 q-arm map. BACs occurring in the LANL set were used to query WU contigs. The results were as follows: 10,882 BACs from 657 contigs were identified from the WU data, of which 2034 BACs were in common with the LANL data. Thus 57% of the LANL BACs could be found in a WU contig, but only 18.7% of the WU BACs in these contigs were found in the LANL set.

For the second analysis we discounted all WU contigs that contained only a single LANL-mapped BAC. The results were as follows: 3,618 BACs from 245 contigs were identified from the WU data, of which 1622 BACs were in common with the LANL data. Thus 46% of the LANL BACs could be found in a WU contig, while 44.8% of the WU BACs in these contigs were found in the LANL set.

For the third analysis we limited the WU set further. Only BACs contained in contigs of 2-40 members were considered; this range represents a 1-sigma distribution. The results were as follows: 2,766 BACs from 236 contigs were identified from the WU data, of which 1491 BACs were in common with the LANL data. Thus 42% of the LANL BACs could be found in a WU contig, while 53.9% of the WU BACs in these contigs were found in the LANL set.

We believe that the LANL BAC map provides >90% coverage of the 16 q-arm and that we identified the great majority of 16 q-arm BACs from sections 1 and 2 of the RPCI-11 library. Thus, the percentages above suggest to us that there is a significant level of false overlaps in the WU BAC contigs.
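
The percentages quoted for the three analyses follow directly from the reported counts. The short Python sketch below recomputes them as a check; the counts are taken from the text above, while the function and variable names are illustrative only and are not part of the original analysis.

```python
# Recompute the overlap percentages from the counts reported in the abstract.
# Function and variable names are illustrative, not part of the original work.

LANL_MAPPED_BACS = 3530  # LANL-mapped BACs in RPCI-11 sections 1 and 2

# analysis label -> (WU BACs in the linked contigs, BACs shared with the LANL set)
ANALYSES = {
    "all linked contigs": (10882, 2034),
    "contigs with more than one LANL BAC": (3618, 1622),
    "contigs of 2-40 members": (2766, 1491),
}


def overlap_percentages(wu_bacs, shared, lanl_total=LANL_MAPPED_BACS):
    """Return (% of LANL BACs found in a WU contig, % of WU BACs found in the LANL set)."""
    return 100.0 * shared / lanl_total, 100.0 * shared / wu_bacs


for label, (wu_bacs, shared) in ANALYSES.items():
    lanl_pct, wu_pct = overlap_percentages(wu_bacs, shared)
    print(f"{label}: {lanl_pct:.1f}% of LANL BACs, {wu_pct:.1f}% of WU BACs")
```

The second percentage in each pair is the one that bears on false overlaps: if the LANL map is nearly complete for the 16 q-arm, WU BACs in these contigs that are absent from the LANL set are candidates for incorrect joins.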


47. Human Chromosome 16 Mapping Update

Cliff S. Han, Robert D. Sutherland, Phillip B. Jewett, Mary L. Campbell, Linda J. Meincke, Judy G. Tesmer, Mark O. Mundt, Larry L. Deaven, and Norman A. Doggett

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

chan@telomere.lanl.gov

We have used sequence-based markers from an integrated YAC STS-content/somatic cell hybrid breakpoint physical map and radiation hybrid maps of human chromosome 16 to construct a new sequence-ready BAC map of this chromosome. The integrated physical map was previously generated in our laboratory and contains 1150 STSs, providing a marker on average every 78 kb on the euchromatic arms of chromosome 16. The other two maps used for this effort were the radiation hybrid maps of chromosome 16 from the Whitehead Institute and Stanford University. To create large sequenceable targets on this chromosome, we used a systematic approach to screen high-density BAC filters with probes generated from overlapping oligonucleotides (overgos). We first identified all available sequences in the three maps. These include sequences from genes, ESTs, STSs, and cosmid end sequences. We then used BLAST to identify unique 36-bp fragments of DNA for overgo probes. A total of 906 overgos were selected from the long arm of chromosome 16. After a total of 212 hybridizations, we have constructed an initial probe-content BAC map of chromosome 16q consisting of 828 overgo markers and 3363 BACs, providing greater than 85% coverage of the long arm of this chromosome. Gaps in the map are being closed by the following methods: 1) PCR screening of the RPCI-11 library with BAC end sequence-derived STSs; 2) searches of the BAC end sequence database with draft sequences of BAC clones near the gaps; and 3) screening of the RPCI-11 library with overgos generated from the BAC end sequences near the gaps. To date, 400 PCR screens and 5 pooled overgo hybridizations have been completed for the gap-closing effort, extending the coverage of the BAC map to over 90%.
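
As an illustration of what a 36-bp overgo target implies for oligonucleotide design, the Python sketch below splits a 36-bp sequence into a pair of oligos whose 3' ends are complementary. The abstract does not state the oligo geometry; the 22-nt oligo length and 8-bp overlap used here are assumptions chosen only because 22 + 22 - 8 = 36, and the function names are hypothetical.

```python
# Sketch: derive a pair of overlapping oligos (an "overgo") from a 36-bp target.
# ASSUMPTION: two 22-nt oligos overlapping by 8 bp; the abstract only states
# that the unique target fragments are 36 bp long.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overgo_pair(target, oligo_len=22, overlap=8):
    """Return (forward, reverse) oligos, both written 5'->3'.

    The forward oligo is the first `oligo_len` bases of the target; the
    reverse oligo is the reverse complement of the last `oligo_len` bases.
    Their 3' ends are complementary over `overlap` bases.
    """
    target = target.upper()
    expected = 2 * oligo_len - overlap
    if len(target) != expected:
        raise ValueError(f"target must be {expected} bp, got {len(target)}")
    if set(target) - set("ACGT"):
        raise ValueError("target may contain only A, C, G and T")
    forward = target[:oligo_len]
    reverse = reverse_complement(target[-oligo_len:])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = overgo_pair("ACGT" * 9)  # made-up 36-bp sequence
    print(fwd)
    print(rev)
```

In this scheme the two oligos anneal through the shared 8-bp region, and the single-stranded overhangs can then be filled in to give a double-stranded probe covering the full 36-bp target.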

Supported by the US DOE, OBER under contract W-7405-ENG-36.


48. Annotation and Analysis of the Draft Sequence of 16Q12

Jung-Rung Wu, Mark O. Mundt, Cliff S. Han, Kristina Kommander, Robert D. Sutherland, Lela Tatum, Norman A. Doggett, and Larry L. Deaven

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

wu@telomere.lanl.gov

We have completed a map and chosen a tiling set of more than 40 spanning BACs, most often from the RPCI-11 library, for a region that covers more than 5 Mb of human 16q12, encompassing a locus for inflammatory bowel disease (IBD1). These BACs have been selected by a combination of overgo hybridization, restriction map assembly by both fingerprinting and sequence prediction, and BAC end sequence searches. Sequencing has been completed to levels ranging from light shotgun to Bermuda finished bases. We will present annotation and statistical results of our analysis of the draft sequence we have achieved thus far and show how these results compare with ~5 Mb of sequence from 16p13.3.


49. Progress in Mapping the Mouse Genome

Cliff S. Han, Linda J. Meincke, Larry L. Deaven, and Norman A. Doggett

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

chan@telomere.lanl.gov

The mouse genome is the second major target for sequencing by the JGI. The initial focus is on regions of biological interest and regions of synteny to human chromosomes 5, 16, and 19. The mapping group at LANL is now focusing on mouse genome targets syntenic to human chromosome 16 for BAC map construction. The probes for this effort are derived from STSs and cDNA sequences.

1) STS mapping: We use the sequences from STSs located on mouse chromosomes syntenic to human chromosome 16. These STSs come from various map sources. To date, 960 overgos generated from STSs have been screened against a 5X portion of the RPCI-23 mouse library, and 2403 BACs have been identified. Overgos from 833 STSs were located to at least one BAC.

2) cDNA mapping: We use two approaches to find cDNAs in the region syntenic to human chromosome 16: (a) BLAST searches of the mouse UniGene database with UniGene sequences from human chromosome 16 that are masked with RepeatMasker, and (b) BLAST searches of the mouse UniGene database with genomic sequences of human chromosome 16 that are masked with RepeatMasker. A total of 600 cDNA sequences were found after the two BLAST searches. Overgos from 96 of the cDNAs have been screened against the RPCI-23 library; 570 BACs were hit by 94 overgos. The average number of hits per overgo probe is 12. The first eighty-six mouse BAC clones have been sent to the sequencing queue.

Supported by the US DOE, OBER under contract W-7405-ENG-36.


50. Rapid Construction of Mouse Sequence-Ready Maps Using a Homology-Driven Approach

Lisa Stubbs, Joomyeong Kim, Laurie Gordon, Hummy Badri, Mari Christensen, Matt Groza, Chi Ha, Sha Hammond, Michelle Vargas, and Eddy Wehri

DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598 and Genome Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore CA 94550

stubbs5@llnl.gov

We have developed a rapid and efficient "homology-driven" strategy for assembling mouse BAC clone contigs for comparative sequencing, and are using this approach to generate large contigs spanning all mouse regions related to gene-containing segments of human chromosome 19. The strategy uses overgo probes, designed from matches detected between human genomic DNA sequence and mouse ESTs or other cDNA fragments, in pooled hybridization against gridded mouse BACs. The overgos are chosen with 50-100 kb spacing and hybridized in pools corresponding to the position of the homologous sequence in the human chromosome. The human map is used as a model for assembly of the corresponding mouse contigs, and contig assembly, clone integrity, and overlap are verified by restriction fingerprinting, completed at a depth that permits the creation of a detailed restriction map of the mouse contig. This strategy has been used to assemble maps of more than 15 Mb of mouse DNA as of the date of this submission (12/99), and we expect to complete maps of all chromosome 19-related regions within 2-3 months. The maps we have generated provide an important source of clones for directed comparative sequencing, and reagents for basic studies of genome evolution and for analysis of mouse mutations.


51. Structural and Functional Analysis of a Conserved Imprinted Region of Human Chromosome 19q13.4 and Mouse Chromosome 7

Joomyeong Kim1,2, Vladimir Noskov3, Xiaochen Lu1,2, Anne Bergmann1,2, Tiffany Warth2, Paul Richardson1, Vladimir Larionov3, Natasha Kouprina3, and Lisa Stubbs1,2

1DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA; 2Genome Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA; and 3National Institute of Environmental Health Sciences, Laboratory of Molecular Genetics, Research Triangle Park, NC

stubbs5@llnl.gov

Mouse genetics studies long ago predicted that a genomically imprinted domain would be found near the centromere of mouse chromosome 7, a region with known syntenic homology to human chromosome 19q. Animals that inherit only maternal alleles of this region die as neonates, suggesting the presence of a gene or genes, expressed exclusively from the paternal chromosome, that are required for normal development. In earlier studies, we mapped a known paternally expressed gene, PEG3, to human 19q13.4, and we reasoned that other imprinted genes would be found nearby in a clustered imprinted domain. We have used the emerging human sequence and data derived from related mouse clone contigs to identify new genes near PEG3, and have demonstrated that these novel genes are also imprinted in mice. These new genes are expressed highly in embryos; one represents a strong candidate for the paternally expressed, neonatal lethal factor predicted by mouse genetics. Our studies demonstrate that, despite many basic similarities, human and mouse regions surrounding PEG3 have undergone significant changes in gene content and organization. We will discuss the structure, expression, and evolution of genes in this imprinted region, discuss the potential functions of each gene, and speculate on the possible implications of these findings for the genetics of chromosome 19-linked disorders in humans.


52. Mapping and Functional Analysis of the Mouse Genome

D. K. Johnson,1 C. T. Culiat,1 M. L. Klebig,2 Y. You,1 D. R. Miller,1 L. B. Russell,1 E. J. Michaud,1 and E. M. Rinchik1,2

1Mammalian Genetics and Development Section, Life Sciences Division, Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831-8077 and 2Department of Biochemistry, Molecular, and Cellular Biology, University of Tennessee, Knoxville, TN 37996

johnsondk@ornl.gov

As part of a functional-genomics strategy to determine how altered genes and proteins impact complex biological systems in mammals, the Mammalian Genetics Program at ORNL is characterizing a number of regions of the mouse genome on both the physical and functional levels, using mouse mutations as tools. This project forms a logical partnership with our regional mutagenesis program, which is designed to detect, maintain, and partially characterize new chemically induced mutations in ~8-10% of the mouse genome by utilizing new genetic tools and broad-based phenotype screening. The integrated efforts of these projects will advance the post-genome sequencing mission of annotating human DNA sequence with whole-organism functional information from the mouse model system.

Our goal is to acquire the DNA sequence of each region, to develop a validated transcription/expression map, and to ascribe whole-organism functional information to each coding sequence through analysis of heritable gene mutations. Our chosen genome regions and subregions will be physically delimited by identifiable DNA landmarks (typically chromosomal rearrangements); hence, we can easily co-map mutant phenotypes with coding units to establish unambiguous sequence/function relationships by superimposing mutation maps onto transcription maps. The target regions include the 5- to 6-cM pink-eyed dilution (p) region in mouse Chromosome (Chr) 7 (human Chrs 11p, 15p, and 15q homologies); the 14 cM between p and the albino (Tyr; c) region (human Chr 15q); the 6- to 11-cM Tyr region (human Chrs 6p, 11p, 11q, and 15q); all of Chr 15 (human Chrs 5p, 8q, 12q, 22q), concentrating initially on the distal half; and mid-Chr 10 (human Chrs 6q, 10q, 12q, 21q, and 22q). With available molecular and embryonic stem (ES)-cell techniques, the growing emphasis on regional-mutagenesis strategies, and the development of mouse reagents with which to carry out those strategies, we and others can extend this same discovery approach to any genome region.

Complete DNA sequence for these regions will be obtained by collaboration with the Joint Genome Institute or by mining of public databases created by the NIH mouse sequencing efforts. After ascertainment of potential transcription units from EST mapping and from computational analysis of raw DNA sequence by ORNL's Computational Biosciences Section, predicted transcription units will be verified by RNA analyses (Northerns, RT-PCR, RNase protection, and/or microarray procedures). The ultimate correlation of dense mutation maps with the transcription/expression maps has begun by identifying candidate mutant genes bearing ENU mutations, using densely mutagenized regions within the p and Tyr regions as initial models with which to develop efficient mutation-scanning techniques. Phenotype gaps can also be filled with knockout/gene-trap mutations for genes discovered in DNA sequence analysis but not represented as ENU mutations. All new DNA sequence information, expression information, and mutations will be advertised to interested partners via the WWW.


[Research sponsored by the Office of Biological and Environmental Research, USDOE, under contract DE-AC05-96OR22464 with Lockheed Martin Energy Research, Inc.]


53. Toward Completion of a Human Chromosome 5 BAC Map and a Mouse Syntenic BAC Map

Steve Lowry, Ze Peng, Duncan Scott, Yiwen Zhu, Mei Wang, Roya Hosseini, Michele Bakis, Joel Martin, Ingrid Plajzer-Frick, Jeff Shreve, Le-Thu Nguyen, and Jan-Fang Cheng

Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

jcheng@mhgc.lbl.gov

Physical mapping of BACs on human chromosome 5 is in a final stage. The current map consists of 7,618 BAC clones anchored to the chromosome by 2,954 STSs. The distribution of STSs is not even across the chromosome. Approximately 50% of the STSs were derived from 1/3 of the chromosome at the end of the q arm where the average size of contigs is greater than 1 Mb. Most BACs were isolated as single colonies. Restriction maps and FISH maps were constructed for all contigs and are available on the web. The maps are updated regularly.

To date, 1,702 BACs have been selected for sequencing. These BACs contain a total restriction fragment length of 163.5 Mb, or approximately 90% of the euchromatic portion of the 190 Mb chromosome. In an independent experiment, we tested the degree of coverage provided by our map by probing the mapped clones with 796 new STSs (580 ESTs, 216 randomly derived). We found that 88% of the STSs were contained in the mapped clones. Both restriction map length and STS analysis indicate that the selected BAC tiling path covers approximately 88-90% of the chromosome.

Sequence already generated for our mapped clones is enabling us to expand contigs by detecting overlaps between BACs that were undetected by restriction fragment analysis and STS content mapping. BAC end sequences in the TIGR database also enable extension of contigs.

The clone map and sequence information for human chromosome 5 are being used to isolate syntenic mouse BACs. Several large contigs have been built using mouse ESTs that were identified by high sequence similarity to human chromosome 5 sequences. To streamline the identification of syntenic mouse ESTs, we have generated a web interface to facilitate (1) BLAST searches of GenBank databases with batches of masked sequences, (2) parsing of the output based on length and level of similarity, and (3) reduction of background matches by identifying successive exons in the genomic sequence.
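
Step (2), parsing of output by length and level of similarity, can be illustrated with a short Python sketch. It is not the web interface described above. It assumes BLAST hits are available as tab-delimited lines in the common 12-column tabular layout (query, subject, percent identity, alignment length, ...), and the thresholds, identifiers, and sample hits are made up for illustration.

```python
# Sketch: keep BLAST hits that pass minimum alignment length and identity.
# ASSUMPTIONS: tab-delimited 12-column tabular BLAST output; the thresholds
# and the sample hits below are hypothetical, not values from the abstract.
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bases


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tabular BLAST lines, skipping blank lines and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), int(fields[3])))
    return hits


def filter_hits(hits: Iterable[Hit], min_length: int = 100,
                min_identity: float = 85.0) -> List[Hit]:
    """Return hits at least `min_length` long with at least `min_identity` % identity."""
    return [h for h in hits if h.length >= min_length and h.identity >= min_identity]


SAMPLE = [  # made-up hits of one human genomic query against mouse ESTs
    "human_bac_1\tmouse_est_A\t90.0\t200\t20\t0\t1\t200\t11\t210\t1e-60\t250",
    "human_bac_1\tmouse_est_B\t78.0\t300\t66\t0\t500\t799\t1\t300\t1e-20\t120",
    "human_bac_1\tmouse_est_C\t95.0\t40\t2\t0\t900\t939\t5\t44\t1e-05\t60",
]

if __name__ == "__main__":
    for hit in filter_hits(parse_hits(SAMPLE)):
        print(hit.subject, hit.identity, hit.length)
```

With these sample values only the first hit survives: the second fails the identity threshold and the third is too short. Applying both criteria together is what removes short, highly similar repeat fragments and long, weakly similar matches.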


54. Sequence-Ready Characterization of the Pericentromeric Region of 19p12: A Strategy for the Analysis of Complex Regions of the Human Genome

Evan E. Eichler1, Anthony P. Popkie1, Laurie A. Gordon2, and Anne S. Olsen2

1Case Western Reserve University, Cleveland, OH, 44106 and 2DOE Joint Genome Institute, Walnut Creek, CA 94598

eee@po.cwru.edu

The pericentromeric region of 19p12 represents one of the most poorly mapped and sequenced regions of chromosome 19. This is due, in large part, to the virtual absence of unique sequence identifiers within this region. The proximal portion of 19p12 possesses sequence attributes consistent with both euchromatic and heterochromatic DNA, including a large cluster of ZNF (zinc-finger) genes, an overabundance of human endogenous retroviral elements, and an atypical higher-order (~10-30 kb) beta-satellite repeat structure. Analysis of ~425 kb of seed sequence from 19p12 has revealed that less than 15% of the region consists of bona fide unique sequence. This unusual organization has hampered the development of sets of large contiguous clones in this region, has resulted in relatively poor clonal coverage (<60%), and has greatly limited the selection of suitable templates for sequencing. To complete mapping and sequencing in this region, we have designed a strategy that takes advantage of the known biological properties of 19p12 repetitive sequences. Our approach has been to distinguish between "generic" and 19p12-specific repeat elements, to develop assays to rapidly identify 19p12 clones from different genomic libraries, and to confirm the position of these clones at the level of sequence overlap. Both high-resolution FISH techniques (extended chromatin analysis) and restriction fragment overlap are being implemented as complementary tools to confirm the integrity of the map and to identify potential sites of heteromorphism in the region. To date, a total of 403 19p12 BAC clones have been identified from the RPCI-11 and CIT-D libraries. A subset of these has been used to extend clonal coverage into 12 different gap regions of 19p12, four of which are now tentatively closed.

The data generated will be used in the selection of the most parsimonious tiling path of BAC clones to be sequenced as part of the JGI effort on chromosome 19 and should serve as a model for the sequence characterization of other difficult regions of the human genome. The complex organization of this region will be discussed in the context of its unusual biology.


55. IMAGEne 3.0: Clustering All Sequences Obtained from I.M.A.G.E. Clones

Peg Folta, Tom Kuczmarski, Tim Harsch, and Christa Prange

Lawrence Livermore National Laboratory, Livermore, CA 94550

pfolta@llnl.gov

To date, over 1.9 million sequences have been submitted to GenBank from the 2.9 million available I.M.A.G.E. [1] clones. This number will increase sharply due to the new Mammalian Gene Collection [2] project. To maximize the value of this information, the IMAGEne [3] product has been extended to group the human sequences into clusters that represent both known genes and "candidate genes". For known genes, clustering eliminates redundancy by providing the best representative clone for a gene. For clusters not associated with a known gene, the results provide evidence of a possible gene discovery.

IMAGEne was first released to the public in 4/98 to provide the user community with known gene clusters of I.M.A.G.E. clones. Since then the product has undergone significant enhancements, including use of NCBI's RefSeq as the basis for the known gene set, indication of sequence-verified clones, repeat masking, enhanced error checking, and faster response times. Version 3.0 is the largest enhancement; it extends the functionality by forming clusters of clones not associated with known genes.

Clusters are formed by sequence similarity, clone membership, and internal I.M.A.G.E. project knowledge. The user can query the resulting cluster database and view the cluster members, ranked primarily by size, in a user-friendly Java-based display. Currently I.M.A.G.E. has clone representatives for 93% of the known genes. It defines 61,083 multi-member candidate gene clusters and over 236,000 singletons. By the conference date, IMAGEne 3.0 will be publicly available on the web.

  1. Lennon, G., et al. (1996) The I.M.A.G.E. Consortium: An Integrated Molecular Analysis of Genomes and Their Expression. Genomics 33, 151-152.
  2. Strausberg, R.L., et al. (1999) The Mammalian Gene Collection. Science 286(5439), 455-457.
  3. Cariaso, M., et al. IMAGEne I: Clustering and Ranking of I.M.A.G.E. cDNA Clones Corresponding to Known Genes. Bioinformatics, in press.

This work was performed by LLNL under the auspices of U.S. DOE, Contract No. W-7405-Eng-48.


152. Optical Mapping: A Complete System For Whole Genome Shotgun Mapping

Anantharaman, T., Apodaca, J., Aston, C., Clarke, V., Gebauer, D., Delobette, S., Dimalanta, E., Edington, J., Giacalone, J., Gibaja, V., Huff, E., Jing, J., Lai, Z., Lin, J., Limm, A., Mishra, B., Ni, L., Paxia, S., Qi, R., Ramanathan, A., Skiadis, Y., Vafai, J., Wang, W., Schwartz, D.C.

University of Wisconsin - Madison

Optical Mapping is a single-molecule approach for the rapid production of ordered restriction maps from single DNA molecules. Fluorescence microscopy is used to image individual DNA molecules that are bound to derivatized glass surfaces and cleaved by restriction enzymes. Fragments retain their original order, and cut sites are flagged by small, visible gaps. The system has advanced in several critical areas for mapping both clones and entire genomes (D. radiodurans and P. falciparum). We mapped these entire microbial genomes using megabase-sized genomic DNA molecules. Because large fragments of randomly sheared DNA are mapped with high cutting efficiency, many overlapping restriction site landmarks allow contigs to be assembled, and a shotgun mapping strategy can be employed. High-resolution whole-genome maps can therefore be assembled without library construction and its associated cloning artifacts. Because ensembles of single molecules are analyzed, only small amounts of starting material are required, enabling the mapping of microorganisms that are problematic to culture. Whole-genome maps, first, enable the size of the genome to be accurately determined, an important prelude to any sequencing endeavor, and second, provide an in situ picture of the architecture of the entire genome, revealing the number of chromosomes, extrachromosomal elements, etc. Populations can potentially be characterized by comparing maps from different strains. Recent efforts have been to create maps of E. coli O157:H7 (5.4 Mb) as a scaffold for facilitated sequence assembly and verification (Collaborator: F. Blattner, U. Wisconsin). We will compare these maps with maps generated from the sequence of E. coli K12 (4.6 Mb) to identify regions unique to O157 that could be targeted for sequencing. Notably, we have constructed a map of the whole human genome at a coverage of 0.6X, showing the feasibility of complete mapping of the human genome.

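
A map "generated from the sequence" is an ordered list of fragment sizes obtained by locating restriction sites in a known sequence. The Python sketch below shows that idea for a linear molecule; it is an illustration only, not the software used in this work. The function name, the toy sequence, and the choice of EcoRI (which cuts G^AATTC, one base into its site) are assumptions for the example.

```python
# Sketch: in silico restriction map of a linear DNA sequence.
# ASSUMPTIONS: a linear molecule, a single enzyme with a palindromic
# recognition site, and a made-up example sequence.

def in_silico_map(sequence, site, cut_offset=0):
    """Return ordered fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site, counted
    from the first base of the site (1 for EcoRI, G^AATTC).
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments that arise when a cut falls on a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    toy = "AAGAATTCTTTTGAATTCGG"  # 20 bases with two EcoRI sites
    print(in_silico_map(toy, "GAATTC", cut_offset=1))  # [3, 10, 7]
```

Because the fragment order is preserved, such a list can be compared directly with an optical map of a related strain, and runs of fragments without a counterpart mark candidate strain-specific regions.
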
To map megabase-sized molecules, we created a system to tile overlapping microscope images with proper pixel registration. "Gentig" then automatically generates contigs from optical mapping data by repeatedly combining the two islands that produce the greatest increase in probability density, excluding any contigs whose false positive overlap probability is unacceptable. The standard deviation, digestion rate, false cut rate, and false match possibility can be altered to change the number of molecules that "Gentig" assembles into contigs. Visualization of such information-rich data (whole-chromosome maps composed of many restriction sites and deep contigs) presents a challenge. "ConVEx" (Contig Visualizer and Expander) creates contigs from maps and uses a scalable viewer to visualize assemblies for editing. "ConVEx" is a zoomable interface that allows annotation and integration of other related information such as STS markers, sequence contigs, and even sequence reads. "ConVEx" is built on top of PAD++, which can be run on all major operating systems.
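
The greedy strategy described for "Gentig" (repeatedly join the best-scoring pair of islands, and refuse joins with an unacceptable false positive probability) can be sketched generically in Python. This is not the Gentig implementation: the scoring function is supplied by the caller, and the toy pairwise scores below are invented stand-ins for the probability-density gain and false positive overlap probability that the real program computes from the map data.

```python
# Sketch: greedy agglomeration of islands into contigs.
# ASSUMPTIONS: `score(a, b)` returns (gain, false_positive_probability) for
# joining islands a and b; PAIR_SCORES is made-up illustrative data.
from itertools import combinations


def greedy_merge(islands, score, max_false_positive=0.01):
    """Merge islands pairwise, best gain first, until no acceptable join remains."""
    islands = [frozenset(i) for i in islands]
    while True:
        best = None
        for a, b in combinations(islands, 2):
            gain, false_positive = score(a, b)
            if gain <= 0 or false_positive > max_false_positive:
                continue  # no improvement, or the overlap is too unreliable
            if best is None or gain > best[0]:
                best = (gain, a, b)
        if best is None:
            return islands
        _, a, b = best
        islands = [i for i in islands if i not in (a, b)] + [a | b]


# Toy pairwise scores between molecules: (gain, false positive probability).
PAIR_SCORES = {
    frozenset({"m1", "m2"}): (5.0, 0.001),
    frozenset({"m2", "m3"}): (3.0, 0.005),
    frozenset({"m3", "m4"}): (4.0, 0.20),  # large gain but unreliable overlap
}


def toy_score(a, b):
    """Score two islands by the best-scoring molecule pair that links them."""
    candidates = [PAIR_SCORES[frozenset({x, y})]
                  for x in a for y in b
                  if frozenset({x, y}) in PAIR_SCORES]
    return max(candidates, default=(0.0, 1.0))


if __name__ == "__main__":
    contigs = greedy_merge([{"m1"}, {"m2"}, {"m3"}, {"m4"}], toy_score)
    print(sorted(sorted(c) for c in contigs))  # [['m1', 'm2', 'm3'], ['m4']]
```

In the toy run, m4 stays a singleton even though its overlap with m3 has a high gain, because that join exceeds the false positive limit. Loosening `max_false_positive` would pull it into the contig, which mirrors how the adjustable parameters mentioned above change the number of molecules assembled.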


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.