Report from the Zebrafish Genomics Initiative Grantees, March 16, 1999 Meeting



Progress reports from the following grantees are available on this site:

Mark Fishman
William Talbot
Len Zon
Stephen Johnson
Marnie Halpern
Monte Westerfield


Progress report of Mark Fishman, Massachusetts General Hospital
Project: Construction of a Genetic Linkage Map of Zebrafish
Grant Number: R01DK55390

Stated aims:
Increasing the density of microsatellite markers on the zebrafish genetic map to a total of 8,000 (providing an average inter-marker distance of 0.3 cM).

Update:
Our map is constructed using microsatellite repeats, primarily CA repeats. These markers are easy to use and occur frequently in the zebrafish genome, and we have made them readily available from Research Genetics. We began about 8 years ago with a small effort, supported by the NCRR, with the goal of placing 100 markers on the map. Because the community felt that this was to be the essential genetic map for cloning and for anchoring physical maps, we sought and obtained industrial sponsorship. This allowed us to generate a first-edition map with 700 markers and, by the end of last year, a next installment of 2,000. Assuming the zebrafish genome to be 2,400 cM, this gives a marker density of about 0.8 markers per cM. The map is freely available on the web, and all markers are available, without restriction, from Research Genetics.

With the new NIH funding, we will further increase the density and usefulness of our map. Our goals are: (a) to increase the density of markers to about three per cM, and (b) to develop informatics both for keeping, collating, and analyzing data and for distributing it to the community in a "user-friendly" way.

Trajectory as of 3/16/99:
To reach our goal of 3 markers per cM, we will need approximately 7,200 markers on the map; thus, we need to add 5,200 additional markers. From prior experience we know that approximately ten percent of the microsatellite-containing clones we isolated from our genomic library will end up on the map; the losses are taken early. After sequencing, clones are omitted for a number of reasons. Many are duplicates of clones we sequenced previously; others are too short to give reliable primers or do not give PCR products in the 200-300 base pair range (a range we chose, more or less arbitrarily, so that all markers could be resolved using the same gel conditions). In other words, to generate 7,200 markers we will need to sequence about 72,000 clones.
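The arithmetic behind this trajectory can be sketched as follows. This is only an illustration using the figures quoted in this report; the constant names are ours, and the 10 percent yield is the approximate figure from prior experience, not a measured value for the new clones.

```python
# Figures taken from the report; constant names are illustrative.
GENOME_LENGTH_CM = 2400   # assumed genetic length of the zebrafish genome
MARKERS_ON_MAP = 2000     # markers placed as of the end of last year
TARGET_DENSITY = 3        # goal, in markers per cM
MAPPED_FRACTION = 0.10    # roughly 10% of sequenced clones reach the map

# Density already achieved, in markers per cM.
current_density = MARKERS_ON_MAP / GENOME_LENGTH_CM

# Markers required to reach the target density, and how many are still missing.
target_markers = GENOME_LENGTH_CM * TARGET_DENSITY
additional_markers = target_markers - MARKERS_ON_MAP

# Clones that must be sequenced if only ~10% of them end up on the map.
clones_to_sequence = round(target_markers / MAPPED_FRACTION)

print(f"current density:    {current_density:.2f} markers/cM")
print(f"target markers:     {target_markers}")
print(f"additional markers: {additional_markers}")
print(f"clones to sequence: {clones_to_sequence}")
```

Changing MAPPED_FRACTION shows how sensitive the sequencing load is to the clone-to-marker yield.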

Progress to date:

Map	Clones Sequenced	Primer Sets	Polymorphic	Mapped
705 map	(not available)			705
2k map	12576	3250	1827	1295
TOTAL (previous maps)	16384	4183	1827	2000
Progress as of 3/16/99	3808	933
Project goals	69921	18069	10158	7200
Remaining	53537	13886	8331	5200
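As a cross-check on the table, the "Remaining" row is the project goal minus the running total in each column, and the running totals for clones and primer sets equal the 2k-map figures plus the progress as of 3/16/99. A minimal sketch (the dictionary keys are our own labels for the table columns):

```python
# Column labels are illustrative names for the table headings above.
COLUMNS = ("clones_sequenced", "primer_sets", "polymorphic", "mapped")

total_to_date = dict(zip(COLUMNS, (16384, 4183, 1827, 2000)))
project_goals = dict(zip(COLUMNS, (69921, 18069, 10158, 7200)))

# Remaining work per column: goal minus what has been done so far.
remaining = {col: project_goals[col] - total_to_date[col] for col in COLUMNS}

# The running totals combine the 2k-map effort with the progress as of 3/16/99.
clones_total = 12576 + 3808
primer_sets_total = 3250 + 933

for col in COLUMNS:
    print(f"{col:>17}: {remaining[col]}")
```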

Our goal is to add 400-500 new markers to the map every quarter, on average.

Update on informatics: Don Jackson (Massachusetts General):

While gearing up on the experimental side, we have been working to improve our methods for information management and distribution.

Our goal is to develop a database system that will allow us to easily track and consolidate information in-house, to submit it easily to the community resources, and to present it on a local server (providing multiple means of access to the same data). We want to make sure that the data is presented as rapidly and accessibly as possible. We have begun setting up an on-line database using data on the existing 2,000 markers. We are using this data to test and refine the design of our database so that as new markers are mapped they can be shared with the community as soon as possible.

Our first decision was selecting a database program. In the past, we used FileMaker Pro (a Macintosh-based system), but it could not easily handle the amount of information for even a 2,000-marker map. We needed a system with the power to search large data sets rapidly, to consolidate data from all the steps of marker generation into a single resource, and to allow simultaneous multi-user access, with every function involved in entering or accessing data open to automation. We also needed a system that would let us display our data on the Web and that would be inexpensive.

We selected PostgreSQL, a publicly available database program for UNIX/Linux computers, for the following reasons:

Power: Postgres is a relational database, allowing rapid queries of large data sets while accommodating the various types of experimental data generated in the course of making the map.

Flexibility: Postgres is SQL compliant and internet-aware, allowing direct connections to other databases. For example, we are coordinating with Eck Doerry to allow ZFIN to directly query our database via the internet. This will allow ZFIN to update map data in the fastest and easiest possible manner.

Programmability: Interfaces are available for a number of programming languages, including C, Perl, and Java. We will use CGI scripts written in Perl to implement web-based user interfaces for local and remote access to the database.

Cost: The Postgres software is free. Our database server is a Pentium II computer running the FreeBSD UNIX operating system and the Apache web server (cost: $1,600).

Our database contains four primary tables: the first table contains sequence information for microsatellite clones that we have isolated; the second has information on primers (designed using STS-Pipeline, an automated sequence analysis package); the third has all the genotyping information; the fourth has position information for each marker placed on the map. We also have supporting tables showing information on homologies, targets, and eliminated sequences (undesired sequences, duplicate hits, etcetera).
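To make the four-table layout concrete, here is a minimal sketch. It is not our production schema: all table and column names, and the demo rows, are hypothetical, and Python's built-in sqlite3 module stands in for Postgres so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema mirroring the four primary tables described above.
SCHEMA = """
CREATE TABLE clones (
    clone_id     TEXT PRIMARY KEY,
    dna_sequence TEXT NOT NULL
);
CREATE TABLE primers (
    marker     TEXT PRIMARY KEY,
    clone_id   TEXT NOT NULL REFERENCES clones(clone_id),
    fwd_primer TEXT NOT NULL,
    rev_primer TEXT NOT NULL,
    product_bp INTEGER
);
CREATE TABLE genotypes (
    marker     TEXT NOT NULL REFERENCES primers(marker),
    individual TEXT NOT NULL,
    allele     TEXT NOT NULL,
    PRIMARY KEY (marker, individual)
);
CREATE TABLE map_positions (
    marker        TEXT PRIMARY KEY REFERENCES primers(marker),
    linkage_group INTEGER NOT NULL,
    position_cm   REAL NOT NULL
);
"""


def open_demo_db():
    """Create an in-memory database holding a few made-up markers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO clones VALUES (?, ?)",
        [("clone_a", "CACACACACA"), ("clone_b", "CACACACA"), ("clone_c", "CACACA")],
    )
    conn.executemany(
        "INSERT INTO primers VALUES (?, ?, ?, ?, ?)",
        [
            ("demo_a", "clone_a", "ACGT", "TGCA", 220),
            ("demo_b", "clone_b", "GGCC", "CCGG", 250),
            ("demo_c", "clone_c", "AATT", "TTAA", 280),
        ],
    )
    conn.executemany(
        "INSERT INTO map_positions VALUES (?, ?, ?)",
        [("demo_a", 1, 12.5), ("demo_b", 1, 3.0), ("demo_c", 2, 40.0)],
    )
    return conn


def markers_in_group(conn, linkage_group):
    """Return (marker, position_cm, product_bp) rows for one group, in map order."""
    cursor = conn.execute(
        "SELECT m.marker, m.position_cm, p.product_bp "
        "FROM map_positions AS m JOIN primers AS p ON p.marker = m.marker "
        "WHERE m.linkage_group = ? ORDER BY m.position_cm",
        (linkage_group,),
    )
    return cursor.fetchall()


conn = open_demo_db()
for row in markers_in_group(conn, 1):
    print(row)
```

Keeping map positions in their own table means a marker can be sequenced and genotyped before it has a position, which matches the order of the steps in marker generation.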

Demonstration of Web site:
(URL: http://zebrafish.mgh.harvard.edu/mapping/ssr_map_index.html)

This site offers two display options:

(a) graphic display of all markers in a linkage group (or fragment thereof).
(b) detailed report on a single marker.

Option (a) dynamically generates a linkage map based on information in the database; i.e., as soon as a marker has been mapped and added to the database, it will appear on the Web site when someone pulls up that linkage group. Thus, the maps are always up to date. The linkage group figures are clickable image maps, so users can retrieve detailed information on any marker by clicking on it in the map figure. This displays a detailed report (as in option (b)), which offers a choice of information: sequence information, BLAST results, characterization information on different strains, and detailed mapping information. The report also includes links to the relevant sequence entries in NCBI and to map marker entries in ZFIN.
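The clickable image map in option (a) can be illustrated with a short sketch. Our site uses Perl CGI scripts; the Python below is only an analogy, and the function name, the pixel layout, and the "marker_report" link target are all hypothetical.

```python
from html import escape
from urllib.parse import quote as url_quote


def linkage_group_image_map(name, markers, px_per_cm=4, row_height=12, width=200):
    """Build an HTML image map with one clickable rectangle per marker.

    markers is an iterable of (marker_name, position_cm) pairs, typically
    the result of a database query for one linkage group.
    """
    areas = []
    for marker, position_cm in sorted(markers, key=lambda pair: pair[1]):
        top = int(position_cm * px_per_cm)  # vertical pixel offset on the figure
        coords = f"0,{top},{width},{top + row_height}"
        # Hypothetical report URL; the marker name is URL-encoded, then HTML-escaped.
        href = escape(f"marker_report?name={url_quote(marker)}", quote=True)
        alt = escape(marker, quote=True)
        areas.append(f'<area shape="rect" coords="{coords}" href="{href}" alt="{alt}">')
    map_name = escape(name, quote=True)
    return f'<map name="{map_name}">\n' + "\n".join(areas) + "\n</map>"


print(linkage_group_image_map("LG1", [("demo_b", 10.0), ("demo_a", 2.5)]))
```

Because the map is rebuilt from the marker list on every request, a newly mapped marker becomes clickable as soon as it is in the database.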

New information will be flagged according to the date it was added to the map. So far, we have deposited 2,600 sequences; that includes 1,800 sequences for markers and 800 other EST sequences that we are not calling markers.


Progress report of William Talbot, Stanford University
Project: Characterizing the Zebrafish Genome
Grant Number: R01DK55378

Stated aims:

1. Constructing a framework map in a homozygous diploid mapping panel by assigning 500 publicly available simple sequence length polymorphisms (SSLP; CA-repeat) markers. To be completed by end of first year.

2. Genetically mapping 3,000 genes in a homozygous diploid mapping panel by scoring single-strand conformational polymorphisms (SSCPs) in 3'UTRs and other nonconserved regions. First-year goal: 500.

3. Implementing informatics: Streamlining data management and allowing rapid public access to map information generated by the project, including comparative analysis, by means of a WWW interface.

Update:

We are working to create an integrated linkage map for the zebrafish with our collaborators: (1) John Postlethwait at the University of Oregon; (2) for informatics, Ruben Abagyan's group at NYU (now relocated to Scripps); and (3) Ron Davis' group at the Stanford DNA Sequencing and Technology Center.

Rationales include:

1. Functional genomics. A dense gene map for the zebrafish will facilitate the identification of candidates for mutations.

2. Comparative genomics. Genes are uniquely suited markers for comparing the structures of vertebrate genomes. Chromosomal segments conserved among vertebrates can be identified by comparing locations of genes in the zebrafish and their counterparts in humans.

Specific goals and progress on them:

1. Framework map construction. We have assembled a mapping panel with 47 homozygous diploid individuals (heat shock progeny from 2 C32/SJD F1 females) and collected enough genomic DNA to score more than 10,000 markers. The Postlethwait lab has distributed the panel to three additional mapping labs (Steve Johnson, Len Zon, Dave Beier). So far, 189 SSLP markers from the Fishman group have been scored on the panel. We expect to complete the framework map by scoring ~300 more SSLPs by fall 1999.

2. Mapping 3,000 genes and ESTs by SSCP. We have scored more than 140 SSCPs linked to genes and ESTs. 99 of these have been assigned tentative map positions; analysis of the rest is in progress. We plan to score at least 350 additional SSCPs by fall 1999.

We have also provided more than 600 of the primer sets we have synthesized to the Haffter group radiation hybrid mapping project. They have scored many of these markers, enabling a straightforward comparison of the genetic and physical maps.

3. Informatics tools: We have completed an SQL database for target selection and primer design. The database is accessible to project members via the WWW (http://saturn.med.nyu.edu:8080/zfish/pub/). We have designed primers for more than 2000 genes and ESTs in GenBank, of which about half have been synthesized.

We have also developed a tool for semi-automated phylogenetic comparisons. Starting with a mapped sequence, the system does BLAST searches, retrieves the 50 closest neighbors, and assembles a phylogenetic tree. At that point, we analyze the trees and decide which human genes are likely orthologues, or closest counterparts, to the zebrafish genes. Our comparative analysis has identified 28 chromosomal segments conserved between zebrafish and human.

We envision two protocols for data release.

1. Refined maps extensively checked for discrepancies. We will release the first of these (with 500+ markers) by July 1999 and generate a new version every 4-6 months thereafter.

2. Between releases of refined maps, we will make our complete map data set available for download in MapManager format. This will allow any user to take the data and generate their own map using simple features of the MapManager software. These interim releases will contain preliminary map assignments, so it will be the responsibility of users in the community to evaluate the map data and decide which assignments are reliable enough for their purposes.

We will try to make it clear on the web site that preliminary assignments are not as reliable as assignments in the refined maps, for which all discrepant data points are identified and re-tested. We could devote more effort toward releasing refined maps more frequently, but this would divert effort from mapping additional genes. We hope this two-tier release system will provide rapid access to the map data for sophisticated users and at the same time allow us to release refined maps at a reasonable frequency.

Regarding integration of genetic maps: We have created a new haploid mapping panel (now published in Genome Research 9: 334-347). We scored 389 polymorphisms for a total of more than 18,000 genotype assays (about 10 percent of the scope of the current heat shock diploid mapping project). These polymorphisms included 104 new genes/ESTs, 217 MGH SSLP markers, and 53 previously mapped genes. This facilitates comparison between the SSLP and gene maps because there are now quite a few common markers.

Recently we have mapped about 20 genes from the Thisses in Strasbourg. Their group is systematically examining expression patterns of clones in cDNA libraries, and they have provided 3' EST sequences for genes with interesting patterns. We plan to expand this mapping effort as part of our project, with the aim of adding map positions to the Thisses' database of expression patterns and sequences.


Progress report of Len Zon, Boston Children’s Hospital
Project: Construction of a Genetic Linkage Map of Zebrafish
Grant Number: R01DK55381

Stated aims:

1. Developing an RH map of the zebrafish genome.

A. Comparing four RH panels for retention frequency and resolution.

B. Constructing an anchored framework map of microsatellite markers and cloned cDNAs on the RH panel with the most appropriate resolution.

C. Positioning 5,000 EST markers on the RH panel.

2. Distributing information and RH information by means of a World Wide Web site.

Update:

Members of the team:
Marc Ekker
In charge of panel mapping: Yi Zhou

Regarding the RH panel mapping project: Many decisions will be made over the next 2 months about which panel to use and how the project will mature.

We will have to choose between the Ekker panel and the Goodfellow panel.

Why do we need radiation hybrid panels? The real reason is to establish linkage to the zebrafish genes and candidates.

The goal: People in the community will want to know whether they have a candidate for the mutation. We can take candidate genes and put them on the panel, and integrate to see whether co-localization occurs with a mutant gene.

We now have two positional cloning projects in which the RH panels have helped. In one of these cases we have actually done a chromosomal walk and found a candidate gene very close to the linked marker. When we sequenced the candidate gene, we found a stop codon, indicating that it is clearly the right gene.

We'd like this to happen more and more frequently. In other words, the effort is to make the RH panel accessible to the community so that they won't have to do too many positional cloning projects.

Update (Ekker):

The panel we made used zebrafish AB9 as donor and mouse B78 as recipient.

We first made some tests with varying doses of radiation, since this influenced the retention and the average fragment size in the panel. We did preliminary characterization of three panels, produced at doses of 3,000, 4,000, and 5,000 rads, and looked for retention rate and fragment size.

From there we created what we felt would be an optimal panel, with retention rate as the determining factor. This panel of 93 lines was extended so that we had at least 10 mg of DNA per line, with possibly one or two exceptions. This panel has been distributed to the community. (Funded under contract with the National Institute of Child Health and Human Development (NICHD).)

Since the last meeting, we met for genotyping of this RH panel. It was a collective effort, obtaining data from my lab, as well as from Zon, Johnson, Dawid, Hudson in Montreal, and others.

We tested around 1,200 markers. The overall retention was 22 percent, varying among linkage groups from the lowest, linkage group 12 (13 percent), to the highest, linkage groups 20 and 14 (36 percent).
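For readers unfamiliar with the term, the retention of a marker is the fraction of hybrid lines in which that marker is detected, and the overall figure is the average across markers. A minimal sketch, assuming a common text encoding of typing results ('1' retained, '0' absent, '?' not scored); the encoding and the function names are assumptions for illustration, not a description of the files we exchanged.

```python
def retention(vector):
    """Fraction of scored hybrid lines that retain the marker.

    vector is a string with one character per hybrid line:
    '1' = marker detected, '0' = marker absent, '?' = not scored.
    """
    scored = [call for call in vector if call in ("0", "1")]
    if not scored:
        raise ValueError("marker has no scored lines")
    return scored.count("1") / len(scored)


def mean_retention(vectors):
    """Average retention across a collection of marker vectors."""
    values = [retention(vector) for vector in vectors]
    return sum(values) / len(values)


# Made-up vectors for three markers typed on six hybrid lines.
example_vectors = ["1100?0", "010000", "111000"]
print(f"mean retention: {mean_retention(example_vectors):.2f}")
```

Grouping the vectors by linkage group before averaging gives per-group figures like those quoted above.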

We had the opportunity to compare these with the Goodfellow panel, and for many of the linkage groups there seems to be a good correlation. But some linkage groups are better represented in our panel, and some are better represented in the Goodfellow panel (e.g., linkage group 5).

All of the analysis was done by Igor Dawid's group using RHMAPPER. (Note that analysis of the Goodfellow panel was done by another program, called SA Mapper.)

From these data we determined the average fragment size at 14.8 Mb (1 cR = 148 kilobases). (The average fragment size in the Goodfellow panel is about three times smaller.)

Retention was about the same: 22 percent for ours, 18 percent for the Goodfellow panel. Together with the fragment sizes, this confirms our thinking in September: our panel probably contains a smaller number of larger fragments, and the Goodfellow panel a larger number of smaller fragments.

With the data, we established a framework map with 703 markers (total map size, 11,501 centirays [cR]). You can calculate potential resolution to about 750 kb.
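The map units convert to physical distance through the panel-specific ratio quoted above (1 cR = 148 kb). The sketch below only applies that ratio; it assumes the ratio holds uniformly across the map, which is a simplification, and the implied physical length it prints is a derived figure, not one reported by the groups.

```python
KB_PER_CR = 148  # from the report: 1 cR = 148 kilobases on this panel


def cr_to_kb(centirays):
    """Convert a map distance in cR to kilobases."""
    return centirays * KB_PER_CR


def kb_to_cr(kilobases):
    """Convert a physical distance in kilobases to cR."""
    return kilobases / KB_PER_CR


framework_map_cr = 11501      # total size of the framework map
average_fragment_kb = 14800   # 14.8 Mb average fragment size

# Physical length implied by the framework map, in megabases.
framework_map_mb = cr_to_kb(framework_map_cr) / 1000

# Average fragment size expressed in map units.
average_fragment_cr = kb_to_cr(average_fragment_kb)

print(f"framework map: about {framework_map_mb:.0f} Mb")
print(f"average fragment: {average_fragment_cr:.0f} cR")
```

The average fragment comes out at 100 cR, which is consistent with the way the two figures (14.8 Mb and 148 kb per cR) are quoted together.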

We tested about 300 EST sequences or CA repeats, and cloned cDNA. So far, the coverage (the percentage of times we were able to determine map position for these sequences) is around 87 percent. (The Goodfellow panel's coverage is 83-84 percent.)

In progress:

1. Establishing a Web site where people can see the data and also utilize the framework map; collaborating closely with ZFIN.

2. In the coming 3 months, planning to work on more weakly covered regions of the RH map (i.e., either a lower density of markers, or where the linkage between markers doesn't give as strong a score as others).

3. Making more detailed comparisons between the Goodfellow (or Research Genetics) panel and ours, among other things by analyzing both with the same program (whether RHMAPPER or SA Mapper).

The groups working on the RH panel are working very closely together. We have had at least one phone conference including NIH members. The goal is to get data analyzed in a way that we can make sense of it, to make a choice as to how to move further.

Our lab's major contribution is that we have done seven chromosomal walks, so we have the ability to actually test the resolutions of the panels. We have typed four of those walks. We are happy with the data.

The one walk for which we have physical data as well as genetic data is for the sauternes gene.

If you look at the Research Genetics panel and at the marker order it generates, that order matches the order we got from the genetic data and from the physical distances. If you look at the Marc Ekker panel, there are inversions, so it is not exact. In other words, for three of the four walks we have looked at, the Goodfellow panel predicts the order more accurately than does the Ekker panel (23.8 percent).

However, for one of the walks, the Research Genetics panel bunches all these markers in roughly a 200 kb region, but the Ekker panel can resolve them in the correct genetic order. In short, it is good that we have two panels for positional cloning projects.

Although it is great to have the two panels for positional cloning projects, we are going to have to make a judgment call in choosing the panel on which to place 5,000-10,000 ESTs.

(Re genetic order: Research Genetics panel, 12.8 percent; Marc Ekker panel, 14.7 percent)

The data are almost available. In theory, we will analyze the data with the same program and will use robots in 384-well format, but the system will not be set up for another month or two.


Progress report of Stephen Johnson, Washington University School of Medicine
Project: Integrated Zebrafish Genomic Resources
Grant Number: R01DK55379

Stated aims:

1. Generation of expressed sequence tags (ESTs) from various tissues and stages of zebrafish development.

A. Oligonucleotide hybridization fingerprint clustering of 278,000 independent zebrafish cDNA clones from various tissues and developmental stages to be analyzed to identify clusters, each likely to represent a different zebrafish gene.

B. Generation of 5´ and 3´ sequencing reads from representative cDNAs of up to 50,000 different clusters.

2. Sequence tagged site (STS) development from EST sequence and generation of a radiation hybrid map.

A. Generation of 7,500 STS markers from 3´ EST reads.

B. Genotyping of 2,500 EST-based STSs on a radiation hybrid (RH) panel. Provision of up to 5,000 EST-based STSs for collaborating RH typing projects (Len Zon) to generate a 10,000-marker RH map. Markers genotyped on the RH panel to include markers identified as SNPs and genotyped on meiotic panels (i.e., aim 3), allowing maximal integration between genetic and physical maps.

C. Improvement and maintenance of inbred genetic strains.

Update:

A. Informatics

Everyone has access to the data as soon as we do, and can visit our home page (Washington University Zebra Fish Genome Resources) and link to our EST project. We do not yet have the RH mapping project online or have access to a panel, but in a few weeks we will have a link to an integrated map.

With NCBI or BLAST, you can search by sequence, keyword, or clone name for what you might be interested in, but with NCBI it is actually very difficult to browse. We developed a way to browse groups of sequences on the web. This lets users window-shop for possibly interesting clones, or detect possible errors in submission that might interfere with accurate clone retrieval. Early on, we found that when we did our annotation and assigned submission numbers corresponding to well positions, the 3' reads occasionally did not match the 5' reads in register. Instead, they may correspond to reads apparently a few wells off, or there may be no correspondence between 3' reads and 5' reads. This can result from lane tracking errors on the sequencing gel, or alternatively from labeling errors during sequencing chemistry. Our browsing mode gives zebrafish researchers a chance, prior to ordering, to check that a clone is likely to correspond to its archived well. We have recently resequenced ~2% of the EST project to identify lane tracking or labeling errors that affect large numbers of clones, which will allow the appropriate corrections to be made for more accurate clone retrieval.

B. EST Project

Total estimated zebrafish ESTs in database: 13,252 (100 percent):

Fishman (heart)	1971	15%
Gong (Singapore)	1080	8%
Talbot (NYU)	180	1%
WashU	10021	76%

Total	13252	100%
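The percentage column can be recomputed from the counts, which sum exactly to the stated total. A minimal sketch (the dictionary simply restates the counts in the table):

```python
# EST counts per source, as listed in the table.
est_counts = {
    "Fishman (heart)": 1971,
    "Gong (Singapore)": 1080,
    "Talbot (NYU)": 180,
    "WashU": 10021,
}

total_ests = sum(est_counts.values())

# Share of the database contributed by each source, rounded to whole percent.
shares = {source: round(100 * count / total_ests) for source, count in est_counts.items()}

for source, percent in shares.items():
    print(f"{source:<18}{est_counts[source]:>7}{percent:>5}%")
print(f"{'Total':<18}{total_ests:>7}")
```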

Currently we are shipping about 700 ESTs a week from NCBI to our own databases. There is about a 1-week lag; we are working on reducing it.

Now that we have done some sequencing, we can ask how well the project is working. If you did not precluster the library or normalize it somehow, you would find you were sequencing some of the highly expressed genes over and over again; i.e., the problem in the EST project is to remove the redundancy. That is what Matt Clark's project in Berlin is doing; what we are asking is how well it works.

Example: the fin:

About 3,500 reads, about 13 percent annotated as ribosomal protein, similar to Talbot's project (about 16 percent). By comparison, the fingerprinting library we have been using has only about 3 percent ribosomal protein. Some of these proteins are duplicated in the library (hit 2 or 3 times). So the project isn't working perfectly, but pretty well. We are happy enough with it for the next year.

We are starting with two libraries for the first part of the project: a late somitogenesis zebrafish embryo library and an adult liver library. After that, the fingerprint project will add additional, novel clones from a shield-stage library, fin regeneration libraries, kidney, brain, and nose. We would also like to find some other stages, e.g., when organogenesis is getting underway (2-3 days of embryonic development).

In part, the choice of libraries was up to Matt Clark: it was clear we needed good representation at the early embryo stage, since most people in zebrafish are working on the early embryo. Others were just what we could get into the pipeline at the beginning of the project. We found some good brain libraries, for instance, but we do need more. Part of the project is to get more into the pipeline.

C. Regarding RH panels

The goal is to type a lot of the genes on an RH panel in order to create a high density of markers, and so that we know genes in regions of chromosomes that will give us ideas to send.

When we compared two panels on a fairly small sample, we found 25.4 percent retention of markers on the Ekker panel and about 13-14 percent on the Goodfellow panel. Overall, regarding the whole genome, Goodfellow has 18.3 percent retention, and Ekker 20.7 percent (1,000 markers). (That is not a random distribution.)

More recently on this panel, we have been typing ESTs (not mapping, because we do not yet have access to the data), and finding 23.4 percent retention.

In addition, I've been told that approximately 90 percent of ESTs we're typing are actually placed on the map and are located somewhere with high confidence. With the Goodfellow, the figure is more like 80 percent.

We have good retention, good reliability, and, likely, good resolution. We will probably use the Ekker panel.

Our contribution to the genotyping effort so far: We have to keep things going through the "factory"; we have genotyped 59 CA repeats, 34 genes, and 125 ESTs, for a total of 218 markers.

We are currently funded for 800 markers a year, so we are a little behind schedule. But we are now typing 20 markers a week. With a little more money, by the end of the year we could probably scale to 30 (in order to meet our original goal of 1,800 markers a year).

How we present the data:

At the last meeting on Non-mammalian Model Systems, many of us who do genomics in zebrafish were "politely reprimanded" for not integrating all the different maps, so I have been working hard over the past month to integrate them.

Progress report:

365 STSs on the haploid map.
3,077 cM.
215 SSLP/SSR.
150 genes on this panel from John Postlethwait’s lab and others (120 from Washington University).

We will integrate gene map and Massachusetts General Hospital (MGH) map; it, in turn, will give a framework of markers for presenting the data in a way that you can look at the map and say, "I can now draw information from other projects as well." (Time estimate: in about 3 weeks.)


Progress report of Marnie Halpern, Carnegie Institution of Washington
Andreas Fritz, Emory University
Project: Generation of a Deletion Panel for the Zebrafish Genome
Grant Number: R01DK55390

Stated aims:

1. Collecting, preserving, and cataloguing existing deficiency and translocation strains.

2. Recovering and characterizing recently isolated and newly gamma-ray-induced deficiencies:

A. Mapping existing potential deletion mutants.
B. Screening for new mutations.

3. Localizing and determining the extent of gamma-ray-induced deficiencies for assembly of a deletion panel for the zebrafish genome:

A. Refining the existing deletion map.
B. Mapping already recovered mutations.
C. Mapping new gamma-ray-induced mutations.
D. Assembling the deletion DNA panel.

4. Correlating expressed sequences with mutant phenotypes:

A. Cataloguing deficiency phenotypes.
B. Retrieving deficiencies from cryopreserved sperm.

Update:

1. We have obtained and recovered carriers from a number of the Oregon lines.

Our first aim was to collect and preserve existing deletion strains, which were scattered among different stocks/labs. A lot of these strains are not trivial to work with, because many are carried as balanced reciprocal translocations that exhibit non-Mendelian segregation frequencies. Deletion phenotypes are segregated out upon haploid production.

2. We have obtained DNA for 13 previously identified deletions (i.e., from haploid mutant phenotypes in sufficient amount that it will end up part of a DNA panel that can be used in typing).

3. We have isolated DNA from 21 newly identified deletions.

We have carried out five mutagenesis screens so far (squeezed about 500 fish); 102 females have been productively screened, i.e., we obtained at least 50 embryos from each for phenotypic analysis at 24 hours. Out of these, there are more than 43 potential new lines. We have already recovered 12 with diploid phenotypes in the F1. We recover potential deletions in the next generation by back-crossing to the initial mother or by F1 blind intercrosses. The quickest and easiest way of recovering deletions is by producing haploids from F1 females.

Summary of screen phenotypes:

Largest class:	CNS necrosis	(33)	(localized or global brain degeneration)
	Tail	(19)
	Early short axis	(10)
	Eyes	(6)
	General necrosis	(4)

Sometimes in the initial haploid screen you don't realize all of the phenotypes that are present; thus, new phenotypes (and hence deletions) can be recovered in the next generation.

In our screen, we are only using gamma-irradiated females. If zebrafish colleagues are interested in screening for specific mutations, we're willing to send them gamma-irradiated males and they can return recovered deletions for inclusion in the panel. We have already sent gamma-irradiated males to several other labs. For example, Dr. Solnica-Krezel at Vanderbilt was interested in finding a deletion uncovering the bozozok mutation to confirm that it was a null allele (described in Fekany et al., 1999). She returned an allele to our lab that she had identified and mapped.

In order to give out deletion strains to the community in the future, it will be important to have a catalogue of what the mutant phenotypes look like. We are documenting haploid and diploid phenotypes by digital imaging.

4. We have commenced sperm freezing.

5. We have begun to assemble multiplexed sets of primers.

Andreas Fritz has been working out procedures to multiplex either previously mapped genes or genes of interest that have not been mapped yet. He is particularly focusing on using markers for chromosomal regions that we do not already have deletions for. We are also in the process of arraying Z-markers so we will be able to take an unknown deletion and run a series of markers on it to map it. We currently have prepared DNA from over 20 new deletions that he is in the process of mapping. We would like to come up with a quick and non gel-based method of mapping deletions (e.g., using fluorescent primers that will give a simple "plus" or "minus").
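The logic of mapping a deletion from a series of "plus" or "minus" marker results can be sketched as follows. This is an illustration only: the function name and marker names are hypothetical, and it assumes the markers are already ordered along one linkage group and that a "minus" means the marker failed to amplify from the deletion DNA.

```python
def deleted_intervals(ordered_markers, amplified):
    """Find runs of consecutive markers that fail to amplify.

    ordered_markers: marker names in map order along one linkage group.
    amplified: dict mapping marker name to True ("plus") or False ("minus").
    Returns a list of (first_missing, last_missing) pairs, one per run.
    """
    runs = []
    run_start = None
    previous_missing = None
    for marker in ordered_markers:
        if not amplified[marker]:
            if run_start is None:
                run_start = marker  # a new run of missing markers begins
            previous_missing = marker
        elif run_start is not None:
            runs.append((run_start, previous_missing))  # run ended at the last minus
            run_start = None
    if run_start is not None:
        runs.append((run_start, previous_missing))  # run reaches the end of the group
    return runs


# Made-up results for five markers on one linkage group.
order = ["m1", "m2", "m3", "m4", "m5"]
results = {"m1": True, "m2": False, "m3": False, "m4": True, "m5": True}
print(deleted_intervals(order, results))
```

The flanking "plus" markers on either side of a run bound the deletion, so denser marker sets give tighter breakpoints.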

On getting resources out to the community: We are not ready to establish a formal database; even so, we receive deletion requests from researchers around the world. We send back either fish or DNA (if available).

Responding to requests can be "incredibly labor-intensive" and we're not at the stage where we can make this a general service. In fact, the Halpern lab is unlikely to have staff to generalize this service and it would be better dealt with by a stock center. Ideally, we would like to get the deletion strains out of our labs and into a stock center facility. Our principal goal at this point, though we do respond to all requests in a timely fashion, is to get more deletions generated and mapped.


Progress report of Monte Westerfield, University of Oregon
Project: Informatics
Stated aim: To have all information presented in a centralized way.
Grant Number: P40RR12546

Update:

Usage. Use of the ZFIN database has continued to increase to over 80,000 "pages" of information requested each month. Most users are located in the United States, England, and Germany. The ZFIN staff conducted an email survey to learn how users feel about ZFIN. Approximately 15% of the registered users responded. The majority of users rated ZFIN as "very" or "somewhat" useful.

Contents.

ZFIN currently contains:

Record Type	No. of Records
Community Information
Person	1740
Lab	200
Company/Supplier	35
Publications	1991
Mutants and Phenotypes
Mutants	1915
Images	1851
Genomics
Meiotic Panels	3
Rad. Hybrid Panels	0
Anonymous Markers (RAPD, SSLP, AFLP)	3138
Genes	326
ESTs	43
Mutants	178

Release of confidential data. Based on discussion at the last awardees meeting, the ZFIN staff developed a method for information to be made public in an anonymous way. This allows researchers to post information about mutants and mapped genes prior to publication, thus encouraging collaborations without jeopardizing publication priority. The map positions of most genes and mutants from the Tuebingen laboratory are now shown on ZFIN under "anonymous" names with an email contact to obtain more information.

Development of tools for entering and viewing the genetic map. Mapped genes and mutants are now completely integrated with anonymous map markers. Thus, a user can search on a gene and get information about the gene, including both primary information and information on how it was mapped (i.e., data that support the map assignment) and a tabular listing of linked markers. The long-term goal is to provide graphical representations of the maps. However, given current financial constraints, effort to date has been directed toward getting the data into ZFIN and making it available, albeit in a simple form. The gene and genomic data in ZFIN will be released publicly in April.

Submission of data. The laboratories funded by the Zebrafish Genome Initiative have been working closely with the ZFIN staff to develop methods for bulk submission of data. Meiotic maps have been submitted from the Fishman, Postlethwait and Talbot laboratories. To date, no RH data has been submitted. However, the ZFIN staff has been working with the Ekker and Dawid groups to develop tools for bulk submission of RH data and for making the RHMAPPER and MAPMAKER data available from ZFIN as flat files. They are also developing tools for automatic submission of data from the Fishman meiotic map.

Support of orthology/homology relationships. An immediate goal is to link each zebrafish gene record to records of homologues in other species where homologous relationships are known. A longer-term goal is to provide information common to all model organisms in a unified database. As a result of the NIH Model Organism Databases Workshop held in December 1998, the first steps are being taken to identify the minimal set of data shared by all the model organisms and then to provide links to these data in ZFIN. A central database, like NCBI, will probably ultimately handle this information.

