About the Gene Viewer

What does the Gene Viewer display?

The Gene Viewer is a graphical display tool based on the correspondence between dbSNP and RefSeq. Both CGAP SNPs and SNPs submitted to the NCBI dbSNP database by other investigators are shown in the context of transcripts, open reading frames and protein motifs.

Transcripts fall into three categories:

  • reference mRNA sequences maintained by the NCBI Reference Sequence (RefSeq) project
  • full-length mRNA sequences derived by the Mammalian Gene Collection (MGC) project
  • consensus sequences of EST alignments generated by the NCI CGAP-GAI project for use in SNP prediction.

How are SNPs mapped onto transcripts?

A SNP is mapped onto a transcript or consensus sequence by sequence homology. A DNA fragment containing the polymorphism is compared to RefSeq sequences, MGC sequences and EST alignment consensus sequences via BLAST. We consider 98% identity in the region of overlap between the query and target sequences to represent a reliable match.
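
The 98% rule can be sketched in Python as a simple filter over alignment results. The Hit record and its field names are hypothetical (they are not part of BLAST or of the Gene Viewer), and treating 98% as a minimum rather than an exact value is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical summary of one BLAST alignment (not a real BLAST type)."""
    target: str        # label of the matched transcript or consensus
    identities: int    # identical positions within the aligned overlap
    align_length: int  # length of the aligned overlap

MIN_IDENTITY_PERCENT = 98  # threshold stated in the text

def reliable_hits(hits):
    """Keep hits whose identity over the overlap is at least 98%."""
    kept = []
    for hit in hits:
        if hit.align_length <= 0:
            continue  # no overlap, nothing to evaluate
        # Integer arithmetic avoids floating-point edge cases at exactly 98%.
        if hit.identities * 100 >= MIN_IDENTITY_PERCENT * hit.align_length:
            kept.append(hit)
    return kept
```

Because a fragment may pass this filter against several targets, the function returns a list; this is consistent with a SNP mapping to more than one gene.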

Please note that a CGAP SNP predicted in silico to lie in a gene may fail to map to that locus on the basis of BLAST analysis (see below). In addition, a CGAP SNP may map to more than one gene by BLAST.

How are open reading frames and protein motifs predicted?

Predicted products of RefSeq and MGC transcripts are taken from GenBank records.

Prior to ORF prediction, EST assembly consensus sequences were quality masked: segments with an average phrap score below 30, or containing any individual position with a score below 20, were excluded from further analysis. The GeneMark program was then used to identify potential open reading frames in the remaining consensus sequence.
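
The masking step might look like the following Python sketch. The text does not say how segments are delimited, so the fixed-size window used here (and its default of 10 positions) is purely an assumption for illustration; only the two thresholds, 30 and 20, come from the description above.

```python
def mask_low_quality(scores, window=10, min_mean=30, min_base=20):
    """Return one boolean per position: True = keep, False = masked.

    scores: per-position phrap quality scores of a consensus sequence.
    window: hypothetical segment length (not specified in the text).
    """
    keep = [True] * len(scores)
    for start in range(0, len(scores), window):
        segment = scores[start:start + window]
        mean_too_low = sum(segment) / len(segment) < min_mean
        base_too_low = min(segment) < min_base
        if mean_too_low or base_too_low:
            # Exclude the whole segment from further analysis.
            for i in range(start, start + len(segment)):
                keep[i] = False
    return keep
```

A segment is dropped if either condition holds, so a single position scoring below 20 masks its segment even when the segment's average is high.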

HMMER and Pfam version 6.6 profiles were used to identify protein motifs in the predicted proteins. A predicted motif is displayed if it has an E-value <= 0.1 or lies in a protein that contains another copy of the motif with an E-value <= 0.1. If overlapping motifs are identified, only the motif with the lowest E-value is shown.
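
The display rule can be expressed as a short Python sketch operating on the motif hits of a single protein. The MotifHit record is hypothetical and is not HMMER's output format. Coordinates are assumed to be inclusive, and overlaps are resolved greedily from the lowest E-value upward, which is one plausible reading of the rule rather than a confirmed description of the Gene Viewer's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MotifHit:
    """Hypothetical record for one motif match on one protein."""
    name: str      # motif (Pfam family) name
    start: int     # first residue of the match, inclusive
    end: int       # last residue of the match, inclusive
    evalue: float

MAX_EVALUE = 0.1  # threshold stated in the text

def displayed_motifs(hits):
    """Select which motif hits of one protein would be shown."""
    # A motif name qualifies if any copy in this protein meets the threshold;
    # weaker copies of a qualifying motif are then eligible as well.
    trusted = {h.name for h in hits if h.evalue <= MAX_EVALUE}
    candidates = [h for h in hits if h.name in trusted]

    shown = []
    # Visit the best E-values first so an overlap always keeps the lowest one.
    for hit in sorted(candidates, key=lambda h: h.evalue):
        if all(hit.end < s.start or hit.start > s.end for s in shown):
            shown.append(hit)
    return sorted(shown, key=lambda h: h.start)
```

For example, a weak second copy of a motif (E-value 0.5) is kept when a strong copy exists elsewhere in the same protein, while a different motif that overlaps the strong copy is hidden.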

Why isn't my favorite SNP shown?

A CGAP SNP is predicted in the context of an EST assembly and is assigned to the UniGene cluster to which the ESTs in that assembly belong. In contrast, it is mapped to a specific nucleotide in a transcript by sequence similarity. A SNP may not show significant BLAST similarity to the gene from which it was derived for one of several reasons:

  • the SNP is in an alternatively spliced form of the transcript which does not correspond to a RefSeq or MGC sequence
  • the SNP is in a region of low sequence complexity
  • the SNP is in a region of marginal sequence quality (due to poor EST coverage) which leads to spurious mismatches between query and target sequences.

To view a complete list of CGAP SNPs associated with the gene of interest, use the "CGAP SNP Summary" hyperlink.