RefSeqGene Guide

A RefSeqGene sequence represents a subset of mRNAs and coding regions that have been selected to serve as reference standards. The RefSeqGene sequence is also annotated with variation reported to dbSNP and dbVar, and can be analyzed with a variety of tools at NCBI. This guide points to the major methods that can be used to:

  1. Determine how a set of sequences you have generated aligns to a RefSeqGene and compares to variation annotated on that sequence.
  2. Interconvert the location of sequence variation in genomic assembly coordinates and RefSeqGene coordinates.
  3. Calculate the HGVS expressions for large numbers of variation calls, including those based on the RefSeqGene.

There are also presentations and handouts that relate to RefSeqGene.

  1. Our education group posts many fact sheets about NCBI resources.  RefSeqGene is included in the set available from our Education pages.
  2. Recent presentations:
    1. Medical Genetics Resources, October 2011
    2. Introducing RefSeqGene, November 2011

Align sequences to a RefSeqGene, and compare to annotated variation

 

There is now an interface at NCBI to expedite comparing your nucleotide sequence to that of a RefSeqGene.

The interface is based on the standard BLAST submission page, and can be accessed from the Specialized BLAST section of BLAST's home page and from RefSeqGene's home page. This interface provides the following functionality for users of RefSeqGene/LRG.

Step 1. What sequence(s) do you want to align?

You can submit one or more sequences to be aligned to the RefSeqGene/LRG. If you want to align several of your own sequences, precede each one with a description line that starts with ">".

Generate a file with all the sequences you want to align:

>mysequence1
AGATATACTGGGCCCCTGCGCTCAGGAGGCCTTCACCCTCTGCTCTGGTTCATTGGAACAGAAAGAAATG TATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTC CCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGTGTGACCACATATTTTGCAAATTTTGCAT GCTGAAACTTCTCAACCAGAAGAAAGGGCCTTCACAGAGTCCTTTATGTAAGAATGATATAACCAAAAGG AGCCTACAAGAAAGTACGAG

>mysequence2
TGGACGGGGGACAGGCTGTGGGGTTTCTCAGATAACTGGGCCCCTGCGCTCAGGAGGCCTTCACCCTCTG CTCTGGTTCATTGGAACAGAAAGAAATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATT AATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGT GTGACCACATATTTTGCAAATTTTGCATGCTGAAACTTCTCAACCAGAAGAAAGGGCCTTCACAGTGTCC TTTATGAGCCTACAAGAAAGTACGAGATTTAGTCAACTTGCTGAAGAGCTATTGAAAATCATTTGTGCTT TTCAGCTTGACACAGGTTTGGAGTATGCAAACAGCTATAATTTTGCAAAAAAGGAAAATAACTCTCCTGA ACATCTAAAAGATGAAGTTTCTATCATCCAAAGTATGGGCTACAGAAACCGTGCCAAAAGACTTCTACAG AGTGAACCCCGAAATCCTTCCTTGC

>mysequence3
GCACGAGGATTCTTCTGAAGATACCGTTAATAAGGCAACTTATTGCAGTGTGGGAGATCAAGAATTGTTA CAAATCACCCCTCAAGGAACCAGGGATGAAATCAGTTTGGATTCTGCAAAAAAGGCTGCTTGTGAATTTT CTGAGACGGATGTAACAAATACTGAACATCATCAACCCAGTAATAATGATTTGAACACCACTGAGAAGCG TGCAGCTGAGAGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGC ACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATG TAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGC TGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCAGCACAGAAAAAAAGGTAGATCTGAATGCTGAT CCCCTGTGTGAGAGAAAAGAATGGAATAAGCAGAAACTGCCATGCTCAGAGAATCCTAGAGATACTGAAG ATGTTCCTTGGATAACACTAAATAGCAGCATTCAGAAAGTTAATGAGTGGTTCTCCAGAAGTGATGAACT GTTAGGTTCTGATGACTCACATGATGGGGAGTCTGA
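
If your sequences are held in a script rather than in a text file, a multi-sequence file like the one above can be produced with a few lines of Python. This is a minimal sketch, not an NCBI tool: the function name to_fasta, the 70-character line width, and the shortened example sequences are assumptions made for illustration.

```python
def to_fasta(records, width=70):
    """Return FASTA text for a dict of {description: sequence}."""
    lines = []
    for name, seq in records.items():
        # Remove spaces or line breaks picked up during copy/paste.
        cleaned = "".join(seq.split()).upper()
        lines.append(">" + name)
        # Break the sequence into fixed-width lines.
        lines.extend(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return "\n".join(lines) + "\n"


# Shortened, illustrative sequences (the first bases of the examples above).
records = {
    "mysequence1": "AGATATACTG GGCCCCTGCG",
    "mysequence2": "TGGACGGGGG",
}
fasta_text = to_fasta(records, width=10)
print(fasta_text)
```

The returned text can be written to a file for upload or pasted directly into the query box.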

Step 2. Upload the sequences to the RefSeqGene BLAST page

You can copy/paste your sequences into the query box or use the upload function.

You may upload multiple sequences at the same time; you don't have to process them one by one. If you enter multiple sequences, the BLAST result page shows the details of one at a time (select which to view from the menu labeled A in Figure 1), but if you click the Graphics link in the overview section (labeled B in Figure 1), you will see all your query sequences aligned.

Step 3. Review the results

The alignments can be displayed using the standard functions of NCBI's graphic sequence display. If you are not familiar with this tool, please take a look at the video tutorials we provide on YouTube.

Step 4. Compare any differences to known variation

If there is a mismatch, insertion, or deletion when aligning any of your test sequences to a RefSeqGene, you can compare the location of that variation to any annotation on the RefSeqGene using the following steps:

  1. Open the viewer controls labelled Configure.
  2. Open the Variation section options by clicking on Variation in the column at the left.
  3. Click on all selections, then press Configure.
  4. Mouse over any annotation to learn more about what has been submitted to NCBI's databases about variation at that location.
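
Before checking the annotation, it can help to list where your sequence differs from the RefSeqGene. The sketch below assumes you have copied one aligned pair of rows out of the BLAST result as two equal-length strings, with "-" marking gaps; the function name list_differences and the example strings are hypothetical, and this is not how the NCBI viewer itself works.

```python
def list_differences(ref_aln, query_aln, ref_start=1):
    """List (ref_position, kind, ref_base, query_base) for each differing column.

    ref_start is the RefSeqGene coordinate of the first aligned reference base.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")
    diffs = []
    ref_pos = ref_start - 1  # coordinate of the last reference base seen
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            kind = "insertion"  # extra query base, located after ref_pos
        elif q == "-":
            kind = "deletion"
        else:
            kind = "mismatch"
        diffs.append((ref_pos, kind, r, q))
    return diffs


# Hypothetical alignment whose first reference base is at position 101.
diffs = list_differences("ACGT-ACGT", "ACCTTAC-T", ref_start=101)
for d in diffs:
    print(d)
```

The reported positions are the RefSeqGene coordinates to look up in the Variation tracks.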

Interconvert the location of sequence variation in genomic assembly coordinates and RefSeqGene coordinates

NCBI provides a Genome Remapping Service, with a special section dedicated to processing RefSeqGene sequences. There is extensive help documentation associated with the site, which will not be repeated here. Suffice it to say the process is as simple as:

  1. Define the coordinate system with which you are beginning, e.g. GRCh37 (hg19).
  2. Define the sequence set to which you want the coordinates mapped, e.g. RefSeqGene.
  3. Define what you want included in the report.
  4. Upload or paste in your data.
  5. Download a report.
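
To make the idea of interconversion concrete, the sketch below shows the arithmetic for the simplest case only. It assumes the RefSeqGene aligns to the chromosome without gaps and that you already know its placement; the class name Placement and the coordinates used are hypothetical. The Remapping Service handles gapped alignments and real placements, so use it for actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Ungapped placement of a RefSeqGene on a chromosome (1-based, inclusive)."""
    chrom_start: int
    chrom_end: int
    strand: str  # "+" or "-"

    def to_refseqgene(self, genomic_pos: int) -> int:
        """Convert a chromosome position to a RefSeqGene position."""
        if not self.chrom_start <= genomic_pos <= self.chrom_end:
            raise ValueError("position lies outside the RefSeqGene placement")
        if self.strand == "+":
            return genomic_pos - self.chrom_start + 1
        # On the minus strand, RefSeqGene position 1 is the highest coordinate.
        return self.chrom_end - genomic_pos + 1

    def to_genomic(self, gene_pos: int) -> int:
        """Convert a RefSeqGene position back to a chromosome position."""
        length = self.chrom_end - self.chrom_start + 1
        if not 1 <= gene_pos <= length:
            raise ValueError("position lies outside the RefSeqGene sequence")
        if self.strand == "+":
            return self.chrom_start + gene_pos - 1
        return self.chrom_end - gene_pos + 1


# Hypothetical placement: a 1000-base RefSeqGene on the minus strand.
placement = Placement(chrom_start=1000, chrom_end=1999, strand="-")
print(placement.to_refseqgene(1999))
print(placement.to_genomic(250))
```

For a minus-strand placement the alleles must also be reverse-complemented, which this sketch does not do.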

Calculate HGVS expressions and get a report of functional consequence for large numbers of variation calls, including those based on the RefSeqGene

NCBI provides Variation Reporter, which processes reports of locations of variation and returns what is known about variation at those locations according to NCBI's latest annotation. The full report (available by download) gives the location of the variation in multiple coordinate systems, including RefSeqGene. It also accepts input by location on a RefSeqGene.
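
For readers unfamiliar with the form of an HGVS expression, the sketch below builds genomic (g.) expressions for three simple cases. It is an illustration only: the function name hgvs_genomic and the accession NG_000000.0 are hypothetical placeholders, and the code does not apply the HGVS rules for 3' shifting or for describing duplications, which Variation Reporter takes care of.

```python
def hgvs_genomic(accession, kind, start, end=None, ref="", alt=""):
    """Format a simple HGVS g. expression on a reference such as a RefSeqGene."""
    if kind == "substitution":
        return f"{accession}:g.{start}{ref}>{alt}"
    if kind == "deletion":
        if end is None or end == start:
            return f"{accession}:g.{start}del"
        return f"{accession}:g.{start}_{end}del"
    if kind == "insertion":
        # Insertions are described by the two flanking positions.
        return f"{accession}:g.{start}_{start + 1}ins{alt}"
    raise ValueError(f"unsupported variation type: {kind}")


# Hypothetical calls on a placeholder accession.
calls = [
    hgvs_genomic("NG_000000.0", "substitution", 5001, ref="A", alt="G"),
    hgvs_genomic("NG_000000.0", "deletion", 5010, end=5012),
    hgvs_genomic("NG_000000.0", "insertion", 5020, alt="ACG"),
]
for expression in calls:
    print(expression)
```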

Last updated: Sun, 2012-02-26 13:01