RefSeqGene and LRG (Locus Reference Genomic)

Overview / Timeline

The RefSeqGene project began at NCBI early in 2006 in response to a request from Dr. M. Gulley of the College of American Pathologists, reinforced by staff of the CETT program at NIH's Office of Rare Diseases. Discussion soon expanded to curators of locus-specific databases (LSDBs), especially since one of our early collaborators, Dr. Sue Povey, curated TSC2, which was also of interest to CETT. The first RefSeqGene records were released early in 2007, and more than 1000 were public by the end of 2008.

In April 2008, staff of NCBI (RefSeqGene, dbSNP and dbGaP), GEN2PHEN, and EBI met in Hinxton, UK, to discuss how to establish a stable, internationally accepted sequence standard for reporting the position of human variation. The experience of the RefSeqGene project was reviewed, and a proposal for implementation was distributed to LSDB curators. This proposal was published.

The RefSeqGene/LRG collaboration has defined methods to assign LRG accessions to RefSeqGene sequences. Tools were developed at NCBI to convert RefSeqGene sequences to the LRG format, and a database to track the accessions is in place. The LRG project has an established web site (http://www.lrg-sequence.org/) with comprehensive documentation.

How do RefSeqGene and LRG compare?

How RefSeqGene and LRG are similar

When a RefSeqGene record is assigned an LRG accession, it means that the sequence, the definition and labeling of exons, and the definition of product transcript(s) and protein(s) are identical for that version of the RefSeqGene and the LRG. In other words, it will make no difference if variants are reported in LRG or RefSeqGene coordinates.

How RefSeqGene and LRG differ

The format of a RefSeqGene record differs from that of an LRG. Compare, for example, NG_007400.1 (GenBank format, Graphics format) with LRG_1. Also, some RefSeqGene accessions have not yet been assigned an LRG accession. Be assured that the following key identities will be maintained:

  1. the genomic sequence
  2. the location of exons
  3. the sequence of the reference cDNA
  4. the sequence of the protein product

Which RefSeqGenes have been established as LRG accessions?

A RefSeqGene that has been assigned an LRG accession can be identified by the following modifications:

  1. The LRG identifier is displayed in the sequence title, as in (LRG_1).
  2. The transcript identifiers (t1, t2, ...) are provided as a cross-reference to each RefSeq cDNA.
  3. The protein identifiers (p1, p2, ...) are provided as a cross-reference to each RefSeq protein.

A URL to retrieve or display the sequence in GenBank format can be constructed by appending the LRG identifier to http://www.ncbi.nlm.nih.gov/nuccore/, as in http://www.ncbi.nlm.nih.gov/nuccore/LRG_1
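
The URL construction described above can be sketched in a few lines of Python. This is only an illustration: the function name lrg_nuccore_url and the identifier check (the prefix "LRG_" followed by a positive integer) are assumptions made for this example, not part of any NCBI tool.

```python
import re

# Base URL taken from the text above.
NUCCORE_BASE = "http://www.ncbi.nlm.nih.gov/nuccore/"

# Assumed identifier shape for this sketch: "LRG_" followed by a positive integer.
_LRG_PATTERN = re.compile(r"LRG_[1-9]\d*")


def lrg_nuccore_url(lrg_id: str) -> str:
    """Return the Nucleotide URL for an LRG identifier (hypothetical helper)."""
    lrg_id = lrg_id.strip()
    if not _LRG_PATTERN.fullmatch(lrg_id):
        raise ValueError(f"not an LRG identifier of the assumed form: {lrg_id!r}")
    # Append the identifier directly after the base URL.
    return NUCCORE_BASE + lrg_id


if __name__ == "__main__":
    print(lrg_nuccore_url("LRG_1"))
```

Validating the identifier before appending it keeps malformed input from producing a URL that silently resolves to something else.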

All LRG sequences can be retrieved from the Nucleotide database by using a range query: LRG_1:LRG_9999[ACCN]
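
For scripted access, the same range query can be placed in a search URL. The sketch below builds the query term shown above and wraps it in an NCBI E-utilities esearch request against the nuccore database. Using E-utilities is an assumption of this example (the text above only describes the query itself), and the helper names lrg_range_term and lrg_esearch_url are hypothetical. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# E-utilities search endpoint (an assumption of this sketch; not named in the text above).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def lrg_range_term(first: int = 1, last: int = 9999) -> str:
    """Build an accession range query such as LRG_1:LRG_9999[ACCN]."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return f"LRG_{first}:LRG_{last}[ACCN]"


def lrg_esearch_url(first: int = 1, last: int = 9999, retmax: int = 20) -> str:
    """Return an esearch URL for the range query (hypothetical helper)."""
    params = {
        "db": "nuccore",                      # the Nucleotide database
        "term": lrg_range_term(first, last),  # the range query from the text
        "retmax": retmax,                     # maximum number of IDs to return
    }
    # urlencode percent-encodes ':' and the square brackets in the term.
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(lrg_range_term())
    print(lrg_esearch_url())
```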

Last updated: Wed, 2011-09-07 16:32