NCBI Reference Sequence (RefSeq)

NCBI Reference Sequences

The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq records are a foundation for medical, functional, and diversity studies; they provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses.

Scope

NCBI provides RefSeqs for taxonomically diverse organisms including eukaryotes, bacteria, and viruses. Additional records are added to the collection as data become publicly available.

Announcements

May 2008: An update for the human CCDS set was released. This update adds 2,151 CCDS IDs, bringing the total to 20,159 consistently annotated coding regions that pass all CCDS QA tests.

November 13, 2007: An update was released for the HIV-human interaction project. This update adds interaction data for the envelope protein.

October 3, 2007: In collaboration with UniProtKB, the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are calculated by the UniProtKB group and will be updated every three weeks to match UniProt's release cycle. The data are available from several sites within NCBI, including Entrez Gene (Reference Sequences section) and the NCBI Protein database (Links menu), and by FTP.

August 23, 2007: An appendix describing the distinction between GenBank and RefSeq is now available in the NCBI handbook.

July 13, 2009: RefSeq Release 36 available for FTP

This release includes:

Proteins:	8,181,910
Organisms:	8,665
Available at:	ftp://ftp.ncbi.nih.gov/refseq/release/

To receive announcements of future RefSeq releases and large incremental updates, please subscribe to NCBI's refseq-announce mailing list.

Announcing the Consensus Coding Sequence (CCDS) database. More information is available at: http://www.ncbi.nlm.nih.gov/CCDS/

BLAST databases: Formatted genomic, mRNA, and protein RefSeq BLAST databases are available for FTP.

Announcing HIV-1 protein interaction data. More information is available at: http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/index.html

Data Access and Availability

RefSeq is accessible via BLAST, Entrez, and the NCBI FTP site. Information is also available in Entrez Genomes and Entrez Gene, and for some genomes additional information is available in the Map Viewer. Special properties have been defined to facilitate Entrez-based retrieval. Also see: Entrez Query Hints
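
As a rough illustration of Entrez-based retrieval, the sketch below builds (but does not send) a search URL restricted to RefSeq records. None of the specifics appear on this page: the E-utilities `esearch.fcgi` endpoint, the `srcdb_refseq[prop]` property, and the `human[orgn]` term are assumptions based on NCBI's public Entrez conventions, and the helper name `build_refseq_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not named on this page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_search_url(extra_term: str, db: str = "nucleotide") -> str:
    """Build an ESearch URL limited to RefSeq records (hypothetical helper)."""
    # Assumption: 'srcdb_refseq[prop]' is the property that restricts
    # results to RefSeq records; it is combined with the caller's term.
    term = f"srcdb_refseq[prop] AND {extra_term}"
    # urlencode percent-escapes the brackets and encodes spaces as '+'.
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


# Example: RefSeq nucleotide records for human (illustrative query only).
url = build_refseq_search_url("human[orgn]")
print(url)
```

The URL is only constructed here, not fetched, so the sketch can be adapted to any HTTP client.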

Distinguishing Features

The main features of the RefSeq collection include:

non-redundancy

explicitly linked nucleotide and protein sequences

updates to reflect current knowledge of sequence data and biology

data validation and format consistency

distinct accession series (all accessions include an underscore '_' character)

ongoing curation by NCBI staff and collaborators, with reviewed records indicated
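
The accession rule in the list above (every RefSeq accession includes an underscore) lends itself to a quick check. The following is a minimal sketch, not an official validator: the function names are hypothetical, the sample accessions are illustrative, and the underscore test is only a necessary condition taken from the rule stated here.

```python
def looks_like_refseq(accession: str) -> bool:
    """Return True if the accession contains an underscore.

    This mirrors the rule stated above; it does not verify that the
    accession actually exists in RefSeq.
    """
    return "_" in accession.strip()


def split_accession(accession: str) -> tuple[str, str]:
    """Split an accession at its first underscore into (prefix, remainder)."""
    prefix, _, remainder = accession.strip().partition("_")
    return prefix, remainder


# Illustrative values only: one RefSeq-style and one GenBank-style accession.
samples = ["NM_000546", "U12345"]
flags = {acc: looks_like_refseq(acc) for acc in samples}
print(flags)
```

A check like this can help separate RefSeq identifiers from other accessions in a mixed list before a lookup is attempted.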

References

Please refer to the Publications page for a full list of articles describing or using the RefSeq dataset. When using the RefSeq database, please cite one of the following:

The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2002 Oct. Chapter 17, The Reference Sequence (RefSeq) Project. Available from http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
Pruitt KD, Tatusova T, Maglott DR
Nucleic Acids Res 2007 Jan 1;35(Database issue):D61-5