Collaborators EBI NCBI UCSC WTSI
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations.

Announcements

CCDS Update Released for Mouse (August 14, 2012)

The NCBI, Ensembl, and Sanger (Havana) annotation of the mouse reference genome (NCBI assembly ID GCF_000001635.20, NCBI annotation release 38.1, Ensembl annotation release 68) was analyzed to identify additional coding sequences (CDS) that are consistently annotated. CCDS data is available on the CCDS web site and FTP site and will become available on the collaborators' genome and/or gene browser web sites according to each browser's update cycle. This update adds 958 new CCDS IDs, reinstates 1 previously withdrawn CCDS ID, and adds 506 genes to the mouse CCDS set. Mouse build 38.1 includes a total of 23,027 CCDS IDs that correspond to 19,945 GeneIDs. See the Releases & Statistics report for details.

Reporting of CCDS Releases (July 12, 2012)

The BuildInfo file in the CCDS FTP site is being updated to add additional information about CCDS releases. This file previously contained the NCBI taxonomy ID and the NCBI build number and version. The new layout for this file adds five columns and reflects a new style for the NCBI build number:

Column | Description
---|---
tax_id | NCBI taxonomy identifier
ncbi_release_number | NCBI build number and version (current; for example "37.3"), or NCBI annotation release number (future; for example "101")
ensembl_release_number | Ensembl release number
assembly_name | The short name of the assembly on which this CCDS release is based
assembly_id | The assembly accession and version
ccds_release_number | CCDS release number
date_made_public | Date when the CCDS release was made public, in YYYYMMDD format
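As an illustration of the seven-column layout above, here is a minimal Python sketch that reads BuildInfo-style records. It assumes the file is tab-delimited and that comment or header lines start with `#`; neither detail is stated in the announcement. The names `BuildInfoRow` and `parse_build_info` are hypothetical, and the sample row is illustrative only (its `assembly_name` and `ccds_release_number` values are placeholders).

```python
import csv
import io
from datetime import date, datetime
from typing import List, NamedTuple


class BuildInfoRow(NamedTuple):
    """One record in the seven-column BuildInfo layout."""
    tax_id: int
    ncbi_release_number: str      # "37.3" style or "101" style, so keep as text
    ensembl_release_number: str
    assembly_name: str
    assembly_id: str
    ccds_release_number: int
    date_made_public: date


def parse_build_info(text: str) -> List[BuildInfoRow]:
    """Parse BuildInfo-style text (assumed tab-delimited, '#' for comments)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and assumed comment/header lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {fields}")
        tax_id, ncbi, ensembl, name, accession, ccds, made_public = fields
        rows.append(BuildInfoRow(
            tax_id=int(tax_id),
            ncbi_release_number=ncbi,
            ensembl_release_number=ensembl,
            assembly_name=name,
            assembly_id=accession,
            ccds_release_number=int(ccds),
            # date_made_public is documented as YYYYMMDD
            date_made_public=datetime.strptime(made_public, "%Y%m%d").date(),
        ))
    return rows


# Illustrative input only; assembly_name and ccds_release_number are placeholders.
SAMPLE = "\n".join([
    "#tax_id\tncbi_release_number\tensembl_release_number\tassembly_name"
    "\tassembly_id\tccds_release_number\tdate_made_public",
    "10090\t38.1\t68\tEXAMPLE_ASSEMBLY\tGCF_000001635.20\t1\t20120814",
])

for row in parse_build_info(SAMPLE):
    print(row.tax_id, row.assembly_id, row.date_made_public.isoformat())
```

Keeping `ncbi_release_number` as text rather than a number avoids losing the distinction between the current "37.3" style and the future "101" style.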
The complete report of CCDS releases and their release numbers is available in the Releases & Statistics report.

Search by NCBI GI (May 10, 2012)

When searching for an NCBI sequence in CCDS by Nucleotide ID or Protein ID, you may now specify a GI or the more general accession identifier. Because a new GI is assigned whenever a sequence changes, this query support makes it easier to identify the CCDS ID and version associated with a specific sequence.

CCDS Update Released for Human (September 7, 2011)

The NCBI, Ensembl, and Sanger (Havana) annotation of the human reference genome (NCBI build 37.3) was analyzed to identify additional coding sequences (CDS) that are consistently annotated. CCDS data is available on the CCDS web site and FTP site and will become available on the collaborators' genome and/or gene browser web sites according to each browser's update cycle. This update adds 972 new CCDS IDs, reinstates 2 previously withdrawn CCDS IDs, and adds 91 genes to the human CCDS set. Human build 37.3 includes a total of 26,473 CCDS IDs that correspond to 18,471 GeneIDs. See the statistics report for details.

See Past Announcements
Overview

Annotation of genes is provided by multiple public resources, using different methods, and resulting in information that is similar but not always identical. The human and mouse genome sequences are now sufficiently stable to start identifying those gene placements that are identical, and to make those data public and supported as a core set by the three major public genome browsers. The long term goal is to support convergence towards a standard set of gene annotations. Toward this end, the Consensus CDS (CCDS) project was established. The CCDS project is a collaborative effort to identify a core set of protein coding regions that are consistently annotated and of high quality.

Access and Availability

Initial results from the Consensus CDS project are now available from the participants' genome browser web sites. In addition, CCDS identifiers are indicated on the relevant NCBI RefSeq and Entrez Gene records and in Map Viewer displays of RNA (RefSeq) and Gene annotations on the reference assembly. CCDS reports can be accessed by following the provided links, or by directly querying the underlying database using the query interface provided at the top of this page. The CCDS dataset is also available by anonymous FTP.

Collaborators

The CCDS set is built by consensus among the collaborating members, which include EBI, NCBI, UCSC, and WTSI. We envision the CCDS set will become more complete as the independent curation groups agree on cases where they initially differ, as additional experimental validation of weakly supported genes occurs, and as automatic annotation methods continue to improve. Communication among the CCDS collaborating groups is an ongoing activity that will resolve differences and identify refinements between CCDS update cycles.

CCDS Identifiers and Tracking

Annotated genes that are included in the CCDS set are associated with a unique identifier number and version number (e.g., CCDS1.1, CCDS234.1).
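The identifier form shown in the examples above (the prefix `CCDS`, an identifier number, a dot, and a version number) can be split into its two parts with a short sketch like the following. The pattern is inferred only from the two examples given here, and the names `CcdsId`, `parse_ccds_id`, and `format_ccds_id` are hypothetical helpers, not part of any CCDS tool.

```python
import re
from typing import NamedTuple


class CcdsId(NamedTuple):
    number: int   # the unique identifier number, e.g. 234
    version: int  # the version number, e.g. 1


# Pattern inferred from the examples "CCDS1.1" and "CCDS234.1".
_CCDS_ID = re.compile(r"CCDS(\d+)\.(\d+)")


def parse_ccds_id(text: str) -> CcdsId:
    """Split a versioned CCDS identifier into identifier and version numbers."""
    match = _CCDS_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a versioned CCDS identifier: {text!r}")
    return CcdsId(int(match.group(1)), int(match.group(2)))


def format_ccds_id(ccds_id: CcdsId) -> str:
    """Render the identifier back into its textual form."""
    return f"CCDS{ccds_id.number}.{ccds_id.version}"


print(parse_ccds_id("CCDS234.1"))
```

Storing the two parts separately makes it straightforward to compare versions of the same identifier number.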
The version number is updated if the CDS structure changes, or if the underlying genome sequence changes at that location. With annotation and sequence based genome browser update cycles, the CCDS set will be mapped forward, maintaining identifiers. All changes to existing CCDS genes are made by collaborative agreement; no single group will change the set unilaterally.

Process Flow and Quality Testing

The CCDS set is calculated following coordinated whole genome annotation updates carried out by the NCBI, WTSI, and Ensembl. Annotation updates represent genes that are defined by a mixture of manual curation and automated computational processing. The main curation groups are the Havana team at the WTSI and the RefSeq annotation group at NCBI. In addition, the manually curated information on chromosome 14 (Genoscope) and chromosome 7 (WUSTL) has been brought in via the Vega resource. The automatic methods are provided by the Ensembl group and the NCBI genome annotation computational pipeline. Curated information is favored over automated information, and the information has to be consistent between the Hinxton (Vega/Ensembl) and NCBI groups and also pass stringent quality control (QC) checks.

The general process flow for defining the CCDS gene set includes:
- compare genome annotation results
- identify annotated coding regions that have identical location coordinates on the genome
- quality evaluation
- remove lower quality CDSs from the core set pending additional review among the collaboration groups.
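The steps above can be sketched in Python as a set intersection followed by a quality filter. This is a toy illustration under stated assumptions, not the project's actual pipeline: each CDS is modeled as a chromosome, strand, and tuple of exon coordinates, "identical location coordinates" is taken to mean exact equality of that record, and the names `Cds` and `consensus_cds`, the `passes_qc` callback, and all coordinates are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple, Set, Tuple


class Cds(NamedTuple):
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # (start, end) of each coding exon


def consensus_cds(
    ncbi: Iterable[Cds],
    hinxton: Iterable[Cds],
    passes_qc: Callable[[Cds], bool],
) -> Tuple[Set[Cds], Set[Cds]]:
    """Return (kept, pending_review) for two annotation sets.

    kept: CDSs annotated identically by both groups that pass QC.
    pending_review: identically annotated CDSs that fail QC.
    """
    # Compare annotation results and keep only identical coordinates.
    shared = set(ncbi) & set(hinxton)
    # Quality evaluation.
    kept = {cds for cds in shared if passes_qc(cds)}
    # Lower quality CDSs are set aside pending additional review.
    return kept, shared - kept


# Invented coordinates, for illustration only.
a = Cds("chr1", "+", ((100, 200), (300, 450)))
b = Cds("chr1", "+", ((100, 200), (300, 460)))   # differs in the last exon end
c = Cds("chr2", "-", ((50, 90),))

kept, pending = consensus_cds(
    ncbi=[a, c],
    hinxton=[a, b, c],
    passes_qc=lambda cds: len(cds.exons) > 1,    # stand-in for real QC tests
)
print(sorted(kept), sorted(pending))
```

In this toy input, `a` is kept, `c` is shared but set aside by the stand-in QC test, and `b` never enters the shared set because only one group annotates it.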
The CCDS set includes coding regions that are annotated as full-length (with an initiating ATG and a valid stop codon), can be translated from the genome without frameshifts, and use consensus splice sites. The number and type of quality tests performed may be expanded in the future, but they currently include consistency in cross-species comparative analysis and analyses of putative pseudogenes, retrotransposed genes, consensus splice sites, supporting transcripts, and protein homology.
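The full-length criteria described above can be approximated on a spliced CDS nucleotide sequence with a few string checks. The sketch below is a simplification and an assumption about how such a test might look: it uses the standard stop codons (TAA, TAG, TGA), treats "no frameshift" as a length divisible by three with no in-frame internal stop, and does not cover splice sites or any of the comparative tests. The function name `cds_problems` is hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_problems(seq: str) -> List[str]:
    """Return a list of problems found in a spliced CDS sequence (empty if none)."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("missing initiating ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame internal stop codon")
    return problems


print(cds_problems("ATGGCCTAA"))
print(cds_problems("ATGTAAGCCTAA"))
```

Returning a list of problems rather than a single boolean keeps the reason for each rejection visible, which suits a workflow where rejected CDSs go back for review.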
Publication

Please use the following citation for CCDS:

The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D. Genome Res. 2009 Jul;19(7):1316-23. PMID: 19498102