Conserved Domains and Protein Classification
 
 
 
What's New
 

To receive e-mail news about changes to the Conserved Domain Database and its associated resources, subscribe to the cdd-announce@ncbi.nlm.nih.gov mailing list by completing a brief form or sending an e-mail message with the word subscribe in the subject line to cdd-announce-request@ncbi.nlm.nih.gov.

 

CDD v3.15

[27 JUN 2016]  A new version of the Conserved Domain Database has been released. Version 3.15 contains 290 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and now mirrors Pfam version 28. A fine-grained classification of the beta lactamase-like metallohydrolases has been added. In addition, the default sort order of conserved domain hits in CD Search has been changed, ranking hits by E-value without giving preference to NCBI-curated models. You can access CDD at https://www.ncbi.nlm.nih.gov/cdd and find updated content on the CDD ftp site at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd. Click on the database statistics at the right to retrieve the subset of records from any source database.
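The effect of the new default sort order can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the accessions are hypothetical and are not taken from CD-Search output. The "curated first" ordering is only an assumed reading of the earlier behavior, inferred from the wording above.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    accession: str   # made-up accession, for illustration only
    evalue: float
    curated: bool    # hypothetical flag: True for an NCBI-curated model


def sort_by_evalue(hits: List[Hit]) -> List[Hit]:
    """New default as described: rank purely by E-value (lowest first)."""
    return sorted(hits, key=lambda h: h.evalue)


def sort_curated_first(hits: List[Hit]) -> List[Hit]:
    """Assumed earlier behavior: curated models first, then by E-value."""
    return sorted(hits, key=lambda h: (not h.curated, h.evalue))


example_hits = [
    Hit("cd00001", 1e-20, True),
    Hit("pfam00002", 1e-35, False),
    Hit("smart00003", 1e-10, False),
]
```

With these example records, `sort_by_evalue` places the external model with the lowest E-value first, whereas `sort_curated_first` keeps the curated model on top.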

 

Updated version of the "rpsbproc" utility is now available

[29 JUN 2015]  An updated version of the "rpsbproc" command line utility for RPS-BLAST is now available from the CDD FTP site: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/. The output generated by the updated version includes a non-redundant list of structural motifs (accession prefix "sd"), eliminating overlapping structural motifs. Additional information about the "rpsbproc" command line utility is provided in the December 4, 2014 announcement of its initial release.

 

Improved consistency of domain annotation

[20 APR 2015]  The CD-Search service now offers two new options that are designed to improve the consistency of domain annotation, based on known domain architectures. The option to "Rescue Borderline Hits" allows you to see hits that have an E-value above the RPS-BLAST reporting threshold (anywhere between 0.01 and 1.0) and that are consistent with known domain architectures (illustrated example). The option to "Suppress Weak Overlapping Hits" suppresses hits that have an E-value close to the RPS-BLAST reporting threshold (between 0.001 and 0.01) but overlap with stronger hits (illustrated example). Additional details are provided in a publication by Derbyshire et al., 2015.
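The two options can be sketched roughly in Python as follows. This is a simplified illustration, not the CD-Search implementation: the `Hit` record, the `fits_architecture` callback, and the rule that any coordinate overlap counts as an overlap are all assumptions. The actual criteria are described in the publication cited above.

```python
from typing import Callable, List, NamedTuple


class Hit(NamedTuple):
    model: str
    start: int
    end: int
    evalue: float


REPORT_THRESHOLD = 0.01   # reporting threshold named in the announcement
RESCUE_LIMIT = 1.0        # upper bound for borderline hits
WEAK_LOWER_BOUND = 0.001  # lower bound of the "weak" E-value range


def overlaps(a: Hit, b: Hit) -> bool:
    """Assumption: any shared residue position counts as an overlap."""
    return a.start <= b.end and b.start <= a.end


def rescue_borderline(hits: List[Hit],
                      fits_architecture: Callable[[Hit], bool]) -> List[Hit]:
    """Keep regular hits, plus borderline hits that fit a known architecture."""
    kept = []
    for h in hits:
        if h.evalue <= REPORT_THRESHOLD:
            kept.append(h)
        elif h.evalue <= RESCUE_LIMIT and fits_architecture(h):
            kept.append(h)
    return kept


def suppress_weak_overlaps(hits: List[Hit]) -> List[Hit]:
    """Drop weak hits (0.001 to 0.01) that overlap a hit with a lower E-value."""
    kept = []
    for h in hits:
        weak = WEAK_LOWER_BOUND <= h.evalue <= REPORT_THRESHOLD
        if weak and any(o is not h and o.evalue < h.evalue and overlaps(h, o)
                        for o in hits):
            continue
        kept.append(h)
    return kept
```

The `fits_architecture` callback stands in for the comparison against known domain architectures, which this sketch does not attempt to model.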

 
 
Database Statistics
 
CDD v3.15, as of 27 June 2016:

 
52,411 total records from all Source Databases
11,474 domains from NCBI CDD curation effort
1,013 domains from SMART v6.0
16,230 domains from PFAM v28.0
4,873 domains from COGs v1.0
10,885 domains from PRK v6.0
4,488 domains from TIGRFAM v15
organized into 3,448 multi-model Superfamilies
 
 
Click on the numbers above to retrieve the domain records from CDD; click on the source database names for additional details.
 

 
News Archive
 
 

CDD v3.14

[28 MAY 2015]  A new version of the Conserved Domain Database has been released. Version 3.14 contains 560 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and contains corrections to some short names for TIGRFAM records as well as updated names and classifications for many models derived from COGs. A fine-grained classification of the Myosin motor domains has been added.

 

CDD v3.13

[09 JAN 2015]  A new version of the Conserved Domain Database has been released. Version 3.13 contains 286 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and now mirrors TIGRFAMs version 15.

 

New post-processing utility is now available for RPS-BLAST

[04 DEC 2014]  A new "rpsbproc" command line utility is now available, as an addition to the standalone version of Reverse Position-Specific BLAST (RPS-BLAST).

Standalone RPS-BLAST ("rpsblast") continues to be packaged with the BLAST executables (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/), as it has been since 2000. For each of your query protein sequences, it lists the conserved domain models that score above a certain threshold (by default, an E-value of 10), sorted by score.

The new "rpsbproc" utility is available from the CDD FTP site: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/. It post-processes the results of local RPS-BLAST searches in order to provide a non-redundant view of the conserved domains found in your protein query sequences, and to provide additional annotation on query sequences, such as domain superfamilies and conserved sites, similar to the annotation provided by the corresponding web services (e.g., the NCBI Batch CD-Search web service at https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). The README file provides additional details about the new "rpsbproc" utility.
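A local RPS-BLAST search of the kind described above might be launched from Python as sketched below. The options used (`-query`, `-db`, `-evalue`, `-out`) are standard BLAST+ command line options; the file names and the database name are placeholders. The sketch deliberately does not set an output format or call "rpsbproc": the input format that "rpsbproc" expects and its own options should be taken from its README.

```python
from typing import List


def build_rpsblast_command(query_fasta: str, database: str, out_path: str,
                           evalue: float = 0.01) -> List[str]:
    """Assemble an rpsblast command line as an argument list.

    All paths and the database name are placeholders chosen by the caller.
    """
    return [
        "rpsblast",
        "-query", query_fasta,    # protein query sequences in FASTA format
        "-db", database,          # pre-formatted RPS-BLAST search database
        "-evalue", str(evalue),   # E-value cutoff (the program default is 10)
        "-out", out_path,         # file to receive the search results
    ]


command = build_rpsblast_command("queries.fasta", "Cdd", "queries.rpsblast.out")
# To actually run the search (requires a local BLAST+ installation):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list, rather than as a single string, avoids shell quoting problems when file names contain spaces.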

 

CDD v3.12

[03 OCT 2014]  A new version of the Conserved Domain Database has been released. Version 3.12 contains 1526 new or updated NCBI-curated domains.

 

CDD v3.11

[15 FEB 2014]  A new version of the Conserved Domain Database (CDD) has been released. Version 3.11 contains 596 new or updated NCBI-curated domain models and now mirrors the most recent Pfam release, version 27. Also, position-specific scoring matrices (PSSMs) have been re-computed for many models in CDD, and frequency tables have been added to the PSSMs. The search databases distributed as part of this release can be used with the composition-based scoring available in recent versions of RPS-BLAST (version 2.2.28 and up), and they remain compatible with previous versions of RPS-BLAST.

 

CD-Search, Batch CD-Search, and CDART display style revised

[12 FEB 2014]  The display style for drawing domain models in CD-Search, Batch CD-Search, and CDART has been revised. The display style is now uniform among those tools, with a given domain model rendered in the same shape and color by all three tools. Additionally, a new display option, "Standard Results," is available in CD-Search and Batch CD-Search, and shows the top-scoring domain model from each source database. The Batch CD-Search graphical display of search results also offers a new "compact mode," which displays the domain architecture of each query sequence on a single line. This display type is particularly useful if you select two or more query sequences from the list and want to compare their domain architectures. All three tools (CD-Search, Batch CD-Search, and CDART) employ the latest version of RPS-BLAST, which, as of version 2.2.28, uses composition-based scoring and abolishes the need to mask out compositionally biased regions in query sequences. Live and pre-computed searches generated by the CD-Search web tool now use these settings, and as a result the domain annotations have changed for a number of protein sequences in Entrez.

 

CDD v3.10

[21 MAR 2013]  A new version of the Conserved Domain Database (CDD) has been released. Version 3.10 contains 1104 new or updated NCBI-curated domain models. Also, position-specific scoring matrices (PSSMs) have been re-computed for a large fraction of the models in CDD, which has slightly affected the resulting sequence annotations. PSSMs are now provided in an extended format: they contain 28 rows instead of 26 and come with intermediate data in addition to the final scoring matrix. The intermediate data will make it possible to directly generate search databases for the current version of RPS-BLAST, for DELTA-BLAST, and for an upcoming new version of RPS-BLAST that supports composition-corrected scoring.

 

CD-Search "specific hits" now include domain models from external sources

[06 AUG 2012]  A specific hit is a high confidence association between a protein query sequence and a conserved domain, resulting in a high confidence level for the inferred function of the protein query sequence. The algorithm for identifying specific hits has been revised to include domain models from external sources. Previously, specific hits were limited to NCBI-curated domains. Now, if domain models from both the NCBI-curated data set and external sources meet a domain-specific threshold, the NCBI-curated domain will still be listed preferentially as the specific hit because it has been annotated with fine-grained evolutionary relationships, conserved sequence blocks, specific functions, and conserved features/sites based on careful review of sequence data, 3D structures, and literature. However, if no NCBI-curated domain meets the criteria for a specific hit, then the top-ranked domain model from an external source will be shown in the CD-Search results concise display if it meets all the criteria for a specific hit. As a result of this change, more sequences in the Entrez Protein database are now annotated with specific functional information.
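The selection rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the CD-Search algorithm: the `Candidate` record is hypothetical, the domain-specific threshold is assumed to be compared against a bit score, and "top-ranked" is assumed to mean the highest score among qualifying models.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    model: str
    curated: bool      # True for an NCBI-curated domain model
    bit_score: float   # assumption: the threshold is a bit score
    threshold: float   # hypothetical per-model (domain-specific) threshold


def pick_specific_hit(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Prefer a qualifying NCBI-curated model; otherwise fall back to the
    best qualifying model from an external source; otherwise return None."""
    qualifying = [c for c in candidates if c.bit_score >= c.threshold]
    curated = [c for c in qualifying if c.curated]
    pool = curated or qualifying
    return max(pool, key=lambda c: c.bit_score) if pool else None
```

Note that a qualifying curated model wins even when an external model scores higher, which mirrors the stated preference for NCBI-curated domains.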

 

CDD v3.09

[01 NOV 2012]  The CDD v3.09 release includes 42 new or updated NCBI-curated domains and now mirrors TIGRFAM v13.

 

CDD v3.08

[17 SEP 2012]  The CDD v3.08 release includes 239 new or updated NCBI-curated domains.

 

CDD v3.07

[06 AUG 2012]  The CDD v3.07 release includes 495 new or updated NCBI-curated domains.

 

CDD v3.06

[29 MAY 2012]  The CDD v3.06 release includes 310 new or updated NCBI-curated domains.

 

Conserved Domain searches now launched for blastx queries

[28 MAR 2012]  Conserved Domain searches are now being launched for all nucleotide queries shorter than 10,000 base pairs submitted to blastx, the BLAST program that translates a nucleotide query sequence in six reading frames and compares each translation against the protein data set. The blastx search results page includes a concise display of the conserved domains found on the translated reading frames, and that graphic links to the corresponding interactive view in the CD-Search tool.

 

CDD v3.05

[23 MAR 2012]  The CDD v3.05 release includes 161 new or updated NCBI-curated domain models and now mirrors TIGRFAM v12.

 

CDD v3.04

[08 MAR 2012]  The CDD v3.04 release includes 166 new or updated NCBI-curated domain models.

 

CDD v3.03

[19 JAN 2012]  The CDD v3.03 release includes 174 new or updated NCBI-curated domain models and now mirrors PFAM v26.

 

CDD v3.02

[07 DEC 2011]  The CDD v3.02 release includes 170 new or updated NCBI-curated domain models and now mirrors TIGRFAM v11.

 

CDD v3.01

[09 NOV 2011]  The CDD v3.01 release includes 298 new or updated NCBI-curated domains.

 

CD-Search now accepts nucleotide sequences as queries

[04 NOV 2011]  The CD-Search tool now accepts nucleotide sequences as queries. It translates them in all six reading frames and searches each protein product against the RPS-BLAST databases. CD-Search will combine the results for all the proteins into a single page, but will only display the translated reading frames that picked up a match in CDD. The help document provides additional details.
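Six-frame translation, as described above, can be sketched in a few lines of Python. The sketch assumes the standard genetic code and marks incomplete or ambiguous codons with "X". The frame labels ("+1" to "-3") are a labeling convention chosen here, and nothing in this example reflects how CD-Search itself implements the step.

```python
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Return the translations of three forward and three reverse frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six protein products would then be searched separately, and only frames with at least one match would be displayed.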

 

CDD v3.00

[28 OCT 2011]  The CDD v3.00 release contains the same conserved domain models as the previous release (CDD v2.32). However, the composition of some superfamily clusters has changed, and the single domain/multidomain status assigned to some conserved domain models has also changed. These changes are due to a slightly revised E-value calculation implemented in the BLAST program suite (including RPS-BLAST), which now uses a new variant of the Finite Size Correction that produces more accurate E-values for short query and/or subject sequences.

 

CDTree now includes the latest version of Cn3D

[18 OCT 2011]  A new CDTree software bundle is now available and includes the most recent version of NCBI's 3D structure viewing program, Cn3D 4.3. CDTree is:

  • a powerful tool to aid in the classification of protein sequences and investigate their evolutionary relationships
  • a web-based helper application used by the CDD on-line search service to permit user interaction with pre-defined protein domain hierarchies
  • an integrated software environment organized to help users assimilate large amounts of biological data from various resources by access to a suite of analysis methods
  • an alignment editor and 3D structure visualization program through its integration with Cn3D 4.3

 

New version of CDART offers new functions & features

[05 OCT 2011]  A new release of the Conserved Domain Architecture Retrieval Tool (CDART) is available and offers new functions and features for finding proteins that have domain architectures similar to a query protein (illustrated example).

CDD v2.32

[02 SEP 2011]  The CDD v2.32 release includes 14 new or updated NCBI-curated domains, suppresses pfam10695, and fixes errors in superfamily clustering.

 

Entrez CDD interface redesign

[25 JUL 2011]  The Conserved Domain Database now has a revised home page, search interface, and search results display, with functions similar to those available in PubMed. Changes include: (a) a streamlined home page with links to related resources; (b) an "Advanced Search" page, which provides the ability to build a query one term at a time, browse the index of any search field, and combine earlier searches; and (c) new search results displays that provide links in the right margin to search filters, related data, and tools.

 

CDD v2.31

[19 AUG 2011]  The CDD v2.31 release includes 292 new or updated NCBI-curated domains and now mirrors SMART v6.0.

 

CDD v2.30

[16 JUN 2011]  The CDD v2.30 release includes 201 new or updated NCBI-curated domains.

 

CDD v2.29

[13 MAY 2011]  The CDD v2.29 release includes 256 new or updated NCBI-curated domains, and now mirrors TIGRFAM v10.1 and Pfam v25.

 

CDD v2.28

[30 MAR 2011]  The CDD v2.28 release includes 611 new or updated NCBI-curated domains and now mirrors TIGRFAM v10.

 

CDD v2.27

[02 MAR 2011]  The CDD v2.27 release includes 357 new or updated NCBI-curated domains.

 

CDD v2.26

[29 DEC 2010]  The CDD v2.26 release includes 124 new or updated NCBI-curated domains as well as the most recent data from PRK.

 

CDD v2.25

[07 OCT 2010]  The CDD v2.25 release includes 88 new or updated NCBI-curated domains.

 

Batch CD-Search

[30 SEP 2010]   A Batch CD-Search tool is now available for the computation and download of conserved domain annotation on large sets of protein queries. Input up to 100,000 protein query sequences as a list of sequence identifiers and/or raw sequence data, then download output in a variety of formats (including tab-delimited text files) or view the search results graphically. See the help document for additional details, including information on using Batch CD-Search for scripted data downloads.

 

CDD v2.24

[09 SEP 2010]  The CDD v2.24 release includes 196 new or updated NCBI-curated domains.

 

CDD v2.23

[29 JUL 2010]  The CDD v2.23 release includes 174 new or updated NCBI-curated domains.

 

CDD v2.22

[26 MAY 2010]  The CDD v2.22 release includes 443 new or updated NCBI-curated domains.

 

CDD v2.21

[13 APR 2010]  The CDD v2.21 release includes 489 new or updated NCBI-curated domains as well as the most recent data from PRK, which now includes domain models for plant-specific (non-chloroplast) proteins, indicated by PLN accession number prefixes.

 

CDD v2.20

[19 MAR 2010]  The CDD v2.20 release includes 107 new or updated NCBI-curated domains, and now mirrors TIGRFAM v9.0.

 

CDD v2.19

[01 FEB 2010]  The CDD v2.19 release includes PFAM v24.0 as well as 532 new or updated NCBI-curated domains.

 

CDD v2.18

[10 DEC 2009]  The CDD v2.18 release includes 489 new or updated NCBI-curated domains.

 

CDD v2.17

[04 JUN 2009]  The CDD v2.17 release includes 484 new or updated NCBI-curated domains, as well as records from a new data source, TIGRFAM, and protozoan domains from the Protein Clusters (PRK) database.

 

Specific Hits

[08 MAY 2008]  The CD-Search tool now shows four types of hits in search results, including specific hits. A specific hit is a high confidence association between a protein query sequence and a conserved domain, resulting in a high confidence level for the inferred function of the protein query sequence.

 

Superfamilies

[08 MAY 2008]  The Conserved Domain Database (CDD) is now organized into superfamilies. A superfamily cluster is a set of conserved domain models that generate overlapping annotation on the same protein sequences. These models are assumed to represent evolutionarily related domains and may be redundant with each other.
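The idea of grouping models by overlapping annotation can be sketched in Python as a single-linkage clustering. This is an assumption-laden illustration, not CDD's clustering procedure: the input tuples are hypothetical, any coordinate overlap on the same protein is treated as sufficient to link two models, and real clustering presumably applies stricter criteria.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[str, str, int, int]  # (protein_id, model, start, end)


def cluster_models(annotations: Iterable[Annotation]) -> List[List[str]]:
    """Group models whose footprints overlap on at least one shared protein."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    by_protein = defaultdict(list)
    for protein, model, start, end in annotations:
        find(model)  # register the model even if it never overlaps another
        by_protein[protein].append((model, start, end))

    for footprints in by_protein.values():
        for (m1, s1, e1), (m2, s2, e2) in combinations(footprints, 2):
            if s1 <= e2 and s2 <= e1:  # footprints share at least one position
                union(m1, m2)

    clusters = defaultdict(set)
    for model in list(parent):
        clusters[find(model)].add(model)
    return sorted(sorted(members) for members in clusters.values())
```

Because the linkage is transitive, two models can end up in the same cluster without ever overlapping each other directly, as long as a chain of overlapping models connects them.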

 
 
 
 
 Revised 21 October 2016