BMC Bioinformatics. 2008 Jan 31;9:74.

Towards an automatic classification of protein structural domains based on structural similarity.

Sam V, Tai CH, Garnier J, Gibrat JF, Lee B, Munson PJ.

Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH, DHHS, Bethesda, MD, USA. vsam@mail.nih.gov

BACKGROUND: Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of an increasing volume of available structures. Automatic methods, such as FSSP or the Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification.
RESULTS: We use DALI, SHEBA and VAST pairwise scores on the SCOP C class domains to investigate a variety of hierarchical clustering procedures. The constructed dendrogram is cut in a variety of ways to produce a partition, which is compared to the SCOP fold classification. Ward's method dendrograms led to partitions closest to the SCOP fold classification. Dendrogram- or tree-cutting strategies fell into four categories according to the similarity of the resulting partitions to the SCOP fold partition. Two strategies that optimize similarity to SCOP gave an average true positive rate (TPR) of 72% at a 1% false positive rate. Cutting the largest cluster at each step gave an average TPR of 61%, which was one of the best results among strategies not making use of prior knowledge of SCOP. Cutting the longest branch at each step was one of the worst strategies. We also developed a method to detect irreducible differences between the best possible automatic partitions and SCOP, regardless of the cutting strategy. These differences are substantial. Visual examination of hard-to-classify proteins confirms our previous finding that global structural similarity of domains is not the only criterion used in the SCOP classification.
CONCLUSION: Different clustering procedures give rise to different levels of agreement between automatic and manual protein classifications. None of the tested procedures completely eliminates the divergence between automatic and manual protein classifications. Achieving full agreement between these two approaches would apparently require additional information.
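
As an illustration of the workflow the abstract describes (cut a dendrogram into a partition, then compare the partition to a reference classification), here is a minimal Python sketch. It is not the authors' code. It assumes a dendrogram stored as nested 2-tuples with string leaves, a hypothetical `cut_largest_first` function for the "cut the largest cluster at each step" strategy, and a pair-counting definition of the true and false positive rates (a pair of domains is "positive" when the reference places both in the same group). The abstract does not say how the rates were computed, so that definition is an assumption, and the domain labels and fold names are invented toy data.

```python
from itertools import combinations


def leaves(node):
    """Return the leaf labels under a dendrogram node (nested 2-tuples, string leaves)."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]


def cut_largest_first(root, n_clusters):
    """Hypothetical cutting strategy: repeatedly split the cluster with the most leaves."""
    clusters = [root]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if isinstance(c, tuple)]
        if not splittable:
            break  # only single leaves remain, nothing left to cut
        biggest = max(splittable, key=lambda c: len(leaves(c)))
        clusters.remove(biggest)
        clusters.extend(biggest)  # replace the cluster by its two children
    return [sorted(leaves(c)) for c in clusters]


def pairwise_rates(partition, reference):
    """Pair-counting TPR and FPR of a partition against a reference labelling (assumed definition)."""
    assigned = {label: i for i, cluster in enumerate(partition) for label in cluster}
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(assigned), 2):
        same_ref = reference[a] == reference[b]
        same_auto = assigned[a] == assigned[b]
        if same_ref and same_auto:
            tp += 1
        elif same_ref:
            fn += 1
        elif same_auto:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy data: five invented domains and an invented two-fold reference.
toy_tree = ((("d1", "d2"), "d3"), ("d4", "d5"))
toy_reference = {"d1": "foldA", "d2": "foldA", "d3": "foldA",
                 "d4": "foldB", "d5": "foldB"}

print(pairwise_rates(cut_largest_first(toy_tree, 2), toy_reference))  # (1.0, 0.0)
print(pairwise_rates(cut_largest_first(toy_tree, 3), toy_reference))  # (0.5, 0.0)
```

In the toy example, cutting into two clusters reproduces the reference exactly, while cutting into three splits one reference fold and halves the pairwise TPR. A real analysis would build the dendrogram from pairwise structural similarity scores with a hierarchical clustering routine rather than writing it by hand.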

PMID: 18237410 [PubMed - indexed for MEDLINE]

PMCID: PMC2267780
