Genome Informatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


106. Protein Fold Prediction in the Context of Fine-Grained Classifications 

Inna Dubchak, Chris Mayor, Sylvia Spengler, and Manfred Zorn  
E. O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720  
ildubchak@lbl.gov 

Predicting a protein fold and implied function from the amino acid sequence is a problem of great interest. We have developed a neural network (NN)-based expert system which, given a classification of protein folds, can assign a protein to a folding class using primary sequence data. It addresses the inverse protein folding problem from a taxonomic rather than a threading perspective. Recent classifications suggest the existence of ~300-500 different folds. The occurrence of several representatives for each fold allows extraction of the common features of its members. Our method (i) provides a global description of a protein sequence in terms of the biochemical and structural properties of the constituent amino acids, (ii) combines the descriptors using NNs, allowing discrimination of members of a given folding class from members of all other folding classes, and (iii) uses a voting procedure among predictions based on different descriptors to decide on the final assignment. The level of generalization in this method is higher than in direct sequence-sequence and sequence-structure comparison approaches. Two sequences belonging to the same folding class can differ significantly at the amino acid level, but the vectors of their global descriptors will be located very close together in parameter space. Thus, using these aggregate properties for fold recognition has an advantage over using detailed sequence comparisons. The prediction procedure is simple, efficient, and incorporated into easy-to-use software. We applied it to fold prediction in the context of two fine-grained classifications, 3D_ALI [1] and the Structural Classification of Proteins, SCOP [2]. In an attempt to simplify the fold recognition problem and to increase the reliability of predictions, we also approached a reduced fold recognition problem, in which the choice is limited to two folds. Our prediction scheme demonstrated high accuracy in extensive testing on independent sets of proteins.
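
The two non-NN steps of the method, the global description of a sequence and the vote among per-descriptor predictions, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The three-way hydrophobicity grouping, the function names composition_descriptor and vote, and the rule that a tie yields no assignment are all assumptions made for illustration; the NN stage that would turn each descriptor into a fold prediction is omitted.

```python
from collections import Counter

# Hypothetical three-way grouping of amino acids by hydrophobicity.
# This grouping is an illustrative assumption, not taken from the abstract.
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CVLIMFW"),
}


def composition_descriptor(sequence):
    """Return the fraction of residues falling in each group, in GROUPS order.

    The result is a global descriptor: it depends on the overall make-up of
    the sequence, not on the order of individual residues.
    """
    seq = sequence.upper()
    counts = {
        name: sum(1 for aa in seq if aa in members)
        for name, members in GROUPS.items()
    }
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no recognized residues")
    return [counts[name] / total for name in GROUPS]


def vote(predictions):
    """Return the fold label predicted most often, or None on a tie or empty input.

    Each element of `predictions` stands for the output of one
    descriptor-specific predictor (assumed here, not implemented).
    """
    tally = Counter(predictions).most_common()
    if not tally:
        return None
    if len(tally) > 1 and tally[0][1] == tally[1][1]:
        return None
    return tally[0][0]


# Two sequences with no residue in common map to the same descriptor vector.
same = composition_descriptor("RKGACV") == composition_descriptor("EDSTLI")
```

In this toy case the two sequences share no residues, yet their descriptor vectors coincide, which is the sense in which aggregate properties generalize beyond direct sequence comparison.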

A WWW page for predicting protein folds is available at URL http://cbcg.nersc.gov 

1. Pascarella, S., Argos, P. (1992). Prot. Engng., 5: 121-137.
2. Murzin, A. G., Brenner, S. E., Hubbard, T., Chothia, C. (1995). J. Molec. Biol., 247: 536-540.


 