Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM


67. Multi-Way Protein Folding Classification Using Support Vector Machines and Neural Networks

C.H.Q. Ding and I. Dubchak

National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

ildubchak@lbl.gov

In bioinformatics research, the multi-class recognition methods employed so far are mostly based on the one-vs-others approach. We investigated two more advanced approaches, the unique one-vs-others approach and the all-vs-all approach, both of which gave increased classification accuracy.
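As a rough illustration of how the one-vs-others and all-vs-all schemes combine two-class classifiers, the sketch below uses a toy nearest-centroid scorer in place of an SVM or NN. The scorer, the function names, and the class labels are all hypothetical and are not taken from this work. The unique one-vs-others variant is not shown because the abstract does not describe its details.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(pos, neg):
    """Toy two-class scorer (a stand-in for an SVM or NN).

    The returned function gives a positive score when x is closer to
    the centroid of `pos` than to the centroid of `neg`.
    """
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)


def one_vs_others(data):
    """One scorer per class (class vs. all the rest); highest score wins."""
    scorers = {}
    for label, pts in data.items():
        rest = [p for other, ps in data.items() if other != label for p in ps]
        scorers[label] = train_binary(pts, rest)
    return lambda x: max(scorers, key=lambda lab: scorers[lab](x))


def all_vs_all(data):
    """One scorer per pair of classes; the class with the most votes wins."""
    scorers = {
        (a, b): train_binary(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }

    def predict(x):
        votes = Counter(a if s(x) > 0 else b for (a, b), s in scorers.items())
        return votes.most_common(1)[0][0]  # ties are broken arbitrarily

    return predict


# Hypothetical three-class toy data in two dimensions.
toy = {
    "A": [[0, 0], [0, 1], [1, 0]],
    "B": [[10, 0], [10, 1], [11, 0]],
    "C": [[0, 10], [1, 10], [0, 11]],
}
print(one_vs_others(toy)([10, 0.5]), all_vs_all(toy)([10, 0.5]))
```

With K classes, one-vs-others trains K classifiers on the full data set, whereas all-vs-all trains K(K-1)/2 classifiers, each on the data of only two classes.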

We analyzed the traditional sensitivity and selectivity measures for multi-class classification from the new perspective of the contingency table in categorical analysis, which provided some insights. These true-positive- and false-positive-based measures are combined and generalized into a new, unique accuracy measure that characterizes the performance of a recognition system more accurately. This measure can be applied consistently and uniformly to all multi-class classification approaches, thus facilitating comparisons between different classification methods.
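To make the contingency-table view concrete, the following minimal sketch builds a table of true versus predicted classes and reads per-class measures from it. It assumes sensitivity means TP/(TP+FN) and selectivity means TP/(TP+FP). The abstract does not give these definitions or the formula for the new accuracy measure, so the overall accuracy shown here (correct predictions divided by all predictions) is only the standard quantity and may differ from the authors' measure. All function names are hypothetical.

```python
def contingency_table(true_labels, predicted_labels, classes):
    """table[t][p] = number of samples of true class t predicted as class p."""
    table = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        table[t][p] += 1
    return table


def per_class_measures(table, cls):
    """Return (sensitivity, selectivity) for one class.

    Assumed definitions: sensitivity = TP / (TP + FN),
    selectivity = TP / (TP + FP).
    """
    tp = table[cls][cls]
    fn = sum(table[cls].values()) - tp          # rest of the row
    fp = sum(table[t][cls] for t in table) - tp  # rest of the column
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, selectivity


def overall_accuracy(table):
    """Standard accuracy: diagonal sum divided by the total count."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[c][c] for c in table)
    return correct / total if total else 0.0
```

Because every multi-class scheme ultimately assigns one predicted class per sample, the same table can be filled in for any of them, which is what allows a single measure to be compared across methods.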

We used the state-of-the-art Support Vector Machine (SVM) together with an earlier neural network (NN) as two-class classifiers. The SVM gives higher accuracy and runs much faster than the NN.

Of the six physico-chemical parameter sets extracted from protein sequences, we found the amino acid composition parameter set to be the most effective for the discriminative methods. The secondary-structure-based parameters are also quite effective. These are followed by parameter sets extracted from hydrophobicity, polarity, van der Waals volume, and polarizability properties.
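As an example of the simplest of these parameter sets, the sketch below computes an amino acid composition vector: the fraction of each of the 20 standard residues in a sequence. The function name, the residue ordering, and the decision to skip non-standard characters are assumptions made for illustration; the abstract does not specify how the authors built their feature vectors.

```python
# The 20 standard amino acids in one-letter code (alphabetical order is an
# arbitrary choice made here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions for a protein sequence.

    Characters outside the 20 standard residues (for example X or gaps)
    are ignored, which is an assumption of this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


# Hypothetical short sequence, used only to show the call.
features = amino_acid_composition("MKVLAAGIVG")
print(len(features))
```

A vector like this has a fixed length regardless of sequence length, so it can be passed directly to a two-class classifier.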

For more information about our projects, see the Web.

