Prediction of Protein Tertiary Structure: Modeling Energy Surfaces, Global Optimization, and High Performance Computing

Teresa Head-Gordon, Lawrence Berkeley National Laboratory

Research Objectives

To predict protein structure in the size range of 1000-2500 atoms, a range that no ab initio prediction method can yet solve reliably. To characterize the energy surface by determining the global minimum and all relevant low-lying minima using sophisticated mathematical optimization techniques.

Computational Approach

For α-helical proteins, we designed an algorithm to (1) predict protein class; (2) make a neural network prediction of secondary structure (helix, sheet, or coil) for each amino acid; (3) optimize the energy surface; and (4) use stochastic perturbation to find the global optimum in a sub-space, where the sub-space is defined as the coil regions predicted by the neural network. We have ported our algorithm to the Cray T3E.
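Step (4) can be illustrated with a minimal sketch. It is not the group's production code: the energy function, the parameter values, and the names toy_energy, local_minimize, subspace_search, and free are all hypothetical, chosen only to show stochastic perturbation followed by local minimization when a subset of coordinates (standing in for the predicted coil regions) is allowed to move and the rest stay fixed.

```python
import math
import random

def toy_energy(angles, targets):
    """Hypothetical rugged energy: per angle, one global well plus two local wells."""
    return sum(
        (1.0 - math.cos(a - t)) + (1.0 - math.cos(3.0 * (a - t)))
        for a, t in zip(angles, targets)
    )

def local_minimize(angles, targets, free, step=0.05, iters=100, h=1e-5):
    """Gradient descent on the free (coil) angles only; all other angles stay fixed."""
    x = list(angles)
    for _ in range(iters):
        for i in free:
            x_plus = x[:]
            x_plus[i] += h
            x_minus = x[:]
            x_minus[i] -= h
            # Central-difference estimate of dE/dx_i.
            grad = (toy_energy(x_plus, targets) - toy_energy(x_minus, targets)) / (2.0 * h)
            x[i] -= step * grad
    return x

def subspace_search(angles, targets, free, trials=100, seed=0):
    """Perturb one free angle at random, re-minimize, and keep the lowest energy found."""
    rng = random.Random(seed)
    best = local_minimize(angles, targets, free)
    best_e = toy_energy(best, targets)
    for _ in range(trials):
        trial = list(best)
        i = rng.choice(free)                      # pick one coil coordinate
        trial[i] = rng.uniform(-math.pi, math.pi) # stochastic perturbation
        trial = local_minimize(trial, targets, free)
        e = toy_energy(trial, targets)
        if e < best_e:                            # accept only improvements
            best, best_e = trial, e
    return best, best_e
```

Restricting the search to the free indices is what keeps each sub-problem small: the fixed coordinates never change, so the number of perturbed dimensions depends only on the size of the coil region, not on the size of the whole chain.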

Accomplishments

In collaboration with University of Colorado researchers, we are developing a global optimization approach based on sampling, perturbation, smoothing, and biasing that has worked successfully on potential energy surfaces of small homopolymers and peptides. We have developed new techniques that incorporate secondary structure predictions in order to focus more effectively on tertiary structure by determining the global minimum in a succession of small configuration subspaces.

An equally important consideration is the adequacy of the model energy surface. It is thought that existing gas-phase protein force fields do not discriminate between correct folds and misfolds, since such potentials consider both structures to be energetically comparable. However, correct folds can be distinguished from misfolds by incorporating an additional energy term which describes the stabilizing influence of aqueous solvation.

From simulation and from neutron and X-ray scattering, we are extracting simple potentials of mean force for the interaction of amino acid sidechains in the presence of water. The enormous reduction in complexity means that we can evaluate the energy of polypeptide chain conformations with the water environment implicitly present, a calculation that is totally intractable using explicit water models and that is more accurate than empirical solvent surface area terms. On the J90s and the C90, we are using molecular dynamics to simulate neutron scattering data for single leucine and glutamine dipeptides in solution and are backing out the potential of mean force profiles by combining the simulation and experimental data.
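For orientation, the textbook relation between a pair potential of mean force and a radial distribution function g(r) is W(r) = -k_B T ln g(r). The sketch below applies that relation to a list of g(r) samples. It is an illustration of the general definition only, not the procedure used in this project to combine simulation and scattering data; the function name pmf_from_rdf, the default temperature, and the choice of kcal/mol units are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pmf_from_rdf(g_values, temperature=298.15):
    """Return W(r) = -k_B T ln g(r) for each g(r) sample (kcal/mol).

    Where g(r) is zero the separation is never observed, so W(r) is infinite.
    """
    kt = K_B * temperature
    return [(-kt * math.log(g)) if g > 0.0 else math.inf for g in g_values]
```

Values of g(r) above 1 give a negative (favorable) W(r), values below 1 give a positive one, and g(r) = 1 corresponds to no net effective interaction.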

Significance

The protein folding problem and the prediction of protein structure are the grand challenges in molecular biology. Understanding how and why proteins perform their evolved function is necessary both for reengineering defective proteins implicated in disease and for rational design of synthetic proteins relevant to biotechnology applications. The logical progression from amino acid sequence to protein structure to protein function makes timing critical for solving the protein structure prediction problem. As the Human Genome Project advances beyond mapping to sequencing the genome, we will be faced with an enormous database of amino acid sequences and a demand for protein structures that X-ray diffraction and NMR methods alone will be unable to meet.

Publications

Pertsemlidis, A., A. M. Saxena, A. K. Soper, T. Head-Gordon, and R. M. Glaeser. 1996. Direct, structural evidence for modified solvent structure within the hydration shell of a hydrophobic amino acid. Proc. Natl. Acad. Sci. 93:10769-10774.

Pertsemlidis, A., R. M. Glaeser, and T. Head-Gordon. 1997. Differences in hydration structure near hydrophobic and hydrophilic amino acid side chains. Biophysical Journal 73:2106-2115.

Yu, R. C., and T. Head-Gordon. N.d. Improved neural networks for protein secondary structure prediction without use of sequence or structural homologies. J. Comp. Biol., in preparation.

 

A comparison of the crystal structure of the A-chain of uteroglobin progesterone binding protein and that predicted by a global optimization strategy developed by our group. The current predicted structure was found at the end of three 4-hour runs with 64 processors on the T3E-900, but is not yet converged.


