Report on the Challenges in Docking and Virtual Screening Operational Meeting

February 24, 2006

Participants at the Challenges in Docking and Virtual Screening meeting at NIH in August 2005 concluded that progress toward computational tools for molecular docking and in silico screening would be significantly faster if research groups had access to common, high-quality data sets that could be used for development and benchmarking. The ultimate goal was defined as the quantitative prediction of binding affinity, given the independent structures of target protein and ligand. It was proposed that industrial partners might contribute valuable data sets for ligand-target interactions, provided the data were curated, completed, extended as needed, and made available in a form that offered substantial value to the docking and scoring research community.

As proposed in August, a smaller group of pharmaceutical industry, academic, and government scientists met on February 24, 2006 to develop a working plan for a public-private partnership that could accomplish this goal.

The February meeting was held by bicoastal videoconference from sites at GlaxoSmithKline and the University of California, San Francisco. Its goals were:

To identify all data elements necessary to both support benchmarking of current algorithms and enhance further research in docking and scoring.
To identify all activities necessary to obtain, curate, support, and promote such a resource.
To obtain preliminary information about datasets that might be available from industry partners for this effort.

WHAT DO WE HAVE?

The industrial groups identified about 25 ligand-target sets potentially available for release. Each set contains tens to hundreds of ligands and their affinities for a given target, and typically tens of X-ray structures of the ligand-protein complexes. Perhaps half of the datasets were for kinases, but the balance covered a wide range of target types. In addition, Mike Gilson mentioned the 13,000 Kd measurements in the BindingDB, some of which might complement the new data sets. Expression conditions, assays and crystallization conditions probably would be released. Materials (plasmids, compounds, crystals) probably would not.

The consensus was that to make a meaningful improvement in prediction methods, data sets will need both close-analog series within chemotype and more than one chemotype per target. This appears to already be the case in a fair number of the data sets under consideration. Data sets on identical targets from different companies might be combined, provided assay data can be rationalized (or re-measured). Some (few) other academic and extant PDB datasets might be suitable. Other organizations and companies may have data sets to contribute. An active recruitment effort should be mounted once a plan for the effort is in place.

WHAT WILL WE NEED?

Database/user interface. To make the new information readily available for use by academic and private developers, it will be essential to develop and maintain a multidimensional database and user interface capable of integrating target protein crystallographic data sets (PDB format); small molecule ligand crystallographic data; binding affinity data (IC50s, assay and SPR Kds, ITC); and protocols for expression, crystallization, ligand synthesis, and binding assays. The Structural Genomics Knowledge Base of the NIGMS Protein Structure Initiative has many of the same requirements and might serve as a prototype, or even a service center. Discussions with interested individuals are recommended. Ideally, the user interface would support download of the data by docking/scoring developers, as well as upload and testing of docking/scoring programs by resource staff in benchmarking exercises.
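As an illustration only, the data elements listed above could be grouped into records along the following lines. This is a minimal sketch, not a design agreed at the meeting: every class, field, and label name here (TargetDataset, LigandRecord, AffinityMeasurement, protocol_id, and so on) is a hypothetical choice, and the example values are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AffinityMeasurement:
    """One binding measurement for a ligand (all names are hypothetical)."""
    kind: str                          # e.g. "IC50" or "Kd"
    value_nM: float                    # measured value in nanomolar
    method: str                        # e.g. "enzyme assay", "SPR", "ITC"
    protocol_id: Optional[str] = None  # pointer to an assay protocol record


@dataclass
class LigandRecord:
    """A ligand, its chemotype label, and any structure and affinity data."""
    ligand_id: str
    chemotype: str
    complex_pdb_file: Optional[str] = None  # PDB-format ligand-protein complex
    measurements: List[AffinityMeasurement] = field(default_factory=list)


@dataclass
class TargetDataset:
    """All ligands contributed for one target by one source."""
    target_name: str
    source: str
    ligands: List[LigandRecord] = field(default_factory=list)

    def chemotypes(self) -> List[str]:
        """Distinct chemotype labels present in this data set, sorted."""
        return sorted({lig.chemotype for lig in self.ligands})

    def ligands_lacking(self, kind: str) -> List[str]:
        """IDs of ligands with no measurement of the given kind."""
        return [
            lig.ligand_id
            for lig in self.ligands
            if not any(m.kind == kind for m in lig.measurements)
        ]


# Placeholder example; the values below are invented for illustration.
example = TargetDataset(target_name="example kinase", source="hypothetical contributor")
example.ligands.append(
    LigandRecord("L1", "chemotype A",
                 measurements=[AffinityMeasurement("IC50", 120.0, "enzyme assay")])
)
example.ligands.append(
    LigandRecord("L2", "chemotype B",
                 measurements=[AffinityMeasurement("Kd", 35.0, "ITC")])
)
```

With records of this shape, a query such as example.ligands_lacking("Kd") would list the ligands for which only IC50 data exist, and example.chemotypes() would show whether a target is covered by more than one chemotype.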

New experimental data. To enhance and complete datasets submitted by industry participants, several types of additional data will be required. Some binding data will need to be reassessed under consistent conditions, so structural data from different sources can be combined. Affinity constants will be needed for datasets that have only IC50s and for additional compounds as needed to test specific hypotheses about improving prediction programs. Ideally, isothermal titration calorimetry using micro-scale, high-throughput methods would be performed. Possible sites discussed for such assays were the NIH Molecular Libraries Screening Center and the Protein Structure Initiative centers.
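One reason true affinity constants matter for quantitative prediction is that a dissociation constant maps directly onto a standard binding free energy through the textbook relation ΔG° = RT ln(Kd / 1 M), whereas an IC50 depends on assay conditions. The sketch below shows that conversion. The function name and the default temperature of 298.15 K are assumptions made for illustration, not something specified at the meeting.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from Kd in mol/L.

    Uses dG = RT * ln(Kd / c0) with a 1 M standard state (c0 = 1 mol/L),
    so tighter binders (smaller Kd) give more negative values.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)
```

Under these assumptions a 1 nM Kd corresponds to roughly -12.3 kcal/mol, and each tenfold change in Kd shifts the free energy by about 1.4 kcal/mol.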

Provision of needed crystallographic data will require two types of effort: (1) refinement of submitted datasets and curation to PDB level and format (for use as input for current prediction programs); and (2) determination of a limited number of new structures of target proteins with additional ligands, as needed to test hypotheses and for benchmarking exercises. Possible loci discussed included the PSI centers and the national synchrotron centers.

Determination of properties of unbound ligand molecules, such as solvation and solubility, may be needed to analyze energetics of bound vs. free ligands. The National Institute of Standards and Technology could play a very helpful role here. Expression of new proteins and synthesis of additional ligands will be necessary on a limited scale. Contracting out this work may be the most efficient way to proceed.

WHAT WILL WE ACHIEVE?

Programs that can rank order ligands by affinity, predict affinity, and predict new ligands, presuming new data are accompanied by continued funding of algorithm development.
Programs that allow us to predict significant changes in conformation.
Possibly, programs that predict whether a ligand that binds will be optimizable.
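
As a hedged illustration of how the first aim, rank ordering by affinity, might be scored in a benchmarking exercise, the sketch below computes a Spearman rank correlation between predicted and measured values for one ligand series. The choice of Spearman correlation and the function names are assumptions made here; the meeting did not specify a metric. Both lists must be oriented the same way (for example, both expressed so that larger means tighter binding).

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks from 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    mean = (len(rp) + 1) / 2.0  # mean rank is the same for both lists
    cov = sum((a - mean) * (b - mean) for a, b in zip(rp, rm))
    var_p = sum((a - mean) ** 2 for a in rp)
    var_m = sum((b - mean) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / math.sqrt(var_p * var_m)
```

A value of 1.0 means the program reproduces the measured ordering exactly, and -1.0 means it reverses it. A rank metric of this kind assesses ordering only, so quantitative affinity prediction would need a separate error measure.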

WHAT NEXT?

Cathy Peishoff and Brian Shoichet will prepare a summary of goals and scope for presentation to NIGMS upper management.
NIH staff will investigate possible funding mechanisms and/or distribution of tasks.
Realistic scope and cost estimates are needed to match with potential funding sources.
NIST will host a workshop April 18-21 to identify efforts they might carry out to facilitate development of docking and scoring methods.

MEETING PARTICIPANTS

Christopher P. Austin, M.D.
Senior Advisor to the Director for Translational Research
Director, NIH Chemical Genomics Center
National Human Genome Research Institute
National Institutes of Health
Building 31, Room 4B09
31 Center Drive
Bethesda, MD 20892
Phone: 301-594-6238
Fax: 301-402-0837
E-mail: austinc@mail.nih.gov

Jeff Blaney, Ph.D.
Vice President, Lead Discovery
Structural Genomix
10505 Roselle Street
San Diego, CA 92121
Phone: 858-228-1495
Fax: 858-558-0642
E-mail: jeff_blaney@stromix.com

Anne M. Chaka, Ph.D.
Computational Chemist
Physical and Chemical Properties Division
National Institute of Standards and Technology
100 Bureau Drive
Gaithersburg, MD 20899
Phone: 301-975-4525
Fax: 301-869-4020
E-mail: anne.chaka@nist.gov

Wendy Cornell, Ph.D.
Director
Molecular Systems
Basic Chemistry
Merck Research Laboratories
126 East Lincoln Avenue
Rahway, NJ 07065
Phone: 732-594-4954
E-mail: wendy_cornell@merck.com

Michael K. Gilson, M.D., Ph.D.
Professor and CARB Fellow
Center for Advanced Research in Biotechnology
University of Maryland Biotechnology Institute
9600 Gudelsky Drive
Gaithersburg, MD
Phone: 240-314-6217
Fax: 240-314-6255
E-mail: Gilson@umbi.umd.edu

Jayne Kapur, Ph.D.
Computational Chemist
Physical and Chemical Properties Division
National Institute of Standards and Technology
100 Bureau Drive
Gaithersburg, MD 20899
Phone: 301-975-2460
Fax: 301-975-3675
E-mail: jayne.kapur@nist.gov

Deborah A. Loughney
Director, Computer-Assisted Drug Design
Bristol-Myers Squibb Company
P.O. Box 4000
Princeton, NJ 08543-4000
Phone: 609-252-6054
Fax: 609-252-6012
E-mail: deborah.loughney@bms.com

Arthur J. Olson, Ph.D.
Professor
Department of Molecular Biology
The Scripps Research Institute
La Jolla, CA 92037
Phone: 858-784-9702
Fax: 858-784-2860
E-mail: olson@scripps.edu

Catherine E. Peishoff, Ph.D.
Site Director, Computational Analytical & Structural Sciences
GlaxoSmithKline
1250 S. Collegeville Road
UP12-210, PO Box 5089
Collegeville, PA 19426
Phone: 610-917-6585
Fax: 610-917-7393
E-mail: Catherine.e.peishoff@gsk.com

Emanuele Perola, Ph.D.
Applications Modeling
Vertex Pharmaceuticals
130 Waverley Street
Cambridge, MA 02139
E-mail: emaneule_perola@vrtx.com

Brian Shoichet, Ph.D.
Professor
Department of Pharmaceutical Chemistry
University of California San Francisco
1700 4th Street, QB3 Building
Room 508D
San Francisco, CA 94143-2550
Phone: 415-514-4126
Fax: 415-502-1411
E-mail: shoichet@cgl.ucsf.edu

Janna P. Wehrle, Ph.D.
Program Director
Division of Cell Biology and Biophysics
National Institute of General Medical Sciences
National Institutes of Health
45 Center Drive, Room 2AS.19K
Bethesda, MD 20892-6200
Phone: 301-594-0828
Fax: 301-480-2004
E-mail: wehrlej@mail.nih.gov