New Direction for Protein Structure Initiative Includes New Bioinformatics Opportunities

Seeking to ensure that its Protein Structure Initiative is paying off for biologists, the National Institute of General Medical Sciences has set aside $37 million for a revamped version of the effort, called PSI:Biology, that will likely include a healthy computational component.

The overarching goal of the revitalized initiative, which originally kicked off in 2000, is to support research partnerships between biologists and high-throughput structure determination centers to create a "highly interactive network" dedicated to solving biological problems with knowledge gained from the PSI effort.

NIGMS is planning to offer $37 million in fiscal 2010 for several research programs, including the PSI-SG Knowledgebase, which it launched last year [BioInform, March 7, 2008], and the development of new computational methods for protein modeling.

While details for the component areas have not yet been disclosed, there are a number of instances for which "bioinformatics and modeling are going to figure into the program," Peter Preusch, the scientific director for PSI:Biology, told BioInform.

For example, one of the component areas, a network of high-throughput structure determination centers, may not appear to have a strong bioinformatics component at face value, but these centers will need to be "multi-functional," he said. Besides experimental structure determination, the centers will also be "doing bioinformatic analysis of targets before they actually enter the experimental pipeline."

Scientists at these centers will also need new methods for homology modeling to help crack difficult structures, he said. By solving "one of its cousins," a researcher might be able to generate a model for the original target from the solved structure, he said. "One of the roles of these centers would be to generate that model."

Although this approach to modeling was already part of the PSI research centers, it wasn't working "very effectively," Preusch acknowledged. "I think that's partly because they were so focused on solving many, many, many structures and not as much focused on, 'What are those structures going to do to inform a particular biological problem?'"

While Preusch couldn't speak to the exact role of bioinformaticians in the new centers, he said, "I would expect that each of those centers would include at least one person who is a bona fide bioinformatics expert as in fact the existing centers do."

Restructuring PSI

The new thrust for the project was developed in collaboration with the scientific community via a solicitation for feedback on the program issued in 2007, and a meeting that NIGMS hosted last fall.

According to the report of the meeting, called the Future of Structural Genomics, recommendations for PSI included "engagement" of the broad scientific community in the selection of protein structural targets, improvements in experimental and computational functional annotation, closer collaboration between structural centers and computational scientists, and more experimental and computational methods in the functional determination of proteins.

"We tried to translate as much of [the meeting] as can be translated in one coherent package" into PSI:Biology, Preusch said.

The new name is reflective of the initiative's renewed focus on biological research. Now that PSI has experimentally solved more than 3,500 protein structures, the focus is on how these methodologies and structures can be applied to a "broader range of biological problems and therefore have a broader impact on biology," Preusch said.

To some extent, the overhaul appears to be an answer to some scathing criticism from researchers that a large-scale protein structure determination project was a wasted effort. For example, in a commentary in the journal Structure in 2007, Yale chemist Thomas Steitz described the project as akin to "collecting butterflies" and noted that the proteins it studied were not useful enough for the biomedical community.

In another Structure commentary that year, Peter Moore, also from Yale, argued that the project should be "called off," while Gregory Petsko of Brandeis University wrote in a Genome Biology paper the same year that the project was "focused on cranking stuff out as fast as possible, with little attention to whether the structures that it's determining are worth determining."

Under PSI:Biology, NIGMS is planning to offer $37 million in fiscal 2010 for research programs in five, and eventually a total of eight, different component areas that are intended to ensure that the project's goals are aligned with the needs of the biological community. It will begin issuing requests for applications for the first five components in April. The first five component areas are:
• High-throughput structure determination centers that are to solve "community-nominated" sets of protein structures, and extend high-throughput technologies to increasingly complex structures;
• Scientific consortia to work with the structure determination centers on biological problems that require many protein structures, explore the benefit from high-throughput technologies, and drive additional technology development;
• Centers focused on membrane protein structures and on new methods so these structures can be amenable to high-throughput determination;
• The PSI-SG Knowledgebase, which disseminates information and coordinates activities across the research network and solicits community-nominated targets;
• The PSI-SG Materials Repository, which centralizes, maintains, stores, and distributes vectors and clones generated by PSI-supported researchers.

NIGMS said it also plans to issue program announcements for technology development for structure determination; new methods for protein modeling; and additional partnerships with members of the broader community.

The agency did not break out the funding amounts for the different components of the project.

3D Vision

PSI was launched in 2000 as a public-private effort "to make the three-dimensional atomic-level structures of most proteins easily obtainable from knowledge of their corresponding DNA sequences," according to its mission statement, and to help reduce the costs of this process and develop new methods.

According to NIGMS documents, PSI's total budget for FY2008, which began Oct. 1, 2007, was $68.1 million, which supported 97 investigators. Four large-scale centers each received $10 million that year and six specialized centers received $3.5 million each. Two homology modeling centers obtained $1.3 million, the PSI Materials Repository $1 million, and the PSI-Structural Genomics Knowledgebase $2.6 million.

Structural genomics remains one of the "threads" for PSI in its new form, in the sense that the effort is geared toward "providing as much coverage of structure space as we can" and gaining leverage from the solved structures through models.

However, Preusch said, "I think we are going to do it in a much more focused way."

Rather than work on the "whole possible universe" of sequence space, the focus is on biomedically important proteins because solving these structures can have a "dual impact" in terms of shedding light on a specific biological problem as well as helping to clarify the relationship between sequence and structure, he said.

Homology modeling is part of that problem set, Preusch said. In 2006, PSI awarded seven R01 grants related to homology modeling in addition to two homology modeling centers.

However, he noted that the sense at the NIGMS meeting was that "the centers were not functioning as well as we might have hoped as centers" and that supporting that work through individual investigator-initiated grants could work "at least as effectively as it was prosecuted through the centers."

Indeed, according to the report of a joint meeting between PSI research center directors, the PSI advisory committee, and the PSI network steering committee that was held last December, "The modeling centers are, at best, making incremental contributions to the success of the PSI."

While "some of the work" at the modeling centers "may be of interest in its own right, it does not clearly add value to the PSI initiative as a whole," the meeting participants determined.

"If I am a modeler, where do I come to get my money? The answer would be the second of the program announcements we are going to issue," Preusch said. NIGMS has not yet determined when it will issue this round of program announcements, however.

He said that the modeling program announcement might stretch beyond homology modeling to include pseudo-atomic models built into electron microscopy density maps, and model-building from multiple data types at multiple scales.

At the NIGMS meeting, Andrej Sali presented an overview of a protein modeling workshop that PSI hosted in July 2008. According to the meeting report, workshop participants recommended the development of standards for publishing models, formats for data and software exchange, standards for assessment of models, and outreach to raise awareness of the strengths and limitations of models.

Sali noted that the Swiss Institute of Bioinformatics' Protein Model Portal, which is under development as an ongoing part of the PSI-SG Knowledgebase, will "facilitate these activities and either serve or potentially compute models upon user request," according to the NIGMS report.

However, the report notes, "Thus far, the state of the art in modeling does not appear to have been much impacted by the PSI, although the field has made use of the many new structures."

Modelers will play multiple roles in the next phase of the PSI, Preusch said, since they can assist with thinking about targets and target selection, and about how proteins might be approached experimentally based on insights from a given model.

"They may play a role in developing methods for solving structures that make partial use of models and partial use of experimental data and combine both," he said.

Being Independent

A new PSI focal point is fostering partnerships with the "broader community," according to the NIGMS announcement.

Preusch cited a project that is independent of the PSI that has served as the "poster-child" for the forms this can take. In this effort, Frank Raushel of Texas A&M, Brian Shoichet of the University of California, San Francisco, and John Gerlt from the University of Illinois at Urbana-Champaign worked together to study proteins solved by the PSI but for which the function was unknown.

Shoichet does computational molecular docking studies, Preusch said, which he matched with information from databases to arrive at leads that were then pursued experimentally by Raushel and Gerlt. "We would like to see many more examples of this kind of collaboration, where people are using the structures, interpreting them," he said.

In PSI:Biology these partnerships will be set up at "the onset" and they will in part "determine what targets that the structure centers work on."

Preusch couldn't offer details but said he thinks these partnerships are not restricted to experimentalists but are open to computational scientists as well. The partnerships have as their focus function and annotation, so, for example, an informatics scientist might apply along with experimentalists, he said. Even a standalone informatics effort is imaginable. "They would just have to make their case," Preusch said.