Update on RGA
Messages to DDER@NIH.GOV continue to arrive. We're delighted to be
receiving so much feedback, and we appreciate the time each person has taken to
read and consider the Rating of Grant Applications (RGA) report and to respond to it.
Since we asked for specific comments on each recommendation, and since any
piloting or implementation decisions will be made for each recommendation
independently (as far as that is possible), we are pleased that most of
those responding have addressed the pros and cons of individual
recommendations.
Below is a compilation of the comments received to date, organized by
recommendation and followed by more general comments about the report. In
addition, there are brief summaries, by date, of the first three discussions
with extramural scientists, which were held in conjunction with NIH review
meetings this month.
From the DDER Mailbox...
Review Criteria Recommendations: The three criteria proposed in
Recommendation 1 (Significance, Approach, and Feasibility) were generally
well received, although several people said they liked the additional
criterion of "impact" proposed by Dr. Keith Yamamoto in his comments
at the Division of Research Grants Advisory Committee meeting (available on the
DRG Home Page under "News and Events"). Respondents also suggested other criteria,
such as creativity, innovativeness, investigators, and other support/overlap
information. Everyone who addressed Recommendation 2, that review be conducted
criterion by criterion, supported the idea, although, as noted above, not
everyone agreed on what the criteria should be. Some pointed out that reviewers
already consider these criteria, but they were not opposed to organizing the
review in a more structured way. The third and fourth recommendations, that each
criterion receive a numerical score and that reviewers not make
global ratings, were not endorsed.
Rating Scale Recommendations: None of the three recommendations on
the rating scale received much attention.
Calculation, Standardization, and Reporting of Scores Recommendations:
Many people commented on these recommendations, but the comments opposed
implementing any of them. Recommendation 8 states that
scores should be standardized on each criterion within reviewer and then
averaged across reviewers. A major criticism was that this recommendation
appears to assume that individual scores are independent
measures; critics argued that without independent evaluations of the
applications, statistically manipulating the scores is not justified and will
not improve the accuracy or fairness of the process.
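To make Recommendation 8 more concrete, here is a minimal sketch. It assumes that "standardized within reviewer" means converting each reviewer's scores on one criterion to z-scores (subtract that reviewer's mean and divide by that reviewer's standard deviation across the applications he or she scored), and that the resulting z-scores are then averaged across the reviewers of each application. The report's exact formula is not reproduced here, so the function name and data layout are hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_reviewer(scores):
    """Hypothetical sketch of within-reviewer standardization for ONE criterion.

    scores: dict mapping reviewer -> dict mapping application -> raw score.
    Returns a dict mapping application -> mean z-score across its reviewers.
    """
    z_by_app = {}
    for reviewer, app_scores in scores.items():
        values = list(app_scores.values())
        mu = mean(values)       # this reviewer's average score
        sigma = pstdev(values)  # this reviewer's spread of scores
        for app, raw in app_scores.items():
            # A reviewer who gives every application the same score has no
            # spread; treat all of that reviewer's z-scores as 0.
            z = (raw - mu) / sigma if sigma > 0 else 0.0
            z_by_app.setdefault(app, []).append(z)
    # Average the standardized scores across reviewers for each application.
    return {app: mean(zs) for app, zs in z_by_app.items()}


# Two hypothetical reviewers who use different parts of the scale.
result = standardize_within_reviewer({
    "Reviewer A": {"App X": 1.5, "App Y": 2.5},
    "Reviewer B": {"App X": 2.0, "App Y": 4.0},
})
print(result)
```

Here the two reviewers' raw scores differ, but each ranks App X one standard deviation below his or her own mean, so both applications end up with identical standardized averages. Note that the sketch treats each reviewer's scores as a self-contained scale, which is exactly the independence assumption critics questioned.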
In addition to the specific comments on these recommendations, several
reviewers said they favored retaining the streamlining of review. A
few argued that the proposed changes would needlessly
complicate the process and increase the already considerable time spent
in review meetings.
"External" Discussion Groups...
As part of several activities seeking the reactions and comments of the
extramural scientific community on the report of the Committee on the Rating of
Grant Applications, five meetings are being held in the Bethesda area in
conjunction with peer review study section meetings. Three of these had
taken place as of this writing.
June 11: Approximately 25 reviewers, predominantly researchers in the
neurosciences, attended a discussion session in which Dr. Connie Atwell, Chair of
the IPR Committee, briefly presented the rationale and recommendations of the
committee's final report. There was relatively strong agreement that reviewers
should assign an overall score to each application. Participants argued that no
arithmetic algorithm can reliably predict what the overall rating would be,
since the weight each criterion carries varies from application to
application. The reviewers were quite willing to format their critiques and
structure their discussions by the recommended criteria, but felt that scoring
each criterion would not be worth the extra time it would require. They
felt they could use the language of the critique to indicate the relative
strengths and weaknesses of the application on each criterion. Scoring
each one, however, would detract from the focus on science that should be the
core of the review process, and the overall score for each application
should continue to be based on the judgment of scientific experts rather
than on an arithmetic combination of three criterion scores. A few reviewers
commented that the standardization process presented in the RGA report is the
weakest part of the report and lacks sufficient detail; they also doubted that
such a strategy would be very useful. They were enthusiastic about the
triage/streamlining process but felt it could go even further in cutting
the time spent discussing weaker applications, leaving more time to
discuss the better ones.
June 12: Approximately 20 reviewers met to hear Dr. Connie Atwell
present a brief overview of the RGA report and recommendations. The discussion
was led by Dr. Hugh Stamper, with participation by Drs. Tony Demsey and Donna
Dean. Reviewers represented the physiological sciences and various areas within
mental health and the behavioral sciences. They favored continuing to assign an
overall rating to each grant application.
Some reviewers in this group supported individual standardization, although this
opinion was certainly not unanimous. Also related to standardization
was a concern about the assumption of independence on which it
relies: review groups have different scoring behaviors, so a reviewer's
personal "history" of scoring with one group does not predict his
or her scoring behavior in another group. The reviewers favored reviewing by
criteria, but preferred "impact" to "significance." They
were enthusiastic about empirically testing some of the assumptions of the RGA
report, particularly the claim that disaggregated scoring improves reliability,
and about measuring how much additional time review by criterion might take.
June 19: Dr. Wendy Baldwin held an informal open forum in Wilson
Hall on the NIH campus for extramural scientists to present their views, ask
questions, and discuss the RGA recommendations. Approximately 50 reviewers from
various study sections attended. Dr. Keith Yamamoto discussed his modification
of the RGA recommendations on criteria and overall score, advocating use of four
specific criteria (Impact, Feasibility, Creativity/Innovation, and
Investigator/Environment) and an overall score assigned by reviewers.
Dr. Baldwin explained that we are not asking for more "tutorial"
critiques but are seeking a clearer method of distinguishing among good
applications, not all of which can be funded. In response to questions, she
noted that any change in peer review scoring would affect the review of
all grant applications across the NIH, including review in the institutes and
centers, not only in DRG. She also clarified the roles of review and program
staff, emphasizing that program staff can best do their job if they are present
for the review discussions. There was also some discussion of the need for SRAs
to capture the essence of that discussion clearly in the Resume and Summary of
Discussion that appears in Summary Statements.
Participants discussed the specific criteria presented both in the report
and by Dr. Yamamoto, and while there was no consensus on exactly what these
should be, there was enthusiasm for using criteria to structure the review and
the discussion. The point was made that the discussion is the most important
part of the review process and should be clearly structured to identify
strengths and weaknesses under each criterion. The reviewers present repeatedly
indicated their desire to continue assigning an overall rating to each
application, regardless of whether criteria are used to structure the critiques
and/or discussion. Reporting of scores was also discussed, with no real
consensus emerging. Various options were raised, but the point was made that
how scoring is done is less important than having all study sections use the
same system. It was suggested that we might display not only the score but also
its standard deviation, to help program staff decide among applications with the
same overall score and to give investigators more information.
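As a rough illustration of the standard-deviation suggestion, here is a minimal sketch. It assumes the reported score is the simple mean of the reviewers' overall ratings and uses the sample standard deviation as the measure of spread; the function name and ratings are hypothetical, not data from any study section.

```python
from statistics import mean, stdev


def summarize(ratings):
    """Return (mean, sample standard deviation) of reviewers' overall ratings.

    Hypothetical helper: assumes the reported score is a simple mean.
    """
    return mean(ratings), stdev(ratings)


# Two hypothetical applications with the same average rating.
app_a = [2.0, 2.0, 2.0, 2.0]  # reviewers in full agreement
app_b = [1.0, 1.5, 2.5, 3.0]  # reviewers divided
for name, ratings in [("A", app_a), ("B", app_b)]:
    m, sd = summarize(ratings)
    print(f"Application {name}: score {m:.2f}, SD {sd:.2f}")
```

Both applications report the same score, but the larger standard deviation for Application B shows that its reviewers disagreed, which is the kind of extra information the suggestion was meant to give program staff and investigators.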
Dr. Baldwin stressed that we consider input from the scientific
community extremely important, but are not simply weighing mail or counting "votes."
Therefore, she asked that reviewers and other interested parties send their
comments, with the substantive arguments for their preferences, to DDER@NIH.GOV.
The evening ended with the request that, once any changes are decided upon,
reviewers be offered workshops on implementation.