General Interest
Enhancing Peer Review: The NIH Announces New Scoring Procedures for Evaluation of Research
Applications; post until Feb. 1, 2009
Notice Number: NOT-OD-09-024
Key Dates
Release Date: December 2, 2008
Issued by
National Institutes of Health (NIH) (http://www.nih.gov)
Background
The
mission of the NIH is to support science in pursuit of knowledge about the
biology and behavior of living systems and to apply that knowledge to extend
healthy life and reduce the burdens of illness and disability. As part of
this mission, applications submitted to the NIH for grants or cooperative
agreements to support biomedical and behavioral research are evaluated for
scientific and technical merit through the NIH peer review system. In
June 2007, the NIH initiated a formal, agency-wide effort to review the NIH
peer review system (http://enhancing-peer-review.nih.gov/). After
careful deliberation and consideration of the recommendations resulting from
this year-long effort, the NIH will implement a number of key actions in its
peer review system.
In current practice, each scored application is assigned a single, overall
priority score that reflects the consideration of all review criteria.
Individual reviewers assign scores on a 1 to 5 scale in 0.1 increments (e.g.,
2.2), resulting in 41 possible rating discriminations for reviewers to
make. The reviewers’ individual scores are then averaged and multiplied
by 100 to yield a single overall priority score for each scored application
(e.g., 253).
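As a rough illustration of the arithmetic described above, the following sketch counts the rating options on the 1 to 5 scale and computes a priority score from a set of individual reviewer scores. The function names and the sample scores are hypothetical, and rounding the result to a whole number is an assumption; the notice states only that scores are averaged and multiplied by 100.

```python
from fractions import Fraction


def legacy_rating_options(low=1, high=5, step=Fraction(1, 10)):
    """List every rating a reviewer could assign on the 1 to 5 scale."""
    count = int((high - low) / step) + 1
    return [low + i * step for i in range(count)]


def legacy_priority_score(reviewer_scores):
    """Average the individual scores and multiply by 100.

    Rounding to a whole number is an assumption made for this sketch.
    """
    if not reviewer_scores:
        raise ValueError("at least one reviewer score is required")
    # Fractions avoid floating-point drift when averaging 0.1-step values.
    mean = sum(Fraction(str(s)) for s in reviewer_scores) / len(reviewer_scores)
    return round(mean * 100)


print(len(legacy_rating_options()))            # 41 possible ratings
print(legacy_priority_score([2.2, 2.5, 2.9]))  # 253
```

The three-digit result is derived from only a handful of ratings, which is why it can suggest more precision than the underlying scores support.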
Although
this rating system has served the NIH and the research community well, several
concerns led the NIH to consider a revised rating system for grant
applications. It is difficult for reviewers to make 41 discriminations
reliably, and scores have become increasingly compressed toward the positive
end of the scale. In addition, because reviewer scores are averaged and
multiplied by 100, the resulting priority score appears to have more precision
than it actually has. To address these concerns, the NIH considered
scoring systems with fewer rating options to increase potential reliability and
with sufficient range and appropriate anchors to encourage reviewers to use the
full scale. To increase transparency, the NIH also considered methods to
communicate ratings from assigned reviewers even when the application is
streamlined and therefore not discussed or scored by the full committee.
Additional
information is available in Guide Notices NOT-OD-09-023
“Enhancing Peer Review: The NIH Announces Updated Implementation Timeline” and NOT-OD-09-025
“Enhancing Peer Review: The NIH Announces Enhanced Review Criteria for
Evaluation of Research Applications Received for Potential FY2010 Funding”.
Implementation
New Scoring System. The new scoring
system will be effective for all applications for research grants and
cooperative agreements that are submitted for funding consideration for fiscal
year 2010 (FY2010) and thereafter. The first standing due date for FY2010
is January 25, 2009; the new scoring system will be used for applications
submitted in response to Parent Announcements and Program Announcements,
including PARs and PASs published before or after this Guide Notice. An
important aspect of the implementation of the new scoring system is to use it
in a consistent manner for applications considered in a given fiscal
year. Therefore, responses to RFAs and PARs for funding consideration in
FY2010 that have due dates before January 25, 2009 will be evaluated using the
new scoring system. Likewise, responses to RFAs and PARs for FY2009 that have
due dates after January 25, 2009 will be evaluated using the present scoring
system.
The new scoring system will use a 9-point rating scale (1 = exceptional; 9 =
poor). Although a 7-point scale was planned initially, a 9-point scale
was selected based on the desire for a scale with sufficient range. The
NIH also has prior experience with the distribution of scores from a 9-point
scale, based on data from the 1-5 scale when only 0.5 increments were allowed
(which yields nine rating options) [1]. Moreover, prior
recommendations from measurement and decision science experts regarding the
scoring system suggested that an 8 to 11 point scale is appropriate [2].
Not
Recommended for Further Consideration. An application may be designated Not
Recommended for Further Consideration (NRFC) by the Scientific Review Group if
it lacks significant and substantial merit; presents serious ethical problems
in the protection of human subjects from research risks; or presents serious
ethical problems in the use of vertebrate animals, biohazards, and/or select
agents. Applications designated as NRFC do not proceed to the
second level of peer review (National Advisory Council/Board) because they
cannot be funded.
Percentile Rankings. Percentile
rankings will be calculated anew, starting with scores from the May 2009 cycle
of review, and reported to the nearest whole number.
Scores for Individual Criteria. Before the review
meeting, each reviewer and discussant assigned to an application will give a
separate score for each of five core review criteria (Significance,
Investigator(s), Innovation, Approach, and Environment). For all
applications, even those not discussed by the full committee, the scores of the
assigned reviewers and discussant(s) for these criteria will be reported
individually on the summary statement.
Priority
Scores – Discussed Applications. Before the review meeting, each reviewer and
discussant assigned to an application will give a preliminary impact score for
that application. The preliminary impact scores will be used to
determine which applications will be discussed. For each application that
is discussed, a final impact score will be given by each eligible committee
member (that is, each member without a conflict of interest). Each member’s
impact score will reflect his/her evaluation of the overall impact that the
project is likely to have on the research field(s) involved, rather than a
weighted average of the scores that member gave to the individual criteria
(see above).
The overall impact score for each discussed application will be determined by
calculating the mean score from all the eligible members’ impact scores, and
multiplying the average by 10; the overall impact score will be reported on the
summary statement. Thus, the 81 possible overall impact scores will range
from 10 to 90. (Overall impact scores will not be reported for applications that
are not discussed.)
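The calculation described above can be sketched as follows. This is a minimal illustration, not the NIH's implementation: the function name and sample scores are hypothetical, and rounding to the nearest whole number (with halves rounded up) is an assumption inferred from the statement that there are 81 possible scores from 10 to 90.

```python
import math
from fractions import Fraction


def overall_impact_score(member_scores):
    """Mean of the eligible members' 1-9 impact scores, multiplied by 10.

    Rounding halves up to a whole number is an assumption of this sketch;
    the notice does not state a rounding rule.
    """
    if not member_scores:
        raise ValueError("at least one eligible member score is required")
    for score in member_scores:
        if score not in range(1, 10):
            raise ValueError(f"impact scores must be whole numbers 1-9, got {score!r}")
    mean = Fraction(sum(member_scores), len(member_scores))
    return math.floor(mean * 10 + Fraction(1, 2))


print(overall_impact_score([2, 3, 3]))  # 27
print(overall_impact_score([1, 1, 1]))  # 10, the best possible score
print(overall_impact_score([9, 9, 9]))  # 90, the worst possible score
print(len(range(10, 91)))               # 81 possible whole-number scores
```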
Funding Decisions. The new
scoring system may produce more applications with identical scores (“tie”
scores). Thus, other important factors, such as mission relevance and
portfolio balance, will be considered in making funding decisions when grant
applications are considered essentially equivalent on overall impact, based on
reviewer ratings.
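To illustrate why ties become more likely when only whole-number scores from 10 to 90 are possible, the sketch below groups applications that share an overall impact score. The application identifiers and scores are invented for the example, and the function is hypothetical; it only identifies ties and does not model how mission relevance or portfolio balance would be weighed.

```python
from collections import defaultdict


def tied_groups(scores_by_application):
    """Return {score: [application ids]} for every score shared by 2+ applications."""
    groups = defaultdict(list)
    for app_id, score in scores_by_application.items():
        groups[score].append(app_id)
    return {
        score: sorted(apps)
        for score, apps in sorted(groups.items())
        if len(apps) > 1
    }


# Hypothetical overall impact scores for four applications.
example = {"APP-A": 20, "APP-B": 27, "APP-C": 20, "APP-D": 31}
print(tied_groups(example))  # {20: ['APP-A', 'APP-C']}
```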
[1] Report of the Committee on Rating of Grant Applications (May 17, 1996)
(http://grants.nih.gov/grants/peer/rga.pdf)
[2] Cicchetti, D.V., Showalter, D., and Tyrer, P.J. (1985) The effect of
number of rating scale categories on levels of interrater reliability: A
Monte Carlo investigation. Appl. Psych. Meas. 9: 31-36.
Inquiries
Questions should be directed to EnhancingPeerReview@mail.nih.gov.
For more information on the NIH’s Enhancing Peer Review effort, visit http://enhancing-peer-review.nih.gov/.