The U.S. Preventive Services Task Force (USPSTF) systematically reviews the evidence concerning both the benefits and harms of widespread implementation of a preventive service. It then assesses the certainty of the evidence and the magnitude of the benefits and harms. On the basis of this assessment, the USPSTF assigns a letter grade to each preventive service signifying its recommendation about provision of the service (see Table below). An important, but often challenging, step is determining the balance between benefits and harms to estimate "net benefit" (that is, benefits minus harms).
Table 1. U.S. Preventive Services Task Force Recommendation Grid*

| Certainty of Net Benefit | Magnitude of Net Benefit: Substantial | Moderate | Small | Zero/Negative |
|---|---|---|---|---|
| High | A | B | C | D |
| Moderate | B | B | C | D |
| Low | Insufficient | Insufficient | Insufficient | Insufficient |
*A, B, C, D, and I (Insufficient) represent the letter grades of recommendation or statement of insufficient evidence assigned by the U.S. Preventive Services Task Force after assessing certainty and magnitude of net benefit of the service (see the "Rating Scheme for the Strength of the Recommendations" field).
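The grid in Table 1 can be read as a simple lookup from certainty and magnitude to a letter grade. The following is a minimal illustrative sketch of that lookup, not a tool used by the Task Force; the function name `grade`, the dictionary `GRID`, and the lowercase string labels are hypothetical choices made for this example. It assumes that low certainty yields an I statement regardless of magnitude, as the single "Insufficient" entry in the Low row suggests.

```python
from typing import Optional

# Grades from Table 1, keyed by certainty and then by magnitude of net benefit.
# The labels are hypothetical lowercase spellings of the table's headings.
GRID = {
    "high": {"substantial": "A", "moderate": "B", "small": "C", "zero/negative": "D"},
    "moderate": {"substantial": "B", "moderate": "B", "small": "C", "zero/negative": "D"},
}

CERTAINTY_LEVELS = ("high", "moderate", "low")


def grade(certainty: str, magnitude: Optional[str] = None) -> str:
    """Return the Table 1 letter grade for a certainty/magnitude pair."""
    c = certainty.strip().lower()
    if c not in CERTAINTY_LEVELS:
        raise ValueError(f"unknown certainty level: {certainty!r}")

    # Assumption: with low certainty the result is an I statement
    # whatever the magnitude, so magnitude is not required here.
    if c == "low":
        return "I"

    if magnitude is None:
        raise ValueError("magnitude is required when certainty is high or moderate")
    m = magnitude.strip().lower()
    if m not in GRID[c]:
        raise ValueError(f"unknown magnitude: {magnitude!r}")
    return GRID[c][m]


if __name__ == "__main__":
    print(grade("High", "Substantial"))
    print(grade("Moderate", "Small"))
    print(grade("Low"))
```

The lookup captures only the final step. The judgments that produce the certainty and magnitude inputs are qualitative and are described in the text that follows.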
The overarching question that the Task Force seeks to answer for every preventive service is whether evidence suggests that provision of the service would improve health outcomes if implemented in a general primary care population. For screening topics, this standard could be met by a large randomized, controlled trial (RCT) in a representative asymptomatic population with follow-up of all members of both the group "invited for screening" and the group "not invited for screening."
Direct RCT evidence about screening is often unavailable, so the Task Force considers indirect evidence. To guide its selection of indirect evidence, the Task Force constructs a "chain of evidence" within an analytic framework. For each key question, the body of pertinent literature is critically appraised, focusing on the following 6 questions:
- Do the studies have the appropriate research design to answer the key question(s)?
- To what extent are the existing studies of high quality? (i.e., what is the internal validity?)
- To what extent are the results of the studies generalizable to the general U.S. primary care population and situation? (i.e., what is the external validity?)
- How many studies have been conducted that address the key question(s)? How large are the studies? (i.e., what is the precision of the evidence?)
- How consistent are the results of the studies?
- Are there additional factors that assist us in drawing conclusions (e.g., presence or absence of dose–response effects, fit within a biologic model)?
The next step in the Task Force process is to use the evidence from the key questions to assess whether there would be net benefit if the service were implemented. In 2001, the USPSTF published an article that documented its systematic processes of evidence evaluation and recommendation development. At that time, the Task Force's overall assessment of evidence was described as good, fair, or poor. The Task Force realized that this rating seemed to apply only to how well studies were conducted and did not fully capture all of the issues that go into an overall assessment of the evidence about net benefit. To avoid confusion, the USPSTF has changed its terminology. Whereas individual study quality will continue to be characterized as good, fair, or poor, the term certainty will now be used to describe the Task Force's assessment of the overall body of evidence about net benefit of a preventive service and the likelihood that the assessment is correct. Certainty will be determined by considering all 6 questions listed above; the judgment about certainty will be described as high, moderate, or low.
In making its assessment of certainty about net benefit, the evaluation of the evidence from each key question plays a primary role. It is important to note that the Task Force makes recommendations for real-world medical practice in the United States and must determine to what extent the evidence for each key question—even evidence from screening RCTs or treatment RCTs—can be applied to the general primary care population. Frequently, studies are conducted in highly selected populations under special conditions. The Task Force must consider differences between the general primary care population and the populations studied in RCTs and make judgments about the likelihood of observing the same effect in actual practice.
It is also important to note that 1 of the key questions in the analytic framework refers to the potential harms of the preventive service. The Task Force considers the evidence about the benefits and harms of preventive services separately and equally. Data about harms are often obtained from observational studies because harms observed in RCTs may not be representative of those found in usual practice and because some harms are not completely measured and reported in RCTs.
Putting the body of evidence for all key questions together as a chain, the Task Force assesses the certainty of net benefit of a preventive service by asking the 6 major questions listed above. The Task Force would rate a body of convincing evidence about the benefits of a service that, for example, derives from several RCTs of screening in which the estimate of benefits can be generalized to the general primary care population as "high" certainty (see the "Rating Scheme for the Strength of Recommendations" field). The Task Force would rate a body of evidence that was not clearly applicable to general practice or has other defects in quality, research design, or consistency of studies as "moderate" certainty. Certainty is "low" when, for example, there are gaps in the evidence linking parts of the analytic framework, when evidence to determine the harms of treatment is unavailable, or when evidence about the benefits of treatment is insufficient. Table 4 in the methodology document listed below (see "Availability of Companion Documents" field) summarizes the current terminology used by the Task Force to describe the critical assessment of evidence at all 3 levels: individual studies, key questions, and overall certainty of net benefit of the preventive service.
Sawaya GF, et al. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med. 2007;147:871-875 [5 references].
For I statements, the USPSTF has a new plan to commission its Evidence-based Practice Centers to collect information in 4 domains pertinent to clinical decisions about prevention and to report this information routinely. This plan is described in a paper that was published with the Skin Cancer recommendation: Petitti DB et al. Update on the methods of the U.S. Preventive Services Task Force: insufficient evidence. Ann Intern Med. 2009;150:199-205 (see "Availability of Companion Documents" field).
The first domain is potential preventable burden of suffering from the condition. When evidence is insufficient, provision of an intervention designed to prevent a serious condition (such as dementia) might be viewed more favorably than provision of a service designed to prevent a condition that does not cause as much suffering (such as rash). The USPSTF recognized that "burden of suffering" is subjective and involves judgment. In clinical settings, it should be informed by patient values and concerns.
The second domain is potential harm of the intervention. When evidence is insufficient, an intervention with a large potential for harm (such as major surgery) might be viewed less favorably than an intervention with a small potential for harm (such as advice to watch less television). The USPSTF again acknowledges the subjective nature and the difficulty of assessing potential harms: For example, how bad is a "mild" stroke?
The third domain is cost: not just monetary cost, but opportunity cost, in particular the amount of time a provider spends to provide the service, the amount of time the patient spends to partake of it, and the benefits that might derive from alternative uses of the time or money for patients, clinicians, or systems. Consideration of clinician time is especially important for preventive services with only insufficient evidence because providing them could "crowd out" provision of preventive services with proven value, services for conditions that require immediate action, or services more desired by the patient. For example, a decision to routinely inspect the skin could take up the time available to discuss smoking cessation, or to address an acute problem or a minor injury that the patient considers important.
The fourth domain is current practice. This domain was chosen because it is important to clinicians for at least 2 reasons. Clinicians justifiably fear that not doing something that is done on a widespread basis in the community may lead to litigation. More important, addressing patient expectations is a crucial part of the clinician-patient relationship in terms of building trust and developing a collaborative therapeutic relationship. The consequences of not providing a service that is neither widely available nor widely used are less serious than not providing a service accepted by the medical profession and thus expected by patients. Furthermore, ingrained care practices are difficult to change, and efforts should preferentially be directed to changing those practices for which the evidence to support change is compelling.
Although the reviewers did not explicitly recognize it when these domains were chosen, the domains all involve consideration of the potential consequences (for patients, clinicians, and systems) of providing or not providing a service. Others writing about medical decision making in the face of uncertainty have suggested that the consequences of action or inaction should play a prominent role in decisions.