Section 4: Evidence Report Development
4.1 Literature Retrieval and Data Abstraction for Topic Reviews
After literature searches are conducted, the team of evidence reviewers
uses a set of a priori inclusion/exclusion criteria as appropriate to each
key question to define whether identified literature is relevant to the
review. These criteria are applied twice: first at the title, or title and
abstract, review stage, and a second time at the article review stage. This
two-stage process is designed to be efficient, to minimize errors, and to be
transparent and reproducible.
Titles and abstracts are reviewed by broadly applying the inclusion criteria
for the review. When in doubt at the title/abstract review phase as to whether an
article might meet the inclusion criteria, reviewers should err on the side of
inclusion so that the article is retrieved and can be reviewed at the article
stage. All citations are coded with at least an excluded or included code, which
is managed in a database and used to guide the further literature review steps.
This database is the source of the final tables documenting the review process.
Full-text articles are retrieved for all citations included at the
title/abstract stage, and are reviewed by a member of the review team, using
inclusion/exclusion criteria for relevance and for quality. Included articles
receive codes to indicate the key question(s) for which they meet criteria and
excluded articles are coded with a reason for exclusion. The coded reason may be
either the primary reason or the first reason encountered in reviewing the
article; thus, the distribution of reasons for exclusion does not necessarily
represent the state of the excluded literature. Similarly, all the reasons for
exclusion of an individual article may not be listed in the final exclusion
table. Before they are abstracted, articles are reviewed to ensure that they
meet minimal design-specific U.S. Preventive Services Task Force (USPSTF) quality
criteria.
The abstract and article review process generally involves a team of reviewers
and is conducted using established research methods in order to minimize reviewer
drift as well as inter-rater review and coding differences.
4.1.1 Procedures for Abstract and Article Review
Abstracts undergo "dual-review" in that either all abstracts are
reviewed separately and reconciled, or at least abstracts excluded by the primary
reviewer are re-reviewed by another reviewer to ensure that all appropriate studies
are included. Any studies excluded by the first reviewer but included by the second
reviewer are included in the next phase.
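The rule in the last sentence can be sketched as follows. This is a minimal illustration, assuming each reviewer's decisions are stored as a mapping from a citation identifier to an include/exclude flag; the function name and data layout are hypothetical and are not part of any Task Force tool. Citations the second reviewer did not see are treated as carrying the first reviewer's decision.

```python
def reconcile_abstract_decisions(first_reviewer, second_reviewer):
    """Return the set of citation IDs that advance to article review.

    Each argument maps a citation ID to True (include) or False (exclude).
    A citation advances if either reviewer included it, so an exclusion by
    the first reviewer is overturned by an inclusion from the second.
    """
    advancing = set()
    for citation_id in first_reviewer.keys() | second_reviewer.keys():
        included_by_first = first_reviewer.get(citation_id, False)
        included_by_second = second_reviewer.get(citation_id, False)
        if included_by_first or included_by_second:
            advancing.add(citation_id)
    return advancing


# Illustrative decisions: the second reviewer re-reviews only the exclusions.
first = {"c1": True, "c2": False, "c3": False}
second = {"c2": True, "c3": False}
advancing = reconcile_abstract_decisions(first, second)
```

Here "c2" advances despite the first reviewer's exclusion, while "c3" is excluded by both reviewers and does not.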
When the volume of abstracts is very high due to the non-specific nature of
searches possible within a specific literature (e.g., alcohol misuse), reviewers
may use a sampling scheme for quality assurance as follows. For each key question,
all of the searches (MEDLINE [ML], Cochrane Central Register of Controlled Trials
[CCRCT], PsycINFO) will be considered as one search. Reviewers
will dual review a set number (1000) of the most recent abstracts that
proportionally represents the key databases searched for that key question, and
will then review a random subset of the remainder. All abstracts resulting from
the CCRCT are dual-reviewed. The other database searches are sampled proportionally
to bring the total to 1000 dual-reviewed abstracts; a random subset of the
remainder, equal to about 20-25% of the total number of articles, is then also
dual-reviewed. In the case of a sampling approach to dual review,
inter-rater reliability is calculated using the kappa statistic.
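As an illustration of the kappa calculation, the sketch below computes Cohen's kappa for two reviewers who coded the same set of abstracts. It assumes the two-rater form of the statistic, which the text does not specify; the function name and the sample decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who coded the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal code frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Illustrative sample of 100 dual-reviewed abstracts.
reviewer_1 = ["include"] * 20 + ["exclude"] * 80
reviewer_2 = ["include"] * 15 + ["exclude"] * 5 + ["include"] * 5 + ["exclude"] * 75
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this made-up sample the reviewers agree on 90 of 100 abstracts, chance agreement is 0.68, and kappa is about 0.69.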
4.1.2 Database of Abstracts
For each systematic review, the review team establishes a database of all
articles located through searches and from other sources (i.e., both those
included and those eventually excluded from the final set of articles reviewed).
Information captured in the database includes the source of the citation (e.g.,
search source, outside source), whether the abstract was included or excluded, the
key question(s) associated with each included abstract, whether the article was
excluded (with reasons for exclusion) or included in the review, and other
coding approaches developed to support the specific review. For example, a
hierarchical approach to answering a question may be proposed at the work plan
stage, specifying that reviewers will consider a type of study design or a
clinical setting only if research data are too sparse for the preferred type of
study. During abstract and article review, such studies can be coded to allow easy
retrieval during the conduct of the review, if warranted.
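One possible shape for such a database record is sketched below. The class, field names, and summary function are assumptions made for illustration and do not describe the software any review team actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CitationRecord:
    """One row of a hypothetical review database (field names are illustrative)."""
    citation_id: str
    source: str                                   # e.g., "search" or "outside source"
    abstract_included: bool                       # decision at the title/abstract stage
    key_questions: List[str] = field(default_factory=list)
    article_included: Optional[bool] = None       # None until the full text is reviewed
    exclusion_reason: Optional[str] = None        # recorded for excluded articles
    extra_codes: List[str] = field(default_factory=list)  # review-specific coding


def review_flow_counts(records: List[CitationRecord]) -> Dict[str, int]:
    """Tally the counts needed for a table documenting the review process."""
    return {
        "citations": len(records),
        "abstracts_included": sum(r.abstract_included for r in records),
        "articles_included": sum(r.article_included is True for r in records),
        "articles_excluded": sum(r.article_included is False for r in records),
    }


records = [
    CitationRecord("a", "search", True, ["KQ1"], True),
    CitationRecord("b", "search", True, [], False, "wrong population"),
    CitationRecord("c", "outside source", False),
]
counts = review_flow_counts(records)
```

Keeping the abstract-stage and article-stage decisions in separate fields lets the same records feed both the exclusion table and the counts for each review stage.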
4.1.3 Documenting Search Results
Search terms used for each key question, along with the yield associated with
each term, are documented in a table or set of tables; these appear in the summary
of the literature search (early in any topic review project) and in the final
evidence synthesis. Follow-up searches to capture newly published data are
conducted periodically as the project progresses; the frequency of these searches
depends on the individual topic. A final search is conducted close to the time of
completion of the draft evidence report, with the exact timing determined by the
evidence review team. The final documentation of the search should indicate the
most recent time point searched.
To the extent that it suits the review rationale and is feasible, search dates
for different key questions should conform to one another.
4.1.4 Data Abstraction Approaches
- Use of forms: Data may be
abstracted in forms developed or adapted for the review, or directly into
evidence tables.
- Minimal elements to abstract:
Although the Task Force has no standard or generic abstraction form,
the following broad categories are always abstracted from included
articles: key question, study design, study participant description,
details of the intervention or screening test being studied, study
results with emphasis on health outcomes where appropriate, and
individual study quality information, including specific
threats to validity. Information relevant to generalizability is consistently
abstracted. Each team uses these general categories, and other categories
if indicated, to develop an abstraction form specific to the topic at hand.
For example, source of funding may be an important variable to abstract
for some topics.
- Abstraction of included articles:
The evidence review teams abstract only those articles that, after
review of the entire article, both meet criteria for quality and
focus on the key question at hand. Abstractions are conducted by trained
team members, and evidence review teams may, but do not routinely, double
abstract all included articles. Key articles are always read and checked
by more than one team member. All reviewers are trained in the topic, the
analytic framework and key questions, and the use of the abstraction instrument.
Initial reliability checks are done for quality control.
- Other quality assurance methods:
It is desirable to have more than one evidence review team member check
data accuracy for key data elements, including data included in a summary
table, a meta-analysis, or in calculations supporting a balance
sheet/outcomes table.
4.2 Critical Appraisal
By means of its explicit analytic framework and key questions, the Task Force
indicates what issues it must examine to make its recommendation. By setting
inclusion and exclusion criteria for the searches for each key question, the
Task Force indicates what evidence it will consider admissible. The critical
aspect used to determine whether an individual study is admissible is its
internal and external validity with respect to the key question posed. This
initial examination of the "quality" (i.e., internal and external
validity) of individual studies is conducted with established criteria (go to
Appendix VII and Appendix VIII) by the evidence review team or USPSTF topic work group.
If questions arise in the course of this process, Task Force members are asked
to review the articles in question. Studies with fatal flaws (i.e., with
"poor" internal or external validity) are not admissible for
further consideration. Likewise, studies of interventions that require training
or equipment not feasible in even high quality primary care would be judged to
have poor external validity for the key questions posed by the Task Force, and
would not be admissible evidence.
Once the admissible evidence has been found, and the internal and external
validity of individual studies has been assessed, the Task Force must consider
the level of evidence that the studies provide to answer the key questions. The Task
Force's process for determining the level of evidence for a key question involves answering
the following 6 critical appraisal questions about the admissible evidence. The
Task Force uses these same 6 critical appraisal questions to determine the overall
certainty of the evidence of net benefit for the entire preventive service, including
all key questions in the analytic framework. (Go to Section 5 for a description
of the Task Force's methods for judging the cumulative evidence and
arriving at a recommendation.)
4.2.1 Critical Appraisal Questions
- Do the studies have the appropriate research
design to answer the key question(s)?
- To what extent are the existing studies of
high quality? (i.e., what is the internal validity?)
- To what extent are the results of the studies
generalizable to the general US primary care population and situation?
(i.e., what is the external validity?)
- How many studies have been conducted that
address the key question(s)? How large are the studies? (i.e., what is the
precision of the evidence?)
- How consistent are the results of the studies?
- Are there additional factors that assist us in
drawing conclusions (e.g., presence or absence of dose-response effects; fit
within a biologic model)?
4.2.2 Levels of Critical Appraisal
The evidence review process involves assessing the validity and reliability of
admissible evidence at 3 levels:
- The individual study;
- The key question (i.e., linkage in the analytic
framework); and
- The entire preventive service.
For individual studies, questions 1-3 and 6 are assessed. That is, a single study
will be categorized as to study design and whether internal and external
validity are "good," "fair," or "poor" to
answer the key question. For the key question and entire preventive
service levels, all 6 questions must be considered.
For the individual study level, the evidence review team finds admissible
evidence and then categorizes the internal validity (i.e., quality—Appendix VII) of each study into "good", "fair", and
"poor" categories. For critical or borderline studies, the Task
Force leads (and sometimes the entire Task Force) will also consider the
individual studies. The Evidence-based Practice Center (EPC) also provides the Task
Force with descriptions of
factors entering into the determination of external validity (i.e., applicability
or generalizability—Appendix VIII), as well as descriptions of each
study's research design and the number and description of studies relevant
to each key question.
For the key question level, the Task Force, using information about the evidence
supplied by the EPC, assesses the level of evidence across each key question using
all 6 critical appraisal questions. The body of evidence is often categorized as
to the highest level of applicable evidence available. The Task Force categorizes
the evidence across each key question into one of 3 categories:
"convincing," "adequate," or "inadequate."
For the preventive service, the entire body of evidence in the entire analytic
framework is synthesized by the Task Force into categories of "certainty"
of the overall evidence: high, moderate, and low. Again, the Task Force uses
all 6 critical appraisal questions for this determination. (Go to Appendix IV regarding topic workgroup procedures for assessing certainty.)
4.3 Assessing Evidence at the Individual Study Level
4.3.1 Critical Appraisal
All individual articles are critically appraised to determine the validity and
reliability of the evidence they provide. This assessment is
conducted primarily by the topic team (usually led by the EPC or by AHRQ team
leaders), with input from Task Force members for critically important or
borderline articles. The assessments of internal validity (i.e., "quality") and
external validity (i.e., applicability or generalizability) are based on explicit
criteria, given in Appendix VII and Appendix VIII.
4.3.2 Internal Validity
The Task Force recognizes that research design is an important component of
the validity of the information in a study, for the purpose of answering a key
question. Although RCTs cannot answer all key questions, they are ideal for
questions of the benefits or harms of various interventions. Thus, for these
questions, the current Task Force endorses a slightly revised version of the
"hierarchy of research design" used by the second Task Force:
I: Properly powered and conducted randomized controlled trial (RCT);
well-conducted systematic review or meta-analysis of homogeneous RCTs
II-1: Well-designed controlled trial without randomization
II-2: Well-designed cohort or case-control analytic study
II-3: Multiple time series with or without the intervention; dramatic
results from uncontrolled experiments
III: Opinions of respected authorities, based on clinical experience;
descriptive studies or case reports; reports of expert committees
In assessing individual studies, all are classified first according to this
design code, with additional designations added for other or unconventional designs.
Although research design is an important component of the information provided
by an individual study, the Task Force also recognizes that not all studies within
a research design have equal internal validity ("quality"). To assess
more carefully the internal validity of individual studies within research designs,
the Task Force adopted design-specific criteria for assessing the internal validity
of individual studies.
These criteria, given in Appendix VII, provide general guidelines for categorizing
studies into one of three internal validity categories: "good,"
"fair," and "poor." These specifications are not meant to
be rigid rules; individual exceptions, when explicitly explained and justified,
can be made. In general, a "good" study is one that meets all
design-specific criteria. A "fair" study is one that does not meet
(or does not clearly meet) at least one specified criterion, but has no known
"fatal flaw." "Poor" studies have at least one fatal flaw.
A fatal flaw is a deficit in design or implementation of the study that calls
into serious question the validity of its results for the key question being
addressed.
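The general pattern described above can be sketched as a simple decision rule. This is only an illustration: the criteria are guidelines rather than rigid rules, the criterion names are placeholders rather than the Appendix VII criteria, and identifying a fatal flaw remains a reviewer judgment that is passed in here as an input.

```python
def rate_internal_validity(criteria_met, has_fatal_flaw):
    """Suggest a "good", "fair", or "poor" internal validity rating.

    criteria_met maps each design-specific criterion to True (met),
    False (not met), or None (not clearly met).
    """
    if not criteria_met:
        raise ValueError("at least one design-specific criterion is required")
    if has_fatal_flaw:
        return "poor"   # at least one fatal flaw
    if all(value is True for value in criteria_met.values()):
        return "good"   # every criterion is clearly met
    return "fair"       # some criterion unmet or unclear, but no fatal flaw


# Placeholder criteria for a hypothetical trial.
example = {
    "comparable_groups": True,
    "adequate_follow_up": None,   # not clearly met
    "valid_outcome_measures": True,
}
rating = rate_internal_validity(example, has_fatal_flaw=False)
```

The example study would be rated "fair" because one criterion is not clearly met and no fatal flaw was identified. Explicitly justified exceptions, which the text allows, fall outside this sketch.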
The Task Force views the level of evidence, whether for an individual study, a
key question/linkage, or an entire preventive service, as independent of the magnitude
of effect. Thus, a study (or a number of studies) could be classified as "good"
even if it (they) found no effect of the preventive service.
4.3.3 External Validity (Generalizability) and Applicability
It is necessary not only to assess the external validity (generalizability)
of the individual studies that contribute to answering the key questions, but also
to assess the body of evidence in order to judge its applicability to
the population or populations that are the target for the clinical preventive
service, to the settings in which the service will be implemented, and to the
providers who will deliver the service. In this document, the term
"external validity" will be used when discussing assessment
of individual studies, and the term "applicability" will be
used when discussing the assessment of linkages across key questions and the overall
body of evidence, even though the external validity of individual studies is a
key element of the applicability judgment. The summative judgment about
applicability is more than the sum of the assessment of each of the parts.
For the USPSTF, the study-level assessment of external validity and the
assessment of applicability are done separately.
A description of the overall conceptual approach for both components is
provided below. Appendix VIII gives detailed information on criteria and process.
4.3.3.1 Assessment of the External Validity of a Study
Judgments about the external validity ("generalizability") of a
study pertinent to a preventive intervention address three main questions:
- Considering the subjects in the study, to what
degree do the study's results measure the likely clinical results in the
asymptomatic people who are the recipients of the preventive service in the
United States?
- Considering the setting in which the study was
done, to what degree do the study's results measure the likely clinical results
in the United States primary care situation? and
- Considering the providers who were a part of the
study, to what degree do the study's results measure the likely clinical results
in providers who would deliver the service in the United States primary care setting?
4.3.3.2 Populations
The participants in a study may differ from people
receiving primary care in many ways. Such differences may include gender, ethnicity,
age, co-morbidities, and other personal characteristics. Some of these differences
have a small potential to affect the study's results and/or the outcomes of an
intervention. Other differences have the potential to cause large differences
between the study's results and what would be reasonably anticipated to occur
in asymptomatic individuals or people who are the target of the preventive
intervention.
The choice of the study population may affect the magnitude of the benefit
observed in the study through exclusion/inclusion criteria that limit the study
to people most likely to benefit; other study features may impact the risk level
of the subjects recruited to the study. The absolute benefit from a service is
often greater for people at increased risk than for people at lower risk.
Because of the presence of certain research design elements, adherence is likely
to be greater in research studies than in the usual primary care practice. This may
lead to overestimation of the benefit of the intervention when delivered to people
who are less selected (i.e., who more closely resemble the general population), and
who are not subject to the special study procedures.
4.3.3.3 Situations
Factors related to the study situation relative to the situation in U.S. primary
care settings must be considered when assessing the external validity of a study.
The choice of study setting may lead to an over- or under-estimate of the benefits
and harms of the intervention as they would be expected to occur in U.S. primary care
settings. For example, results of a study in which items essential for the service to
have benefit are provided at no cost to patients may not be attainable when the item
must be paid for. Results obtained in a trial situation that ensures immediate access to
care if a problem or complication occurs may not be obtainable in a usual care
situation, where the same safeguards cannot be ensured, and where as a result the
risks of the intervention are greater.
4.3.3.4 Providers
When assessing the external validity of a study, factors related to the experience
of providers in the study should be considered in comparison with the experience
of providers likely to be encountered in primary care in the U.S. Studies may
involve providers selected for their experience or their high skill level.
Providers involved in studies may undergo special training that affects their
performance of the intervention. For these and other reasons, the effect of the
intervention may be overestimated or the harms underestimated compared with the
likely experience of unselected providers in the primary care setting.
4.3.3.5 Criteria and Process
The criteria used to rate the external validity of individual studies according
to the population, the situation, and the providers are described in detail in Appendix
VIII. As with internal validity, this assessment of external validity is usually
conducted initially by the EPC or AHRQ topic team leader, with input from Task
Force members for critically important or borderline studies. This assessment is
then used to give each study a rating using the same 3-tiered grading scheme as
for internal validity: good, fair, and poor.
The underlying question answered in grading the external validity of a study
as good, fair or poor is:
If the study had been done with the typical U.S. primary care population,
situation, and providers, what is the likelihood that the results would be different
in a clinically important way?
4.4 Applicability of the Body of Evidence to the Target Population/Situation/Setting
USPSTF members assess the applicability of the body of evidence to
populations/situations/settings as one of the components of the overall process
of making recommendations.
Judgment about applicability considers the populations, situations, and providers
in each study, but it also involves synthesis of the evidence from the individual
studies across the key questions, and for the overall body of evidence.
The overall goal of the assessment is to judge whether there are likely to be
clinically important differences between the results observed in studies as a
whole and the results expected when the intervention is implemented in the U.S.
primary care populations/situations/providers.
The following questions are addressed:
- Can an inference be made from the evidence that
the intervention has any effectiveness for the U.S. primary care
populations/situations/providers?
- Is the magnitude of benefit observed
in individual studies that comprise the body of evidence likely to be the same
for the U.S. primary care populations/situations/providers?
- Are the harms observed in individual
studies that comprise the body of evidence likely to be the same for the U.S.
primary care populations/situations/providers?
- What is the relationship between benefits
and harms derived from the evidence likely to be for the U.S. primary
care population/situation/providers?
- Are the time and effort required to provide
the interventions that comprise the body of evidence attainable in the U.S.
primary care situations/providers?
- Can people in U.S. primary care
populations/situations be expected reasonably to partake of the interventions
that comprise the body of evidence considering their time, effort, and cost?
- Is the extrapolation of data from the body of
evidence to large populations of asymptomatic people biologically plausible?
4.4.1 Relative Importance of Efficacy/Effectiveness
The USPSTF seeks to make recommendations based on projections of what would
be expected from widespread implementation of the preventive service within the
actual world of U.S. medical practice. For this reason, the Task Force considers
carefully the applicability to medical practice of "efficacy" studies,
which measure the effects of the preventive care service under ideal circumstances.
However, the USPSTF ultimately seeks to base its recommendations on
"effectiveness," that is, the results that could be expected with
widespread implementation under usual practice circumstances.
Questions arise about whether the USPSTF recommendations consider effectiveness
in usual practice or in ideal/excellent practice. The "situation" for
practices varies widely within the U.S. Some practices have greater support and
more resources than others. The Task Force attempts to make recommendations for all of
these practice "situations," and may specify what resources are
required for implementation.
4.4.2 Definition of Primary Care
To further specify the situation that is the object of its concern, the Task
Force has adopted the Institute of Medicine's definition of primary care:
Primary care is the provision of integrated, accessible health care
services by clinicians who are accountable for addressing
a large majority of personal health care needs, developing a
sustained partnership with patients, and practicing in the
context of family and community. This definition acknowledges the importance
of the patient-clinician relationship as facilitated and augmented by teams and
integrated delivery systems. (7)
4.4.3 Primary Care Interventions Addressed by the USPSTF
The USPSTF considers interventions that are delivered in primary care settings
or are judged to be feasible for delivery in primary care. To be feasible in primary
care, the intervention could target patients seeking care in primary care settings,
and the skills to deliver the intervention are or could be present in clinicians
and/or related staff in the primary care setting, or the intervention could generally
be ordered/initiated by a primary care clinician.
4.5 Other Issues in Assessing Evidence at Individual Study Level
4.5.1 Dealing with Secondary and Aggregate Endpoints
The Task Force adopted a policy of critically appraising all of the endpoints
(outcomes) of trials in a similar manner, following the 6 critical appraisal
questions listed earlier (Section 4.2.1). In its review, the Task Force takes note
of the biological plausibility of a study's finding, the supporting evidence, and
whether an outcome is a primary or secondary one. Similarly, the Task Force examines
composite (aggregate) outcomes carefully. It generally asks 3 questions of these
outcomes: (1) are the component outcomes of similar importance to patients? (2) did
the more and less important outcomes occur with similar frequency? and (3) are
the component outcomes likely to have similar relative risk reduction (RRR)?
4.5.2 Ecologic Evidence
Because biases may be present in ecologic data, the Task Force is careful in
its use of this type of evidence. The Task Force rarely accepts ecologic evidence
alone as sufficient to recommend a preventive service. Because this evidence is
widely accepted by others, the Task Force developed a policy for when it uses
ecologic evidence, and how this evidence is critically appraised.
By ecologic evidence we mean data that are not at the individual level but,
rather, relate to the average exposure and average outcome within a population.
The "ecologic fallacy" is the erroneous conclusion that there is an
association when exposure occurs in some members of a population and an outcome in
other members. In addition, ecologic data sets often do not include other potential
confounding factors; thus, one cannot directly assess the ability of these potential
confounders to explain apparent associations. Finally, some ecologic studies use
data collected in ways that are not accurate or reliable.
Ecologic studies usually make comparisons of outcomes in exposed and unexposed
populations in one of two ways: (1) between different populations, some exposed
and some not, at one point in time (i.e., cross-sectional ecologic study); or (2)
within a single population with changing exposure status over time (i.e., time series
ecologic study). In either case the potential for committing the ecologic fallacy is
a major concern.
As it is not possible to completely avoid the potential for committing the ecologic
fallacy in these studies, the USPSTF does not usually accept ecologic evidence alone
as adequate to establish the causal association of a preventive service and a
health outcome. In some unusual situations (e.g., cervical cancer screening)
ecologic evidence may play the primary role in the Task Force's evidence review,
but this is rare.
More frequently, ecologic evidence is considered by the Task Force in the
following situations:
- For background, for an understanding of the
context in which the preventive service is being considered;
- When well-known ecologic data are being used as
evidence by others to justify either recommending or not recommending the service
the Task Force is considering;
- Where other evidence is inadequate but the Task
Force thinks that good ecologic evidence could add important information;
- When there are reports of dramatic results of
ecologic studies.
In the situations above, the Task Force critically appraises ecologic studies.
High quality ecologic evidence meets the following criteria:
- The exposures, outcomes, and potential confounders
are measured accurately and reliably.
- Other potential explanations and potential confounders
are considered and adjusted for.
- The populations and interventions being compared
are comparable.
- The populations and interventions are relevant to
a primary care population.
- Multiple ecologic studies are present that
are consistent/coherent.
4.5.3 Mortality as Outcome: All-cause Versus Disease-specific Mortality
When a condition is a common cause of mortality, all-cause mortality, instead
of cause-specific mortality, is a desirable health outcome measure. Few preventive
interventions attain the high standard set by use of this outcome. Any discrepancy
between the effect of the preventive intervention on all-cause and disease-specific
mortality is important to recognize and explore. A discrepancy may arise (1) when
there is real benefit of the preventive intervention for a targeted condition or
(2) because of methodologic issues that are inherent in the study of all-cause
mortality.
4.5.3.1 Real Benefit for the Targeted Condition
Three situations can produce this kind of discrepancy. First, when a preventive
intervention increases deaths from causes other than the one targeted by the
intervention, all-cause mortality may not be decreased even when cause-specific
mortality due to the targeted condition is decreased. This indicates a potential
harm of the intervention for a condition other than the one targeted.
Second, when the condition targeted by the preventive intervention is rare
and/or the effect of the intervention on cause-specific mortality due to the
targeted condition is small, the effect on all-cause mortality may be very small
or even non-existent.
Third, when the preventive intervention is applied in a population with
strong competing causes of mortality, the effect of the preventive intervention
considering all-cause mortality may be very small or even non-existent even though
the intervention decreases cause-specific mortality due to the targeted condition.
For example, preventing death due to hip fracture by implementing an intervention
to decrease falls in 85-year-old women may not decrease all-cause mortality over reasonable
time frames for a study because the force of mortality is so large at this age.
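The second situation above can be illustrated with simple arithmetic. As a rough sketch, assuming the intervention changes only deaths from the targeted condition and leaves mortality from all other causes unchanged, the relative reduction in all-cause mortality equals the relative reduction in cause-specific mortality multiplied by the share of all deaths due to the targeted condition. The function name and the numbers are illustrative and are not drawn from any study.

```python
def all_cause_relative_reduction(cause_specific_rrr, share_of_deaths):
    """Relative reduction in all-cause mortality implied by a cause-specific effect.

    Assumes deaths from causes other than the targeted condition are unchanged.
    With control-group mortality M = M_target + M_other, the intervention group
    has M_target * (1 - rrr) + M_other, so the relative reduction in M is
    rrr * (M_target / M).
    """
    if not 0 <= cause_specific_rrr <= 1:
        raise ValueError("cause_specific_rrr must be between 0 and 1")
    if not 0 <= share_of_deaths <= 1:
        raise ValueError("share_of_deaths must be between 0 and 1")
    return cause_specific_rrr * share_of_deaths


# A 20% reduction in deaths from a condition that causes 3% of all deaths.
reduction = all_cause_relative_reduction(0.20, 0.03)
```

With these illustrative inputs, the all-cause reduction is about 0.6%, an effect that few trials are large enough to detect.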
4.5.3.2 Methodologic Issues
Methodologic issues can arise because of difficulties in the assignment of cause
of death based on records available to or used by a study. In the absence of detail
about the circumstances of death, death may be attributed to a chronic condition
known to exist at the time of death but without any true contribution to death.
Coding conventions for death certificates also result in deaths from some causes
being routinely attributed to chronic conditions present at death. For example, it
is conventional to assign people with a mention of cancer on the death certificate
to cancer as primary cause of death. The result of these methodologic issues is a
biased estimate of cause-specific mortality, which may not reflect the true effect
an intervention has on death from the targeted condition.
As indicated above, studies that provide data on all-cause and cause-specific
mortality may have low statistical power to detect even large or moderate effects
of the preventive intervention on all-cause mortality. This is especially true when
the disease targeted by the screening test is not common.
When data are available, the Task Force considers data on both all-cause
and cause-specific mortality in making its recommendations, taking into account
the real and methodologic contributions to potential discrepancies between apparent
and true effect.
4.5.4 Subgroup Analyses
The Task Force is interested in targeting its recommendations to those populations
or situations in which there would be maximal benefit for the harms and costs involved.
Thus, it often takes into consideration subgroup analyses of large studies. It attaches
varying levels of credibility to those analyses, however, depending on such factors as:
the size of the subgroup; whether randomization occurred within subgroups; whether
a statistical test for interaction was done; whether the results of multiple subgroup
analyses are consistent within themselves; whether the subgroup analyses were
pre-specified; and whether the results are biologically plausible.
4.5.5 Relative Versus Absolute Risk Reduction
The Task Force is interested in reducing risk both for populations and for
individuals. For this reason it takes into account both relative (RRR) and
absolute risk reduction (ARR) from intervention studies. It generally prioritizes
ARR over RRR. That is, it is less impressed with a large RRR in situations of low
ARR; it remains interested in an intervention with a low RRR if its ARR is high.
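The distinction can be made concrete with a short calculation. The sketch below uses the standard definitions (ARR is the control-group risk minus the intervention-group risk; RRR is ARR divided by the control-group risk). The function name and the two scenarios are hypothetical.

```python
def risk_reductions(control_risk, intervention_risk):
    """Return (ARR, RRR) for an outcome, given the risk in each study arm."""
    if not 0 < control_risk <= 1:
        raise ValueError("control_risk must be in (0, 1]")
    if not 0 <= intervention_risk <= 1:
        raise ValueError("intervention_risk must be in [0, 1]")
    arr = control_risk - intervention_risk   # absolute risk reduction
    rrr = arr / control_risk                 # relative risk reduction
    return arr, rrr


# Two illustrative scenarios with the same relative effect.
low_risk_arr, low_risk_rrr = risk_reductions(0.002, 0.001)
high_risk_arr, high_risk_rrr = risk_reductions(0.20, 0.10)
```

Both scenarios have an RRR of 50%, but the ARR is about 0.1 percentage points in the low-risk population and 10 percentage points in the high-risk population, which is why a large RRR alone carries limited weight.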
4.6 Incorporating Other Systematic Reviews in USPSTF Reviews
Existing systematic reviews or meta-analyses that meet quality and relevance
criteria can be incorporated into topic reviews done for the USPSTF. Quality
criteria for reporting meta-analyses are specified by the QUOROM and MOOSE
guidelines published, respectively, by The Lancet and the Journal of the American
Medical Association (JAMA) (8, 9). The USPSTF has specified its criteria for
critically appraising systematic reviews (go to Appendix VII and Appendix VIII). Relevance
is considered at two levels: at the general level of the review or analysis
question, and at a more specific level. At the general level, the question would
be "Is the review or meta-analysis relevant to one or more of the USPSTF
key questions for this review?" The more specific question would be:
"Did the review include the desired study designs and relevant population(s),
settings, exposure/intervention(s), comparator(s), and outcome(s)?" Recency
of the review is also a consideration, and can determine whether a review that
meets quality and relevance criteria is recent enough not to require any bridging
searches. Finally, existing reviews can be used in several ways in a USPSTF review:
(1) to answer one or more key questions wholly or in part; (2) to substitute
for conducting a systematic search for a specific time period for a specific
key question; or (3) as a source document for cross-checking the results of
systematic searches.
4.7 Use of Observational Designs in Questions of the Effectiveness/Efficacy of Interventions
The Task Force prefers large, well-conducted RCTs to determine the benefits and
harms of preventive services. In many situations, however, such studies have not
been or are not likely to be done. When these studies can be done, and other
evidence is insufficient to determine benefits and/or harms, the Task Force advocates
that large, well-conducted RCTs be done. It notes that small, poorly-conducted RCTs
are often of little use.
In some situations, however, the Task Force does use observational evidence to
make recommendations. Multiple, large, well-conducted observational studies with
consistent results showing a large effect size that does not change markedly with
adjustment for multiple potential confounders may be judged sufficient to determine
the magnitude of benefits and harms of a preventive service. Also, large,
well-conducted observational studies often provide essential additional evidence
even in situations where there are adequate RCTs. Ideally, RCTs provide evidence
that an intervention can work and observational studies provide better understanding
of the populations where the benefits would be greatest.