Chapter 4. Evaluation of the AHRQ QI Program
In this section, we discuss the results of our environmental scan and interviews
with regard to the evaluation of the AHRQ QI indicators. We organize the discussion
according to four factors that are used as criteria for evaluating quality
indicators: importance, scientific soundness, usability, and feasibility. Since
this report focuses on the AHRQ QI program as a whole, the comments and insights
should be interpreted broadly, and not as critiques of individual indicators.
For example, "importance" here refers mainly to interviewees' perceptions
of the AHRQ QI program as a whole, not the importance of the constructs underlying
specific AHRQ QIs.
4.1. Importance
4.1.1. Users' general views on the importance of the AHRQ QI program
Representatives of nearly all of the organizations stressed the importance
of the AHRQ QI program. When asked an open-ended question about the role of
AHRQ in quality measurement, 11 of 54 interviewees identified AHRQ as a "national
leader" in measurement development and research. The AHRQ QI program
was described by a vendor as "a major player, both nationally and internationally...a
leader, the top of the pyramid." One interviewee captured this
sentiment:
AHRQ is a very important player and has a rich history of research and
evidence basis. The products they provide help everyone develop measures, such as
the National Guideline Clearinghouse. The measures they have done to date
have an audience, a place and a role—I know states use them.
Interviewees stressed that without the AHRQ QIs, they would have few alternatives
and would likely have to drastically change or eliminate their quality reporting
and/or measurement activities. As discussed in more detail below, the scientific
soundness of the QIs was highly regarded, as was the transparency of the QI evidence
review and validation that was conducted as a part of the AHRQ QI development
process.
Interviewees generally felt that it was important that a federal agency like
AHRQ, which is regarded as credible and respected, develop and support a quality
indicator set for public use. AHRQ's credibility and the transparency
of the AHRQ QI methods were often mentioned as key factors in overcoming opposition
to quality measurement and reporting by stakeholders, particularly providers. We
were told:
There is a lot of good documentation regarding how rigorously the indicators
have been analyzed by AHRQ, researchers, academics, etc., in a collaborative
effort. This is important, especially for hospital administrators,
who have to convince medical staff that at least there is a rigorous process
behind the indicators.
Overcoming this type of opposition is particularly important for public reporting
and pay-for-performance initiatives, where providers' reputations and
revenues are at stake. In the scenarios described by many of our interviewees,
providers are typically not opposed on conceptual grounds to increasing the transparency
of the quality of care they provide. However, providers are sensitive
to being evaluated using measures that are unreliable or invalid, and they value
the opportunity to be able to review and evaluate the measures they are subjected
to and to raise objections to the results, where appropriate.
4.1.2. Importance of the Individual AHRQ QIs and Underlying Constructs
Although interviewees were not asked to comment on the importance of the constructs
underlying the AHRQ QIs or on individual indicators, a few interviewees raised
these issues. When asked why they use the AHRQ QIs, some interviewees
mentioned that the AHRQ QIs provide a "good estimate" or that they
offer a "reflection of reality."
Several interviewees also remarked that they appreciated having access to
the evidence showing that the AHRQ QIs represent important opportunities for
quality improvement, which is made available in the AHRQ technical documentation
under the headings "face validity" and "fosters real quality improvement."35
A number of interviewees (10 of 54) mentioned that the availability of this information
in the documentation is a key reason why they decided to use the AHRQ QIs, or
described the documentation as a factor that facilitated the use of AHRQ QIs
in the face of opposition from stakeholders.
4.1.3. Impact of AHRQ QI use
Although only one organization in our sample had formally measured the impact
of AHRQ QIs on the quality of care delivered to patients, many interviewees
provided anecdotal evidence of the effect of the indicators on quality. The
one organization that did report conducting a study of the impact of its use
of the AHRQ QIs was The Alliance, a Wisconsin employer-purchasing cooperative
that publishes a quality report called QualityCounts. The evaluation
of the impact of QualityCounts was conducted by Judith Hibbard and a team from
the University of Oregon and was published in Health
Affairs.36
The study found that public reporting resulted in increased hospital quality in
the clinical areas included in the QualityCounts public report. The improvement
appears to be driven by hospital administrators' concerns about their
reputation.
When asked whether they had measured the impact of using the AHRQ QIs, a
number of interviewees (9 of 29 answering this question) responded that indicator
use began too recently to allow for observation of any impact. In addition,
several interviewees stated that the results of the AHRQ QIs can be difficult
to track longitudinally, since the specifications of the indicators have changed
over time.
However, 12 of the 29 interviewees who answered the question on impact reported
anecdotal evidence that their or their clients' use of the AHRQ QIs was
having some type of impact on quality of care. The impacts observed usually
consisted of an activity such as putting a new quality improvement process in
place, rather than an improvement in outcomes. Examples of this type of
anecdote include:
- A hospital representative reported:
We've definitely seen an impact on quality in areas flagged by the AHRQ QIs.
Some have been data problems and some have been actual quality improvements. For
example, using the infection indicator (PSI 7) we were able to see improvement
after implementing the ventilator and central line bundles. Similarly
with the sepsis indicator (PSI 13), we implemented the Surgical Care Procedure
Practices, a group of interventions to decrease sepsis, and we
saw improvements.
- A hospital network representative reported that staff were able to observe
the impact of a quality report card on quality improvements in network hospitals. Two
interventions introduced in response to the report card were: 1) new guidelines
on the angle of the hospital bed for ventilator-associated pneumonia patients
and 2) implementation of a rapid response team.
- From a hospital using a vendor to implement AHRQ QIs:
We identified that we had high failure to rescue rates... This was the
information we needed to present to our executive team and board to obtain
resources to effectively establish and run a rapid response team.
- A hospital association representative reported:
There have been some changes in [the AHRQ QIs] data [over time], but I don't
know if they've been caused [by our use of the AHRQ QIs for quality improvement]. From
2001 to 2004 there is less variation among hospitals, and mortality has decreased
for several indicators; on the other hand, fewer hospitals are at or above
the volume thresholds. We have looked at trends in other available
data and, to the extent there is overlap, there is some correlation and
indication that quality is improving.
- A representative of another hospital association provided anecdotal evidence
of quality improvements, and also revealed a barrier to conducting more rigorous
assessments of impact:
Hospitals have taken action in terms of identifying individual cases
[from the numerator of AHRQ QIs where a problem is flagged], reviewing
them [using clinical data], and developing improvement plans (especially
moderate cases, such as infection). There are no published impact studies. The
climate (in terms of lawsuits, etc.) stands in the way of publishing
studies and until the climate is supportive, hospitals will not publish
anything.
- A representative of a state that publicly reports AHRQ QIs noted:
One example of where the report had an immediate impact was one hospital
that wasn't hitting the volume threshold for carotid endarterectomy [IQI 7]. They
decided to stop performing them. We would like to evaluate effectiveness of
reports at some point but don't have specific plans at this point.
- An insurance company representative using the AHRQ QIs for pay-for-performance
believes that the program has had an impact by garnering attention for quality
improvement from hospital management:
The indicators for patient safety have raised awareness. Because real
money is now on the table, the result has been that the hospitals' financial
people now have a more substantive dialogue with the quality people.
- A researcher who participated in a study that used the AHRQ QIs to evaluate
a state-wide hospital policy change reported substantial press coverage of
the results and an effect on other states considering the same policy.
The primary type of impact observed, however, was improvement to data quality.
Representatives of several organizations stated that they viewed improved data quality
as a natural progression in the implementation of a quality measurement program. When
a potential quality problem is first flagged using the AHRQ QIs, the first
response is to investigate whether the observed issue is due to a problem in
the data or a problem with the actual quality of care. Once the provider
organization has determined that the result in question is not a data artifact,
the provider often examines clinical data and/or performs some other type of
quality improvement activity to determine the cause of the quality problem. One
state government representative described this process:
The first step hospitals take, naturally, when they see a potential problem
is to ensure that it is not a data artifact. Hospitals found that they
were consistently up-coding or down-coding measures. They usually started
with initiatives to fix their data. Hospitals in some cases threw up
red flags and started quality initiatives but the first step is to answer the
question—is it an artifact of data or real issue? One hospital had 3 flags
[potential quality problems indicated by the AHRQ QIs]; two turned out to be
data problems, but one—stroke mortality—was a quality problem. However,
most of the feedback from hospitals has been around trying to make data better. We
don't have plans to evaluate the impact of our program because we just
don't have the resources.
4.2. Scientific Soundness
4.2.1. Reliability
Users largely felt that the AHRQ QIs can be reliably constructed from hospital
discharge data, but that there was a certain learning curve during which hospital
coding departments had to adjust to the requirements for the QIs. Thus far,
coders had mainly been trained to apply coding rules to fulfill reimbursement
requirements, but now they had to understand that coding practices also had
implications for quality reporting. In selected instances, we heard concerns
about ambiguity in the coding rules—that the coding rules did not provide
sufficient guidance on whether to code an indicator-relevant diagnosis. For
example, we heard repeatedly that coders found it difficult to apply coding
rules for vaginal trauma during birth (5 of 36 users).
4.2.2. Validity
Our interviewees were impressed by the quality and level of detail of the
AHRQ documentation on the face validity of the indicators and stated that the
indicators captured important aspects of clinical care. Very rarely were indicators
challenged on conceptual grounds. One exception was the VBAC measures (IQIs
22 and 34), because a current American College of Obstetricians and Gynecologists (ACOG)
guideline37
recommends VBAC only for facilities with a sufficient infrastructure for emergency C-section,
which is commonly not present in smaller hospitals. Two interviewees who do
public reporting with AHRQ QIs challenged the validity of the volume-based
IQIs, arguing that the scientific evidence for a positive effect of high
volumes of complex procedures on outcomes is not unambiguous.
Sample size issues (whether due to the rarity of certain procedures or events
or the infrequency with which some procedures are conducted at certain facilities)
were repeatedly mentioned as a threat to the validity of the indicators. In particular,
the adverse events underlying some of the PSIs (e.g., PSI 5: foreign body left
in during procedure) fortunately occur quite rarely, even in larger facilities.
Smaller facilities, such as rural hospitals, are commonly only able to report
on three QIs: mortality for acute myocardial infarction (AMI) and pneumonia
(IQIs 15, 20, and 32), because they do not have the minimum required number
of cases (20) for other indicators. While interviewees agreed on the face validity of
the indicators, a third of the interviewees (16 of 54) argued that such sample
size limitations would render some indicator rates unstable and thus hard to
interpret.
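The instability that interviewees describe can be seen with a quick calculation: at low case counts, a single additional adverse event changes the observed rate dramatically, and the confidence interval around the rate spans much of the plausible range. A minimal illustration using the Wilson score interval and hypothetical event counts (not actual QI results):

```python
import math

def wilson_interval(events, n, z=1.96):
    """95% Wilson score interval for an observed event rate events/n."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical hospitals: one extra event at a low-volume facility
# doubles its observed rate, and its interval is far wider.
for events, n in [(2, 25), (3, 25), (20, 250), (30, 250)]:
    lo, hi = wilson_interval(events, n)
    print(f"{events}/{n}: rate={events/n:.1%}, 95% CI=({lo:.1%}, {hi:.1%})")
```

With 25 eligible cases, the interval around an 8% observed rate stretches from roughly 2% to 25%, which is why interviewees regarded such rates as hard to interpret.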
On construct validity, most users stated that the indicators were correctly
operationalized within the constraints of the underlying data source. Isolated
findings of specification errors were brought to our attention, but interviewees
emphasized that the AHRQ team was always able to address those quickly. The
limitations of administrative data were frequently mentioned as a threat to
validity, because the UB-92 format does not provide a sufficient level of
clinical detail to account for all factors that should be considered in constructing
the measures. Several potential improvements were mentioned, such as the addition
of flags for conditions that were present on admission or for do-not-resuscitate
orders. The AHRQ QI team is incorporating support for a present-on-admission
flag into the next iteration of the QI specifications.
Some users thought that formal validation studies should be conducted to
assess the validity of the indicator results in relation to indicators based
on medical records data. As discussed above, we learned that hospitals are
conducting analyses to find out whether poor performance on a given QI is due
to an organization's coding practices or points to a real quality problem. But
those efforts are typically driven by unusually poor performance, are not
systematically analyzed, and focus on identifying false positive events (i.e.,
an adverse event was flagged by the indicator that could not be ascertained
through chart review). False negative events (i.e., the indicator algorithm
failed to identify an actual adverse event) were rarely researched.
4.2.3. Risk Adjustment
Since the AHRQ IQIs and PSIs generally represent health outcomes, they are
sensitive to the morbidity burden of the patient population and must be risk-adjusted
to provide a valid comparison of quality. The IQIs and PSIs currently
use different risk adjustment methods, although AHRQ will move to a single
method for all of the QIs in the future. Currently, the IQIs use the
All Patient Refined Diagnosis-Related Groups (APR-DRGs), a proprietary system
owned by 3M Health Information Systems. The PSIs use a public-domain
risk-adjustment system developed by AHRQ. The current risk adjustment
methods for both the PSIs and the IQIs were regarded as adequate.
Users particularly emphasized that the AHRQ method for the PSIs had the
advantage of being transparent and easy to understand. Even though the
APR-DRGs are based on proprietary software, interviewees were generally
comfortable with using them for IQI risk adjustment, because they already
used the software for other purposes, such as payment, and were familiar
with its structure and logic. However, 22% (12 of 54) of interviewees
thought that the risk adjustment approach used for the AHRQ QIs should be
improved. In particular, interviewees would like to see both PSIs and IQIs
using the same risk adjustment method and would like AHRQ's method to be aligned
with that of CMS, University Healthsystem Consortium, and other developers.
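The general logic behind risk adjustment for outcome indicators of this kind is indirect standardization: each discharge contributes an expected probability of the outcome given its risk category, and the hospital's observed events are compared against that expectation. A minimal sketch of the idea, with hypothetical risk strata and probabilities (not the actual APR-DRG or AHRQ models):

```python
def risk_adjusted_rate(discharges, expected_prob, reference_rate):
    """Indirectly standardized rate: (observed / expected) * reference rate.

    discharges: list of (risk_stratum, had_event) tuples
    expected_prob: dict mapping risk_stratum -> expected event probability
    reference_rate: overall event rate in the reference population
    """
    observed = sum(1 for _, event in discharges if event)
    expected = sum(expected_prob[stratum] for stratum, _ in discharges)
    return (observed / expected) * reference_rate

# Hypothetical hospital with a sicker-than-average case mix.
expected_prob = {"low": 0.02, "high": 0.10}   # assumed stratum risks
discharges = [("high", True), ("high", False), ("high", False),
              ("low", False), ("low", False)]
# Raw rate is 1/5 = 20%, but 0.34 events were expected given case mix,
# so the risk-adjusted rate comes out below the raw rate.
print(risk_adjusted_rate(discharges, expected_prob, reference_rate=0.05))
```

The appeal interviewees saw in the AHRQ method for the PSIs is that each step of a calculation like this one can be inspected and explained, unlike a black-box proprietary grouper.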
4.3. Usability
As discussed in detail above, the AHRQ QIs have been used by many types of
organizations and for a variety of purposes. Most interviewees stated
that the AHRQ QI products provided a workable solution for their needs and
were very appreciative of the support that the AHRQ QI team provides for implementation
and ongoing use. Despite these overall favorable impressions of the usability
of the QIs, two issues were raised repeatedly: the need for development of
reporting templates, and the need for clearer guidance on the use of the AHRQ
QIs for public reporting and pay-for-performance programs.
4.3.1. Reporting Template
A number of interviewees (9 of 54) highlighted as a top priority the need
for a standard format for reporting AHRQ QI results. At the simplest
level, some interviewees wanted AHRQ-supported, standard, basic names for the
AHRQ QIs in plain language, as some of the current indicator names are difficult
for non-clinical audiences to understand. Other interviewees expressed
a desire for more guidance and support on other aspects of presentation. Currently,
many organizations have developed their own reporting formats. Interviewees
were interested in information such as:
- How should indicators be analyzed and reported?
- How should outliers be identified?
- Which indicators are consumers expected to respond to most?
- How should consumers interpret the results of the indicators?
- How do results compare to national, state, or other benchmarks?
4.3.2. Composite indicators
Twelve interviewees expressed a desire for an AHRQ-supported methodology for
constructing a composite indicator. Forming composites would allow organizations
to summarize the results based on multiple indicators into one statistic, which
is easier to grasp and communicate, in particular for non-expert audiences.
Composites would also help overcome sample size limitations by allowing information
to be pooled. Four organizations whose representatives participated in our
interviews have developed their own AHRQ QI composite indicators, but most would
prefer an AHRQ-developed approach. The AHRQ QI team is currently working on
the development of composite indicators to meet those needs.
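One simple way to form a composite, and to see how pooling mitigates the sample-size problem discussed above, is a denominator-weighted combination of the individual indicator results. The sketch below is purely illustrative, with hypothetical counts; it is not the methodology AHRQ is developing:

```python
def weighted_composite(indicators):
    """Denominator-weighted composite of (events, eligible cases) pairs.

    Pooling across indicators yields one summary rate based on many more
    cases than any single indicator, which stabilizes the estimate.
    """
    total_events = sum(events for events, _ in indicators.values())
    total_cases = sum(cases for _, cases in indicators.values())
    return total_events / total_cases

# Hypothetical PSI-style results for one hospital: (events, eligible cases)
indicators = {
    "PSI 7 (infection)": (3, 400),
    "PSI 13 (sepsis)": (2, 150),
    "PSI 5 (foreign body)": (0, 900),
}
print(f"composite rate: {weighted_composite(indicators):.3%}")
```

A single pooled rate like this is easier to communicate to non-expert audiences than a panel of individual indicator results, which matches the motivation interviewees gave.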
4.3.3. Guidance on the use of the AHRQ QIs for public reporting and pay-for-performance
Not surprisingly, our questions on suitability of the AHRQ QIs for public
reporting and pay-for-performance programs led to vivid and often emotionally
charged discussions and comments. Interviewees who are currently using the
AHRQ QIs for public reporting and pay-for-performance generally felt that they
provided a workable solution for their needs. The introduction of those programs
typically followed a similar sequence: after the initial decision to start
a public reporting or pay-for-performance program, a controversial debate would
begin on the merits of such initiatives in general, and the suitability of
administrative data for quality measurement in this context in particular.
Then, hospitals and physicians would slowly start to participate rather than
resist. Many interviewees told us that AHRQ's reputation for high quality
research, the excellent documentation of the scientific basis of the indicators,
the transparency of the method, and the ease of implementation and use were
crucial factors in obtaining buy-in. The first release of the data was commonly
accompanied by media attention and anxiety on the part of providers. Both would
subside in subsequent releases, as all stakeholders became more familiar and
comfortable with the program.
Still, half of the interviewees who use AHRQ QIs for public reporting stated
that additional standards and guidance on the reporting of AHRQ QI results
were needed. Some interviewees (10 of 54) expressed dissatisfaction with the
current AHRQ stance on the appropriateness of the AHRQ QIs for public reporting.
These interviewees described the current guidance as "difficult to find,"
"weak," and presenting "mixed messages." The lack of clarity
is perceived to be largely due to shifts in AHRQ's stance on appropriate uses of
the QIs.
Previously published guidance contained much stronger caveats against
inappropriate uses than the current guidance. Interviewees felt that
clearer guidance from AHRQ would help to counter opposition from those who
argue that the AHRQ QIs should only be used for quality monitoring and improvement
and research, but not as a public reporting or pay-for-performance tool.
Taking the opposing view were several interviewees (mostly hospitals) who
would like to see AHRQ make a clear statement that the AHRQ QIs are not appropriate
for use in public reporting, pay-for-performance, or other reporting activities. A
representative of one hospital told us:
The AHRQ QIs are fabulous tools, but they are assessment tools, not judgment
tools. AHRQ's white paper was very clear in saying that this was
not AHRQ's intent. The issue is that AHRQ allowed folks to go too
far without a caveat. They tried with that white paper, but now they're
endorsing states using it for public reporting—it's not appropriate.
4.4. Feasibility
We were told consistently that a major advantage of the AHRQ QIs was the feasibility
of their implementation. They require only administrative data in the UB-92
format to which many users have routine access, since those data are already
being used for billing and other administrative purposes and have to be collected
and reported by hospitals in most states.i
Interviewees emphasized that another substantial advantage of the AHRQ QIs is
that the indicators have clearly defined and publicly available specifications,
which helps with implementation of measurement. These specifications were regarded
as of particular importance for hospitals, as the originators of the data, because
the specifications enable hospitals to work with their coding departments to
ensure that the required data elements are abstracted from medical records consistently
and with high quality. In addition, users who analyze data with the QIs, such
as researchers, appreciated the fact that they could dissect the indicator results
and relate them back to individual records. That capability helped researchers
gain a better understanding of the indicator logic and distinguish data quality
issues from actual quality problems.
i. Similarly, many
hospitals currently use the APR-DRG grouper, which is the basis for risk
adjustment of the IQIs, for billing and rate setting so that they are familiar
with its logic.