U.S. Department of Health and Human Services www.hhs.gov
Agency for Healthcare Research and Quality www.ahrq.gov

AQA Second Invitational Meeting Summary

Session on Performance Measurement

Determining Clinically Relevant, Standardized Performance Measurement for Ambulatory Care

John Tooker, American College of Physicians
Carolyn Clancy, Agency for Healthcare Research and Quality (AHRQ)

John Tooker said that the workgroup hoped that at this meeting the broader group would:

  • Reach consensus on core principles for selecting a starter set of performance measures for ambulatory care.
  • Reach consensus on the conditions covered in a starter set of measures for ambulatory care.
  • Review the proposed starter set of measures and identify other measures for consideration.

He then outlined the parameters that the workgroup has proposed for selecting a starter set of performance measures for ambulatory care:

  • Measures should be reliable, valid, and based on sound scientific evidence.
  • Measures should be aligned with priority areas of the Institute of Medicine (IOM).
  • Measures should be limited to ambulatory care and, as much as possible, complement the hospital reporting measurement project.
  • Measures should focus on areas that have the greatest impact in making care safe, effective, patient-centered, timely, efficient, or equitable (IOM's six aims for improvement), and in which the most improvement can be made ("80/20 rule").
  • Implementation of measures should impose the least possible burden (e.g., electronic data systems should be considered whenever possible).
  • The starter measurement set should be of a manageable size (for physician practices, health insurance plans, and consumers).
  • Measures should be selected from areas where there is strong consensus among stakeholders and should be predictive of overall quality performance.
  • Measures should reflect a spectrum rather than a single dimension of care (i.e., prevention and health promotion, chronic illness, and acute care).

Tooker asked participants to review the parameters and offer comments and suggestions for other parameters that might be added.

Tooker also discussed the proposed starter set of performance measures for ambulatory care, which the workgroup had selected based on application of the general parameters outlined above. The proposed starter set includes prevention measures (in the areas of cancer screening and immunization) and chronic care measures (in the areas of coronary artery disease, diabetes, asthma, depression, and congestive heart failure).

The next step, said Tooker, is for the workgroup to develop the "supra criteria" by which to select a starter set of specific measures. He said examples of supra criteria include clinical/scientific issues (clinical relevance, impact, and demonstrated improvability and good outcomes), implementation issues (ease of implementation), and contextual issues (the degree to which the measure reflects physician versus other system performance, and who is, could be, or should be responsible for patients treated by multiple physicians). Once this is accomplished and a starter set of measures is defined, he said, then we need to propose a sustainable process for stewarding the starter set into a larger set of measures.

Discussion

The discussion opened with a specific question about the parameter stating that measures should reflect the spectrum of care. The participant noted that a separate group has developed system-level measures (i.e., overarching measures, such as time to appointment and drug dosages, that do not by themselves lead to conclusions about quality). The process question, she said, is whether going beyond specific measures to system-level measures is on the table. Another participant echoed these remarks, noting that consumers won't find the process meaningful if it does not encompass a more comprehensive viewpoint. Yet another participant said it was important to state explicitly that the goal is to select measures that are highly relevant to consumers. The most frequent criticism of health care quality assurance, she said, is that we generally do not report data that are relevant to the average consumer.

In response, John Tooker said the subject of system-level measures was on the table, but for a subsequent discussion. Carolyn Clancy added that it was a question of timing. She stressed that the discussion at hand concerned only a starter set, and that it needed to focus on measures that can work in both small and large practice settings.

Another participant expressed concern about the distinction between conditions and measures, given that the workgroup had not yet developed the measures. What we are reporting out today are the conditions, replied Tooker; the specific measures will be selected later.

Is doing well on one measure a proxy for being a good physician overall? asked one participant. She added that, in her experience as a consumer, people generally assume that a physician noted for doing one thing well is a good provider overall.

Another participant said it was important to distinguish the data reported from the measure itself, so that physicians can report the data they already have simply and cheaply.

One participant suggested a new parameter: that measures should reflect important health conditions and care processes for each group being measured. She observed that the starter set is generally not relevant to children and adolescents or to evaluating pay for performance under Medicare. In response, John Tooker stressed that the workgroup had a "deep, strong understanding" that it was just scratching the surface. We realized that we needed to address consumer and purchaser needs. We recognized that we needed to build measures, he said, so we illustrated measures based on those of the Physicians Consortium Group and the National Quality Forum. We recognize that this is not necessarily where we will end up. Our focus now, Tooker continued, is this: Can we find five conditions and measures that we can push out the door and then build into a much more comprehensive, deep, and balanced set?

There was discussion about efficiency measures. One participant urged that they be addressed now. Another suggested that the proposed measures are broad enough that many would have an impact on cost. Would it be worth adding language saying that these "should include measures that would have an impact on cost and efficiency"? he asked. He also expressed concern that the measures do not address processes or systems within physician offices. Another noted that it was important to signal that the meeting attendees recognize the need for quick and urgent attention to efficiency. Yet another noted that the measures focus on what happens once a person seeks care but do not address access to care. While this may not be pertinent to individual physicians, he said, it is very important across society. Another asked, If efficiency is part of Phase II of the discussion, what's the timetable for getting there?

One participant noted that the opportunity exists to look at where we have been traditionally and where we want to go. I think we need to look forward to push some of the social issues on standards, he said. While acknowledging the value in starting where strong consensus already exists, he stressed the need to do so with eyes wide open and to recognize the shortcomings of the process. We need to improve efficiency and quality of care, he said.

Concern was expressed that the process could be heading down an unproductive path by looking at a single set of measures rather than a crosscutting set, because different constituencies have different needs. Providers are interested in a measure set that is detailed enough to act on, said one participant, while consumers are interested in the specifics of the conditions they have and want to know about a particular practitioner. Payers, meanwhile, want to look at effectiveness and efficiency (which gets at pay for performance). What are the data elements that we can put together that are useful for one group, he asked, and that can then be rolled up in different ways to meet the needs of other groups? In other words, he said, we need to think about how to gather the same data and then analyze them in different ways to address the needs of various stakeholders.

Following on the previous comment, a participant noted the need to develop starter sets where data exist, including system measures that consumers can understand. This could also help address the small-practice issue, she said, because there are enough patients across a disease to measure some of these issues.

Stepping back from the specific discussion, one participant said that those assembled had a unique opportunity to align private and public purchasing efforts. Step one, she said, was to be guided by the National Quality Forum and the National Committee for Quality Assurance (NCQA) in moving forward and starting the process of quality measurement. This should give providers some clarity on how they will be evaluated. Step two involves determining which issues should be taken into account. Can we ask AHRQ and NCQA how this might be rolled out? she asked.

Carolyn Clancy said she thought it was important, as a short-term priority, to arrive at concrete, specific measures that can be used for a variety of purposes. But this shouldn't get in the way of discussing how we measure quality now, she stressed, and how we should be measuring it moving forward.

How measures are applied is very important, said John Tooker, who echoed Clancy's call to quickly develop and put in place a starter set of measures. At the same time, he said, it is important to place the starter set in a larger context and to develop a process that is sustainable over time. We are asking today for endorsement of the conditions and endorsement of the principle to develop a starter set of measures.

Motion: To adopt the general parameters:

That the core principles be adopted, with an amendment to insert "condition-specific" into the title before "performance measures."

Result: The motion was unanimously adopted.


We all agree that these are good principles, said one participant, but we cannot do everything right away. He added that it was important to address the question of timetables for moving forward.

Motion: To address efficiency:

That the principles documented be augmented to reflect the will of the group that efficiency measures should be added to the list of priority measures, and that measures of patient satisfaction should be added to bullet #8. The chair will determine the timeframe for moving to Phase II.

Result: The motion was tabled.


The motion was subsequently tabled, but not before considerable discussion. One participant said the language of the motion should be amended to encompass the patient experience (not just patient satisfaction). Another participant said that the aim was really patient effectiveness, patient-centeredness, and efficiency, leading the first participant to stress the value of measuring patients' perspectives on what is important in their care (as this affects patient decisionmaking). A third suggested that the workgroup address the question of how to arrive at a set of measures that respond to very urgent social issues.

One participant also expressed concern that the motion not be overly broad so that Phase II could be implemented at some level within 6 months. The speaker, who authored the motion, added that it was an attempt to make clear that efficiency was an urgent priority, and to address the larger bucket of a patient's experience with care. Implicit in this, he said, is that we move forward expeditiously.

We need to keep in mind the multiple audiences that will be using these measures, said one participant, who reiterated the importance of going beyond the patient experience to address crosscutting measures.

Once the motion was tabled, the discussion returned to current (Phase I) activities. One participant said the value of Phase I is to send a signal that measurement is coming, here's what we're trying to measure, and you had better be ready. But we haven't yet tried to grapple with other questions that need to be addressed at the individual physician level, she said. There needs to be a process that says: Here are the things you need to pay attention to so that measures that apply to you represent how you're doing as fairly as possible. Otherwise we will have severe pushback from those measured, she warned.

The key issue for me, said one participant, is how care is organized. The measurement will flow from that, she said, adding that she wasn't sure there was enough political will in the room to sustain the discussion. This led another participant to remark that the issue depends on the data you're gathering and how you slice and dice them. I think it's a lot easier to look at performance measurement for internal quality improvement, she said. The second speaker added that the overarching principle she wanted to put on the table was that the group must speak explicitly to the appropriate uses and limitations of quality measures. Using these measures for pay for performance is where the rubber hits the road, so we need to be prepared to speak to this issue.

Next, the discussion returned to the issue of efficiency. I understand the need for efficiency measures, said one participant, but this is the area where we have the least evidence (especially at the physician level). I think our real goal should be to figure out how to operationalize measure-driven improvement, which will yield efficiency. We need to start with a direction, he said, and then get to a result. Another participant noted that from a consumer point of view, there was a difference between effective and beneficial. A third participant said she thought one aim of the workgroup was to come up with a long-term strategy for ambulatory performance measurement; thus, the discussion regarding efficiency and patient satisfaction was helpful.

Another participant noted that everything is focused on underutilization of services. Given that we are trying to limit the universe to the NQF and American Medical Association sets, he said, I don't think there is anything here dealing with overutilization. Do you think any of this discussion will wade into that element of efficiency? he asked. I thought that misuse and overuse were the next step after underuse, said another participant, who noted that there were several measures in the NQF pipeline dealing with overuse. He added that developing resource utilization measures at the individual physician level was not going to be easy (but might be more achievable at the group level in the short term).

Turning to the timetable, one participant suggested that the group develop a list of deliverables and ask the workgroup on performance measures to provide incremental updates every 6 months. He noted that little disagreement had been voiced about the parameters and the starter set, and that the discussion had focused more on the timetable and subsequent priorities.

Carolyn Clancy stressed that meeting participants were not endorsing measures, but merely fostering public/private sector alignment on measures in order to provide momentum for these issues to be addressed in the short term. Let's let the researchers figure out how to operationalize some of this, she said.

The discussion returned to some of the specific parameters for performance measures. One person proposed deleting the phrase "and in which the most improvement can be made" from the fourth parameter (but keeping the 80/20 language). As a consumer, he said, I'm not particularly concerned if everyone is not doing particularly well. I just want to know whether my physician is doing well or not.

In response, a participant said she thought the discussion really centered around two separate issues. One is the gap between where the field is and what's possible; the other, the gap between parts of the field. As a result, she said, I think the phrase "in which the most improvement..." is important.

Another person said she believed that the group should look at measures of efficiency, patient experience of care, and systemness. In addition, there's a fourth dimension that isn't being captured: that consumers really care about outcomes. Someone else suggested a parameter that says "look under the hood" and "what are the chances to do better?"

Still regarding the fourth parameter, one participant asked, What are the things that will lead to the greatest improvement? He said he thought it was important to discuss what to look at given the huge range of options. I think we need to focus on outcomes, he said, or at least make sure that there is a strong movement toward outcomes.

The discussion then shifted to the proposed starter set of conditions.

Motion: To adopt the proposed starter set of conditions for performance measures.

Result: No final vote.


There was considerable discussion about the proposed starter set, but no final vote on adoption of the conditions. One participant expressed hope that the workgroup would include advice about diet, exercise, and weight management in the starter sets on diabetes and coronary heart disease. Another expressed concern that prenatal measures were not included in the starter set. Comments were also made about including obesity (which is applicable to children as well as adults), and one person suggested that while the eighth parameter addresses acute care, none of the proposed conditions in the starter set does. I think we should have some measures focused on acute care, he said.

Carolyn Clancy pointed out that the workgroup was seeking consensus today on the conditions and would move ahead in the coming weeks to operationalize them. A member of the workgroup concurred, noting that the group's intent moving forward was to develop the supra criteria for the performance measures themselves, taking into account available data. The intent is not to rely on a single type of record (e.g., only claims or only clinical data) but to expand the use of medical records data.

Another participant expressed concern that the process could produce a set of standards that are based on administrative data and don't have the desired impact on efficiency and effectiveness. Carolyn Clancy reiterated that the group was not looking to develop only standards that could be assessed with administrative data. There is a consensus, she said, that some of these performance measures would likely require data from patient charts.

John Tooker reiterated that the workgroup has just begun its work. This discussion shows that we didn't have enough time before this meeting to go through the entire list of conditions identified by the IOM and NQF. For those conditions that we did include, he said, we asked, Where are there established measurement sets, and are those measures being submitted to NQF for expedited review for the initial set? Then we asked, If this is an initial list, should all the measures be reasonably reflected in administrative (claims) data? We have yet to have this discussion in any depth.

Carolyn Clancy also stressed that the workgroup was proposing a starter set as a jumping-off point for discussion and a place to start to build a process moving forward. Another workgroup member echoed her comments, saying that the group's intent had been to develop a reasonably robust starter set, obtainable mostly by administrative data. More long-term, he said, we're looking at data that can be collected in the future (and not just at what we can collect now using administrative claims data). A third workgroup member noted that it was the group's intent, following input and approval of parameters, to develop the supra criteria. The group will then take these to propose a starter set, and then propose a sustainable process to transition from administrative to electronic health records-based measures, as needed.

One participant observed that, given the available evidence and supporting documentation, a good number of the standards would have a hard time making it through the NQF review process.

There was considerable discussion in the workgroup about an 8-week post-meeting horizon. One participant expressed the need to tie down a date for the next meeting of the full group. What are our expectations with respect to Phase II? she asked. Is this a 2-month process? Four months? Six months?

We'd like to work collaboratively with the larger group, said one participant, but that process needs to move forward. Our CEOs and constituencies are putting a lot of pressure on us and we're not going to slow down.

What is the deliverable we need, and when do we need it? This question applies to all the workgroups, stressed one participant, and we need to answer it in order to have an impact. Otherwise, he said, we run the risk of taking too long, or perfecting it too much, so that the train has already left the station. A second person echoed the call for urgency, saying that otherwise change would come in the form of higher deductibles and other cost-cutting measures. Others also stressed the need for speed.

The conversation returned to the issue of efficiency. One participant noted the need to pay attention to efficiency and to give people around the table a belief that the process will give them what they are looking for. The starter set is wonderful, said another, but since it does not address efficiency, it will not get us to the end point. Mark McClellan also mentioned efficiency measures, pointed out a third speaker. It's part of the overall approach, not just cost containment. We must remember that this has to be somewhat science-based, she added. We really need to be sure that when people say "efficiency," it is not code for immediate cost containment (and double-digit increases in premiums).

We cannot afford to stop and wait for measures, said another person. Health care now accounts for 15 percent of gross national product, and we must do something about health care costs and liabilities. Individual companies are competing with foreign companies that don't have these problems. We have to improve efficiency immediately and we have to do it right. Another suggested that a strong signal was being sent by the group that, as a whole, it believes that efficiency performance measures are very important.

The real tension, suggested another person, is between the very real and heartfelt need to move much faster than we've moved in the past on performance measurement and feedback and the need to not abandon the principles of science. This, he said, is the core dilemma. Are there ways to bridge the gap between the need for speed (which is very real) and making sure all the work doesn't fall apart because we're going too fast?

Carolyn Clancy noted that there were no efficiency measures endorsed by, or pending endorsement at, the National Quality Forum. She said the workgroup did not see its job as creating new measures but rather working with existing ones. This led another participant to note that the Leapfrog Group had released a white paper on guidelines for efficiency that might be helpful in this process. Another person asked that the workgroup be directed to readdress efficiency now, not in Phase II, especially in the diagnostic arena.

John Tooker indicated that the workgroup hoped to come back by the end of April with a proposed starter set of measures. We intend to do this, he said, but our success depends on how well we can get moving, given that you've added new direction (notably on efficiency standards) to our work.



