[This Transcript is Unedited]

NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS

WORK GROUP ON QUALITY

November 3, 1999

Wyndham City Centre Hotel
1143 New Hampshire Avenue, NW
Washington, D.C.

Proceedings By:
CASET Associates, Ltd.
10201 Lee Highway, Suite 160
Fairfax, Virginia 22030
(703)352-0091



P R O C E E D I N G S (6:10 p.m.)

DR. COLTIN: This is the National Committee on Vital and Health Statistics. One of the objectives of the work group is to produce recommendations for the Department of Health and Human Services about data and information systems requirements for the collection and use of quality of care information, including mechanisms for improving the completeness, accuracy, timeliness, confidentiality and efficiency of data collection and reporting, in order to support quality measurement and accountability.

So we have a broader charge than that, but the main thing that we are trying to solicit testimony on tonight is around that issue: what are some of the limitations in the kinds of data systems that we currently have in place for producing quality measures that are comparable across populations, across delivery systems and so forth, and what have people encountered in their experience in trying to do this, what kinds of limitations. So that is why you have been invited here.

The first thing I'm going to do is have us go around and introduce ourselves, because they are making a transcript, and it will also help with getting the voices, which voice belongs to which person. Then after that we'll start.

Jeff, why don't you start?

DR. BLAIR: I'm Jeff Blair with the Medical Records Institute. The other activities I am involved in within the NCVHS are the CPR work group and the NHII.

DR. COLTIN: For those of you who may not know it, that is the National Health Information Infrastructure.

DR. BLAIR: I'm sorry. Thank you.

DR. COLTIN: I'm Kathy Coltin, and I chair this work group. All of you know me, but I guess for purposes of the transcript, let's just say I'm with Harvard Pilgrim Health Care, and I have done work over a number of years on the development and implementation of quality measures.

DR. KRAMER: I'm Andy Kramer, and I am a professor of medicine at the University of Colorado Health Sciences Center, and I am a relatively new member of the committee.

DR. WARD: Elizabeth Ward from Washington State. I am currently with the Health Information Institute.

DR. COHN: I am Simon Cohn. I'm the national director for data warehousing for Kaiser Permanente and member of the committee, and I am chair of the Subcommittee on Standards and Security.

DR. GREENBERG: I'm Marjorie Greenberg from the National Center for Health Statistics, CDC, and I am the Executive Secretary for the committee.

DR. ETTINGER: I'm Stan Ettinger, Agency for Health Care Policy and Research. I am staff to the committee.

DR. GOH: Leon Goh, Health Care Financing Administration, and I am on the staff.

DR. KLAUSER: Steve Klauser, Health Care Financing Administration, and I am here to give some perspective.

DR. COCITAS: I'm Carolyn Cocitas, Performance Measurement Coordinating Council, and here to share, or really learn from you all probably, at this point.

DR. SMITH: I'm Mark Smith from the California Health Care Foundation.

DR. DIXON: I'm Richard Dixon from the Lewin Group in San Francisco, also a member of the CPM, and here to share with you some information about evaluation that we have conducted in California.

DR. SEIDMAN: I'm Joshua Seidman. I am the director of measurement development at NCQA, the National Committee for Quality Assurance, and I am responsible for the development of measures for HEDIS.

DR. MC GLYNN: I'm Beth McGlynn. I am the director of the Center for Research on Quality Health Care at Rand, and I am a member of the CPM, and I also am now the chair of the technical advisory group to the CPM.

DR. COLTIN: Thank you. We are going to get started. The first group of speakers, at this point, Beth and Josh, are going to talk to us about some of the broad issues in the limitations of data for developing quality measures and implementing them, that they have encountered in their efforts to try to do this.

So Beth, why don't you start, and we'll let you introduce Josh after your remarks?

Agenda Item: Issues of Limitations in Existing Data Sources for Developing Quality Measures

DR. MC GLYNN: I am pleased to be here to talk to you. I guess I want to start by saying, I think in a couple of months -- we are engaged in a new research effort to develop a much more comprehensive system for assessing the quality of care delivery in what I now call an accountable entity of your choice. When we started, it was managed care organizations, and now it has transitioned to include not only managed care organizations, but medical groups and communities, actually.

What we have done is to develop a system of roughly a thousand indicators that cover 50 different clinical areas, and include care for children, young adults and older adults. Then we have a sister project that is developing a similar system in areas for the vulnerable elderly. So it is from that perspective. Then we, through AHCPR, have been funded to try to work with NCQA to develop new measures for HEDIS. So it is those two developmental efforts that are the principal perspective that I am speaking from. We will shortly have numbers to put on some of what I'm going to say, so it feels in some ways a bit general, but so be it.

I think there is a substantial tension between what it is we would like to know about the health care system and what we are able to know, given the limitations of costs and logistics. I think that almost every talk I give about this has a bullet about information systems being the limiting factor.

I think that it is particularly the case when we are looking to develop measures related to management of chronic disease. That is where one feels the lack of detailed clinical information the most acutely.

What that has largely meant is going into the medical record, which for a variety of reasons is getting increasingly difficult, or conducting surveys, which also has some challenges. So finding ways to improve what information is more readily available is critically important.

Further, in some of the work we have done, it is clear that contracting mechanisms, the way in which we do the business of health care, substantially complicate these efforts. That is not only capitation; it has to do with provider delegation. There are a lot of issues around just the way in which contracts are written for service delivery that then make access to data that much more difficult.

I think that I would really like us to stop looking under the lamppost. I think we have done a lot of quality measurement, where we measure what we can measure. What I would like, my vision for the future, is that we measure what is important, and that we not get held up by what is possible to measure.

So I think that is how I view the general sense of the problem.

Now, in terms of data, I started trying to think of three categories of data from the perspective of how do we move the information for quality forward. The three categories are: data that are already automated and readily accessible; data that are automated but not easily accessible -- somebody has them on a computer somewhere; and then data that aren't automated.

Examples of data that are already automated and accessible tend to be billable events, things for which a claim exists or an encounter form, but really, I think we're talking about claims. So visits, procedures, drugs, whether a lab test was done sometimes -- so measures like the screening rates for mammography and Pap smears, and to some extent looking at the rate of beta blocker use after heart attack, those are the things we have been able to do because we have these really accessible data.

The cautionary note is that even with readily accessible data, we continue to have concerns about the validity of a lot of those data and about their variability. So does this code for asthma mean the same thing at Harvard Pilgrim that it means at Kaiser? Often we don't even know the answer to that, and our temptation, like our temptation to generalize from the clinical sciences, is that if one place found that their codes worked pretty well, we conclude the rest of the world is using them well.

So I think that what I don't want us to lose sight of in the effort to automate is an inclination I see among a lot of people, which is, if it is on a computer, it's right.

The second category of data are data which are automated but not easily accessed. I think that in this category are things like laboratory and radiology results. So these are -- somebody has them on a computer somewhere, and often how they are sitting out there in the world is the printout from the computer, and readily accessing the automated information is difficult.

We have done some work under the QSPAN effort to try to develop measures of the timeliness of followup after an abnormal Pap smear and an abnormal mammogram. The single biggest challenge was this: when we started those measures, we had six health plans for each measure that had agreed to participate in testing it.

We ended up with two health plans per measure -- that would sound like four, but one did both measures. Only three plans were actually able to produce data. They were able to produce data under very limited circumstances, meaning that we didn't require them to go to every single vendor they had for those services. We picked the largest vendor and worked with them.

So the ability to do that measure readily relied on our ability to identify which women had had an abnormal mammogram or Pap smear. A side issue with that has to do with, now that we have done the work, there is some disagreement amongst the plans themselves about whether the field we chose to pull to characterize a mammogram as abnormal was the right field to pick, even though that is what other people had done in the past.

What we found was that this was a good example of where contracting issues may make access to the data very difficult. Health plans in many cases didn't have access to the data because the contract was between a provider and a lab, and the health plan wasn't a party to the contract, and so they could not capture the data.

In some instances, the way that the capitated contracts were written made it impossible to track an individual's experience through the system. So they knew they had done 20,000 Pap smears, they knew that 10 percent of those were abnormal, but they couldn't necessarily tell you that Jane Smith's Pap smear on October 15 was abnormal, so that we could then go look for her again to determine if she had had appropriate followup.
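To make concrete what that followup measure asks of a data system, here is a minimal sketch in Python, assuming hypothetical field names, made-up records and an arbitrary 90-day window; it illustrates the person-level linkage involved, not the actual QSPAN specification. The aggregate counts the plans did have -- 20,000 Pap smears, 10 percent abnormal -- cannot feed this kind of calculation.

# Illustrative only: flag abnormal results and check for a followup event
# within an assumed window. Field names and the window are assumptions.
from datetime import date, timedelta

results = [
    {"member_id": "A1", "date": date(1999, 10, 15), "abnormal": True},
    {"member_id": "B2", "date": date(1999, 9, 1), "abnormal": False},
]
followups = [
    {"member_id": "A1", "date": date(1999, 11, 20)},
]

WINDOW = timedelta(days=90)  # assumed followup window

def had_timely_followup(member_id, index_date):
    # True if any followup event for this member falls inside the window
    return any(
        f["member_id"] == member_id and index_date < f["date"] <= index_date + WINDOW
        for f in followups
    )

numerator = denominator = 0
for r in results:
    if r["abnormal"]:
        denominator += 1
        if had_timely_followup(r["member_id"], r["date"]):
            numerator += 1

print(f"Timely followup: {numerator} of {denominator} abnormal results")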

So in addition, there are issues with these data. I think getting access to these data is important, but there are going to be continuing concerns about validity and variability. But these seem like a class of data that are quick wins to me.

Then there is the class of data that are not automated. Good examples of this are blood pressures, history items, counselling and education, the kinds of things for which there aren't specific procedure codes, but they are really key events when you are looking at the diagnosis and treatment of chronic disease.

So I think that those require quite different strategies, because we don't really -- they are a classic case of an important part of the interaction between a doctor and her patient for which a unique bill is not generated, so they require a slightly different strategy to think about how we might capture them.

So those are categories of data. I think to lay across the top of that, the other challenge we have is discontinuities in the availability of data that are based on insurance coverage patterns.

We are doing some work right now to develop a new measure, test a new measure, on colorectal cancer screening. This is characterized by a couple of things -- wildly varying and long periodicities of the various things that count as screening. So a colonoscopy in the last 10 years, or an occult blood test this year.

So the ability to use automated data to do a look-back of 10 years relies on the individual having been enrolled in a health plan for that length of time. This is actually true any time we get past about a two-year look-back. The high rates of turnover, particularly in health care populations, mean that our ability in an automated form to access the data is really quite limited. I think that that kind of challenge is separate from what is automated, but it is equally limiting in terms of one's ability to characterize a longer term pattern of care for an individual.
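The look-back problem can be sketched in a few lines; the intervals, field names and member record below are assumptions for illustration, not an actual measure specification. The point is that the member only counts if automated enrollment data cover the entire window, which high turnover makes rare.

# Illustrative only: qualifying intervals and fields are assumed.
from datetime import date

LOOKBACK_YEARS = {
    "colonoscopy": 10,
    "fobt": 1,       # fecal occult blood test
}

def years_between(start, end):
    return (end - start).days / 365.25

def continuously_enrolled(spans, as_of, years):
    # Rough check: a single enrollment span covering the whole look-back
    return any(years_between(s["start"], as_of) >= years and s["end"] >= as_of
               for s in spans)

def screened(events, as_of):
    # True if any qualifying event falls within its own interval
    return any(years_between(e["date"], as_of) <= LOOKBACK_YEARS[e["type"]]
               for e in events if e["type"] in LOOKBACK_YEARS)

member = {
    "enrollment": [{"start": date(1989, 1, 1), "end": date(2000, 12, 31)}],
    "events": [{"type": "colonoscopy", "date": date(1994, 6, 1)}],
}
as_of = date(1999, 12, 31)

eligible = continuously_enrolled(member["enrollment"], as_of,
                                 max(LOOKBACK_YEARS.values()))
# Shorten the enrollment span and this member drops out of the denominator entirely.
print(eligible and screened(member["events"], as_of))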

What do we do while we are waiting for Godot? I was actually -- and Godot in this case is the computerized medical record, which will solve all of our problems. I was at a meeting recently at the IOM, where George Lundberg said, the '60s were going to be the decade of the computerized medical record, and then the '70s and the '80s and '90s. Maybe in Y2K we will have the computerized medical record.

I think that I myself am tired of making that the holy grail. So I think that there are strategies that we ought to seriously consider taking, which don't run us at odds with that. Certainly, if we are able to achieve that, isn't that fabulous, although I could have some other comments about what I think the utility of that is.

But what do we do in the meantime? First of all, I think we need to sit down and decide what it is we need to know and at what levels we need to know it. So it is basically to do a priority setting exercise. In a lot of ways, the quality assessment system that we have developed is our way of doing that. It says, here are a thousand things we think are the most important things to know about medical care. Here are the -- I won't even tell you, X thousand variables it takes in order to score that. We can categorize those in terms of which ones we can pull out of automated data today, which ones could be pulled off if you had access to things like laboratory and radiology results, and which things for now are going into the medical record that you might be able to get.

That doesn't have to be your list; I just think there needs to be a list. I think that one of the failures of a lot of automation in the past is the sense that if we just computerized it, it will therefore be useful. I actually think that is the wrong way to go.

I think we really need to do what I teach my graduate students to do in all their research projects, which is to start by saying, what are the questions you are trying to answer.

The second is to work on the validity -- improve the validity and reduce the variability in the existing data systems we have. I think there are a lot of efforts underway to achieve that, and I just want to be supportive, and not let that fall off the radar screen, in terms of the importance of making the data that we do have available today more useful for quality assessment. It is really trying to take data that we used in the past to pay bills, acknowledge that they now do double duty, and make sure that they are capable of doing both duties.

So that is one step. The next step is to really work on aggressively expanding our ability to get to automated data that aren't currently readily available.

Kathy and I have talked a lot over the last two or three years about, if we could just have laboratory and radiology results. There are a huge number of areas in chronic disease management where we could make huge strides forward if we just had those data.

When I talk to people about the opportunities to change these contracts, I think one of the challenges is not so much that the contracts can't be rewritten, but what price the laboratory vendors will charge users to provide access to those data. I think that some leadership around this would help substantially in moving us forward.

Then finally, I think we need to design mechanisms to prospectively collect some of these additional data elements that aren't right now readily available, nor is it easy to imagine quite how one might get access to them. I think that there are some creative ways we might think of. Kathy and I were talking last night about claim attachments, but one might imagine for instance that in every encounter, a blood pressure should be taken, and isn't there a way we can just enter a blood pressure from an encounter prospectively. I think to capture it in an automated fashion, we are talking about picking a few things, and not necessarily on every encounter. It might be driven by, this is a visit for diabetes, so now we want to know did you look at the feet, that is a checkmark. But there ought to be a few things that we could pick to automate.

I think if you have a list of questions that you want to answer -- and we have been doing some work on this, it turns out that there are sometimes some elements that repeatedly come up across a variety of indicators. So you often want to know something about the neurologic exam, for instance, whether it is a patient presenting with a headache or low back pain or whatever. So there are some elements like that that may cut across multiple areas, and maybe those are the high priority areas. But those are some of the things that I think should be on the list as well.

I think that we are at a very exciting point. I have been working in quality measurement for something like 14 years now. When we started, we had to spend all of our time convincing people why it was important to think about measuring quality. I think those battles don't need to be undertaken. Now we need to find a way to make quality measurement relevant. I think our ability to access the right kinds of data to produce answers to the questions that people want answered is going to be absolutely critical in maintaining the enthusiasm that is currently afloat in the country for this kind of information.

Thanks.

DR. COLTIN: Thank you. Josh?

DR. SEIDMAN: Thanks. I will just talk about some examples for the development of measures, and focus along the same lines as Beth did, on measures for chronic disease, because I think that is where the biggest issues come up.

HEDIS, when it was first developed earlier in this decade, was focused primarily on process measures in preventive care. There are a lot of data inadequacies there as well, but compared to the issues involved in getting data for chronic disease measures, it is a completely different story.

The first issue has to do with vital linkage capabilities, and the fact that we have data in some cases that are easily accessible, and in some cases that are not, as Beth says, but they are often in different places. If we have an administrative measure that relies on multiple administrative databases to collect it, it creates some problems in terms of how that information gets married.

So you might have a measure, for example, on asthma, and we need certain basic information -- demographic information from a membership database -- then we need claims or encounter information from claims data or encounter data. Then you have information on drug utilization that might come from a pharmacy database. All of those things may be coming from different places, and we hear over and over about the challenges in creating those kinds of measures, because the data are often stored in different ways.
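As an illustration of the kind of linkage being described, a minimal sketch follows; the identifiers, diagnosis codes, and records are invented, and real specifications are far more involved. The measure only works if the membership, encounter, and pharmacy sources can be joined on a common member identifier.

# Illustrative only: joining three separately stored data sources to build
# an asthma medication measure. All identifiers and codes are made up.
membership = {"M1": {"enrolled": True}, "M2": {"enrolled": True}}

encounters = [                      # from claims/encounter data
    {"member_id": "M1", "dx": "493.90"},   # an asthma diagnosis code
    {"member_id": "M2", "dx": "250.00"},   # something else
]

pharmacy = [                        # from a separate pharmacy database
    {"member_id": "M1", "drug_class": "inhaled_corticosteroid"},
]

# Denominator: enrolled members with an asthma encounter
denominator = {e["member_id"] for e in encounters
               if e["dx"].startswith("493")
               and membership.get(e["member_id"], {}).get("enrolled")}

# Numerator: those who also filled a long-term controller medication
controllers = {p["member_id"] for p in pharmacy
               if p["drug_class"] == "inhaled_corticosteroid"}

print(len(denominator & controllers), "of", len(denominator))
# If the three sources key members differently, the joins above fail silently.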

The other way that this comes up, the vital linkage issue, is that we have data that are stored at different levels of the health system. So we have data, for example, in our beta blocker measure: some of the data is coming from physicians' offices, some of the data is coming from the health plan, but some of the data is coming from the hospitals. Again, all those are coming from different sources, and marrying those data is a challenge.

There is a big issue in terms of chronic disease in how we define chronic disease, levels of severity, how intense the disease is. That of course makes a big difference in terms of what we want to see, how we would define a performance measure. For mild disease, we may expect a different kind of response from the health system than for more severe disease.

So for example, with asthma, we are creating a measure on appropriate medications for people with asthma. We had a number of data challenges. The first is defining who has persistent asthma, that is, asthma for which we would expect long term control medications to be prescribed. There is no administrative way to get at the underlying clinical severity of the disease, and that creates a big problem in terms of defining the appropriate population that we want to apply this measure to.

That issue also comes up again when we are looking at some of these data sources. Even once we define who the population is that we want to measure, we may still have different expectations for the intensity of the treatment and therefore the process measure that we are looking at. So for example, with asthma, we have patients who have mild persistent asthma, moderate and severe, and we would expect different numbers of medications to be prescribed. Then we have to get back to the issue of how to combine this kind of information and how to measure not only that the medication was prescribed, but how much of it was prescribed. Those things aren't standardized, either. One prescription from one setting may not be equal to one prescription from another setting.

There is a big issue in terms of capturing cognitive and diagnostic information in health plan settings and in other settings. There is a certain amount of information that we would like to be able to get from administrative data that we can't on these kinds of things. Is counselling taking place, what kind of information are patients getting to manage their own disease? Because we can't get that information from administrative data sources, we have to go to other sources, which typically means developing surveys, which again can create an additional burden in terms of those who are responsible for collecting the data, and coming up with other creative ways, or sometimes not so creative ways, of trying to find that information.

There are issues -- in terms of chronic disease -- about trying to identify incident versus prevalent disease; that is, for newly developed or incident disease we may have a different performance measure that we want to look at than for a disease that a patient has had for a while. It is very difficult to tell if a patient is newly diagnosed.

This comes up in a lot of diseases, certainly in the area of cancer. That would be a good example where obviously there are certain things that you would expect to happen within a certain amount of time when someone is diagnosed with cancer, but we don't really know exactly when that is happening, and it is difficult to separate those things out.

There are certainly issues in terms of the maintenance process, that is, that we have a lot of pressures in terms of producing data in a timely way. We want to produce information that helps purchasers and consumers to make decisions about the health plans that they are selecting. That is very time sensitive information. If that information gets to purchasers too late, then that information isn't very useful.

Right now, for example, there are a lot of purchasers that are going through health plan selection decisions, and their employees are going through open enrollment periods and trying to select which plans they are going to use for the following year. The process for producing those data and getting them to purchasers so they can make those decisions involves many steps, and any delay in terms of changes that are made to our specifications and so forth affects that process. It is a long chain reaction of events.

So when codes change or our own specifications change, that throws a wrench into the health plans' data collection processes. They have to reprogram and go back and do the administrative process again, and then from there go through the rest of the process. As CPT codes change late in the year, or as other things come up -- changes even in clinical evidence that might affect the kinds of medications that we would include, or new drugs are produced and we have to change our NDC code list -- all those things affect the algorithms the health plans are using for these data.

Then the last thing I'll mention is the dependence on surveys for a whole host of information. It is really very important in terms of making decisions about outcomes. Certainly there are a lot of things that we would like to know about functional status and how people are coping with disease, what their ultimate outcomes are in terms of how they are dealing day to day.

A lot of these things we just can't tell through administrative data, and ultimately we would like to be able to do that. Until we can, we will have to go to other, more burdensome or more expensive sorts of things.

DR. COLTIN: Thank you. Why don't we take questions from work group members at this point, and then we'll go on to the next couple of presenters. Does anybody have any questions for Beth or Josh?

I actually have one that really just came up today that all of the people here were involved with. It is one that hadn't really occurred to me before it actually happened. That was the question around standardization of survey items.

We have been very wrapped up in standardization around administrative data, and a lot of the concerns that we heard from others and are hearing from you, as well as other perspectives that you brought to the table. We were looking at creating this measure of pneumonia vaccination rates, and looking at the fact that the question is worded differently in the National Health Interview Survey, the behavioral risk factor survey, the Medicare Current Beneficiary Survey and Medicare CAHPS. We were trying to reconcile and say, can we all ask this question the same way, so that we can have national estimates and state estimates and estimates for health plans and for physicians, and so forth.

I think it is an interesting issue that we really hadn't thought about, but it came up in the context of the meeting. I didn't know if any of you wanted to offer another perspective on that, and how important you think that is as an area where we as a committee might perhaps try to make some recommendations for the department about standardization.

DR. SEIDMAN: I'll just say a couple of things. First of all, I think that while the primary use of HEDIS is to offer information for health plan to health plan comparisons, certainly there are a lot of people who are using it for other purposes as well.

One of those is to try to get a sense of how managed care is performing relative to other types of delivery systems. That was a way that that issue played out at this meeting, because we can create a measure and we can use any survey question we want, and it will be more or less standardized from health plan to health plan. But obviously there are Healthy People 2000 or 2010 goals that we are setting, and there are certainly questions out in the public about how managed care is performing relative to fee for service, and these data are being collected with a slightly different question. We don't have a lot of confidence that we can compare those data, and we would like to be able to do that.

So I think that is something that we are always trying to think about as an ancillary goal of HEDIS, to provide information that can be used for other purposes as well.

DR. SMITH: I have a slightly more aggressive suggestion for this committee. I think standardization is absolutely essential. I think there are probably ways in which the department unwittingly promotes proliferation of different standards. These are two different surveys by the same agency; that is pretty obvious.

But in a slightly more subtle way, I think it ought to be a priority to move towards standardization. I'll give you an example. These days when an NIH RFP is put out, part of the boilerplate at the bottom to which respondents have to respond is, since racial disparities are a problem in the department, how does the proposed research serve to shed light on the issue of racial disparities.

I would argue that maybe at not quite the same level, but in some way, there ought to be some effort to look at -- because the department through AHCPR, through HCFA, through other agencies, stimulates a lot of research.

I'll give you an example. We get proposals all the time, and I know the department has funded some proposals for researchers who are interested in testing the applicability of these tools in nonwhite, non-commercial populations. The buzz word is cultural competence. We have seen some proposals funded to develop new questions that supposedly test the cultural competence of health plans and physician groups and others.

Now, I understand that. It is in the interest of researchers to get research funds in order to invent something new. I'm not trying to be disparaging, but that is the thing we are interested in. We just started a project to look at standard questions like the ones on CAHPS and value check and say, of all these questions, let's not be developing a new instrument. Let's start by saying, of all these questions, which ones perhaps are most germane to the issue of cultural competence. So, how quickly does the phone get answered is probably not, but are you treated with dignity and respect probably is.

Could we construct a proxy for cultural competence out of the existing standardized questions, as opposed to proliferating yet another instrument to ask in Latino populations or low income populations? So it seems to me, to the extent that the department is the source of both authority and prestige and to some extent funding for innovation in this work, that setting an explicit priority on developing and refining the standardized instruments, as opposed to going off to create something innovative and new, which is the centrifugal incentive of researchers, would be one thing that the Secretary might do to help solve this problem, in addition to having that same dictum for the agency itself.

Is that clear?

DR. COLTIN: Yes.

DR. LUMPKIN: I actually have two questions, and the first one is probably simpler. I kind of have the impression from what Josh was saying that --

DR. COLTIN: John, can you speak up?

DR. LUMPKIN: I kind of got the impression that you referred to the questionnaire, the survey data, as being required because we don't have good access to clinical data. My question is, in your conceptual model, would you say, if we had good access to clinical data, that we could measure quality without doing surveys?

DR. MC GLYNN: No. I would say no.

DR. SEIDMAN: Oh, I thought you were about to ask a different question.

DR. MC GLYNN: Actually, it's funny. I wrote down a note to myself, if we had an opportunity to come back around, about things that weren't clear. This was actually one of them. I actually think that if we are thinking about quality measurement, there are a variety of appropriate sources of data.

When we designed the tool I mentioned in passing, we designed it to focus on clinical information. What we said is, there is a class of information that we are never going to get from the clinical record, automated or not, because that is not the appropriate source.

So I think that there are a variety of things. They start with the patient's experience of care, but honestly, it is quite a bit broader than that, where I think the individual is the correct source of the information, and there probably isn't a way around that.

What I think is challenging to think about when we are in this automated data era is that surveys often mean a couple of things which I think technology should be taking us out of. They tend to mean samples, and they tend to mean periodicities that are infrequent. They also tend to mean slow access to the information.

I think increasingly, as the Web proliferates, we ought to be thinking more creatively about how we routinely get information on populations that are -- the kind of information that only an individual can provide about themselves, that we wouldn't expect to find from any other source. We ought to be thinking creatively about how we get that level of information more broadly into a health information system.

It offers a lot more challenges around confidentiality, probably around bias in terms of who is likely to respond. But I do think that there is a class of information -- one example I often use around that is patient education. We can find a note in the chart: talked to the patient about glucose monitoring.

But really, what I want to know is, did the patient hear that instruction, did they understand what to do, did they have the right mechanisms in place to make adjustments as their level of activity changes, or whatever the thing is. So it is really not, was the thing provided, as much as, did the patient understand what they were supposed to do, and can they participate actively in this care.

So the real quality measure is -- or one real quality measure might be -- what the patient's level of understanding is, and you are never going to get that through administrative data or even a good clinical record. You are really going to have to get that from the patient's perspective.

So I think there is always going to be a role for patient generated information. I think it takes some creative thinking to figure out how to access that kind of information a bit more routinely, and maybe not linked to insurance coverage. There are things about an individual that are often important to know that don't change a lot over time. Some of the family history stuff doesn't change as dynamically as your blood pressure this week.

DR. SEIDMAN: I don't disagree with that at all, actually. But what I would say, building on it, is that to the extent that we can, those things that we feel we can get through other data sources, we should push to try to get there.

There are a lot of potential problems with doing more and more surveys, and Beth talked about a few of them -- the infrequent periodicity, the sampling issue, which gets even bigger when you think about what I guess for a lack of a better term I would call survey exhaustion, where we are exhausting the samples of health plans and all other kinds of people in terms of all the questionnaires that are going out and are being fielded.

I have no data myself to suggest that the correlation is because of this, but I think that we are seeing a lot lower response rates than we would like, and I don't know if there is a relationship or not. But certainly that brings up a whole set of other issues. When we are getting lower response rates than we would like, we obviously have to worry about non-response bias, and that of course affects the validity of the data.

Then of course there is the issue of cost. There are things that we could be collecting through administrative data that we are now collecting through surveys. It just means that there are a lot fewer pieces of performance data that we can be collecting overall.

DR. MC GLYNN: But just as an example of being creative, most of us, when we go to see the doctor, spend more than a minute in the waiting room. If an office was wired so that a patient was online completing a history form, or online completing a complaint form or something like that -- entertain them while they wait -- that is a way of routinely accessing information that we might otherwise go to the medical record for, which is probably much better gotten directly from the patient, and it may even improve the efficiency of the encounter, because the providers would have that information in front of them.

I think there are things like that that we ought to have on the planning horizon in terms of who is the source and how do we most efficiently get at some of this information, and in our march to wire up doctors' offices for the 21st century.

DR. KRAMER: Just a real quick question. Beth, you talked about how, when you tried to obtain some of your second type of information, the administrative data, it was difficult to access. You tried in several organizations and only a couple of them would come through.

Can you just say what the characteristics are in places that are able to do that kind of thing? Do they represent more models of the way we should be headed, in terms of being able to access information from multiple sources?

DR. MC GLYNN: They were staff and group model HMOs that had a small number of vendors. Just to give you a contrast, one of the plans we talked to initially had two lab vendors that processed their Pap smears, and another plan in Florida had 100 vendors. And the vendors each did five percent of their business or one percent of their business, or whatever.

So the Herculean task of them trying to get access to an adequate sample, and having to go piece by piece across all those vendors was incredibly difficult. One of the places had an in-house lab that did about 60 percent -- that processed about 60 percent of their Pap smears, and that is the place we started.

DR. KRAMER: So there is an issue here of scale.

DR. MC GLYNN: Some of it is scale and some of it is how the contracts are written.

DR. KRAMER: So that is not something you could really apply.

DR. COLTIN: We're running a little late.

DR. MC GLYNN: Can I just put a marker in? The other type of data that I think are out there and difficult to access -- which, given what I was doing today, it cracks me up that I forgot to mention to this committee -- is vital statistics information.

We also had a similar experience with birth certificate data in doing some analysis of birth weight and prenatal care, and the rules in the different states in terms of how you could get access to those data ranged from, no problem, would you like the file next week, to, you have to get permission from every single mother that you might want the data on. So that is information that has a lot of potential to mine, but that is very differential in your ability to access it.

The data that we did look at -- which I think is one of the better data sets in the country -- also had a lot of really serious missing data problems on variables we care deeply about, and that are highly correlated with the outcome that we are interested in. So while those data are certainly a much better source of information than some other data sources that we have, what they required to make them really useful is some work -- and it looks like the missing data problem was at the institution level, so very differential missing data rates by hospital. This was smoking status on women who delivered, and the smoking status was missing much more frequently for low birth weight mothers than for normal birth weight mothers. We actually modeled that in, and it had a higher odds ratio in the logistic regression than whether the mothers smoked or not.
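The modeling point can be illustrated with fabricated data; the numbers below are invented solely to show the mechanics of entering a missingness indicator into the regression, not to reproduce the actual birth certificate analysis.

# Illustrative only: when missingness of smoking status depends on the
# outcome, the "missing" indicator can carry a larger odds ratio than
# smoking itself. All parameters here are fabricated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
smoker = rng.binomial(1, 0.2, n)
low_bw = rng.binomial(1, 0.05 + 0.05 * smoker)       # outcome: low birth weight
missing = rng.binomial(1, 0.1 + 0.4 * low_bw)        # missingness tied to outcome
smoker_reported = np.where(missing == 1, 0, smoker)  # unknowns coded as non-smokers

X = sm.add_constant(np.column_stack([smoker_reported, missing]))
fit = sm.Logit(low_bw, X).fit(disp=0)
print(np.exp(fit.params))  # intercept, smoking OR, missing-indicator OR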

But I do think that there is an untapped potential in vital statistics information, and with some leadership we might have a better opportunity to access some of that information. But those data sets also need some improvement. There are also a lot of inconsistencies in the content reported on the vital statistics records.

DR. COLTIN: Now we are going to move on to hear a little bit about what has been going on in California with respect to data quality and quality measurement.

Agenda Item: Experience with Data Limitations in Quality Measurement in California and Current Initiatives Around Data Improvement

DR. DIXON: I'm passing around a handout, and I am only going to show about three overheads, but I put in a couple of other illustrations as context for what I'm going to talk about.

I think ambulatory records, ambulatory data sets, are becoming increasingly important. These record systems, which as you know are designed primarily to get reimbursement for services rendered, are becoming our primary source for predicting -- estimating disease occurrence, evaluating quality of care, even doing work force projections -- and they have a major influence on research priorities and public policy.

I think as more and more care moves to the ambulatory care area, the role of these kinds of records is going to become only more important. So what I want to do today is share with you some observations and some research that we have conducted in California to look at how good the data are as they wend their way from the physician's perception down to the administrative data set at the health plan.

It is a circuitous, complex route with many alternative pathways, and many opportunities for additions, subtractions, modifications in the record. So what I want to share with you is some of what happens to the data as they go from point to point.

The background for this I think adds some context to some of Josh's and Beth's comments earlier. California has a long tradition of collaboration in looking at health care quality. From the very beginning of HEDIS, we decided that it would be inefficient for health plans to go out on their own and collect data independently. So we formed a collaborative organization, with equal governance by plans, provider organizations, IPAs and medical groups, and purchasers, to collect HEDIS data using a single standard method and a single vendor to go out and collect all the data. This is called CCHRI.

That continues, and it has been I think a very efficient way of collecting HEDIS data. It has also produced some very important standardization.

After two or three years of this effort, however, I think most of us who were very involved in that process became quite frustrated. After three years, we were still spending millions of dollars to go out and do medical chart reviews to collect data that ought to be part of administrative data sets. As we looked at it, we thought, gee, enlightened self interest says that providers and purchasers and health plans ought to share data with one another. Everyone will benefit if data can be shared, because we can all do our jobs better; how can one possibly expect to manage care if you don't have laboratory data if you are a health plan, or if you don't have pharmacy data if you are a provider organization? How can you possibly manage care effectively?

So in December of '96, we called a summit conference, 60 people, basically 20 of the leading purchasers, 20 of the CEOs and senior management of the health plans, and representatives of major provider organizations, hospitals, physician organizations, et cetera to talk about, hasn't the time now come to begin to do something about improving our access to data. Doesn't enlightened self interest tell us that we need to share?

Everyone bought in. Everyone signed pledges that yes, we are going to start sharing data, and everyone signed an agreement saying basically, the problem is not technological. The technology is a problem, but it is coming along, and many of the technological problems are going to be solved. It is basically a problem of will and political agreement. We started a process to try to get political agreement among all the stakeholders to arrange the agreements about sharing.

We discovered very early that there were a lot of proprietary concerns. Health plans for example had pharmacy data, and since they paid for prescriptions, it was their data, and they were very fearful of sharing those pharmacy data with provider organizations, because they were fearful -- one of the reasons might be that physician organizations would take those data and do an end run and directly contract with health plans. Providers were very fearful of sharing full encounter data with health plans, because those data would be used against them in contracting discussions.

So very quickly, the political concerns came to the forefront, and we developed a whole process to try to develop rules of exchange and be able to encourage that sharing. Let me just say, that was the CALINX project, which was funded in large part by the California Health Care Foundation. We have achieved some small, but I think very important, progress in getting some of that sharing.

We have physician organizations, for example, that are getting electronic laboratory data. We have physician organizations that are getting pharmacy data. We are sharing encounter data in a much better way between health plans and provider organizations. So I think that there are technology problems, but there are also big political and business reasons why we have trouble sharing data.

That is the context. As a part of that effort, we also wanted to explore some of the conventional wisdom about quality of data. I think there are a number of assumptions. In a capitated environment, which California is, as I think most of you probably know, health plans capitate physician organizations, IPAs and medical groups almost entirely throughout the state. They not only capitate them, but delegate most clinical and administrative management functions to those provider organizations. So provider organizations pay claims. They are the first recipients of encounter data. They process the data, and then typically those data go on to the health plans.

So a clinician's observation can go a variety of ways on this coding process. Occasionally the data go directly to the health plan, but usually through an IPA or a medical group, sometimes to a clearinghouse, other times directly to the plan where the data may have to be coded, go through some sort of system, whatever, until it gets to the administrative data set.

We also conducted several other projects to try to understand better the quality of the data. We looked at a random sample of 400 physician offices, non-Kaiser physician offices throughout the state to see what their data processing activities were. We did a census survey of all IPAs and medical groups in the state to see what their data handling processes were, and we also conducted a study called CAMUS, which looked at the completeness and accuracy of medical records. I'm going to spend most of my time talking about CAMUS, because we tried to look at what happens to clinical observations at these various points. We have some data which I think are very interesting and which cause us to question that conventional wisdom.

What are my conclusions? Well, at the ends of these studies, I think we have concluded that both physician offices and provider organizations in California seem better structured to manage indemnity insurance than to manage care and costs.

Many of the procedures, even in a managed care environment, seem designed around the fee for service environment and paying claims, and that produces some real problems in terms of the way the data are handled. It is clear that IPAs, groups and health plans get information that is incomplete and inaccurate, and furthermore, groups and IPAs employ practices that further reduce the quality of the data. It is no surprise, therefore, that the HMO data are poor, but I believe that delegation and capitation are only a small part of the problem. Contrary to conventional wisdom, which says this is really a problem of delegation and capitation, I believe that it is a more general problem with ambulatory care claims.

Let's look at the CAMUS study. This was a very hard project to do. We found many of the same things that Beth found. Many of the organizations, both health plans and provider organizations, really wanted to participate in this project, and volunteered and tried to participate, but could not marshal the resources, could not find the data, in order to meet the relatively minimal requirements that we had to participate. We originally invited 16 physician organizations to collaborate; we had seven or eight that committed themselves and tried to participate. We ended up with only three IPAs or medical groups in the state, out of those 16 that we invited, who did participate.

A few comments about those groups. This is a pilot study, not randomly selected groups. We did not choose the best and the worst. We probably chose some of the better IPAs in the state, however, and we tried to choose those that reflected the diversity in the state. So we had one well-integrated, well-established medical group. It had centralized medical records, which is common among integrated medical groups that bear risk in California, but in addition, it also had a well-trained medical records librarian who was in charge of medical records. That is unusual.

We found in our physician organization surveys that almost all the coders in physician offices are self trained, trained on the job, and have very little experience. We found that the coding knowledge and practice in physician offices are very rudimentary, there is very little quality control.

So here is a physician organization that had committed major resources to coding properly. Then we had two IPAs that were also pretty sophisticated. In fact, you had to be pretty sophisticated to do this project.

In each of these groups, we chose 10 physicians: six primary care physicians, two each of pediatricians, family physicians and internists, and four sub-specialists, two ob-gyns and two cardiologists. For each of those physicians, we chose approximately 10 patients and looked at every episode of care that occurred in a six-month period. So three groups, about 10 physicians in each group, and about 10 patients per physician.

We then used Medstat as our contractor, and used trained Medstat medical records reviewers, who went in and coded every encounter with that subject physician over a six-month period. We asked those coders to go beyond what they normally did and to try to capture not only those billable items, but also to capture those services that might be valuable to a patient, that nobody is willing to pay for.

My real concern has been that a lot of things occur in physician offices that, since they don't get paid for, never get reported. I must say, I don't have a lot of confidence that we collected that, because it was really difficult to get those coders who are used to finding unbundling and finding over coding. It was very difficult to train them to think about capturing episodes of under coding. But we were interested in looking to see to what extent the physicians' documented diagnoses and services were really reflected at various points in the pathway.

It was a hard project to do, as I said. We were only able to look at three points in the system. We used the medical record as the gold standard. That was what was coded. We could not collect superbills in any of the offices, and could not consistently collect even what those physician offices submitted to the health plans. They were often available, but filed in deep storage, not alphabetized. We could have gotten them, but we just didn't have the resources to do it. When we do this study again -- and I hope we will, and I think it is important to do it again -- I think we need to do it prospectively.

So we had the chart. We really also couldn't tell what happened in the IPA or the medical group, but we did know what the IPA or the medical group interpreted from this clinician's observation. So we were able to capture and compare what the IPA or medical group had in its administrative data set with what was available in the clinical record.

Then we were able to look and see what happened at the health plan, in the health plan administrative data set. You notice there were a lot of points of potential transaction that we didn't look at. We have no clue what clearinghouses did, for example. As a matter of fact, we learned that clearinghouses were used fairly often, but many of the organizations had no idea that they even used clearinghouses. Many of them had no idea of what clearinghouses did. Many of the offices used billing services and felt that billing services -- and about a third of them gave the authority to the billing services to make changes in the codes, but they had no criteria or rules for how those billing offices ought to make changes.

So those were things that we weren't able to look at. This was just a pilot project.

Let me go through and tell you what we found. We had 786 encounters. In contrast to conventional wisdom -- conventional wisdom is, capitated doctors don't report -- 99.4 percent of the encounters that we found were also found at the IPA or the medical group. Only five of the 786 did not show up in the IPA or the medical group administrative data set. That doesn't mean that they were accurate, but there was an encounter that matched by physician and by date.

Of the 781, we then looked to see how many of those matched at the health plan. Here was where our first major decrement in data occurred. Only about 80 percent of the encounter records, or occasional claim records, that were apparently sent out by the IPA or the medical group to the health plan actually ended up at the health plan.

Now, we don't know why they didn't end up there, but I have a number of speculations based on some of the other studies. Many of the plans and provider organizations will reject an encounter form if it is late, if it is more than two months or three months late. So an encounter form may be submitted late; it doesn't call for payment of a claim, but using the claims mentality -- the claim is too late, therefore we're not going to pay you and we're going to reject your claim -- they are even rejecting encounter forms, which include data, data which they need. This is fairly common. We saw that in IPAs and medical groups.

We also saw that there were huge problems with eligibility data. Physician organizations would think that a patient is a member of health plan A when in fact she is a member of health plan B. So we think that some were misdirected and went to the wrong plan and were just lost in the ether.

One of the major problems is that most of the California health plans will accept data in one of three formats: there is the ANSI standard, there is a California health plan standard, and every one of them has its own proprietary standard. One of the things that we discovered is that many of these submissions by IPAs and medical groups don't meet the data standard and are just rejected by the health plan. So I think that explains where these 20 percent were lost. But we were able to match an encounter, at least a patient, with a plan 80 percent of the time.

When we tried to match specific dates and patients at the health plan, it fell even further. We were only able to match 241; we were able to find most of the patients in the health plan, but we were not able to match specific encounters most of the time. So out of the 781, by the time we got to a specific encounter -- same date, same intervention -- we only had about 241, or about 39 percent.

Then we looked to see how complete and accurate the diagnoses were at the health plan level. We found ultimately that only about nine percent of them matched precisely: the same date, the same physician, a complete list of diagnoses within a reasonable degree of latitude, and a complete list of services and procedures. There were a number of errors that occurred at this level, and as far as we could figure out, about half of them were physician organization coding errors, and about half of them were plan level coding errors.
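One consistent reading of those figures, treating the 39 percent as relative to the encounters that actually reached the health plan (the base is not explicit in the talk), is sketched below; the arithmetic is only a reconstruction of the attrition being described, with the bases as assumptions.

# Reconstruction of the attrition described above; bases are assumptions.
chart_encounters = 786
at_ipa_or_group = 781                          # 99.4 percent matched there
reached_plan = round(at_ipa_or_group * 0.80)   # about 80 percent arrived at the plan
exact_match = 241                              # same patient, date, intervention

print(at_ipa_or_group / chart_encounters)      # ~0.994
print(exact_match / reached_plan)              # ~0.39, the figure cited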

DR. COLTIN: We're running low on time, so are you about done?

DR. DIXON: I'm just about through. So I think the cautions of our -- this was only a pilot study, and the question is, since this was California, is this reflective of what is going on in the rest of the country.

I suspect it is not reflective of what is going on in all areas of the country, but the systematic errors that we saw I think are not unique to a capitated environment. I think they probably are reflective of what is happening in much of the rest of the country, probably not only in the managed care area, but also in the fee for service area.

We don't know what the problems really are. I think additional study needs to be done. I think one of my conclusions is, this only causes worry; it doesn't really tell us what we have to do to solve the problem, and there really does need to be a prospective followup study, looking outside of California and looking at fee for service as well as capitated environments, to see whether this is reflected elsewhere.

I think there are a couple of lessons. I think HIPAA will probably help, but there is a real problem with garbage in, garbage out, and electronic transmission of claims won't work unless the data are good. The transactional rules -- the rejection of claims because they are outdated -- will not be solved by HIPAA regulations. A health plan or a provider organization may still decide to reject an encounter form or a claim that is over 60 days old, and that is, I believe, compatible with HIPAA regulations. So we have to look not only at the transactional formats, but also at the rules and the incentives.

I think we found clearly that there are major problems with the business case. People view the data as proprietary. One of the hypotheses is that some of these physician organizations had the data but refused to send them to the health plan because of some of these proprietary issues. They denied that to us, but it is entirely conceivable that some of the provider organizations are intentionally holding data back from health plans, and those are issues that we have to look at.

So this is a difficult study to do, but I think it calls into question some of the conventional wisdom about where the data are, how good they are and what the problems are.

DR. COLTIN: Thank you. Mark?

DR. SMITH: Thank you. I'll be brief, probably because I'm probably the least expert person here in matters of data. So I thought what I would do is speak a little bit from my own experience and about some of the things that we are doing and talk at a somewhat more abstract level, because you all are more expert than I am.

I am going to basically talk a little about California again, talk a little about AIDS, and then talk about some of the things that the foundation is doing.

California is important, as Richard has illustrated, not only because it is by itself big with a lot of people, but because it has been a hotbed of innovation, and because any effort that you undertake that works for the country will have to work for California. The thing we have to remember is how heterogeneous the organizational and contractual relationships are. So not only does California have huge managed care penetration, but this phenomenon of capitation means that some of the data flows we have been expecting, because they are tied to payment, the ultimate incentive out there, simply are not there.

In addition, what is not sometimes appreciated is that there are huge numbers of other organizations right below the surface. You heard about data warehouses. There are PBMs, there are PPMs, there are intermediaries of all sorts who are involved in these various relationships, many of whom have data. There are carveouts from the provider to the health plan, so that behavioral health care, mental health and substance abuse are carved out from medical care. There are disease management carveouts, sometimes by the purchaser, sometimes by the health plan, sometimes by the medical group or integrated system. Even within the world of capitation, the arrangements vary, depending on what the capitated entity is.

So we have at least three. There are IPAs, there are medical groups and there are so-called integrated systems, or what we call hospitals in drag. Depending on the plan and depending on the capitated entity, sometimes pharmaceuticals are in, sometimes they are out. Sometimes hospitalization is in, sometimes there is a risk pool. Sometimes it is a risk-in risk pool, sometimes it is a withhold risk pool. There are huge numbers of different sorts of arrangements and relationships, and there are lots of different entities here that are relied on contractually and operationally, all of whom have different parts of data.

One of the lessons then, with all respect, forget Kaiser Permanente. I keep telling this to my colleagues in the CPM. I remember Kaiser Permanente, I believe in Kaiser Permanente, but most of the world does not and will not; that is not the way we are going. So the notion that we are going to have one big data warehouse where we can put it all where we can go get it is just not going to be the case.

So California is instructive for that reason. An approach to this problem will have to take into account lots of heterogeneity of relationships, lots of different organizations. In my view therefore, the answer has to be in exchange rules and not in the mother of all databases. So that is one of our lessons from California.

AIDS. Ten years ago, I was an assistant professor at Johns Hopkins, in charge of out-patient HIV services, and we were building what was to be the first Medicaid AIDS HMO. I have had an interest in questions of quality of HIV care in managed care since at least that long ago. That was a long time ago as far as managed care and AIDS go, and lots of people, not surprisingly, had concerns about, if you put AIDS patients in an HMO, how do you guarantee that they will get good care.

I have done a lot of stuff on AIDS quality of care since then. I've got to tell you, I'm not sure we have moved very far. In 1994, we held the first conference on HIV and managed care, which included commissioning people at Rand and elsewhere to think about how you could develop quality of care measures. In 1997, I helped fund a panel that to this day puts out guidelines for HIV care with the Kaiser Family Foundation and the Public Health Service, and more recently we funded Rand and NCQA to develop a series of measures, including both clinical measures and patient derived measures, on quality of care for patients with HIV.

The stumbling block, frankly, in addition to the conceptualization of this, is that the standards of care move much faster than our ability to gather data. So I have been an advisor successively to first ACSUS and then HCSUS. These are huge studies of AIDS care in the United States, now being co-led by one of Beth's colleagues at Rand, Sam Bozzette, and Mark Shapiro, and prior to that by Fred Hellinger at AHCPR. I'm not sure what the total bill for these studies is now. By my calculation it is at least $35 million, dedicated to studying how patients in the United States are getting care for their HIV and what services are utilized.

These are good people. They are doing very good, complicated, tough work. But imagine, we have spent $35 million, and the main product of these studies is that -- I just reviewed a paper that has been submitted to a journal that describes what AIDS care was in the spring of 1997. For all that money, so far as I could tell, not a dime of it has improved the ongoing infrastructure of any of the institutions whose patients are being studied, to monitor care in real time.

Now, for a while, I thought this was just unique to AIDS. I said, the problem with AIDS is, it moves so fast, there are so many new drugs that change so quickly. It is true that AIDS is an extreme example, but if you think about advances in peptic ulcer disease care or in the care of MI or in lots of other diseases in the last five years, and think about how we know what we know about the care of people with those diseases, it is basically because we fund a big study. Some PIs get an annuity, and we summon an army of nurses who go out and abstract the charts or, if we really have computerized data, an army of programmers to write special programs just for this study. We go out and we gather this data and we get the data set and we massage it and we analyze it and we publish it. In AIDS it is more obvious, but in other diseases it is equally true: by the time we know it, it is largely out of date. We are putting Windex on the rear-view mirror about what the care of peptic ulcer disease was two or three or four years ago.

If you stop and think about the research enterprise in quality of care, which the department which you advise funds for the most part, so far as I can tell, very little of that enterprise goes towards building a permanent infrastructure that might inform our understanding of quality in real time, because of the way we do this work.

So that was my lesson about AIDS. At first, I thought it was idiosyncratic to AIDS. After all this time, I really can't tell you what the quality of AIDS care is in Kaiser Permanente or Aetna Prudential, because who knows? But as I thought about this, while that is an exaggerated problem in AIDS, it stems more from the fact that our basic approach, even of the federal agencies that are responsible for funding these investigations into care, is to do these one-off efforts. What we know about the appropriateness of coronary artery disease surgery is basically from these one-off, a la carte, hugely expensive studies that have huge operational setup and take-down costs, and that leave behind very little of ongoing value in terms of monitoring care in real time.

Lastly, I want to say a little bit about what we have been doing in response to some of these issues. It is not because we have answers, but because we have been trying to work on it.

Richard has told you a little about the CALINX project. We have just embarked on what is for us the logical extension of CALINX. It involves an investment in Santa Barbara that will amount to some $10 or $12 million by us in the next five years, in partnership with some of the well-known makers of software.

I want to repeat something that Richard said. The problem here is not a technical problem. The things that we want to do are trivial by technological standards. The problem here has to do, in our view, with, one, the historic lack of investment in information systems in the health care industry, which is in part related to, two, the structure of the industry, particularly the cottage industry of physicians, where category three information is basically generated.

So hospital data pretty much exists, and lab data. This is either category one or category two, as far as Beth's typology. I think it is a good typology, but the things that lots of people want to know that we are a long way away from are in category three, and it has to do partly with the fact that we have so many hundreds of thousands of physicians all in their little organizations, without capital and without expertise, and they are too cheap to buy computers, and their culture doesn't involve using computers, and that is where a lot of this difficulty is.

The third problem does have to do with standardization. We talked about this a little bit before, and my sense is that as a member of the CPM, we would all be happier with one imperfect standard than five standards, each of which is perfect in one fifth of its domain. That basically is the way that we go about doing these things.

Fourth is that in health care, unlike most other industries, at least in the care enterprise, we are basically facing an impossible task, because the data that people use to run the enterprise is not the data that we are using to manage quality. We are asking people to set up and record and maintain an entirely different set of data that from their perspective does not help them do what they do.

If you asked any other industry to do that, I suspect you wouldn't get very far. Yet, because of the primitiveness of the industry and its form of organization, our efforts to put on top of whatever it is people use to run their practices, their hospitals or whatever -- oh yeah, and by the way, we also want you to tell us, did you ask this patient about -- that is impossible.

So there is something about that that seems to us ultimately futile in the sense that everybody feels strapped for money as it is, which brings me back to Santa Barbara.

From our viewpoint, this is now a critical financial question as well. The health care industry is now in part trying to make up for its historical under investment, and whatever money we have saved in the last 10 years on inappropriate care, so far as I can tell, has all gone to Oracle and SIDEX. Every hospital I know has this huge MIS budget, and so does every medical group and every IPA, and that now is at the underpinnings of the financial instability of providers. It is not a technical problem. It is because for competitive and business and governance reasons, a few examples of which Richard described, every provider feels the need to have its own system, its own hardware, its acolytes that follow the hardware around, and so there is really no model for how it is we might all be able to exchange data as opposed to each of us building our own capacity.

That is what we are trying to test in Santa Barbara. Can you build a model not for one huge database, but for a collaborative approach to sharing data, as opposed to everybody building his own proprietary database, which from our standpoint in California is literally bankrupting the health care system? Our attempt to solve this trading partner by trading partner, to use the term we started to use, with each of them having their own proprietary data, is literally bankrupting the system.

So my advice to you -- again, I'm not an expert here, is that part of what the department could do is to help think about a larger strategy to provide incentives, demonstration programs and other assistance in developing not the technical issues, but the governance, political, business model approach towards a more collaborative approach to data sharing, to try to help solve these questions of the capital that is necessary as well as the scientific developments necessary to get at best category three, which is some of the data everyone would need.

I think the promising thing is that this issue has now attracted a fair amount of capital from the investor owned sector. There are opportunities here to rationalize these processes. The danger is that if this is a vendor driven movement and each vendor has an incentive to have a proprietary system, then we have another level of the problem, which is part of why the CALINX approach was not to say, you all have to buy the same machine. It is to say, we have got a set of interoperability and exchange standards to which all vendors will subscribe.

For instance, in Santa Barbara now, for all vendors who will be selling to us -- we basically have every provider in the town as part of this coalition -- all of the hardware and software has to be CALINX certified.

So the first thing that Microsoft and the other vendors did when they saw this RFP was, they said, what the hell is CALINX? We said, here is what CALINX is. So it is not that someone can or should dictate exactly what goes on, but there really is no substitute for the government in helping to set standards of exchange and interoperability, and therefore help circumscribe the area in which vendors can and should compete.

To the extent that they are competing over proprietary standards, that is destructive for this movement. To the extent that there is some agreed-upon set of standards, and therefore the competition is over interfaces and usability and speed and cost, that is a different question. That it seems to me is an area in which this committee and its advice to the Secretary of the department could really play a critical role, because we have become more and more convinced, when you see the sophistication of the technology in lots of other areas of our life, that this is not a technical problem. This is a question of governance, of business model.

DR. COLTIN: Thank you. I'm sure that some of the work group have questions for you. In the interest of time, I'm going to move on, and then we'll have questions at the end.

So Carolyn and Josh are going to talk about the issues that they have been beginning to deal with around looking at standardizing data collection and performance measurement across different levels of the health care system. Steve is going to talk about the other way, how you can collect information on fee for service that is comparable with managed care.

Agenda Item: Issues of Data or Design Limitations

DR. SEIDMAN: I'll keep my remarks brief, because I think it would be good to have more of a discussion, interactive approach.

One of the projects that spurred our thinking about how to use the quality measures that we have been using across different kinds of settings, for different levels of the health system, was a project called DQIP, the Diabetes Quality Improvement Project.

DQIP was an effort that was initiated by HCFA, but it had several sponsors: aside from HCFA and NCQA, also the Foundation for Accountability, the American Diabetes Association, and ultimately the American College of Physicians and the American Academy of Family Physicians as well.

The goal was to try to create a common core set of measures that could be used by the different organizations measuring at different points in the system. We basically began by saying, we are all going to be out there trying to collect quality of care information to give us an idea of how different providers and plans are performing with respect to diabetes care; maybe we should all be collecting the same thing, or at least collecting things in the same way.

Obviously, it was a lot of work, but eventually we did come up with a core set of measures that are actually being used in multiple settings right now. That was no small feat.

But the next stage of that is the bigger challenge. We have basically the same measures, but how do we make the data collection process efficient so that we don't have to collect data more than once? That is a much bigger challenge, because even though we are collecting the same thing -- for example, we all want to know, did the diabetic get an HbA1c test in the last year -- how we go about collecting that piece of information may be different, depending on the setting.

So for example, the American Diabetes Association is collecting on a small sample of a physician's patients. So it is just a single doctor; it is a set of 35 medical records. So for them, the issue of going to lots and lots of medical records isn't as big an issue as it is for us, because for every health plan product or product line, they have to collect a sample of 411 patients. So those differences create distinctions in our desires to approach things in the same way.
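
[A minimal sketch of the shared numerator logic just described -- did a diabetic member get an HbA1c test in the measurement year -- applied to a sample drawn from claims. The CPT code set, field names, and sampling are simplified assumptions for illustration; they are not the HEDIS or ADA specifications.]

```python
import random

# Simplified assumption: an HbA1c test is identified by CPT code 83036 on a claim.
HBA1C_CODES = {"83036"}

def had_hba1c_test(member_claims, year):
    """True if any claim in the measurement year carries an HbA1c test code."""
    return any(
        claim["cpt"] in HBA1C_CODES and claim["service_date"].year == year
        for claim in member_claims
    )

def hba1c_rate(eligible_members, claims_by_member, year, sample_size=411):
    """Testing rate for a random sample of the eligible (diabetic) population.

    411 is the plan-level sample size mentioned in the testimony; a single
    physician's review of 35 charts would reuse the same numerator logic with a
    different denominator and data source.
    """
    members = list(eligible_members)
    if not members:
        return 0.0
    sample = random.sample(members, min(sample_size, len(members)))
    hits = sum(had_hba1c_test(claims_by_member.get(m, []), year) for m in sample)
    return hits / len(sample)
```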

We also have different purposes for the performance measures. We basically are -- HEDIS is specifically for purposes of accountability, that is, we want to look at plan to plan comparison and hold plans accountable for the performance that they provide to their members.

Some measurements are being used more for purposes of quality improvement. In those settings, the measures are being collected so that we, as providers, can use that specific information to improve the quality of care that we are providing. We want perhaps a different kind of information for that purpose. Certainly what we are hearing from a lot of individual providers and provider groups is, if I am going to collect this information, I want to get a lot of detail out of it, and that shapes what we want in seeking the information.

That is also building into -- and I'll turn it over to Carolyn -- our desire to create measures across the major accrediting bodies. The health plan accreditation body; the hospital accreditation body, the Joint Commission on Accreditation of Healthcare Organizations; and the new AMA accreditation body, AMAP, are all trying to see if they can reconcile their diabetes measures to address this problem.

Why don't I turn it over to Carolyn?

DR. COCITAS: Thanks. One of the good things about being towards the end of the agenda is that you make summary statements, because you know that all the smart people at the table are going to say a lot of what you already said.

I'm going to tell you a little bit about my perspective in being here first. I am here, as Josh said, speaking partly on behalf of the Performance Measurement Coordinating Council. For those of you who don't know what the PMCC is, it is relatively new. It is a joint collaboration between NCQA, JCAHO and AMAP. They came together to try to coordinate performance measurement activities among the three accrediting bodies, which is a little bit different than what some other organizations are doing.

But when we started to think about it, when the three accrediting organizations started thinking about it, they realized they had providers and a lot of other people going off in all different directions. They weren't sending consistent signals to the industry, and they were building measures that were really inconsistent with one another, telling different people to do different things.

Just to take one step back for a minute from my own perspective, I actually started -- I've been doing this for about 15 years, although not as a research methodologist. I actually started with HCFA back in the early '80s when we were trying under a Republican administration to get the federal government to take a larger role in collecting data and driving quality improvement with the large purchasing power they had, to which we were told, no, we will allow the private sector to lead, and we will follow.

As a result, I went to NCQA, because NCQA was a document center organization and HCFA didn't want to do it. So I thought, here is a lean and mean organization. In fact, I am quite proud of the work that NCQA has done, because I think it has moved more quickly than just about anybody out there.

We were laughing about the earlier days of HCFA, but Kathy probably remembers during HEDIS 2.0 when we were telling people, no, you can't use hospital in-patient based cholesterol screenings to count towards the out-patient cholesterol screening measures. So there were a lot of issues back then that we had to deal with.

Once I got there, I saw the process through and led the development of HEDIS through 1.0, and then I decided I should go to a health plan to see what we had inflicted upon people. I want to tell you that when I was at NCQA, I believed every single thing that Kathy Coltin told me. I went specifically to a small Medicaid plan for a lot of reasons; I thought that would be instructive. I spent several years there as a director of QI, where I learned quite a bit about what we had inflicted upon people. I had to go through the HEDIS process three times. Many of them were Medicaid HEDIS measures, so I learned quite a bit about that.

So now here I am with the PMCC. As I said, our responsibility is to try to coordinate. Philosophically, given all the experiences that I have, I am as many other people are, absolutely committed to the notion that this is the right thing to do. How we are going to do it is going to be challenging.

All of you know that these three accrediting organizations have different histories. Some report for accountability reasons. NCQA has been the leader. There are different reasons why the other two accrediting bodies are where they are, all of which I am not going to talk about a lot.

But after having said all this, I think making sense of and using existing data to build complementary and consistent performance measures, which is what we are about, is going to be one of our biggest hurdles. So all the problems that have been raised around the table tonight are probably going to be more than tripled, compared with what you all talked about. I know a lot of you have done cross cutting performance measurement, and I actually think we have a lot to learn from those of you who have been doing that.

So as I said, most of the people in the room tonight are well travelled in this area. I think you have given us some tremendous expertise and advice about what to do.

What I wanted to do then, given that I am toward the end of the agenda, is to summarize. The reason that I told you where I came from is that I'm not sure which experience had the greatest influence on me, except that these are some of the things that I have been hearing from all of you, the micro things and some of the macro things, all of which are going to have to be dealt with; there is no question about it.

A lot of them everybody knows, but the data or nomenclature systems, as we all know -- we've been talking about this for 15 or 20 years -- were originally built for different purposes. To assume that we can tinker with them here or tinker with them there is not even -- well, I suppose that that is the approach that we have, so that is the way we are going. But it has caused a disconnect between what we want to know and what we can know. I think that is the point that Beth was trying to make.

It is absolutely true that the first question is, what do you want to know, but we have been driven by what capacity we have to know certain things, and kind of backed in.

But from the Medicaid side in particular, I can tell you, we are obviously dealing with a different level of providers. By the way, I am from Colorado, so we can talk a little bit more about that later.

But from the ground up, building from physicians' offices, I can tell you a couple of things. The coding is often at variance with what is being done -- I felt funny about this wording, but -- on or with the patient. There is no question about it.

The current office staff capability for coding is tremendously variable. When we start thinking about the kind of people who are in offices doing coding -- I am trying to think how many providers' offices I was in, but it was a lot -- I was amazed at the state of knowledge about the information technology systems that were in those offices; what the people who were there thought the box on their desk could provide for them was incredible.

There is no question that different organizations, from physicians on up, have absolutely established different priorities for what they think is important, questions that they think are important to them, and as a result, what data they think are important to them and should be collected. That goes back to the issue of the box on the person's table and how they could set that up. I think that is an observation I made.

Just the buy-in of getting from the ground up, of getting folks to develop a common sense of data or a common sense of purpose even, is an enormous challenge. I wouldn't even tend to think about the dollars, because I don't think I can conceive of them at this point.

There is no question that some codes lack the sensitivity needed to distinguish severity of illness, or -- Josh, I think you mentioned this, and other people have -- to distinguish acute from chronic care. In many cases, codes don't exist at all for what we want to know. Again, these are all things that people have said. There is just not the financial reason for someone either to use them or to know they exist at all; you all have heard that before.

Again, I am repeating, but managed care settings have problems with linkages among and across data sets. I can't remember who mentioned the lab data, but we decided a couple of years ago, when I was with the plan, that we were going to have to know not only about cervical cancer screening and so forth, but also the result and things like that. So we started conversations with some of our vendors -- with which, of course, being a Medicaid plan, we had negotiated some rates that they weren't particularly thrilled with. So this was their opportunity to open up the contracting process and nail this plan.

I think it was you who mentioned that, but it is absolutely true: they didn't want to just talk about what data we owned and what data that they could link back to us, but how much we were going to pay for every single value that we wanted to know about. That is no small challenge for a health plan, in particular for Medicaid plans that are small and are cropping up around the country, because a lot of the larger plans just aren't making the money on them. So what you have got with this population in particular is a group of pretty inexperienced people trying to build a quote-unquote health plan.

So I think that people, as a result of all these things, have prioritized as best they could. They might not have started with the question -- I think Jeff is absolutely right; you need to know what the question is. You don't go to the answers first and then think that that is ideal. But people have prioritized the best they could. I think they have used the data available, with all its warts. I in particular hope that Beth and others, and the rest of the scientific community, will work really hard on this. I know how many hours Beth works, and I have tortured her quite a bit about this.

It is unfortunate to me, and I have heard a lot of comments made, but in the meantime, surveys proliferate. I personally, and I think a lot of others, have a lot of concerns about the surveys that are out there now and how many are going to the same populations. I have had some feedback on that. I think it was my grandmother who asked me how many times she was going to get a survey asking her a different question.

One of the things that I think we need to remember -- there are a couple of things, actually. One is that performance measurement is still not valued in the marketplace the way that it should be. It is not by purchasers, it is not necessarily by consumers. I think there is a lot of education and a lot of other things. But let's face it, it has been out there for awhile. It has always interested me, having been back when the HMO group originally developed HEDIS, and I know there was other work before, the reason was standardization, the proliferation of different RFIs, et cetera, let's get together. Consultants were involved, a lot of other people were involved, but why are we still where we are? That is a question I have, and maybe some of you can fill me in.

So I think that is basically what I wanted to say. The one thing I respectfully request is that the federal government take a significant proactive role. I hate to say it, but what that means is funding, and making a significant investment in the foundation necessary to implement comprehensive quality improvement, quality measurement and reporting structures.

Just one last anecdote about this data issue, because I know that is why we are here. A friend of mine plays a lot of basketball. He went to his PCP because he thought that he had some kind of knee problem. He watched the physician sit there and check off the codes that they have, and there was not a place for the PCP to check knee problems, so he just checked off that it was an ankle sprain. That just summarizes what I have to say about the data issues.

I will turn it over to whoever is next. Stephen.

DR. CLAUSER: I have to make a couple of points about limitations. I think that a lot of the issues that have been brought to the table regarding complete coding sets and so on are all very important issues for moving things forward, but I think the committee ought to think about what are some of the sufficient conditions that can drive the agenda.

I think that one of those issues is the issue of a real business need for this information. I don't necessarily mean a business need in the sense from a provider or plan's level, but a business need from a purchaser's perspective. You want to accomplish some very specific goals with this information, and I think that until we have that, and it is driven by more than just the need for financial controls, we are going to have a real difficult time making substantial strides.

I think our experience has led us to believe that there is the possibility, if you can define some of those needs clearly, you can make some progress rather dramatically. I think that from our standpoint, we have tried to define our business needs clearly in terms of trying to purchase value, versus just trying to deal with cost issues.

Through the Secretary's quality initiative, we have engaged other federal purchasers -- the Veterans Administration and the Department of Defense, organizations that have basic purchasing commitments -- to have serious dialogues along those same lines. For us, we try to run those things, quality and performance measurement, through our core business functions that really do impact on quality. I think those do differ, because as a federal agency we have a regulatory dimension as well as a purchasing dimension, and sometimes it is easy to confuse the two. But whether it is our survey and certification functions, which I think are largely driven by the minimum data set, or our quality improvement and accountability functions or consumer information functions, which I think have driven Medicare HEDIS and Medicare Compare, I think that having very clear, focused goals has moved clinical data specification and collection pretty dramatically in some areas.

So I think it is clearly feasible. I think there are some other issues, though, that are prerequisites to that as well. We have tried to do that in terms of a variety of strategies. I think DQIP, the example that Josh gave, is an example of how, within a year, we sat down with a group of purchasers and providers and accreditors and came up with a measure set in the area of primary care that was very high profile, and got it approved by many of the major organizations, not just NCQA, but FACCT, the ADA, the quality improvement group. The federal agencies are all behind it now.

We have got this very serious dialogue around data collection and improvement in diabetes care. So it can happen. One model for that is the prerequisites of making these business cases and bringing purchasers to the table that have the leverage to move resources quickly.

In that regard, I think there are a couple of issues that are important. One is standardization. I think that is extraordinarily important, obviously for reasons of trying to improve validity and reliability, because everybody is trying to measure the same thing, but also because it can drive the vendor community in ways that I think are very reinforcing.

A good example of that is the minimum data set. In the nursing home industry, there were vendors that had all kinds of proprietary systems. We brought them together and we described to them the importance of having standardized data. Now, there is a whole new market of competition going on in the vendor community to try to help nursing homes add value to the data set. That is really what we want to do, as opposed to trying to create walls. So I think that standardization really can help.

Another issue that is important is, as you think of standardizing specifications, you have to think about data collection and recording tools. I think that is one area where we really have had a difficult time. I think the issues that Richard raised speak in spades to that, in terms of the variety of ways information can flow in a system.

I wanted to reinforce the notion of auditing. I was going to use our HEDIS audit results on assessment as an example, but I'm not sure of the time. The point is, what we have seen with the Medicare health plans is substantial improvement on some core measures that we really cared about, by working with the organizations in a penalty-free environment to try to collaborate and work together to improve that data collection.

An area that we are working on that starts getting into some of this fee for service is, we are trying to figure out how to drive that auditing and data recording capability to the point of data collection, so the tools create the possibility to avoid some of the distortions that move up the system. There are some tools that we have been working on -- MedQuest, and there are others available -- that I think hold real promise for that. We are testing that in both home health and nursing homes, as a way of trying to see if we can actually improve the quality of data right at the source and minimize the amount of distortion as it moves through the system.

I think there is technology to do that. The question is getting that technology into the public domain through publicly available tools. I think the question for the committee is, is it worth having a dialogue around this issue of quality data collection really being in the public good, and what does that mean in terms of operational policy for the Secretary and for the government, and how then do we engage the health care community in that discussion.

One final point that has really come home to me is, you have got to use the data. If you collect the data, you have got to use the data. With all due respect to the academics, it is not just to produce publications in the New England Journal. There should be work going on throughout the health care system to use that data to start communicating at real levels about what that information means.

One of the frustrating things today was how quickly we were moving from caps to other kinds of data recording strategies. I felt like, let's mine the information we have got. There are linkages between that information that I think we have to make better use of.

One of the best examples in nursing homes is trying to think of improving our structural measures in OSCAR with our MDS data or our claims data. There are linkages there that we can work on, and improve what we know about quality with what we already have.

I think these issues play out fairly well where you have organized systems. In managed care there are problems because of the way managed care has developed with the network models. But at least there are units of accountability where you can have those dialogues on a real-time basis. I think in institutional levels you also can do that. From our standpoint, we are moving very rapidly in the fee for service area from the institution level, both in terms of nursing homes, home health agencies, dialysis facilities and hospitals.

The big challenge for us is trying to link what we had in managed care, in terms of some of the issues of continuity across systems, in a very unorganized system like fee for service. Particularly when you get down to the issue of practitioner performance -- physician groups and individual physicians -- that has been an area where I think there is a big challenge.

Then the case, I don't think, is as clearly delineated, in terms of exactly where you go, how you go, and how you start developing this. The Balanced Budget Act of 1997 mandated that HCFA start doing a study to look at the application of HEDIS data in fee for service. We have got a project that is underway with Health Economics Research.

We are working with five large group practices across the United States to begin a process of looking at the feasibility of collecting HEDIS data at the plan level, and we are also doing it at what we call a quasi-market level, and also doing some national work, using survey data -- health outcome surveys is what we are using -- and we are also using claims data. We've got altogether five HEDIS measures that we are working with -- mammography, retinal eye exams, beta blockers and followup after mental health hospitalization are among the ones we picked -- to try to get at different elements of the problem.

I just wanted to talk about a couple of questions that I think are real vexing in this area for us. The study isn't going to be completed until 2001, but already we are seeing what some of the challenges are in trying to think about this market in data collection and in data quality.

One is the appropriate unit of reporting for these measures in the two payment systems, whether it should be the state level, the market level or some definition of provider. The problem we are having is the unit of reporting; it is a real challenge here. To some extent, you can identify it as a denominator problem.

We are trying to focus this effort by looking at large group practices and seeing what that tells us about quality. We are also trying to do it at a little higher level in geographic areas that are akin to HMO market areas.

I think that this becomes a problem for clinical effectiveness measures where populations are very limited -- diabetics, congestive heart failure patients. We really run into some problems with numbers, even when we are working with large groups. Although our work is preliminary, I think we are going to have to start thinking about, as we move to lower levels in the system, what is the value of purchaser specific reporting versus provider reporting, in terms of trying to look at some of these quality issues. Hopefully we can draw upon studies in the field that can tell us whether there are differential effects across these systems, but it is not always clear that there are. That is an area where we are trying to do some more, to see if that would help us build a database that allows us to move down a little lower, but still have some comfort that we are getting an accurate measure of quality that people can use.

Then of course, who is responsible for the patient. Medicare is probably the worst example of that, in trying to understand that attribution challenge, given how our fee for service systems work.

Our work is focused on trying to figure out how to assign patients to practices. We have developed a number of algorithms based largely on claims data. What we are trying to do is line those algorithms up, first of all, with the provider's perception of what patients are under their management. So far, our best models hit about 70 percent agreement, which is not bad so far.
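
[A minimal sketch of one plausible claims-based attribution rule -- assign each beneficiary to the practice with the plurality of evaluation-and-management visits -- plus a simple agreement check against another source such as a provider's own panel list. The rule, field names, and data structures are assumptions for illustration; they are not HCFA's actual algorithms.]

```python
from collections import Counter

def attribute_patients(claims):
    """Assign each beneficiary to one practice based on claims.

    claims: iterable of dicts with hypothetical keys 'beneficiary_id',
    'practice_id', and 'is_em_visit' (True for evaluation-and-management visits).
    Rule (one of many possible): the practice with the plurality of E&M visits.
    Returns {beneficiary_id: practice_id}.
    """
    visit_counts = {}
    for c in claims:
        if not c["is_em_visit"]:
            continue
        visit_counts.setdefault(c["beneficiary_id"], Counter())[c["practice_id"]] += 1
    return {bene: counts.most_common(1)[0][0] for bene, counts in visit_counts.items()}

def agreement_rate(attributed, reference):
    """Share of commonly covered beneficiaries on which two assignments agree.

    'reference' could be a provider's own panel list or the physician the
    beneficiary names, mirroring the 70 percent and 80 percent comparisons.
    """
    shared = [b for b in attributed if b in reference]
    if not shared:
        return 0.0
    return sum(attributed[b] == reference[b] for b in shared) / len(shared)
```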

We are trying to figure out what we need to do to understand the other 30 percent. Part of that problem in Medicare has to do with -- we have been trying to create longitudinal data. We are following these patients around, and there is a lot of movement nationally. These people are moving all over the place for large periods of time, which creates a particular problem in Medicare that probably doesn't exist in other markets.

But I think there is another piece of evidence that is important here, too. The next question we asked is, from the beneficiary's perspective, who do you think is the physician that is accountable for you? Beneficiaries have a higher hit rate against our algorithms; they hit at about 80 percent so far, in our best algorithms. So there is a differential there between providers and beneficiaries, which I think is very interesting, and we will continue to look at it.

I think these issues are very important, because they do get to this question of accountability, particularly when issues of patient compliance are important in understanding whether people receive regimens from the same physicians that actually recommend the intervention. An example is a beneficiary who gets a reminder card from her physician and then ends up getting a flu shot from a provider while visiting a friend or relative. The issue is, who gets credit for this sort of thing? It is a very difficult thing to decide: should it be the provider who performs the service, or the recommender, or both.

I think the other thing that is important from the client's perspective is, are fee for service beneficiaries different in some ways from managed care beneficiaries, in terms of how they may be predisposed to certain health behaviors that have a systematic impact on compliance? Those are issues which have arisen from the study, which I think are going to merit other kinds of research. We know there are underlying health status differences; there are systematic differences between those populations.

I think the third thing is, can the existing data sources in the different health care systems give us comparable data? I think so far, for screening data, it looks pretty good. The mammography stuff looks pretty good. I think we are going to be able to go to a pretty deep level, in terms of being able to attribute issues of accountability between fee for service and managed care. Diagnosis data is a real problem in fee for service, except for the self report data that comes through our survey data. There is bias there, but it is not clear to me whether it is differential bias between managed care and fee for service that makes a difference.

Treatment data is a problem. Followup after mental health hospitalization does not yield comparable results, at least from our preliminary analysis, because the beneficiary oftentimes doesn't go back to the same provider for the followup visit. And of course, in the area of medical records data, the results are not as comparable, either. But that work is really still preliminary. We don't have a real analysis of why there seems to be poorer reporting in fee for service than in the managed care area.

I think that there are some real challenges in thinking about the auditability of data in the fee for service system, anyway. So that is going to be a real problem.

In the near term, I think some of the survey data appears to be a better source for comparison across fee for service, especially for some of these screening measures that go across plans, although in the area of mammography, administrative data is excellent. I think that we have no reason to believe that there is self reported bias that is differential across managed care and fee for service for that data, although when you have response rates of 60 to 80 percent, and you are missing the other 20 to 30 percent, then you get into other kinds of problems of bias that Josh was talking about with the survey problems.

We are going to continue to work on this. But I think there has to be a larger discussion, particularly as the system is changing to more loose networks. There is a lot more noise out there in the way these systems are organizing and changing, so we need to begin to look more seriously at some of these provider issues below the health plan, as a way to try to make more sense of what is going on. We are hoping that this study contributes to that effort.

DR. COLTIN: Thank you. Questions? I have one for Steve. Your algorithm process sounds to me like it is something that might work on the private sector side for PPO type organizations that have the same problem in trying to figure out who is accountable. If you can solve it for the fee for service Medicare population, you may develop a methodology that then could make its way into the private sector and be used to be able to produce measures that are representative of lots of different types of delivery systems, and not just HMOs.

DR. CLAUSER: Depending on what happens with the current legislative environment, we may have to worry about it a lot sooner rather than later, but I think we'll have to wait and see.

DR. COCITAS: I might mention, we are engaged in a project that is just starting right now, to try to do a lot of what Steve was mentioning for the indemnity population of three large employers in Michigan. We are working with Blue Cross Blue Shield of Michigan to take the HEDIS measures into the fee for service sector and look at feasibility issues.

One of the interesting challenges that we ran across is a benefit package problem. In the union negotiations, the hourly workers are not covered for most preventive care services. So one of the interesting conversations we have had is, is it fair to hold the system accountable for something that a person doesn't pay for, even if it represents a standard of care. So childhood immunizations are not paid for; is it fair to report on the rate of child immunizations in the population.

Our answer is yes. This is about informed choice. Here are the consequences of being in one delivery system versus another; you can decide if that is important to you or not, but here is the answer. But in terms of tracking those events, it is considerably more challenging, because we know there will never be claims data available to track that information. But we have gotten a commitment to basically use a combination of administrative data and medical records review to try to begin getting a handle on what is really going on. We are going to try to coordinate with HCFA to learn what they have learned.

But this issue of, so you have decided to go to the record, which one, is one we are going to face as well.

DR. COHN: I want to make a comment or two, and then a question. First of all, Kathy, I want to thank you. You've got a wonderful panel. I am actually a guest of this work group, and I think we could spend the whole day going through these discussions. I'd like to question each of you after your presentations.

I hear a lot of frustration in the room. I think there will probably continue to be an ongoing frustration when there are such different ways of dealing with health care issues. There is the payment process. It seems to have nothing to do with the quality process, as best I can tell. I would also observe that until that somehow emerges a little better, I think there is going to be a lot more frustration in this room.

Now, moving to my question, I chair the Subcommittee on Standards and Security. One of our main focuses recently has been the whole issue of the different transactions. In many ways, the HIPAA transactions are trying to deal, in a standardized fashion, with what you would describe as level one data. Jeff Blair, who is also on the committee, chairs the work group on computer-based patient records, and that is a little bit of a misnomer, because it is really talking about issues around standardization of level two and level three data.

Now, I walked into this room thinking we would be talking a lot about level two and maybe some about level three data and the issues going on there. I would now ask you all, is the issue level one data? Are we beginning to do that reasonably well, or are there so many holes in it that that needs to be the major focus for the next couple of years? Or are there things that we can do in level two and level three that can be of some help in the time between now and the time I retire, the next three, four, five years, based on your observations?

DR. COCITAS: I think it is highly variable. If you look at Kathy Coltin's organization, or Group Health, and a couple of other places, you are probably looking at something very different from the kind of health plan that I had experience with. That is a response to your question.

But I also have another comment on accountability. From the PMCC perspective, we are really interested in that. It takes me back to the early days of HEDIS, because we talked a lot with the purchasing community about who was the accountable entity. Was it the fact that a physician recommended it, or did they do it? We talked a lot about that, and I think those are the issues that you were suggesting you had talked about, Stephen. I think that when we start looking at delivery across all three settings and we begin to redraw this whole picture of clinical logic and responsibility, and then we start trying to say who is responsible for what and where are the handoffs, who should and who shouldn't, that issue of accountability is going to be an issue for us. But I hope it would be consistently thought about among all these groups.

DR. MC GLYNN: My answer to your question is that my point was that I didn't think we should focus on any one of these, that we ought to find strategies for moving forward in all three.

I guess my cautionary concern is, let's not assume we have the problem solved in that first group. I think there is a tendency to believe that they represent the truth somehow, because they are automated. It is a funny sort of notion that people have that I have run into for a long time, that if it is on a computer, it is somehow that much more meaningful or reliable or valid.

I think what is interesting is, if you take a look at the work HCFA did through the cardiovascular -- what happened there is, and this gets to the point about, you have to use the data, which is, they started down this path. They got through initially a lot of problems, but they went ahead and reported things, and lo and behold, you are holding people accountable based on data that they are filling out, and all of a sudden there is an incentive to get it right.

Same thing in New York State, with the reporting on mortality following bypass surgery in the state. A lot of data got better because it was being used. The typical response when you give providers feedback on quality information is, the data are wrong. Almost always the first time you measure them, things don't look so hot. They don't look nearly as good as how the providers think they are doing.

So you have to go through the explanations about, the results are probably right, and if you don't believe the data, it starts at home: you get it right. So I honestly think that the way you improve level one data is by using it, and using it on the ground where it is happening. That is where you raise peoples' awareness about the multiple uses of data. So it is not just about, am I going to get my bill paid or am I going to get my capitation check. Somebody is going to count something about what I am doing around patients with heart disease or diabetes based on the way I fill out this bill, and that is going to make me pay attention to this at a whole new level.

What I often say about NCQA is that the lesson we should learn from HEDIS is what I call the Nike just do it philosophy. Let's not wait to get it perfect, let's just use it. If you use it, they will improve it. I really think that is far preferable to sitting around and trying to work out how to make it perfect. It is in the using that you provide incentives for improvement.

I don't want us to get so hung up that we spend all of our time improving those data sets. In some ways, when you look at chronic disease management, a lot of what we are interested in doesn't live at that level of data. So it is trying to figure out a strategy so that, whatever limited resources we bring to bear on moving this information forward, we try to bring all three types of data along and into the domain of usability and more routine access.

DR. DIXON: Let me comment. I don't know that it is really that the glass is nine percent full or 91 percent empty, although that is one interpretation I guess of our findings. I think it is better than that.

I come away with a couple of impressions. I think the overwhelming impression is that we truly get what we pay for. I think that the data that we have in our system are particularly influenced by the traditions of what we pay for. We have not completed our analysis, but certainly physicians and physician offices pay a lot more attention to procedure code accuracy than they do to diagnosis -- no surprise; that is what you get paid for. As long as you have almost any entry in the diagnosis code, that is good enough. You don't generally need the eight or ten diagnoses that we might be interested in; one usually is good enough. Some sophisticated organizations might make the connection between what is in the diagnosis box and the service or procedure, but that is not the routine.

So that is one factor. Second is, I think that there has been some teaching to the test. We have not completed our analysis, but are there differential things that we value -- HEDIS results, for example? Are they reported, recorded, transmitted better than things that we don't care about? I believe that there is a real differential between the kinds of diagnoses, and maybe even the kinds of procedures, that are reported well versus not. I think that is not a capitated California issue; I think it is probably a national issue, even in fee for service.

So I think the data are not as bad as they look. We set a very high standard. We expected close agreement, and often the encounter data are good enough.

The final thing that makes them good enough is, for ambulatory care, if you are looking over a period of time, you might fail to report diabetes once, but if the patient really has diabetes and she needs care for her diabetes, you are going to get it right once or twice over a period of time. So eventually the database becomes populated with at least those conditions that are sufficiently important to be populated there.
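
[A minimal sketch of the point about the database becoming populated over time: a chronic condition missed on one encounter is usually picked up on a later one as the observation window lengthens. The diagnosis code prefix and record structure are simplified assumptions for illustration.]

```python
# Simplified assumption: any ICD-9 code beginning with 250 marks a diabetes encounter.
def has_condition_over_window(member_claims, code_prefix="250"):
    """True if any claim in the observation window carries a matching diagnosis.

    A single visit may omit the diagnosis, but a patient under ongoing care
    generates repeated encounters, so the condition is usually captured at
    least once as the window lengthens.
    """
    return any(
        dx.startswith(code_prefix)
        for claim in member_claims
        for dx in claim.get("diagnoses", [])
    )

# Example: the diagnosis missing from the first visit is picked up by the second.
claims = [
    {"diagnoses": ["401.9"]},            # hypertension only; diabetes not coded
    {"diagnoses": ["250.00", "401.9"]},  # diabetes coded on a later visit
]
print(has_condition_over_window(claims))  # True
```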

We looked at atoms of episodes, and we looked at diagnoses and procedures at the atomic level. I think when you put them together into larger molecules, they are not quite so right.

DR. LUMPKIN: A couple of comments. The first is, I was thinking about the level hierarchy, and that there may be a level four, which is data that is not captured at all. Probably the biggest area of that is the one that we have been trying to get at with surveys, which is patient information.

I am also struck by Steve's presentation. At the University of Chicago, when he was there, we used to have a set of rules, and Rosen's rule number four was that when all else failed, examine the patient. But what I found interesting is a comment that relates to the fact that one out of every four people who work at our agency is involved in certification and licensure. Yet we have not seemed to have the ability to integrate these two processes -- how we go in and inspect hospitals on a regular basis, or go in and inspect nursing homes on a regular basis -- and how we correlate that with MDS data. I just don't think it has been integrated in a meaningful way.

DR. CLAUSER: We are investing in that as we speak.

DR. LUMPKIN: But I think in that regard, there are aspects of quality which cannot be measured, because the threshold they would have to reach may not be detectable.

A key element is sanitation. We see this in hospitals. In Illinois, we cite hospitals that have been accredited by the Joint Commission right and left for major breaches in sanitation. Hospitals have been fully accredited where the eventual solution to the failure of the physicians to maintain sterility in the operating room was to station a security guard: if you are wearing scrubs you don't leave, and if you do leave you can't come back unless you change your scrubs. Little simple things like that. But they won't show up on a measurement until you've got a major problem. That really is an aspect of quality which -- a patient walking into a hospital can tell whether the hospital is dirty or not, as can an inspector.

So that is another piece I just wanted to toss out. Finally, the last piece to toss out for thought: we are working on another report, on the national health information infrastructure. We describe three dimensions of the health information infrastructure: one is the population dimension, one is the caregiver dimension, and the third is the patient dimension.

It seems to me that we need to look at how we collect and aggregate data in describing quality. Consider the whole issue of who gets the credit for a vaccine: is it the doc who sent it out, or is it the doc who gave it? The real issue is, did that patient get quality care? So it is not the caregiver dimension that we are talking about in measuring quality, it is really the patient dimension. It is that longitudinal way of looking at that care that best describes it.

We need to somehow get a picture that looks at not only are the health plans providing quality care, but also, are the patients receiving it.

DR. MC GLYNN: Your hospital example reminds me that in L.A. County, about a year ago they started posting the sanitation grade as you walk into restaurants. It is right there in the front, you can't miss it. It is weird to be in a different city and walk into a restaurant that doesn't have those grades. But that has made the restaurant industry sit up and take notice, because people literally turn away. With a B, you decide whether you are willing to take a risk; with a C, forget it.

It strikes me that we need to be a little bit careful about the limits of measurement. There are alternative ways to achieve our goals, some of which may be fairly low tech. This is pretty low tech. Yet, from what I have just seen, it is pretty darn effective in terms of doing the thing you want to do.

I don't care about a standardized comparison. What I care about is an easy way for a consumer to make a judgment about whether they are going to take a risk with their health by walking into this restaurant. Similarly with a hospital, although obviously that is easier if you have some choice about whether you are walking in voluntarily or not. But the point is, posting those grades gets the group you want paying attention to the problem to rally around it, and who cares whether consumers are sitting there making comparisons. It has been a very effective threat. When they started it, they closed down the restaurant that the mayor owned.

It really made it clear that no favors were being granted.

So sometimes I try to think about where measurement helps us improve understanding, and where there are other ways that we should be thinking about reaching our goals, because the reason I am in the measurement business is to make the health care system better. I can get entertained by the fine points of measurement, but that is not the point. That is really not what it is all about.

So I try to remember to step back from time to time and ask, is this taking us toward the goal of making things better, and is this the right vehicle? The answer isn't always the finest that measurement has to offer, or the finest that comparative public reporting has to offer. I think that we need to be more willing to look at other mechanisms for reaching some of these goals.

DR. SEIDMAN: I just wanted to respond to Simon's point very quickly, to say that this discussion tends to paint a gloomy picture, but I do think it is worth recognizing that everything is relative, and a lot of improvements have been made as well. If you just think back to health care 10 years ago, and to the level of standardization that existed before HEDIS in terms of how quality data were being recorded, you would come out with a completely different picture.

If you think about the fact that there are now over 400 health plans reporting a standardized set of measures, that is a really big difference from what it was a decade ago, or even six years ago. Even if there are inaccuracies in the data, just the attention that health plans are giving to trying to find the data and collect the data matters. And consider what Richard said about capitated providers: a while ago you wouldn't have been able to get encounter data at all. That has come about not entirely because of HEDIS, but HEDIS has been one of the stimulating factors.

We now have measures that basically require all of these health plans that are reporting data to be able to provide automated pharmacy data, and that is an improvement. We now have an auditing process. Even if the auditing process isn't perfect, it is certainly a step above what data accuracy was before, and 86 percent of those 400-plus plans are submitting audited data to us.

Just the fact that we now do have a much broader spectrum of measures I think is important.

DR. DIXON: I'd also like to supplement some of the suggestions that I think Mark made, in terms of recommendations that this committee might carry forward to the department.

The COLINKS experience, as I said at the very beginning, made me hopeful that enlightened self-interest would drive standardization of data, sharing of data, and all of that. I am less certain about that now. I think we need stronger incentives.

The thing that we hear over and over again at the provider level, as well as the plan level, is that nobody pays for high quality data. There is no business case for it. I think that is a huge impediment: there is no disadvantage to having poor quality data. The cost savings from being able to do your business more effectively apparently just are not sufficient to drive major data quality improvements. So I think it is going to take stronger and stronger leverage.

What might that leverage be? I think there are two levels. One, and this is pretty idealistic, I sometimes think that what we need is a national institute of informatics. If we think about the way that high technology medicine was disseminated throughout the country, it was by the NIH putting cardiologists and biomedical researchers in every medical school, who trained medical students and everybody else and developed a population of physicians committed to high tech cardiology. One wonders whether the same sort of approach -- putting training grants into medical schools throughout the country and populating them with a group of people whose lives and businesses were based upon improving informatics -- might not have a similar benefit.

That is perhaps idealistic. I think what is not idealistic is getting the major purchasers -- and I think that is the federal government -- to begin to put some financial rewards and penalties around quality of data. There are all sorts of problems with auditing, I recognize, but I think there are relatively low cost, low technology ways of validating quality of data.

We have some approaches, for example, for tracing how quickly and accurately records move through the system, that would be very, very cheap as test systems on which to base some rewards.

One of the things that we have tried to do with the business group on health is to have the purchasers get all of the health plans to agree that high quality data would be rewarded with higher premiums. It would probably be a zero-sum game: those with low quality data would lose, and that money would be shifted to those that had high quality data. We would try to synchronize it so that all the plans got the same message, and furthermore build into the contractual requirements between the major purchasers and the health plans that the plans would drive those rewards down to the provider level.

As we talk about what it might take to produce real change at the provider level, we are talking about 10 percent differences in premiums, at least that. It is not a nickel and dime sort of bonus or reward or whatever; it is a significant amount of resources that are tied to improved performance.

We are talking about improved performance and quality of care, but I think many of us believe that we are not going to be able to understand where we have improved quality of care until we have data that adequately show it. So the first step, many of us believe, is trying to drive the business community to improve the quality of data, so that we can then move to look at quality of care more effectively.

My experience over the last three years, in which I have been investing a lot of hope and optimism in trying to make change, tells me that it really is going to take some powerful leverage to produce change.

DR. COLTIN: We had planned to adjourn at 8:30. I know that it has been a long day for just about everybody around this table. So I don't want people to feel that we have to stay here any longer. I think that we should adjourn formally, and then if there are those who want to stick around and have a conversation, that's fine. But don't by any means feel obligated to do so.

(Whereupon, the meeting was adjourned at 8:47 p.m.)