[This Transcript is Unedited]

Department of Health and Human Services

National Committee on Vital and Health Statistics

Subcommittee on Populations

January 24, 2000

Hubert H. Humphrey Building
Room 705A
200 Independence Avenue, SW
Washington, DC 20201

Proceedings By:
CASET Associates, Ltd.
10201 Lee Highway Suite 160
Fairfax, VA 22030
(703) 352-0091

P R O C E E D I N G S (10:05 a.m.)

Agenda Item: Call to Order and Introductions - Lisa I. Iezzoni, M.D., M.S., Chair

DR. IEZZONI: This is the Subcommittee on Populations of the National Committee on Vital and Health Statistics. This is the first meeting of what I hope will be either three or four meetings, Patrice. Hopefully, it will be four meetings of our subcommittee to look at the following: to explore whether routine collection of information on people's functional status on administrative records provides information that is worth the cost.

What we are going to do today is begin to hear from potential customers of those data about how they might be used. We're going to start though with just kind of getting some background about this, so the committee can kind of get up to speed.

What I would like to do right now is just go around the room, have everybody introduce themselves, and then I'll make a few introductory comments, and we'll start.

I'm Lisa Iezzoni. I'm the chair of the subcommittee. I'm at Beth Israel Deaconess Medical Center in Boston.

MS. RIMES: I'm Carolyn Rimes. I'm just staff.

MS. QUEEN: Susan Queen, staff.

DR. NEWACHECK: I'm Paul Newacheck, with the University of California, and a member of the committee.

DR. STARFIELD: Barbara Starfield from Johns Hopkins University, and member of the committee.

DR. KANE: I'm Bob Kane from the University of Minnesota. I'm a visitor.

DR. IEZZONI: And a teacher. We'll hear from him in a few minutes.

DR. BRAITHWAITE: I'm Bill Braithwaite from ASPE and HHS.

MR. GELLMAN: Bob Gellman. I'm a privacy consultant, and a member of the committee.

MR. HANDLER: Aaron Handler, Demographic Statistics Team, Indian Health Service, and I'm staff person to the committee.

DR. JANES: Gail Janes, CDC, staff to the committee.

DR. PLACEK: Paul Placek, National Center for Health Statistics, CDC, and on the panel tomorrow, but also helping the staff.

DR. HENDERSHOT: Gerry Hendershot, also of NCHS, and staff.

MR. HITCHCOCK: Dale Hitchcock, ASPE, staff to the subcommittee.

MS. GREENBERG: I'm Marjorie Greenberg, National Center for Health Statistics, CDC, and executive secretary to the committee.

MS. WARD: Elizabeth Ward, Foundation for Health Care Quality, and a member of the committee.

MS. COLTIN: I'm Kathy Coltin from Harvard Pilgrim Health Care and a member of the committee.

[Introductions of observers and guests.]

DR. IEZZONI: Good, thank you. I'd like to start by thanking the staff to the committee. Carolyn Rimes, Susan Queen, Gerry Hendershot, Paul Placek, and Patrice have really, really worked hard to put this meeting together, and I thank you. You did yeoman's work at the very last minute.

Those who saw a former agenda know that our lead speaker, Andy Kramer, had a skiing accident that Carolyn found out about, and fortunately Bob Kane was able to step in and hopefully, help us out with that. Andy is home nursing a fractured tibia, so we will wish him well. But thank you to the staff of the committee.

I would like to just talk for maybe about four minutes about the background for what we are doing here. Again, our goal here is to explore whether routine collection of information on people's functional status on administrative records is worth the cost.

As we know, the National Committee on Vital and Health Statistics has a long history of advising the nation about new data gathering. Marjorie, was it back in 1972 that the uniform hospital discharge data set was articulated by the National Committee on Vital and Health Statistics, and it became kind of the national standard for collecting diagnostic information on hospitalizations. And then more recently diagnostic information has been collected on physician claims, at least for the Medicare program, which tends to disseminate out to private payers and others.

Now this diagnostic information is obviously helpful. It's helpful for us to know diagnoses. But sometimes the link between diagnoses and health or well being is not as straightforward. Therefore, back in the mid-1990s when the National Committee on Vital and Health Statistics did a core data elements project, trying to identify core sets of information that we should collect routinely about people's health, we put in a placeholder for functional status or health status.

It was only a placeholder, because we frankly at the time, had no clue how to go about collecting that information, how to capture it. But we knew that it was important. Functional status information is important for a variety of different reasons. It can be used for quality monitoring. It can be used to look at population health.

It can also be used increasingly for setting payment levels, not only, as Margaret Stineman will talk about, for rehabilitation facilities or places where you would traditionally think that functional status is important, but also potentially even for setting capitation rates for the general population.

But routine gathering of functional status information on administrative records has a huge number of challenges attached to it. How should we do it? Functional status is multidimensional. Are we going to look for all the dimensions? Which ones should we focus on? Who should we collect the information from? Should we collect it from people? Should we collect it from their providers? We know that people and their providers often will disagree when looking at the person's functional status.

For children the issues of who and how become even more acute, because often children are not able to respond for themselves, and what constitutes functioning for a child as he or she grows from infancy has yet been -- people are still working on how to capture that in a straightforward and expeditious way.

The cost of collecting this information could be huge, and we have to struggle with how accurate it will be. And what is the standard for accuracy that we would tolerate?

Then finally, we do have to look at the utility of the information for quality of care monitoring, public health monitoring, payment. And we have to trade that off, not only against cost in terms of financial terms. But also potential cost in terms of privacy concerns that people might have in responding to this information and collecting it.

So there are a lot of issues that I think we have to think about. And hopefully, we will have the participation of the two other NCVHS subcommittees in looking at these multiple issues, the Subcommittee on Standards, and the Subcommittee on Privacy, to help us think about this as we move forward.

I'll just state my position up front, which is that we may end this nine month exploration and find that this isn't worth it. But I think given that we have kind of laid out the placeholder, and it's been something that's been kind of niggling at the back of our mind for a while, I think that it's worth it for us to explore this, at least for several meetings, and see whether we think it's worth proposing moving forward on this.

So the next nine months we are going to spend exploring this. And today for our kind of introduction to this we have two speakers, Bill Braithwaite, and Bob Kane, who will hopefully do somewhat different things. I would like Bill to speak first, because Bill is going to talk about administrative records and standards. This is the context in which we are doing all of this. This is not the National Health Interview Survey. This is not the National Long-term Care Survey. This is not a survey.

This is thinking about routine collection of administrative data within routine encounter standards and so on. So Bill can talk to us first about the vehicle for collecting this information, and where it would fit, or might not. Then Bob will talk to us about what it might look like. So getting us started.

Bill.

I was just informed that the order was reversed by staff, even though it's listed with Bill first and Bob second. Do you guys have any preference for the order that you go in?

DR. KANE: We had sort of set it up to introduce the topic first.

DR. IEZZONI: That's fine. It's not a big deal. We'll start with Bob. So Bob will introduce the topic, and then Bill will tell us where this might fit, if that's okay with you.

Agenda Item: Overview of Functional Assessment and Health Status: Issue Identification - Robert L. Kane, M.D., University of Minnesota School of Public Health

DR. KANE: Actually, I'm here replacing Andy Kramer, who, as you heard, had a functional status problem. So at least there is some relationship between injury and outcomes. I'm here sort of to present the academic perspective, or what is known in the parlance as sort of where the rubber meets the sky.

So what I would like to do is sort of give you an overview of some of the issues that we think about when we talk about functional status, and how it can be used in these contexts.

Now the first question is, and I think Lisa talked about whether it would be worth it in the end. I think one of the important issues that this group needs to grapple with is what is the 'it,' because there are a lot of its out there, and some of them may be more worth it than others, and some of them may be more suited to certain kinds of contexts than others.

It makes a big difference obviously if you are trying to do a survey or some kind of a measurement to look at what the rate of disability is in a population, as opposed to whether you are trying to use this to actually do things that are important to people, like paying them. Some mention was made of the interest that has certainly been evinced by HCFA to look at capitation rates.

So one can think about for example, trying to predict utilization as part of a planning effort in order to either extrapolate to the year 2050 or whatever eon one is interested in. Or one can think of it as sort of how this HMO is going to be paid next year. Now extrapolation efforts are always sort of mythical anyway, so one could probably tolerate a fair amount of error there. I wouldn't worry about it too much. Nobody in his right mind would take that seriously anyway.

But if you are talking about real dollars getting paid next year, this becomes very serious, and one begins to look at this. If one thinks about the whole process of looking at capitation rates, it turns out that you can predict 90 percent of the population without doing anything. All the action is in that last 10 percent of the population. That's really where you sort of make or break these kinds of things, and accuracy there is absolutely critical, and nobody knows how to do it.

Now there is some suggestion that in fact functional status adds a little bit to the accuracy of this prediction, but not as much as some people would like to think it does. The other issue is obviously if this information is going to be used to generate income, then would you want to trust the people who are going to be paid on the basis of it to provide the information?

We never have had any kind of grade inflation in academics, but there are other areas of the world where people have been known to distort data for their own individual purposes. And there is a huge veil of distrust at the moment around this. Now we don't trust providers, but the fact is we may not trust consumers either.

There are people who are in the disability business who are anxious to suspect that people may exaggerate their illnesses or their disabilities if it's going to affect their payment level. Now we use disability to set eligibility for a number of federal and state programs, and people are always going back and questioning whether these reports are as absolutely honest as they need to be. So I think we need to worry about both of those kinds of things.

Now therefore, there is a certain element of sort of: (a) this data is important, and (b) because it is important, we don't trust people to provide it very accurately. Then we have to think about how do we want to collect it. Everybody's favorite answer is well, let's do a special survey. We'll go out and train a whole armada of data collectors, who will be highly trained in how to get this kind of data. But it doesn't take you very long, even without a calculator, to think about what this would really cost to try and put on.

Now in fact there are circumstances where we are doing special surveys, and there are ways of doing that, and I'll talk about those in a minute. But most people are sort of thinking about how can we get this at sort of minimal added cost. That usually gets you into something called administrative data or some kind of thing.

Now there are two ways in which you could collect administrative data. One is you could simply put some set aside columns into your standardized form, and expect people to sort of fill those in. The other way would be to go back and sort of train people to look through the record to see what has been recorded previously, and to summarize those.

There are serious problems with both of those, and we will talk about these. If you think about just sort of putting additional columns into the inpatient and outpatient billing forms, then you really run into the fact of who has the information, who has the knowledge base in which to introduce this information? One of the questions this group has to struggle with, is no data worse than bad data?

There is no evidence at the moment that certainly anybody in a clerical position could either abstract this information from a discharge summary, or that if it were in a discharge summary, that it would have been collected in any kind of a systematic way that would allow you to enter it.

Now diagnoses aren't always right, but they are right a pretty good chunk of the time. Functional status is collected by different people at different times, with very different perspectives on what it means. And without some systematic way to collect this, I think there are real questions about how consistently it would be collected, how complete the data would be, and frankly, how valid it would be for a number of reasons that we'll talk about in a minute.

Now the only claim I can make to being a consumer is I am interested in what happens to old people. And so in the context of older people, we are talking about using functional status data right now for setting capitation rates. But most people who are looking at Medicare capitation rates recognize that the current system that's out there is not very satisfactory, and doesn't do a very good job of predicting people at the high end of the utilization curve. And there is at least some suspicion that functional status may add a little bit. Our data suggests not as much as we would have hoped, but at least it adds a little bit.

There is another use that we have talked about for this, and that would be in setting prospective payment rates. Right now we have been talking about it within sort of niches, that is, we're setting up prospective payment rates for rehabilitation. We are setting a prospective payment rate for nursing homes. We're setting a prospective payment rate for home health care.

But that's really a very short-term approach, and is frankly wrong, because the goal ought to be to set prospective payment rates for post-acute care that is not modality-specific. What we call prospective payment rates in fact so far aren't. Prospective payment rates for nursing homes have nothing to do with prospective payment. It's a daily rate payment, and really doesn't look at an episode of illness at all, and it certainly doesn't encourage any kind of flexibility or creativity or interchangeability among the various modalities of care.

So if we are really trying to set a rate for an individual to estimate what that person's true risk would be over a period of time, you don't want to necessarily tie it to a particular modality of service. There are opportunities there. Certainly, functional status for post-acute care is among the most predictive elements that have been now well established in terms of what their utilization is going to look like, as well as their outcomes.

People have used functional status to target comprehensive geriatric assessment, and certainly when you look at the efficiency of how we do this kind of fairly expensive intervention, it works best with those people who really have the highest potential for benefitting from it. So one could do it there.

People have certainly used it for case management. There is a real potential that is at this point underdeveloped to use functional assessment as a way of targeting primary care management. There is an argument to be made that we are sort of practicing a whole anachronistic model of health care that doesn't recognize the presence of chronic diseases.

And that what we really need to do is to feed this back into the system in a proactive way, not necessarily simply to pay people, but in fact to really refocus the energies and the efforts that we have in the health care system, to begin to direct people who need this kind of care, and to monitor their change over time in order to know when to intervene in the most useful way.

So there are a lot of potential uses. And this data, and the problem of collecting it will fit better into some of those circumstances than they will into others. The further we move away from costs, the more comfortable we become in talking about some of this stuff.

Now the next question is around self-report. Most of the time we talk about self-report, as opposed to professionally determined information, as opposed to demonstrated functional status. Now let me make an early distinction. You want to make a very clear separation between where you get the data from, and how you get the data.

There is this myth that is prevalent in the health care system that somehow data that comes from professionals is somehow better than data that comes from patients. What do patients know about things, right? Well, of course the obvious first question is where do the professionals get the data? All you're doing is sort of paying a filter to distort the information you could have gotten directly from the patients.

Now there are some professionals who believe that they possess real insight, and can somehow drill down to separate out those who aren't presenting good information. But I would argue that for every good driller downer, there are probably three people who distort. Since health professionals are notoriously poor at collecting systematic data on functional status, you would be a lot further ahead by getting the data wherever possible, directly from the patient.

Now there are obviously a group of patients who can't report their data. And in fact, there is probably a positive correlation between functional disability and the inability to report, in that there are people who are too demented or too disabled to be able to report that kind of information. So there is going to have to be some reliance on proxy information. But even proxy information is usually better than professional information.

So I would at least urge you to be cautious of the first bias, which is that things that come from a computer that are signed by a physician somehow have accuracy behind them. There is no evidence to support that whatsoever, and probably more to the contrary.

So the second question we have to ask is all right, if we are going to allow people to basically report on their own disability status, how reliable is that information? And there are a number of studies out there that suggest it's pretty darn reliable. You can ask questions.

Now in order to be reliable, of course you have to ask the questions the same way. Since we virtually never in clinical practice do anything systematically, and certainly not around functional status, the reliability is the art of the possible, not the art of the probable.

If we were going to collect this data, we would really have to collect it in concert with changing the very nature of the way we practice clinical medicine to introduce more systematization in the way we acquire this information from patients, or allow them to report it.

On the other hand, while it's reliable, it's also corruptible. Again, the more the data is likely to be used for payment purposes, the more there is a concern that there may be forces that would no longer make this data quite as accessible.

Now the other alternative then to collecting information beyond self-report would be to do direct measurement. There are ways to collect a large amount of functional status data by various kinds of direct measurement -- simulations, performance-based testing -- but these things have some real drawbacks, and I don't think we are really going to consider them very seriously for the purposes of this committee.

But in addition to the enormous cost involved in trying to set up all of these sort of simulation laboratories, there are huge problems about just convenience. And the fact is that for some kinds of things, like activities of daily living, some are easier to simulate than others, and it becomes fairly difficult if you want to simulate toileting, bathing, and things like that.

So I'm, for the moment, going to dismiss the likelihood that we would actually move to do demonstrations, and really talk about various kinds of self-report, either reported by the individual, or reported by the professional.

Now when we talk about professional data, we need to recognize that different disciplines have very different standards as to what constitutes functional status. We consider a physician extremely well versed if he can recognize what the letters ADL mean. So if he says somebody can bathe and dress themselves with minimal assistance or independently, we consider that a major breakthrough in modern medicine.

On the other hand, if you would talk to an occupational therapist and say the person can bathe themselves, they would say, what do you mean? There are 18 steps in bathing, 28 steps in dressing. What is the sort of summary judgment they can do with or without assistance? So the level of detail in the subtext varies enormously by the discipline, and we need to at least be sensitive to that.

It also tends to vary to some extent by the particular professional setting in which the people are working. Rehabilitation people tend to emphasize certain things that geriatricians may not when we are talking about older people.

There are other elements to performance besides whether or not you can do it, and how much help it takes. There are elements for example on how well you do it. There are people who can dress themselves, but it's a day long project. When they get dressed, their clothes are askew, the selection of clothes may not necessarily be those that we think are socially acceptable. What they have chosen to wear may entirely be made of Velcro.

There are issues about speed of performance. I can bathe myself, but again, it takes a long time to do it. There are instances about the circumstances of performance. I can do it, but it takes a lot of special equipment, or certain kinds of settings in order to be able to do that.

We rarely, in this sort of brief discussion about what it means to sort of perform activities of daily living, get into any of these details in any degree. We sort of leap over them all. But if in fact this stuff is now going to start to count for something, then we might want to come back and begin to do it.

We need to think about whether we are talking about what is a person's inherent ability to do a task, which epidemiologically may be much more predictive than what is it that they usually do, which may be determined by their environment. A very simple example of that is in nursing homes nobody bathes themselves, not because they can't, but because they are not allowed to.

Now if you were to go in and do a sort of ADL rating in a nursing home, you would come out with maybe we don't use bathing, because everybody is a zero. But in fact if you were looking at functioning now as a predictor, for example in creating a capitation payment system, you are really not interested in it as its own cost. You are interested in what it says about the person's innate ability to do things because it has epidemiological prediction. You would want a different kind of a measure.

So separating out what they usually do from what they can do may be very important. On the other hand, if you are trying to do care planning, then you may want to know what they actually do, because what they do says how much help did they need to be able to do it.

So environment here plays a very real role in functional status. And we don't usually take the time to dissect out what part of it is the person's innate capabilities, and what of it is environmentally determined.

When we talk about functional status measures, we need to recognize that they need to sort of fit the spectrum. If we take for example measures that sort of work within a group which we'll call the aged as a group, it's a crude definition, but nonetheless one set of functional measures may well separate the sick from the well.

Most of you are familiar with the so-called health status measures, the SF-36. It does a pretty good job of discriminating the well from the sick, but it's not very good at discriminating the sick from the very sick. You can argue the same thing about IADLs versus ADLs, and where those cuts are made. The question again is where do you want to have the most discriminatory power along the spectrum, and what is it that you are choosing?

Certainly, when you are looking at a measure, to try and fit it across the whole panoply of different groups who might be affected, if you just think about what are the factors that affect these different groups, without even thinking about the measurement issues, you know that it would be very hard to use the same measure for kids and for old people.

The underlying premises are just different. Kids are basically interacting with school, and their ability to perform in a school setting. Growth is the norm. Mainstreaming is the goal. If you think about elderly people, they tend to be more fixated on coping. You are looking at decline in function.

If you are looking at adult disabled there is a hard press toward normalization, as opposed to how much compensatory care needs to be given. If you are looking at rehabilitation, the emphasis is on the potential for improvement, whereas if you are looking at long-term care, the emphasis is on trying to slow the rate of decline.

Those are obviously generalizations, but they are meant at least to sensitize you to the fact that these measures will play different roles, and therefore need to have different psychometric properties, particularly with regard to sensitivity across these things.

If you are thinking about function, I would suggest that you might want to think about the paradigm that we like to use for measurement, which is arthritis. Arthritis is a nice disease to sort of think about how you get different kinds of things for different kinds of measurement properties.

If you think about how rheumatologists approach arthritis, they tend to approach it in terms of acute changes in the disease state. They do things like joint counts or sed rates or whatever they do these days. There are very specific measures which reflect the specific activity at the moment of that disease. How much of a flare up is going on?

That's quite different than the next level of measurement, which really looks at sort of direct joint-related functioning. In arthritis we usually look at things like grip strength or walk time to get at upper and lower extremity function. Those are clearly influenced by the disease activity, but they are also influenced by other factors.

For example, if you have a rheumatoid arthritis patient with joint deformity, even if you got them so that they had no acute activity in their joints, they still wouldn't necessarily have very good grip strength, or very good walk time. The underlying problems that are chronic to that disease will affect some of those early performance measures, those direct performance measures.

If you go up the hierarchy to the next step of thinking about these more comprehensive measures like the AIMS or one of these measures that looks at a broader spectrum of the ability to carry on daily activities that are related to joint condition, then in fact all of the prior stuff applies, but in addition you have all the overlay of emotional status. There are good studies that demonstrate this. And social demands, environmental circumstances will greatly affect how well they function at those different levels.

So the choice of measurement needs to be conditioned on understanding the nature of the process that you are measuring, and the factors that are likely to influence that, and which of those factors are remediable or changeable, and which aren't, and what you want to focus on. And again, depending on whether you are doing it for epidemiological purposes or for payment purposes, you may choose differently, which of those things to give particular credence to.

Now if you are thinking about multidimensional measures, then you obviously have a number of questions that you need to get at. One of them is obviously which domains do you want to include in this. There are a number of multidimensional measures out there, and they tap some of the same domains, but different domains. And they tap them differently.

The thing that tends to get less attention is not simply what domains, but what weight to give each of those domains relative to the others. There are some simplistic measures that simply weight them by the number of questions that happen to be asked about that domain. That is obviously not very useful. There are some that try and normalize them and give every domain equal treatment, which is consistent, but not necessarily insightful.

People have tried to use weightings based on people's values to say which domains are more important or have the most contribution to the overall functioning of that individual, and which are viewed that way. Now the problem with that is it's a very good idea, but not all the values are the same.

The question is, whose values do you want to use? Do you want to use the values of consumers? People have gone out and tried to do sort of consumer polls of which ones of these are more important. You can use the values of providers or professionals, who are obviously gifted with greater insights than the ordinary lay person about this. You could interview the man on the street. Even though half of them don't know the name of their senator, they at least know what's important to them.

Or you could actually develop a system that allows each of these measures to be weighted by each person who suffers the disease. So instead of having a constant set of weights, you would have an individually adjusted set of weights. Now again, these have advantages in some settings, and disadvantages in others.
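As an illustrative sketch of the weighting choices described above, the following Python compares a composite functional score under four schemes: weighting by number of questions, equal domain weights, a fixed set of value-based weights, and weights supplied by the individual. The domain names, item scores, and weights are all hypothetical, chosen only to show that the same responses can yield quite different composites; none of them comes from the testimony.

```python
from statistics import mean

# Hypothetical item scores (0 = dependent, 1 = independent), grouped by domain.
# Every number in this example is illustrative, not real data.
responses = {
    "adl": [1.0, 1.0, 0.5, 1.0, 1.0, 0.5],
    "iadl": [0.5, 0.0],
    "emotional": [0.5],
}


def domain_scores(items_by_domain):
    """Average the items within each domain."""
    return {d: mean(items) for d, items in items_by_domain.items()}


def composite(items_by_domain, weights):
    """Weighted average of domain scores; weights are normalized to sum to 1."""
    scores = domain_scores(items_by_domain)
    total = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total


# Scheme 1: weight each domain by how many questions happen to be asked about it.
item_count_weights = {d: len(items) for d, items in responses.items()}

# Scheme 2: give every domain equal treatment.
equal_weights = {d: 1.0 for d in responses}

# Scheme 3: one fixed set of value-based weights (hypothetical panel).
panel_weights = {"adl": 0.5, "iadl": 0.3, "emotional": 0.2}

# Scheme 4: weights chosen by the individual being assessed (hypothetical).
person_weights = {"adl": 0.2, "iadl": 0.6, "emotional": 0.2}

by_count = composite(responses, item_count_weights)
by_equal = composite(responses, equal_weights)
by_panel = composite(responses, panel_weights)
by_person = composite(responses, person_weights)

for label, value in [
    ("item-count", by_count),
    ("equal", by_equal),
    ("panel", by_panel),
    ("individual", by_person),
]:
    print(f"{label:>10}: {value:.3f}")
```

In this made-up case the item-count scheme is dominated by the ADL questions simply because there are more of them, while the individually weighted composite is pulled down by the IADL items this hypothetical person cares most about.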

If you are trying to make policy, and sort of playing to a level playing field, you wouldn't want to allow people to vary their weights, because that might allow them to manipulate the importance of their disease. But if you are trying to do things that really get at the question of quality evaluation, then in fact if the medical care that people are receiving makes their life better, that's probably the big payoff. Whether it does something on some arbitrary scale is a problem.

There are also problems in terms of what do you use as the basis for weighting the relative importance of these domains? Is it how frequently they occur? We can think about ADLs. There are some ADLs that occur several times a day, and there are some ADLs that occur a couple of times a week.

Would you use frequency as some kind of a weight to set up importance? Or are you looking at the impact on people's lives? And what does that really mean? And how well can people really make those judgments?

Well, we actually went out and tried to measure how different groups look at this. This is a study that we did a few years ago in which we actually went out and asked populations of experts in geriatrics, and then people who actually in this case, suffer from the diseases, but were still cognitively intact enough to tell us about how they felt. We asked them to look at the weights for each of these things, and use the magnitude estimation system.

The take home message here is that the solid line represents the experts' judgments. So don't be misled that it looks like they got them in the right order. We ordered them that way so it looks that way. There is an arbitrary order, so we used the experts as the order. Then the dotted line is the clients' ratings.

What you can see is that the clients' ratings and the experts' ratings are different. And the clients tend to give much more weight to things that we call the instrumental activities of daily living, which is sort of at the lower end of the left end of the distribution. The experts tend to give greater weight to things at the more severe ADL level of the distribution.

Now it turns out that if you take these weights and plug them into some of the work that we have done, it makes a difference in terms of judging which treatments are more effective. It doesn't necessarily turn everything on its ear, but it makes a fairly sizable difference for some kinds of treatments in terms of where they work if you are looking at a composite score. So this is not a trivial question that we are talking about.
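As a minimal sketch of the point about weights and composite scores, the following uses entirely made-up item names, weights, and outcome scores (none of them come from the study described). It only shows how two weighting schemes applied to the same outcomes can rank two treatments differently.

```python
# Hypothetical illustration only: item names, weights, and scores are invented.

def composite(scores, weights):
    """Weighted average of per-item scores (here, higher = better function)."""
    total_weight = sum(weights.values())
    return sum(scores[item] * weights[item] for item in weights) / total_weight

# Invented weights: "experts" stress basic ADLs, "clients" stress IADLs.
expert_weights = {"bathing": 3.0, "dressing": 3.0, "shopping": 1.0, "housework": 1.0}
client_weights = {"bathing": 1.0, "dressing": 1.0, "shopping": 3.0, "housework": 3.0}

# Invented outcomes after two treatments, each item scored 0-10.
treatment_a = {"bathing": 8, "dressing": 8, "shopping": 4, "housework": 4}
treatment_b = {"bathing": 5, "dressing": 5, "shopping": 8, "housework": 8}

def preferred(weights):
    """Return which treatment has the higher composite score under these weights."""
    a = composite(treatment_a, weights)
    b = composite(treatment_b, weights)
    return "A" if a > b else "B"

print("Expert weights favor treatment", preferred(expert_weights))
print("Client weights favor treatment", preferred(client_weights))
```

With these invented numbers the expert weights favor treatment A and the client weights favor treatment B, even though the underlying item scores are identical.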

Well, what kind of data would you want to collect? There is a whole range. You have labeled this as functional status and functional health status. Once you use the H word, you get into all sorts of things. There are people who would purport to have various kinds of things. But again, to what extent are you going to focus on simply generic measures? To what extent are you going to focus on condition-specific measures? Do you want to consider things like symptoms, as well as the more psychometric kinds of things?

There is this incredibly frustrating question about how people rate their health on the sort of excellent to poor scale. It turns out to be incredibly predictive of all sorts of things. We keep despairing as to why that happens, but there is something about that. People sort of seem to know when bad things are happening. People have shown it's much more predictive than professionals' predictions about how people are going to do. So one may want to throw in some of those big questions, as well as the fancier ones.

Then pragmatic issues are sort of how are you going to collect this data? Are you going to use surveys? If you are going to use surveys, then obviously you need to think about whether you want to do them on a sample of the population or the total population. And that will depend entirely on, for example, how you set up the capitation rate. If you are going to set up a capitation rate, it is possible to do a subsample of those people, and simply use that as a sort of risk adjustor.

But if your goal really is to identify the high utilizers and provide incentives for HMOs to enroll high utilizers, then that kind of sampling device won't work. You are really talking about 100 percent sampling or individual sampling on admission and periodically.

The cost will vary a great deal depending on how you collect this data. In at least one operational program we found that you can collect most of this data by mail, with some telephone follow-up, and a very limited amount of in-person follow-up to the people that don't respond to either mail or telephone. But it's still expensive, and I wouldn't underestimate those kinds of costs.

There are some serious logistical issues for the program that we have been using this in for the social HMO. We originally suggested that the data could be collected by the HMO itself. The government -- at that time there was a particular office with regard to managed care -- for payment purposes there were great concerns about impartiality.

It's sort of interesting, because we now have in place, as you know, a partially functionally-based system for nursing home prospective payment that is collected by the nursing home. When we came to talk about paying the HMOs on the basis of data that they would collect in a systematic fashion, the people at HCFA had all sorts of palpitations and said, no, we couldn't do that, because there is real money involved there, and we couldn't trust them.

So they insisted that we hire a third party data collector who would go out and administer this thing, which they did. But it turned out of course to be very expensive. Then there was the problem of what do you do with the people who the third party data collector can't get? And how do you handle that?

The answer at least in the working situation was that we would simply pay them on the old rate, and not on the new rate. That would be potentially, depending on whether they were more sick or less sick, an advantage or a disadvantage to the HMO, and would create interesting kinds of incentives there for gaming.

Well, my recommendations to you are sort of like Davy Crockett used to say, be sure you're right, and go ahead. You need to sort of clarify what is the purpose for which you want to use this data, and recognize that for some purposes a little error can be a very bad thing.

Josh Wiener and his colleagues did a paper about ten years ago that just looked at what the national surveys were doing in ADL measurement across four national surveys, and showed about a 10 percent variation in projections based on which particular measure you chose to use.

If you were going to do that for something that was real, that was actually going to affect the lives of people, then that might be a pretty big price to pay in terms of how you would do it, but at least you could be if not fair, at least consistent if you could actually mandate how that data was going to be collected across everybody.

The other thing I want to remind you of is sort of to beware of the hidden costs of data collection. I thought it might be helpful to give you a case example that Bill and I concocted to look at this. The one that comes most quickly to mind is HCFA's need to try and set up this capitation system that would be better than the DCG approach.

If we accept the fact that the ADLs and the functional status are useful in the tail of that, the last 10 percent, which is the group that we really care about, then how would we do that? But of course HCFA has basically said no way are they going to underwrite the cost of direct data collection to generate this functional status data.

So the answer is well, then what we would do is we would simply include this information on functioning on every transaction. Just as we use the DCG data now, which is basically the Medicare administrative data set to generate this data, we'll just put these hidden columns that we sorted away for Lotus all this time for functional status, and we'll use that.

Well, let's just stop and think for a minute what would be the implications if we did that? The first question is would we want to use inpatient data or outpatient data or both? Well, what kind of data would you want to record? There is a cost involved in how much data you record. Would you want to do six ADLs and five IADLs? Would you want to record some other items as well? Would you want to have everybody fill out the SF36? That has 36 items in it.

All of those things come at some kind of a cost. Somebody has to collect that information. How would you get that data when they came to put it on the abstracting form? One thing to think about: when they record data for inpatient charts, that's a sort of relatively rare event, and they have a whole clerical staff that does that. And what they don't know, they can make up.

But now you are talking about doing outpatient charts. Well, right now you can't even get the outpatient diagnosis correct. What is the burden that is involved if you were to suddenly start introducing all this data on every outpatient chart? Who would do it? Would you have the clinician sort of stop and fill out a form that would then get transcribed into that? Or would they put it on their hand held computers that everybody will have in the next year or two? How is this actually going to happen?

Would they be required to do it every time they saw a patient? Most people's functional status doesn't change quite that often. How would you make sure that they really asked the questions? Suppose that the busy clinician just sort of said, looks pretty good to me. We'll give them all 1s, which I know doesn't happen in your practices, but has been noted in other parts of the country.

There are even professional case managers who have been known to dry lab this stuff when doing assessments for eligibility. Imagine what would happen if we actually asked clinicians to do it?

The other big question, what's the use of all this data? Do you really want ADL data collected at the time of hospital discharge? Most people discharged from hospitals today are in pretty sad shape. It may not be terribly prognostic of anything if you were trying to set an annual HMO capitation rate. Discharge data is very useful in post-acute care rate setting, but it may not be very useful for setting annual risk data, because people's functional status is going to be very low. So would you even want that?

Then you need to stop and say, what would it cost to collect this data? Well, I did some back of the envelope calculations, and they are admittedly crude. But let's suppose that it costs you 50 cents -- which I just made up, but that seems incredibly cheap for anything. And there would be 11 million Part A claims.

Well, that would cost you roughly $5.5 million to collect this rather questionable data just to do that. And that's if you just used hospital claims, and we already said the hospital is probably the last place we would want to collect functional status. What we said is what we really want is to get people who are out in the community, the people we see in outpatient situations.

So if we look at the Part B claims for doctors and outpatients, there were about a little over 1 billion of those in 1996. So that would only cost you $608 million. That's sort of the same as Sam Rayburn's terms, now we're talking about real money. This is not a trivial kind of thing.
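The back-of-the-envelope arithmetic above can be sketched as follows. The 50-cent unit cost is the speaker's own made-up figure, and the Part B claim count below is not quoted directly: it is only what the quoted $608 million would imply at that unit cost.

```python
# Figures as quoted in the testimony; the unit cost is admittedly invented there.
COST_PER_CLAIM = 0.50            # dollars per claim to record functional status
PART_A_CLAIMS = 11_000_000       # Part A (hospital) claims

part_a_cost = COST_PER_CLAIM * PART_A_CLAIMS

# The testimony quotes a Part B total of $608 million rather than an exact
# claim count, so the count here is derived, not sourced.
PART_B_COST_QUOTED = 608_000_000
implied_part_b_claims = PART_B_COST_QUOTED / COST_PER_CLAIM

print(f"Part A recording cost: ${part_a_cost:,.0f}")
print(f"Part B claims implied by $608 million: {implied_part_b_claims:,.0f}")
```

At 50 cents a claim, the quoted Part B total corresponds to roughly 1.2 billion claims, which is consistent with "a little over 1 billion."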

That's just the cost of writing the darn data into the form. Imagine now what would happen if we add the cost of getting the clinicians to collect the data? They would now argue that they need to be paid more for each encounter in order to collect this data. Now some of that wouldn't be altogether bad, because it would be nice to get some of these clinicians to think about functional status, but probably not on every patient. We would like to focus their attention on the ones where it's really relevant to begin to do that. So this is not a recommendation without cost.

So what we have now is sort of questionable data at high cost. Then the question comes up, okay, what would you do with that data? How would you summarize it? The goal here now is to create a risk adjuster based on an annual prediction of how people are going to do, so you want to come up with an annual score for each person.

What would you do with these people who see a bunch of doctors, or have multiple hospitalizations? How do you tally these up? Do you average them? Obviously, it's simplistic, but it doesn't make an awful lot of sense. Would you use the best or the worst score? Well, if they were coming out of the hospital, they'd have a bad score. Would you use the most recent? What happens if somebody is really having a transient illness in which they temporarily are disabled, but in fact what you really want is their chronic baseline level that has the epidemiologically predictive data.
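To make the summarizing question concrete, here is a minimal sketch with invented assessment records for one person. The dates, the scores, and the convention that a higher score means more dependence are all assumptions for illustration; the point is only that the candidate annual summaries disagree.

```python
from datetime import date
from statistics import mean

# Invented records: (assessment date, score); higher = more dependent (assumed).
assessments = [
    (date(1999, 2, 1), 1),    # routine office visit
    (date(1999, 6, 15), 5),   # hospital discharge during a transient illness
    (date(1999, 7, 20), 3),   # follow-up visit
    (date(1999, 11, 3), 1),   # back to baseline
]

def summarize(records):
    """Return several candidate annual scores for one person."""
    scores = [score for _, score in records]
    most_recent = max(records, key=lambda r: r[0])[1]
    return {
        "mean": mean(scores),
        "best": min(scores),
        "worst": max(scores),
        "most_recent": most_recent,
    }

print(summarize(assessments))
```

In this made-up case the worst score reflects the hospital episode, the average is pulled up by it, and only the best and most recent scores match the chronic baseline.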

There are a whole host of questions that we could ask about what are the pragmatics about how we would do this? So as somebody who has long championed functional status, I think functional status is terribly important. And some of these things have enormously powerful predictive value.

But when we get down to the logistics of how we want to collect this data, I think what seemed to be the quick and easy fixes -- the United States government has a very bad history of taking extant data and trying to use it for payment purposes. And it has never worked very well. That's sort of what we do all the time.

It seems to me if we are going to use functional data, and the whole way we got into this was we got the MDS, which was never designed as a payment system. We turned it into that, and now we're trying to take the OASIS, which is sort of more of a desert than an oasis, and we're trying to sort of corrupt that into a payment system.

Margaret has worked hard to try and take a system that wasn't designed to do that, and at least make it something a little bit closer to begin to do that. But basically, DCGs and the ACGs and the DRGs, none of these systems were ever designed to generate payment data. They were all basically based on the firm belief in regression modeling. If you explain a little bit of variance, you have made a contribution. But they are not very good.

The question then is, if we are going to add functional status into the situation, maybe it would be better to collect functional status data in some more direct way. In fact, while it doesn't look like it's as big a direct cost, it may actually turn out to be cheaper. And at least you would have better data than you would have if you simply try and make it an add on to something that comes out of an administrative data set.

So I think the 'it' here needs to be really re-examined pretty carefully. There is one 'it' that is based on what would it take to tack it on, and make another set of columns in the discharge summary reporting or the billing data. Another is do we want to periodically, once a year, collect data on a population of people, either all people, randomly selected, or targeted in one means or another, but get useful data.

And then ideally try and go the other way. Try and use that data proactively, not only to affect payment, but actually to feed that information back into the care system, and actually begin to encourage clinicians to practice a different kind of medicine than what we have been practicing up to now. We are still fixated on this sort of 19th century acute care model, when in fact we are living in a 21st century chronic disease model.

That takes different kinds of data. It takes different kinds of strategies. Maybe we ought to be thinking a little bit differently, sort of out of the box about how to do this, rather than just how to get another payment element into this.

So with that, I'll turn it over to Bill.

DR. IEZZONI: That was great. Can we hold questions until after Bill speaks?

Agenda Item: Overview of Functional Assessment and Health Status: Issue Identification - William Braithwaite, M.D., Ph.D., ASPE

DR. BRAITHWAITE: Well, if you actually decide to collect anything and do all of that, and put it into administrative transactions, I'm here to tell you about what that means, and the process that you would have to go through to get into one of those transactions.

First of all, administrative transactions are things like claims and enrollment, eligibility requests and answers, things like that. The only things that seem to be relevant here are the claim and the claims attachment. Those are the two places where information is provided from the provider to the payer for the purpose of getting reimbursed.

There are some problems with that, many of which you have already heard about. The problem with motivation and corruptibility, and a whole bunch of other things. You just have to keep your mind very clearly focused on the function of these administrative transactions. The provider conducts these transactions to get paid. And the payer conducts these transactions to prevent having to pay when it's inappropriate, or can otherwise be justified.

So there is this natural tension between the two parties involved in these transactions, which colors virtually everything that goes between them. So you have to keep that in mind as we look at a claim or a claims attachment as a means for conducting the collection of something like functional assessment.

If we sort of just take a look at what the current mechanisms are for getting changes to the way claims information is collected, we find that there is a large number of people involved, and a large number of organizations involved.

If we just look at the claim, there are many different kinds of claims. There are claims for inpatient, and there are claims for outpatient. There are claims for dental and vision, et cetera, et cetera. And each one of those has its own set of standards.

The sets of standards involved are also different depending on whether you are conducting a paper or an electronic transaction. And until recently, the standards that were set for both paper and electronic transactions were set by the payer. Every provider that had to provide a claim or claim attachment transaction for a patient had to figure out what payer this claim transaction was going to go to before filling it out. Because the rules for the standard claim form are different for every payer.

And if you go into a provider's office today, and some of you are providers and maybe haven't looked recently at what your front office has to do to send in a claim, but there are rows and rows and rows of six inch three ring binders that contain what you need to do to fill out a claim for each different payer. The typical provider has to have about 400 of those. Some larger providers have to deal with 1,500 different standards for how you fill out a claim form.

Now most of them look the same, because all of the payers in the country ripped off the HCFA-1500 form or the HCFA UB82 form, depending on whether it's inpatient or outpatient. And they sometimes used exactly the same form. The problem is they put out different rules for each of the fields in that form. Some of the fields are optional fields, to be defined by the payer. So from HCFA's perspective even, there is no definition for that field.

Field 33, well, you go to the particular payer's three ring binder, and you look up field 33, and it's got a set of rules for what you put in that field, for what kind of patient, at what time. The providers, as you can imagine, are spending a great deal of money doing this. And they find it virtually impossible to move to the electronic world, which would in fact save them some money.

Because to buy software that will do this, that has to be maintained and updated every time one of the 400 payers they deal with changes their mind about what goes in field 33, it makes it an impossible task -- not impossible, just very expensive, because you have to buy the software from someone, and you have to pay them $10,000 or $20,000 a year to come into your office every time one of the 400 payers makes a change to update your software.

So all of that led to what is affectionately known as HIPAA, the part at least of the Health Insurance Portability and Accountability Act of 1996 which told the secretary of HHS to define standards for the electronic submission of these administrative transactions, like claims and claims attachments, in a standardized way so that a provider would only do it one way, no matter what payer it was going to, and could then buy software that could do it electronically, saving a great deal of money. And that electronic software would only have to be updated once a year, instead of once every couple of weeks, whenever somebody out there changed their mind.

Now even at that, if you wanted to go to an electronic claim, which is what the number 837 refers to in this diagram, you have to go to the standards organization, the standard developing organization or SDO as it is called, which controls the standard for that particular electronic claim. So you have to find the right committee within the right standard organization that develops the implementation guides that tell the software people how to do this particular electronic transaction.

But that is not enough either, because there are other bodies called data content committees which historically have been maintaining the definitions of the content on the paper claims forms. Even though every payer has their own variation on this, there at least is a committee that kind of looks at the changes in these forms in a general way, and comes up with recommendations.

Those are still active. And they interact, sometimes in obscure and not very clear ways at the moment, with the standard developing organizations that set the electronic standards. If you look at the paper standard and the electronic standard, they are different, for a variety of reasons.

Now what we have done in HHS in response to HIPAA is to not only tell the SDOs that they have to come up with a consistent single set of standards that all payers must use across the country, a single set of data definitions for each of the data items that go into those electronic transactions, but a sort of maintenance mechanism that is consistent, and easily trackable for the public and for everyone, including the members of this committee to know what to do when you want one of these standards changed to meet a need.

So what we have done is come up with -- we, I mean all of the people involved here, not just HHS -- have come up with a memorandum of understanding which is an understanding between us, HHS, and the organizations that I just talked about, the standard development organizations, and the organizations that currently deal with changes in the content of the claim form particularly, to come up with an agreement that all of these groups of people will work together to come up with a common electronic process of being able to submit proposed changes, and get those changes through in a reasonable way.

Each of these organizations -- I'll tell you who they are in a minute -- have agreed to follow open process principles. That is, the public can be involved. They can at least have access to the process, and can make suggestions and comments and so on as part of the process. Timely review will be made, so that it isn't a long number of months or years before some change gets incorporated.

They will cooperate and communicate amongst each other for a change. It's been happening more and more lately, but in the past it was a problem. You could talk to one committee, and nobody else would know what you were doing, and you never knew what the result of the process was and why they either accepted or rejected your suggestion.

They will evaluate the impact of all change requests across the whole industry, not in a very narrow area. They will try and maintain a national perspective on this, not necessarily a regional one, and conform to the other aspects of the law that HIPAA provides.

In addition to HHS, there is the Accredited Standards Committee X12. The names of these committees sometimes are pretty strange, and you have to belong to the committee to know how the name came about and what it really means. But the American National Standards Institute has a set of rules that ANSI accredited standards developing organizations have to follow. So some of them even use the name Accredited Standards Committee as part of their name to indicate that yes, they've gone through this process, and they are accredited to make standards.

There are others like Health Level 7, known as HL7, which are also accredited by ANSI, but don't bother to use that accrediting part of their name.

There are organizations like the American Dental Association, that do accreditation, that do standard setting, that do a whole bunch of things. Sometimes it's hard to separate out the parts of what they do. They set standards for the materials for example, used by dentists when they get in your mouths. But the standards we are talking about are data content standards in the claim process, the administrative processes. So they have come up with a Dental Content Committee. It's a new name for a function that the ADA has been doing for some time.

And of course the pharmacies have their own standard setting organization, the National Council for Prescription Drug Programs.

Then there are data content committees for outpatient work and for inpatient work, the National Uniform Billing Committee and Claims Committee.

So all of these groups have come together and agreed to follow a common process. What's the difference between a content committee and a standards committee, you might ask. Not a whole lot conceptually. They all take suggestions for changes. They go through some process, and they come out with a result at the end.

But historically, they have been quite different in their process, and in their representation. The content committees have a balanced representation of the tension I talked about at the beginning. There is a balanced representation of the providers and the payers, because these committees are typically focused on the claim between those two organizations in tension.

That is very different from the standard organization in general, as you will see. They have opened up their process so that the public can see, and they have come up with a formal appeals process when somebody is disappointed with the result.

The standard developing organization goes through a similar kind of process to maintain the data models, the dictionary, the structure, the syntax of the electronic transactions that they use in electronic data interchange or EDI. They are all at this point accredited by the American National Standards Institute.

But their voting process is quite different. They vote in their committee on the basis of who is there. But a change, to get approved, has to go through a final voting process, where the votes are based on one vote per member in good standing. So it's voted on the number of members, and anybody can become a member. That is different than the balanced voting process of the data content committee, that is based on the balance of payers and providers specifically.

We have also come up with a steering committee, where all of these organizations are represented. The steering committee does oversight and investigations, and provides a body for an appeals process that is consistent across all of these organizations for the process of changes in HIPAA data standards.

The process we have come up with, and all of these organizations have agreed to is that anyone out there can come up with a proposed change. They are going to develop a Website so that people can submit suggested changes. When suggested changes are posted on the Website, they will be authenticated by a Web administrator.

Then each of the organizations that are participating in this has a limited period of time, ten days, to decide whether they are going to address that particular change request or not. Once the organizations identify themselves as collaborating organizations on a particular change, they have got 90 days to complete a business analysis and a preliminary recommendation.

The requester, the person who has submitted the change request can participate in that process and present their case. After this preliminary recommendation, the collaborating organizations have another short period of time to review, and if necessary negotiate the result of that process. Now sometimes the negotiations take the form of conditional statements.

You have to understand as an aside, the data items in one of these standards are typically either required or optional. That is historical. If it's required, every time you send this transaction, it's required. There are a bunch of others that are optional. That's the way it has been for many, many years.

When we started to move into setting unique and standard transactions that all the industry had to follow, this term 'option' sort of got in the way, because optional used to be what you did when you went through a consensus process of coming up with a standard. And people disagreed, as they will. But optional allowed them to say, well, this is optional. That means those people over there can use it if they want, but I don't have to.

Besides, using a standard or not is optional anyway, because most of the payers set their own standards, and set their own ways of doing it. It didn't really matter if the standard changed for something optional. We didn't have to change the way we did business. But now the federal law says that you must use this standard. Then optional had a meaning that didn't have much use, because what do you do with an optional field in a required standard that everybody had to use?

So the move is not complete yet, but it is moving toward changing optional fields to conditional fields. What's the difference? The difference is that with a conditional field it says what the data item is precisely, and hopefully for those things where it's not obvious, some inclusion of how that data item is collected, so that it's done consistently across all the providers, and used consistently across all payers.

But also there is a condition of when you must use this data item. It is a required field under certain conditions. What conditions? Well, it used to be that where there were conditional fields, the condition was what payer you were going to send it to. If this is going to Blue Cross, the conditional says this field is used a certain way. When it's going to HCFA for Medicare, the conditional clause says it's a different way.

We have attempted to pull out who the claim is going to out of this conditional status. So now the conditional says something about the patient, something about the provider, something about a state the patient was seen in. Those are the kinds of things that go into a conditional statement.

So if state law requires that race and ethnicity be included in this transaction, then this is a required field. So that's the gist at least of the changes that this process is going through. It's a significant change for everybody involved. So changing these conditionals is a way that groups that disagree, can come to agreement by making the conditional such that it meets the needs of the people who need it, and doesn't bother the people who don't want it.
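The shift from optional to conditional fields can be sketched roughly as follows. The class names, the field name, and the placeholder state code are all hypothetical and are not taken from any actual implementation guide; the sketch only illustrates a field whose requirement depends on facts about the patient, the provider, or the state, and never on which payer receives the claim.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Claim:
    patient_age: int
    provider_type: str
    state: str   # state in which the patient was seen
    payer: str   # carried on the claim, but no condition looks at it

@dataclass
class ConditionalField:
    name: str
    required_when: Callable[[Claim], bool]  # condition on patient/provider/state

# Hypothetical: "XX" stands in for a state whose law requires race and ethnicity.
STATES_REQUIRING_RACE_ETHNICITY = {"XX"}

race_ethnicity = ConditionalField(
    name="race_ethnicity",
    required_when=lambda claim: claim.state in STATES_REQUIRING_RACE_ETHNICITY,
)

def is_required(field: ConditionalField, claim: Claim) -> bool:
    """A conditional field is required exactly when its condition holds."""
    return field.required_when(claim)

claim_1 = Claim(patient_age=70, provider_type="physician", state="XX", payer="Payer A")
claim_2 = Claim(patient_age=70, provider_type="physician", state="XX", payer="Payer B")
claim_3 = Claim(patient_age=70, provider_type="physician", state="YY", payer="Payer A")

for claim in (claim_1, claim_2, claim_3):
    print(claim.state, claim.payer, is_required(race_ethnicity, claim))
```

Changing only the payer leaves the answer unchanged, while changing the state can flip it, which is the distinction drawn above between the old payer-specific rules and the new conditionals.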

As I said, there is a formal appeal process to the steering committee, with two-thirds majority necessary to continue moving forward with the proposed change. And after the general agreement amongst all of these people, then the standard developing organization goes in and makes the appropriate changes to their documentation, their implementation guide so that everybody knows now how to implement this change to the standard.

But even then the proposed changes have to go through a review process, so that every organization that said we want a piece of this action, gets to say yes or no, the proposed change is acceptable to us, and meets the needs proposed in the change request that came in, in the beginning.

All of the significant changes have to be presented to this body, the National Committee on Vital and Health Statistics, who can then at their own option, make suggestions and recommendations to the secretary. The secretary then goes through the process of changing the law by changing the regulations to accept this new standard with the changes incorporated.

That's sometimes a long and difficult process. But we are trying to set it up so that it runs as smoothly and as quickly as possible, given the conditions and the public participation, and the checks and balances that we have incorporated into that process. Now this process -- the MOU has not been signed yet. This is the current state of the MOU. Every one of the organizations involved has to sign this before the final rule gets published, hopefully in the next couple of months, that lays this process out. This is kind of a preview of what is involved.

Now just as a reality check, if you decide to get involved in this process, you have to make sure that you understand that the purpose of the standards that we are talking about here is to increase the efficiency of the administrative process. It is to save money. That is the primary goal of everybody that you have to deal with in this standards setting process.

You have to convince them of a business case to change those standards. If you propose to add something to a transaction which is going to cost them money, there has to be some benefit for that, or they are not going to like this idea.

As in any political process, and this is a complicated political process -- in fact, when negotiating the MOU with these organizations, I sort of convinced them that the data content committees were similar to the Senate, and the SDOs, the standard developing organizations, because of their representation, were similar to the House. And that the two of them had to both pass a similar concept, and then negotiate something in the beginning before HHS, who serves the role of the vice president in this analogy, gets to kind of pass it. And then the president gets to sign it, where we actually put it into law by adopting a regulation.

None of them seemed to like that analogy, but it's clear that this is a very political process. And as in any political process, you've got to do your homework if you are going to get something through. You've got to make sure that you've got a coalition of people that would support your position. You have to convince the participants involved.

And most of all, you have to be there. You can't sort of put in a change request, and then disappear, and hope to get an answer. Because if you're not there presenting your case at each of the steps, and lobbying for your case, it will disappear in the political process that ensues. As in most successful political processes, the political decision should already be made before you ever enter the process.

Now does this sound hard? Does this sound difficult? Well, yes, but there is some hope. If you propose your change in a way that is clear enough for everybody to understand, and for everyone to understand how it affects them or doesn't affect, which may be the most important clear statement, if you can justify the change from a business perspective, that is, yes, it's going to cost more money to do this and to make the change itself, but that's a one time thing. The ongoing costs are overridden by some value that you can demonstrate from the information that you will be collecting.

If you get agreement from a significant number of payers and providers that are involved in this sort of transaction that it's a good thing to do, or if you can show that it's required by law, that's always a good reason to get them to accept a change. If any of those things can be brought forward, you've got a good chance of getting your request through the process.

It is clear that in order to succeed you need to have clear and unambiguous definitions. Now after Bob talked about his perspective on what you were trying to move forward on, I suspect that that isn't true yet. But you are going to have to find some very clear statement of your data definitions, what conditional statements would be used for the data items you propose to add, and a clear process for collecting that information in a cost effective way in the normal course of business, because these are business decisions that are being made.

You have all had to present a business case to a CEO or even a CFO. That's the kind of people you are talking to here. You've got to talk their language to get it put through.

One of the important things is that if you propose to add something to the claim, you have to show that you have looked at alternatives and found them lacking. Now the claim is held to by these bodies as something quite let's say religious. They find it very difficult to change the dogma involved in a claim.

The claim is the information necessary to adjudicate a payment process. Unless the data item you are trying to get into the claim is involved in, or should be involved in, or will be involved in, by law or whatever, the adjudication of a majority of claims, the chance of getting it into that standard is much diminished.

On the other hand, there is this other standard called a claims attachment. A claims attachment is that information passed from the provider to the payer, when the payer asks for it typically, because the information in the standard claim is not sufficient for a particular kind of claim, for a particular kind of patient, or sometimes for a particular kind of provider.

Payers for example, can tell a specific provider that they have to attach information to every claim, because we don't trust you, because you have tried to pull the wool over our eyes in the past, or you have been convicted of fraud or whatever in the past.

There are all kinds of conditions under which claim attachments must be provided. One of those could be that there is good reason for a state or a study or a particular data collection item going on, that you convinced the payers and the providers that the providers should collect the data, and the payers should sort of accumulate that data for that other purpose. It would be tough to convince them to do it without some direct connection to the claim, but I can foresee that as a possibility.

Now has anyone ever gotten this through? Has it ever worked? Yes, it has. In the past, the public health realm that we are talking about here has not been well represented on these bodies. But thanks to the effort of particularly Marjorie Greenberg in the room, and some other people, these data content committees have accepted representatives from state and federal public health oriented agencies.

There are now actual members on these committees that take this into account. And there are members from various groups, not so well sort of organized as representatives for obvious reasons. In the data content committees, anybody can sign up to be a member of one of those. But organizations like the CDC and so on, organizations are starting to -- some of them have been for a long time -- but many are now thinking about joining and participating in the standards setting organization.

[Brief computer difficulties.]

The computer has frozen up on me. Fortunately, I handed out the slides. So I just wanted to say that under Marjorie's guidance a Public Health Data Standards Consortium has been founded, and has already succeeded in making some moves in the direction of introducing standards to this process. They have organized themselves, because there is no organized representation within the standards developing organizations, to those organizations, and they have successfully introduced race and ethnicity as a data field to be added to the claim form.

And that hasn't gone through yet, but the process of sending in a change request to the claims bodies has gone forward, and that will be considered actually in the deliberations of those bodies next month. The consortium has done the necessary political work to go through and build a consortium as I said, and convince people that this is a valuable thing to do.

Now we have to convince the standard developing organizations to accept it. If that happens, then the next version of the standard electronic claim at least, that comes forward will have that data field as a conditional field.

So that's the process. I think we should open both Bob and myself up to questions.

DR. IEZZONI: Agree, if we have the fortitude to stick around and ask questions after those two excellent and reality-based presentations, as was the title of one of your slides.

Thank you. Both of those presentations were excellent, and right what we needed to hear in this first morning of our considering this issue. Are any of the committee members brave enough to take a first question?

DR. NEWACHECK: Those were actually very excellent presentations. I think both of you raised a number of daunting conceptual and logistical concerns and considerations for us. And I think that's very, very helpful.

I have a question for Dr. Kane. I wanted to ask about measures of functional status. In your examples you often referred to ADLs and IADLs as potential measures of functional status. Those measures I think do work very well in many cases for adults, but they often do not work very well for children.

I'm wondering if in thinking about measures that have been developed that go across the broad age spectrum, whether you thought much about the WHO's ICIDH system, the International Classification of Functioning and Disability, and whether that would be perhaps a more appropriate direction to move in as we think about functioning across the life-span?

DR. KANE: Again, I think I certainly would not claim any expertise in children. But the first question is why do you want something that goes across the life-span? What is wrong with saying that you want these measures for this group, and those measures for that?

DR. NEWACHECK: It doesn't have to be.

DR. KANE: Again, I think one of the problems is that we have a tendency to sort of start by saying what kind of data we can collect, and then in a sort of time capsule kind of mindset, then hoping that somebody will dig it up some day and find use to put it to.

Maybe the way to start this is to say, okay, what are the two or three most important envisioned uses at least? It doesn't mean that somebody won't come up with a better idea ten years from now. But the pressure to have this committee meet and to do something is driven by some immediate anticipations of uses, which is part of the business proposal that Bill is talking about anyway. So you are going to have to do that, it sounds like to get through this.

So if you start with that, I can't envision a use right now that is going to try and say what did this child become as an old person, so that I need some sort of consistent measure. It's possible that we might get into some kind of cost effectiveness trade off questions that we would want to have the same measures used across the whole thing in order to make certain kinds of policy determinations. But that is not at the top of the list certainly in terms of the potential uses.

The uses that are coming up most actively from an administrative standpoint are going to be rate setting uses, or risk determining uses, in trying to see which are the high risk groups. And then I hope a close third would be trying to improve clinical practice. Now all of those argue to me for much more age-specific measures, rather than generic measures.

DR. NEWACHECK: Yes. And I agree also, I think that we really need to think as a group as to what is our real purpose here. Is it really for risk adjustment? Is it for payment standards? Is it for epidemiological purposes? We haven't really got into that. I guess we will talk about that tomorrow.

DR. IEZZONI: What we had hoped to do for these two days was hear from people who might be using it. I think for both of you, let me disabuse you of the following. The language suggested that our committee wants to implement this. What we had envisioned was that we want to explore the placeholder that we have put in, in the core data elements a number of years ago, and see whether there can be a business case for this.

Not just a business case, but also I wanted to ask whether quality of care monitoring was ever something that the standards committee considered as a business case argument, or whether it purely is adjudicating and paying claims, given especially that nowadays under capitated plans, that isn't really an issue.

DR. BRAITHWAITE: Well, the same electronic form is used for documenting encounters as submitting a claim. So I see no reason why the payers and providers wouldn't agree on a cost effective mechanism of improving the quality of care. That certainly is up there on their list; not at the very top, but it's up there on their list. And I think that if there was a cost effective way of doing that, that they would see fit to adopt it.

DR. KANE: Let me just speak to that. I'm all in favor of quality of care. The issue is are we talking about quality as an input measure or an output measure? One could argue that we think we would get better quality if providers pay more attention to functioning, so it's an input measure. Or are we going to try and use the functioning as an outcome for looking at how effective the treatment was?

The problem with the latter is of course that much of the problem in quality and outcomes is incomplete follow-up. If you have a system to measure outcomes that is conditional upon coming back to be seen, then what do you do about the people who don't get back there to be seen? We would have to think about what you would do.

You could sort of have a risk of those who turned up, here is what their outcomes turned out to be. But as usually happens, the people who don't turn up are usually the ones whose quality you worry about the most. So then you could say the failure to turn up was already a bad sign of an outcome. But again, that may or may not always be accurate.

It's a question of how are you really willing to trust to an administrative system to make a determination of quality? As you well know, given all the Sturm und Drang we have over risk adjustment at the other end, if we start adding uncertainty onto uncertainty, my sense is that for the number of things on which we are going to be able to use functional status as a quality outcome, to mandate a national system that would collect that data on every encounter of whatever type, might be a big price to pay.

DR. STARFIELD: I think it's fair to say that in the generation of the core data elements, this committee was not necessarily thinking only of payment. But really I think -- I wasn't in on the beginning of it, but maybe it evolved -- it was really designed as a way of thinking about how do you collect information on individuals for population purposes.

And that gets at your quality purpose, but not necessarily as a quality point of view, but as a needs assessment point of view. If you were thinking about population uses, that is, say, pursuant to the goals for the year 2000 in terms of documenting systematic differences across population subgroups for the purpose of eliminating them, would you think differently about the kind of measure you would use than if you were thinking about it and collecting it for a clinical purpose?

DR. KANE: Yes, I think I would. Again, we need to be clear. When we talk about population bases, we're talking I think about epidemiological sampling that would relate to shared group characteristics, ethnicity, income. On that basis, it seems to me one doesn't need 100 percent data. One can certainly work with samples. And there you may want to trade off the cost and the value of the data collected by targeted efforts, as opposed to routine data collection.

As I suspect is likely, although I can't quote you a study to show it, there is a variation in how good the data is from different kinds of providers, not just by profession, but by location. Free clinics provide a different kind of data than, sort of, Park Avenue specialists.

To what extent are we running the risk of generating bad data, biased data, by virtue of the fact that the people we may care the most about don't even appear on the radar screen, because they don't turn up to be seen at all? If we really want to know the functional status of people, it's probably the functional status of people who are underserved.

Now those of us who believe in iatrogenesis may want to know about the people who are overserved, but at least for large parts of the population underserved is still perceived to be a significant problem. An administrative system will basically write them off. They really will become the unknown participants in this. So strategies will dictate what kinds of information and lessons we can derive from that data.

DR. STARFIELD: Do you want to comment on the WHO system, or does anybody want to comment on it? Paul asked about it.

DR. KANE: I don't have enough expertise to be able to give you an answer.

DR. IEZZONI: We'll be hearing a little bit more this afternoon. I think Gretchen will be speaking to the ICIDH today and tomorrow. I think Blue Cross of using that. You participated with them in a study to use that. So there is an example that Gretchen will be talking about of a payer that actually did collect that information.

But that's a point, because one of the things that the standards folks look at also is the code sets for capturing this information. And certainly ICIDH offers a code set for capturing functional status information. I think it's safe to say that the training that would be involved to teach clinicians -- and I use that term broadly -- to apply it in the United States would dwarf the training that will be required to change them over to ICD-10-CM.

MR. HITCHCOCK: More than 50 cents a case.

DR. IEZZONI: Yes.

DR. LOLLAR: [Remarks off mike.]

DR. IEZZONI: Yes, that's very good. Don, actually do you have any questions while Bill is here? Because I know that you have an idea about the 1500 form.

DR. LOLLAR: I will share that this afternoon.

DR. IEZZONI: Do other committee members have questions for Bob or Bill, or are we too daunted by your excellent presentations?

Paul Placek has copies of the ICIDH. Great.

Anybody around the table or around the room have questions for Bill and Bob? Yes, Kathy?

MS. COLTIN: I guess either one of you could answer this. But Bill, you had talked about the focus being on adjudication and payment, clearly, from the viewpoint of the standards organizations. I know that some of the data content committees have taken a little broader look, and have recognized quality and other types of business needs as appropriate for including data elements.

In looking at some of the downside of using these kinds of data elements for payment purposes, I have to agree with an awful lot of what was said. But I did think that there were some important points made about the value of these data elements for targeting populations for case management, for disease management programs, for special services, and so forth.

And I wondered if you could comment more about whether you have had any experience with that, and two, whether that would be a type of use that might be considered appropriate from a business standpoint?

DR. KANE: Do you want me to talk about the first, and you can talk about the second? The answer is yes, we have had experience in doing that, and it's very mixed. It really does require changing the mind set of most of the people who are out there in terms of how they do their business.

There are sort of two things that have to happen simultaneously. It sort of depends on which businesses are represented in making these decisions. Most of you are aware of this sort of paradox that we have right now in managed care. Suppose you had a managed care organization that really knew how to give good care to the chronically ill. This is a hypothetical.

DR. IEZZONI: In Minnesota.

DR. KANE: I'll take the best and still tell you we know how to give cheap care, but good care is different. But the second problem is suppose they actually became good at it? And suppose people got to know about it? What would happen? Of course all the people with tough problems would come to them. So right now you've got a managed care system that if anything, is making money by essentially getting paid more than it cost them to give the care by basically having a favorable case mix selection.

Now all of a sudden the system turns on them. There is a strong disincentive to become good at this. So the business case is not very tricky here, let me tell you. The only case where we have been able to put this into play was in the second generation of the social HMOs, where we actually created a risk adjusted payment system that was individualized risk adjustment for every enrollee. We actually set up this system.

We went out in this case and took as our model the Medicare Current Beneficiary Survey, estimated what subsequent utilization would be using the full array of data elements in that. And then developed an annual survey that looked a lot like the Medicare Current Beneficiary Survey that we administered to every enrollee of this social HMO.

Now the first thing that happened is HCFA, the Office of Managed Care, said you can't have the providers doing this. We said, well, it's really important to have the providers doing it, because we wanted to incorporate this not only into setting the rates, but this would also be part of a management information system we designed, and would then get fed back to target, to send out trigger messages as to who needed attention for case management, primary care and the whole nine yards.

Well, of course they said you can't do that. So then we had to go out and hire a third party. HCFA went out and hired a third party data collector, which you can imagine was beaucoup bucks to go out and collect this data that we had previously collected the first year from the provider.

Then we actually designed a system, and took this data, wrote the computer programs that actually set off the triggers, sent out notifications, set up a tracer system, and then used repeated measures to look at progress over time. This was to get your outcome measure component in there.

So the answer is, yes, you can do it. Now it was worth it to this organization for two things. Number one, to some extent they were getting paid more the sicker people were, because it was a case mix adjusted system. But at least they didn't feel like they were getting penalized, and so you could sell it. But you change condition two, you don't have to be a super economist to write that business plan. It just ain't gonna fly.

Now if you did it the other way and said, yes, we want to collect this data, because we want to do a better case mix adjustment system, and this will allow it, then you might turn out to have some allies among the managed care provider groups. And you could use the same argument for individual providers, that you would use better case mix adjustment to give them fairer payment. Everybody knows that every provider had the toughest cases.

DR. LOLLAR: Did I understand you correctly that when you did this study, the provider was the person who completed the Medicare beneficiary-type survey?

DR. KANE: No, the provider collected the data in the sense that they -- but that only lasted for a year until we hired this third party vendor.

DR. LOLLAR: Have you looked back to look at the power of the various types of data that you collected there in terms of predicting outcomes?

DR. KANE: Oh, yes. I can send you several articles on that if you are interested.

DR. LOLLAR: Because even if you assume the professional does collect the functional status data, you do have all of the issues of what is most powerful, what's the most prevalent, et cetera.

DR. KANE: Let me be clear. We collected more than function status. We did the whole nine yards.

DR. LOLLAR: I understand.

DR. IEZZONI: Can I just ask, because that's interesting that you had the patients completing the information. And in fact one of the things we had talked about was the fact that patient reports are much more valid, if you will, than provider reports. Is OMB one of the memorandum of understanding signatories?

DR. BRAITHWAITE: No.

DR. IEZZONI: How would that work if somebody were to propose a data element that really does require people to report the information? Would OMB be involved in that?

DR. BRAITHWAITE: OMB gets to review and approve every rule that we put forward, that we propose. And if we propose a rule that has that as a requirement in it, because the industry has decided that its standards should include that, they might well want to have something to say about that. We have not yet negotiated with OMB exactly how that process is going to work. So I can't tell you for sure.

MS. COLTIN: But this would certainly not be precedent setting. Many of the data elements on the administrative transactions are in fact reported by patients.

DR. IEZZONI: But there is something qualitatively different.

DR. KANE: But this does raise an interesting question. The reason that we particularly wanted patient report was that we were using as our predictive model the HCFA Medicare Current Beneficiary Survey, which basically is a survey of people, of Medicare beneficiaries. And this gets you into the question of perceived health versus real health if we had a health-o-meter to measure it.

If the data that you have is calibrated against perceived health, in a sense what your real health is may be less accurate in predicting the utilization from the data set that you are using as a modeling device. So again, you have to sort of think about in what context you are really trying to do this. Are you using these as sort of real data or simply as risk indicators that may or may not have other factors in them?

The individual's self-perceived health has always been one of those sort of sloppy measures. We never really can tell you that it measures health, but it may measure the sort of foreboding of impending doom that somebody has an inner sense for based on some change in some totally as yet undetected biochemical metabolic event in the brain. Whatever it does, it works pretty darn well.

So you can sort of use a sloppy measure that has strong predictive powers, even if you don't fully understand what it is, or even if it turns out not to be as accurate as something that you could measure better, but is less predictive.

DR. STARFIELD: Self-perceived health, the excellent, very good, good, fair, poor, you mentioned that that has relatively good predictability. I think most of the studies are on adults. I actually can't think of one in kids. But has anyone had --

DR. KANE: Actually, the Lewises did a study that actually showed it working in kids.

DR. STARFIELD: That's good. We should know about that. Is there any advantage to making a 10 point scale or a 12 point scale? Do you gain anything if you just expand it, do you know?

DR. KANE: I would love to be committee X22 that could debate whether a 13 point scale versus a 14 point scale -- I mean I think we can get caught up in the minutiae of sort of the 13.5 point scale versus the 13 point scale. I frankly think that there are such bigger ticket items that this group has got to grapple with.

What is the parent measure that's used, Margaret? Not the IDS. The group that runs that sort of went around and they convened the council of trends to decide whether a 5 point scale was different from a 7 point scale, and wise people came in and testified. It turns out that most of the psychometricians will tell you that it doesn't make a heck of a lot of difference which ones of those you use.

We tend to focus on the small stuff, when it's really these big issues that are staring us in the face. They are the ones that we can't measure easily, so we tend to get caught up in the others. I would just try and steer away from that kind of stuff.

I would even go so far as to argue that probably when push came to shove, the choice of specifically which measure you chose is going to be less important than the ones on which you have data. Either you are looking at something in which you are willing to invest probably ten years to collect a new round of data based on new measures yet to be introduced, or you're going to go with extant measures that may be less terrific than the best ones you could think of, but for which there is data now that norms them.

In a number of cases health services research is always doomed to sort of use what's there, because those are the things that we have correlations. But there is a pragmatism to doing that.

DR. IEZZONI: One final question. Does anybody else have any other questions? One of the potential outcomes of our process that I have articulated before was not a recommendation that we go forward with this, but a recommendation that it be studied further through a demonstration project.

So my question to you, Bob, is are there any projects that have already happened that you think could serve as -- have already happened, and so we don't need to recreate the wheel. But could give us that kind of demonstration project type of information that would be practical, that could give us a sense of how much this might cost, how accurate the information might be?

DR. KANE: Well, I would certainly look at the Social HMO-2, SHMO-2.

DR. IEZZONI: But we want to look across an entire population, not at the high end.

DR. KANE: The other problem with that is that it's not done as part of a routine administrative data collection. It is a separate survey. I don't know any group that is currently collecting that data as an add on to every visit.

There were small studies that were done, again, not quite administrative. You know this literature better than I do, using the SF-36 in office practices, and trying to get people to routinely collect that. Again, it was all self-selected practices that did it. So what would happen if you tried to get it in every practice?

Every one of the examples will have its limitations, but I certainly think it would be worth looking at that. Now the upshot of that was a lot of that data didn't make any difference when it was done in the people who volunteered to do it, and were presumably at least more motivated. So one would be loath to argue that in the hands of the ordinary practice it would be more powerful.

But I think a review of that literature would certainly be a good part of the work of this committee, to see what we do know about that.

DR. IEZZONI: Okay, thank you very much. You've given us a lot of food for thought. We will certainly take under advisement everything that you have warned us about. Just let me warn the people this afternoon, it would be great, if Bob and Bill's presentations touched on issues that you think are relevant to your positions, if you could just highlight them. Because I think that we've heard certain points of view this morning that are extremely compelling, and it would be great to hear kind of counter-arguments this afternoon.

So we have a whole hour for lunch. Those of you who know the penthouse at the Hubert Humphrey Building know that that is way too long. So could we try to reconvene at 12:45 p.m., because I would like to take at least a 15 minute break this afternoon.

[Whereupon, the meeting was recessed for lunch at 12:00 p.m., to reconvene at 12:45 p.m.]


A F T E R N O O N S E S S I O N (12:47 p.m.)

DR. IEZZONI: I think what I'd like to do is just start. I was asked before the break if our committee could just hear a comment that one of the audience members had.

MS. HARAHAN(?): My name is Mary Harahan and I am with ASPE. The discussions this morning remind me of differences of purpose. And I have a purpose in being here, and I'm sure everyone else who has come also has a purpose. Those purposes, it strikes me, are not always the same. So I guess I wanted to do two things.

One is I just wanted to illustrate this by saying I came here, because I have an interest in working with international organizations from a government and policy point of view, to try to get enough comparability in data to understand and measure disability trends for the purpose of defining policy interventions, from pension policy to health policy. So that's one reason I'm here.

The second reason I'm here is because in working with HCFA and Carolyn over our various and sundry agreeable and less agreeable positions around all these post-acute care prospective payment systems, it's certainly clear to lots of us that we are requiring the collection of an awful lot of functional data through the OASIS, through the MDS, hopefully through the FIMs, if we ever organize ourselves around rehabilitation.

And all of those systems are mandated, or will be mandated for payment purposes, although they weren't. They didn't start out being for that purpose. They all measure functional status differently, although I doubt that there is a real, compelling reason why that has happened, other than history.

The consequence is that you can't track people across a group of settings, where a lot of us think there are a lot of commonalities among the patients who are discharged from the hospitals into these settings, and where we don't think we'll ever get a handle either on cost effectiveness or quality if we can't track patients across settings.

So now those purposes are incredibly different. And I am certainly aware that there are many other differences. People are interested in clinical intervention. What kinds of functional status measures do you need for clinical purposes? We talked a lot about cost. We talked some this morning about outcomes.

My only long-winded comment is that it might be useful for this committee to weigh in, in trying to clarify and distinguish those purposes, all of which are legitimate, and in what it says to the world about this issue of trying to collect functional status measures, and to try to help us understand whether we can really move toward common functional status measures that serve everybody's purpose.

DR. IEZZONI: Okay, Mary, thank you. That's an interesting additional charge to the committee, which I think we will have to deal with, the issue of diversity of purposes. It has come through really loud and clear as a major issue for us.

With that, I would like to start the afternoon session. We have a couple of speakers who are not newcomers to this issue, but have thought about it long and hard, Don Lollar and Nancy Whitelaw. And Don, I think you are scheduled to speak first, or do you want to switch?

Agenda Item: Functional Assessment and Health Status: Lessons Learned - Donald Lollar, Ed.D., CDC

DR. LOLLAR: After this morning I cannot help but remember the beloved Charles Schulz Snoopy cartoon where Lucy and Schroeder and Charlie Brown are all laying out on the hillside, looking up at the clouds. And Lucy says, "These clouds are just gorgeous. What do you see, Schroeder?"

Schroeder's the pianist. He says, "I see a Beethoven symphony all in the clouds, just right there." He says, "What do you see, Lucy?"

She said, "I see a map of the world, with all the major rivers and mountain ranges and oceans. It's just incredible. What do you see, Charlie Brown?"

Charlie Brown says, "Well, I was going to say a duckie and a horsie, but after those --"

What I feel that I have to add is extremely simple and straightforward.

DR. IEZZONI: We relish that.

DR. LOLLAR: Before I begin to address the questions, and by the way, I failed. Since I have been with government four years, I must tell you that I have gone toward using PowerPoint. But I chose not to do that today. This is just talking. So this is back to my psychology roots.

Before I begin to address the questions, I speak as a public health professional today, representing CDC's Disability and Health Branch. So public health is a part of what I'm going to talk about. But I will start by saying if the first 50 years of the 20th century was devoted to mortality classification and measure, and the last 50 years of the century were focused on morbidity, then the next at least few years ought to be spent on disability.

Before I came to work at CDC I spent 25 years in clinical practice in rehabilitation psychology, working in community mental health, working in rehab hospitals, as well as private practice. So my comments and observations include both those perspectives as a clinician and a public health scientist.

So I'm just going to answer the questions, kind of in my own inimitable way. Is that all right?

DR. IEZZONI: Yes.

DR. LOLLAR: The first question asked the types of information on functional assessment and health status that our organization -- and I need to add here that I speak only for the Disability and Health Branch of CDC, not for NCHS, not for the Global Health, or anyone else, if that needed even saying. Where does this type of data fit? Is the collection a priority? What types of data should ideally be collected?

Our branch has been particularly interested in functional assessment of person level activities. Our unit's mission is to improve the health of people already with a disabling condition, not the more traditional public health primary prevention of conditions that has been the way that CDC has primarily worked.

If primary prevention of the specific condition is not one's aim, but rather improving the health of people who already may exhibit or have some kind of health problem, the diagnostic category is of less utility. As we said, even if it is reported, and often times in administrative data sets it is not reported.

In addition, it's clear that diagnosis alone doesn't adequately represent a person's condition individually, and aggregate data from the group with the same diagnosis likewise does little to provide us better information.

Therefore, the type of functional limitation in which we are most interested is of two types, person level activities, and societal participation activities, if you want to use that term 'function' in a broad way. This is an extremely high priority for our branch. The resources the last three years have been, and are now, and will continue to be spent in developing measurements for at least these two types of functional information.

In addition, the impact of the environment on functioning, which we have already heard about this morning as a confounding variable in looking at functional status at both the personal activity and societal participation levels, cannot be overlooked. Our office funds projects developing instruments to assess environmental factors, as well as the participation and activity limitations factors.

Specifically, we are interested in assessing personal activities related to those eight areas that you see listed: learning; communication; simple movements, which is the term often used synonymously with functional limitations as in reaching, grasping, bending, stooping, et cetera; mobility or moving around; personal care; routine tasks, which are often called IADLs; behavior and sensory tasks; as well as societal participation in areas such as autonomous functioning, travel, work, school, community activities, leisure.

Environmental factors generally look at physical, attitudinal, and policy issues that may affect someone's functioning.

Now while the societal participation and environmental factors are crucial for numerous purposes, the collection of this data clearly should include the individual's perception, if not totally, at least as a major focus. This makes administrative data collection difficult, if not impossible.

On the other hand, the personal activity status has been evaluated by health care professionals for the past 50 years or so, and information on activity limitations which are found, could add substantially to our body of knowledge if they are reported systematically and reliably.

We have already heard this morning that since physicians don't seem to be able to do that, there is a question whether any data collection ought to be completed, as best I can tell from this morning's presentation. I think while it's true, certainly the speakers this morning were not saying we shouldn't do it. It's just that we have to understand the limitations, which is what we need to remember as we go forward in this whole process.

The information provided can give data which can be used for several public health purposes, including elaborating the impact of diagnosis on person level activities across diagnostic categories, assessing the needs by activity limitation, use of services, and even looking at cost.

In addition, quality assurance could be improved by relating functional status or activity limitations, which is the negative part of what I would call functional status, to receipt of appropriate services and procedures. It also could be used to look at HEDIS performance measures, to look at for example the percentage of individuals with severe mobility limitations who receive seating evaluations, or the incidence of skin sores, or other kinds of things that could be allowed to focus, that we don't do so much of now.

It is our belief that this information will open the way to significant improvement in understanding a person's function and needs. I will add that one of the things Dr. Starfield talked about this morning is the elimination of disparities. The Healthy People 2010 document that is to be kicked off starting this evening and over the next four days includes disability status as one of the defining variables.

It does not break down the functional differences in many of those data sets, but in fact if that were achievable in clinical records, it would add substantially to our understanding of those issues and the disparities that we believe most certainly do exist from a public health standpoint.

Question two, does our organization currently collect data related to functional assessment and health status? What types, and if it includes special populations such as children? In addition, are different measures used for specific groups, and how are they selected and tested?

Because we use already existing data sets, and there isn't data, as best we can tell, from that perspective, what we have done is used the perspective of activity limitations as the basis for analyzing the health interview survey. We're just completing an article, and you will notice under question one, learning, communication, those are the categories into which we broke the information, using about 42 different activity limitation questions from the HIS-D, Phase 1.

This includes ages 5 through adults. It suggests that 19.3 percent of people in the U.S. report at least one activity limitation, ranging from 12 percent for children 5-17, up to 52 percent for persons over 65. While this is survey data, the analysis has convinced us that administrative data can be generated using this perspective.

Now I'll talk about it a little bit more later, but I think this is where we talk about some more enclosed or circumscribed collection of data. While this is survey data -- in another analysis just completed by our staff, the new MEPS, Medical Expenditure Panel Survey data just came out about two weeks ago. And also using -- is it appropriate in government to use the word 'jerry-rigged'? -- jerry-rigged definitions of activity limitations across those. We got six of the eight categories that I mentioned previously.

We just kind of did as best we could, beginning to look at that. We have $5,500 per year expended on people with activity limitations that report an activity limitation; $1,200 expended on those without activity limitations. Now what this suggests is just probably a lot more questions. But the point is that functional status, something is going on. And it is a way to frame data, and it is a way to frame I think some of the questions that we need to look at.

From our perspective, we would be interested in knowing if that $5,500 for example is focused on the primary disabling condition, or in fact if some of that is related to secondary medical or other conditions that in fact are preventable. So that that $5,500 could be reduced with appropriate public health intervention.

On the survey I was referring to, and different questions are used for children and adults to get to that differentiation, clinical experience almost certainly indicates the need for different instruments to assess functional status and personal activities. But in fact those are routinely seen as parallel measures, not strange concoctions of different kinds of things.

You may look at ADLs different in children and adults, but we all know what those activities are, and how we frame them from a developmental standpoint or whatever, it seems to me that's not an insurmountable challenge to deal with the differences.

And as I said, there are already numerous standardized instruments basically around the different professional disciplines. In communication issues you're going to see a lot of speech and language instruments. In mobility you're going to see the physical therapist involved with a lot of routine activities, and you're going to have occupational therapy. Psychology is involved with learning and behavior.

It's not as if those instruments from a clinical standpoint aren't available. And it's not as if because they are clinical instruments, it seems to me that they should be -- it could be understood that that data gleaned from those instruments can in some ways be standardized and coded for the public health good, as well as for numerous clinical purposes.

How the organization, number three, decides what data should be collected, and if that process was used to decide on functional assessment? In addition, are there other existing data sources?

Based on our mission to promote health and prevent secondary conditions among people with disabilities, functioning and limitations of personal activities is the most plausible way to collect information about people who share common characteristics. That is, across diagnostic categories again.

When you are involved in rehab or acute care, diagnosis is the primary way of collecting information. Once you move out of the rehab setting and into the rest of the real world, there are many more commonalities than there are differences among folks with varied diagnostic categories. We know that.

Whether you have a mobility problem because of spinal cord injury or multiple sclerosis or spina bifida, the issues of pressure sores, et cetera that are common across those things have in common not only the pressure sore, but in fact the mobility problem, as opposed to just looking at the diagnostic category.

The heart of the matter is that ICD codes address common etiology, but that if we are going to assess characteristics of people, we want to know something more about what is going to contribute to their health and well being. We believe that the knowledge of the limitations in personal activities will provide the most useful additional information to understand the individual, as well as aggregated information describing health needs and possible resources required, which is another use for the information.

The data analysis from already existing administrative data is sparse. We have talked to the Rehabilitation Services Administration, but they don't look at functional status per se in their evaluations. It is in the individual physician- or provider-based evaluation at some level. And there are some questions to be answered on the basis of that, but that information systematically is not asked for.

Social Security doesn't officially collect functional data, but the listings in Social Security certainly are imbued with functional status information as a core. But again, it's not collected in that way.

Also, there is a mixing of constructs. One of the things that we tried to separate is the participation in societal activities like work, like travel, et cetera, and to separate those from the things that happen at the person level. Can I move around? Can I communicate? Can I learn, et cetera?

Also the issues of environmental barriers are much more involved once you move toward the social participation. If you don't have a ramp, that's a whole different level of environmental barrier than if you don't have a wheelchair, for example.

I think it is possible at the person level to measure -- well, it's not only possible, it's done all the time -- to look at a person's performance of activities moving around without the assistance and with the assistance. That's not a difficult thing usually to do either. But it also indicates whether or not the environment is being supportive of the needs at the person level. And we can collect that information administratively.

In terms of assessment, I have come down in this presentation on the side of using the professional, and looking at activity limitations, and using the professional as the way to do that from an administrative standpoint, because my notion is that probably the most expeditious way to address this is through the HCFA-1500, through the patient encounter forms, and as a field in those forms.

Certainly, one could get that data even for inclusion on the 1500 from the person themselves. It could be done just as ethnicity and other things are gleaned from patient questionnaires. So it's not impossible to do that. And of course as we heard this morning, that brings problems, whether you get it from the practitioner who is evaluating or the person.

Quality of care and quality assurance, question four, are crucial. But our outcomes are -- we're going to look not only at medical outcomes and physical outcomes, but we are also concerned about well being as the World Health Organization talks about it. That includes family issues, and social and emotional concerns. And also participation in work and community activities.

Standardization of the framework is important first. I think purposes may be different, but if you are going to address functional limitations, or activity limitations, there has to be some kind of common framework. We can't continue to revel in our diversity, and that's what we have done. We just love it, how much everybody can do whatever they want to do. But in fact ICD has some flexibility, but darn it, there are some things that say this is the way we need to do it. And that may require a lot of work, but we really need to move in that direction, it seems to me.

So standardization of the framework is important, and I think that will help then with the standardization of terms and case definitions. Then you start moving toward instruments and standardization of instruments. Maybe there are differences in instrumentation, whether it is for children or adults. There may be some broader flexibility there.

And also there needs to be some notion of standardization of methods, which we got into this morning with the ICD codes.

Also the specificity that was related this morning. If you are talking about just saying you have a learning problem, and that's all you need for the purposes of public health. And that may be where differences come in. You may only need a broad definition at one level for some purposes.

For other purposes, if you are talking about clinical changes, you may need a three level of specificity -- a much more detailed level of specificity. What kind of learning problem? Is it a memory problem? Is it an association problem? Is it problem-solving? Is it generalization? The point is that's reasonable to look at, and again, it's not an overwhelming challenge to try to work on that.

Severity codes is also another problematic issue. It's something to be dealt with. But I think all of these things can be achieved by working with professional groups -- the OTs, PTs, nurses, psychologists, physicians -- to look at standard instruments.

Number five, logistical impediments. I have kind of started with that already, but of course there are going to be logistical impediments. If it was easy, it would have already been done. It wouldn't take subcommittees on populations. There are going to be numerous problems even assuming standard framework definitions and instruments.

As we said this morning, the training of professionals in coding activity limitation, severity, and assistance is going to be enormous. But I figured out that with my clinical experience I have completed about 35,000 HCFA forms in the course of my 25 years. It's probably more than that, but I was trying to be gentle. Those pediatricians in here have done a gazillion more.

This is not rocket science, friends. I know it requires some classification and work. Of course you can't write tongue in cheek, but even at a two digit level, the number of codes to learn is well within the cognitive abilities of health professionals. So we can learn this stuff.

It may take more time. It may take more money. But the question is, is health going to be improved? That's the bottom line. It is a cost issue. I don't have any questions about that, but if we lose track of the issue of the major value being the improved health of Americans, we're in big trouble.

My advice or recommendations. We do recommend the addition of a field in patient encounter forms, including allowing the inclusion of up to two activity limitations, including severity with and without assistance related to the diagnosis and the procedure codes during the encounter.

It seems to me that we could have that conditional field that folks were talking about that are for certain conditions, that are for certain primary diagnoses, or for certain CPT codes, it's required that you include the activity limitation if you wanted to start there. I know that Canada this year has already begun in its rehab settings, to collect functional status routinely in patient encounters. We're not even leading the field in that endeavor.

In addition to our contacts with professional organizations, we have also been in contact with Kaiser Permanente of Northern California about a pilot project to assess the potential for inclusion of activity limitation codes. We do recommend the acceptance of the ICIDH-2, the classification of functioning and disability that comes out of World Health, who is also the conservator of ICD. Marjorie Greenberg knows all about that.

And I would challenge the subcommittee to be willing to go beyond the problems to look seriously at how to achieve this inclusion. We certainly have to start with the professional and scientific issues. As we do that, I'm assuming we can then begin to address the political issues that were talked about this morning.

Our future plans include the development of tools to measure activity limitations, societal participation, and environmental barriers in surveys and on clinical records. We hope this information will be used to characterize health conditions and provide the framework for assessing health outcomes. I do see the functional status as more an input variable than an outcome variable.

We will continue to work with the United States DISTAT Group to work with more consistency across countries, and we are already doing that through Paul Placek and Gerry Hendershot's DISTAT project with the U.N. and four or five other countries around survey data. But we really want to try to make sure that there is again, some congruence here. So we'll continue to work with NCHS and HCFA, professional organizations, and insurers to ensure the continuity of data within the U.S.

We understand that moving toward inclusion on patient encounter forms is a major challenge. But we are just as clear that the current level of information doesn't do justice to the needs and health of people with limitations. Therefore, we need to find other options, and I would hope that this subcommittee will provide that leadership to meet the challenge.

Thank you.

DR. IEZZONI: Don, thank you for your presentation. Why don't we hold questions.

DR. STARFIELD: I have just a clarifying question. You mentioned that in your MEPS study you used six of the eight categories on page two. How many questions were used?

DR. LOLLAR: I think there were about 30. Dr. John Huff in our group did that analysis, and I walked off and left his preliminary stuff. I almost called him and said fax it to me, but I thought, no, they won't ask. I can get that for you later this afternoon. No, I can't, because people in Atlanta have the ice storm, and so nobody is in the office. Maybe tomorrow morning. I think there were about 30 or 35 questions there.

DR. IEZZONI: Okay. Nancy, thank you for coming. Can you just introduce yourself briefly, and let us hear your comments?

Agenda Item: Functional Assessment and Health Status: Lessons Learned - Nancy Whitelaw, Ph.D., The National Council on the Aging

DR. WHITELAW: My name is Nancy Whitelaw, and I am here I think primarily for work that I did, that I am no longer doing, which may be part of the story to tell. I was for nine years at the Henry Ford Health System in Detroit, which is a large vertically integrated system. There is a fact sheet -- not 35 of them, but some number of them -- about that organization.

We served about 800,000 people in southeast Michigan. We have a 500,000 member not-for-profit HMO, including a Medicare risk contract there. We operate a couple of nursing homes. We have a large home care agency. The usual sort of mix of programs and services in large integrated health systems.

When I was there my official title was the associate director of the center for health systems studies, which was a health services research center, where I oversaw demonstration programs related to the improvement of care for older people. I also, over the years that I was there, became increasingly involved in strategic planning, and in the operational side of the organization. And helped shape the development of our service package around the Medicare risk program, as well as other kinds of initiatives for the care of older adults.

So it's sort of with that perspective that I come here today. When I originally agreed to do this for October, I was just making the transition out of Henry Ford and into the National Council on the Aging. I have now, although I still spend 20 percent of my time on research projects at Ford, and keep an office there, I am very heavily involved in NCOA at this time.

As a consequence, I apologize if I did not write out more narrative here. But I am hoping to sort of guide you through some bulleted comments that I thought I could make, and then allow you to ask me what you want. This really is a lessons learned. I'm not bringing a lot of empirical evidence.

I guess in addition to what I have done at Henry Ford and in Henry Ford, is being a member of an array of sort of national committees investigating these issues. It turns out that when you actually work in a health system, there are not a lot of us I guess who think about these things at some sort of abstract level. And you get to serve in lots of different environments.

I have worked with NCQA and HCFA on the Health of Seniors measure, about which I have serious reservations, but nonetheless I continue to work with those groups. I was on the executive board of the National Chronic Care Consortium, which I would trust most of you would know about. I just finished work with the Institute of Medicine on a HCFA contract they had on the definition of serious and complex medical conditions. I'm on Pete Fox's HMO Work Group on Care Management, and there have been miscellaneous other things that I think give me some perspective about what other health delivery and health plan organizations are doing around the country.

My primary personal experience related to functional assessment deals with assessing seniors and older adults. I oversaw a John A. Hartford Foundation-funded initiative at the Henry Ford Health System called Complementary Geriatric Generalist Practice. And there is sort of a glossy little handout manual on how to do nurse practitioner/physician teams for geriatric care that came out of that program.

And in that kind of an initiative it's very clinically driven, and functional assessments were used primarily by the nurse practitioners in developing and implementing care plans. They were not protocol or guideline driven, although they did have some general guidelines, but as is done most of the time, individual practitioners made individual judgments about which assessments were needed and for what purposes to help guide clinical care.

The total assessments were available in the chart, although that is not always the case. That takes a whole getting the chart reviewed kind of process, and there is a lot of fuss about what goes in a chart. Notes on that were dictated in our electronic medical record to supplement what was put in the chart.

Because that was also a research project, we did measure functional status with a mail survey to about 1,000 frail elderly in both a treatment and a comparison group, and did that at baseline and 12 months later.

As part of our Medicare risk program -- we've had it at Henry Ford and Health Alliance Plan for about 13 or 14 years, but it was kind of dormant until about four years ago when we decided to be pretty aggressive. And at that time a number of us got together and sort of working on what the benefits and the model of care would be, et cetera.

We contracted with Geriatric Health Systems in San Francisco, Peter Udidia's(?) group, to do a baseline assessment of new enrollees, and then to provide us with the data. We got the raw data from the assessment, which included some functional status items, and it looks like basically your standard survey of older people. And he also had a way of sort of organizing it into kind of a one page summary sheet that was provided to clinicians.

We did that primarily for screening of high risk individuals at time of enrollment, but we also used that for strategic planning, and for the development of some specific program initiatives like a program for people with memory loss, or training clinicians on urinary incontinence or something.

Over time we replaced that with the PRA, the University of Minnesota sort of Chad Boult tool that we could administer by phone, and get the results much more quickly, although they are much more limited. At that point it was really primarily for high risk screening.

Also during this time I was asked to co-chair a systemwide committee on how to create a real continuum of care in a sort of functionally integrated health system. And we organized that around six teams. One of the teams focused on the issue of data and documentation across the continuum. I resonated a lot to what Mary said when she was talking earlier, and I'll kind of revisit that.

But suffice it to say that in a place like Henry Ford, which I think is probably at sort of the upper end of organization around America's health care, there is no organized strategy for standardizing data collection of any kind really across the organization, much less the collection of functional status information.

I have listed under III experiences with which I was less involved, but kind of represent the array of ways in which functional status information might get collected in a place like Henry Ford. Certainly our home care agency, and I never can say that without quoting when I spoke with our vice president of home health, and we have a pretty large home care agency, and I talked with him several months ago, sort of getting his latest updates about the impact of BBA on his organization, which has been pretty severe.

But one of the things he pointed out to me at that time was that the BBA generated a million extra sheets of paper for his home care agency. I think that it's easy to say a little functional assessment here and there, and who will notice, a million sheets of paper is kind of a lot of paper that people who get paid quite well, and whose skills are supposed to be devoted around the improvement of health outcomes for their patients are spending an inordinate amount of time filling out paperwork, with almost nobody paying any attention to the quality of that, or the standardization of it, let alone its utility.

Data collected in rehab and our nursing homes for clinical, regulatory, and outcomes purposes, the regulatory and/or accreditation bodies that influence all these different parts of the health system rarely converse with one another about what they would like to have gathered. As a consequence, when you are in the delivery side, this looks like a crazy place.

We are filling out repeated forms on the same individual, essentially going after the same kinds of information, but with the minor variation here, there, and everywhere, because whatever you filled out to discharge a person from a hospital can't possibly be the same form that gets them admitted to a nursing home or a home care program. And this is just unbelievably absurd when you are actually out trying to deliver care.

We have a variety of very focused sort of research or demonstration or clinical improvement efforts. Orthopedics does pre- and post- functional assessments around joint replacement. We've got a surgical low back pain program, asthma, diabetes. And there are many more. This is just to highlight the tremendous variety, all of these people picking their own tools based upon the particular needs of their program, which is not a bad reason to pick a tool, but it does make it very difficult to reach any kind of standardization.

Sort of addressing not very specifically, but kind of generally some of the questions that were posed in the letter I received. No, there is no organization-wide approach, and I doubt there is in any delivery system in this country. Kathy could also speak to that, but certainly the ones I'm acquainted with, there would be no standard approach.

It is so tumultuous out there. We are acquiring and dropping hospitals, medical groups, home care almost daily, hospice in, hospice out. We can barely standardize like the payroll system or the benefits system, let alone the data collection system. We have different financial accounting systems in our multiple hospitals that we're trying to standardize.

The things that are kind of really fundamental to our survival, to try to do this across all these organizations, and you just get started, and then you close one down and you acquire a different one. This is not a static world out there in the delivery system. It makes it very hard to do these kinds of things.

The selection of instruments is largely done, as I said before, by each project or each director or each administrator based on whatever kind of issue they are bringing to the table at the moment.

What are the issues that we have confronted? Well, I think there are certainly issues around pragmatics and methods, but I put them second, because I think that much more fundamental are the issues around the value of doing this. And to recognize that the gathering of data, and it's sort of my last bullet under value, but I'll say it first, the gathering of the data, as expensive and difficult as that is, is probably the cheapest part of this venture.

Using the data in any meaningful way is incredibly expensive. Relative to our baseline assessments of enrollees in our Medicare risk program, we ultimately abandoned the GHS and have all but abandoned the PRA. Last year we lost $60 million. We are due to take about a $200 million financial hit due to the balanced budget amendment over a five year period of time and Medicaid managed care.

Over the nine years that I was at Henry Ford, we built about 12 programs of sort of model care for older people in a variety of different settings and sectors, including case management, team-based geriatrics, and inpatient geriatric service, a PACE program, et cetera. There are two of those left. We have closed them all down in the last two years due to the financial losses of the organization.

One of the things that we had was primary care-based case management, where the data that was being gathered through the baseline assessment was given to the case managers, who were then expected through some miracle, to know something useful to do with it. One of the things we discovered along that line is that you don't just go and hire case managers and send out functional assessments to them, and they like know what to do.

It takes a tremendous amount of training and supervision, and ongoing management of them to do something meaningful with functional assessment data. I think it's highly unlikely that the vast majority of the nation's physicians would know anything useful really to do with functional assessment data. And it would take a great deal of effort on our part I think to make that happen.

So from the point of view of trying to do something that is helpful at the clinical level, and that wouldn't be the only reason to do this, but in terms of either strategic planning or clinical delivery in the nation's health or health plans I think it is important to recognize how expensive, and how culturally transforming it would be out there to get these kind of data, to really drive the care process.

I will say that another sort of area that I care a great deal about is the development of the continuum of care. And I think that particularly those patients, members, people who we think would most benefit from good functional assessment are people who are most likely to use multiple sites of care. And as a consequence, thinking about the way these data play out across all settings of care, and what their use would be, and how they correspond across the settings of care, I think is really critical.

One of the issues that I am sort of challenging the Health of Seniors team on is what is the documented evidence about the relationship between medical care -- and I will say at the moment sort of reimbursable medical care -- and functional outcomes? There are a lot of factors, particularly around older people, that contribute to their functional abilities and functional deficits.

And some fraction of that, I think a not very well known fraction, some fraction has to do with medical care, and with the things that we can hold doctors and nurses accountable for. But a pretty large fraction of it has to do with all kinds of other things.

And my concern with the Health of Seniors measure, and it would be my concern around any functional status measure that is really going to be used to judge the quality of care, is that we have a much clearer idea than we do presently about exactly what aspects of function the medical care system per se should be held accountable for. And I don't think it's all aspects of function unless we are really going to change the nature of delivery.

I have said before I think providers are really ill-equipped to address functional status issues. There are conflicting measures required by outside agencies and purchasers. General Motors feels total freedom to demand that Health Alliance Plan, and therefore the Ford Medical Group collect whatever information they want, regardless of what they want or what anybody else may want. They are a purchaser. They can make these same demands, and they do.

The difficulty and perhaps the impossibility of using a single instrument for strategic planning, risk adjustment, screening, care plan development, performance measurement, et cetera. Again, I think in the Health of Seniors measure there is a hope that the primary purpose of this may be to sort of audit the performance of health plans. That it could also be used to develop a risk adjustment system, and it could also be used for health plans to do quality improvement work.

I think there is almost no likelihood that it will be used by health plans to do quality improvement work. It simply isn't structured in that kind of way. We don't even get to know who the people are that are being interviewed, and have not any ability to relate those data to the utilization information that we keep ourselves.

So I think we should not kid ourselves that it's really easy to develop functional assessment measures that will just be broadly useful. That needs to be thought through.

Some more points in the disconnect between data gathered for external audit uses and internal quality improvement. Attention to functional status information and the gathering of functional status data are unrewarded and unreimbursed. And that's why I think you can certainly add the digits to a standard HCFA form, but what drives people to fill that form out is payment.

And what drives them to do it pretty accurately is payment. And we know for example in the Kaiser system where for years payment didn't have anything to do with the completion of these forms, that they didn't fill them out, and they weren't all that accurate. And now that they are actually using these forms to do all kinds of strategic planning and moving towards a payment mechanism, they are getting much more assertive about that.

But I think that basically what gets paperwork filled out with any accuracy at all is the potential for reimbursement. And I don't think that has been connected in with the functional status piece.

Issues related to method. Non-response has certainly been an issue for us, certainly at time one, but also in longitudinal measurement for the populations we are most concerned about. People are frequently lost to follow-up either due to changing settings, or increasing disability, or mortality.

And I basically disagree with the Health of Seniors measurement strategy to rate people who die during the process as having the lowest functional status at time two. I think it sort of misses the point, and Joan Lund(?) among others I think would be appalled. This is like the whole quality of death. But that is currently the measurement strategy for the Health of Seniors measure. So people who die will have the lowest functional status scores at time two measurement.

Timing relative to the intervention. I think that another concern I have with the Health of Seniors measure is that measuring a random group of people over a two year time frame, or any kind of time frame, I don't think we have any reason to believe that health is anything that by and large functional status measures I think should be linked to specific interventions, or specific time frames of deficit or something.

Which means that it's very hard to do. If somebody has got a joint replacement, you do it at this time and at this time, but if they are diabetic, then you might do it at this time and some other time. And if they are entering a nursing home for rehab, you might do it at this time and that time. And you can't just do this at two year intervals and expect I think to get powerful information.

Training and consistency across data collected in clinical settings would take a phenomenal amount of work. And it would not be at all standardized to begin with. We discovered that in people who are just working with us every day on projects, and interacting with one another. And it drifts over time, and you sort of have to pull it back.

Obviously it's very difficult around cognitively impaired populations. The letter raised the issue of proxy respondents. I did my doctoral dissertation on the measurement of health status. One of the issues I was looking at was proxy and non-proxy data. That was a number of years ago, but I presume it's the same now. But basically, proxies seemed to rate people as sicker than people rate themselves, to the extent that you can kind of know that.

The measures that are sensitive to different parts of disability, and the causes of disability, and the way those change over time. And then certainly issues around confidentiality.

So all of the sort of nine years of wallowing leads me to make five suggestions here. One is I don't think the delivery system world is ready for functional assessment measurement where there is one solution that fits all, where 'all' refers to all uses, and all settings, and all populations, and all methods.

I would recommend that for the purpose of population audit or quality assurance we focus first on the most vulnerable populations, and not just do something on a kind of a mass basis; persons who would be most likely to benefit from attention to functional needs, versus people who would be vulnerable. And this I think is sort of a key point, persons for whom functional assessment data will add value beyond more readily available outcome data.

If a change score in function is highly correlated with a change in diagnosis or something that we already gather, I know it would be lovely to have, but I don't think we really added a valuable step here. And I don't think we've got the research base that tells us when the functional status data really tells us something new that we wouldn't have gotten without it, and for which people, and under which circumstances.

Persons for whom function-related interventions are a covered benefit. And then to stick with the limited set of relevant and sensitive measures.

For individual care planning and treatment purposes I think there we need to focus on assessment in the larger stream of a broad care management strategy. And this is what we talked about in the Institute of Medicine report that I presume you can get, and I brought one copy of. But basically, assessment without an approach to case finding and screening and the conversion of assessment to a care plan, and the conversion of a care plan to a set of interventions, and a monitoring of those interventions is purposeless assessment.

We do way more assessment than we have the energy and the resources and the knowledge to actually drive through to an improved quality outcome for people. In that sense, the clinical level, assessment is just one of about seven steps, and all seven steps need to be in place for assessment to have value.

We need to train providers to use functional assessment data. I have been the PI on a geriatric interdisciplinary team training program at Henry Ford for the last four years, where we have been trying to train clinicians to work as interdisciplinary teams. This is no easy thing to do. You don't just take three or four disciplines and declare that you are getting interdisciplinary work done.

In the use of the functional assessments by these interdisciplinary teams and their ability to really bring this kind of information to bear on their clinical practice takes a lot of time.

Developing evidence-based guidelines and protocols that document the link between specific assessment and intervention and outcome. Selecting measures that are clinically relevant and sensitive to change over a known period of time.

I would recommend that we develop an approach to functional assessment that spans the continuum of care. I think that's where the greatest need is at present. And I guess I would recommend that if this work is going to go forward, that the integrated delivery systems across the nation be heavily involved in it.

And involved to the extent that they are pre-testing and working with these data, not only to find out the value of the data, but to better understand how these processes actually get implemented in these organizations. So that before it is mandated on a nation-wide basis, we have some understanding about what will actually happen to it when it is put into practice.

DR. IEZZONI: Nancy, thank you. That was very, very informative. And, Don, thank you too.

Committee members, any questions for Dr. Whitelaw and Dr. Lollar?

MR. HANDLER: First, Dr. Lollar, you mentioned that you would like to have functional assessment information collected using the HCFA-1500 form or a patient encounter form. Now the Current Population Survey conducted by the Census Bureau each month collects information in a standardized way on employment and unemployment. It's required that that be done. Now they do have supplements on a month-by-month basis where questions are added, questions are dropped off. The same household is seen on a recurring basis. There are some new households going into the program, older ones dropping out, that type of thing.

Would you think that possibly using a HCFA-1500 form limiting functional assessment questions to only certain types of functional assessments, not the whole range of everything; there is just too much out there. But let's say for one year certain functional assessments could be identified on a HCFA form for one year. And you hone in on certain things in that one year. And then when the next year comes you don't use those forms. You use a different set of forms for the next year. Would you be agreeable to that?

DR. LOLLAR: That is a certain kind of data. I mean the survey methodology -- I'm not a survey methodologist.

MR. HANDLER: No, this wouldn't be a survey. This would be using a HCFA-1500 form.

DR. LOLLAR: Oh, you're talking about using the HCFA-1500 form?

MR. HANDLER: Yes, but only for certain types of disabilities, and do that for a whole year. And then after that year goes by, go to some other kinds of disabilities, just to limit what you are looking at. There is just too much out there to get all at once is what I'm thinking.

DR. IEZZONI: Could I just interrupt there? That's a practical question. I think before we kind of talk about mixing and matching from year-to-year, we have to kind of figure out what we would do in year one. We need to think very strongly about that.

Paul, you had a question for us?

DR. NEWACHECK: Thanks to both of you for your presentations. I had a couple of big questions that I wanted to ask, and I think I kind of get a sense of the answers from your presentations, but I want to ask anyway, to help us on the committee. As Lisa pointed out, this is a new initiative of the National Committee on Vital and Health Statistics, and we have set aside time for meetings in the future. But we are obviously hearing some concerns, as well as some positive statements about this enterprise.

I'm wondering if the two of you could comment on whether you think this initiative to consider adding functional status as an item to the administrative data systems is a worthwhile effort. That is, is this a good investment of our time? And also if so, should the focus be in thinking about the purpose of that administrative data for payment systems, purposes, that is risk adjustment or payment levels? Should it be for quality improvement? Should it be for statistical epidemiologic purposes, like for example Barbara was indicating reporting on 2010 objectives? Or some combination of those things? I know that's a big set of questions.

DR. WHITELAW: Well, I probably have fairly strong reservations. I guess that won't surprise you from this. I should say as a caveat, I have been working for and on behalf of older people since 1971, so obviously issues of functional status are sort of my life's work here.

But what is important to me, and I think what my career has been about is how to help well meaning and solid providers do a better job of serving populations. That's not the only reason to exist in the world. It's just what I've been doing.

So I come at this from the perspective about is there anything about what might happen here that would help providers and older people get better outcomes in that sort of dyadic or team-based approach. I think that attention to functional status could lead to better outcomes, but it will not get there by a mandatory requirement of the coding in an administrative database, because they don't have the tools to use the information effectively, let alone I might say at the moment, the time and/or the other disciplines around them.

We just laid off our ambulatory social workers. Our physicians are doing 10 and 15 minute visits. All that stuff you read is true. And they do not have the skills to do this. They don't have the time to do it. I don't think they should do it. I think other disciplines are better able to do much of this.

You take something like medications, which we know have a big contribution to functional status, and you can't get reimbursed for having a pharmacist review the medications for people on six or eight or nine or more drugs. If you're not going to get reimbursed for that, it's not going to happen.

And so I mean kind of my challenge is if HCFA and large purchasers want things done, they need to figure out ways to help it get paid for. So I would urge looking at that delivery system issue. And again, I would urge looking at either through the HMO Research Network, the Chronic Care Consortium, a lot of other ways in which one can access a group of providers who I think would be willing to work with you on trying to solve this, but it would be much more incremental.

DR. LOLLAR: I think the medical care reimbursement system in this country is evidently under sufficient siege that this little subcommittee on populations is probably not going to address all those questions. But it feels like that all of those questions are getting into the mix of whether or not you even look at this.

And certainly that is down the road of what you have to do. But in fact, this is not an idea that's been tried and found wanting, it seems to me. The notion of trying to look at the value of adding functional status is something that really needs to be explored, I believe, from a professional and scientific standpoint, realizing the delicacy of the cost, et cetera.

I do think it's a longer-term issue. If you assume those two bits of data that I suggested, on the one hand if you said well there is only 20 percent of the folks at best that have an activity limitation, so why the heck do you want to include it for everybody? Well, maybe it needs to be conditioned on specific kinds of diagnoses or whatever.

But in fact there are folks in the medical field who spend their time or much of their time -- podiatry, et cetera, spend a lot of their time dealing with these issues. They don't code them in that way, but they are dealing with them. I do think that if you required functional status to be included if you have certain diagnoses or certain procedures on that HCFA form, I would bet you $50 you would get it; if you didn't include it, you wouldn't get paid.

You can use a hammered-looking carrot. I guess all I'm saying is just to bring this issue, to begin to look at it, it's way too soon to off-handed, because of the problems, whatever they are, to not go farther and really explore those. I think there is too much potential fruitfulness in that.

And again, we're just a dinky little part of CDC, but we're going to put money into this. I know the big folks may not want to do that, but darn it, if it's important, we are going to do what we can. We would love to have other folks participate in the pilot, in the struggles. It's not easy stuff, any more than trying to make the health care system in this country work. But it is something that is worth doing, it seems to me. I would encourage the committee to think hard before you decide it's not worth even pursuing.

DR. IEZZONI: Again, remembering what I said at the beginning, that we might decide that it's not worth going forward with. And we might decide tomorrow that we -- we might moderate our work plan for this.

Can I just ask one quick question before we move to the next panel, and that is what is the biggest barrier to having a conceptual model of the data that should be collected across the continuum of care? I mean it's obvious that HCFA has these little fiefdoms that are doing things in different ways. Is there a conceptual barrier, or is it a fiefdom type of barrier?

DR. WHITELAW: When we had a subcommittee that worked on this inside Ford, which is full of its own fiefdoms, I think we could have overcome the conceptual issues. I mean the subcommittee actually had representatives from rehab and the health plan and the nursing homes and everybody else, and basically came up with a recommendation of one, a standard format; and that it was sort of a set of progressive forms where you had a kind of standard database on all people that would be in this targeted population group.

And then there would be other forms based upon diagnostic or treatment or other pieces, but they would have a common look to them. I think the message that was persuasive with us was that almost every form that is created is designed to be helpful to the people that are filling it out, and not to the next stage people who may read it.

And we were able to get a fair amount of consensus about the need for change when we could get both sides talking to each other. So that as soon as the hospital discharge planners got a better understanding about what people who were going to be receiving this person needed to read, they were better able to think about modifying their discharge forms because they would see both sides.

But I think that's the kind of work that needs to be done. And to understand what are the care issues that all these different settings and people are involved in, and to build a system that has some core standardization to it, and then has pieces added on through a kind of approval process.

DR. IEZZONI: Well, thank you. We have learned a lot from each of you. And we very much thank you for coming. Don, I guess you'll be here tomorrow.

DR. WHITELAW: I am two blocks away, if there is any reason to need me.

DR. IEZZONI: Nancy, you've been extremely helpful. You've been great. Thank you. So Don, we'll see you tomorrow.

The next panel, also we're really looking forward to hearing from you. Could I ask that you try to limit your comments to 15 minutes so we can have some time for questions and then we'll take a break.

Gretchen, introduce yourself.

Agenda Item: Functional Assessment: Risk Adjustment and Rehabilitation Focus - Gretchen Swanson, D.P.T., M.P.H., Western University of Health Sciences

DR. SWANSON: Hi, I'm Gretchen Swanson. Don had said when he spoke earlier, Don Lollar, that he had been in the government for four years, and made some remark about just becoming Power Point dependent. I haven't been in the government, and I have become Power Point dependent. So I'm thrilled that you have this. You have a handout that reflects each of the slides.

I believe that I have been asked to participate in these hearings because of one part of the discussion that has emerged. And that is the use of function in the adjudication process. That's an experience that I bring to the table, as it were.

I spoke to the larger group committee in 1993, I believe.

DR. IEZZONI: I think it was 1996 or 1997.

DR. SWANSON: No, and then I spoke in 1996 or 1997. This is the third time I have chatted about this topic. Some of the experiences haven't changed. Some of the experiences have remained the same. So hopefully, they will prove worthwhile.

The way I see this situation, and sort of to introduce my perspective on this is that I think our health care system is a health care system, and that we need health data. The bottom line with that is that we are using correlate data. And for maybe 10 percent of the population that receives services from our health care system, this becomes problematic.

In my view of the world, functional data is a good source of a person's health status. So this is sort of a basic idea set, which I think obviously would be challenged in the room, but I think that's the kind of thing that should be made explicit in a discussion like this.

How I have gotten here, or why I keep coming back here maybe is a better sort of point is that in starting out in consulting and rehabilitation services in 1983, rehab programs were becoming in vogue. They became popularized. Traumatic brain injury programs, for example, became quite popular in the eighties, and depending upon what was being reimbursed. So programs would be packaged, and a delivery system, as was just described, would evolve, because there was a stream of dollars to support it.

And then as we got into the nineties, rehab programs were not as apparently effective as one thought, and so they were being questioned, and Dr. Stineman will discuss that.

But in the discovery of why a person would be a candidate for a rehabilitation program it was clear that the person receiving the services, their needs were second or third to what was driving the delivery system. And there is a fair amount of support of that. And I think that brings back sort of the idea of function, and where you get that functional data from. The patient becomes a huge source of this information, an incredible source of this information.

But what I want to talk about is the part of the adjudication process that is based on justification: is the service warranted for this condition set, whatever that condition set is? And the processes that have evolved from the eighties and now into this new century many times produce obstacles to care, as many of us who are on the provider side know, to the very things that we hope to accomplish.

So there are internal measures between the health plan and the providers themselves that are problematic, and we need to address that. And that's some of what the last speaker spoke of.

There is a concept that is used throughout the industry called medical necessity or treatment is medically necessary that I think is very important to bring up in this discussion, because it's come through a lot of traditions. Once a patient, a person, a real person is in a provider environment, and the provider would like to do something that is sort of outside the norm, that provider is required to substantiate that this request is medically necessary.

And that medical necessity issue becomes a part of our mind set as to what then is justifiable at a policy or a larger level. So I want to spend a little bit of time on that, because I think a functional perspective may or should or is altering that idea of medical necessity.

Finally, sort of why people keep asking about these data sets that have function in them, or look at some outcome derivative is that the clinical reasoning process that we are very comfortable with, and like to support in our health care system many times is about justifying the treatment, and not really reflecting on the effect of the treatment, and therefore the results.

So that idea of the cycling of information that the last speaker spoke of is critical to making use of functional data if you should request it. That's quite essential. Functional data can be a part of the candidacy process. A person is a candidate for care. But it also in my belief should be a part of the results process. That's where the feedback into the value of the information allows us to evolve as a health care system.

So the way I am interpreting this day and a half of discussions is to make a connection between function and risk, and I'm going to also talk about medical necessity.

So some concepts that I think are helpful in relating this idea of a functional state and risk, these are some of them. There is a definition by Donald Patrick that I find quite useful, and that's where health is defined as "a relationship between current and future function."

It's hard to really grasp this idea of health and a person's health status. It's more than the absence of illness. And that's many subcommittee meetings on what health is. But there is a lot in the literature to support the idea that function is an excellent measure to determine a person's health status.

And the reason I include Patrick's definition is that what I think function does is add a temporal variable into the data set that I don't believe that we have when we look at administrative data sets. We have illness variables. We have utilization variables. But function is a variable, and I'll show an example of it in just a couple of minutes, where you can look at the future or look at potential in a way that we may not have traditionally done from a medical illness perspective. So I just want to put that out there as an idea for us as a group.

There are also some functional state concepts that are very useful from a risk point of view. One is efficacy and prevention. If we really are going to move to a secondary prevention model of health care in the United States, then we really have to have a way to reduce the potential for future loss of function. And particularly in this 10 or 15 percent of the population that uses most of our health care dollars, they are most vulnerable to loss of function.

So clearly we need data to allow us to screen those people in properly, manage them properly, and then evaluate the results. So efficacy is a critical component of managing risk within the health care delivery system.

Another concept related to risk is optimizing activity. This may relate more to the ideas of healthy aging. Many of you represent senior programs here. Or the restructuring of rehabilitation programs as they have existed in the United States.

The idea is to look at the person from a positive point of view, who has had a variety of health conditions, multiple variables that may be causing them to consider a life-style that requires ongoing caregiver support or institutionalization or that sort of thing. So there are demonstration projects that look at optimizing activity. Again, this allows for reducing future risk.

And finally, participation, which is at the societal level, which may not inherently be a part of the discussion for this group, but I think is a part of a larger social context and reason for risk control within the health care system. So those are three concepts that I think relate function to risk.

Medical necessity, and many of you are probably quite familiar with it and sort of breathe it, is a way to justify treatment. The term has implied that given a specific medical condition, a certain type of treatment is warranted. And so in the electronic claims processing business it allows for a clean claims process to go through if you have the right codes in place, ICD codes and CPT codes. And the world is good when that all works in the right direction.

But I think the term is under renovation, and I don't know that there is a collective group that is discussing it. I think it's just been under change out in the field, out in the hinterland. Like where I come from in California, we are just sort of modeling it as we go. That may be something that this subcommittee would like to provide some input on.

But I believe that the term is now moving towards this justification, medical necessity, to this idea that there should be an improvement expected as a result of care. It's not simply because we have always done it this way. It's because we actually expect a result. And if that is in fact the case, then we need a way to monitor that expected improvement. And that's what I think is missing within our current administrative data set.

Risk is managed in many places. This is not a comprehensive list. There are two major places that risk is managed: at the policy level, where coverage decisions are made in a global way, big picture way; and then at the local level, where those policies are interpreted, which many of you are familiar with.

What I would like to do is focus on that local level, because that is where functional data has been used in the adjudication process, and the retrospective process to control risk. We can call it case management. And sort of retrospective or peer review would be occurring at the local level.

And it is a place where functional data or expected outcome is being monitored. As an example, a gentleman this morning was talking about what would have to occur to get a new variable on a form. We could put a new variable on a claims form, or we could also consider a claims attachment form.

And I think that that is something for the committee to look at for a possible source, because currently Medicare uses for the UB-92, a claims attachment form that has lots of functional data that comes in on it. It's called the 700 claims attachment form. Some of you may know of it. This is an excellent tool that could benefit by having a standardized nomenclature, and would reduce the cost, as well as the risk at the local level.

I have run a couple of focus groups with payer organizations in the last two and a half years. And overwhelmingly they would like functional data access. They would like it though, with certain conditions attached to it. This is what they have asked for: it needs to be applicable to both prevention and rehabilitation, mostly in a secondary preventive way.

And it should be federally agreed upon. I didn't ask them, because I didn't know I was going to be coming to this group, but really they are looking for a group to say this is the way functional data ought to occur. There should be some federally mandated control over it, not necessarily federally mandated to exist, but the shape of it, the nature of it should be agreed upon in a national way.

And this variable can be applicable across payer type. So if the person goes from a commercial carrier to workers compensation to a federal program, these variables are consistent, so that we are not into different types of measures and criteria.

And then it can be used both in adjudication and quality of care. So the access to, as well as the evaluation of the service rendered.

Some examples of how data might look, and these were examples that I have provided to payer focus groups for them to look at the utility. Just to give you a sense of some specifics, some of the feel of it, I'm familiar with ICIDH, so I'm using that as an example, but we could discuss other alternatives. But if you have someone with hypertension that is difficult to control, I believe in the current data set we would get a single ICD indicator for this.

But if we looked at it from a functional point of view, a functional perspective for this person, we might want to add on additional information that could tell us one, the difficulty in managing this person; add to this notion of medical necessity, which is now simply based on the ICD; and third, there is a number after the digit in this ICIDH code, a .2, .8, which is a qualifier that looks at the level of impairment, or level of limitation that this functional variable has, and optimally would change over time with intervention.

So rather than this person getting a single item out of this rich sort of phrase of what's going on, you could actually get multiple items that have a tremendous utility. It actually reduces the amount of time spent on documentation, because it provides, at least from an ICIDH point of view, a conforming language for people to report.
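As a minimal sketch of the idea described above, a single ICD indicator can be paired with several functional items, each carrying a qualifier digit after the decimal point that may change over time with intervention. Everything in this sketch is an assumption made for illustration: the class names, the placeholder codes ("F001", "ICD-PLACEHOLDER"), and the convention that a lower qualifier means less limitation. None of these are real ICIDH or ICD codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FunctionalItem:
    code: str       # placeholder functional code, NOT a real ICIDH code
    qualifier: int  # digit after the decimal point: level of impairment or limitation

    def as_text(self) -> str:
        # Render in the "code.qualifier" shape described in the testimony.
        return f"{self.code}.{self.qualifier}"


@dataclass
class ClaimRecord:
    icd_code: str  # the single diagnosis indicator (placeholder value here)
    functional_items: List[FunctionalItem] = field(default_factory=list)


def qualifier_change(before: ClaimRecord, after: ClaimRecord) -> Dict[str, int]:
    """Per-code change in qualifier (after minus before) for codes present in both records."""
    earlier = {item.code: item.qualifier for item in before.functional_items}
    return {
        item.code: item.qualifier - earlier[item.code]
        for item in after.functional_items
        if item.code in earlier
    }


# Hypothetical records for one person at two points in time.
first_visit = ClaimRecord(
    "ICD-PLACEHOLDER",
    [FunctionalItem("F001", 3), FunctionalItem("F002", 2)],
)
later_visit = ClaimRecord(
    "ICD-PLACEHOLDER",
    [FunctionalItem("F001", 2), FunctionalItem("F002", 2)],
)

change = qualifier_change(first_visit, later_visit)
print(first_visit.functional_items[0].as_text(), change)
```

Under the assumed convention, a negative change would indicate improvement on that functional item, which is the kind of expected result the revised notion of medical necessity would need to monitor.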

I took a couple that I thought might be of general interest. Here is a diabetic condition. Again, you could have one of these ICDs, but then you would have the related functional derivatives that would cause this person, again in the small part of the overall population, but a high cost patient, to be properly managed.

Another example, this is in the elderly population. This is a nursing home type of condition where again, a single ICD with multiple functional variables helps describe the nature and the quality of the care that ought to be driven.

I think it was a bit much for me to suggest steps given the level of discussion we had. But I do think that payoff is important for everyone in the functional picture, from the providers, who have to collect the data, to the patients, who should be the recipients of service. And that should be a part of the presentation when the value and the business plan of adding a functional variable is put forward.

DR. IEZZONI: Gretchen, thank you. That was very helpful. It was nice to actually see ICIDH codes, and get a sense of what they say, what they look like.

Jinnet Fowles, welcome. We're glad to have you from Minnesota. So can you just introduce yourself?

Agenda Item: Functional Assessment: Risk Adjustment and Rehabilitation Focus - Jinnet Fowles, Ph.D., HealthSystem, Minnesota

DR. FOWLES: My name is Jinnet Fowles. I was actually here speaking to the subcommittee before, not too long ago, on race, and that was easier. Some people are Power Point dependent. I'm FedEx dependent, and no FedEx package, there you go. But there will be, I'm sure. I have 35 packets of information for you, so you don't have to take notes on what I'm going to say.

I'm a health services researcher and manage a research organization that is affiliated with a vertically integrated delivery system. In shorthand you can think of us as Henry Ford Lite; same kind of activity, but smaller in scale, with no long-term care, no nursing home facility.

The story I'd like to tell you this afternoon about functional status measurement at HealthSystem, Minnesota has two separate threads. There is a quality improvement thread, and a risk adjustment for financial or payment thread. The first I'm going to be talking just as a reporter, and the second is my own research experience.

I'll note at the outset that I'm not limiting my comments to pure functional status as I understand it, but including HealthSystem, Minnesota's broader experience with health status measurement in general, because I think that the lessons that we learned may be instructive to the general problem that you are considering.

First, I'd like to review the history of HealthSystem, Minnesota's involvement in collecting health status information. The interest in routinely collecting these data sprang from individual clinical departments within the care delivery system partly in response to the outcomes movement galvanized by Paul Ellwood in 1988.

Our experience routinely collecting health status information began with the orthopedics department. This department began collecting patient evaluated health status measures on patients undergoing total hip arthroplasty for two reasons, because they wanted help in their daily practice with individual patients, and they wanted to be able to compare their patient outcomes with other people's patient outcomes.

The orthopedics department's relatively isolated interest in quality improvement initiatives related to these data was reinforced by a purchaser coalition. The Buyers Health Care Action Group is a purchasing coalition in the Twin Cities. The original contract that that purchasing coalition had in 1993 called for a demonstration of improved outcomes.

So building on the individual experience of orthopedics, three different participating orthopedic groups agreed to cooperate in a pilot demonstration comparing outcomes of total hip arthroplasty. The pilot was not constructed as a research study (I didn't have anything to do with it), but as a feasibility demonstration under general practice conditions.

The pilot was a failure for a number of reasons that may be instructive for understanding what can happen when routine data collection of this nature is laid on top of a non-academic practice. First, the project was undertaken without any consultation regarding necessary sample sizes for any potential question, so that after five years of data collection, there were insufficient numbers of complete cases to allow any kind of comparison to be made.

The primary problem, apart from just not understanding the patient volume, was the problem of missing data. It was endemic. In spite of a negotiated agreement to complete common items and forms, height and weight were missing more than three-quarters of the time for one site. For those of you who are familiar with orthopedics problems, it's a big confounding issue.

In another site the entire pre-operative form was missing from almost one-fifth of the patients. The one site with a continuous on-site project manager monitoring data collection had complete data. It speaks to the kind of effort that is required for complete data that is not part of the routine practice of care.

The third problem was that there was a variation in clinical practice related to follow-up visits, which led to a great variation in the intervals of the follow-up measures. The absence of a research question at the beginning led to forbiddingly long forms for physicians to complete, to say nothing of what they asked from the patients. In spite of the forms' lengths, however, there was no assurance that the appropriate confounding variables had been measured.

And then as a final blow, three years into the study one site decided to modify its forms, leading to non-comparable data.

Lessons drawn by the purchasers who had paid for this demonstration were that it was premature to routinely request and pay for this type of information in spite of the logical appeal going into the process.

I can only echo the things that Dr. Whitelaw had to say about the complexities of trying to collect this information in the current health care system. The leadership of our institution in the early nineties was absolutely committed to outcomes management. We set up a separate department within the institute to foster the development of outcomes measurement, and catered to individual departments that had demonstrated an interest.

We were fortunate to have unrestricted grants from the pharmaceutical industry to help us pursue this interest. From this experience clinical departments became more aware of the costs of data collection, and the limits of data reporting, including the limits of the technology available for both of those activities, and the usefulness then to them of the data collected.

In the ophthalmology department, encouraged by the support that the administration gave, they started to collect both the SF-36 and disease-specific functional status measures for patients undergoing cataract surgery. In spite of having standardized questionnaires available, they spent a considerable amount of time developing scannable forms, which was a technologic challenge at the time.

They collected data for two years, but never managed to develop a reliable mechanism for routinely providing feedback to physicians. The project died with the frustration of physicians, lack of continued external funding, and a change in clinical leadership.

All is not completely bleak. A contrasting tale comes from our inpatient rehabilitative medicine department of the hospital, which is part of this care delivery system now. Here the Functional Independence Measure, or FIM, is incorporated into routine care on admission and preparation for discharge. The department uses nationally standardized forms for monitoring individual patient status, discharge planning, and follow-up of patient status three to four months post-discharge.

However, the data are maintained separately, and in fact not compiled. So they are completely detached from any routine data collection. The costs of using these per patient types of functional status are absorbed as part of the routine cost of care. The department is aware of the availability of a comparative database, the Uniform Data System for Rehab Medicine, UDS, but has been unable to convince hospital administration that the utility of such comparative information is worth the cost of acquisition.

I would note that HSM is facing a shortfall of about $10-20 million for 1999, not enough to put it into receivership, but a major financial dilemma nonetheless.

What drives this organization to collect some kinds of data and not others? As Dr. Whitelaw pointed out, financial management, in particular billing, is critical in commanding resources. Business management is another driver, whether it's for regulation or accreditation or for marketing, which is an issue we haven't touched on yet today.

Clinical management, where the interest is usually on a per patient basis, so that access to things like progress notes or test results for a given patient are deemed more important than aggregate types of information.

In terms of quality improvement, the nature of the system tends to focus on selective data collection, rather than continuous data collection, except for issues of access, such as access by phone, or access for visits.

Annually, HSM sets strategic objectives from which flow the objectives for each department, so that in theory there is a system for prioritizing what kinds of data are collected. For the year 2000, just to give you an example of where this kind of care delivery system is going, there are four information management projects that include: a new billing system (are we surprised?); service-related information management products relative to monitoring phone access; a database for radiology, which is driven by the need to more closely manage the large resource utilization in that department; and fourth, a database for cardiology, deemed to be a core product for the institution.

And that comes to this marketing issue that I alluded to earlier. The cardiology department has been awarded the privilege of a niche system that includes functional status measures because of its marketing priority for the organization. At the moment, niche systems are generally out of favor with the information management division for many of the reasons that we can talk about.

The cardiology data system is not on-line yet, but the purchasing decision is being made. Such a system has been deemed critical for cardiology, not only because of its designation as a core product, but also because the care delivery is spread over many sites -- clinic, hospital, noninvasive lab, nuclear imaging. And patients are often seen at some or all of these sites, with current information being required for patient management.

The division administrator emphasized that the granting of this request was driven by the need for external benchmarks to help manage the activities of the department, and the anticipated demand for such information from purchasers -- not present yet.

I want to move from quality improvement, to talk a little about the use of health status information for risk adjusting financial payment. Here I move from being a reporter to my own activities as a researcher. Several years ago my colleagues and I conducted a study for what was then the Physician Payment Review Commission. In this research we compared the merits of several different potential risk adjustment mechanisms for adjusting capitated payment to providers using a data set that had been assembled by a local HMO.

The data set included the results of a health status survey, the SF-36 for a sample of adults 18-64, and a sample of the elderly 65 and older. In addition, we had claim information for respondents and non-respondents for the 12 months preceding the survey, and the 12 months following the survey.

From our analyses we concluded that risk adjustment based on clinical diagnoses, ICD-9s, predicted or explained expenditures better than self-reported health status, the SF-36, or self-reported medical conditions, in particular for skewed populations, which is the major concern when you are risk adjusting payment.

There are many limitations to this study, including the relatively small sample size, and the limited size of the population over 65. The study population did not include Medicaid recipients or children younger than 18. And in that mysterious packet there is a paper that presents these results.

In addition to the statistical performance of the models, we were concerned about other characteristics of self-reported health status as a risk adjuster. Of particular concern was the role of non-response. We have drafted a manuscript on that subject which is also part of the mysterious package.

Our findings demonstrated that non-respondents differed systematically from respondents. In the age group 18-64, non-respondents tended to be younger, healthier, and male. In the aged population non-respondents tended to be older and sicker.

The crucial issue of evaluating the usefulness of any kind of health status as a predictor for total health expenditure for groups of enrollees is the extent to which this kind of non-response such as we observed, results in biased predictions about what the expenses will be in the future. Such bias has serious implications for the use of survey-based measures to adjust premium or capitated payment rates.

The problem is probably more serious than is documented in our analysis because we had a relatively homogeneous study population. In more diverse populations where there are language issues as a barrier, or issues about cognitive impairment, the sources of bias become exacerbated.
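A minimal numeric sketch of how this kind of differential non-response can bias a survey-based expenditure estimate follows. All of the group sizes, average costs, and response rates are made-up numbers chosen only to mirror the direction of the findings described above (sicker non-seniors more likely to respond, sicker seniors less likely to respond). They are not results from the study.

```python
def mean_expenditure(groups, respondents_only=False):
    """Average cost per person, over everyone or over expected survey respondents only."""
    total_people = 0.0
    total_cost = 0.0
    for g in groups:
        # Weight each group by its full size, or by its expected number of respondents.
        weight = g["n"] * (g["response_rate"] if respondents_only else 1.0)
        total_people += weight
        total_cost += weight * g["avg_cost"]
    return total_cost / total_people


# Hypothetical non-senior population: the sicker group responds more often.
adults_18_64 = [
    {"label": "healthier", "n": 800, "avg_cost": 1000.0, "response_rate": 0.50},
    {"label": "sicker", "n": 200, "avg_cost": 6000.0, "response_rate": 0.80},
]

# Hypothetical senior population: the sicker group responds less often.
seniors_65_plus = [
    {"label": "healthier", "n": 600, "avg_cost": 3000.0, "response_rate": 0.70},
    {"label": "sicker", "n": 400, "avg_cost": 9000.0, "response_rate": 0.40},
]

for name, population in [("18-64", adults_18_64), ("65+", seniors_65_plus)]:
    true_mean = mean_expenditure(population)
    survey_mean = mean_expenditure(population, respondents_only=True)
    print(f"{name}: true mean {true_mean:.0f}, respondent-based mean {survey_mean:.0f}")
```

With these assumed inputs the respondent-based estimate overstates expected expenditure for the younger group and understates it for seniors, which is why the direction of the bias depends on the population being surveyed.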

Dr. Kane spoke this morning about corruptibility. We have termed that as the issue of gaming. Because health status surveys were not initially designed to set payment or capitation rates, the issue of gaming surveys to increase payments has not been addressed. It might be something you would want to do a pilot on.

In theory, however, surveys could be gamed. Gaming could involve either sample selection, or the follow-up of the sample. For example, a sample could be manipulated so that sicker members are identified for survey participation by health plans or provider groups. For our non-senior population, our study indicates that these individuals are more likely to respond. For seniors, however, these individuals are less likely to respond. In addition, inadequate follow-up of non-respondents will lead to other types of bias, depending on the population that you are observing.

Gaming could also involve more subtle practices such as patient coaching. Respondents could be coached to answer surveys in a way that makes them look less healthy than they really are. I don't know if any of you have purchased a car recently, but I got a lot of coaching from my car dealer about how I could answer the consumer satisfaction survey that I was going to get, complete with a little sample with the correct boxes marked for my general education.

The issue of coaching is of course speculative; it may or may not directly influence the payment rate. The timing of the coaching relative to survey administration, and whether or not coaching was aimed at an entire enrolled population or limited to survey participants, could affect the likelihood of coaching actually working.

I think the bottom line here is there is no risk adjustment approach that is free from either intentional or unintentional bias. Because of the threats that are posed, the ability to audit any type of risk assessment measure is critical, and we haven't talked about that, but it relates to the use to which you are going to put these data.

Auditing survey results for functional and perceived health status could pose a challenge because of the subjective or self-evaluative nature of the information. I would note that some researchers have argued that functional status is less subjective and more easily audited, but potentially very expensive.

Other criteria that you should consider in selecting a risk assessment approach include the ease of data collection, the cost of data collection, the acceptability to the providers, and in particular the compatibility with rating and payment timelines. Dr. Whitelaw talked about the timing of different kinds of health status information. These are not clinical timelines that are required for making payment adjustments.

Again, in the packet there is a discussion about each of these characteristics in terms of how it affects the choice of a risk adjustment measure. I would note that other researchers have come to conclusions that differ from ours, and I'm confident that you will consider their findings in your deliberations.

In closing, I would like to note that in returning to HSM, they are being paid on a health-based risk adjusted basis at the moment using ICD-9 codes, and it is a miracle what is happening to the ambulatory diagnosis coding.

My final recommendation to you is that you need to be very clear about the question you are trying to answer. Lots of things hinge on that. I have a t-shirt that I wear when I'm coaching clinicians on beginning to design research projects. It says exactly that, "What is the Question You are Trying to Answer?" If you can't get it right, everything else from that point on is a mish-mash of stuff, and you don't know what you've got.

So I'm not alone in throwing down that challenge for you. But in addition to that, if you are thinking about quality improvement, I would be very wary of requiring assessment when it comes in lieu of actually doing interventions. It's a big problem, and any kind of mandatory work in that area is likely to be a drag on the progress; even the progress that we are making now.

Thank you.

DR. IEZZONI: Thank you. Thanks, Jinnet, that was really informative. And we look forward to getting the papers when the FedEx package shows up.

Margaret, if you could just introduce yourself.

Agenda Item: Functional Assessment: Risk Adjustment and Rehabilitation Focus - Margaret Stineman, M.D., University of Pennsylvania

DR. STINEMAN: Hi, I'm Margaret Stineman. I'm from the University of Pennsylvania. I'm a practicing physiatrist, and also the principal investigator that did the project that led to the function-related groups system, which is being modified and implemented by the Health Care Financing Administration.

I do have a handout which answers a lot of the questions, and I want to jump into actually responding to some of the things that I heard, and also show some live data from our institutions in terms of how we actually use functional status. It's going to be interesting, because you are going to see the sequence.

First, you're going to see it before we started comparing it to national norms. Then you're going to see it as we began to compare it to national norms. Then finally, I'm going to talk about what can happen when you actually case mix adjust and compare to national norms relative to the type of decisions that we can make in treating people relative to this data.

I think one thing that Jinnet said that really struck home was that, of all the various care programs within her continuum, the inpatient rehab unit did appear to have standardized data. This is very true in that the rehab industry does have the Uniform Data System for Medical Rehabilitation that has been operating since about 1987. At one time I know about 60 percent of the rehab facilities around the country were using it so that they could create national norms.

I'll talk a little bit about the FIM, and some of the benefits of the FIM, and also some limitations to the FIM. The FIM was endorsed by numerous national organizations with the explicit desire to really create a functional status instrument that would be appropriate for acute rehabilitation, which is defined as the period in time when a person has a new onset disability, and they have the potential for functional restoration.

So that the measurement characteristics of this particular instrument are very different from what you would need either for the outpatient setting or the institutional chronic care setting in that you need to be able to measure the type of activities that are necessary to get a person back into the community.

But at that point, following their acute injury or acute disability, you are not necessarily going to need to measure all of the IADLs, which would come once they get back into the community and they need to start working on those. They really need to be able to take care of themselves, climb stairs, walk. So these are the types of things that it measures.

The other thing that is important is that because it is looking at the process of acute rehabilitation, it really needs to measure change. So that you will notice that the scale is a seven level scale. And it has been shown to be quite sensitive to change when you look at admission, to discharge attributes of the patients in this particular phase of their care.

Now I think that we have been using the FIM for about 10 years at our institution, and we actually have benchmarks that the clinical team gets together on a quarterly basis. And we establish that we are going to attempt to achieve this parameter relative to the gains of the patient. This proportion of people are to be discharged to the community. So that we have all of these benchmarks that are really standardized.

We keep some of them constant, so that we can compare across time. But as the clinical team identifies problems, whether it's urinary tract infections or whatever it might be, these parameters are added to the system for a period of time, so that we can develop a process to deal with these problems, and then re-analyze to see if the process worked, and those parameters will stay in the system until we fix the problem.

I think another key point/recommendation that I have is that you need standard parameters that are always measured, that don't change, that are fundamental. And possibly these parameters could be across settings to a certain extent. But you also need to have sets of parameters that are adjusted based on the issues that you see in your clinical population.

So I'm going to actually jump now -- I'm going to show you some of the data, and how we have actually interpreted it over the years in our institution. The first slide is going to show you length of stay means over time. This is going back to 1991-92 fiscal through 1998-99. So when I started doing this initially, you will notice that the lengths of stay were really long -- 33.1 on average in 1991-92.

So of course health care changed, and there was an impetus to really push down, to increase efficiency. You can see that reflected in this data over time in terms of the length of stay norms.

Now I don't have a pointer, but I don't think you can read 1995-96, but about at that time, which is the nadir of the drop of the length of stay, the therapists came to us as attending physicians and said, look, things have got to change. We are not able to get done what we need to get done with the patients. So there was a negotiation with therapists and attending physicians.

Basically, the decision was made to change the parameter in the program evaluation system. Not to continue to reduce resource expenditure, but to begin to look more closely at the patients who might have functional capability to continue to improve or need to improve because of their environmental circumstances.

So we really had to kind of rethink that. So this question became -- really as an attending physician, this is how I operate. If I have a patient that doesn't really need a lot of rehab, I'm going to try to get them out quickly. But I want to make room for a patient who, because of their environmental circumstances, or because of the severity of the disabilities, is really going to need more time. So it's actually the process of reviewing the data with the staff that has forced us really to look more at individuals.

In 1996-97, miracles beyond miracles, our facility said, oh, you can join the UDS. So now I've got some normative data to look at our facility. You can see that as you might guess from the data that I showed you before, our facility, which is in the darker bar, the length of stay averages were lower than the national norms for other similar facilities. And you can see that as we made the decision to begin to liberalize that, that we began to adjust ourselves and become closer to the national norm.

This is a presentation that I put together for the staff to understand average admission functional status severity relative to gains, and relative to discharge averages that patients achieve. So what this is, along the Y axis you can see the fiscal year. Along the X axis you can see the average FIM scores. So that the left-hand side of the bar is the admission average scores. The right-hand side of the bar is the average discharge scores.
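The quantities plotted on a slide like this can be sketched as follows: for each fiscal year, the average admission FIM score, the average discharge FIM score, and the gain between them. This is only an illustration under stated assumptions. The function name and the scores are hypothetical, and the sketch does not reproduce the facility's data or its actual reporting method.

```python
from statistics import mean


def fim_summary(stays):
    """Summarize a list of (admission_total, discharge_total) FIM scores for one period."""
    admission_avg = mean(admission for admission, _ in stays)
    discharge_avg = mean(discharge for _, discharge in stays)
    return {
        "admission_avg": admission_avg,              # left-hand end of the bar
        "discharge_avg": discharge_avg,              # right-hand end of the bar
        "gain_avg": discharge_avg - admission_avg,   # length of the bar
    }


# Hypothetical total FIM scores for three stays in one fiscal year.
example_year = [(60, 85), (70, 95), (80, 90)]
summary = fim_summary(example_year)
print(summary)
```

A lower admission average in a given year would correspond to a more severe case mix, and a longer bar to a larger average functional gain over the stay.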

You can see that over the years there has been some case mix fluctuation in terms of the types of patients that have been admitted, in terms of the severity. There was a decision at one point to try to encourage the admission of more severe patients, because we felt that the cost containment pressures to get people in and get people out fast were really eroding the type of quality of care that we were providing. We were turning away some patients who really, really needed to be in the hospital.

We talked with the therapists in terms of, now that the lengths of stay are shorter, what can you do to create greater efficiency in your services. You can see that we are very proud of the fact that the outcome gain scores have been maintained, and are actually improving.

Now again, here is where we get the national normative data. The way this works is that we have the national norm from the UDS plotted in the open squares, and our facility-specific is in the shaded squares. This works the same way. The left-hand side is the admission average FIM scores. The right-hand side is the average FIM discharge scores.

You can see that for example, 1996-97, the lower one, that our case mix was actually less severe than the national norm. But at discharge they were at the same level of disability. You can see in 1997-98, that our case mix actually had become quite similar in terms of functional severity to the national norm, and that our functional outcome at discharge from rehab was quite similar.

You can see in the most recent year that we are continuing to admit a similar case mix in terms of severity, but that our gains are a lot higher. I would attribute this, and the therapists felt it's probably due to the fact that we are beginning to liberalize the length of stay somewhat, and allowing some of the folks that really have demonstrated potential to recover, more time to do so.
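The comparison described above can be sketched as simple arithmetic: a gain is the average discharge FIM score minus the average admission FIM score, and the facility's gain can be set against the national gain for the same year. This is only an illustration. The function names and every number below are hypothetical and are not taken from the slides; they are chosen only to mirror the pattern described (a less severe case mix in 1996-97 with the same discharge level, and a similar case mix in 1997-98).

```python
def fim_gain(admission_avg, discharge_avg):
    """Average functional gain: discharge average minus admission average."""
    return discharge_avg - admission_avg

# Hypothetical averages by fiscal year: (admission FIM, discharge FIM).
# These values are invented for illustration only.
facility = {"1996-97": (75.0, 95.0), "1997-98": (70.0, 94.0)}
national = {"1996-97": (70.0, 95.0), "1997-98": (70.0, 94.0)}

def gain_difference(year):
    """Facility gain minus national gain (positive means the facility gained more)."""
    return fim_gain(*facility[year]) - fim_gain(*national[year])

for year in sorted(facility):
    print(year, gain_difference(year))
```

A higher admission average means a less severe case mix, so with equal discharge averages the facility's gain comes out smaller than the national gain.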

The next one I'm going to show has to do with the portion of patients discharged to the community over time in comparison to national norms. And this is another parameter that we really follow carefully. It's interesting, and we've had philosophical discussions. We don't necessarily want to have really, really high community discharge rates, because that would suggest that we are selecting patients only on the basis of their community setting.

We don't want to exclude people who don't have caregivers that are easily available. We want to give people a chance. So it's sort of almost like when you are working in an emergency room, and you are getting people coming in with appendicitis, there are going to be some that you operate on that didn't have appendicitis, but if you didn't you might miss somebody that really did have appendicitis, and they go on to rupture, and the consequences are terrible. So it's a real balance in terms of really analyzing the parameters. And we analyze these different parameters sort of against each other to get a sense of how we are treating our population.

Now the last slide to me is the most exciting, and this is going to be a little confusing to explain, but all the slides that I have shown you, the progressive slides beginning with the raw parameters, and then going into the parameters that were superimposed on the national data. None of that has any case mix adjustment in it. The most powerful usage of this material will be once we can case mix adjust. So this comes out of our research.

The function-related group system at this point is modular. We have FRG systems that will predict length of stay based on the patient's rehab impairment categories, their level of physical disability at admission, and their level of cognitive disability at admission. We also have FRG systems that will predict functional gain based on the same parameters. And we have FRG systems that will predict functional status discharge scores based on the same elements.

This particular plot is showing the discharge motor FIM FRGs, which is the module that predicts outcome. And it predicts the level of physical disability at the time of discharge from the rehab facility for groups of people who have come in with similar clinical characteristics.

So this was selected because it's extremely important to look at physical disability at discharge, because this is really what determines the degree to which a person is going to need help at home, indeed whether or not they are going to be able to function at home, and have sufficient help. So it's a good thing for therapists and the rehab team to look at.

The way this works is that the discharge motor FIM score, which is the segment of the FIM which measures physical disability, is along the Y axis. The value of that goes from 13, which would mean that the person is totally dependent on all FIM items in the motor scale, versus 91, which would be completely independent in all items.
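The 13-to-91 range just described is consistent with a motor scale of 13 items, each rated from 1 (total dependence) to 7 (complete independence). The sketch below assumes that structure; the function name and the example ratings are hypothetical.

```python
MOTOR_ITEMS = 13                 # assumed number of motor items
MIN_RATING, MAX_RATING = 1, 7    # assumed rating range for each item

def motor_fim_total(ratings):
    """Sum the per-item ratings into a motor FIM total score."""
    if len(ratings) != MOTOR_ITEMS:
        raise ValueError("expected %d item ratings" % MOTOR_ITEMS)
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError("rating out of range: %r" % rating)
    return sum(ratings)

lowest = motor_fim_total([MIN_RATING] * MOTOR_ITEMS)    # totally dependent on all items
highest = motor_fim_total([MAX_RATING] * MOTOR_ITEMS)   # completely independent in all items
print(lowest, highest)
```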

Along the X axis you see the FRG groups. Now this system is very complex, because it takes a lot of groups to explain difference in outcome. The resource use FRGs are very simple, and it doesn't take as many groups to explain cost. I'm going to talk a little bit about that briefly in a minute.

What you see here is you see from the national data the 25th and the 75th percentile values of the patients' motor FIM discharge scores, the lower and the upper horizontal lines, and the median bisecting that line. So that you would expect that about 50 percent of your patients, if your facility was operating like other facilities around the country in terms of their outcomes, would fall within the bars.

It's very interesting, if you look at the relationship of the bars, and you actually have one of these in your packets, so you can look at it in some detail, you will notice that there is a monotonic relationship pretty much that as the functional severity of the patient decreases, the functional status at discharge from rehabilitation increases.

But you are also going to notice some interaction effects. If you look at 4, 5, and 6, you will see that there is a decline in tendency. That's the effect of age. So that this begins to give you parameters that are specifically appropriate for people who are older and younger. You can't expect the same clinically necessarily from a person who is in their eighties versus their thirties.

The circles on this diagram are our facility's data which is superimposed on the norms so that all the dots which are below the norms would be people who had outcomes that were lower than expected, given their initial status. All the circles above the norms would be people who had higher outcomes than expected.

So if this were to be used in a system to look at process, what you could do is you could pull up all the case records that had higher than expected outcomes, and compare those to all the case records that had lower than expected outcomes, and begin to try to figure out what are the clinical differences, and also what are the process differences.

What treatments did you give to the high outcome patients that the lower outcome patients didn't have? So it just becomes a much more powerful way of beginning to look at designing your services.

The A here actually would be an outlier. What you might want to do is say there is something really strange that happened to this patient. Let's pull this record and see if we can figure out clinically what happened. So that's the way that works.
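The review process described above can be sketched as follows: each patient's discharge motor FIM score is compared with the national 25th and 75th percentile values for that patient's FRG, and the case records are grouped as below, within, or above the expected band. This is an illustrative sketch only. The norm values, patient identifiers, and function names are hypothetical, and no rule for flagging outliers is shown because none is specified here.

```python
from collections import defaultdict

# Hypothetical national norms: FRG number -> (25th percentile, 75th percentile)
# of discharge motor FIM scores. Values are invented for illustration.
NATIONAL_NORMS = {4: (40, 62), 5: (36, 58), 6: (30, 52)}

def classify(frg, discharge_motor_fim, norms=NATIONAL_NORMS):
    """Place one patient relative to the national band for the patient's FRG."""
    low, high = norms[frg]
    if discharge_motor_fim < low:
        return "below"
    if discharge_motor_fim > high:
        return "above"
    return "within"

def group_for_review(patients, norms=NATIONAL_NORMS):
    """Group patient identifiers by how their outcome compares with the norm."""
    groups = defaultdict(list)
    for patient_id, frg, score in patients:
        groups[classify(frg, score, norms)].append(patient_id)
    return dict(groups)

# Hypothetical facility records: (patient id, FRG, discharge motor FIM score)
patients = [("p1", 4, 70), ("p2", 4, 50), ("p3", 5, 20), ("p4", 6, 55)]
review = group_for_review(patients)
print(review)
```

The "above" and "below" lists are the two sets of case records that would be pulled and compared for clinical and process differences.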

In conclusion, I will say a couple of things. One is that I go on the side of really believing that functional status is extremely important for improving the health care of all Americans, but with some caveats. I think certainly in the rehab settings, in the long-term care settings, in the home care settings this is true. But I think that we've got to be very, very sensitive to costs.

And I don't think that it is necessarily appropriate, or even necessary, to ask the healthy outpatient population about functional status. I think that you could design some very, very simple triggers, simple questions, one or two questions that you might ask a patient. And if the answers are yes, then have a supplemental series of questions.
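The trigger idea can be sketched as a two-stage screen: one or two short questions are asked of everyone, and the longer series is asked only when a trigger is answered yes. The question wording below is hypothetical, and treating a yes to any single trigger as sufficient is an assumption made for the sketch.

```python
# Hypothetical trigger and supplemental questions, for illustration only.
TRIGGERS = [
    "Do you need help from another person with daily activities?",
    "Has your health limited your usual activities in the past month?",
]
SUPPLEMENTAL = [
    "Which activities do you need help with?",
    "How often do you need that help?",
    "Do you use any assistive devices?",
]

def questions_to_ask(trigger_answers, supplemental=SUPPLEMENTAL):
    """Return the supplemental questions only if any trigger was answered yes."""
    if any(trigger_answers):
        return list(supplemental)
    return []

healthy_patient = questions_to_ask([False, False])
flagged_patient = questions_to_ask([False, True])
print(len(healthy_patient), len(flagged_patient))
```

Under this design most healthy outpatients answer only the triggers, which keeps the collection cost low.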

It's very interesting in that we haven't done any studies to look at this, but the VA system did develop an FRG-adjusted clinical guideline where they used the FRG system for stroke and amputees, and really established benchmarks that were case mix adjusted. They did it across the full post-acute care continuum, not just acute rehab.

The results that came out of this pilot study that they did were really astoundingly exciting in that it turned out that as long as the patient was in the acute stage, or was receiving restorative rehabilitation, the FRG structure appeared to show differences in cost. So that if you were in a very disabled FRG, you would cost more irrespective of whether you had inpatient rehab or went to a SNF or home health care outpatient.

I felt that was very important. I think that is the promise of the future in terms of this type of technology. That this system isn't necessarily right for it, but it suggests that a model could be developed.

Another caveat that I think is critically important is for all of us to really recognize the differences between restorative rehabilitation and long-term care in that an FRG system would not necessarily be appropriate for long-term care. This type of design is appropriate for any setting probably where the clinical objective is to get people back to a functional level where they can resume their lives in the community, because of the fact that the measurement issues are different, and the types of things that you need to measure are quite different.

The other thing that I will say is the question about clinicians versus self-report. I think that self-report is always very, very important, but even doctors of certain specialties can do functional status quite appropriately. Your rheumatologist, your physiatrist, we're trained for it. That's our field. Functional status is at least as important as diagnosis. We design our treatments around functional status, as well as diagnosis.

I think the final thing I'll say is that if you want to get accurate functional status information, in addition to who you ask, the important question is how do you ask it in a way that it will happen reliably.

What happens in the busy clinician's life sometimes is you might get a form to fill out six months after you have seen the patient. And you might be seven blocks from the medical record room. And you might have 20 other patients waiting to see you, or a research project you are doing. So how accurate is your assessment of their functional status going to be?

So I think that the system needs to be designed with sensitivity to the flow of care. And finally, when you begin to design functional status measures to be collected in a standard way, it is critical to limit the actual elements to the most important questions in that the health care system in many ways is broken. We don't have the time to do what we all want to do for people that we are caring for.

So this is extremely important information, but some of the forms have so much information in them that is not necessarily relevant to the services that are being provided that we need to really look very carefully at developing minimum data sets, not maximum data sets. And recognizing that there can be overlap between what we do in clinical practice, and what we develop for administrative databases. In fact, there should be.

I think the reason the FIM has been so successful is that it's incorporated into the practices of the actual therapists that have training programs that specify how to do the FIM. But it's also used for quality assurance and program evaluation, and possibly even someday payment. That's why it works, because it's using information multiply, and not asking people to use a different form, a different way of collecting the same type of information.

DR. IEZZONI: Okay, Margaret, that last statement was extremely provocative, because everybody else has said you have to have your data collection geared toward your specific purpose. But we are running a bit shy of time, and I know that Dr. Stein --

DR. STINEMAN: Could I just clarify that? Actually, I believe 100 percent that you have to have your data collection designed for a specific purpose. But that there are some overlaps. If you can design an administrative data set so that it taps into stuff that the clinicians are already collecting, that is a tremendous cost saving, and also I think that the data will be more accurate and more relevant.

DR. IEZZONI: Good, okay. Dr. Swanson will be here tomorrow morning, but neither Dr. Fowles nor Dr. Stineman will be. But we are running a bit shy of time. So let me just ask if there are one or two questions from the committee that can be answered quickly, and then we'll take a brief break. Committee members, anything? No.

Thank you. We look forward, Jinnet, to seeing your information. I think that will be really great.

Why don't we take 15 minutes, so we'll resume at 3:20 p.m.

[Brief recess.]

DR. IEZZONI: We have our three final speakers. You come at the end of a very informative day though, and we're very much looking forward to hearing from the three of you.

Dr. Stein, I think you are the first person up.

Agenda Item: Functional Assessment: Selected Focus Areas - Ruth Stein, M.D., Albert Einstein College of Medicine

DR. STEIN: Let me tell you a little bit about where I'm coming from. I'm a pediatrician, and I have spent the last couple of decades looking at the care of children with chronic conditions, mainly chronic physical conditions, but over the years I've gotten more involved with the broad spectrum of conditions.

I was involved at the beginning of my career in the development of a functional status measure at a time when people weren't thinking about the functional status of children. So I think I'm here as a result of that set of activities.

And what I would like to do in the short time that we have together is to focus on some of the ways in which I think that children are different, and some of the special considerations in assessing the functioning of children, and how these conditions and considerations affect our choices; what some of the controversies are, and I think we have heard some of them as we have talked about adults as well; and then what some of the measurement options and recommendations might be.

I should preface this by saying that to my knowledge no one is doing any administrative data collection about functioning in children at all. I don't come from a system where that is happening on a routine basis. I have been involved in this primarily from a research perspective. But many of the same issues and arguments that have been made earlier in the day definitely pertain to children, and I do know from other collaborators that functional status risk adjustment using ours, and I think other people's measures as well, has been an important thing in their research, and has had some predictive validity.

As we think about the special considerations with children, I think the first and most obvious thing is that children are not little adults. And not only are children not all little adults, they are not the same as one another. So that when we think of infancy, early childhood, school-aged children, adolescents, we are not talking about people with the same sense of functioning. And so that complicates any discussion we are going to have about measuring functioning in children.

Moreover, children by definition, are dependent. We don't think of children as autonomous beings, especially early in life. So their functioning is an interdependent functioning with an adult caretaker. So the notion of independence that is so much a part of thinking about adult functioning is really not an appropriate concept for early childhood, and maybe even late adolescence.

Most childhood morbidity further, is not caused by the single handful of conditions that are the predominant and overwhelming cause of morbidity in adults. Rather, the morbidity for children is a lot of very kind of what we call one of a kind conditions, with very low individual frequency. So if we are thinking about disease-specific models, we are right in trouble the minute we get past asthma.

The other thing is that included in childhood morbidity in a very big way these days is learning disability. Although that is not so much on the physical health spectrum we think, although we may change our notions about that, it is a very important piece of morbidity.

Further, there is less standardization of role expectations when we are talking about children. And we have no previous baseline. So that unlike the situation that Margaret was talking about before where someone has an acute injury, for the most part we are talking about by definition that this is something that has occurred before the child has become a functioning, autonomous individual with full roles.

And the developmental norms, which is kind of what we fall back on, are really the lowest common denominator of what we expect a child to do. They are not an accurate measure of the potential of a given individual. So it is very hard to use those as a standard. And we are talking about a habilitative rather than a rehabilitative stance in many instances.

And even where we are talking about rehabilitation in a child, we are talking about rehabilitation taking place for a 5 year old between the time they are 5 and 7. We don't know what their potential would have been as they changed from a healthy 5 year old to a healthy 7 year old.

Now the major ways -- and we talked about this all through the day -- has been the notion that there be some clinical evaluation or observation outside of the patient. The possibility of self-report, or a behavioral inventory of some sort that looks at the way the child is functioning in society.

Now all of these methods assume three things. They assume a reference concept of normalcy. So we need to buy into that. They assume recognition by the informants, whether the informant is the provider, or the child him or herself, or a parent respondent proxy of what that deviance is from the normalcy. And then the ability to communicate that. So it involves a cognitive translation, which is not always possible for children, especially young children.

As I mentioned before, childhood is not a homogenous period. There are great variations in the social and cultural norms of age appropriate behavior, even within mainland US society. What the function of an 11 year old girl is, is very different in different communities and in different subcultures. Also, as we mentioned, there is a high proportion without a previous well state. And a lack of knowledge, as I mentioned earlier, of the potential.

Since this adult standard is not appropriate, because we don't have independence, we need to think in terms of developmental issues. And they preclude static measurement. So that my measurement of function of a 10 month old or a 10 year old has to have some comparability if I'm thinking about children globally. Or there has to be some translatability. You don't want to have 20 different age groups between 0 and 1. And even within that, an 18 month old is so different from a 2 year old, that you've got to figure out some way to have a continuum.

It's sometimes difficult to distinguish the effect of development itself from the effects of intervention or change in health status with an individual child. So that becomes another layer of complexity.

In addition, the cognitive changes that take place in children's understanding of the questions that they are asked, and of their understanding of what other kids do make it very hard for us to rely on their self-report, and increase our dependence on respondents, especially at the lower end of the age spectrum.

When you add to that the complexity that a disabled individual may also have some difficulties in communication if it involves sensory or motor or some other deviations, it makes it very hard even to think about an age group at which that is not a problem.

So based on all this, if we look back to our choices, I think aside from the fact that clinical evaluation is expensive, difficult, it's also dependent very much on reports of things that are not observed, especially with a young child who won't necessarily perform when you are trying to test them and observe them.

So what you find is that clinicians are very dependent on the information that the proxy gives to them. And self-report has all these problems, as I mentioned, of comparability across the earlier age group. So that leaves me feeling pretty much like the place we have to focus when we are thinking about children is on behavioral inventories.

And as we do that, I think of behavioral inventories as assessing the effects on individuals of health, both on the multiple conditions, which I think is really a strength, because there is a tremendous amount of co-morbidity, of children having more than one problem. But it also has the advantage of looking at how the health status is affected both by the illness or condition or impairment, and by its treatment.

And particularly with some of the more noxious treatments that children are sometimes exposed to with the intent of making them better in the long run, that issue of how they are functioning now is often very affected by treatment.

And we think of these inventories as assessing the impact of health on the individual's performance of age appropriate activities. The assumption behind using this kind of a technique is that behavior is the final common pathway for the manifestation of the health or health problem.

And that performance is comparable to ability, which is a big assumption, although I think a little bit less of an assumption in young children than it is in adults who have more incentives not to necessarily perform. Most kids try to do a lot of exploratory things, and take on new tasks. Activities that are measured need to be able to reflect the range of circumstances under which it would be appropriate to know about the individual's performance.

The completeness inherently requires that we ask more than one question. And that becomes one of the difficulties, because brevity becomes a problem. We cannot always be sure what is causing a given dysfunction. And the other disadvantage is that there is a very limited choice of validated measures.

Now in adults we often think of functioning as being operationalized by the ability to do self-care, which I have already mentioned is not an appropriate thing for young children, and also the performance of adult roles and work. But as I think about what the task of childhood is, the work of childhood, the work of children is development. It is not really school and it's not really play, although play is a part of their development.

This involves many domains, and I have listed some of them here on the handout, but I think it's important to think about all of those, and which of those types of functioning we think are most important if we have to measure only a little bit of it.

Then just briefly, some of the controversies. The categorical, do we want disease-specific measures? I would opt to recommend very strongly for non-disease, non-categorical measures, but that is a debate. And I think the reason for opting toward a non-categorical approach has to do with the epidemiology and the presence of comorbidities and the small numbers issue.

There is a question of whether we want to focus more on capacity or performance, on depth or on breadth, things we have heard about all day from other speakers, so I'll go over them quickly. Some of the other speakers raised the issue of where on the spectrum the measurement should be most sensitive. Do we want the extremes, or do we want more of the gray zone? Different measures have different ability to distinguish among different parts of the spectrum.

Another question that comes up over and over again, particularly with children with organic medical disease is static versus fluctuating functioning. And we have some evidence that it's the kids with the fluctuating functioning who have the most long-term morbidity. Do we want their best, their worst, the range, their median or mean?

Then there are the issues of an age-specific versus a broad range measure. We would like optimally for the sake of consistency to have a broad range measure, but that's hard to achieve. But it does require that we have some broad range measure to show ability to change over time.

And then there is an issue of whether or not, and I think this was something that Gretchen spoke to, was the long-term potential of functioning on the future. So the notion of what are the future risks? Should they be incorporated into an assessment of a child's functioning?

Then are we talking about broad or narrow dimensions. Do we want to focus primarily on the physical? If we do that, we're going to leave out a lot of particularly the psychological, the social, and educational functioning, which is the source of much morbidity in our child population.

And then there is the issue of whether we should be child-specific or because of this notion of joint functioning, should we take into account the impact on the caretaker or the interdependent functioning of the child and caretaker? And I think that can also be an issue in other points in the spectrum.

One thing I want to spend just a moment on is the compensated notion versus the uncompensated notion. I'm putting a very strong plug in that two children who function at the same level, one of whom is on dialysis, and one of whom is not, in my way of thinking it is important that we measure how they are different, as well as that they both function, because if we don't, we are really going to obfuscate a very major impetus to our continued provision of service to those youngsters.

In many instances we are not going to totally eliminate the chronic diseases. What we are going to do is improve the functioning of the individuals who have those chronic conditions. And I think it's very important to measure assistive devices or personal assistants that enable people to function.

ADLs don't work in the assessment of child functioning. And I will say that very emphatically. They are not relevant at all below the age of 5, and these data are for children mostly over the age of 5, where we know that less than 1 percent of children over 5 have an impairment of ADL.

So that is in the face of knowing that 4.9 percent of children are unable to perform major activities when they are defined as play or school, and 28.8 percent on the NHIS disability supplement were limited in major activities of play or school. So if we only measure ADL, we are not doing the job for children.

Some currently available validated measures are the Child Health Questionnaire developed by Jean Landgraf and her group. It has both a child and parent version. It goes through the whole age spectrum, and the child version is for 10 and above.

There is the CHIP adolescent version which Barbara Starfield developed, which is for adolescents over the age of 11, although work is continuing to extend the measure downward to younger children, and to shorten the measure, which now takes 45 minutes.

And then there is the FS II(R), the most common form of which is being used as a 14 item scale that Dorothy Jessop and I developed. That is a continuous measure from 0 to 16 years of age.

Other options are disease specific measures, and there are a slew of them. I didn't begin to outline them for you, because I don't think they are the way to go, except if you are doing biologic studies of pharmacology or other very specific interventions.

There are assessment tools for development such as the Denver or the Vineland. There are mental health assessment tools for children mostly above the age of 4 or 5. Measures that classify children with ongoing conditions. One I will just mention is the Questionnaire for Identifying Children with Chronic Conditions which our group developed, and then a much briefer version that has been developed by the Foundation for Accountability called the Living with Illness Measure. I've been a part of that group as well, along with Paul.

And single item global assessments, which is what we have used up to now in national surveys. Those generally measure the ability to perform major activities, and I mentioned that they are defined as play and school. And the level of limitation in those spheres, some also have looked at the degree of effort needed to perform, or the issue of stamina.

These single item measures don't, unfortunately, capture the full range of impairment, even though they capture a lot more than ADLs. And they don't necessarily capture the full range of arenas in which the impairment might disrupt development.

The Four Ds of childhood has been suggested as a kind of good way to keep remembering what is so different about children: the developmental change; the dependency on parents and other adults; the differential epidemiology; and the difference in demographic patterns.

I would like just in closing to point out a couple of very key issues. One is that child health care is a small proportion of the annual cost of health care. Therefore, there has been very much less investment in dealing with the issues of measuring functioning in children. It's almost been an afterthought.

And little attention is being paid to the cumulative financial impact of the childhood impairment over the lifetime, which is not so inexpensive. So I would caution you that although it seems like these are very complicated issues, and maybe it's just too complicated to deal with, it is very important that we deal with it, and that we add functioning to it.

I believe that the only feasible way to assess child functioning on a large scale is to use measures that cut across conditions and age groups, even though those measures may be less refined than we would optimally like them to be. But we also need development of measures very urgently, and that will not happen without the inevitable coughing up of some dollars to support the psychometric work.

Thank you.

DR. IEZZONI: Thank you, Dr. Stein, for an excellent review of an area that obviously, yes, we do need more dollars to study it. If you could just stand by.

Alice, why don't you take the stage.

Agenda Item: Functional Assessment: Selected Focus Areas - Alice Kroliczak, Ph.D., HRSA

DR. KROLICZAK: I'm going to give a very brief presentation, with the main theme being that the individuals that we serve at HRSA in the HIV/AIDS Bureau are not at the state where we are interested in doing functional status or functional assessments. And I would like to show you why.

I speak only for one of the four bureaus from HRSA, that is the HIV/AIDS Bureau, and not for the others. We have a unique population that we serve, and we serve them through the Ryan White Care Act, which originated in 1990, was reauthorized in 1996, and is up for reauthorization again. These are individuals who are HIV-positive or have full blown AIDS, and do not have private insurance, do not have Medicaid, and therefore someone has to give them some services.

The Ryan White Care Act comes under different title programs. We fund about $6 billion a year to about just under 500 grantees. Each of the grantees has a whole separate network of service providers, ranging from primary care providers to transportation bus drivers, to all types of special support services.

The different title programs fund different eligible populations from that larger group. Title I funding goes to EMAs that are eligible, with large HIV-positive and AIDS rates. Title II goes to states and territories in the United States, every one of which is funded.

Title III, we fund public and non-profit entities for outpatient early intervention. Title IV we fund public and private non-profit entities for projects that will coordinate services to improve access and availability of services and research for children, youth, women, and families. And the research is generally access to clinical trials.

These are the basic four title programs funded under Ryan White, to which most of the money goes. Part F funds a few auxiliary programs, one of which is the program called Special Projects of National Significance. We have about 15 what we call SPNS grantees, and their networks of providers across the United States. A few of them have tried working with various measures of functional status, only to find that because of the population they work with, it was impossible.

In the HIV/AIDS Bureau we do different types of data collection, but our main type of data collection is collecting program data as part of our ongoing monitoring of the $6 billion that we administer every year. We also collect information from program data for program evaluation purposes, but we fund about six small local evaluations a year, each of which is about $50,000. And so in the attempt to encourage local sites to do their own program evaluations, we can only offer about $300,000 a year.

We also do encouraging of local sites to do their own program evaluation to the extent possible, in order to do their own local, state, and national planning; again, not being able to give them much money for that.

Our major data collection system across the four title programs collects only aggregate data. It's unbelievable that we give $6 billion a year, and can only collect aggregate information. Our legislation does not support making mandatory for all the grantees and providers, any type of client level data collection.

We fund seven sites across the country from anywhere from $50,000 to $150,000 for a three year period of time to develop client level data collection systems. After the three years are over, generally we fund another 7-10 sites. And so pretty much the sites are on their own to continue doing any kind of client level data collection.

When I talk about aggregate reporting, I am really talking about just under 500 grantees receiving a data report from each of their providers. Those providers cannot unduplicate the client data that comes to them. For example, a grantee in Washington, D.C. could have 50 providers. Washington, D.C., as well as other areas are not that large. You could have clients going to several different providers funded by that same grantee. And there are no unique identifier numbers to attach to these clients, so that the data ends up being duplicated data.

We had a special study done by one of the faculty in the Harvard School of Public Health to see if they could develop a model for estimating the amount of unduplicated clients that we actually have. Basically, we found that about one-third of the data that comes in annually to us is duplicated data.
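To illustrate the duplication problem described above, here is a minimal Python sketch. The provider names and client identifiers are hypothetical and chosen only so the arithmetic is easy to follow; as noted in the testimony, no such unique identifiers exist in the actual reporting system, which is why the aggregate totals cannot simply be unduplicated this way. It is not the estimation model from the Harvard study.

```python
# Hypothetical illustration only: provider names and client IDs are invented.
# The real aggregate reports carry no client identifiers.
provider_clients = {
    "provider_a": {"c1", "c2", "c3"},
    "provider_b": {"c2", "c3", "c4"},
    "provider_c": {"c3", "c5", "c6"},
}


def aggregate_count(reports):
    """Sum per-provider counts, as happens when only aggregate data is reported."""
    return sum(len(clients) for clients in reports.values())


def unduplicated_count(reports):
    """Count distinct clients; this requires a shared client identifier."""
    return len(set().union(*reports.values()))


def duplicated_share(reports):
    """Fraction of the aggregate total that is repeat counting of the same clients."""
    total = aggregate_count(reports)
    return (total - unduplicated_count(reports)) / total


if __name__ == "__main__":
    print("aggregate total:", aggregate_count(provider_clients))
    print("distinct clients:", unduplicated_count(provider_clients))
    print("duplicated share:", round(duplicated_share(provider_clients), 2))
```

In this invented example, clients who visit more than one provider under the same grantee are counted once per provider, so the summed total overstates the number of people served.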

When we have talked in the past to our grantees and their providers about collecting any kind of data, let alone functional status type of data, they say these are our basic concerns. First of all, is the client dead or alive when they come to us? Has the client's disease progressed? Very general questions.

Spending energy on data collection takes away from services to patients. The Care Act allows for, depending on which title it is funding, a maximum of 5-10 percent of a grantee's funds in any one year for administrative costs. And administrative costs have to cover everything that they generally use for administrative costs. And if we include data collection, that would also have to come out of that.

For the reauthorization in the year 2000, we are trying to get Congress to increase the amount spent on administration. We don't know how successful we are going to be.

The population that comes to us have many barriers. They are HIV-positive at least, if not having full blown AIDS. They have many comorbidities, substance use, histories of trauma, incarceration, and mental illness. It is difficult to engage and maintain them in treatment due to basic lack of trust. Some of these are people who have never maintained any kind of an established relationship. Some of them are homeless, which makes it very difficult to track down these individuals and keep any kind of a primary care record on them.

Also limited education and illiteracy. So it's very hard to get someone who comes in, who is under the influence, who may also have a mental illness to answer even self-report functional status type questions.

The data collection tools that some of our SPNS grantees have attempted to use to measure functional status, they have not found that they have been designed for the particular target population that we are funding. And they also have not had the time to develop their own standardized instruments. So we have an academic evaluation center working with them. Should they come to the point where they are able to collect any kind of functional status data, hopefully we'll have an instrument that will be better for them to use with this particular target population.

We also find it very difficult to determine good outcome measures at this point. So I guess my basic message to you is that it really depends on the type of target population you are working with. Although functional status is an interesting issue, with this particular population it's simply life or death.

DR. IEZZONI: Thank you. It's kind of hard to have a rejoinder to that one. But actually, I'm glad that you raised, Alice, a number of issues that really touch on the privacy and confidentiality concerns, because that is something that I knew we would inevitably get into if we went further, and talked especially to disease advocacy groups. So thank you, that was a helpful presentation.

Dr. Kaplan.

Agenda Item: Functional Assessment: Selected Focus Areas - Sally Kaplan, Ph.D., Medicare Payment Advisory Commission

DR. KAPLAN: I'm going to try to make this as short as I possibly can, but I do want to briefly explain to you what the Medicare Payment Advisory Commission is. Fundamentally, we have a broad mandate to consider, develop, review, and advise Congress on improvements to the Medicare program. And in addition to advising Congress on payment issues, MedPAC is tasked with analyzing access to care, quality of care, and other issues affecting the program.

Very simply, we do not do original data collection. We do very little. We are a huge user of Medicare administrative data, and use almost any type of Medicare administrative data that we can get our hands on. And basically, I'm assuming the reason that we were asked to come here is that MedPAC has huge concerns about functional status in the post-acute area.

In the post-acute arena, as you heard Bob Kane explain, functional status explains a great amount of the variation in resource use, and therefore payment. The BBA and the Balanced Budget Refinement Act mandated new post-acute payment systems for basically every aspect of post-acute care. I have listed that on the slide that you have a copy of.

They also mandated risk adjustment payment for Medicare+Choice, which also has an effect, because that includes special programs such as PACE, SHMO, SHMO II, Evercare. Most of those special programs are now exempt from risk adjustment, but there is a question as to whether they will be exempt perpetually.

The main thing that I wanted to bring to you was the diversity in the ways that we are measuring functional status in post-acute arenas. As a person who is supposed to conduct analyses across settings, it gets very difficult, as you might be able to tell from these slides.

I also wanted to apologize, because I made a typo. It should be FIM-FRG, rather than FRG-FIM.

I wanted to illustrate the disparity. I basically wanted to bring one measure of functional status on three different measurement systems. And first of all in the SNFs or skilled nursing facilities for those of you not familiar with the acronym, the MDS 2.0, which is called the minimum data set, I don't think can be represented as the minimum data set, since the questionnaire is 300 questions long. But it is used to assess patients, and from which payment is derived.

I have given you both the definition for bathing under the MDS 2.0, and then how it is scored. And as you will see, it has 11 response codes, 6 for coding patient self-performance, and then 5 for coding staff-supported bathing activity.

On the OASIS, which home health agencies use to assess beneficiaries, and which also determines the payment that a provider receives, I have also given you the definition and the scoring on bathing for the OASIS. As you see, it's six response codes which range from full independence, which is 0, to complete dependence, which is 5.

Then, the MDS-PAC, which is going to be used for the FIM-FRG payment system for inpatient rehabilitation facilities, is not publicly available at this time. We do know that to get to the FRGs, you are going to have to have something approximating the FIM. So I gave you the FIM definition of bathing also, just as a contrast. And also the scoring for the FIM, which as you might notice has seven response codes, which is pretty close to six, but it does go the other way. In other words, complete dependence is 1 here -- no, I'm sorry, it does go the same way. I take that back.

Now just because I was curious myself in looking at this, I decided, well, let's see what one score means with these various ways of measuring bathing. So I chose 4. That was a random selection. Basically, I have given you what a score of 4 means with the FIM, with the OASIS, and then the rather lengthy description of what 4 means for the MDS.

I think it was Don Lollar who said we have been reveling in our diversity. And I think that is a pretty good way of describing it. But the commission is very concerned about this issue. We strongly believe that it would be extremely useful, to say the least, to have standardization of functional status measures at least in post-acute care so that if similar patients are treated in different post-acute settings, or if patients are treated in successive post-acute care settings, that we would have a means of measuring them.

And I think you have heard Nancy Whitelaw talk about the difficulty when you have an integrated system, and you have patients moving from one modality to another modality. And you have all these different measuring systems, it is difficult.

And finally, it would expand the utility of regularly collected information.

DR. IEZZONI: Dr. Kaplan, let me ask you the same question I asked Dr. Whitelaw. What's the barrier to doing this? To doing what seems totally rational? Is it a conceptual barrier, or is it the fiefdom barrier?

DR. KAPLAN: I don't think it's a fiefdom barrier. And probably Carolyn knows better than I do, maybe some answers to this, but it seems to me that what happened is much as we heard Dr. Kane criticizing other researchers, as HCFA has basically decided to develop these systems, they have let out an RFP. A group of researchers has responded to that RFP, and we always know a better way to do something, rather than taking something that has been used, that had been tested for reliability, validity, was universally used, and incorporating it.

Also there were issues of copyright in the past that may have applied as well. So we ended up with all these different tracks. I know that Dr. Kane is working on a project that theoretically is going to make it all, all right. But I believe it's a payment across post-acute settings.

But I think it can be done. I think it's going to take a massive effort. And we feel obviously functional status information is critical to pay these providers for the care that they provide to these beneficiaries. But it also is crucial to being able to assess quality.

DR. IEZZONI: And in the meantime who was saying that they have a million pages printed out for doing OASIS on the home care side?

DR. KAPLAN: Well, the OASIS is 79 questions and 18 questions are used for payment. But then the other questions are used for quality initiatives.

DR. STINEMAN: And that wasn't developed as a quality measure. A number of these instruments that we're talking about were developed for very different purposes, and because they are there, they are adapted.

DR. KAPLAN: Yes, and I think Dr. Kane addressed that issue. The MDS was basically developed for nursing homes and care planning, and now is being used to pay skilled nursing facilities for Medicare beneficiaries.

DR. STINEMAN: I think a lot of the concern has to do with using instruments that were developed for one type of care, and trying to make them work for another type of care. Like when I look at the MDS instrument for example, even this definition of level four, as a rehab specialist, I'm completely confused, because I'm not a statistician, but it looks as if there are two different ways of getting to the same score. And I'm a little bit confused as to how I would rate one of my patients using this for my particular type of care.

I think that a lot of it has to do with performance versus what a person has been doing over a period of time. It looks as if the MDS-PAC is very appropriate for looking at a period of time and saying, well, over this past period of time, is there a particular episode that has required this much? Whereas, the FIM is basically something that the therapists use in the PT or the OT gym, and actually see the patients, whether or not they can perform a set task in a standard way.

I think that's some of the resistance is a fear that you can't take these instruments and necessarily say that they are the same thing.

DR. IEZZONI: Do you want to comment on that?

DR. KAPLAN: I think that we feel that there has -- that ultimately we are encouraging HCFA to move to one shall we say functional status for all the post-acute settings. And that we realize that there is going to have to be dual data collection on functional status once a decision is made as to which is the best to use.

Now I would hope that within these three we could find one, but I'm not sure that that's really true. And not having studied the psychometrics of these systems, I can't really comment on that.

DR. IEZZONI: I just wanted to get to Barbara, because I know that she is going to be taking off, as probably you are. Barbara?

DR. STARFIELD: Does HCFA have any interest in functional status assessment for people who are neither in home care nor in what you call post-care facilities, any interest in most of your Medicare patients' functional status assessment?

DR. KAPLAN: I think that there is interest in it. But I guess what I'm illustrating to you is if you have this much problem with the however million beneficiaries who use post-acute care, and having all these different ways of measuring functional status, I would hope that you would standardize, and maybe move out from there.

MS. RIMES: You heard a couple of things in terms of the Health of Seniors. That's a HCFA initiative, granted. And you also heard some conversation about risk adjustment in terms of functional assessment. Those are all separate and unconversational discussions across HCFA.

DR. IEZZONI: So there are silos.

MS. RIMES: There are a number of efforts on it. When I used to work on the Medicare Current Beneficiary Survey, which spends a lot of time working on functional assessment garnering and once a year reporting, there are a number of issues, and a number of collection mechanisms.

DR. IEZZONI: Barbara, I know you have to go. Do you have any other questions? Margaret, did you have a comment on that point?

DR. STINEMAN: No, that's okay.

DR. IEZZONI: Are you sure? Because we'll give you the floor.

Paul, you had a comment?

DR. NEWACHECK: I wanted to ask Dr. Stein whether or not -- you were critical in your earlier remarks about the use of ADLs and limitation of activity, the more common measures of functioning for adults, in their use for children. But I'm wondering whether you think we need to have a separate measure for children, or we can get by with one of the measures that has been developed to use across the life-span. This is basically the same question I asked Dr. Kane, like for example using the ICIDH. Would that be appropriate for children?

DR. STEIN: I think the ICIDH has a lot of potential promise for children. And I think the addition in the last revision of the whole learning spectrum of activity is a very important addition for children. To date, that is still an untested system. That's in beta testing now. And I think some of the testing on children will be a very important piece, but I think many of us hope that it will work so that we will have a system that will run across the life spectrum.

In terms of other things, I think the perfect is always the enemy of the good. And my inclination is that we need to get started in the concept of measuring functioning, whether we have a perfect measure or not. Because it will only be through studies that show that the measure is imperfect that we will get some refinement, and some better measures of functioning.

I don't think that the diagnostic approach works, and certainly doesn't work for children where most national data sets have very little data of relevance to children in them.

DR. NEWACHECK: Would you recommend any of the current measures for the committee to pursue in the case of children, any of the current functional status measures, the ones you discussed or described?

DR. STEIN: Well, there really are only two, the one that I was involved in developing and the CHQ. The CHQ is a very multidimensional functional measure in which it is not clear to me what the total score means, to be very blunt. Although it is being marketed very heavily, so it's being translated into quite a number of languages at this point, and it is also 50 items long.

It does not downward extend as far, which I think leaves us with one measure which I do not believe is by any means the end all and be all of this thing, and doesn't measure compensation issues at all, which I think I told you I thought was a very important thing to do.

So of the existing measures, I guess I think that the FS II(R) does as good a job, as efficient as any that are yet out there and validated, but I think we need another generation of measures.

DR. IEZZONI: Dr. Swanson?

DR. SWANSON: I wanted to follow-up that question with another question to you, Dr. Kaplan, about do you want to know what the test score is? Because a lot of these tools that have been referred to are composite scores. Paul just raised the issue of ICIDH, which we'll talk about a little bit more tomorrow, which is not a test score, it's a description. It's a conclusion, a classification system.

And I think it would be worthwhile to distinguish, if we are going to spend time on that tomorrow, and certainly while you are here, that the OASIS, FIM, and MDS are composite scores of many different functional variables that come forward under different domains of function.

And sort of speak to that issue a little bit, so that when we are talking tomorrow, that it will help reference that discussion.

DR. KAPLAN: The OASIS and the MDS and the FIM -- well, I can really speak to as whether they are now being used or will be used as of October 1 of this year. The SIMS(?) is already being used to pay providers. It is hard for me to really be able to remember whether it pays on a composite score, or whether sometimes those scores are parsed out in different ways. However, from a quality perspective the composite score might not be what would be used.

DR. IEZZONI: Are there any other questions around the room, comments? This has been great. Thank you, all of you. That was very informative.

The energy level of the committee seems to be a little low at this time, which seems to be quite appropriate. It's been a long day. I know that Barbara is running out of here. So do the committee members have anything urgently -- Barbara, you have exactly five more minutes. Barbara is going to try to call in tomorrow. She's not going to be here. Do you have any summary comments, or have you not formulated in your head what you thought of today?

DR. STARFIELD: I thought today was really extremely helpful, because it convinced me we shouldn't be overly ambitious. It also suggested to me that maybe there is something that we can come up with that is really maybe one or two question-y type things. So I came out of the day really hopeful, and I assume we're going to talk about how we might move ahead tomorrow.

DR. IEZZONI: Your instruction on the train on the way home is to write out what those two things are.

DR. STARFIELD: I did. That's already done.

DR. IEZZONI: So you can read them to us over the phone tomorrow.

Are there any other final comments from the committee members before we all go to our respective hotels and crash? Paul, anything?

DR. NEWACHECK: The discussion today was really helpful.

DR. IEZZONI: Yes, the speakers were really, really superb. Carolyn and Susan and Gerry and Paul, I really want to thank you for putting together a great day.

Kathy?

MS. COLTIN: One thing I want to raise is the purpose question that was raised today, and where we were challenged to think about that. One of the things that struck me is that for some of these purposes a single measure at a point in time is sufficient if you are trying to do case mix adjustment or something like that.

But for quality improvement you are looking at change scores over time. The detail in the gradients that you may need to measure when you are looking for change may be far finer than what you would need if you were just looking at the point in time. I think we are really going to be challenged to try to decide which way we want to go.

DR. IEZZONI: Any other comments?

We resume tomorrow morning at 9:00 a.m., and we will end by 1:00 p.m. So thank you everybody for coming. I appreciate it.

[Whereupon, the meeting was recessed at 4:10 p.m., to reconvene the following day, Tuesday, January 25, 2000, at 9:00 a.m.]