The Use of Evidence and Cost Effectiveness by the Courts: How Can It Help Improve Health Care?

Commentary

By David M. Eddy, Kaiser Permanente Southern California

Notice of Copyright

This article was originally published in the Journal of Health Politics, Policy and Law. All rights reserved. This material may be saved for personal use only, but may not be otherwise reproduced, stored, or transmitted by any medium, print or electronic, without the explicit permission of the copyright holder. Any alteration to or republication of this material is expressly prohibited.

It is a violation of copyright law to reproduce any copyrighted information from this publication without first obtaining separate permission directly from the copyright holder who may charge fees for the use of such materials. It is the responsibility of the user to contact and obtain the needed copyright permissions prior to reproducing materials in any form.

Permission requests should be directed to:
Journals Division
Duke University Press
Box 90660
Durham, NC 27708
Fax: (919) 688-3524

Introduction
What Can We Expect from the Courts?
What Is the Health Care System Trying to Do?
   Interaction with the Courts
   The Method of the Courts
   The Assumptions behind the Court's Methods
   Problems with the Court's Methods
What Can Health Care Do to Address These Problems?
   Conceptual Steps
   Practical Steps
What Can the Courts Do to Help?
Precedents and Prospects
One Final Thought
References

Introduction

Although the relationship between health care and the courts has always been strained, recent developments in health care are creating new pressures that threaten to make it worse. Two particularly important ones concern the use of evidence and the application of cost-effectiveness analysis. This commentary will address these issues from the viewpoint of someone in health care. I will describe an ideal relationship with the courts; the current ways the courts address evidence and cost-effectiveness; the strengths and weaknesses of those methods; and what both sides can do to help the health care system function better. Both the legal and health care systems are enormously complex. To avoid obscuring the main points, I will focus on the big picture and plead leniency for the inevitable oversimplifications.

What Can We Expect from the Courts?

From the viewpoint of someone in health care, the purpose of the legal system is to help the health care system function smoothly and correctly, consistent with the principles laid out in the constitution, its amendments, and our laws. We expect the courts to do this primarily by ensuring that contracts are followed, by resolving disputes, by judging whether reasonable expectations of performance are met, and by meting out awards and penalties. The presumption is that we in health care know what we are trying to do, and the legal system is there to help us do it. The legal system should be a facilitator and referee, not an initiator. Only if we fail to articulate our goals, or if our goals violate the principles contained in our constitution and laws, should the legal system be expected to define our goals for us.

What Is the Health Care System Trying to Do?

The first sign of trouble ahead is that this is a surprisingly difficult question to answer. Our health care system is phenomenally amorphous, heterogeneous, and complex. There is no single structure or coherent leadership; we have highly variable coverage of different populations; and there are dozens of different ways of organizing, financing, and delivering care, all of which are undergoing constant change. Depending on where they sit in the system, different participants can have goals that are not only different but that actually compete with one another. Even single participants can have multiple goals that are in internal conflict. For example, a pharmaceutical company's goal of improving health can argue for putting a low price on a drug, at the same time that its goal of maximizing shareholder profits can argue for a higher price. A physician paid a fee for a service has an incentive to do a test of little value, while the same physician working in a capitated setting has the opposite incentive. Indeed, it is pushing things to talk at all about a "system," with the implications that word has for coherence and consistency.

But despite the complexities and inconsistencies of the current system, it is still helpful to imagine that there is some overriding purpose to the collection of activities we call health care. To articulate that overall purpose, we have to look beyond the immediate commercial and personal objectives of the participants and think about the ultimate purpose—the lofty goal to which all participants would lay claim. For the ultimate purpose I would propose something like "to deliver the highest possible quality of care to people, within whatever cost they want to pay."

The generality of these words intentionally skips over a variety of difficult issues. For example, it finesses all the issues around how health care is organized, financed, or delivered; whether we have a single-payer system or a heterogeneous one; who pays the immediate bills (people, employers, governments); whether access to health care is a "right" or a "commodity"; and the tension between maximizing care for an individual versus optimizing care for a population.

Furthermore, this objective does not make any ethical judgments about whether or not there should be any limit on the amount of money we spend on health care; it merely acknowledges that there may be a limit on what at least some people can or want to pay to receive health care, and that if there is such a limit, then the system that delivers health care should respect it.

In fact, let us be quite explicit about the latter point and identify two possible settings vis-à-vis costs and budgets. Imagine that in one setting people are willing to pay any cost for any care the physician recommends. In this setting, which we might say has an open-ended or unlimited budget, providers do not need to consider costs when making decisions about appropriate treatments, and aphorisms such as "costs should play no role in medical decisions" are applicable. Imagine that in the other setting there is a limit to what people can or want to pay. They either directly or indirectly (through market pressures) instruct plans to limit their costs. This in turn limits the resources available to plans for providing care. (I will use the word "plan" in a general sense to include any organization that is contracted to pay for or deliver a service, whether it is through traditional insurance or prepaid "managed" care.) In this setting, which we might say has a closed or limited budget, providers have to consider the cost of a treatment, if for no other reason than to ensure that the total stays within the budget each year. Needless to say, settings with limited budgets can vary widely in the tightness of the budget and therefore in the extent to which costs need to be considered. But notice again that at this point we are only acknowledging that these two settings and their variants can exist; we are not making any judgments about which is better, more desirable, or more ethical.

Now let us return to the statement of ultimate purpose. As general as this one is, it has the virtue that, no matter how any of the specific issues around the organization and financing of care are resolved, or even if they are never resolved, there will always be a need to do certain things. Those things include determining the effectiveness of different treatments and making judgments about whether a treatment's benefits outweigh its harms. (I will use the word "treatment" to include any type of health intervention, including prevention, testing and support care, as well as the more usual treatments.) In settings where budgets are limited, additional needs are to determine the costs of different treatments and to make judgments about their relative "values"—whether a treatment's net benefits are worth its costs or it is a good use of resources. These determinations and judgments are "fundamental" in the sense that they are intrinsic to the provision of health care and must be done no matter how we choose to organize and finance care.

Interaction with the Courts

These determinations and judgments are fundamental in another way; they are the sources of the conflicts that bring health care before the courts. They do so under two main headings: coverage and malpractice.

Coverage. The question is whether a plan is obligated to pay for a treatment. In theory a plan's obligation to cover a treatment will be specified in the contract written between the plan and those who purchased the services (the "purchasers"). The question then is: Does the contract that describes the covered services include or exclude the treatment?

In theory this should be an easy question to answer, because most plans and purchasers agree in principle on what they want the contract to cover. They all generally agree that coverage should be limited to activities that address health conditions such as diseases and injuries, that involve health interventions, that are effective in doing what they are supposed to do, whose expected benefits outweigh any potential harms, and that are "reasonable" in some sense that we assume should be obvious. Good language for these criteria should do the trick.

Unfortunately, the actual language that specifies coverage suffers from several problems. The most fundamental is that there are very few precise ways to define the boundaries of coverage. For example, there is no easy way to separate a medical condition or disease from, say, a normal variation or aging; or a treatment from, say, a lifestyle. Nor are there unambiguous ways to define "effective." The magnitude of a treatment's effectiveness, the probability of its effectiveness, and our certainty about both of those, all vary continuously. A third problem is that even if there were clear definitions, they would be difficult to see because of incomplete data. Poor evidence especially plagues attempts to determine effectiveness, harms, and cost-effectiveness.

These problems have several consequences. One is that contract language varies enormously from plan to plan. Another is that definitions tend to be extremely vague (e.g., "reasonable and necessary" or "appropriate"), subjective (e.g., something is "investigational" if it is "not a generally accepted medical practice"), and unpredictable (e.g., something is covered if the treating physician recommends it). References to cost-effectiveness are especially elusive. The concept may be included as an explicit criterion (rarely), as an implicit criterion (e.g., buried in the notion of "appropriateness"), so narrowly that its applicability is trivial (e.g., "the least expensive way to achieve an identical benefit"), or not included at all. In general, existing contract language is so flabby that it is almost worthless for creating accurate expectations, making decisions, avoiding disagreements, or settling disputes.

This situation would not be so bad if we didn't expect many disagreements. Unfortunately, two additional problems set the stage for disagreements. One is that most of the homework and negotiations around contracts are done by third parties. That is, those who are actually paying the premiums (e.g., employers) are usually different from those who are actually receiving the treatments (e.g., employees). Individuals are often unaware of what is in a contract until some treatment they want is denied. The second is that, even when people are making their own purchasing decisions, their incentives and desires change dramatically between the time the purchasing decision is made and the time the care is needed. As with any transaction that involves the prediction of future events and chance, choices are different when an event is a possibility versus a reality; people want to change their bets after the wheel has been spun. When we add the fact that for coverage decisions the stakes are usually very high (often involving life and death, and thousands if not hundreds of thousands of dollars), the recipe is for trouble.

Malpractice. The issues here are different but equally slippery. Here there is no dispute about whether the treatment is covered. The disputes are about whether a patient got a treatment that was indicated and/or whether the treatment was performed properly. To determine if a patient got an indicated treatment, the issues are whether, for that particular patient with his or her particular indications and contraindications, the treatment in question is effective, has expected benefits that outweigh its expected harms, and is the most appropriate compared to other available treatments. That is, does the treatment meet the standard of care? For performance, the issue is whether the actual performance corresponds to reasonable expectations (e.g., the correct drug was given at the correct dose, the correct leg was operated on, no instruments were left in the abdomen).

At least three problems make issues of malpractice difficult. First, because individual patients have a bewildering variety of individual characteristics, histories, signs, symptoms, and behaviors, the range of uncertainty around the effectiveness of any particular treatment in any particular patient can be enormous and easily capable of harboring wide differences of opinion. Second, the outcomes of virtually all treatments are highly probabilistic: a treatment does not cause a particular outcome to occur, it only changes the probability it will occur. Furthermore, the effect of the treatment on the chance of the outcome is often small compared to other factors such as the patient's characteristics (risk factors) and the severity of the patient's disease. This means that practitioners cannot control the outcome nearly as much as a patient might expect. The third problem was also stressed in this special issue in the article by John M. Eisenberg: that medical decisions have to be made prospectively, whereas determinations of malpractice are made retrospectively. Prospectively, the practitioner may not know the patient's condition with certainty and never knows the outcome the treatment will produce. The choice of a treatment has to be based on the odds. Retrospectively, everything is known, it is obvious whether the choice of the treatment was right or wrong, and the patient is either happy or upset. It can be very difficult for juries and even other physicians to appreciate the appropriateness of a decision made prospectively, after the ultimate answer has become known. Given all these problems, there is considerable room for misunderstandings, disappointments, and disagreements.

The Method of the Courts

When cases of coverage or malpractice come before a court, the ultimate problem for the court is to determine whether what was done was appropriate in some sense. To deal with the inherent ambiguities in a word like "appropriate," it is extremely helpful for the court to have some reference or comparison. The underlying idea is that there is a correct "standard of care" out there, either a correct conclusion about the coverage of a treatment or a correct way to manage a patient. If several treatments provide roughly equal benefits to a patient, there may be more than one way to achieve the standard of care, but there is still a threshold of quality that we call the standard of care. The task of the court is to determine the standard of care and make a judgment about whether what was done meets that standard.

The traditional method used by the courts to determine the standard of care is to apply what might be called an "internal test." Rather than try to determine the standard of care for itself, the court hears the testimony of experts from within health care who are expected to know the standard of care and compares their answers to what was done. What the court actually hears depends on what the lawyers want to present, but it generally involves the answers to any of three questions: What do the majority of practitioners do? (What is the "community standard"?) What do groups of experts say (e.g., the recommendations of specialty societies or national panels)? Or what does an individual expert believe? While the community standard could conceivably be determined empirically, perhaps by polling community physicians or analyzing databases, it is usually learned by asking experts.

So in the end, it comes down to the testimony of experts. If all the experts agree, the court's job is easy. If the experts disagree, which they always do in an adversarial setting, then the court has to evaluate the credibility of the experts and decide which ones are correct. To do this, the court has two main sources of information. One is an assessment of any supporting evidence presented by the expert. The other is an assessment of the overall credibility of the expert based on such things as the expert's training, academic affiliations, experience, articulateness, and ability to build rapport with a jury.

At this point we should pause to note that the heavy reliance on experts by the court raises a question about differences in the "standards of evidence" used by the courts and health care. In health care, evidence means empirical observations. They may be systematic observations using rigorous experimental designs or nonsystematic observations (e.g., experience), but they are all empirical observations of real events. In the legal system, "evidence" means whatever is put before a court. On the surface these may appear to be fundamentally different standards of evidence. But in fact, at least for the use of experts in health care cases, the theoretical intent is the same. To see this we first need to appreciate that in the end, the only way for anyone to learn a fact is through empirical observations. (The only other methods are things like revelation and divination, which do not meet either the court's or health care's definition of evidence.) When the court asks an expert to express a belief about a fact, the court is presuming that that expert will be basing his or her belief on empirical observations, not revelations, dreams, or ancient texts. Thus the role of the expert is to serve as an interpreter of empirical observations. The assumption is that the beliefs developed by the expert will be accurate interpretations of all the pertinent empirical evidence. Thus the expert is not there to introduce a different kind of evidence that is nonempirical, but to introduce the conclusions learned from the empirical evidence. The evidence ultimately desired by the courts is the evidence also used in health care; it is just obtained secondarily, through the expert, rather than primarily, through direct study of empirical observations.

Having said all this, it is also important to note that what the court actually hears can vary from pure primary evidence to pure secondary evidence. The actual information imparted to the court by an expert may be a detailed description of the empirical evidence, without any judgments about what the evidence says about the case, or a pure opinion about the case, without any reference to the evidence behind the opinion, or anything in between these two poles.

The Assumptions behind the Court's Methods

On its face, the court's approach seems quite reasonable. Certainly it is far easier for a court to listen to and evaluate experts than it is for a court to learn medicine, learn experimental methods, read scores of reports of clinical trials, weigh benefits against harms, and so forth. It is also very respectful of the professionalism in medicine. However, the validity of its approach rests on several assumptions. The assumptions differ slightly, depending on whether the method of learning the standard of care is by reference to a community standard, through the testimony of individual experts, or through the conclusions of a group of experts.

When the reference is to a community standard, the main assumptions are:

There is a standard of care. That is, everyone agrees generally on the threshold of quality that should be achieved and that there is at least one way to achieve it.
There is a community standard and it matches the standard of care. That is, the majority, or at least a substantial proportion, of practitioners know what the standard of care is and do it.
There is no other practice that a majority or substantial proportion of practitioners do that does not meet the standard of care. (Otherwise such a non-standard-of-care practice might be interpreted as the standard of care.)
To be meaningful as an indicator of quality, the standard of care that is revealed through the community practice should be consistent across communities, unless there are differences in physical resources that can explain differences in practices.
The community standard can be learned from experts. That is, at least some experts know what practitioners are actually doing, presumably through observation or analysis of data.
These experts will present their beliefs without bias.
If experts disagree about the community standard, the courts can differentiate the "true" experts from the "false" experts. This in turn requires assumptions that (1) to the extent an expert has presented the empirical evidence on which his or her beliefs are based, the courts can evaluate that empirical evidence, and (2) to the extent an expert's credibility is based on less direct measures (e.g., training, affiliations, experience), there is a good correlation between those indirect measures and the accuracy of the expert.

When individual experts are the sources of information about the standard of care, some additional assumptions are:

There are at least some experts (true experts) who know the true standard of care.

When the reference is to a committee of experts, there is usually an additional assumption that:

A group of experts will be more accurate and less biased than individual experts. Two ways this can occur are that (1) whereas no single expert may have all the required knowledge or experience, a combination of experts will; (2) whereas a particular expert may be biased, the biases of a collection of experts will cancel out.

There is one more important assumption that is contained in the idea of the existence of a standard of care and that should be made explicit. It is that all the practitioners who make up a community standard, or all the experts who provide individual testimony or serve on committees, are in basic philosophical agreement about what constitutes high-quality care. If this is not the case, then there might be a standard of care for each philosophy. This assumption harks back to the premise that the role of the legal system is to help the health care system do what it is trying to do. Unless the health care system agrees with itself about what it is trying to do, the legal system could be led down hopelessly tangled paths as practitioners and experts play out their different philosophies.

Problems with the Court's Methods

Each of the assumptions underlying the court's approach to health care cases seems reasonable, and until fairly recently there was little reason to question them. Unfortunately, we now have incontrovertible evidence that virtually none of them are reasonable.

Begin with the use of experts to learn the community practice. The assumption is that an expert can somehow "see" what all (or a random sample?) of the physicians in a community are doing either first hand or by analyzing data. In fact it is a major research task to figure out what practitioners in a community are doing. When an expert answers a question about a community standard it is extremely unlikely that he or she has any real data on actual practices. It is far more likely that what an expert believes is the practice in a community is what the expert personally believes should be the standard of care. In practice, questions to an expert about a community standard are really more like questions about a personal belief.

But more important, the idea that a community standard exists at all has been shattered by hundreds of studies of variations in practice patterns and inappropriate care. When rates of procedures vary across communities by factors of two, five, ten and more, and when 10 percent, 20 percent, 50 percent, even 70 percent of practices are judged by peers and experts to be inappropriate or equivocal (providing no advantage of benefits over harms), it is impossible to believe that there is a single community practice out there that the majority of practitioners are following. Indeed, given the very high rates of inappropriate care that can prevail in communities, if we actually measured what practitioners were doing and used that to define the standard of care, we would run a high risk of installing an inappropriate practice as the standard of care. The well-documented overuses of hysterectomies, antibiotics, bypasses, and C-sections are examples.

Nor can we continue to take the opinions of experts at face value. New studies continually reveal that practices that were once accepted without doubt can turn out to be worthless or even harmful. We were wrong about diethylstilbestrol, radical mastectomies, erythropoietin for anemia in end-stage renal disease, hyponatremic encephalopathy, treatment of ingested poisons, hormone replacement therapy for heart disease, and class I anti-arrhythmics for heart attacks. Experts from top universities with the most experience testified under oath that high-dose chemotherapy for late-stage breast cancer would produce 20 to 30 percent long-term cure rates. Randomized controlled trials later proved them wrong. This is not to say that all experts are always wrong. It is to say that we cannot assume they are right, and there is no easy way to tell when they are and when they are not from their credentials or enthusiasm.

Nor can we take at face value the recommendations of a group of experts, such as a specialty society committee or a national panel. In addition to examples of committee recommendations that have later been proven wrong, we have the fact that the recommendations of many committees disagree with each other. This is especially obvious for treatments that represent organizational turfs. Examples include cancer screening and the management of back pain.

At this point it is important to stress that the fallibility of experts and the variations in community practices do not mean that experts and practitioners are inadequate or arbitrary, much less petty, dumb, or greedy. The fundamental problem here is that human biology, diseases, and medicine are phenomenally complex, and the complexity simply exceeds the capabilities of human subjective reasoning. There is no more shame in this than there is in the fact that people can't calculate their income tax in their heads. It is tempting to imagine that there was a time when the practice of medicine was simpler and experts could form accurate beliefs from subjective reasoning, but treatments like leeching and strychnine suggest that the mismatch in complexity has always existed. But whatever the strengths of subjective reasoning in the past, it is clearly inadequate now. Recognizing this fact does not imply disrespect; it implies confidence that we all want to provide the best care possible, and that there is no way to correct a problem without facing it.

There is one more threat to the court's method that needs to be addressed. It strikes at the fundamental assumption that there is a standard of care. As stated above, the concept of a single standard of care implies that everyone who is a part of determining that standard (physicians, patients, purchasers, and others) agrees on what constitutes high-quality care. Until recently that was generally true. However, two recent developments have caused serious splits in the principles people use to define the standard of care. Not surprisingly, the principles involve evidence and costs. Depending on which principle one applies, one can arrive at different standards of care.

With respect to evidence, the split is between a traditional approach that gives heavy weight to subjective reasoning, and a more recent approach that is much more skeptical of subjective reasoning and much more strict about requiring good evidence. In this context it is important to distinguish "subjective reasoning" from the related ideas of "clinical judgment" and the "art of medicine." Subjective reasoning is the attempt to pull all the facts that are pertinent to a question about a treatment into one's head and synthesize them into a conclusion, without using formal, explicit methods such as a statistical analysis of data. Clinical judgment and the art of medicine both can include subjective reasoning. But they have additional elements that are not included in subjective reasoning, such as empathizing with a patient, discussing a patient's preferences for benefits and harms, addressing a patient's hopes and fears, and helping a patient cope with a problem. Everyone agrees that these are important and there are no formal methods for accomplishing them that can replace clinical judgment and the art of medicine. But there are formal methods that can enhance subjective reasoning when it comes to estimating the expected effect of a treatment on the probability of some health outcome.

With this distinction in mind, the different weights placed on subjective reasoning have profound implications for the standard of care. The skepticism about subjective reasoning and requirement for empirical evidence can lead to a policy that a treatment should not be affirmatively recommended through a guideline or made the standard of care (or if it is a new treatment, covered by the plan) unless there is good evidence it is effective and beneficial. On the other hand, those who prefer the traditional approach would acknowledge that good experimental evidence is highly desirable, but they would argue that it is unrealistic and too strict to require good evidence, and that a practice might be recommended or even made the standard of care if there is suggestive evidence, or if it is supported by biological theory, or even if it just provides a hope of effectiveness.

With respect to cost, the split is between a position that took root in the era of open-ended budgets fed by indemnity insurance and a more recent position that is trying to respond to marketplace demands that costs be controlled. The former holds that costs should play no role in medical decisions. The latter says just the opposite, that costs have to be considered in medical decisions. The differences can create two very different standards of care. In settings where there are no limits on costs, a standard of care should include treatments that provide even small benefits, regardless of their costs. Examples are tPA as a clot-dissolving agent (instead of streptokinase) for acute heart attack, nonionic (instead of ionic) contrast agents (dyes) for certain radiographic procedures, and annual Pap smears (instead of three-year Pap smears) for cervical cancer. However, in settings where budgets are limited, a very different standard of care should apply. There, the standard of care should include streptokinase, ionic contrast agents, and three-year Pap smears, because the marginal gains in benefit of the more expensive alternatives are tiny compared to their costs, and the money could be put to better use elsewhere.
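
The arithmetic behind the limited-budget argument can be sketched with a small calculation. Everything in the sketch is an assumption made for illustration: the option names, the costs, the benefit figures (counted here in days of life gained per patient), and the budget are hypothetical placeholders, not data from this commentary or from any study. It computes the extra cost paid for each extra unit of benefit when moving from a cheaper alternative to a more expensive one, and then compares how much total benefit a fixed budget buys with each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A treatment alternative. All values used below are hypothetical."""
    name: str
    cost_per_patient: int      # hypothetical cost, in dollars
    benefit_per_patient: int   # hypothetical benefit, in days of life gained


def incremental_cost_per_benefit(cheaper: Option, costlier: Option) -> float:
    """Extra dollars paid per extra unit of benefit when switching options."""
    extra_cost = costlier.cost_per_patient - cheaper.cost_per_patient
    extra_benefit = costlier.benefit_per_patient - cheaper.benefit_per_patient
    if extra_benefit <= 0:
        raise ValueError("the costlier option adds no benefit")
    return extra_cost / extra_benefit


def total_benefit_within_budget(option: Option, budget: int) -> int:
    """Total benefit if the whole budget is spent on one option."""
    patients_treated = budget // option.cost_per_patient
    return patients_treated * option.benefit_per_patient


# Hypothetical figures chosen only to make the arithmetic visible.
standard = Option("standard", cost_per_patient=300, benefit_per_patient=100)
premium = Option("premium", cost_per_patient=2300, benefit_per_patient=102)
budget = 230_000

print(incremental_cost_per_benefit(standard, premium))
print(total_benefit_within_budget(standard, budget))
print(total_benefit_within_budget(premium, budget))
```

With these made-up figures, the more expensive option gives each treated patient slightly more benefit, but the same budget treats far fewer patients and so yields much less benefit in total. In a setting with an open-ended budget the second comparison never arises, which is why the two settings can support different standards of care.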

These disagreements about the roles of evidence and costs in defining the standard of care can wreak havoc on court cases. The risk is that decisions made in one setting, where one standard of evidence or cost is being applied, will be judged by experts from another setting that applies different standards. A good example is the case of high-dose chemotherapy and autologous bone marrow (HDC/ABMT) or stem-cell transplant for late-stage breast cancer. Refusals to cover this treatment, on the grounds that it was experimental, were based on a lack of evidence from well-controlled trials of an effect on health outcomes. But the experts who testified for coverage developed their beliefs from biological models ("more is better"), uncontrolled clinical series, and the effects on intermediate biological outcomes (response rates). If the philosophical differences had been made clearer to the courts, and if contract language had been more precise about which standard of evidence would be applied, the courts might have been able to sort it all out. But as the cases actually played out, the courts came to very different decisions in different cases, reflecting the conflicts in philosophies behind the testimonies presented to them. Far from helping the courts deal with our internal philosophical disputes, we are using the courts as a battleground for fighting them.
