CHERYL L. DAMBERG, MELONY SORBERO, ATEEV MEHROTRA, STEPHANIE TELEKI, SUSAN LOVEJOY, AND LILY BRADLEY
WR-474-ASPE/CMS
November 2007
Prepared for the Assistant Secretary for Planning and Evaluation, US Department of Health and Human Services
WORKING PAPER
This product is part of the RAND Health working paper series. RAND working papers are intended to share researchers’ latest findings and to solicit additional peer review. This paper has been peer reviewed but not edited. Unless otherwise indicated, working papers can be quoted and cited without permission of the author, provided the source is clearly referred to as a working paper. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors.
CONTENTS
PREFACE
TABLES
SUMMARY
ACKNOWLEDGEMENTS
ABBREVIATIONS
I. INTRODUCTION
Background
Development of the Value-Based Purchasing Plan
Content and Structure of This Report
II. A REVIEW OF THE EVIDENCE ON HOSPITAL PAY FOR PERFORMANCE
Summary of the Empirical Evidence on the Impact of Hospital Pay for Performance
Theoretical Literature and Implications for P4P Design
Limitations in Using Economic Theories to Predict Behavioral Response
Conclusions
III. SUMMARY OF DISCUSSIONS WITH PAY-FOR-PERFORMANCE PROGRAM SPONSORS
Methodological Approach
Findings From Discussions with Program Sponsors
Critical Lessons Learned
IV. SUMMARY OF DISCUSSIONS WITH HOSPITALS, HOSPITAL ASSOCIATIONS, AND DATA VENDORS
Methodology
V. SUMMARY OF FINDINGS FROM ENVIRONMENTAL SCAN
APPENDIX A: DESIGN ISSUES EXPLORED AS PART OF THE ENVIRONMENTAL SCAN
APPENDIX B: SUMMARY OF PAY-FOR-PERFORMANCE DESIGN PRINCIPLES
APPENDIX C: INPATIENT HOSPITAL MEASURES
APPENDIX D: LIST OF ORGANIZATIONS PARTICIPATING IN THE ENVIRONMENTAL SCAN
REFERENCES
TABLES
Table 1: Design Issues Explored with Program Sponsors and Hospitals
Table 2: Key Terms Used to Search the Literature for Hospital P4P Studies
Table 3: Summary of Design Features of P4P Programs Contained in Published Evaluation Studies
Table 4: Summary of Evaluation Studies Examining Hospital P4P Programs
Table B.1. P4P Principles and Recommendations from Stakeholders
Table B.2. Summary of P4P Design Principles and Recommendations
In recent years, pay-for-performance (P4P) programs have emerged as a strategy for driving improvements in the quality, safety, and efficiency of delivered health care. In 2005, with passage of the Deficit Reduction Act, Congress mandated that the Secretary of the Department of Health and Human Services (DHHS) develop a plan for value-based purchasing (VBP) for Medicare hospital services. VBP is one strategy for modifying the payment system to incentivize improvements in the quality of care delivered to beneficiaries in the Medicare program. The use of incentives—by paying differentially for performance—is a key component of building a value-driven health care system as called for by the DHHS Secretary’s Four Cornerstones Initiative.
To inform the development of the VBP plan for Medicare hospital services, the Assistant Secretary for Planning and Evaluation (ASPE), in collaboration with the Centers for Medicare & Medicaid Services, contracted with the RAND Corporation to conduct an environmental scan of the hospital P4P landscape. This report presents the results of the environmental scan of P4P and pay-for-reporting (P4R) programs; it also includes a review of the empirical evidence on the impact of these programs, a description of program design features, and a summary of lessons learned from currently operating P4P and P4R programs about program structure and implementation.
This work was sponsored by ASPE under Task Order No. HHSP233200600001T, Contract No. 100-03-0019, for which Susan Bogasky served as the Project Officer.
Mounting cost pressures and substantial deficits in the quality of care within the U.S. health care system have led policy makers to consider various reform options. Pay for performance (P4P) has emerged as a leading reform strategy for stimulating improvements in the quality, safety, and efficiency of delivered health care (IOM, 2006). In 2005, Congress passed the Deficit Reduction Act (DRA, Public Law 109-171, Section 5001(b)), which mandated that the Secretary of the Department of Health and Human Services (DHHS) develop a plan for value-based purchasing (VBP) for Medicare hospital services that would commence in Fiscal Year (FY) 2009. VBP, which is being applied by payers in both the public and private sectors, includes the use of both financial (e.g., P4P) and non-financial (e.g., transparency of performance scores) incentives to change the behavior of providers and the systems within which they work.
Using incentives to pay differentially for performance, and measuring quality and making quality information transparent, are key components of building a value-driven health care system, as called for by DHHS Secretary Leavitt’s Four Cornerstones Initiative. In support of this initiative, the Centers for Medicare & Medicaid Services (CMS) has taken a number of steps toward using incentives and making quality information transparent: it has funded pay-for-performance demonstrations in the hospital, physician, and home health settings, and it has implemented pay for reporting (P4R) for hospitals through the Reporting Hospital Quality Data for Annual Payment Update (RHQDAPU) program and for physicians through the Physician Quality Reporting Initiative (PQRI).
AN ENVIRONMENTAL SCAN OF HOSPITAL PAY FOR PERFORMANCE
The DRA required the Secretary of the DHHS to consider the following design elements when developing the VBP plan: (1) the process for developing, selecting, and modifying measures of quality and efficiency; (2) the reporting, collection, and validation of quality data; (3) the structure, size, and source of value-based payment adjustments; and (4) the disclosure of information on hospital performance. The CMS Hospital VBP Workgroup was delegated the task of developing the VBP plan for Medicare hospital services.
To inform the development of the VBP plan, the Assistant Secretary for Planning and Evaluation (ASPE) and CMS issued a contract to the RAND Corporation to conduct an environmental scan of the hospital P4P landscape. The environmental scan, conducted between August 2006 and June 2007, included:
To take advantage of the experimentation going on nationally with respect to P4P program design and implementation, discussions were held with 27 program sponsors, 28 hospitals, 7 hospital associations, 5 data support vendors, and a number of individuals with expertise in rural hospital issues. The discussions were necessary because this type of descriptive information and this level of detail about program design are not typically contained in peer-reviewed journal articles that summarize the results of P4P interventions. Additionally, many of the demonstration experiments are still in their infancy, and little has been formally documented about the related experiences. This report summarizes the findings from the environmental scan.
FINDINGS FROM THE LITERATURE REVIEW
The Empirical Literature on Hospital P4P
As of June 2007, few peer-reviewed studies existed on the use of financial incentives and their impact on quality, patient experience, safety, or the efficient use of resources. Although more than 40 hospital-based P4P programs are operating in the U.S., little empirical evidence has emerged from these payment reform experiments to gauge the impact of hospital P4P in meeting programmatic goals or to understand how various design features affect such things as engagement in the program, the likelihood of creating unintended consequences (such as reduced access to care for more difficult patients), or the distribution of payments to providers. Few P4P programs are undergoing formal evaluations to assess their impact, and evaluating real-world applications is challenging because they generally lack the comparison group needed to isolate the impact of the P4P intervention.
We reviewed the literature published between January 1996 and June 2007 and found only nine published studies, which address the impact of three separate hospital P4P programs that have undergone formal evaluation:
Of the eight studies examining changes in performance, each reported improvements over time in at least some of the hospital performance measures or condition-specific composites it included; however, it is difficult to disentangle the P4P effect from the effect of other quality improvement efforts occurring at the same time. To date, the strongest evidence on the impact of hospital P4P comes from the study by Lindenauer et al. (2007), which compared the impact of the Premier Hospital Quality Incentive Demonstration (PHQID) with that of the Medicare RHQDAPU program alone. These studies, while showing a positive effect of P4P, suggest that the additional effect of P4P is modest relative to public reporting and other quality interventions occurring simultaneously. Improvements in hospital performance have been observed in response to feedback reports (Williams et al., 2005) and to public reporting combined with a financial incentive for submitting data (Grossbart, 2006; Lindenauer et al., 2007). One study found improvements in a few performance areas associated with P4P, compared with control hospitals participating in voluntary quality improvement activities (Glickman et al., 2007). It has been argued, however, that sustained quality improvement requires interventions that are multifaceted and that focus on different levels of the health care system (Grol et al., 2002; Grol and Grimshaw, 2003). This suggests that, to be most effective, P4P should be paired with other activities, such as public reporting and internal quality improvement efforts, that also encourage improvement in the same clinical areas.
There is less evidence on the effect of P4P on patient outcomes. One uncontrolled study (Berthiaume et al., 2006) found reduced complication rates for obstetrical and surgical patients, though it did not report whether those improvements were statistically significant. Glickman et al. (2007) found no significant difference in the improvement in inpatient mortality for acute myocardial infarction (AMI) between PHQID hospitals and control hospitals exposed to an AMI quality improvement intervention. None of the studies evaluating PHQID separately analyzed the program’s other patient outcome measures (for coronary bypass surgery and hip and knee replacement surgery), so it is not clear whether these measures improved.
Most of the published studies have significant methodological limitations. Six of the nine lacked a control group, which is critical for linking P4P to performance improvements. This is particularly important given the documented temporal trend toward improving performance on many hospital quality metrics. Another important issue is whether the experience of these smaller-scale incentive programs (all except PHQID) would generalize to a national implementation of hospital P4P by Medicare.
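The role of a control group can be made concrete with a small worked example. The sketch below assumes hypothetical mean composite scores rather than data from any study reviewed here, and contrasts an uncontrolled pre/post estimate with a simple difference-in-differences estimate. Difference-in-differences is one common way of using a comparison group; it is not necessarily the method used in the studies cited.

```python
from dataclasses import dataclass


@dataclass
class GroupScores:
    """Mean composite performance score (percent) before and after a program starts."""
    baseline: float
    follow_up: float

    def change(self) -> float:
        return self.follow_up - self.baseline


def naive_effect(p4p: GroupScores) -> float:
    """Uncontrolled pre/post estimate: attributes all improvement to P4P."""
    return p4p.change()


def diff_in_diff_effect(p4p: GroupScores, control: GroupScores) -> float:
    """Difference-in-differences: the P4P group's change net of the control group's change."""
    return p4p.change() - control.change()


# Hypothetical scores for illustration only; not data from any evaluation.
p4p_hospitals = GroupScores(baseline=70.0, follow_up=82.0)
control_hospitals = GroupScores(baseline=71.0, follow_up=80.0)

print(f"Naive estimate: {naive_effect(p4p_hospitals):.1f} points")
print(f"Difference-in-differences: {diff_in_diff_effect(p4p_hospitals, control_hospitals):.1f} points")
```

With these illustrative numbers, the uncontrolled estimate attributes a 12-point gain to P4P, while netting out the 9-point gain that control hospitals also achieved leaves an estimated effect of 3 points.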
Theoretical Literature and Implications for P4P Design
P4P is common in industries other than health care, and economists and management experts have studied and developed theories on how individuals respond to financial incentives. The economic and management theories that we reviewed suggest that the way in which P4P incentives are structured, or framed, could influence whether they achieve the desired behavioral response. Among the key highlights of this literature review:
FINDINGS FROM THE KEY INFORMANT DISCUSSIONS
Design Lessons
Discussions with program sponsors, hospitals, and data vendors revealed the following lessons about P4P program design and operation:
Payment structures—Existing P4P programs primarily base reward payments on improvement over time or on relative performance. Hospitals universally agreed that payment structures should use absolute thresholds and reward all good performers, rather than providing incentives on a relative-performance basis (such as paying only the top 10 or 20 percent of hospitals participating in a P4P program). They saw this as critical when the performance measures used have scores that “top out,” reflecting little meaningful difference in performance across most hospitals. Program sponsors felt strongly that, at least in the early years of a P4P program, the payment structure should reward performance improvement as well as attainment of specific benchmarks, in order to engage all hospitals. Hospitals also noted the difficulty of getting physicians to change their behavior without aligned incentives on the physician side, and they called for program sponsors to create parallel physician incentives focused on inpatient care for the same conditions used in hospital programs.
Lack of Knowledge About What Works—Because P4P is a newly emerging reform tool and little information is currently available about the impact of P4P or the influence of various design structures on P4P outcomes, P4P programs should incorporate evaluation and ongoing monitoring into their design as a means of building a knowledge base. Hospitals and P4P program sponsors recommended allowing experimentation, which would create models where learning could occur to inform future design structures. The discussants noted that the results of P4P may differ as a function of program design features as well as the varying structure of local health care markets, and that much could be gained from examining the experience of these local experiments. Collecting and broadly disseminating this type of information will be critical to future efforts to construct P4P programs that meet their programmatic objectives. Funding will be necessary to support program evaluation, and the evaluation work needs to be sustained over multiple years to fully assess impact and monitor for unintended consequences.
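The payment structures discussed above can be contrasted in a short sketch. The hospital scores, the 95 percent threshold, the top-20-percent cutoff, and the 3-point improvement target below are hypothetical values chosen for illustration; they do not reflect the rules of any particular program.

```python
# Hypothetical composite scores (percent) for five hospitals, current and prior year.
current = {"A": 97.0, "B": 96.5, "C": 96.0, "D": 95.5, "E": 88.0}
prior = {"A": 96.0, "B": 95.0, "C": 95.5, "D": 90.0, "E": 80.0}


def paid_by_absolute_threshold(scores: dict[str, float], threshold: float) -> set[str]:
    """Reward every hospital whose score meets a fixed benchmark."""
    return {name for name, score in scores.items() if score >= threshold}


def paid_by_relative_rank(scores: dict[str, float], top_percent: int) -> set[str]:
    """Reward only the top `top_percent` of hospitals, however close the scores are."""
    n_paid = max(1, len(scores) * top_percent // 100)
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:n_paid])


def paid_for_improvement(
    scores: dict[str, float], prior_scores: dict[str, float], min_gain: float
) -> set[str]:
    """Reward hospitals that improved on their own prior score by at least `min_gain` points."""
    return {name for name, score in scores.items() if score - prior_scores[name] >= min_gain}


print("Absolute threshold (>= 95):", sorted(paid_by_absolute_threshold(current, 95.0)))
print("Top 20 percent only:", sorted(paid_by_relative_rank(current, 20)))
print("Improvement of >= 3 points:", sorted(paid_for_improvement(current, prior, 3.0)))
```

Here the relative rule pays only hospital A, even though B, C, and D trail it by 1.5 points or less, which is the “topped out” situation hospitals described. The improvement rule instead rewards D and E, illustrating how improvement-based payments can engage lower performers.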
Program Implementation Challenges
The environmental scan also uncovered a number of program implementation challenges that warrant consideration during program design and implementation.
The small numbers problem: A sizeable number of hospitals have only a few events or cases to report for one or more measures. Scoring performance on a small number of events yields unstable estimates, which are a weak basis for determining performance-based incentive payments. While this problem is more acute for small and rural hospitals that treat few patients per year, it also occurs in some medium- and large-size hospitals, depending on their service mix, the details of measure specifications, and the use of sampling during data collection. Using all-payer data, collecting and aggregating data over longer periods of time, using composite measures, and identifying measures relevant to smaller providers can help to mitigate the small numbers problem and allow for more stable estimates of performance.
The Burden of Data Collection: The data collection burden, which affects how many measures a P4P program can reasonably require a hospital to collect and report, creates challenges for efforts to comprehensively assess the performance of hospitals given the wide range of care and services provided within hospitals. The more comprehensive the measure set used, the greater the burden on hospitals in the near term, given that most of the data needed to construct performance measures is contained in paper medical records. In most cases, hospital information systems are not yet equipped to capture and easily retrieve the clinical information used to create performance measures, nor are they structured to enable routine monitoring of quality of care. Until health information systems are upgraded to capture this information, program sponsors may be constrained in the number and breadth of measures they can expect hospitals to collect and report. Once effective information systems are built and put into place, the number of measures included in a P4P program could be expanded.
Ensuring the Validity of Data used to Make Differential Payments: P4P programs are also challenged with an acute need to ensure the integrity of the data used to score hospitals and make differential payments, which requires resources for data validation. Allocating sufficient resources to validation work is critical for program credibility, and today only limited resources are being used for data validation within P4P programs. Most hospitals stated that the current level of validation is insufficient, and the incentives to game the system will increase as the amount of money at risk in P4P programs increases.
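The instability described under the small numbers problem can be quantified with a rough calculation. The sketch below assumes a simple normal-approximation margin of error for a process measure expressed as a proportion; this is an illustrative choice, not a method attributed to any program, and the approximation itself is unreliable at very small sample sizes. The case counts are hypothetical.

```python
import math


def margin_of_error(successes: int, cases: int, z: float = 1.96) -> tuple[float, float]:
    """Return the observed rate and an approximate 95% margin of error.

    Uses the normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    rate = successes / cases
    return rate, z * math.sqrt(rate * (1 - rate) / cases)


# Hypothetical hospitals with the same observed rate (80 percent) but different volumes.
examples = {
    "Small hospital, one year": (8, 10),
    "Small hospital, three years pooled": (24, 30),
    "Large hospital, one year": (240, 300),
}

for label, (successes, cases) in examples.items():
    rate, moe = margin_of_error(successes, cases)
    print(f"{label}: {rate:.0%} +/- {moe:.1%}")
```

With these numbers, the small hospital's single-year estimate is uncertain by roughly plus or minus 25 percentage points, pooling three years narrows that to about 14 points, and the large hospital's estimate is within about 4.5 points. This mirrors the mitigation strategy of aggregating data over longer periods of time.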
In summary, P4P programs have the potential to drive system improvements, but their impact is likely to depend not only on their design but also on the other structures in place to support P4P, such as enhanced information systems for quality monitoring and feedback, aligned payments across all providers, and transparency. The success of these programs in meeting improvement goals will likely also depend on how they are implemented and whether sufficient resources are allocated to support day-to-day program operations and ongoing modification of the program.
Hospitals understand that P4P is likely to be part of their future and generally seem supportive of the concept. They face a number of challenges to their ability to successfully participate in these programs, including lack of physician engagement, inadequate information infrastructure that necessitates the manual collection of data from charts, and potentially conflicting signals from various organizations measuring hospital performance. These implementation challenges are important to consider carefully in the design of any hospital P4P program.
We gratefully acknowledge the sponsors of the pay-for-performance programs and the hospitals, hospital associations, and data vendors whose people willingly made the time to participate in individual discussions with us. They offered us valuable information and insights about their experiences in designing and implementing pay-for-reporting and pay-for-performance programs.
We also extend our appreciation to the members of our Technical Expert Panel—Dr. Elliott Fisher of Dartmouth University, Dr. Jack Wheeler of the University of Michigan School of Public Health, Dr. Dale W. Bratzler of the Oklahoma Foundation for Medical Quality, and Dr. Howard Beckman of the Rochester Individual Practice Association—for their thoughtful review of the discussion guides to help ensure that pertinent topics and issues were addressed and their review of this report. In addition, we appreciate the assistance provided by Geoff Baker of Med-Vantage in helping us construct and narrow the list of candidate hospital pay-for-performance programs with which we held discussions. Finally, we thank Susan Bogasky, from the Assistant Secretary for Planning and Evaluation, who served as Project Officer for this contract. We also appreciate the guidance and feedback provided by Dr. Julie Howell, Project Coordinator, Hospital VBP, CMS Special Program Office for Value-Based Purchasing, and Dr. Thomas Valuck, Director, CMS Special Program Office for Value-Based Purchasing.
The Cost and Quality Problems
Substantial, well-documented deficiencies exist in the quality of care provided to patients in the United States (Institute of Medicine [IOM], 2001; Schuster, McGlynn, and Brook, 1998; Wenger et al., 2003). In a landmark study, McGlynn et al. (2003) found that adult patients received only about 55 percent of recommended care and that adherence to clinically recommended care varied widely by medical condition. A follow-on analysis by Asch et al. (2006) found that the quality deficit persisted across all sociodemographic subgroups: although quality of care varied moderately across subgroups, there was substantial underuse of recommended care regardless of income, race, or age. Other studies, such as those by Fisher et al. (2003a, 2003b), have shown substantial regional variation in the use of services and health spending among Medicare beneficiaries. Moreover, regions where more services were provided showed no additional benefit to patients, either in improved outcomes or in improved satisfaction with care. These studies highlight problems with both the underuse of recommended services and the overuse of services.
Health care costs continue to rise at a steady pace and are anticipated to account for 18.7 percent of gross domestic product by 2014 (Heffler et al., 2005). In 2006, the federal government spent $600 billion for Medicare and Medicaid for care delivered to its approximately 87 million beneficiaries; and it is anticipated that by 2030, expenditures for these two programs will consume 50 percent of the federal budget, a financial burden that will place funding for other discretionary programs at risk (McClellan, 2006). To improve quality and hold down growth in the costs of the Medicare and Medicaid programs, the Centers for Medicare & Medicaid Services (CMS) will need to explore alternatives to existing policies and practices.
The Disconnect Between Payments and Performance
Existing mechanisms for paying hospitals, both Medicare’s per-hospitalization payments using diagnosis-related groups (DRGs) and the per diem payments used by commercial payers, do not differentiate payments according to whether hospitals provide efficient, high-quality care. Current payment policies in both the public and the private sector reward the quantity rather than the quality of care delivered and provide neither incentive nor support for improving quality of care. Historically, hospitals have been paid the same regardless of the quality of care they provided and, in some cases, may even have received additional payment for treating avoidable complications and for readmissions that resulted from poor-quality care. CMS has announced that, starting in 2008, it will no longer pay Prospective Payment System (PPS) hospitals for the additional costs of certain preventable conditions acquired in the hospital (CMS, 2007a).
Calls for System Reform
The 2001 IOM report Crossing the Quality Chasm called upon policymakers in the public and private sectors to make reforms that would address problems of quality and inefficiency. A key reform recommended by the IOM was to create financial incentives for quality and to make performance information transparent to ensure public accountability. More recently, in its 2006 report Rewarding Provider Performance: Aligning Incentives in Medicare, the IOM made specific recommendations for implementing payment rewards for performance within Medicare. Additionally, the Medicare Payment Advisory Commission (MedPAC), which advises the U.S. Congress on issues related to the Medicare program, has recommended that Medicare adopt pay for performance (P4P) across various settings, including Medicare Advantage plans, dialysis providers, hospitals, home health agencies, and physicians (MedPAC, 2005).
Federal Actions to Reform the System
On August 22, 2006, President Bush issued an Executive Order, “Promoting Quality and Efficient Health Care,” that requires the federal government to: (1) ensure that federal health care programs promote quality and efficient delivery of health care and (2) make readily usable information available to beneficiaries, enrollees, and providers. These actions are designed to drive improvements in the value of federal health care programs.
To support this mandate, Department of Health and Human Services (DHHS) Secretary Michael Leavitt embraced “four cornerstones” for building a value-driven health care system:
Building on these four cornerstones, CMS has taken steps toward using incentives and making quality information transparent in order to become a value-based purchaser of care. These steps include funding a number of demonstrations of financial incentives across hospital, physician, and home health settings, and implementing pay for reporting (P4R) for hospitals and physicians through the Reporting Hospital Quality Data for Annual Payment Update (RHQDAPU) program and the Physician Quality Reporting Initiative (PQRI), respectively. In particular, the RHQDAPU program, which was mandated under the Medicare Prescription Drug Improvement and Modernization Act of 2003 (MMA), required hospitals to submit data on a defined set of performance measures to receive 0.4 percentage points of their annual payment update (APU). The performance data from RHQDAPU are made available to Medicare beneficiaries and the public through the CMS Hospital Compare website (http://www.hospitalcompare.hhs.gov). Section 5001(a) of the 2005 Deficit Reduction Act (DRA) expanded the set of RHQDAPU P4R performance measures and increased the differential payment for reporting from 0.4 to 2 percentage points.
The 2005 DRA also authorized the DHHS Secretary, under Section 5001(b), to develop a plan for value-based purchasing (VBP) for Medicare hospital services commencing fiscal year (FY) 2009. Congress specified that the VBP plan consider the following design issues:
Through implementation of VBP for Medicare hospital services, CMS would provide differential payments to hospitals based on their performance (i.e., P4P).
In response to the DRA mandate, CMS created an internal hospital VBP workgroup responsible for developing the VBP plan. To inform the development of the plan, the Assistant Secretary for Planning and Evaluation (ASPE), in collaboration with CMS, contracted with the RAND Corporation in July 2006 to conduct a literature review synthesizing the empirical evidence on P4P in the hospital setting and to conduct an environmental scan of the existing P4P landscape.
To take advantage of the experimentation going on nationally with respect to P4P program design and implementation, RAND held discussions with P4P program sponsors, hospitals, hospital associations, data support vendors, and organizations experienced with small and rural hospitals to capture the array of experiences connected with the design and implementation of P4P and P4R programs. The discussions were necessary because this type of descriptive information and this level of detail about program design are not typically contained in peer-reviewed journal articles that summarize the results of P4P interventions. Additionally, many of the demonstration experiments are still in their infancy, and little has been formally documented about the related experiences.
RAND was tasked to:
Table 1 highlights core design issues that were examined as part of the environmental scan. Appendix A contains a complete listing of the design issues that were explored.
| Issue Type | Issue |
|---|---|
| Overview | The goals of existing P4P programs and demonstrations in the hospital setting |
| | Whether and how hospitals were included in the design and implementation of P4P and P4R programs |
| | The mechanisms used to monitor for unintended consequences, such as inappropriate clinical care or gaming of data to secure bonus dollars |
| | Lessons learned by organizations with P4P and P4R programs in practice or participating in demonstrations |
| Measures | The measures of performance (clinical effectiveness, efficiency, patient experience, care coordination/transitions, etc.) currently used for both inpatient and outpatient hospital care in practice and in demonstrations |
| | The measure selection criteria used by P4P and P4R programs |
| | Methodological issues around P4P, including the level of aggregation of measures (i.e., composite scoring, weighting); the establishment of benchmarks, thresholds, and targets; risk adjustment; and opportunities for gaming |
| Data | The data collection, data management, reporting infrastructure, and data outreach required to implement existing P4P programs |
| | Methods used to validate data for use in P4P programs |
| Payment Mechanism | The types of incentives, financial or non-financial, that currently exist or are under consideration, and what has been learned from various incentive structure designs |
| | The basis for payment, such as meeting a threshold, improvement, and/or high achievement |
| | The levels (fixed dollar, percentage of payments) and types (negative versus positive) of financial incentives being used |
| Public Reporting | How information from public reporting systems is being used, and the impact of this information |
| | Strategies for simplifying public reports to facilitate use and understanding |
| Outpatient | Whether outpatient hospital services should be incorporated into VBP in the future |
| | Extent to which current P4P programs include measures of hospital outpatient services |
This chapter builds the foundation for subsequent chapters of this report by defining P4P and its dimensions and by providing the policy context underlying the rationale for P4P as a system reform strategy.
Defining Value-Based Purchasing
VBP is a strategy that strengthens the link between quality and provider payments by rewarding providers that deliver high-quality, cost-efficient care. VBP encompasses a number of activities that can be used individually or as a mutually supportive set to engender provider behavior change. One activity that falls under the VBP umbrella and has garnered much attention and interest in recent years is P4P. P4P explicitly links health care providers’ pay to their performance on a set of specified measures such that better-performing providers receive higher payments than do lower-performing providers. The term provider, which we use throughout this report, encompasses a broad spectrum of health care providers: hospitals, individual physicians, physician practices, medical groups, and integrated delivery systems.
P4P programs seek to align measurement of and payments to providers with a program sponsor’s goals, such as the delivery of high-quality, cost-efficient, patient-centered care. For example, if a program sponsor is seeking to improve patient outcomes, the program might include measures of risk-adjusted mortality or complication rates, or clinical measures such as the provision of disease-specific services. If that program sponsor also seeks to improve the cost efficiency of care, the program may also include readmission rates or risk-adjusted length of stay. P4P programs are designed to financially reward those providers whose performance is consistent with the program sponsor’s identified goals.
Three other mechanisms that use financial and non-financial incentives also seek to change provider and/or consumer behavior as a means to improve quality and efficiency in health care delivery. These three mechanisms were excluded from our environmental scan of P4P in the hospital setting per se, although public reporting is often a component of P4P programs and is a core quality improvement strategy that CMS is currently implementing through the RHQDAPU program. The mechanisms are as follows:
Principles for Pay-for-Performance Programs
Numerous organizations have developed design principles for P4P programs in the hope of influencing how CMS and other P4P sponsors structure their P4P programs (see Appendix B). Among these organizations are MedPAC, the Joint Commission, employer coalitions, the American Medical Association (AMA) and other physician groups, the American Hospital Association (AHA), and the Association of American Medical Colleges (AAMC).
The principles cover a wide variety of program design and implementation issues, and at times the recommendations made by the different organizations directly oppose one another. Five major areas of disagreement about P4P design and implementation issues are:
There was also variation in the topics explicitly included by organizations in their statements. For example, physician organizations frequently include these principles: voluntary participation, no link between rewards and the ranking of physicians relative to one another, reimbursement of physicians for the administrative burden of collecting and reporting data, and physician involvement in program design.
There are, however, areas of consensus. Nine or more organizations endorsed the following principles/recommendations:
The remainder of this report presents the findings of RAND’s environmental scan of hospital P4P. Chapter 2 reviews the empirical literature on the impact of hospital P4P. It also draws on the economics and organizational management theoretical literature that has examined the effect of incentives on behavior to assess possible implications for P4P program design. Chapter 3 summarizes our discussions with hospital P4P program sponsors nationally, focusing on a description of the measures being used by these programs, the structure of the incentive payments, operational issues associated with implementation, and lessons learned. Chapter 4 summarizes our discussions with hospitals that have been exposed to P4P and P4R efforts (such as the CMS RHQDAPU program, the Premier P4P demonstration, or private-sector P4P programs), hospital associations, and data vendors that support hospitals in their data submissions to the array of performance-reporting efforts. Our emphasis in these discussions was on learning what hospitals thought about the set of performance measures for which they were being held accountable, the structure of the incentive payments, issues related to data submissions and the quality and validity of data used to score their performance, the importance of public reporting, barriers they saw as hampering their ability to comply with the program requirements, and lessons they had learned. As part of these discussions, we also focused on understanding the unique issues of small, rural, and Critical Access Hospitals (CAHs) that would affect their ability to participate in P4P programs. Chapter 5 concludes by summarizing the key findings from the environmental scan.
This chapter summarizes the evidence on the effect of P4P in the hospital setting, drawing on both empirical studies and theory. We begin with a review of published studies that assess the impact of P4P programs on health care quality, safety, and/or resource use, including studies that address P4P in either the hospital inpatient or the hospital outpatient setting. We then summarize relevant lessons for hospital P4P that can be drawn from the management and economic literature on how individuals in general respond to incentives, and we consider the implications for structuring incentives to achieve the desired behavioral response.
Methods
Our review of the empirical literature on the effects of P4P included all peer-reviewed published studies describing the impact of a hospital P4P program for either inpatient or outpatient hospital services. We defined outpatient hospital services as any medical or surgical services performed primarily in an outpatient/ambulatory care setting that are billed through a hospital. Examples of outpatient hospital services include chemotherapy, outpatient surgery, and diagnostic tests such as colonoscopy. The review included randomized controlled trials, quasi-experimental studies, and pre-/post-intervention studies. We retained only articles that reported empirical findings on the effect of paying for quality, patient experience, safety, or resource use. We specifically excluded articles that focused only on changes in hospital payment (such as the shift to the Prospective Payment System [PPS]) and articles on P4P as applied to physicians in the ambulatory setting. Only studies published in English were included.
We searched for articles published between January 1996 and June 2007 using five bibliographic databases (PubMed, EconLit, CINAHL, PsycINFO, and ABI/INFORM) that could include articles related to P4P and financial incentives specific to the hospital environment. Table 2 displays the search strategy and terms used to identify relevant articles for the hospital inpatient and hospital outpatient settings separately.
Table 2: Key Terms Used to Search the Literature for Hospital P4P Studies

Search | Hospital Inpatient | Hospital Outpatient |
---|---|---|
Search #1 | pay for performance OR p4p OR “pay for quality” OR “pay for value” OR “value based purchasing” OR “financial incentives” OR “monetary incentives” | “pay for performance” OR p4p OR “pay for quality” OR “pay for value” OR “value based purchasing” OR “financial incentives” OR “monetary incentives” |
Search #2 | (bonus* OR reward* OR (incentive reimbursement)) AND quality | This resulted in a database of 1,575 articles. Within this database, we retained any article that included the following keywords: |
Search #3 | hospital OR hospitals | |
Search #4 | (Results from Search #1 or #2) AND (Results from Search #3) NOT (organ donation) | |
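To make the logic of the inpatient search explicit, the following Python sketch assembles the query by combining the numbered searches as (Search #1 OR Search #2) AND Search #3, then excluding organ donation. It is an illustration under stated assumptions, not the query string actually submitted: the helper names are hypothetical, every multiword phrase is quoted (as in the outpatient column, whereas the inpatient column leaves "pay for performance" unquoted), and the field tags and syntax that differ across the five databases are not reproduced.

```python
# Hypothetical helper: terms come from the inpatient column of Table 2;
# function names and the quoting of every phrase are illustrative assumptions.
SEARCH_1 = [
    '"pay for performance"', "p4p", '"pay for quality"', '"pay for value"',
    '"value based purchasing"', '"financial incentives"', '"monetary incentives"',
]
SEARCH_2 = "(bonus* OR reward* OR (incentive reimbursement)) AND quality"
SEARCH_3 = ["hospital", "hospitals"]
EXCLUDED_TOPIC = "organ donation"


def any_of(terms):
    """Join alternative terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"


def inpatient_query():
    """(Search #1 OR Search #2) AND Search #3, then NOT the excluded topic."""
    incentive_terms = f"({any_of(SEARCH_1)} OR ({SEARCH_2}))"
    return f"{incentive_terms} AND {any_of(SEARCH_3)} NOT ({EXCLUDED_TOPIC})"


if __name__ == "__main__":
    print(inpatient_query())
```

Building the query from term lists keeps the two settings' searches easy to compare and makes it harder to drop a parenthesis when the same strategy is re-entered in each database.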
We combined the results of this search strategy for each setting (conducted initially in November 2006 and updated with articles published through June 2007) from the five different databases and then eliminated duplicate articles. Titles and abstracts for these articles were reviewed, and potentially eligible articles were identified. The full text of the set of potentially eligible articles was then read to determine whether the article was appropriate for inclusion. Reference lists of the included articles were checked to identify additional relevant studies. To ensure our scan was comprehensive, we also consulted experts in the field of P4P and retrieved references from recent reports on P4P and payment reform from the IOM, the Joint Commission, MedPAC, and the Agency for Healthcare Research and Quality (AHRQ).
From the initial search strategy, we identified 902 non-duplicated articles for the hospital inpatient setting and 162 non-duplicated articles for the hospital outpatient setting. After the abstracts were reviewed, eleven articles were targeted for further review for the inpatient setting and none for the hospital outpatient setting. Of the eleven articles, eight met our criteria for inclusion. After consultation with P4P experts and a review of relevant reports, we judged one additional paper sufficiently important to include: a white paper, not published in the peer-reviewed literature, describing the early results of the CMS–Premier Hospital Quality Incentive Demonstration (PHQID). Our summary therefore focuses on the findings from nine articles that describe P4P interventions in the inpatient setting.
The methodological quality of the articles was assessed by evaluating the overall study design in terms of its strength in determining a causal relationship or an association between the intervention and the outcome. For example, we determined whether the study design was a pre-post measurement without a control group, a pre-post study with a control group (a quasi-experimental study design), or a randomized controlled trial. If there was a control group, we also assessed its adequacy, such as whether hospitals in the control group were reasonably similar to hospitals exposed to the P4P intervention. If there was no control group, we assessed whether the study controlled for pre-intervention trends in performance. Lastly, we assessed the studies’ use of appropriate statistical methods for estimating an intervention effect. These characteristics were used to determine the quality of the studies being reviewed, with randomized controlled trials providing the strongest evidence of a causal relationship between the implemented program and changes in performance measures, and uncontrolled studies providing weaker evidence.
Findings from the Literature Review
As of June 2007, few peer-reviewed studies existed on the use of financial incentives to affect quality, patient experience, safety, or the efficient use of resources. While more than 40 hospital-based P4P programs are operating in the U.S., few of them are undergoing formal evaluations to assess their impact.
The nine articles in our review address the impact of three separate hospital P4P programs in which formal evaluations have been occurring:
Table 3: Summary of Design Features of P4P Programs Contained in Published Evaluation Studies

Hospital P4P Program | Measures: Outcome | Measures: Process | Measures: Structure | Measures: Patient Experience | Measures: Patient Safety | Target: Absolute | Target: Relative | Incentive: Bonus | Incentive: Withhold | Incentive: Penalty |
---|---|---|---|---|---|---|---|---|---|---|
HMSA | X | X | X | X | X | X | X | |||
BCBS of Michigan | X | X | X | X | X | X | ||||
PHQID | X | X | X | X | X | X |
Table 3 presents a high-level summary of key design features of each of these three P4P programs. Table 4 provides descriptive data on the evaluation studies. More detailed findings from our review are presented in the following subsections.
Table 4: Summary of Evaluation Studies Examining Hospital P4P Programs

| P4P Program | Article | Type of Study | Change in Performance | Control Group |
|---|---|---|---|---|
| HMSA P4P Program | Berthiaume et al., 2004 | Describes uptake of one component of program and how many dollars were dispensed | No | No |
| | Berthiaume et al., 2006 | Describes trends in measures | Yes | No |
| BCBS of Michigan Hospital Incentive Program | Nahra et al., 2006 | Cost-effectiveness analysis | Yes | No |
| | Sautter et al., 2007 | Qualitative interviews with leadership of 10 participating hospitals | NA* | No |
| | Reiter, Nahra, and Wheeler, 2006 | Survey of participating hospitals to track behavioral responses | No | No |
| PHQID | Premier White Paper | Describes improvements in quality measures | Yes | No |
| | Grossbart, 2006 | Evaluates improvements in quality versus a “matched” control group | Yes | Yes |
| | Lindenauer et al., 2007 | Evaluates improvements in quality versus a “matched” control group | Yes | Yes |
| | Glickman et al., 2007 | Evaluates improvements in quality versus a control group | Yes | Yes |

Note to Table 4: *For Sautter et al. (2007), change in performance was used to select hospitals for the interviews and was not the outcome examined by the research.
Hawaii Medical Service Association Pay-for-Performance Program
Two papers by Berthiaume and colleagues (2004 and 2006) evaluated the impact of the HMSA P4P program, which started in 2001 and targeted all 17 hospitals in Hawaii. The program had four components:
The complication and length-of-stay measures focused on patients admitted to the obstetric service or undergoing one of the 18 most common surgical procedures, which accounted for approximately 50 percent of the surgical case volume.
Berthiaume et al., 2004: This study looks at the rates of participation in the “Get with the Guidelines—Coronary Artery Disease” component of the HMSA P4P program. The authors report that of the 13 hospitals in Hawaii with more than 30 admissions for acute coronary artery disease, 10 earned some points associated with participation in “Get with the Guidelines.” Incentive amounts paid to these 10 hospitals ranged from $5,514 to $114,574 in one year. The authors state that the participation of 85 percent (11/13) of the eligible hospitals in “Get with the Guidelines” is noteworthy because this level of program adoption “is much higher than would be predicted by models of diffusion of innovation in healthcare.” The authors report that the incentive dollars helped fund salaries and travel costs within hospitals and led to substantial changes in the systems of care.
This study suffers from several limitations that restrict our ability to assess the impact of the P4P program. It reports only how many hospitals participated in the program at a single point in time, 2003—not whether participation, number of points earned, or scores on the myocardial infarction process measures increased over the intervention period. Since there was no control group, it is unclear whether participation in the “Get with the Guidelines” care improvement effort was truly driven by the incentive program versus other factors. Hospitals around the country were being encouraged to enroll in the program, and many of the measures that the program used were also being used by the Joint Commission and CMS as part of their quality measurement and improvement efforts. This study does not provide evidence on the impact of the incentive program in changing clinical process or outcome measures and how the results might generalize more broadly.
Berthiaume et al., 2006: This second study by Berthiaume and colleagues reports changes in the following HMSA P4P program areas: length of stay, complication rates, patient satisfaction, and the hospital’s internal quality initiatives. It does not report changes in the clinical process of care measures for AMI. The study design used pre-post measurement with 2001 as the baseline year and 2004 as the final year of available data. The HMSA program awarded $9 million in financial incentives across all parts of the program in 2004.
The authors report that complication rates for both obstetric and surgical patients declined approximately 2 percentage points between 2001 and 2004. Average length of stay also decreased for both types of patients; surgical patients experienced a decrease in length of stay of approximately 1.2 days, whereas length of stay for obstetric patients decreased by approximately 0.4 days. Patient satisfaction with inpatient care remained stable (78 percent in 2001 versus 79 percent in 2004); satisfaction with emergency room care increased from 71 percent in 2002 to 75 percent in 2004. Lastly, the scoring mechanism for internal quality initiatives was changed halfway through the program; but between 2003 and 2004, the scores increased from 4.25 to 6.5 points out of a total of 10 possible points. The authors do not state whether the observed differences between time periods were statistically significant. However, confidence intervals shown in figures contained in the article appear to indicate that only the change in surgical length of stay was statistically significant.
The authors state that it is unclear whether these improvements in performance were caused by the HMSA P4P program or by other factors occurring more broadly, such as greater national emphasis on improvements in AMI care or efforts to reduce utilization. As is typical for P4P programs being implemented nationally, the HMSA program had no control group that would allow the effect of the HMSA intervention to be separated from other factors that may have caused the observed changes.
Blue Cross and Blue Shield of Michigan Hospital Incentive Program
Three published papers have examined the impact of the BCBS of Michigan Hospital Incentive Program. This program was initiated in 2000 and fully implemented in 2001 between BCBS of Michigan and the 86 hospitals statewide with which it contracts. Under the incentive program:
As of this review, no results have been published describing changes in quality metrics in response to this program. The three evaluation studies that have been published examine the cost-effectiveness of the program (Nahra et al., 2006), results of qualitative interviews with leadership at 10 participating hospitals (Sautter et al., 2007) and the results of a survey of organizational changes that participating hospitals reported making in response to the P4P program (Reiter, Nahra, and Wheeler, 2006).
Nahra et al., 2006: This study estimated the cost-effectiveness of the BCBS of Michigan Hospital Incentive Program from the perspective of the health plan sponsoring the program. In estimating the costs, the researchers included incentive amounts paid to hospitals by BCBS and the costs of administering the program. Benefits from the program were estimated by using increases in performance on the process measures to calculate the number of patients receiving improved heart care. These calculations were combined with published clinical trial data to estimate how many quality-adjusted life years (QALYs) would be gained from the improved heart care over the 2000–2003 period. The researchers estimated that the observed clinical quality improvements would yield gains of 733 to 1,701 QALYs. Based on this calculation and the cost of the program to the health plan, the cost per QALY was between $12,967 and $30,081, a range generally considered to be cost-effective (Ubel et al., 2003). This study illustrates that modest quality improvements can lead to substantial QALY gains. Additional unpublished information obtained from the program evaluator (J. Wheeler, private communication) indicated that hospitals reported average incremental costs of participating in the P4P program of $36,915 for large teaching hospitals and $28,525 for other hospitals. Even taking these costs into account, the program would be considered cost-effective.
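The cost-effectiveness figures can be checked with simple arithmetic: cost per QALY is total program cost divided by QALYs gained, so each end of the reported range implies a total sponsor cost. The Python sketch below back-calculates that cost, assuming the lower QALY estimate pairs with the higher cost per QALY (the program cost being the same in both cases). The implied total of roughly $22 million is derived here, not reported by the study, and the function names are hypothetical. The last line adds hospitals' own participation costs under a deliberately crude, hypothetical assumption (all 86 hospitals incurring the large-teaching-hospital average once) purely to show how such costs would enter the ratio.

```python
# Reported by Nahra et al. (2006): QALYs gained and the matching cost per QALY.
# Assumption: fewer QALYs (733) pair with the higher ratio, more (1,701) with the lower.
ESTIMATES = [(733, 30_081), (1_701, 12_967)]


def implied_program_cost(qalys, dollars_per_qaly):
    """Back-calculate total sponsor cost (incentives plus administration)."""
    return qalys * dollars_per_qaly


def cost_per_qaly(sponsor_cost, qalys, extra_cost=0.0):
    """Cost-effectiveness ratio; extra_cost lets other parties' costs be added."""
    return (sponsor_cost + extra_cost) / qalys


if __name__ == "__main__":
    costs = [implied_program_cost(q, r) for q, r in ESTIMATES]
    print(costs)  # [22049373, 22056867]: both ends imply about $22.05 million
    # Hypothetical upper bound: all 86 hospitals incur $36,915 once.
    hospital_cost = 86 * 36_915
    print(round(cost_per_qaly(costs[0], 733, hospital_cost)))
```

That the two endpoints imply nearly the same total cost suggests the published range reflects uncertainty in QALYs gained rather than in program cost.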
One limitation of this evaluation is the absence of a control group or trend data from the period prior to intervention to know whether the observed improvements in heart care are attributable to the BCBS Hospital Incentive Program or other secular trends in care for heart disease (such as the CMS RHQDAPU pay-for-reporting program, the Joint Commission quality improvement initiatives, or the CMS 7th Scope of Work quality improvement efforts).
Reiter, Nahra, and Wheeler, 2006: This study reports the results of a survey of the 86 hospitals participating in the BCBS of Michigan Hospital Incentive Program. The survey measured the effect of participating in the program on hospital behavior. The study outcomes were the number of hospitals self-reporting that the incentive program had triggered a structural change or a process change within the hospital. Structural changes included the formalization of a quality management staff position or a change in the person responsible for quality. Process changes included implementation of a computerized physician order entry (CPOE) system or creation of case-management teams. Of the 86 hospitals participating in the program, 66 responded to the survey (77 percent response rate). Of the respondents, 32 (48 percent) reported that they had made a structural change and 39 (59 percent) reported they had made a process change in response to the P4P program. Overall, 75 percent of the responding hospitals reported making at least one type of change as a result of the BCBS Hospital Incentive Program. The most common structural change was involvement of leadership and greater board engagement in quality improvement. The most common process changes were instituting physician education, developing case-management teams, and increasing leverage with hospital physicians. The authors observed that since most of the process changes focused on physician behavior, a hospital’s ability to improve quality might depend on its “willingness or ability to exert influence over physicians.”
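The survey percentages follow directly from the counts. The short Python sketch below recomputes them (66 of 86 hospitals works out to about 77 percent) and uses inclusion-exclusion to show that, because the structural and process shares sum to more than 100 percent, at least 5 responding hospitals must have made both kinds of change. The helper names are illustrative only.

```python
def pct(part, whole):
    """Share of `whole` as a percentage, rounded to the nearest whole number."""
    return round(100 * part / whole)


def min_overlap(a, b, total):
    """Smallest possible number of respondents in both groups (inclusion-exclusion)."""
    return max(0, a + b - total)


SURVEYED, RESPONDED = 86, 66
STRUCTURAL, PROCESS = 32, 39

if __name__ == "__main__":
    print("response rate:", pct(RESPONDED, SURVEYED))        # 77
    print("structural change:", pct(STRUCTURAL, RESPONDED))  # 48
    print("process change:", pct(PROCESS, RESPONDED))        # 59
    print("made both, at least:", min_overlap(STRUCTURAL, PROCESS, RESPONDED))  # 5
```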
While this study found changes in the behavior of hospitals in response to the P4P program, it does not demonstrate that the changes made by hospitals resulted in clinical quality improvements. Additionally, the combination of the BCBS P4P program and other quality improvement interventions that were occurring simultaneously (e.g., CMS P4R, Joint Commission quality improvement) may have created a tipping point for the hospitals to make the reported behavioral changes. This study does not include a control group, which means there is no way to determine whether hospitals not exposed to the BCBS of Michigan Hospital Incentive Program were making similar changes.
Sautter et al., 2007: This qualitative study described the findings of semi-structured interviews with senior management and cardiologists at 10 Michigan hospitals participating in the P4P program. Fifty-four hospitals that participated in the P4P program and reported cardiac care performance to BCBSM from 2002 to 2004 were placed into strata based on their change in performance on one of the quality measures used in the incentive program: assessment of ventricular function among patients with CHF. Hospitals from each stratum were selected for interviews to obtain variation in hospital characteristics, such as size and teaching status. Among the 10 hospitals selected for interview, 7 had improved their performance, 2 were top performers at baseline and remained top performers, and 1 showed declining performance. Only 2 of the 10 hospitals interviewed reported that the P4P incentives were a driver for quality improvement; the other 8 reported that their facilities were undertaking these activities anyway or that the incentive was not large enough to be effective. The authors, however, are not sure these responses imply that performance would have improved to the same degree without financial incentives. They note, “incentive rewards clearly enabled some hospitals to make investments in quality.” In explaining the variation in quality improvement, the authors conclude that “underperforming hospitals with some infrastructures for quality improvement had the greatest success when presented with incentives.”
CMS–Premier Hospital Quality Incentive Demonstration
Four studies have analyzed the effects of the PHQID, a three-year CMS-sponsored demonstration project initiated in 2003. The PHQID program allowed for voluntary enrollment (i.e., hospital self-selection into the study) and included only hospitals using the Premier Perspectives data system. Both factors may limit how well the experience of the demonstration hospitals generalizes to non-demonstration hospitals, to the extent that participants differ in important ways from non-participants. It should also be noted that at the start of the Quality Incentive Demonstration period, CMS had already begun implementing its RHQDAPU P4R program, whose set of measures overlapped substantially with that of the PHQID. The PHQID program includes 34 measures, 22 of which overlap with RHQDAPU measures in the areas of AMI, pneumonia, CHF, and surgical infection prevention.
The PHQID demonstration includes 262 hospitals across 38 states. Hospitals were paid an annual bonus based on their composite performance scores in five clinical areas: AMI, Coronary Artery Bypass Graft (CABG) surgery, Community Acquired Pneumonia (CAP), CHF, and hip and knee replacement surgery. The bonus dollars represented new money. Hospitals that did not achieve a minimum level of performance in the third year of the program (defined as the lowest two deciles of performance in the first year of the program) were assessed a financial penalty.
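The penalty rule described above can be sketched as follows, assuming each hospital has a single composite score per year. The cut point is the 20th percentile of first-year scores; because it is fixed from year-one data rather than recomputed in year three, every hospital could in principle avoid the penalty by improving. The scores and hospital labels below are hypothetical, and Python's default `statistics.quantiles` interpolation is an assumption, not the demonstration's documented method for computing deciles.

```python
from statistics import quantiles

# Hypothetical first-year composite scores for ten hospitals.
YEAR1_SCORES = [60, 65, 70, 72, 75, 78, 80, 83, 85, 90]
# Hypothetical third-year composite scores for four hospitals.
YEAR3_SCORES = {"A": 58, "B": 71, "C": 85, "D": 59}


def penalty_threshold(year1_scores):
    """20th percentile of first-year scores, the top of the lowest two deciles."""
    # quantiles(n=10) returns nine decile cut points; index 1 is the 20th percentile.
    return quantiles(year1_scores, n=10)[1]


def hospitals_penalized(year3_scores, threshold):
    """Hospitals whose third-year composite falls below the fixed year-one cut point."""
    return sorted(h for h, score in year3_scores.items() if score < threshold)


if __name__ == "__main__":
    cut = penalty_threshold(YEAR1_SCORES)
    print(cut, hospitals_penalized(YEAR3_SCORES, cut))
```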
Premier, Inc., 2006: Premier published its own report describing the PHQID and the observed quality improvements from the first year of the incentive program’s implementation. Premier reported that between the first and fourth quarters of the first year of the program (October 2003 to September 2004), significant gains were made across the measures in the study, with an average 6.6 percentage point improvement across the five clinical areas. Within each of the five clinical composites, AMI performance increased from 87.4 percent to 90.8 percent, CABG surgery performance improved from 84.9 to 89.7 percent, CAP improved from 69.3 percent to 79.1 percent, CHF increased from 64.6 percent to 74.2 percent, and hip/knee replacement improved from 84.5 percent to 90.1 percent.
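Premier's reported average gain of 6.6 percentage points is consistent with an unweighted mean of the five condition-specific changes, as the Python check below shows. The report as summarized here does not state how the average was computed; the check only confirms that the figures agree.

```python
# First- and fourth-quarter composite scores (percent) from Premier's report.
COMPOSITES = {
    "AMI": (87.4, 90.8),
    "CABG": (84.9, 89.7),
    "CAP": (69.3, 79.1),
    "CHF": (64.6, 74.2),
    "Hip/knee replacement": (84.5, 90.1),
}


def gains(composites):
    """Percentage-point change for each clinical area, rounded to one decimal."""
    return {area: round(end - start, 1) for area, (start, end) in composites.items()}


def unweighted_mean(values):
    """Simple average, rounded to one decimal, with every area weighted equally."""
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    by_area = gains(COMPOSITES)
    print(by_area)
    print(unweighted_mean(list(by_area.values())))  # 6.6
```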
Although these results are positive, it is difficult to draw conclusions from this study about the effect of the PHQID program. An important challenge with this study is trying to assess whether non-participants were achieving similar gains in performance given the absence of a control group. As documented by Williams et al. (2005), there has been a strong trend across the country toward improvement in many of the same measures used as a basis for incentives in the PHQID. Disentangling the impact of the CMS-Premier demonstration from concurrent Joint Commission and CMS quality improvement efforts (i.e., RHQDAPU and the 7th Scope of Work) requires that there be a set of comparison hospitals with similar characteristics but no exposure to the PHQID. Selection bias is another issue to contend with in explaining the observed outcomes, since Premier hospitals that chose to participate in the PHQID had higher baseline quality scores than did Premier hospitals that chose not to. Thus, improvements in performance may stem partly from characteristics of the hospitals that participated rather than from the incentive program itself.
Grossbart, 2006: This study examined the impact of the PHQID but focused solely on a subset of hospitals participating in the Premier system. The study followed the performance of hospitals in the Catholic Healthcare Partners system: four that chose to participate in the PHQID and six that chose not to participate and were used as controls. The analysis was limited to a subset of 17 of the 34 PHQID measures, covering three clinical conditions (AMI, CAP, and CHF), that were collected by both the intervention and control hospitals as part of reporting for the Joint Commission ORYX Core Measures program.
All 10 hospitals showed significant improvement across the measures, and those participating in the PHQID had a statistically significantly greater increase in performance than did the non-participants. Across 17 measures, PHQID hospitals improved their scores by 9.3 percentage points, versus 6.7 percentage points for non-participating hospitals. Although the researchers matched hospitals on a number of key characteristics, one important limitation of this study is that they did not match them on baseline performance. The findings are confounded by the fact that the participating hospitals started at a higher level of quality than the non-participants did (80.4 percent versus 78.9 percent).
Much of the observed difference between the two sets of hospitals was driven by greater improvement in CHF care (19.2 percentage points for PHQID hospitals versus 10.9 percentage points for non-participants). Across the 17 measures examined, the two measures with substantial differences in improvement between PHQID and non-participating hospitals were (1) discharge instructions for patients with CHF (40.1 percentage points improvement for PHQID hospitals versus 14.6 for non-participants) and (2) pneumococcal vaccine delivery for patients admitted with pneumonia (31.6 percentage points improvement for PHQID hospitals versus 22.1 for non-participants). These two measures likely drive a substantial fraction of the overall observed differences in improvement between participating and non-participating hospitals.
The PHQID P4P intervention did not occur in isolation; it was conducted in an environment in which several national quality improvement efforts already in play were focusing on the same measures, particularly the HQA measures. These efforts included the CMS RHQDAPU program, the Joint Commission’s quality improvement initiatives, and the CMS 7th Scope of Work. Across the subset of ten HQA measures, the study found essentially no difference in the amount of improvement: 5.4 percentage points for PHQID hospitals, and 5.1 percentage points for non-participating hospitals. This very modest difference, which was not statistically significant, raises questions about the added value of P4P incentives above and beyond other quality measurement and feedback efforts, particularly the RHQDAPU P4R intervention, which appears to have driven improvements in performance nationally (Lindenauer et al., 2007). Similar levels of improvement were observed among all hospitals nationally, both those exposed to P4P and those exposed to public reporting, measurement, and feedback interventions.
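The core comparison in Grossbart's study is a difference in improvement (a simple difference-in-differences): the change among PHQID hospitals minus the change among non-participants. The Python sketch below applies it to the figures reported above for all 17 measures, the CHF measures, and the ten HQA measures; the function name is illustrative. Because this contrast assumes both groups would otherwise have improved at similar rates, the 1.5-point baseline gap between the groups matters: hospitals starting at different levels may improve at different rates regardless of incentives.

```python
# Percentage-point improvements reported by Grossbart (2006):
# (PHQID hospitals, non-participating hospitals).
IMPROVEMENTS = {
    "all 17 measures": (9.3, 6.7),
    "CHF measures": (19.2, 10.9),
    "10 HQA measures": (5.4, 5.1),
}
BASELINE = {"PHQID": 80.4, "non-participants": 78.9}


def difference_in_improvement(treated_change, control_change):
    """Treated-group change minus control-group change, in percentage points."""
    return round(treated_change - control_change, 1)


if __name__ == "__main__":
    for label, (treated, control) in IMPROVEMENTS.items():
        print(label, difference_in_improvement(treated, control))  # 2.6, 8.3, 0.3
    # Baseline gap the study did not match on (percentage points).
    print("baseline gap", round(BASELINE["PHQID"] - BASELINE["non-participants"], 1))
```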
The author described why only some Catholic Healthcare Partners hospitals chose to participate in PHQID. With the exception of those with the highest volume, hospitals saw the costs of participation, particularly for the extra staff required for the additional data collection, as being too high; and most hospital CEOs believed there was little to be gained by participation. Those that chose to participate thought the experience would provide them with a market advantage and a head start given the growing numbers of P4P programs in the market.
It is unknown from this study whether the ten Catholic Healthcare Partners hospitals making up the set are similar to or different from other hospitals nationally in ways that are important. To the extent that these hospitals differ in important ways from other hospitals, the results may not be more broadly generalizable. Another unknown is how Catholic Healthcare Partners hospitals and the system in which they operate may differ from other hospitals nationally, such as in the amount and type of systems and quality resource support that were provided. The six hospitals serving as the control group were selected because of “similar levels of service,” and the hospitals were shown to be similar in terms of availability of an open heart program and average number of beds, discharges, and case-mix index. A more rigorous method of selecting controls would have been to match each intervention hospital to a control on these characteristics as well as on baseline performance.
Lindenauer et al., 2007: This study provides the most comprehensive evaluation of the impact of the PHQID that has been published to date. The paper describes changes in performance on 10 measures that occurred over a two-year period, between the fourth quarter of 2003 and the third quarter of 2005. The study examined 207 PHQID hospitals and 406 control hospitals that were submitting performance data as part of the RHQDAPU program. Hospitals in this study were matched on bed size, teaching status, region (Northeast, Midwest, South, or West), location (urban or rural), and ownership status (for-profit or not-for-profit).
On an overall composite measure constructed from the 10 measures, PHQID hospitals experienced greater improvement than the control hospitals did (9.6 percentage point improvement versus 5.2 percentage points). This difference was seen consistently for each of the three clinical conditions (AMI, CAP, and CHF), for most individual measures, and on an appropriate care measure.3 The greatest amount of improvement was seen among hospitals with the lowest baseline performance.
The authors conducted several sensitivity analyses to assess whether this differential response stemmed from volunteer bias, meaning that Premier Perspectives hospitals that volunteered for the PHQID program were inherently different from Premier Perspectives hospitals that did not volunteer. The researchers found that after controlling for baseline performance and patient volume, the difference in improvement decreased from 4.3 percentage points to 2.9 percentage points, but improvement remained statistically significantly higher in PHQID hospitals. When all hospitals eligible to participate in the PHQID program were compared with all other hospitals nationally (that is, hospitals exposed to RHQDAPU but not eligible for the PHQID), the performance differential remained, but the gap was smaller (a difference of 2.1 percentage points in improvement). Overall, this article provides the strongest evidence that the PHQID is improving performance on some of the 10 measures beyond what is accomplished by public reporting, albeit modestly, once the hospitals’ baseline performance and characteristics are controlled for. Because this study describes the impact of the P4P intervention on top of the measurement and public reporting intervention, we do not know how the impact of the P4P intervention would have differed absent public reporting.
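Lindenauer and colleagues found that adjusting for baseline performance shrank the PHQID advantage. One simple way to see why baseline matters is stratification: compare improvement only among hospitals that started at similar levels, then average the within-band differences. The Python sketch below does this with made-up data in which lower-baseline hospitals improve more and program hospitals are overrepresented among them. It is a simpler stand-in for the statistical controls the authors applied, not a reproduction of their analysis; the data, band width, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hospitals: (baseline score, improvement in points, in P4P program?)
HOSPITALS = [
    (62, 14, True), (65, 12, True), (68, 13, True), (64, 11, False),
    (82, 6, True), (81, 4, False), (85, 5, False), (88, 3, False),
]


def crude_difference(hospitals):
    """Mean improvement of P4P hospitals minus that of controls, ignoring baseline."""
    treated = [imp for _, imp, in_p4p in hospitals if in_p4p]
    control = [imp for _, imp, in_p4p in hospitals if not in_p4p]
    return mean(treated) - mean(control)


def stratified_difference(hospitals, band_width=10):
    """Within-band differences in mean improvement, weighted by P4P hospitals per band."""
    bands = defaultdict(lambda: {True: [], False: []})
    for baseline, improvement, in_p4p in hospitals:
        bands[baseline // band_width][in_p4p].append(improvement)
    weighted_sum, weight = 0.0, 0
    for groups in bands.values():
        if groups[True] and groups[False]:  # skip bands lacking either group
            diff = mean(groups[True]) - mean(groups[False])
            weighted_sum += diff * len(groups[True])
            weight += len(groups[True])
    if weight == 0:
        raise ValueError("no baseline band contains both groups")
    return weighted_sum / weight


if __name__ == "__main__":
    print(crude_difference(HOSPITALS))       # 5.5
    print(stratified_difference(HOSPITALS))  # 2.0
```

In this toy example the crude gap of 5.5 points falls to 2.0 once hospitals are compared only with peers at a similar starting level, illustrating the direction, though not the size, of the adjustment reported in the study.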
Glickman et al., 2007: This study examined the impact of the PHQID on hospitals voluntarily participating in the national quality improvement initiative Can Rapid Risk Stratification of Unstable Angina Patients Suppress Adverse Outcomes with Early Implementation of the American College of Cardiology/American Heart Association (ACC/AHA) Guidelines (CRUSADE). Hospitals participating in CRUSADE received performance feedback, including comparisons with other CRUSADE hospitals and national standards, as well as a variety of educational interventions. Trends in the cardiac care of patients with non-ST-segment elevation AMI from July 2003 to June 2006 were compared for 54 CRUSADE hospitals participating in PHQID and 446 CRUSADE hospitals not participating in PHQID (i.e., controls). In addition to the AMI measures included in PHQID, the comparison also used eight AMI process measures not included in the demonstration. The study sought to determine whether participation in the P4P intervention gave an additional boost to performance improvement above that from the CRUSADE intervention.
Both PHQID and control hospitals improved performance on PHQID measures and the other AMI measures over the period examined. There were no statistically significant differences between the PHQID and control groups in improvement on the composite measure for either the PHQID measures (7.2 percentage points and 5.6 percentage points, respectively) or the other AMI measures (13.6 percentage points and 8.1 percentage points, respectively). PHQID hospitals had significantly greater improvement on three individual measures: two that were included in PHQID (aspirin prescribed at discharge, p = .04; smoking cessation counseling for active or recent smokers, p = .05) and one that was not included in the demonstration (lipid-lowering agent prescribed at discharge, p = .02). There were no statistically significant differences in improvements in inpatient mortality between the two groups. In both groups, hospitals with lower levels of performance at the start of the observation period demonstrated greater improvements in performance than did higher-performing hospitals.
The authors concluded that P4P leads to only very small improvements in performance beyond what can be accomplished through engagement in quality improvement initiatives. Like the Lindenauer et al. (2007) article, the Glickman et al. article demonstrates the importance of using control hospitals and controlling for baseline performance in any analysis of the impact of hospital P4P. This study’s limitations are its focus on only one of the clinical areas included in PHQID and its narrow focus on patients with non-ST-segment elevation myocardial infarction. In addition, since the hospitals included in the study voluntarily participated in CRUSADE, it is not known whether hospitals would demonstrate the same level of performance improvement if participation were not voluntary.
Summary of the Evidence on Hospital P4P Programs
As of June 2007, there were only nine studies on the impact of hospital P4P programs, one of which was not peer reviewed. All of these studies evaluated programs that targeted the inpatient setting, and none examined P4P interventions in the hospital outpatient setting. Among the studies examining changes in performance, each one reported improvements over time in at least some of the hospital performance measures or condition-specific composites included in the specific study; however, it is difficult to disentangle the P4P effect from the effect of other quality improvement efforts that were occurring simultaneously. Improvements in hospital performance have been observed in response to feedback reports (Williams et al., 2005) and public reporting with a financial incentive for submitting data (Grossbart, 2006; Lindenauer et al., 2007).
Of the three studies with control groups, two found very modest improvements in performance associated with P4P beyond what was accomplished with public reporting (Grossbart, 2006; Lindenauer et al., 2007), and the third found greater improvement in a few performance areas among P4P hospitals than among control hospitals participating in voluntary quality improvement activities (Glickman et al., 2007). It has been argued, however, that to accomplish sustained quality improvement, interventions should be multifaceted and focus on different levels of the health care system (Grol et al., 2002; Grol and Grimshaw, 2003). This implies that to be most effective, P4P should be paired with other activities, such as public reporting and internal quality improvement activities, that also encourage quality improvement in the same clinical area.
There is less evidence of the effect of P4P on patient outcomes. Berthiaume et al. (2006) found improvements in complication rates for obstetrical and surgical patients in an uncontrolled study but did not report whether those improvements were statistically significant. Glickman et al. (2007) did not find significant differences in improvement in inpatient mortality for AMI between PHQID and control hospitals. None of the studies evaluating PHQID separately analyzed the program’s other patient outcome measures (for coronary artery bypass graft surgery and hip and knee replacement surgery), so it is not clear whether improvements occurred on these measures.
Most of the published studies have significant methodological limitations. Six of the nine had no control group, yet controls are critical for providing evidence of a link between P4P and performance improvements. This is particularly important given the documented temporal trend toward increasing performance on many hospital quality metrics. When financial incentives, local and national quality improvement initiatives, and public reporting are all focused on the same clinical conditions, it is challenging to disentangle the effects of one from the others. One of the studies that used a control group included only six control hospitals, and it is unclear whether the controls used were appropriate.
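To illustrate why control groups matter, the short Python sketch below contrasts an uncontrolled pre/post estimate with a simple difference-in-differences estimate, using the composite improvements on PHQID measures reported by Glickman et al. (2007): 7.2 percentage points in PHQID hospitals and 5.6 percentage points in control hospitals. The function names are our own illustrative choices, and the calculation omits the adjustments for baseline performance and hospital characteristics used in the published analyses.

```python
def pre_post_effect(change_p4p: float) -> float:
    """Uncontrolled estimate: attributes the entire change to P4P."""
    return change_p4p


def difference_in_differences(change_p4p: float, change_control: float) -> float:
    """Controlled estimate: subtracts the change seen in control hospitals,
    which absorbs the secular trend and any interventions shared by both groups."""
    return change_p4p - change_control


# Composite improvement on PHQID measures, in percentage points (Glickman et al., 2007)
phqid_change = 7.2
control_change = 5.6

print(f"Uncontrolled estimate: {pre_post_effect(phqid_change):.1f} points")
print(
    "Difference-in-differences estimate: "
    f"{difference_in_differences(phqid_change, control_change):.1f} points"
)
```

An uncontrolled study would attribute all 7.2 points to P4P. Subtracting the control group's change, which reflects the secular trend and the CRUSADE activities common to both groups, leaves about 1.6 points, a difference that Glickman et al. did not find to be statistically significant.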
Beyond the specific limitations of the nine studies, another important issue is whether the experience of these geographically confined incentive programs, which took place in the context of established relationships between the individual hospitals and the program sponsors, would reflect the experience of a wholesale national implementation of a hospital P4P program by Medicare. Medicare is the largest payer of inpatient care in the nation, accounting for 30.4 percent of third-party payments for hospital expenditures (CMS, 2007b). Given the importance of this revenue source for hospitals, it is possible that hospitals would engage more fully in a national P4P program than they did in the programs in Michigan and Hawaii, although in both of those states the incentive program was administered by the dominant commercial payer. Another issue to consider when interpreting the impact of these smaller P4P programs and demonstrations is that they generally focus on a small set of process measures covering a handful of diagnoses. It is unknown what the impact on quality performance more broadly might be if Medicare were to adopt a more comprehensive set of measures.
The published literature on the use of financial incentives in health care is sparse and provides little information about how specific design features may influence behavioral responses. P4P is common in industries other than health care, and economists and management experts have studied and developed theories of how individuals respond to financial incentives. In the sections that follow, we describe theories drawn from the economics and management literature and consider the implications of applying the findings from tests of these theories to the design of a P4P program. Our review is not exhaustive; instead, it focuses on selected theories to illustrate how theory might inform program design to achieve the desired behavior changes. It should be noted that the theories described were developed from the behavioral responses of individuals, not institutions. It is thus uncertain whether organizations such as hospitals would respond in the same way.
Prospect Theory and the Role of Framing in Decisionmaking
P4P incentives are designed to change the behavior of providers and the systems in which they operate in ways that will improve quality or efficiency. Various factors, such as the size of the incentive, are likely to influence the behavioral responses of a hospital and its physicians to a P4P program. For example, a large incentive would likely lead to a larger behavioral response than would a small incentive. Another factor is how an incentive is structured, or “framed,” which can shape the behavioral response to it. Prospect theory is a behavioral economic theory that attempts to explain how individuals respond to the framing of choices (Kahneman and Tversky, 1979). What follows is a description of several applications of prospect theory and an exploration of their potential implications for structuring a P4P program.
Withholds May Have More of an Impact Than Bonuses
One aspect of prospect theory is the principle of “loss aversion,” which holds that individuals are more sensitive to incentives when they perceive that they are losing something than when they perceive that they are gaining something. This effect has also been summarized as “losses loom larger than gains.” It has been demonstrated in a series of experiments in which both doctors and patients were asked to choose a treatment, either surgery or radiation, for a patient with lung cancer. Both doctors and patients made different choices depending on whether the choice was framed as a loss (the probability of dying after surgery) or as a gain (the probability of surviving after surgery) (McNeil et al., 1982). In another experiment, Meyerowitz and Chaiken (1987) showed that a pamphlet that framed the benefits of breast self-examination as a loss (lost ability to detect cancer early) led to a greater increase in the percentage of women doing these examinations than did an otherwise identical pamphlet that framed the benefits as a gain (gained ability to detect cancer early). The difference in the behavioral response to a choice framed as a loss rather than as a gain can be substantial, almost twofold in magnitude (Kahneman and Tversky, 1979).
The principle of loss aversion may have implications for structuring a P4P incentive payment. Incentive payments can be structured as a withholding (a perceived loss in income), in which, for example, a portion of the hospital’s full payment for a service is held back until the end of the measurement period and released only if the hospital meets the performance target. Alternatively, they can be structured as a bonus (a perceived gain). The theory of loss aversion suggests that if the goal is to drive hospitals to make changes that improve quality or efficiency, withholding dollars and later releasing them based on performance (i.e., framing the incentive as a possible loss) may lead to a greater behavioral response than framing the incentive as a gain, in the form of a bonus, even if the same amount of money is at risk.
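A minimal sketch of how loss aversion could make a withhold more motivating than an equal-sized bonus. It assumes a simple reference-dependent value function that is linear in dollars and weights losses twice as heavily as gains, a stand-in for the “almost twofold” difference noted above; the loss-aversion coefficient and the $50,000 at risk are hypothetical. Under a bonus, the hospital’s reference point is its base payment, so missing the target means forgoing a gain; under a withhold, the reference point is its full payment, so missing the target registers as a loss.

```python
# Assumed loss-aversion coefficient: losses weighted about twice as heavily as gains.
LOSS_AVERSION = 2.0


def perceived_value(change: float, loss_aversion: float = LOSS_AVERSION) -> float:
    """Value of a change in income relative to the reference point.

    Linear in dollars for simplicity; negative changes (losses) are scaled up.
    """
    if change >= 0:
        return change
    return loss_aversion * change


def stakes_as_bonus(at_risk: float) -> float:
    """Reference point = base payment: meeting the target is a gain, missing it is zero."""
    return perceived_value(at_risk) - perceived_value(0.0)


def stakes_as_withhold(at_risk: float) -> float:
    """Reference point = full payment: meeting the target is zero, missing it is a loss."""
    return perceived_value(0.0) - perceived_value(-at_risk)


at_risk = 50_000.0  # hypothetical dollars tied to performance
print(f"Perceived stakes, bonus framing:    {stakes_as_bonus(at_risk):,.0f}")
print(f"Perceived stakes, withhold framing: {stakes_as_withhold(at_risk):,.0f}")
```

The dollars at risk are identical in both framings; only the reference point differs, and with it the perceived stakes, which double under the withhold in this sketch.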
While framing something as a loss rather than a gain may result in a larger behavioral response, experiments have shown that doing so generally provokes a negative reaction and violates what the parties exposed to the incentive believe to be fair. This point was illustrated in a study in which subjects were asked to respond to two decision scenarios. The economic impact of the two scenarios was essentially the same, but one was framed as a loss, the other as a gain. In the first scenario, subjects were told that there was no inflation in the community and that employees were being asked to take a 7 percent wage cut (a loss). In the second scenario, subjects were told that there was 12 percent inflation and that employees were being given a 5 percent raise (a gain). In both scenarios, employees would experience a reduction of roughly 7 percent in real (inflation-adjusted) earnings, but the emotional response differed. A majority of subjects (62 percent) judged the first scenario to be unfair, whereas only 22 percent thought the second was unfair (Kahneman, Knetsch, and Thaler, 1986).
In terms of P4P program design, this research suggests that hospitals would be more likely to perceive a bonus in a positive light than they would a payment withholding, even if the net financial impact is the same. This conclusion is supported by a finding from a recent survey of 79 physician group leaders: When given a choice in the structure of a P4P program, 59 percent preferred a bonus, 24 percent preferred a withholding, and 17 percent felt they were the same (Mehrotra et al., 2007).
A Series of Small Incentives Might Lead to More Quality Improvement Than Would One Large Incentive
Why do people go across town to save $10 on a clock radio but not to save $10 on a large-screen TV? After all, the same amount of money can be saved in both cases.
The explanation for the difference in behavioral response in these two scenarios is the principle of “diminishing marginal utility” (Loewenstein, 2001): the perceived value of an additional sum of money becomes progressively smaller as the total to which it is added grows. Thus, for example, an individual perceives the difference between $0 and $10 as greater than the difference between $100 and $110, which in turn is perceived as greater than the difference between $200 and $210, and so on. Prospect theory adds that people tend to judge such gains or losses as changes from their current state of well-being (or reference point), rather than in terms of their final states (Kahneman and Tversky, 1979).
Applied to hospital P4P program design, these findings suggest that it may be more psychologically motivating to provide smaller, more-frequent incentive payments than to provide a larger, lump-sum incentive payment. As an example, suppose that a total of $1,000 in incentives is to be provided to a hospital based on its performance. According to the principle of diminishing marginal utility, the hospital’s behavioral response is likely to be greater if the $1,000 is divided into a number of payments, say, ten payments of $100 each, rather than paid as a lump sum. The reason for the greater motivation is that each $100 is perceived as a new $100 gain, capitalizing on the steepest portion of the utility function (the difference between $0 and $100), rather than simply as an addition to the previous gains (for example, from $500 to $600) (Thaler, 1985).
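The arithmetic behind this example can be sketched with a concave value function for gains, v(x) = x^alpha with 0 < alpha < 1. The curvature alpha = 0.88 is an assumption chosen for illustration, not a value from the studies cited here; any alpha below 1 produces the same qualitative result. Each payment is treated as a separate gain measured from a fresh reference point, as the segregation argument above assumes.

```python
ALPHA = 0.88  # assumed curvature; any value between 0 and 1 makes the function concave


def value_of_gain(amount: float, alpha: float = ALPHA) -> float:
    """Concave value of a gain measured from the reference point (zero)."""
    return amount ** alpha


def lump_sum_value(total: float, alpha: float = ALPHA) -> float:
    """The whole incentive arrives as a single gain."""
    return value_of_gain(total, alpha)


def segregated_value(total: float, n_payments: int, alpha: float = ALPHA) -> float:
    """The incentive arrives in equal installments, each perceived as a new gain."""
    return n_payments * value_of_gain(total / n_payments, alpha)


print(f"One $1,000 payment: {lump_sum_value(1000):.0f}")
print(f"Ten $100 payments:  {segregated_value(1000, 10):.0f}")
```

With alpha = 1 (no diminishing sensitivity) the two totals are equal, so the advantage of frequent payments in this sketch comes entirely from the concavity of the value function.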
One way to structure this type of incentive in a P4P program would be to link the incentive payment to each applicable hospitalization. For example, the hospital could receive an extra payment of $100, on top of its usual DRG payment, for every patient admitted for pneumonia who received the care designated by the quality measure(s). This approach could lead to a greater behavioral change by the hospital than would a lump sum of equal dollar value paid at the end of the year.
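A per-admission incentive of this kind could be computed as sketched below, assuming an all-or-none rule in which the add-on is paid only when every applicable measure for the admission was met. The Admission record, the measure names, and the DRG amounts are hypothetical; only the $100 add-on comes from the text.

```python
from dataclasses import dataclass, field

ADD_ON = 100.0  # extra payment per qualifying admission, on top of the DRG payment


@dataclass
class Admission:
    admission_id: str
    drg_payment: float  # hypothetical base DRG payment
    measures_met: dict[str, bool] = field(default_factory=dict)  # measure -> care delivered?


def total_payment(adm: Admission, add_on: float = ADD_ON) -> float:
    """DRG payment plus the add-on if every applicable measure was met (all-or-none)."""
    qualifies = bool(adm.measures_met) and all(adm.measures_met.values())
    return adm.drg_payment + (add_on if qualifies else 0.0)


# Hypothetical pneumonia admissions and measure names
admissions = [
    Admission("A1", 5_000.0, {"antibiotic_timing": True, "oxygenation_assessed": True}),
    Admission("A2", 5_000.0, {"antibiotic_timing": True, "oxygenation_assessed": False}),
    Admission("A3", 5_000.0, {"antibiotic_timing": True, "oxygenation_assessed": True}),
]

incentive_paid = sum(total_payment(a) - a.drg_payment for a in admissions)
print(f"Incentive earned across admissions: ${incentive_paid:,.0f}")
```

Because the add-on is earned case by case, each qualifying admission is a separate, predictable gain rather than a share of a year-end lump sum.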
Uncertainty May Reduce the Behavioral Response
When given a choice, most people are risk averse; they will choose an outcome that is certain over an uncertain outcome with a higher expected value. This principle of risk aversion is illustrated in a study in which subjects were given a choice between a certain one-week vacation and a 50 percent chance of winning a three-week vacation. The vast majority of subjects chose the one-week vacation (Kahneman and Tversky, 1979). Even though the gamble has the higher expected value (1.5 weeks of vacation), most people choose the sure thing because they prefer it to the possibility of getting nothing at all.
With regard to P4P program design, the principle of risk aversion suggests that decreasing the uncertainty in the likelihood of receiving a financial incentive is likely to lead to a greater behavioral response to the incentive. Some P4P payment structures use relative thresholds, such as paying those in the top quartile of performance, as the basis for determining who “wins.” This type of payout scheme creates greater uncertainty for hospitals than do payment schemes that use absolute thresholds (i.e., a fixed target) for determining who receives an incentive payment. The reason for the greater uncertainty with relative thresholds is that the level of performance necessary to earn the incentive is unknown until after the fact, when hospitals can be sorted by rank order of performance. In contrast, absolute thresholds are known in advance and thus provide greater certainty to the individual or institution trying to hit the target. Because of the uncertainty they create, relative thresholds may dampen the behavioral response to an incentive more than an absolute threshold would. Similarly, a shared savings program, such as the one being used in the CMS Physician Group Practice (PGP) demonstration, might lead to a reduced behavioral response, in this instance because the providers in the PGP demonstration face uncertainty about whether there will be cost savings to fund incentive payments. In contrast, the most certain incentive would be an adjustment to the fee schedule. For example, for every admission for myocardial infarction, a hospital would receive an extra $100, on top of its DRG payment, if the patient received all applicable processes of care. In such an incentive system, the hospital would know that if its physicians provided these processes, it would definitely obtain the additional payment.
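The difference in certainty can be made concrete with a small sketch. Under an absolute threshold, a hospital can check its own score against a target known in advance; under a top-quartile rule, the cutoff depends on every other hospital’s score. The 80 percent target, the tie-handling rule, and all scores below are hypothetical.

```python
def wins_absolute(score: float, target: float) -> bool:
    """Absolute threshold: the target is fixed before the performance period."""
    return score >= target


def wins_top_quartile(score: float, all_scores: list[float]) -> bool:
    """Relative threshold: the hospital must rank in the top 25 percent.

    The cutoff is known only after all hospitals' scores are in; ties at the
    cutoff count as wins in this sketch.
    """
    ranked = sorted(all_scores, reverse=True)
    n_winners = max(1, len(ranked) // 4)
    cutoff = ranked[n_winners - 1]
    return score >= cutoff


our_score = 0.82
# Two hypothetical years: same score for our hospital, different peers.
year_a = [our_score, 0.70, 0.75, 0.68, 0.72, 0.79, 0.66, 0.74]
year_b = [our_score, 0.88, 0.85, 0.80, 0.83, 0.79, 0.81, 0.86]

for label, scores in [("Year A", year_a), ("Year B", year_b)]:
    print(
        label,
        "absolute:", wins_absolute(our_score, 0.80),
        "top quartile:", wins_top_quartile(our_score, scores),
    )
```

The hospital’s own performance is identical in both years, yet under the relative rule it earns the incentive only in Year A; this dependence on competitors’ results is the source of the added uncertainty.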
Reducing the Time Lags Between Performance and Receipt of Incentive Can Help to Achieve Maximum Response
In economics, the principle of discounting holds that individuals value having a sum of money now more than having the same sum at some point in the future, even after accounting for inflation. The concept of discounting and the use of a discount rate are well accepted in both accounting and economics. Studies have found, however, that individuals discount differently than classic economic theory would predict. In one study, the vast majority of individuals chose to receive $10 immediately rather than $21 in one year (Loewenstein and Prelec, 1992). But when asked to choose between $10 in one year and $21 in two years, fewer individuals selected the $10. Instead of discounting at a constant rate, as classic (exponential) discounting assumes, the individuals in these experiments discounted along a hyperbolic curve that is steepest for short delays, which gave this phenomenon its name: hyperbolic discounting.
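The preference reversal described above can be reproduced with two standard discounting functions: exponential discounting, V = A * delta^t, and hyperbolic discounting, V = A / (1 + k * t), where A is the amount and t is the delay in years. The parameter values (delta = 0.4, k = 1.5) are assumptions chosen only to reproduce the qualitative pattern, not estimates from Loewenstein and Prelec (1992).

```python
def exponential_value(amount: float, years: float, delta: float = 0.4) -> float:
    """Classic discounting: the same discount factor applies to every year of delay."""
    return amount * delta ** years


def hyperbolic_value(amount: float, years: float, k: float = 1.5) -> float:
    """Hyperbolic discounting: value drops steeply for short delays, then flattens."""
    return amount / (1 + k * years)


def prefers_smaller_sooner(value_fn, soon: float) -> bool:
    """Is $10 after `soon` years preferred to $21 one year later?"""
    return value_fn(10, soon) > value_fn(21, soon + 1)


for name, fn in [("exponential", exponential_value), ("hyperbolic", hyperbolic_value)]:
    print(
        f"{name:>11}: $10 now over $21 in 1 yr? {prefers_smaller_sooner(fn, 0)}; "
        f"$10 in 1 yr over $21 in 2 yrs? {prefers_smaller_sooner(fn, 1)}"
    )
```

Under exponential discounting, the ratio of the two values is 2.1 * delta no matter when the earlier payment arrives, so the choice can never flip. Under hyperbolic discounting, pushing both payments a year into the future flips the choice, which is why even short payment lags may weaken an incentive.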
The application of hyperbolic discounting to P4P program design suggests that minimizing the lag time between the performance being incentivized and receipt of the incentive may strengthen the behavioral response. Money received right away is perceived as different in value from money to be received in the future, even the near future. For example, a hospital is more likely to implement an electronic medical record (EMR) if it knows that the money associated with doing so will be received quickly (e.g., within the next month) rather than years after implementation. One criticism of current performance measurement and reporting programs is that the substantial lag between the provision of care (i.e., performance) and the reporting of results makes the results difficult to act on (Davies, 2001). Similarly, in a P4P program, the time required to collect and validate data and make the payout determination might mean that the incentive payment comes long after the actual delivery of care. Substantial time lags may cause a hospital to see the incentive as so far in the future that it is not worth pursuing. Strategies that tie payment to the provision of individual services or that make more frequent payouts may help reduce the time lag.
A Series of Tiered Absolute Thresholds May Be Better Than One Absolute Threshold
An individual’s motivation and effort when faced with a goal depend greatly on that individual’s baseline performance. Economists and psychologists have described this phenomenon as a “goal gradient” (Heath, Larrick, and Wu, 1999). If baseline performance is far from goal performance, the individual exerts little effort, because the goal is viewed as not immediately attainable. As baseline performance gets closer to goal performance, the individual exerts more and more effort to succeed. However, as soon as the goal is achieved, the motivation to improve drops sharply. This phenomenon was illustrated in a study of a coffee shop reward program in which the tenth coffee purchased was free. Participants in this experiment progressively shortened the time between coffee purchases as they got closer to the free coffee (Kivetz, Urminsky, and Zheng, 2006).
The notion of a goal gradient may have application in structuring a hospital P4P program. This principle implies that there would be a greater behavioral response among hospitals if there were a series of quality performance thresholds to meet (e.g., increasing dollar amounts for achieving a 50 percent, a 60 percent, a 70 percent, an 80 percent, and a 90 percent performance threshold) rather than one (e.g., a 75 percent performance threshold). If, for example, there is just one 75 percent quality threshold, a hospital whose baseline performance is at 45 percent is likely to see the goal as too difficult to achieve and is thus less likely to devote resources to quality improvement. If there is also a 50 percent quality threshold, however, the hospital’s leadership may see reaching that threshold as feasible and thus be more likely to devote resources to improving quality. A series of quality thresholds might also lead to a different behavioral response among hospitals that are doing well. In a single-threshold system with a goal of 75 percent, a hospital that is at 80 percent would have little reason to devote more resources to improving its quality performance any further. In a graded performance threshold system, however, this hospital would have an incentive to reach the highest threshold, 90 percent, to earn additional payment. To stimulate continual improvement, some P4P programs have elected to use relative performance targets so that the bar keeps moving upward. However, absent some gradient or some allowance for payment along the entire continuum of improvement, a single relative threshold creates a cliff effect: an all-or-nothing outcome in which hospitals just below the threshold receive nothing.
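The payout logic for the two designs discussed above can be sketched as follows. The thresholds mirror the example in the text (tiers at 50 through 90 percent versus a single 75 percent threshold); the dollar amounts are hypothetical. The sketch reports the additional payment a hospital would earn by reaching a higher performance level, which is the marginal reward the goal-gradient argument is concerned with.

```python
# (minimum performance rate, payment at that tier); dollar amounts are hypothetical
TIERED = [(0.50, 10_000), (0.60, 20_000), (0.70, 30_000), (0.80, 40_000), (0.90, 50_000)]
SINGLE = [(0.75, 30_000)]


def payout(rate: float, schedule: list[tuple[float, int]]) -> int:
    """Payment for the highest tier whose threshold the rate meets (0 if none)."""
    earned = 0
    for threshold, amount in sorted(schedule):
        if rate >= threshold:
            earned = amount
    return earned


def marginal_reward(current: float, target: float, schedule: list[tuple[float, int]]) -> int:
    """Extra payment earned by improving from `current` to `target`."""
    return payout(target, schedule) - payout(current, schedule)


# A low performer moving from 45% to 52%, and a high performer moving from 80% to 90%
for current, target in [(0.45, 0.52), (0.80, 0.90)]:
    print(
        f"{current:.0%} -> {target:.0%}: "
        f"single threshold ${marginal_reward(current, target, SINGLE):,}, "
        f"tiered ${marginal_reward(current, target, TIERED):,}"
    )
```

Under the single threshold, neither hospital gains anything from the improvement shown; under the tiered schedule, both do.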
Multidimensional Output
Multidimensional output, or multitasking, refers to situations in which the responsibilities of an individual encompass multiple activities or outputs that may require different types of skills to accomplish (Holmstrom and Milgrom, 1991). A hospital’s output includes many different components, such as managing a patient’s chronic illness, the timely and efficient diagnosis of a patient’s new symptom, counseling and advice on how to prevent illness, and emotional support.
Multitasking is relevant to P4P programs because the performance measures in these programs typically address only a narrow piece of a hospital’s outputs or the processes that contribute to outputs. For example, a program may measure the provision of aspirin for a patient with AMI but not other processes or outputs that are difficult to measure, such as diagnostic acumen for a patient hospitalized with unclear symptoms. It is hypothesized that if a large incentive is applied to one type of output, other outputs will be neglected, and overall care might worsen (Holmstrom and Milgrom, 1991). This reasoning is used to explain why few private-sector corporations put large fractions of employee pay “at risk,” making them dependent on measures of output for which only a small fraction of what contributes to output can be measured (Asch and Warner, 1996). A large financial incentive based on a narrowly focused set of measures may lead to the unintended consequence of having a hospital “teach to the test,” devoting resources to those things being measured and neglecting other important outputs that are not being measured.
There are several potential ways to overcome or minimize the problem of multitasking. One is to create an incentive program that addresses a broad array of a hospital’s outputs by applying a comprehensive set of performance measures. This approach has been taken by the primary care physician P4P incentive program in the United Kingdom, which has 146 quality indicators covering clinical care for ten chronic diseases, organization of care, and patient experience (Doran et al., 2006). The challenge with this approach is to avoid creating a program that is overly complicated and, absent efficient measurement tools, costly. Another approach that employers in other industries have used is low-powered incentives (Asch and Warner, 1996). With this approach, the majority of an employee’s income is fixed, and only a small fraction is tied to an incentive. The incentive emphasizes the importance of the measured area but is not large enough to induce undesirable behaviors, such as gaming the data to win or avoiding the care of sicker patients.
Intrinsic Versus Extrinsic Motivation
Empirical meta-analyses of studies that examined incentive programs show mixed results: some studies show an impact, and many others show little or even a negative impact (Rothe, 1970; Deci, Koestner, and Ryan, 1999; Cameron, Banko, and Pierce, 2001). Researchers have tried to reconcile these mixed results by theorizing that they stem from a conflict between intrinsic motivation, which is a person’s inherent desire to do a task, and extrinsic motivation, which comes from an external incentive such as might be provided in a P4P program. Researchers theorize that instead of supporting intrinsic motivation, extrinsic incentives “crowd out” intrinsic motivation (Deci, Koestner, and Ryan, 1999). This theory has been used to explain why financial incentives for blood donation are ineffective: they undermine the altruistic benefit of blood donation (Titmuss, 1970). One explanation for this crowding-out effect is that when a task is tied to an extrinsic incentive, people infer that the task is difficult or unpleasant (Freedman, Cunningham, and Krismer, 1992).
Empirical evidence of this effect was provided by a study in which students who were asked to collect money for a charity were put into two groups, one that was given an external incentive (a small amount of money), and one that was not. The group that was given the incentive collected less money than the other group did (Gneezy and Rustichini, 2000). A meta-analysis supported this study’s finding that performance-contingent rewards significantly undermine intrinsic motivation (Deci, Koestner, and Ryan, 1999), but the finding is not without critics (Cameron, Banko, and Pierce, 2001). Similar concerns have been raised about the effect of P4P in health care and how it may violate a physician’s sense of professionalism (Berwick, 1995). Application of this theory would imply that a small P4P incentive could actually lead to lower performance if it is tied to something hospitals are intrinsically motivated to improve, such as quality of care.
A potential way to address the crowding out of intrinsic motivation is simply to increase the size of the financial incentive. A very large external incentive will crowd out any inherent intrinsic motivation, but it may in turn create a greater behavioral response than would be obtained through intrinsic motivation alone. Gneezy and Rustichini, in “Pay Enough or Don’t Pay at All” (2000), illustrated this concept in a study of the average percentage of correct answers on an IQ test among four groups of college students who were given different incentives: no incentive, a small incentive, a medium incentive, or a large incentive for each correct answer. The group given no financial incentive outperformed the group given the small financial incentive (56 percent versus 46 percent of questions correct, respectively), and the groups given the medium and large financial incentives (68 percent of questions correct in each group) outperformed both of the other groups.
The idea of using a large financial incentive to overwhelm the potential loss of intrinsic motivation is at odds with the recommendation to use low-powered incentives to mitigate the incentive to overfocus on measured areas of care to the detriment of unmeasured areas of care.
Together, the economic and management theories that we reviewed suggest that the way in which P4P incentives are structured, or framed, may influence whether they achieve the desired behavioral response. Incentives that are framed as withholdings, paid out in small and frequent payments, and paid close to the time that care is delivered might drive the greatest behavioral response among targeted hospitals. Furthermore, compared with relative thresholds or a single absolute threshold, a stepped series of absolute thresholds may be more likely to induce hospitals to devote resources to quality improvement. The two potential unintended consequences discussed above serve as a helpful counterpoint to these economic theories: P4P incentives could lead to the neglect of other important but unmeasured outputs in a hospital (multitasking), and P4P programs could even have a negative impact on quality by crowding out intrinsic motivation. Therefore, any program should closely monitor for these unintended consequences.
There are several important limitations and caveats to this interpretation of these theories. First, as noted above, the theories were developed to describe the behavior of individuals, not institutions, and it is possible that institutions behave differently. Researchers have, however, applied theories of individual behavior to organizations, and there is some anecdotal evidence that organizations respond similarly (Bazerman, Baron, and Shonk, 2001). Another caveat is that there are often practical reasons for not choosing the options suggested by these economic theories. For example, as noted above, a more frequent payout might lead to a greater behavioral response, yet this benefit might be outweighed by the higher administrative costs to the program sponsor of processing data and payouts more frequently. Likewise, an absolute threshold tied to a fixed-dollar incentive might lead to a greater behavioral response than a relative threshold tied to an uncertain incentive, yet such an approach exposes the payer to greater risk: it could face the prospect of paying out much more in incentives than was budgeted if providers outperform the predicted improvement. In the United Kingdom’s primary care physician P4P program, provider performance greatly exceeded the 75 percent predicted when the scheme was negotiated, so the cost to taxpayers was considerably more than expected (Doran et al., 2006). This risk could be avoided by setting a fixed incentive budget.
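One way a sponsor could cap its exposure under a fixed incentive budget is to scale all earned awards down proportionally whenever their total exceeds the budget. The sketch below assumes simple pro rata scaling; the hospital names, award amounts, and budget are hypothetical, and other allocation rules are possible.

```python
def prorate_to_budget(earned: dict[str, float], budget: float) -> dict[str, float]:
    """Scale every award by the same factor so that the total does not exceed the budget."""
    total = sum(earned.values())
    if total <= budget:
        return dict(earned)  # budget covers everything earned; pay in full
    scale = budget / total
    return {hospital: amount * scale for hospital, amount in earned.items()}


earned = {"Hospital A": 60_000.0, "Hospital B": 30_000.0, "Hospital C": 30_000.0}
paid = prorate_to_budget(earned, budget=100_000.0)
for hospital, amount in paid.items():
    print(f"{hospital}: earned ${earned[hospital]:,.0f}, paid ${amount:,.0f}")
```

A fixed budget caps the payer’s risk, but it shifts some of that risk back to hospitals: each hospital’s payment now depends on how well the others perform, reintroducing the kind of uncertainty that, as discussed above, may dampen the behavioral response.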
Given the scarcity of empirical data showing the effects of P4R and P4P programs on improving quality, safety, or efficiency and showing the effects of design elements that may influence provider behavior, RAND held discussions with a broad cross-section of P4P program sponsors to gather information on the current state of the art of P4P program design and operation. In this chapter, we describe key design features of hospital P4P programs that were being operated by both private- and public-sector sponsors across the United States as of October 2006. In addition to cataloging the program designs, we asked about issues confronted in implementing and operating a hospital P4P program. The insights and perspectives gathered through these discussions reflect more than half of all hospital P4P programs in operation at the time the environmental scan was conducted.
From this scan, we identified 41 candidate organizations thought to sponsor hospital P4P programs. We then cataloged the 41 programs by a range of characteristics (e.g., type of sponsor, geographic region, type of insurance product) and selected a subset of hospital P4P program sponsors for discussions. During the selection process, we attempted to include a broad cross-section of programs that would encompass the range of variation in program design and operation. We chose this strategy over a purely random sample in order to provide a rich base of information for consideration by ASPE and CMS.
The characteristics we sought to balance in our purposive approach to sampling were:
From the 41 programs, we selected 31 organizations and requested their participation in the discussions. We held discussions with 27 of the 31 organizations between August and December 2006. Of the four organizations that did not participate, one had no hospital P4P program, one declined to participate, one never replied, and for one we were unable to establish correct contact information.
The numerical statistics presented in the following sections reflect 23 of the 27 organizations. We excluded four organizations from our tabulations. These were either in the planning stages of designing a P4P program or were national plan offices that delegated operation of P4P programs to local plans. We did, however, include information from our conversations with these four organizations in our descriptive summaries.
General Descriptive Characteristics of Hospital P4P Programs
Measures
Data Collection and Validation
Payment Structure
Public Reporting
General Comments. Sponsors had mixed views on public reporting. Some saw public reporting as a critical part of the incentive program because it captures the attention of consumers and of hospital staff at all levels. Others felt that public reporting sets a negative tone, which works against collaborative quality improvement efforts between hospitals and program sponsors. Whether or not they publicly reported data from their own programs, many sponsors linked from their websites to the CMS Hospital Compare public report card. That report card shows performance results for approximately 3,534 hospitals participating in the RHQDAPU program.
Reporters. Sponsors that reported publicly (12/22) usually posted performance scores on websites intended for health plan members, which were usually password protected. Data were often presented in a simple format rather than as specific numeric values, for example as stars displaying different levels of performance. Summary scores were commonly used. Most sponsors reported doing minimal or no testing of report presentation with consumers. They did not know whether consumers understood the information as presented or found it useful.
Non-Reporters. Sponsors not reporting data publicly tended to give two practical reasons for this. First, customized programs that are rolled out contract by contract do not permit comparisons, since not all hospitals have performance results or the same set of performance results. Second, some programs do not include all hospitals in a given area, again making comparisons difficult. Additionally, several sponsors underscored their desire to use their programs to work collaboratively with hospitals and thought that hospitals often viewed public reporting as a punitive strategy.
Hospital Assistance and Engagement
Program Evolution
Measures. Looking forward, many sponsors (11/20) plan to expand and/or modify the measure sets they are currently using. They anticipated including more measures in one or more of the following areas:
Program Evaluation
Few of the sponsors to whom we spoke (5/22) were conducting formal evaluations of their hospital P4P programs. However, some noted anecdotal evidence of positive program impact. Some said hospitals had strengthened their quality improvement infrastructure in response to P4P, for example by dedicating quality improvement staff and holding regular quality improvement meetings. Other sponsors reported seeing improved performance scores for participating hospitals. There was significant interest in tracking ROI. Sponsors, however, lacked knowledge about how to do this, and they found it difficult to estimate the costs of program development, implementation, and ongoing administration. For the most part, sponsors were not monitoring their hospital P4P programs for potential unintended consequences, such as reduced attention to care in unmeasured areas and lower quality of that care. Sponsors did, however, recognize the need for such monitoring, especially as P4P programs become more widespread and more money is tied to the financial incentive.
CRITICAL LESSONS LEARNED
We asked hospital P4P program sponsors to discuss the key lessons they have learned and the challenges they have faced in designing, implementing, and maintaining their hospital P4P programs. Their insights and recommendations based on their experiences are presented here for six key areas: overall design, measures, data collection, payment structure, hospital engagement, and public reporting.
Overall Design
Program sponsors said that coordinating and aligning their P4P programs with other P4P programs and hospital reporting requirements constituted one of the most important considerations in designing a successful program. They noted that hospitals are often overwhelmed with requests for disparate information from a variety of organizations, and that streamlining these requests is key to making program participation feasible. An article by Pham et al. (2006) noted that on average, hospitals face 3.3 reporting requirements from various entities which are typically not fully aligned and which create additional reporting burdens.
Sponsors underscored the importance of striving for a simple program design and avoiding a “black box” that is difficult to understand and explain. They also noted that simplicity helps to win over skeptics.
Although a number of sponsors had programs tailored to individual hospitals, they noted the administrative advantages of a standardized program design and implementation. They felt, however, that separate programs may be necessary for small hospitals, rural hospitals, and CAHs. These hospitals face distinct challenges related to performance scoring, such as small case volume, less-educated patient populations, different mixes of services and patients, and different pools of providers.
Sponsors saw value in regional experimentation because it would allow various models of program design to be tested. For national programs, such as those that might be sponsored by a large insurer or CMS, sponsors felt a regional approach was important for two reasons. First, several noted that health care is local and that infrastructure and patterns of care vary across regions. A clinical area that is a problem in one region may therefore not be an issue in another. For this reason, quality improvement may be best carried out through local initiatives that take into account local practices and organizational structures. Second, the best way to design a P4P program is not yet known, and there may be more than one best way, depending on the characteristics of the market.
Finally, sponsors said it was important for them to keep abreast of CMS’ future actions to facilitate advance planning and allow them to align their own programs with those of CMS.
Measures
Program sponsors said that based on their experience, the use of evidence-based measures that are standardized and have achieved a consensus base (i.e., are NQF and HQA endorsed) reduces hospital pushback. Sponsors noted that they would like to expand measurement beyond areas in which hospitals are already doing well to avoid the “teaching-to-the-test” phenomenon and to enable a more comprehensive assessment of performance. Areas suggested for additional measurement include:
The shortage of evidence-based measures in some of these areas will slow efforts to expand measures.
Sponsors reported that they were relying on CMS to take the lead nationally in both developing and maintaining measures. Sponsors believe CMS is the most suitable entity to develop reliable and valid measures. They feel CMS’ national presence and leverage will greatly facilitate adoption, leading to more programs using the same measures and thus decreasing the burden placed on hospitals to respond to the growing number of data requests and other new program requirements.
Data Collection
Sponsors reported that minimizing the data collection burden was critical for hospital acceptance of P4P programs. Suggested strategies for minimizing hospital burden included (1) alignment of measures and data collection across programs and (2) selection of a reasonable number of measures to include as part of the P4P program. Sponsors were unable to specify the precise number of measures that would be considered reasonable to include in a P4P program but stressed that there must be some limits. One suggestion was to retire measures as hospitals reach high-performance levels. However, this tactic raised concern that the areas no longer tracked would be ignored going forward. A suggestion for addressing this concern is to continue to track all measures but transition the high-performance metrics to threshold metrics after a specified amount of time. As such, a hospital would have to meet a certain level of performance on some metrics to be eligible for the financial incentive, but payouts would only be made based on performance on the current set of measures.
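The suggestion above can be sketched in Python: topped-out measures continue to be tracked as threshold (eligibility) metrics, while payment is based only on the current measure set. This is a minimal illustration under stated assumptions. The measure names, the 0.95 gate level, the $100,000 maximum bonus, and the choice to scale the payout by the mean of the paid measures are all hypothetical. None of them is a design any sponsor described.

```python
def p4p_payment(scores, gate_measures, gate_level, paid_measures, max_bonus):
    """Hypothetical payout rule with an eligibility gate on retired measures.

    scores: dict mapping measure name -> compliance rate (0.0 to 1.0).
    """
    # Gate: every retired, topped-out measure must stay at or above gate_level.
    # Falling below the gate on any one of them forfeits the entire incentive.
    if any(scores[m] < gate_level for m in gate_measures):
        return 0.0
    # Payout: based only on performance on the current (paid) measure set.
    mean_paid = sum(scores[m] for m in paid_measures) / len(paid_measures)
    return max_bonus * mean_paid


scores = {
    "aspirin_at_arrival": 0.99,         # retired: now a threshold metric
    "beta_blocker_at_discharge": 0.97,  # retired: now a threshold metric
    "smoking_cessation_advice": 0.80,   # current paid measure
    "pneumococcal_vaccination": 0.70,   # current paid measure
}
gate = ["aspirin_at_arrival", "beta_blocker_at_discharge"]
paid = ["smoking_cessation_advice", "pneumococcal_vaccination"]

print(p4p_payment(scores, gate, 0.95, paid, 100_000))  # 75000.0
slipped = dict(scores, aspirin_at_arrival=0.90)
print(p4p_payment(slipped, gate, 0.95, paid, 100_000))  # 0.0
```

The gate keeps retired measures from being ignored. The dollars, meanwhile, follow the measures on which improvement is still possible.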
Payment Structure
Most P4P program sponsors advocated making the program as positive as possible. In this spirit, they suggested focusing on collaboration and rewards and avoiding financial withholds, which are viewed as punitive. This sentiment is consistent with the principle of framing noted in our review of economic theories in Chapter 2. Program sponsors found that a more positive, collaborative approach yields the best results in terms of quality improvement. Sponsors also recommended rewarding improvement in combination with top performance to keep all hospitals engaged. Many sponsors believe it is important to “spread the wealth” by rewarding top performers while also giving the lowest performers an incentive to improve. Some sponsors also suggested supporting or rewarding participation in regional continuous quality improvement (CQI) efforts to improve systems of care. One sponsor noted that quality improvement might be better served by focusing on systems of care than by relying on the current “one off” model of tracking performance on individual measures. This sponsor recommended expanding hospital P4P programs to reward participation in system-level quality improvement efforts.
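One way to reward improvement in combination with top performance is to score each measure on both attainment and improvement and to give the hospital whichever credit is larger. Low performers that are improving then still earn something. The Python sketch below is a hypothetical illustration of this idea. The floor (0.70), the benchmark (0.95), and the linear credit formulas are assumptions, not a scoring method the sponsors reported.

```python
def attainment_credit(current, floor, benchmark):
    """Share of the distance from a fixed floor to the benchmark achieved, capped at 1."""
    if current <= floor:
        return 0.0
    return min(1.0, (current - floor) / (benchmark - floor))


def improvement_credit(current, baseline, benchmark):
    """Share of the hospital's own baseline gap to the benchmark closed, capped at 0..1."""
    gap = benchmark - baseline
    if gap <= 0:
        return 0.0  # already at the benchmark at baseline; attainment credit covers this case
    return max(0.0, min(1.0, (current - baseline) / gap))


def measure_score(current, baseline, floor=0.70, benchmark=0.95):
    """Hypothetical rule: a hospital earns the better of attainment and improvement credit."""
    return max(attainment_credit(current, floor, benchmark),
               improvement_credit(current, baseline, benchmark))


print(measure_score(current=0.96, baseline=0.94))            # 1.0 (top performer)
print(round(measure_score(current=0.65, baseline=0.50), 3))  # 0.333 (below the floor, but improving)
```

A hospital that starts far behind can still earn partial credit by closing part of its own gap. A hospital at the top earns full credit without having to improve further.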
Hospital Engagement
Sponsors unanimously agreed that interaction with hospitals is critical to P4P program success. They stated it was important to engage and work collaboratively with hospitals “early and often” in all aspects of the program design and operation. Sponsors noted that this builds a sense of ownership and partnership among hospitals involved, which, in turn, helps increase acceptance of and support for the P4P program. Program sponsors also feel it is important to provide quality improvement guidance and support to hospitals as part of an ongoing feedback loop. Many sponsors viewed their role not only as the operational manager of the P4P program, but also as an important quality improvement resource for hospitals. They underscored that if performance improvement is truly a goal of the P4P program, mechanisms must be built in to provide assistance to hospitals that are trying to improve.
Public Reporting
Not all sponsors agreed that public reporting should be part of P4P programs. Some viewed it as an important component that complements the financial incentive. Others saw it as contentious and detrimental to a collaborative relationship with hospitals. Sponsors suggested that, if public reporting were part of the program, performance should be reported on a wide range of measures (such as clinical, patient experience, and resource use measures) to give consumers a complete picture of health care. Sponsors said that consumers do not make health care decisions in a vacuum and need additional information. As noted previously, many program sponsors provided links on their websites to the Hospital Compare website. Some sponsors suggested simplifying the Hospital Compare website so that it is easier for consumers to use. Specific recommendations included (1) the use of composite or summary measures within a service area or at the condition level, with information on individual measures available through “drilldown” capabilities for those wanting more-specific information, and (2) increased consumer testing of the website to ensure that the information is understandable and useful.
RAND held discussions with a broad cross-section of hospitals, hospital associations, and hospital data vendors to learn about the experiences hospitals and their support vendors have had with the Medicare RHQDAPU P4R program, various private-sector P4P programs, and/or the CMS PHQID. Within the hospitals, we spoke to the Chief Executive Officer (CEO) or President; within the hospital associations, we spoke to the CEO and/or the lead policy and research staff dedicated to performance measurement and reporting. This activity was part of the larger environmental scan that RAND conducted to describe the current P4P and P4R landscape, in terms of how programs are designed and what lessons are being learned, in order to help inform the development of a VBP program for Medicare hospital services.
RAND drew a purposive sample of hospitals from the universe of hospitals included in the RHQDAPU program and PHQID to obtain a range of perspectives. RAND selected hospitals from the national pool of hospitals that provide services to Medicare patients, reflecting an array of characteristics:
We also spoke to a small number of hospitals exposed to a statewide private-sector P4P program, again selecting hospitals that were both large and small in terms of number of beds. In addition, we held discussions with the major hospital associations and a small number of vendors that support the hospitals in their data submissions to comply with P4P and P4R reporting requirements.
Between October of 2006 and March of 2007, RAND held discussions with:
To understand the unique characteristics and issues facing rural hospitals and CAHs that would affect their ability to fully participate in a VBP program, we held telephone discussions with the following:
- seven hospitals (four rural hospitals and three CAHs)
- two government agencies with expertise in rural health issues
- three state hospital associations in states with a large number of rural providers and CAHs
- one research center with expertise in rural health issues
- three consultancies with extensive experience working with rural providers and CAHs

For the rural hospital assessment, we identified the organizations through two sources: (1) hospitals reporting on the Hospital Compare website and (2) experts in the rural health field, whom we interviewed and asked to identify key organizations and individuals with rural health expertise in the hospital setting.
Hospital Experiences with the Medicare RHQDAPU P4R Program
As of 2007, the Medicare RHQDAPU program held 2 percent of a hospital’s APU at risk for reporting. In our discussions about the program, hospitals widely expressed the view that they would have publicly reported on these measures even without the RHQDAPU effort. The historical evidence, however, suggests otherwise. Before reporting of performance measures was tied to the APU, only a small number of hospitals (400 out of approximately 3,800 PPS hospitals) voluntarily reported performance data under the National Voluntary Hospital Reporting Initiative (NVHRI).
Helping the Hospitals Prepare for P4P. Most hospitals were fairly positive about their experience to date with the RHQDAPU program. Hospitals accepted the measures and agreed that the measures addressed important areas; they also felt that hospitals should be held accountable for these indicators of care. There was a unanimous belief among hospitals that P4P was inevitable, with a number observing that “P4P is going to be a way of life in the future.” Hospitals viewed the RHQDAPU program as a means to help them gain experience with data collection, submission, and validation and to make quality improvements before P4P starts. A number of hospitals commented, “We want to be prepared.” Hospitals indicated they were “OK” with shifting from RHQDAPU directly to P4P. Several hospitals expressed a desire to structure an incentive program with two payment components: a P4R component to allow all hospitals to receive funds to recoup their data collection costs and a P4P component to reward differential performance.
Challenges in Engaging Physicians. Hospitals stated that they were not currently giving physicians financial incentives tied to the performance measures on which the hospitals themselves were being held accountable. Most observed that engaging physicians was challenging and that, going forward, physician incentives would need to be aligned so that the right behavior occurred. A majority of hospitals, particularly large hospitals, indicated they could do little to influence physician behavior and were struggling to find ways to ensure compliance with the performance measures. The hospital CEOs with whom we spoke frequently noted that “doctors don’t like to practice cookbook medicine” and “don’t like to be told what to do.” The problem of physician engagement was occasionally compounded when the performance measures on which the hospital was asked to report were out of sync with current evidence-based medicine, because reporting requirements frequently lag behind changes in the evidence. A number of hospitals said that gain-sharing laws needed to change so that hospitals could structure financial incentives for physicians internally. They felt this would allow physicians to see “what’s in it for them.”
P4R and P4P Are Generating the Engagement of Hospital Leadership. Hospitals widely agreed that the P4R program had caused important changes in their organizations. They noted that it had led to a more proactive focus on quality improvement and to greater attention to performance at all levels of the organization. A common sentiment was, “Without P4R, the quality improvement effort would have been smaller and slower.” Hospitals exposed to P4P programs expressed the same view. Hospitals noted that their boards and leadership were now much more focused on quality. Progress on the performance indicators was typically reviewed monthly at hospital board meetings, which had not happened before the P4R program. Hospitals stated that their leadership and boards frequently checked the Hospital Compare website to see how their hospital compared with others in their community and nationally. They also noted, “We don’t want to be in the bottom quartile.”
Hospital Experiences with Premier PHQID
Among Premier hospitals that were voluntarily participating in PHQID, we found broad agreement that their decision to participate reflected a desire to “get in at the start to hopefully shape it” and a recognition that “P4P is coming, and it is a way to gain experience.” Some of the Premier hospitals that were eligible to participate but had declined indicated that they were shadowing the PHQID project by collecting the same data and investing in quality improvement activities. They felt that it was important for them to do so to be prepared when P4P became a reality for all hospitals. Interestingly, among the subset of PHQID hospitals with which we spoke, many stated that the possibility of financial incentive was a negligible factor in their decision to participate in the demonstration.
While P4P and P4R Are Leading to Behavior Change Among Hospitals, the ROI Is Unclear. PHQID participants stated that the P4P demonstration is driving improvements in the care they provide. They added that meeting program requirements had required them to allocate significant staff and resources. Hospitals in the RHQDAPU program said the same. Hospitals felt that incentive payments, actual or potential, did not offset the costs they were incurring to participate. A number of hospitals in the RHQDAPU program noted that the cost of participation exceeded the 0.4 percent update they could receive for reporting. They noted, however, that this might change once CMS increased the update factor tied to public reporting to 2 percent. One hospital commented that “you’ve got to make it worth people’s time to do these things.” Several hospitals said it was important for CMS to help hospitals see the link between doing better on the quality measures and a positive ROI, such as reductions in costs, lengths of stay, and readmissions.
The PHQID Incentive Payment Structure Creates Cliff Effects and Penalizes Hospitals That Perform Well. The Premier demonstration paid financial rewards only to hospitals in the top two deciles of performance. These deciles were determined by comparing performance among the participating hospitals in each year of the program. Hospital participants uniformly disliked the design of this incentive structure. They noted that it created a cliff effect, an all-or-nothing payment: hospitals at or above the 80th percentile were rewarded, while no hospital below this cut point received anything, even when there was no statistical difference in performance. Hospitals felt the relative scoring method penalized them unfairly when most hospitals were scoring at or close to 100 percent, as happened for several performance indicators that had effectively topped out. One hospital gave the example of aspirin at arrival. On this measure, the top four decile groups had effectively achieved 100 percent compliance, yet only the top two deciles received incentive dollars. Several hospitals questioned the value of spending substantial resources to chase the top tail of the performance distribution when scores were so tightly clustered at the top end. They believed that the relative benefit to patients was small. They also felt the structure was causing hospitals to divert resources that could otherwise be deployed to lower-performing areas that were not incentivized.
Over time, as providers make improvements, the compression of performance scores toward the top end of the performance distribution (i.e., the ceiling effect) will present challenges to P4P program sponsors that seek to differentiate providers on a relative performance basis. Common remarks by hospitals included: “All should get the bonus if they achieve top levels of performance,” and “Rewarding the top two deciles is meaningless when the scores are so compressed at the top end.” Other hospital comments reflected frustration with the relative performance incentive structure, for example: “Every time we do better the bar gets higher” (the hospital noted that it was effectively 100 percent on some measures and got no incentive dollars); “Funding [is] only for [the] top 20 percent of hospitals, so 80 percent are spending dollars to improve and getting nothing in return.”
Another reason why hospitals expressed a dislike for using a relative incentive structure is that this approach creates uncertainty about what level of performance is required to win. One hospital said, “The performance bar is constantly shifting up, and it is an unknown to hospitals.” Only at the close of the year, after the hospitals are arrayed in the rank order of their performance, does a hospital know what level of performance was required to hit the 80th percentile of performance to win. Hospitals and their professional associations expressed a strong preference for using an absolute performance threshold as the basis for determining whether a hospital would receive an incentive payment. The absolute threshold was viewed as a preferred approach to structuring an incentive payment because it is “predictable,” “allows a hospital to know in advance what performance target [it] would need to hit,” and “allows all who meet the threshold to secure the bonus.”
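The following small Python sketch illustrates the contrast hospitals drew between the PHQID relative structure and an absolute threshold. The ten hospitals and their composite scores are hypothetical. The relative rule pays the top two deciles by rank, as described above. The 0.95 absolute threshold is an assumed value chosen for illustration.

```python
def relative_winners(scores, paid_deciles=2):
    """Pay only hospitals ranked in the top `paid_deciles` tenths of participants.

    The cut point depends on every other hospital's results, so a hospital
    learns it only after all participants are ranked at the end of the year.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_paid = (len(ranked) * paid_deciles + 9) // 10  # integer ceiling, no float rounding
    return set(ranked[:n_paid])


def absolute_winners(scores, threshold):
    """Pay every hospital at or above a threshold announced in advance."""
    return {h for h, s in scores.items() if s >= threshold}


scores = {"A": 1.000, "B": 0.999, "C": 0.998, "D": 0.997, "E": 0.970,
          "F": 0.960, "G": 0.950, "H": 0.930, "I": 0.910, "J": 0.880}

print(sorted(relative_winners(scores)))        # ['A', 'B']
print(sorted(absolute_winners(scores, 0.95)))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Under the relative rule, hospitals C and D miss the bonus even though they trail B by only a tenth of a percentage point. This is the cliff effect hospitals described. Under the absolute rule, every hospital that reaches the announced target is paid. As a consequence, the sponsor's total outlay depends on how many hospitals qualify.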
Hospitals also expressed support for a minimum performance threshold that a hospital must meet to qualify for an incentive. They noted that this threshold should “increase as more institutions met the minimum bar.” Our discussions found lukewarm support among individual hospitals for paying for improvement: “Hospitals should meet a minimum standard of excellence to be allowed to care for patients, so you don’t want to pay for improvement that occurs below this threshold.” Hospital associations, however, strongly supported paying on the basis of improvement.
At This Stage, It Is Unclear Whether PHQID Is Causing Unintended Consequences. While most hospitals stated they did not believe the focus on a limited set of performance measures has led to unintended consequences, such as ignoring other clinical areas, they did say that limited staff and financial resources had caused them to focus heavily on what was being measured and rewarded—providing support to those who claim financial incentives promote teaching to the test. Most hospitals said they either did not know whether negative consequences were occurring or were not specifically tracking them. One hospital remarked, “If anything, PHQID has increased activity and focus, and other quality improvement investments are being made, such as EHRs, CPOE, and use of intensivists, which will drive improvements across the board, not just on those things being incentivized.”
Hospital associations said they were aware of one unintended consequence of the “antibiotic timing” measure for pneumonia, which is used in both PHQID and RHQDAPU. The measure is the percentage of pneumonia patients who receive their first dose of antibiotics within four hours of hospital arrival. To do well on this measure, some hospitals may have over-prescribed antibiotics. They gave patients who did not have pneumonia an antibiotic within the four-hour window, before a diagnosis of pneumonia could be confirmed. There is concern that this overuse will increase antibiotic resistance in the future. As a result, the measure has been pulled from the measure set and is being respecified. Hospitals could not cite specific examples, but they expressed concern that the relative incentive structure could lead to other unintended consequences. These included gaming of the data and chasing the very top end of the performance distribution, for example raising a compliance rate from 98 percent to 100 percent just to secure the incentive dollars, with little or no clinical benefit. Several hospitals stated that, because hospital margins are very thin, hospitals will chase the dollars.
The Reporting Burden Is Significant. Hospitals emphasized that the reporting burden for hospitals to comply with PHQID and/or RHQDAPU is significant given that data collection is still largely a manual exercise requiring chart abstraction. This was found to be true even in larger institutions having more information technology (IT) resources. EHRs and CPOE are not yet designed to provide data to populate measures such as those in PHQID, RHQDAPU, or other nationally endorsed measurement sets. Most EHRs capture relevant information in text fields; so even when EHRs are available, a text search must be done to determine if an event occurred. Hospitals universally felt that the data collection burden should be an important selection criterion for P4R and P4P programs. There was also consensus on the need to align measures and measure specifications to minimize data collection and reporting burdens—although it was also noted that the problem was less about alignment of specifications and more about getting the various stakeholders to align on what they want to hold providers accountable for. However, it is important to note that even though CMS allowed sampling of patient records to minimize the hospital reporting burden, many large hospitals reported that they did not use the sampling method, citing a need to have 100 percent of the cases to do their quarterly quality improvement work with doctors. These hospitals stated that the small number of sampled cases showed results that were too variable and did not provide a reliable source of information to give to doctors.
The Problem of Small Numbers Exists. The problem of only a small number of patients meeting the measure criteria was also raised, primarily by small hospitals, including rural hospitals and CAHs. Estimates of performance based on a small number of events (i.e., patients who receive appropriate processes of care) are not stable and vary substantially from period to period, making the task of separating out the “signal” (true performance) from the “noise” (random variation) a challenging one. Hospitals with small numbers of patients cited challenges in interpreting and using results that showed large variation from period to period. Among the smaller hospitals, there was agreement that “we should only be measured on what we actually do.” Smaller hospitals thought that CMS should work to construct measures that more readily apply to the care they provide, such as transfers. When asked whether hospitals would support the use of composites to help with the small-numbers problem, there was no strong signal of support. However, this response may have stemmed from a lack of understanding about how the composites might be constructed. There was, in contrast, strong support for risk adjustment to ensure comparability across hospitals.
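The binomial standard error of a compliance rate shows why small numbers are a problem and why composites are sometimes proposed as a remedy. The Python sketch below uses a normal approximation, which is itself crude at very small sample sizes. The 0.90 rate and the patient counts are hypothetical. The composite shown is a simple pooled (opportunity-based) rate, which is only one of several ways a composite can be constructed.

```python
import math


def standard_error(rate, n):
    """Binomial standard error of a compliance rate estimated from n eligible patients."""
    return math.sqrt(rate * (1 - rate) / n)


def pooled_composite(counts):
    """Pool (numerator, denominator) pairs from several measures into one rate.

    Returns the pooled rate and the total number of opportunities it rests on.
    """
    numerator = sum(num for num, _ in counts)
    denominator = sum(den for _, den in counts)
    return numerator / denominator, denominator


# Same true performance (90 percent), very different precision.
for n in (10, 400):
    half_width = 1.96 * standard_error(0.90, n)
    print(f"n={n}: 0.90 +/- {half_width:.3f}")
# n=10: 0.90 +/- 0.186
# n=400: 0.90 +/- 0.029

# A small hospital's three measures, each with only 6 to 12 eligible patients,
# pooled into one rate based on 27 opportunities.
rate, n = pooled_composite([(8, 9), (5, 6), (11, 12)])
print(f"composite {rate:.3f} on n={n}, +/- {1.96 * standard_error(rate, n):.3f}")
```

Consider a hospital with only ten eligible patients and a true rate of 90 percent. Its approximate 95 percent interval runs from roughly 71 percent to 100 percent, so swings from one period to the next are hard to interpret. Pooling measures narrows the interval. The trade-off is that the result no longer describes any single process of care.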
Measures of Outpatient Hospital Services Are Not Being Used at This Stage. None of the hospitals or hospital associations with which we spoke reported measures of outpatient hospital services being included in any P4P or P4R program to which they had been exposed, although several of the hospitals exposed to the private-sector P4P program noted that its sponsor was beginning to discuss with hospitals how such measures might be developed. There was general agreement that services—visits, procedures, and tests—provided in the outpatient hospital setting represented a substantial portion of care for which there currently is no accountability. Hospitals noted that outpatient hospital services have been a huge revenue growth area, and some reported seeing “much utilization that seems questionable.” While hospitals recognized that a large amount of care is delivered in this setting, they cited many challenges with developing performance measures and holding hospitals accountable given that data are less standardized on the outpatient side, and the mix of services delivered in this setting varies substantially across institutions.
Support for Having a Robust Data Validation Process Is Strong. Hospitals universally agreed that data validation is a critical feature of P4P programs. Hospitals were concerned about possible gaming, especially if there is “too much money on the table and people start panicking,” and believed that an audit function was needed to guard against this behavior. An attestation-type approach to data validation, such as the process the Leapfrog Group uses, was not viewed as sufficiently rigorous for situations in which money is tied to performance. Hospitals expressed frustration with the substantial lag in the current validation processes—minimally six to nine months for PHQID, and 12 months before RHQDAPU results are posted on Hospital Compare—which slows down the process for getting feedback for CQI and public reporting. Hospitals stated a need for more-frequent updates—within three months of data submission—with comparisons to peers/benchmarks for use in quality improvement activities.
Transparency of Performance Results Is Viewed as a Positive. Hospitals indicated that they thought public reporting of performance on the hospital measures was good and that it has forced their doctors to pay attention and get engaged. One hospital noted that “an external force doing measurement and reporting is our key lever (other than relational) with doctors to get them to change their behavior.” Another noted that “it says someone is watching.” Only a few hospitals said that “reporting hasn’t been a factor in driving behavior changes.” Most hospitals stated that public reporting of their results compared with those of their peers has garnered the attention of their hospital boards and stimulated investment in quality improvement, noting that “no one wants to be at the bottom of the list.” Hospitals preferred that if the RHQDAPU program evolved into a P4P program, a pilot or dry-run period of data collection occur prior to public reporting and payouts.
Although hospital leadership and physicians are paying attention internally to the comparative results, hospitals seemed unsure whether consumers actually use the information. Many hospitals thought that the CMS Hospital Compare website should be simplified to make it easier for consumers to use. There was no consensus among hospitals about what would be the appropriate comparison group of hospitals or whether one is even needed for public reporting of results. One hospital stated: “The consuming public needs to know if a hospital will provide adequate care, so the focus should be on whether the hospital hits a threshold target [rather than] comparing one hospital to another.” Another hospital thought that regional comparisons would be helpful to consumers “who won’t be traveling to other states for care.”
Hospitals Are Encountering Certain Challenges. Many hospitals stated that it was difficult to get physicians to change their behavior regarding actions called for in the performance measures and that they felt as though they were serving as a go-between for CMS and the physician. Hospitals thought they had little leverage to affect physician behavior other than having good relationships. The current prohibition on gain sharing precludes hospitals from structuring provider financial incentives within their organizations, thus hindering their ability to motivate physicians to engage in the P4R and P4P programs (“A slow process until MD incentives are also aligned.” “Physician and hospital P4P programs shouldn’t be separate”).
Having to work with and win over doctors was a common theme in our discussions with hospitals (“Doctors don’t like hospitals telling them what to do.” “Doctors don’t like to practice cookbook medicine”).
Some hospitals reported that in response to the challenges of engaging physicians, they had developed solutions to force behavior change, such as creating admission and discharge forms that prompt doctors for information and/or to do required things, creating standing clinical protocols, and structuring clinical treatment paths differently. Hospitals appeared to be developing unique interventions rather than implementing a one-size-fits-all approach to driving improvements in care. It was noted that making P4P and quality improvement work requires a lot of coordination across departments.
Hospitals also noted that involvement in these programs requires a lot of staff resources for data collection and validation and quality improvement. Several remarked that to succeed in these programs, a hospital needs infrastructure and multidisciplinary teams, two things not available in smaller community hospitals and hospitals in rural areas, where there are no dedicated staff to perform these functions and “the CEO is often wearing several hats within the organization.”
On the subject of data submissions and the validation process, hospitals expressed broad appreciation for the important “assistance” role that Premier played as a “go-to” entity. The feeling was that Premier provided an important support function related to a hospital’s ability to comply with the program requirements.
Hospitals described the difficulties that arise when the clinical evidence changes but the measures for which they are held accountable do not. They reported that their physicians had changed their practice in line with new evidence, even though the hospitals were still required to comply with measure specifications that reflected out-of-date evidence. Hospitals urged P4R and P4P program sponsors to update, in a timely manner, what hospitals are held accountable for as the evidence changes.
Advice Offered by Hospitals Regarding P4P Program Designs
The key recommendations that hospitals had for anyone considering designing and implementing a P4P program were as follows:
Mounting cost pressures and substantial deficits in the quality of care within the U.S. health care system have led policy makers to consider options for system reform to drive improvements. Value-based purchasing is one reform option being examined and tested by payers in the public and private sectors, and it includes both financial (e.g., P4P) and non-financial (e.g., transparency of performance scores) incentives designed to change the behavior of providers.
The Deficit Reduction Act of 2005 (Public Law 109-171, Section 5001(b)) created a statutory mandate for the Secretary to develop a VBP plan for Medicare hospital services commencing FY 2009. This mandate was delegated to the CMS Hospital VBP Workgroup. This environmental scan was conducted to inform the development of the VBP plan for Medicare hospital services. Our scan comprised a review of the literature and key informant discussions with a wide array of individuals who could provide a picture of the current state of the art in hospital pay for performance, including 27 program sponsors, 28 hospitals, 7 hospital associations, 5 data support vendors, and a number of individuals with expertise in rural hospital issues. As part of our discussions, we also examined the experiences of hospitals participating in the Medicare RHQDAPU pay-for-reporting program.
Among the key findings of this review is that hospital P4P has been implemented by more than 40 sponsors, in some cases for three to five years or longer. Little empirical evidence has emerged from these initiatives, however, to gauge how well hospital P4P meets a program sponsor’s objectives. This is primarily a function of the absence of formal evaluation in most P4P programs and the difficulty of conducting evaluations in real-world applications that lack comparison groups against which to assess the impact of the P4P intervention. The strongest evidence to date on the impact of hospital P4P comes from the Premier evaluation of the Premier Hospital Quality Incentive Demonstration (PHQID) and the Lindenauer study of the impact of PHQID relative to the Medicare pay-for-reporting program. These studies suggest that the additional effects of P4P are somewhat modest relative to those of public reporting and other quality interventions occurring simultaneously. The literature suggests, however, that multifaceted interventions will be most effective at producing sustained improvements in patient care (Grol et al. 2002; Grol and Grimshaw 2003). Drawing from the theoretical literature on the use of incentives, it appears that incentives can be effective in changing behavior, and that how the incentives are structured will determine the type and magnitude of the behavioral response.
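The role of a comparison group can be made concrete with a difference-in-differences calculation, a standard evaluation technique: subtract the change observed among hospitals not exposed to P4P (for example, hospitals subject only to pay for reporting) from the change observed among P4P hospitals. The sketch below is illustrative only. The scores are hypothetical, not figures from PHQID or the Lindenauer study, and the names `GroupScores` and `difference_in_differences` are our own.

```python
from dataclasses import dataclass


@dataclass
class GroupScores:
    """Hypothetical mean composite quality score (0-100) before and after P4P starts."""
    baseline: float
    follow_up: float

    def change(self) -> float:
        return self.follow_up - self.baseline


def difference_in_differences(p4p: GroupScores, comparison: GroupScores) -> float:
    """Improvement attributable to P4P beyond what the comparison group achieved.

    Assumes both groups would have improved in parallel without P4P, so the
    comparison group's change absorbs concurrent influences such as public reporting.
    """
    return p4p.change() - comparison.change()


# Hypothetical illustration only; these are not published results.
p4p_hospitals = GroupScores(baseline=80.0, follow_up=90.0)
reporting_only = GroupScores(baseline=80.0, follow_up=86.0)

naive_gain = p4p_hospitals.change()  # credits P4P with all observed improvement
did = difference_in_differences(p4p_hospitals, reporting_only)
print(f"Naive pre/post gain: {naive_gain:.1f} points")
print(f"Difference-in-differences estimate: {did:.1f} points")
```

Without the comparison group, the full 10-point gain would be credited to P4P; with it, only the 4 points beyond what the reporting-only hospitals achieved are. The estimate is only as credible as the parallel-trends assumption, which is one reason sponsors without a natural comparison group find impact hard to measure.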
In our hospital and P4P program sponsor discussions, there was an expressed desire to allow experimentation to create models where learning could occur, which could help inform design structures. The discussants anticipate that the results of P4P and specific design options may differ as a function of the varying structure of local health care markets.
Given that P4P is a newly emerging reform tool and that little information is currently available about the impact of P4P or the influence of various design structures on P4P outcomes, P4P programs should incorporate evaluation and ongoing monitoring into their design as a means of building a knowledge base. The collection and broad dissemination of this type of information will be critical to future efforts to construct P4P programs so that they can meet their programmatic objectives. Funding will be necessary to support program evaluation, and the evaluation work needs to be sustained over multiple years to fully assess impact and monitor for unintended consequences.
The key design and implementation lessons that emerged from our discussions with program sponsors, hospitals, and data vendors included:
Our discussions also uncovered a number of program implementation challenges that merit consideration during program design and implementation. One challenge that affects a sizeable number of hospitals is the problem of having only a small number of events or cases to report for one or more measures; a small number of events to score leads to unstable estimates of performance to use in performance-based incentive payments. While this is a more acute problem for small and rural hospitals with a small number of patients per year, the problem can also occur for medium- and large-size hospitals depending on their service mix, details of measure specifications, and the use of sampling during data collection. Use of all-payer data, collecting data over extended periods of time, use of composite measures, and identifying measures relevant to smaller providers are approaches that can help to mitigate the small numbers problem.
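The instability caused by small numbers can be quantified by attaching a confidence interval to each hospital's observed rate. The sketch below uses the Wilson score interval for a proportion, a standard choice when denominators are small, and compares three hypothetical cases with the same 80 percent compliance rate: a small hospital with one year of data, the same hospital with three years pooled, and a large hospital. The case counts and the 70 percent payment threshold are assumptions for illustration, not parameters of any program discussed here.

```python
import math


def wilson_interval(successes: int, cases: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a compliance rate."""
    if cases <= 0 or not 0 <= successes <= cases:
        raise ValueError("need cases > 0 and 0 <= successes <= cases")
    p = successes / cases
    z2 = z * z
    denom = 1 + z2 / cases
    center = (p + z2 / (2 * cases)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / cases + z2 / (4 * cases * cases))
    return center - half, center + half


HYPOTHETICAL_THRESHOLD = 0.70  # illustrative payment threshold, not from any program

# (cases meeting the measure, eligible cases); all hypothetical
scenarios = {
    "small hospital, 1 year": (8, 10),
    "small hospital, 3 years pooled": (24, 30),
    "large hospital, 1 year": (160, 200),
}

for label, (met, eligible) in scenarios.items():
    low, high = wilson_interval(met, eligible)
    clearly_above = low > HYPOTHETICAL_THRESHOLD
    print(f"{label}: rate {met / eligible:.0%}, 95% CI {low:.2f}-{high:.2f}, "
          f"clearly above threshold: {clearly_above}")
```

All three cases share the same observed rate, but only the large hospital's interval (roughly 0.74 to 0.85) lies entirely above the hypothetical threshold. The small hospital's single-year interval runs from roughly 0.49 to 0.94; pooling three years narrows it to roughly 0.63 to 0.90, which helps but still does not separate the hospital from the threshold. This is consistent with treating extended collection periods, all-payer data, and composite measures as ways to mitigate, rather than eliminate, the small numbers problem.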
The data collection burden, which affects how many measures a P4P program can reasonably require a hospital to collect and report, creates challenges for efforts to comprehensively assess the performance of hospitals. The more comprehensive the measure set used, the greater the burden on hospitals, given existing information technologies. Current information systems are not equipped to capture and easily retrieve the clinical information used to create performance measures, nor are they structured to enable routine monitoring of quality of care. Until health information systems are upgraded to capture this information, program sponsors will be constrained in the number and breadth of measures they can expect hospitals to collect and report. P4P programs are also challenged with an acute need to ensure the integrity of the data used to score hospitals and make differential payments, which requires resources for data validation. Allocating sufficient resources to validation work is critical for program credibility, and today only limited resources are being used for data validation within P4P programs. Most hospitals stated that the current level of validation is insufficient, given the potential to shift large sums of money within the system.
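One common form of the audit function that hospitals called for is to re-abstract a sample of medical records and compare the auditor's findings with what the hospital submitted. The sketch below computes an element-level agreement rate on a reproducible random sample of charts. The chart identifiers, data-element names, and the 80 percent pass threshold are hypothetical, chosen for illustration; they are not the specifications of RHQDAPU, PHQID, or any other program.

```python
import random
from typing import Mapping

# A chart maps hypothetical data-element names to abstracted values ("Y"/"N").
Chart = Mapping[str, str]


def sample_charts(chart_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of charts for re-abstraction."""
    rng = random.Random(seed)
    return rng.sample(chart_ids, min(k, len(chart_ids)))


def agreement_rate(submitted: Mapping[str, Chart], reabstracted: Mapping[str, Chart]) -> float:
    """Share of audited data elements where the hospital's value matches the auditor's."""
    matches = 0
    total = 0
    for chart_id, audit in reabstracted.items():
        hospital_values = submitted[chart_id]
        for element, auditor_value in audit.items():
            total += 1
            if hospital_values.get(element) == auditor_value:
                matches += 1
    if total == 0:
        raise ValueError("no audited data elements")
    return matches / total


HYPOTHETICAL_PASS_RATE = 0.80  # illustrative cutoff only

# Hypothetical hospital submissions for four charts.
submitted = {
    "c1": {"aspirin_at_arrival": "Y", "beta_blocker_at_discharge": "Y"},
    "c2": {"aspirin_at_arrival": "Y", "beta_blocker_at_discharge": "Y"},
    "c3": {"aspirin_at_arrival": "N", "beta_blocker_at_discharge": "Y"},
    "c4": {"aspirin_at_arrival": "Y", "beta_blocker_at_discharge": "N"},
}
# Auditor findings copy the submissions except for one discrepancy on chart c2.
auditor_findings = {cid: dict(values) for cid, values in submitted.items()}
auditor_findings["c2"]["beta_blocker_at_discharge"] = "N"

audited_ids = sample_charts(list(submitted), k=3, seed=1)
sample_findings = {cid: auditor_findings[cid] for cid in audited_ids}
rate = agreement_rate(submitted, sample_findings)
print(f"Audited {audited_ids}: agreement {rate:.0%}, "
      f"passes: {rate >= HYPOTHETICAL_PASS_RATE}")
```

Sampling keeps the audit burden manageable, but an agreement rate computed from a handful of charts is itself imprecise, so the resources allocated to validation largely determine how many charts can be re-abstracted and how much confidence the audit provides.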
P4P programs have the potential to drive system improvements. Whether these programs succeed in meeting improvement goals will depend on their design and implementation and on the allocation of sufficient resources for day-to-day operations, program monitoring, impact evaluation, and ongoing modification. Given the limited knowledge base, it is critical that P4P programs include evaluation in their design to generate the knowledge needed to support smart program design and efficient use of resources.
Hospitals understand that P4P is likely to be part of their future and generally seem supportive of the concept. They face a number of challenges to their ability to successfully participate in these programs, including lack of physician engagement, inadequate information infrastructure that necessitates the manual collection of data from charts, and potentially conflicting signals from various organizations measuring hospital performance. These implementation challenges should be carefully considered in the design of any hospital P4P program.
This appendix lists the complete set of design issues that were identified by ASPE and CMS as being of interest for exploring through the environmental scan work.
OVERVIEW
MEASURES
DATA
PAYMENT MECHANISMS
PUBLIC REPORTING
CROSS-CUTTING THEMES
OUTPATIENT SETTING
This appendix builds on the summary of P4P design principles and recommendations presented in Chapter 1 of this report. Here we present and summarize the P4P design principles established by 26 organizations representing a variety of stakeholders, including purchasers, health care providers, policy organizations, accreditation organizations, health plans, and consumers. Table B.1 displays the P4P design principles for each of the 26 organizations. Table B.2 tallies the principles and recommendations across organizations.
P4P Design Principles/Recommendations | JCAHO | MedPAC | IOM | NQF Conference | Leapfrog | IHA | Natl. Business Group on Health | eHealth Initiative Fdn. | Healthways/Johns Hopkins | Pacific Business Group on Health | Alliance of Comm. Health Plans | AHIP |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Medicare Specific | ||||||||||||
P4P in Medicare should be implemented using a phased approach that varies by setting, reward amount, and measures | X | |||||||||||
Medicare should fund the program by setting aside a small share of payments in a budget-neutral approach | X | |||||||||||
Congress should derive initial funding (3–5 years) largely from existing funds by creating provider-specific pools from a reduction in base Medicare funding for each class of providers | X | |||||||||||
A consolidated pool should be formed from which all providers are rewarded when measures allowing for shared accountability are developed | X | |||||||||||
A Medicare P4P program must not be budget neutral or subject to artificial Medicare payment volume controls | ||||||||||||
Medicare incentives should be financed with a new, dedicated stream of funding | X | |||||||||||
Medicare should distribute all payments that are set aside to providers achieving quality criteria | X | |||||||||||
Medicare should establish a process for continual evolution of measures | X | |||||||||||
A Medicare P4P program should be phased in gradually starting with reporting on structural measures and moving to enhanced payment based on evidence-based clinical measures | ||||||||||||
Medicare should initially reward care that is of high clinical quality, patient centered, and efficient | X | |||||||||||
Medicare should consider expanding the proportion of payment based on performance over time | X | |||||||||||
Medicare should initially reward both providers who improve performance significantly and providers who achieve high performance | X | |||||||||||
Medicare should offer incentives to providers for the submission of performance data, and these data should be publicly available in ways that are meaningful and understandable to consumers | X | |||||||||||
The program should be designed such that virtually all Medicare providers submit performance measures for public reporting and participate in P4P as soon as possible | X | |||||||||||
CMS should design the program to include components that promote, recognize, and reward care coordination across providers | X | |||||||||||
CMS should implement a monitoring and evaluation system for the program | X | |||||||||||
A Medicare P4P program must be pilot tested across settings and specialties and phased in over an appropriate period | ||||||||||||
Incentives should eventually apply to all Medicare providers, including FFS and Medicare Advantage | X | |||||||||||
Metrics | ||||||||||||
Programs should utilize accepted, evidence-based measures | X | X | X | X | X | X | X | X | ||||
Measures should be pilot tested, validated, and vetted through a process that includes public comment and phased in | X | |||||||||||
The measurement set should include measures of clinical quality, patient experience, and infrastructure | X | |||||||||||
Measures need to be prioritized to address areas that are important to patients (such as those that prevent deaths, complications, and discomfort), as well as those that improve satisfaction, outcomes, and experience with care | X | |||||||||||
Incentives should be based on existing measures and should emphasize clinical effectiveness | X | |||||||||||
Measures adopted should be developed by nationally recognized measurement organizations and recommended by consensus-building organizations | X | X | ||||||||||
Metrics should be high volume, high gravity, and strongly evidence based; have a gap between current and ideal practice and good prospects for quality improvement; and have measurement reliability, validity, and feasibility | X | |||||||||||
Program designers should include a sufficient number of metrics across a spectrum of health promotion activities to provide a balanced view of performance | X | X | ||||||||||
The development, validation, selection, and refinement of measures should be a transparent process that has broad consensus among stakeholders | X | X | ||||||||||
The development and selection of metrics should include participation by the patient community as well as by physicians and other providers | X | |||||||||||
Distinct standards should be developed to evaluate performance relative to the most vulnerable patients: frail elderly and patients with chronic, debilitating, or life-threatening illness | ||||||||||||
Process measures, such as those used by the HQA, should be used | ||||||||||||
Process or intermediate outcome measures are preferred unless robust, well-accepted methods of risk adjustment can be applied to outcome measures | ||||||||||||
The focus should be on structure and process measures until evidence-based outcome measures are developed | ||||||||||||
Structure, process, and outcome measures should be utilized | X | X | ||||||||||
Outcome measures are the highest priority because of their central importance to patients | X | |||||||||||
Outcome measures must be subject to the best available risk adjustment for patient demographics, severity of illness, and co-morbidities | X | |||||||||||
Metrics should be selected from the following domains: patient centeredness, effectiveness, safety, and efficiency | X | X | ||||||||||
Metrics should include efficiency measures | X | |||||||||||
Efficiency measures should only be used when both the cost and the quality of a particular treatment are considered | X | |||||||||||
When measuring quality, focus on misuse and overuse as well as underuse | X | |||||||||||
Provide positive provider incentives for adoption and utilization of IT | X | X | X | X | X | X | X | |||||
Programs implemented by either the public or the private sector involving HIT should incentivize only those applications and systems that are standards based to enable interoperability and connectivity, and should address the transmission of data to the point of care | X | X | ||||||||||
Programs should move from an individual disease management approach to cross-cutting measures | X | |||||||||||
Metrics should be stable over time | ||||||||||||
Metrics should be kept current to reflect changes in clinical practice | ||||||||||||
Each measure should remain in the set for at least three years but should be evaluated annually to adjust weighting and specifications as necessary | X | |||||||||||
Local measures should closely follow national metrics as long as they are reportable from electronic data sets | X | |||||||||||
To prevent physician de-selection of patients, programs should use risk adjustment methods | X | X | X | |||||||||
To ensure fairness, performance data must be fully adjusted for sample size and case mix composition, including age/sex distribution, severity of illness, number of co-morbid conditions, patient compliance, and other features of the practice or patient population that may influence the results | X | X | ||||||||||
The responsibility for developing, maintaining, and revising measures must reside with the specialty organizations representing the providers in whose scope of practice the measure resides | ||||||||||||
Measures should be selected to ensure that all hospitals have an opportunity to participate and succeed | ||||||||||||
Measures should be uniform across all providers of imaging services and across payers | ||||||||||||
Measures used for P4P should meet higher standards than measures designed for other purposes | X | |||||||||||
Programs should reward accreditation or have an equivalent mechanism that rewards continuous attention to all clinical and support systems and processes | X | |||||||||||
Data Collection, Reporting, Feedback | ||||||||||||
Data should be collected without undue burden on providers | X | X | X | X | ||||||||
IT tools should be used whenever possible for data acquisition | ||||||||||||
Programs must reimburse physicians for any administrative burden for collecting and reporting data | ||||||||||||
Allow physicians to review, comment on, and appeal results prior to payment or reporting | ||||||||||||
Programs should have a mix of financial and non-financial incentives (e.g., public reporting) | X | X | X | X | ||||||||
Physician performance data must remain confidential and not subject to discovery in legal proceedings | ||||||||||||
Public reporting/recognition is essential | X | X | X | X | X | |||||||
Performance data feedback should provide comparisons to peers and benchmarks | ||||||||||||
Educational feedback should be provided to providers | X | X | ||||||||||
Physicians must have timely access to the comparative performance database to which they have contributed data, including the ability to benchmark their data | ||||||||||||
Programs should favor the use of clinical data over claims-based data | ||||||||||||
Programs should use administrative data and data from medical records | ||||||||||||
Measures should be feasible to collect using administrative data | X | |||||||||||
Performance data should be audited | X | X | ||||||||||
Programs should use an auditable data collection method tested for reliability and accuracy | X | |||||||||||
Metric assessments and payments should be made as frequently as possible to better align rewards with performance | X | X | ||||||||||
Hospital bonuses should be calculated every 6 months based on activity in the previous 6 months | X | |||||||||||
Data reporting must not violate patient privacy | ||||||||||||
P4P assessments should be done with sample sizes (denominators) large enough to produce statistically significant results | X | |||||||||||
Incentives | ||||||||||||
Reimbursement must be aligned with the practice of high-quality, safe health care | X | X | X | |||||||||
Incentives should be based on rewards, not penalties | X | |||||||||||
Hospital rewards should be based on 50/50 sharing of savings from improvement | X | |||||||||||
Programs should reward providers based on improving care and exceeding benchmarks | X | X | X | X | X | X | ||||||
A sliding scale of rewards should be established to allow for recognition of gradations in quality | X | |||||||||||
Programs must not reward physicians/hospitals based on rankings that compare them with other physicians/hospitals in the program | X | |||||||||||
Payments must exceed the total cost of implementation, including data collection and reporting costs | ||||||||||||
Incentives must be significant enough to drive desired behaviors and support CQI | X | X | X | |||||||||
Mechanisms must be established to allow performance awards for physician behaviors in hospital settings that produce cost savings | ||||||||||||
General Program Design | ||||||||||||
Funding for P4P initiatives should come from additional resources, not a redistribution of resources | ||||||||||||
Top performers should be eligible for market share through patient shift | X | |||||||||||
Programs should offer voluntary physician participation | ||||||||||||
Physicians and/or hospitals should be involved in the program design | X | X | ||||||||||
Programs should encourage strong alignment between practitioner and provider goals | X | |||||||||||
Providers must have the opportunity to understand the measures, analytical methodology, and use of data for public reporting before participating in a P4P program | X | |||||||||||
Most providers should be able to demonstrate improved performance | X | X | ||||||||||
When selecting areas of clinical focus/measures, programs should strongly consider consistency with national and regional efforts | X | X | X | |||||||||
Programs should be consolidated across employers and health plans to make the bonuses meaningful and the program more manageable for physicians | X | |||||||||||
Programs should be designed to include practices of all sizes and levels of IT capabilities | ||||||||||||
Physician organizations rather than individual physicians should be the accountable entity in P4P programs | X | |||||||||||
Initiatives need to be flexible enough to assess performance at both the individual and the group level | ||||||||||||
Accountability must occur at the individual physician level | X | |||||||||||
Payments should recognize systemic drivers of quality in units broader than individual provider organizations and practitioner groups | X | |||||||||||
The data or the program should be adjusted for patient non-compliance | X | |||||||||||
Programs should incorporate periodic objective evaluations of impacts and make adjustments | X | X | ||||||||||
As P4P methodologies develop, patient access to quality care should be facilitated and not impeded by reduced reimbursement | ||||||||||||
Programs should invest in sub-threshold performers who are committed to improvement | X | X |
Column groups: health care organizations; physician groups; hospital groups; patient groups
P4P Design Principles/Recommendations | IHA | AAFP | ACP | Mass. Medical Society | ACC Fdn. | MGMA | AMGA | American Society | ACR | Surgical Specialty Orgs* | AHA | AAMC | Comm Hospital Assoc. | National Patient Advocacy Foundation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Medicare Specific | ||||||||||||||||
P4P in Medicare should be implemented using a phased approach that varies by setting, amount of reward, and measures | ||||||||||||||||
Medicare should fund the program by setting aside a small share of payments in a budget-neutral approach | ||||||||||||||||
Congress should derive initial funding (3–5 years) largely from existing funds by creating provider-specific pools from a reduction in base Medicare funding for each class of providers | ||||||||||||||||
A consolidated pool should be formed from which all providers are rewarded when measures that allow for shared accountability are developed | ||||||||||||||||
A Medicare P4P program must not be budget neutral or subject to artificial Medicare payment volume controls | X | X | ||||||||||||||
Medicare incentives should be financed with a new, dedicated stream of funding | ||||||||||||||||
Medicare should distribute all payments that are set aside to providers achieving quality criteria | ||||||||||||||||
Medicare should establish a process for continual evolution of measures | ||||||||||||||||
A Medicare P4P program should be phased in gradually starting with reporting on structural measures and moving to enhanced payment based on evidence-based clinical measures | X | |||||||||||||||
Medicare should initially reward care that is of high clinical quality, patient centered, and efficient | ||||||||||||||||
Medicare should consider expanding the proportion of payment based on performance over time | ||||||||||||||||
Medicare should initially reward both providers who improve performance significantly and providers who achieve high performance | ||||||||||||||||
Medicare should offer incentives to providers for the submission of performance data, and these data should be publicly available in ways that are meaningful and understandable to consumers | ||||||||||||||||
The program should be designed such that virtually all Medicare providers submit performance measures for public reporting and participate in P4P as soon as possible | ||||||||||||||||
CMS should design the program to include components that promote, recognize, and reward care coordination across providers | ||||||||||||||||
CMS should implement a monitoring and evaluation system for the program | ||||||||||||||||
A Medicare P4P program must be pilot tested across settings and specialties and phased in over an appropriate period | X | |||||||||||||||
Incentives should eventually apply to all Medicare providers, including FFS and Medicare Advantage | ||||||||||||||||
Metrics | ||||||||||||||||
Programs should utilize accepted, evidence-based measures | X | X | X | X | X | X | X | X | X | |||||||
Measures should be pilot tested, validated, and vetted through a process that includes public comment and phased in | ||||||||||||||||
The measurement set should include measures of clinical quality, patient experience, and infrastructure | ||||||||||||||||
Measures need to be prioritized to address areas that are important to patients (such as those that prevent deaths, complications, and discomfort), as well as those that improve satisfaction, outcomes, and experience with care | ||||||||||||||||
Incentives should be based on existing measures and should emphasize clinical effectiveness | ||||||||||||||||
Measures adopted should be developed by nationally recognized measurement organizations and recommended by consensus-building organizations | X | X | ||||||||||||||
Metrics should be high volume, high gravity, and strongly evidence based; have a gap between current and ideal practice and good prospects for quality improvement; and have measurement reliability, validity, and feasibility | ||||||||||||||||
Program designers should include a sufficient number of metrics across a spectrum of health promotion activities to provide a balanced view of performance | ||||||||||||||||
The development, validation, selection, and refinement of measures should be a transparent process that has broad consensus among stakeholders | X | |||||||||||||||
The development and selection of metrics should include participation by the patient community as well as by physicians and other providers | X | |||||||||||||||
Distinct standards should be developed to evaluate performance relative to the most vulnerable patients: frail elderly and patients with chronic, debilitating, or life-threatening illness | X | |||||||||||||||
Process measures, such as those used by the HQA, should be used | ||||||||||||||||
Process or intermediate outcome measures are preferred unless robust, well-accepted methods of risk adjustment can be applied to outcome measures | X | |||||||||||||||
The focus should be on structure and process measures until evidence-based outcome measures are developed | X | |||||||||||||||
Structure, process, and outcome measures should be utilized | X | X | X | |||||||||||||
Outcome measures are the highest priority because of their central importance to patients | ||||||||||||||||
Outcome measures must be subject to the best available risk adjustment for patient demographics, severity of illness, and co-morbidities | X | X | X | |||||||||||||
Metrics should be selected from the following domains: patient centeredness, effectiveness, safety, and efficiency | X | |||||||||||||||
Metrics should include efficiency measures | X | |||||||||||||||
Efficiency measures should only be used when both the cost and the quality of a particular treatment are considered | X | |||||||||||||||
When measuring quality, focus on misuse and overuse as well as underuse | X | |||||||||||||||
Provide positive provider incentives for adoption and utilization of IT | X | X | X | X | X | X | X | X | ||||||||
Programs implemented by either the public or the private sector involving HIT should incentivize only those applications and systems that are standards based to enable interoperability and connectivity, and should address the transmission of data to the point of care | ||||||||||||||||
Programs should move from an individual disease management approach to cross-cutting measures | ||||||||||||||||
Metrics should be stable over time | X | X | ||||||||||||||
Metrics should be kept current to reflect changes in clinical practice | X | X | ||||||||||||||
Each measure should remain in the set for at least three years, but should be evaluated annually to adjust weighting and specifications as necessary | ||||||||||||||||
Local measures should closely follow national metrics as long as they are reportable from electronic data sets | ||||||||||||||||
To prevent physician de-selection of patients, programs should use risk adjustment methods | X | X | X | X | X | X | ||||||||||
To ensure fairness, performance data must be fully adjusted for sample size and case mix composition, including age/sex distribution, severity of illness, number of co-morbid conditions, patient compliance, and other features of the practice or patient population that may influence the results | X | X | X | X | X | |||||||||||
The responsibility for developing, maintaining, and revising measures must reside with the specialty organizations representing the providers in whose scope of practice the measure resides | X | X | ||||||||||||||
Measures should be selected to ensure that all hospitals have an opportunity to participate and succeed | ||||||||||||||||
Measures should be uniform across all providers of imaging services and across payers | X | |||||||||||||||
Measures used for P4P should meet higher standards than measures designed for other purposes | ||||||||||||||||
Programs should reward accreditation or have an equivalent mechanism that rewards continuous attention to all clinical and support systems and processes | ||||||||||||||||
Data should be collected without undue burden on providers | X | X | X | X | X | X | X | |||||||||
IT tools should be used whenever possible for data acquisition | X | |||||||||||||||
Programs must reimburse physicians for any administrative burden for collecting and reporting data | X | X | X | X | ||||||||||||
Allow physicians to review, comment on, and appeal results prior to payment or reporting | X | X | X | X | X | |||||||||||
Programs should have a mix of financial and non-financial incentives (e.g., public reporting) | X | |||||||||||||||
Physician performance data must remain confidential and not subject to discovery in legal proceedings | X | |||||||||||||||
Public reporting/recognition is essential | X | |||||||||||||||
Performance data feedback should provide comparisons to peers and benchmarks | X | |||||||||||||||
Educational feedback should be provided to providers | X | X | ||||||||||||||
Physicians must have timely access to the comparative performance database to which they have contributed data, including the ability to benchmark their data | X | |||||||||||||||
Programs should favor the use of clinical data over claims-based data | X | |||||||||||||||
Programs should use administrative data and data from medical records | X | |||||||||||||||
Measures should be feasible to collect using administrative data | ||||||||||||||||
Performance data should be audited | X | X | X | |||||||||||||
Programs should use an auditable data collection method that is tested for reliability and accuracy | ||||||||||||||||
Metric assessments and payments should be made as frequently as possible to better align rewards with performance | X | |||||||||||||||
Hospital bonuses should be calculated every 6 months based on activity in the previous 6 months. | ||||||||||||||||
Data reporting must not violate patient privacy | X | X | ||||||||||||||
P4P assessments should be done with sample sizes (denominators) large enough to produce statistically significant results | X | X | X | |||||||||||||
Incentives | ||||||||||||||||
Align reimbursement with the practice of high quality, safe health care | X | X | X | X | X | X | X | X | ||||||||
Incentives should be based on rewards, not penalties | X | X | X | X | X | X | X | |||||||||
Hospital rewards should be based on a 50/50 sharing of savings from improvement | ||||||||||||||||
Programs should reward providers based on improving care and exceeding benchmarks | X | X | X | X | X | |||||||||||
A sliding scale of rewards should be established to allow for recognition of gradations in quality | ||||||||||||||||
Programs must not reward physicians/hospitals based on rankings that compare them with other physicians/hospitals in the program | X | X | ||||||||||||||
Payments must exceed the total cost of implementation, including data collection and reporting costs | X | |||||||||||||||
Incentives must be significant enough to drive desired behaviors and support continuous quality improvement | X | X | X | |||||||||||||
Mechanisms must be established to allow performance awards for physician behaviors in hospital settings that produce cost savings | X | |||||||||||||||
General Program Design | ||||||||||||||||
Funding for P4P initiatives should come from additional resources, not a redistribution of resources | X | |||||||||||||||
Top performers should be eligible for market share through patient shift | ||||||||||||||||
Programs should offer voluntary physician participation | X | X | X | X | ||||||||||||
Physicians and/or hospitals should be involved in the program design | X | X | X | X | X | |||||||||||
Programs should encourage strong alignment between practitioner and provider goals | X | X | ||||||||||||||
Providers must have the opportunity to understand the measures and analytical methodology and use of data for public reporting before participating in a P4P program | X | |||||||||||||||
Most providers should be able to demonstrate improved performance; focus on areas needing improvement | X | |||||||||||||||
When selecting areas of clinical focus/measures, programs should strongly consider consistency with national and regional efforts | X | |||||||||||||||
Programs should be consolidated across employers and health plans to make the bonuses meaningful and the program more manageable for physicians | X | |||||||||||||||
Programs should be designed to include practices of all sizes and levels of IT capabilities | X | X | ||||||||||||||
Physician organizations rather than individual physicians should be the accountable entity in P4P programs | X | X | ||||||||||||||
Initiatives need to be flexible enough to assess performance at both the individual and the group level | ||||||||||||||||
Accountability must occur at the individual physician level | ||||||||||||||||
Payments should recognize systemic drivers of quality in units broader than individual provider organizations and practitioner groups | ||||||||||||||||
Programs should be designed to acknowledge the united approach (team approaches, integration of services, continuity of care) | X | X | X | X | ||||||||||||
Fair and accurate models for attributing care when multiple physicians treat the same patient must be implemented | ||||||||||||||||
The results of P4P programs should not be used against physicians in health plan credentialing, licensure, or certification | X | X | X | |||||||||||||
The data or the program should be adjusted for patient non-compliance | X | X | X | |||||||||||||
Programs should incorporate periodic objective evaluations of impacts and make adjustments | X | X | X | |||||||||||||
As P4P methodologies develop, patient access to quality care should be facilitated and not impeded by reduced reimbursement | X | |||||||||||||||
Programs should invest in sub-threshold performers who are committed to improvement |
NQF Conference
*American Academy of Ophthalmology, American Academy of Otolaryngology, American Association of Neurological Surgeons, American Association of Orthopedic Surgeons, American College of Surgeons, American Society of Cataract and Refractive Surgery, American Society of Plastic Surgeons, American Urological Association, Congress of Neurological Surgeons, Society for Vascular Surgery, Society of American Gastrointestinal and Endoscopic Surgeons, Society of Gynecologic Oncologists, Society of Surgical Oncology, and The Society of Thoracic Surgeons.
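Several of the incentive principles listed above are arithmetic at heart: a 50/50 sharing of savings from improvement, rewards rather than penalties, and a sliding scale that recognizes gradations in quality instead of ranking providers against one another. The Python sketch below combines them for a single hospital and performance period. It is illustrative only. The baseline cost, the 0-100 composite quality score, and the tier thresholds and payout fractions in the hypothetical `SLIDING_SCALE` are assumptions, not values proposed by any of the organizations surveyed.

```python
from dataclasses import dataclass


@dataclass
class HospitalResult:
    """Hypothetical per-hospital inputs for one performance period."""
    name: str
    baseline_cost: float   # assumed spending absent improvement
    actual_cost: float     # observed spending in the period
    quality_score: float   # assumed composite quality score on a 0-100 scale


def shared_savings_bonus(result: HospitalResult, hospital_share: float = 0.5) -> float:
    """Hospital's share of savings (50/50 by default); no penalty if costs rose."""
    savings = result.baseline_cost - result.actual_cost
    return max(savings, 0.0) * hospital_share


# Hypothetical tiers: (minimum quality score, fraction of the bonus paid).
# Listed from highest to lowest threshold so the first match is the best tier.
SLIDING_SCALE = [(90.0, 1.00), (80.0, 0.75), (70.0, 0.50)]


def sliding_scale_fraction(quality_score: float) -> float:
    """Fraction of the bonus earned at the highest tier the score reaches."""
    for threshold, fraction in SLIDING_SCALE:
        if quality_score >= threshold:
            return fraction
    return 0.0


def total_award(result: HospitalResult) -> float:
    """Shared-savings bonus scaled by the quality tier."""
    return shared_savings_bonus(result) * sliding_scale_fraction(result.quality_score)


if __name__ == "__main__":
    hospital = HospitalResult("Hospital A", baseline_cost=1_000_000,
                              actual_cost=900_000, quality_score=84.0)
    # 100,000 in savings * 0.5 hospital share * 0.75 tier fraction
    print(total_award(hospital))  # 37500.0
```

Because the tiers are absolute thresholds, one hospital's award does not depend on how other hospitals score, and a cost increase yields a zero award rather than a penalty.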
Table B.2. Summary of P4P Design Principles and Recommendations
Principles and Recommendations | Number of Organizations Supporting (n=26) |
---|---|
Metrics for P4P Programs: | ||
• Evidence based | 19 | |
• Risk adjust to mitigate impact of patient non-compliance, avoid physician de-selection of patients, and ensure fairness | 11 | |
• Comprehensive in scope | 5 | |
• The development, validation, and selection of measures should include all stakeholders | 5 | |
• Recommended by consensus-building organizations | 4 | |
• Keep current to reflect changes in clinical practice | 4 | |
• Focus on clinical areas needing improvement | 4 | |
• Stable over time | 3 | |
• Focus on misuse and overuse as well as underuse | 2 | |
• Developed, maintained, and revised by specialty organizations | 2 | |
• Include the patient community in the selection process | 2 | |
• Should meet higher standards than metrics used for other purposes | 2 | |
• Select such that all hospitals may participate | 1 | |
• Evaluate performance relative to the most-vulnerable patients (frail elderly and patients with chronic, debilitating, or life-threatening illness) | 1 | |
• Move from an individual disease management approach to cross-cutting measures | 1 | |
• Reward accreditation or similar process | ||
Process measures: | ||
• Should be included in P4P programs | 1 | |
Outcome measures: | ||
• Risk adjust | 8 | |
• Should be included in P4P programs | ||
• Are not sufficiently developed | ||
• Give the highest priority | 11 | |
Structural measures | 2 | |
• Should be included in P4P programs | 1 | |
• Should include HIT adoption and utilization measures | ||
• Should require HIT systems to be standards based and provide data at the point of care | 15 | |
Efficiency measures | ||
• Should be included in P4P programs | 15 | |
• Use only when both the cost and the quality of a treatment are considered | 2 | |
Patient experience measures | ||
• Should be included in P4P programs | 5 | |
Data Collection, Reporting, Feedback: | ||
• Avoid undue burden on providers | 12 | |
• Include public reporting | 8 | |
• Allow providers to review, comment on, and appeal results prior to payment or reporting | 6 | |
• Audit performance data | ||
• Sample sizes must be large enough to produce statistically significant results | 5 | |
• Assess performance and make payments as frequently as possible to align rewards and performance | 4 | |
• Data reporting must not violate patient privacy | ||
• Give providers feedback with benchmarking data | 3 | |
• Favor the use of clinical data over administrative data | ||
• Use both clinical data and administrative data | 3 | |
• Choose measures that are feasible to collect using administrative data | 2 | |
• Performance data must remain confidential and not subject to discovery in legal proceedings | 1 | |
Incentives: | ||
• Reward high-quality, safe health care | 13 | |
• Base rewards on improving care and exceeding benchmarks | 12 | |
• Base incentives on rewards, not penalties | 9 | |
• Provide incentives significant enough to drive desired behaviors and support improvement | 7 | |
• Payment must exceed the cost of implementation (collecting and reporting data) | ||
• Do not base incentives on provider ranking | 5 | |
• Establish gain-sharing mechanisms | ||
• Base hospital rewards on 50/50 sharing of savings with payers | 4 | |
• Top performers should be eligible for increased market share through patient shift (steering/tiering) | 1 | |
• Establish a sliding scale of rewards to recognize gradations in quality | 1 | |
General Program Design: | ||
• Providers should be involved in the program design | 7 | |
• Acknowledge team approaches, integration of services, care coordination | 6 | |
• Consider consistency with national and regional efforts | 5 | |
• Incorporate periodic evaluation of impacts and make adjustments | 5 | |
• Encourage strong alignment of physicians and hospitals | 4 | |
• Programs should be voluntary | 2 | |
• Give providers an opportunity to understand the measures, methodology, and reporting requirements before they participate in P4P | 2 | |
• Invest in sub-threshold performers who are committed to improvement | ||
• Funding should come from additional resources, not a redistribution of resources | 2 | |
• Include providers of all sizes and levels of IT capabilities | 2 | |
• Consolidate programs across employers and health plans | ||
• Design to mitigate the impact of patient non-compliance | 2 | |
• Patient access should not be impeded by reduced reimbursement | ||
• Implement fair and accurate attribution rules for providers | 2 | |
Medicare-Specific Recommendations: | ||
• Program should not be budget neutral | 2 | |
• Program should be budget neutral | 2 | |
• Use a phased approach | 1 | |
• Reward care that is of high clinical quality, patient centered, and efficient | 1 | |
• Reward improvement and high performance | 1 | |
• Require public reporting | 1 | |
• Reward care coordination | 1 | |
• Include a monitoring and evaluation system | 1 | |
• Provide incentives for FFS and Medicare Advantage providers | 1 | |
• Establish a process for continual evolution of measures | 1 | |
• Distribute all funds that are set aside to providers achieving quality criteria | 1 | |
• Consider expanding the proportion of payment based on performance over time | ||
• Pilot test across settings | 1 |
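The recommendation that sample sizes (denominators) be large enough to produce statistically meaningful results can be made concrete with the standard normal-approximation sample-size formula for a proportion, n = z²·p(1−p)/E². The Python sketch below computes the smallest denominator for which a 95% confidence interval around a hospital's rate would have a half-width no larger than a chosen margin E. The expected rate, the margin, and the hypothetical `eligible_for_scoring` rule are assumptions a program would have to set; a precision threshold is only one way to operationalize this principle, and the normal approximation is weak for rates near 0 or 1.

```python
import math
from statistics import NormalDist


def min_denominator(expected_rate: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n for which the normal-approximation confidence interval
    around a proportion has a half-width no larger than `margin`."""
    if not 0.0 < expected_rate < 1.0:
        raise ValueError("expected_rate must be strictly between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * expected_rate * (1.0 - expected_rate) / margin ** 2)


def eligible_for_scoring(denominator: int, expected_rate: float, margin: float,
                         confidence: float = 0.95) -> bool:
    """Hypothetical rule: score a measure only if its denominator is large enough."""
    return denominator >= min_denominator(expected_rate, margin, confidence)


if __name__ == "__main__":
    print(min_denominator(0.5, 0.05))              # 385
    print(min_denominator(0.9, 0.05))              # 139
    print(eligible_for_scoring(120, 0.9, 0.05))    # False
```

A measure whose compliance rate is already high (0.9 here) needs fewer eligible patients than one near 0.5 to reach the same precision, which is one reason a single fixed minimum denominator may be too strict for some measures and too lenient for others.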
APPENDIX C: INPATIENT HOSPITAL MEASURES
Organizations Collecting/Utilizing Measures: columns Joint Commission through CDC14
Measure | Joint Commission | CMS1 | HQA2 | CMS-RHQDAPU3 | Premier4 | SCIP5 | STS6 | ACC7 | ACE8 | GWTG9 | IHI10 | Leapfrog11 | NSQIP12 | AHRQ13 | CDC14 | NQF Endorsed | IOM Domain |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AMI: | |||||||||||||||||
Aspirin at Arrival | X | X | X | X | X | X | X | X | Effective | ||||||||
Aspirin at Discharge | X | X | X | X | X | X | X | X | Effective | ||||||||
ACEI or ARB for LVSD | X | X | X | X | X | X | X | X | X | Effective | |||||||
Smoking Cessation Advice/Counseling | X | X | X | X | X | X | X | X | X | Effective | |||||||
Beta Blocker at Discharge | X | X | X | X | X | X | X | X | Effective | ||||||||
Beta Blocker at Arrival | X | X | X | X | X | X | X | X | Effective | ||||||||
Mean Time to Thrombolysis/Fibrinolysis | Effective | ||||||||||||||||
Thrombolytic/Fibrinolytic Received Within 30 Minutes of Arrival | X | X | X | X | X | X | X | Effective | |||||||||
Mean Time to PCI | Effective | ||||||||||||||||
PCI Within 120 Minutes of Arrival | X | X | X | X | X | X | X | Effective | |||||||||
Smoking Cessation Advice | X | X | X | X | X | X | X | X | Effective, Patient Ctrd. | ||||||||
Beta Blocker at Discharge | X | Effective | |||||||||||||||
Inpatient Mortality | X | Safe | |||||||||||||||
30-Day Mortality (Medicare Patients) | Safe | ||||||||||||||||
30-Day All-Cause Risk-Standardized Readmission | Effective | ||||||||||||||||
Pneumonia: | |||||||||||||||||
Oxygenation Assessment | X | X | X | X | X | X | Effective | ||||||||||
Pneumococcal Vaccination | X | X | X | X | X | X | Effective | ||||||||||
Blood Cultures Within 24 Hours Prior to or After Arrival—ICU Patients | Effective | ||||||||||||||||
Blood Culture Before First Antibiotic Received | X | X | X | X | X | X | Effective | ||||||||||
Smoking Cessation Advice | X | X | X | X | X | X | Effective, Patient Ctrd. | ||||||||||
Antibiotic Timing (Median) | X | Effective | |||||||||||||||
Initial Antibiotic Received Within 8 Hours of Arrival | Effective | ||||||||||||||||
Initial Antibiotic Received Within 4 Hours of Arrival | X | X | X | X | X | X | Effective | ||||||||||
Initial Antibiotic Selection for CAP in Immunocompetent Patient | Effective | ||||||||||||||||
Initial Antibiotic Selection for CAP in Immunocompetent—ICU Patient | Effective | ||||||||||||||||
Initial Antibiotic Selection for CAP in Immunocompetent—Non-ICU Patient | Effective | ||||||||||||||||
Influenza Vaccination | X | X | X | X | X | X | Effective | ||||||||||
Inpatient Mortality | X | X | Effective | ||||||||||||||
30-Day Pneumonia Mortality | X | X | X | Effective |
Prophylactic Antibiotics Discontinued Within 48 Hours After Surgery End Time—Other Cardiac Surgery | Effective |
Pre-Operative Beta Blockade—CABG | Effective |
Anti-Platelet Medication at Discharge—CABG | Effective |
Beta Blockade at Discharge—CABG |
Anti-Lipid Treatment at Discharge—CABG | Effective |
Risk-Adjusted Inpatient Operative Mortality—CABG | X | Safe |
Risk-Adjusted Operative Mortality—CABG | Safe |
Risk-Adjusted Operative Mortality for AVR | Safe |
Risk-Adjusted Operative Mortality for MVR | X | Safe |
Risk-Adjusted Operative Mortality for MVR + CABG | Safe |
Risk-Adjusted Operative Mortality for AVR + CABG | Safe |
CABG Inpatient Mortality Rate | X | Safe |
PTCA Mortality Rate | X | Safe |
Reperfusion Within 90 Minutes of Arrival | Effective |
Inpatient Mortality | X | X | Effective |
30-Day Mortality (Medicare Patients) | Safe |
PCI Volume | Safe |
PCI Mortality | X | Safe |
Heart Failure: |
Discharge Instructions | X | X | X | X | X | X | X | Effective, Patient Ctrd. |
LVF Assessment | X X | X | X | X | X | X | X X | Effective |
ACEI or ARB for LVSD | X | X | X | X | X | X |
Pregnancy and Related Conditions: | ||||||||||||||||||
VBAC | X | X | Effective | |||||||||||||||
Inpatient Neonatal Mortality | X | Safe | ||||||||||||||||
3rd or 4th Degree Laceration | X | X | Safe | |||||||||||||||
Birth Trauma-Injury to Neonate | Safe | |||||||||||||||||
Obstetric Trauma—Vaginal Delivery with Instrument | Safe | |||||||||||||||||
Obstetric Trauma—Vaginal Delivery Without Instrument | Safe | |||||||||||||||||
Obstetric Trauma—Cesarean Delivery | Safe | |||||||||||||||||
Surgical Care Improvement/Surgical Infection Prevention: | ||||||||||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Overall Rate | X | X | X | X | X | X | X | Effective | ||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Hip Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Knee Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Colon Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Hysterectomy | Effective | |||||||||||||||||
Prophylactic Antibiotic Received Within 1 Hour Prior to Incision—Vascular Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotic Selection for Surgical Patients—Overall Rate | X | X | X | X | X | X | X | Effective | ||||||||||
Prophylactic Antibiotic Selection—Hip Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotic Selection—Knee Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotic Selection—Colon Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotic Selection—Hysterectomy | Effective | |||||||||||||||||
Prophylactic Antibiotic Selection—Vascular Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Overall Rate | X | X | X | X | X | X | X | Effective | ||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Hip Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Knee Arthroplasty | X | X | X | X | Effective | |||||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Colon Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Hysterectomy | Effective | |||||||||||||||||
Prophylactic Antibiotics Discontinued Within 24 Hours After Surgery End Time—Vascular Surgery | Effective | |||||||||||||||||
Recommended VTE Prophylaxis Ordered | X | X | X | X | X15 | Effective | ||||||||||||
Recommended VTE Prophylaxis Received Within 24 Hours Prior to or After Surgery | X | X | X | X | X15 | Effective | ||||||||||||
Cardiac Surgery Patients with Controlled 6 AM Post-Operative Serum Glucose | X | X | X | X | Effective | |||||||||||||
Surgery Patients with Appropriate Hair Removal | Effective | |||||||||||||||||
Colorectal Surgery Patients with Immediate Post-Operative Normothermia | Effective | |||||||||||||||||
Surgery Patients on Beta Blockers Prior to Admission Who Received a Beta Blocker During the Perioperative Period | Effective | |||||||||||||||||
Mortality Within 30 Days of Surgery | Effective, Safe | |||||||||||||||||
ICU: | ||||||||||||||||||
Ventilator-Associated Pneumonia Prevention—Patient Positioning | Effective | |||||||||||||||||
Ventilator Bundle | X | Effective | ||||||||||||||||
Stress Ulcer Disease Prophylaxis | Effective | |||||||||||||||||
DVT Prophylaxis | X | X | Effective | |||||||||||||||
Central Line-Associated Bloodstream Infection | Effective | |||||||||||||||||
Central Line Bundle Compliance | Effective | |||||||||||||||||
Central Line Insertion Adherence Practices | Effective | |||||||||||||||||
Urinary Catheter–Associated Urinary Tract Infection | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Activated Drotrecogin Alfa | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Low-Dose Glucocorticoid | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Blood Cultures Collected | Effective | |||||||||||||||||
Severe Sepsis: Central Venous Oxygen Saturation | Effective | |||||||||||||||||
Severe Sepsis: Central Venous Pressure | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Glucose Values | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Median Inspiratory Plateau Pressures | Effective | |||||||||||||||||
Severe Sepsis/Septic Shock: Median Time to Broad Spectrum Antibiotic | Effective | |||||||||||||||||
Blood Cultures Performed Within 24 Hours Prior to or After Arrival for Patients Transferred to ICU | Effective | |||||||||||||||||
ICU Length of Stay | Effective, Efficient | |||||||||||||||||
Hospital Mortality for ICU Patients | Effective, Safe | |||||||||||||||||
Stroke: | ||||||||||||||||||
Deep Vein Thrombosis (DVT) Prophylaxis (Ischemic) | Effective | |||||||||||||||||
DVT Prophylaxis for Intracranial Hemorrhage | Effective | |||||||||||||||||
Discharged on Antithrombotics (Ischemic, TIA) | Effective | |||||||||||||||||
Discharged on Antiplatelet Therapy | Effective | |||||||||||||||||
Patients with Atrial Fibrillation Receiving Anticoagulation Therapy (Ischemic) | Effective | |||||||||||||||||
Tissue Plasminogen Activator (t-PA) Considered (Ischemic, TIA) | Effective | |||||||||||||||||
Antithrombotic Medication Within 48 Hours of Hospitalization (Ischemic, TIA) | Effective | |||||||||||||||||
Lipid Profile (Ischemic, TIA) | Effective | |||||||||||||||||
Screen for Dysphagia (Ischemic, Hemorrhagic, TIA) | Effective | |||||||||||||||||
Stroke Education (Ischemic, Hemorrhagic, TIA) | Effective, Patient Ctrd. | |||||||||||||||||
Smoking Cessation (Ischemic, Hemorrhagic, TIA) | Effective, Patient Ctrd. | |||||||||||||||||
Plan for Rehabilitation Considered (Ischemic, Hemorrhagic) | Effective, Patient Ctrd. | |||||||||||||||||
Lipids Measured | X | Effective | ||||||||||||||||
Blood Pressure Management | X | Effective | ||||||||||||||||
Non-Invasive Carotid Imaging Reports | Effective | |||||||||||||||||
CT or MRI Report | Effective | |||||||||||||||||
Avoidance of Intravenous Heparin | Effective | |||||||||||||||||
Acute Stroke In-Hospital Mortality Rates | Safe | |||||||||||||||||
Cardiac Surgery: | ||||||||||||||||||
Participation in a Systematic Database for Cardiac Surgery (STS) | Effective | |||||||||||||||||
Surgical Volume—Isolated CABG | Safe | |||||||||||||||||
Surgical Volume—Valve Surgery | Safe | |||||||||||||||||
Surgical Volume—CABG + Valve Surgery | Safe | |||||||||||||||||
Prophylactic Antibiotic Within 1 Hour Prior to Surgical Incision—CABG | Effective | |||||||||||||||||
Prophylactic Antibiotic Within 1 Hour Prior to Surgical Incision—Other Cardiac Surgery | Effective | |||||||||||||||||
Selection of Antibiotic—CABG | X | X | X | X | Effective | |||||||||||||
Selection of Antibiotic—Other Cardiac Surgery | Effective | |||||||||||||||||
Prophylactic Antibiotics Discontinued Within 48 Hours After Surgery End Time—CABG | ||||||||||||||||||
Use of Internal Mammary Artery—CABG | Effective | |||||||||||||||||
Aspirin at Discharge—CABG | X | X | ||||||||||||||||
Post-Operative Hemorrhage or Hematoma—CABG | ||||||||||||||||||
Post-Operative Physiologic and Metabolic Derangement | ||||||||||||||||||
Prolonged Intubation—CABG | X | X | Effective | |||||||||||||||
Deep Sternal Wound Infection Rate—CABG | Safe | |||||||||||||||||
Stroke/Cerebrovascular Accident—CABG | Safe | |||||||||||||||||
Post-Operative Renal Insufficiency—CABG | Safe | |||||||||||||||||
Surgical Re-exploration—CABG | Safe | |||||||||||||||||
Carotid Endarterectomy Mortality Rate | Safe | |||||||||||||||||
Bilateral Cardiac Catheterization Rate | Safe | |||||||||||||||||
Surgery (Non-Cardiac): | ||||||||||||||||||
Complications of Anesthesia | X | Safe | ||||||||||||||||
Failure to Rescue | X | X | Safe | |||||||||||||||
Foreign Body Left in During Procedure | Safe | |||||||||||||||||
Post-Operative Hip Fracture | X | Safe | ||||||||||||||||
Post-Operative Hemorrhage or Hematoma | X15 | Safe | ||||||||||||||||
Post-Operative Physiologic and Metabolic Derangements | X15 | Safe | ||||||||||||||||
Readmissions 30 Days Post-Discharge | X15 | Safe, Efficient | ||||||||||||||||
Surgical Site Infection | Safe | |||||||||||||||||
Surgical Wound Disruption | Safe | |||||||||||||||||
Post-Operative Respiratory Failure | Safe | |||||||||||||||||
Post-Operative Pulmonary Embolism or Deep Vein Thrombosis | Safe | |||||||||||||||||
Post-Operative Sepsis | Safe | |||||||||||||||||
Post-Operative Wound Dehiscence | Safe | |||||||||||||||||
Hip Replacement Mortality Rate | Safe | |||||||||||||||||
Esophageal Resection Mortality Rate | Safe | |||||||||||||||||
Pancreatic Resection Mortality Rate | Safe | |||||||||||||||||
AAA Repair Mortality Rate | X | Safe | ||||||||||||||||
Incidental Appendectomy Among Elderly Rate | Safe | |||||||||||||||||
Laparoscopic Cholecystectomy Rate | Safe | |||||||||||||||||
Other Surgical Wound Occurrence | Safe | |||||||||||||||||
Pneumonia Post-Surgery | X | Safe | ||||||||||||||||
Unplanned Intubation | X | Safe | ||||||||||||||||
Pulmonary Embolism | Safe | |||||||||||||||||
On Ventilator > 48 Hours | X | Safe | ||||||||||||||||
Other Respiratory Occurrences | X | Safe | ||||||||||||||||
Progressive Renal Insufficiency | X | Safe | ||||||||||||||||
Acute Renal Failure | Safe | |||||||||||||||||
Urinary Tract Infection | X | Safe | ||||||||||||||||
Other Urinary Tract Occurrence | X | Safe | ||||||||||||||||
CVA/Stroke | Safe | |||||||||||||||||
Coma | Safe | |||||||||||||||||
Peripheral Nerve Injury | X | Safe | ||||||||||||||||
Other CNS Occurrence | X | Safe | ||||||||||||||||
Cardiac Arrest Requiring CPR | X | Safe | ||||||||||||||||
Myocardial Infarction | X | Safe | ||||||||||||||||
Other Cardiac Occurrence | X | Safe | ||||||||||||||||
Bleeding Requiring > 4 Units PRBC/Whole Blood Transfusions Within the First 72 Hours Post-Operative | Safe | |||||||||||||||||
Surgical Graft/Prosthesis/Flap Failure | Safe | |||||||||||||||||
DVT/Thrombophlebitis | X | Safe | ||||||||||||||||
Systemic Sepsis (SIRS) | X | Safe | ||||||||||||||||
Systemic Sepsis (Sepsis) | X | Safe | ||||||||||||||||
Systemic Sepsis (Septic Shock) | X | Safe | ||||||||||||||||
Other Occurrences | X | Safe | ||||||||||||||||
Return to the Operating Room Within 30 Days of Surgery | Safe | |||||||||||||||||
Death Within 30 Days of Surgery | Safe | |||||||||||||||||
Death Greater Than 30 Days After Surgery in Acute Care | Safe | |||||||||||||||||
Venous Thromboembolism (VTE): | ||||||||||||||||||
Risk Assessment/Prophylaxis Within 24 Hours of Admission | Effective | |||||||||||||||||
Risk Assessment/Prophylaxis Within 24 Hours of Transfer to ICU | Effective | |||||||||||||||||
Documentation of Inferior Vena Cava Filter Indication | Effective | |||||||||||||||||
VTE Patients with Overlap Therapy | Effective | |||||||||||||||||
VTE Patients Receiving Heparin-Platelet Count Monitoring | Effective | |||||||||||||||||
VTE Discharge Instructions | X | Effective | ||||||||||||||||
Incidence of Potentially Preventable Hospital-Acquired VTE | Effective, Safe | |||||||||||||||||
VTE 30-Day Hospital Readmission (ICSI) | Effective, Efficient | |||||||||||||||||
Cancer: | ||||||||||||||||||
Patients with Early Stage Breast Cancer Who Have Evaluation of the Axilla | Effective | |||||||||||||||||
College of American Pathologists Breast Cancer Protocol | Effective | |||||||||||||||||
Colon Cancer: Surgical Resection Includes at Least 12 Nodes | Effective | |||||||||||||||||
College of American Pathologists Colon and Rectum Protocol | Effective | |||||||||||||||||
Completeness of Pathologic Reporting | Effective |
Nursing/General Care: | |||||||||||||||||
Death Among Surgery Inpatients with Treatable Serious Complications | Safe | ||||||||||||||||
Pressure Ulcer Prevalence | X | X | X | Safe | |||||||||||||
Falls Prevalence | Safe | ||||||||||||||||
Falls with Injury | Safe | ||||||||||||||||
Restraint Prevalence (Vest and Limb Only) | Safe | ||||||||||||||||
Influenza Vaccination for Healthcare Workers | Safe | ||||||||||||||||
Patient Safety (Non-Surgical): | |||||||||||||||||
Death in Low-Mortality DRG | Safe | ||||||||||||||||
Decubitus Ulcers | X | Safe | |||||||||||||||
Failure to Rescue | X | Safe | |||||||||||||||
Iatrogenic Pneumothorax | Safe | ||||||||||||||||
Selected Infections Due to Medical Care | Safe | ||||||||||||||||
Transfusion Reaction | Safe | ||||||||||||||||
GI Hemorrhage In-Hospital Mortality Rate | Safe | ||||||||||||||||
Hip Fracture In-Hospital Mortality Rate | Safe | ||||||||||||||||
Structural: | |||||||||||||||||
Nursing Care Hours per Patient Day | Effective, Safe | ||||||||||||||||
Nursing Skill Mix (RN, LVN, LPN, UAP, and Contract) | Effective, Safe | ||||||||||||||||
Nursing Practice Environment | X | X | Safe | ||||||||||||||
Nursing Voluntary Turnover | Safe | ||||||||||||||||
Computerized Physician Order Entry | Safe | ||||||||||||||||
ICU Physician Staffing (Intensivist) | Safe | ||||||||||||||||
Evidence-Based Hospital Referral | Safe | ||||||||||||||||
NQF Safe Practice | Safe | ||||||||||||||||
Psychiatric Services: | |||||||||||||||||
Assessment of Violence Risk, Substance Use Disorder, Trauma, and Patient Strengths | Effective, Safe, Patient Ctrd. | ||||||||||||||||
Hours of Restraint Use | Safe | ||||||||||||||||
Hours of Seclusion Use | X | Safe | |||||||||||||||
Patients Discharged on Multiple Antipsychotic Medications | Effective, Safe | ||||||||||||||||
Discharge Assessment and Aftercare Recommendations Sent to Next Level of Care upon Discharge | Effective |
Care Coordination: | |||||||||||||||||
3-Item Care Transition | Effective, Patient Ctrd. | ||||||||||||||||
Patient Experience: | |||||||||||||||||
Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) | X | X | X | X | X | Patient Ctrd. | |||||||||||
Cross-Cutting Length of Stay/Readmission: | |||||||||||||||||
Inpatient Hospital Average Length of Stay by Medical Service (PacifiCare) | Efficient | ||||||||||||||||
Risk-Adjusted Average Length of Inpatient Stay (CareScience) | |||||||||||||||||
Severity-Standardized Average Length of Stay, Routine Care | X | ||||||||||||||||
Severity-Standardized Average Length of Stay, Special Care | |||||||||||||||||
14-Day All-Cause Readmission Rate | |||||||||||||||||
Inpatient Readmission Rate by Medical Diagnosis (PacifiCare) |
1 Centers for Medicare & Medicaid Services
2 Hospital Quality Alliance
3 Reporting Hospital Quality Data for Annual Payment Update
4 Premier Hospital Quality Incentive Demonstration
5 Surgical Care Improvement Project
6 The Society of Thoracic Surgeons
7 American College of Cardiology
8 Alliance for Cardiac Care Excellence
9 Get With the Guidelines
10 Institute for Healthcare Improvement
11 The Leapfrog Group
12 National Surgical Quality Improvement Program
13 Agency for Healthcare Research and Quality
14 Centers for Disease Control and Prevention
15 These Premier measures apply only to Hip and Knee Replacement.
Anthem, National office
Anthem, VA
Blue Cross Blue Shield, HI
Blue Cross Blue Shield, IL
Blue Cross Blue Shield, MA
Blue Cross Blue Shield, MI
Blue Shield, Northeastern NY
The Employer Healthcare Alliance Cooperative (“The Alliance”)
Employers’ Coalition on Health
Excellus/Univera
Fallon Community Health Plan
Harvard Pilgrim Health Plan
Health Partners
Highmark BCBS
Horizon BCBS, NJ
Independent Health
Kaiser Permanente, National and Northern and Southern CA offices
Leapfrog Group (Hospital Rewards program)
Maine Health Management Coalition
PacifiCare/United Healthcare
Premier Health System
Priority Health
Providence Health Plan
Regence Blue Shield
Tufts Health Plan
The Veterans Administration
Anonymous program sponsor (1)
Amsterdam Memorial Hospital, Amsterdam, NY
Baptist Health System of East TN, Knoxville, TN
Bleckley Memorial Hospital, Cochran, GA
Crenshaw Community Hospital, Luverne, AL
Fairchild Medical Center, Yreka, CA
Foote Memorial Hospital, Jackson, MI
Franklin Medical Center, Greenfield, MA
Geisinger Health System, Danville, PA
Hackensack University Medical Center, Hackensack, NJ
Henry Ford Health System, Detroit, MI
Hopi Health Care Center, Polacca, AZ
Kaiser Permanente, CA
McLeod Medical Center, Florence, SC
Mercy Medical Center, Centerville, IA
Park Nicollet, St. Louis Park, MN
Rice County District One Hospital, Faribault, MN
San Luis Valley Regional Medical Center, Alamosa, CO
South Central Regional Medical Center, Laurel, MS
Southwestern General Hospital, El Paso, TX
Spruce Pine Community Hospital, Spruce Pine, NC
St. John Health System, Warren, MI
St. Joseph Hospital, Polson, MT
St. Jude Medical Center, Fullerton, CA
Trinity Health System, 20 hospitals in 7 states
Walla Walla General Hospital, Walla Walla, WA
White River Medical Center, Batesville, AR
William Beaumont Hospital, Royal Oak, MI
Anonymous hospitals (2)
American Hospital Association
Association of American Medical Colleges
Catholic Health Association
Federation of American Hospitals
National Association of Children’s Hospitals & Related Institutions
North Carolina Hospital Association
South Dakota Hospital Association
Voluntary Hospital Association
Hospital Corporation of America
Illinois Hospital Association
Maryland Hospital Association
Premier Health System
Quantros
Thomson Healthcare
Cypress Healthcare
Kansas Department of Health and Environment, Office of Local and Rural Health
Health Resources and Services Administration, Office of Rural Health Policy
National Rural Health Association
Stratis Health (Minnesota QIO)
Stroudwater Associates
Upper Midwest Rural Health Research Center
Asch SM, Kerr EA, Keesey J, Adams JL, Setodji CM, Malik S, McGlynn EA. (2006) Who Is at Greatest Risk for Receiving Poor-Quality Health Care? New England Journal of Medicine 354(11):1147–1156.
Asch B, Warner J. (1996) Incentive Systems: Theory and Evidence. In Lewin D, Mitchell D, Zaidi M (eds), The Human Resource Management Handbook, Part One. Greenwich, CT: JAI Press, 175–215.
Barnato AE, Lucas FL, Staiger D, Wennberg DE, Chandra A. (2005) Hospital-Level Racial Disparities in Acute Myocardial Infarction Treatment and Outcomes. Medical Care 43:308–319.
Bazerman MH, Baron J, Shonk K. (2001) You Can’t Enlarge the Pie. Cambridge, MA: Basic Books.
Berthiaume JT, Chung RS, Ryskina KL, Walsh J, Legorreta AP. (2006) Aligning Financial Incentives with Quality of Care in the Hospital Setting. Journal for Healthcare Quality 28(2):36–44, 51.
Berthiaume JT, Tyler PA, Ng-Osorio J, LaBresh KA. (2004) Aligning Financial Incentives with “Get with the Guidelines” to Improve Cardiovascular Care. American Journal of Managed Care 10(7 Pt 2):501–504.
Berwick DM. (1995) The Toxicity of Pay for Performance. Quality Management in Health Care 4(1):27–33.
Birkmeyer NJ, Birkmeyer JD. (2006) Strategies for Improving Surgical Quality—Should Payers Reward Excellence or Effort? New England Journal of Medicine 354(8):864–870.
Cameron J, Banko KM, Pierce WD. (2001) Pervasive Negative Effects of Rewards on Intrinsic Motivation: The Myth Continues. The Behavior Analyst 24(1):1–44.
Casalino LP, Elster A. (2007) Will Pay-for-Performance and Quality Reporting Affect Health Care Disparities? Health Affairs 26:w405–w414.
CMS. (2007a) CMS Announces Payment Reforms for Inpatient Hospital Services in 2008. As of August 1, 2007: http://www.cms.gov/apps/media/pressrelease.asp
CMS. (2007b) National Health Expenditure Projections 2006–2016. Office of the Actuary. As of July 2007: http://www.cms.hhs.gov/nationalhealthexpenddata/downloads/proj2006.pdf
Davies HT. (2001) Public Release of Performance Data and Quality Improvement: Internal Responses to External Data by US Health Care Providers. Quality in Health Care 10(2):104–110.
Deci EL, Koestner R, Ryan RM. (1999) A Meta-Analytic Review of Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation. Psychological Bulletin 125(6):627–668; discussion 692–700.
Doran T, Fullwood C, Gravelle H, Reeves D, Kontopantelis E, Hiroeh U, Roland M. (2006) Pay-for-Performance Programs in Family Practices in the United Kingdom. New England Journal of Medicine 355(4):375–384.
Fisher ES, Staiger DO, Bynum JPW, Gottlieb DJ. (2007) Creating Accountable Care Organizations: The Extended Hospital Medical Staff. Health Affairs 26(1):w44–w57.
Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. (2003) The Implications of Regional Variations in Medicare Spending. Part 1: The Content, Quality, and Accessibility of Care. Annals of Internal Medicine 138(4):273–287.
Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. (2003) The Implications of Regional Variations in Medicare Spending. Part 2: Health Outcomes and Satisfaction with Care. Annals of Internal Medicine 138(4):288–299.
Freedman JL, Cunningham JA, Krismer K. (1992) Inferred Values and the Reverse-Incentive Effect in Induced Compliance. Journal of Personality and Social Psychology 62(3):357–368.
Glickman SW, Ou F, Delong ER, Roe MT, Lytle BL, Mulgund J, Rumsfeld JS, Gibler WB, Ohman EM, Schulman KA, Peterson ED. (2007) Pay for Performance, Quality of Care, and Outcomes in Acute Myocardial Infarction. Journal of the American Medical Association 297:2373–2380.
Gneezy U, Rustichini A. (2000) Pay Enough or Don’t Pay at All. The Quarterly Journal of Economics 115(3):791–810.
Grol R, Baker R, Moss F. (2002) Quality Improvement Research: Understanding the Science of Change in Health Care. Quality and Safety in Health Care 11:110–111.
Grol R, Grimshaw J. (2003) From Best Evidence to Best Practice: Effective Implementation of Change in Patients’ Care. The Lancet 362:1225–1230.
Grossbart SR. (2006) What’s the Return? Assessing the Effect of “Pay-for-Performance” Initiatives on the Quality of Care Delivery. Medical Care Research and Review 63(1 Suppl):29S–48S.
Heath C, Larrick RP, Wu G. (1999) Goals as Reference Points. Cognitive Psychology 38:79–109.
Heffler S, Smith S, Keehan S, Borger C, Clemens MK, Truffer C. (2005) Trends: U.S. Health Spending Projections for 2004–2014: What Do They Portend For The Federal Growth Initiative? Health Affairs 24(2):465–472.
Holmstrom B, Milgrom P. (1991) Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design. Journal of Law, Economics, and Organization 7:24–52.
Institute of Medicine. (2006) Rewarding Provider Performance: Aligning Incentives in Medicare. Washington, DC: National Academy Press.
Institute of Medicine. (2001) Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press.
Jha AK, Li Z, Orav EJ, Epstein AM. (2007) Where Do Elderly Blacks Receive Hospital Care? The Concentration and Quality of Hospitals That Care for Elderly Black Americans. Archives of Internal Medicine 167:1177–1182.
Kahneman D, Knetsch JL, Thaler R. (1986) Fairness as a Constraint on Profit Seeking: Entitlements in the Market. American Economic Review 76.
Kahneman D, Tversky A. (1979) Prospect Theory: An Analysis of Decision Under Risk. Econometrica 47(2):263–292.
Kivetz R, Urminsky O, Zheng Y. (2006) The Goal-Gradient Hypothesis Resurrected: Purchase Acceleration, Illusionary Goal Progress, and Customer Retention. Journal of Marketing Research 43(1):39–58.
Leapfrog Group. (2007) Incentives and Rewards Compendium. As of July 30, 2007: http://ir.leapfroggroup.org/compendium/
Lindenauer PK, Remus D, Roman S, Rothberg MB, Benjamin EM, Ma A, Bratzler DW. (2007) Public Reporting and Pay for Performance in Hospital Quality Improvement. New England Journal of Medicine 356(5):486–496.
Loewenstein G, Prelec D. (1992) Anomalies in Intertemporal Choice: Evidence and an Interpretation. The Quarterly Journal of Economics 573–597.
Lowenstein R. (2001 Feb 11) Exuberance Is Rational. New York Times Magazine.
McClellan MB. (2006 Feb 7) Presentation given at National Pay for Performance Summit, Los Angeles, CA.
McGlynn EA, Asch SM, Adams J, Keesey J, Hicks J, DeChristofaro A, Kerr EA. (2003) The Quality of Health Care Delivered to Adults in the United States. New England Journal of Medicine 348(26):2635–2645.
McNeil BJ, Pauker SG, Sox HC, Tversky A. (1982) On the Elicitation of Preferences for Alternative Therapies. New England Journal of Medicine 306(21):1259–1262.
Medicare Payment Advisory Commission (MedPAC). (March 2005) Report to the Congress: Medicare Payment Policy. Washington, DC: MedPAC.
Med-Vantage. (2006) Provider Pay-for-Performance Incentive Programs: 2005 National Study Results. San Francisco, CA: Med-Vantage, Inc.
Mehrotra A, Pearson SD, Coltin KL, Kleinman KP, Singer JA, Rabson B, Schneider EC. (2007) The Response of Physician Groups to P4P Incentives. American Journal of Managed Care 13(5):249–255.
Meyerowitz BE, Chaiken S. (1987) The Effect of Message Framing on Breast Self-Examination Attitudes, Intentions, and Behavior. Journal of Personality and Social Psychology 52(3):500–510.
Nahra TA, Reiter KL, Hirth RA, Shermer JE, Wheeler JRC. (2006) Cost-Effectiveness of Hospital Pay-for-Performance Incentives. Medical Care Research and Review 63(1 Suppl):49S–72S.
Peterson ED, Roe MT, Mulgund J, et al. (2006) Association Between Hospital Process Performance and Outcomes Among Patients with Acute Coronary Syndromes. Journal of the American Medical Association 295(16):1912–1920.
Pham HH, Coughlan, O’Malley AS. (2006) The Impact of Quality-Reporting Programs on Hospital Operations. Health Affairs 25(5):1412–1422.
Premier, Inc. (2006) Centers for Medicare and Medicaid Services (CMS)/Premier Hospital Quality Incentive Demonstration Project: Project Overview and Findings from Year One. Charlotte, NC: Author.
Reiter KL, Nahra TA, Wheeler JRC. (2006) Hospital Responses to Pay-for-Performance Incentives. Health Services Management Research 19(2):123–134.
Rosenthal MB, Frank RG, Li Z, Epstein AM. (2005) Early Experience with Pay-for-Performance: From Concept to Practice. Journal of the American Medical Association 294(14):1788–1793.
Rothe H. (1970) Output Rates Among Welders: Productivity and Consistency Following Removal of a Financial Incentive System. Journal of Applied Psychology 54:549–551.
Sauter KM, Bokhour BG, White B, Young G, Burgess JF, Berlowitz D, Wheeler JRC. (2007) Early Experiences of a Hospital-based Pay-for-Performance Program. Journal of Healthcare Management 52(2):95–108.
Schuster MA, McGlynn EA, Brook RH. (1998) How Good Is the Quality of Health Care in the United States? Milbank Quarterly 76(4):517–563.
Shekelle P. (2007 Apr 4) Medicare’s Hospital Compare Performance Measures and Mortality Rates. Journal of the American Medical Association 297(13):1430–1431; author reply 1431.
Skinner J, Chandra A, Staiger D, Lee J, McClellan M. (2005) Mortality After Acute Myocardial Infarction in Hospitals That Disproportionately Treat Black Patients. Circulation 112:2634–2641.
Sorbero ME, Damberg CL, Shaw R, et al. (2006) Assessment of Pay-for-Performance Options for Medicare Physician Services: Final Report. RAND Working Paper prepared for the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services. Santa Monica, CA: RAND.
Thaler R. (1985) Mental Accounting and Consumer Choice. Marketing Science 4(3):199–214.
Thompson RE. (2005) Is Pay for Performance Ethical? Physician Executive 31(6):60–62.
Titmuss RM. (1970) The Gift Relationship: From Human Blood to Social Policy. London: Allen & Unwin.
Ubel PA, Hirth RA, Chernew ME, Fendrick AM. (2003) What Is the Price of Life and Why Doesn’t It Increase at the Rate of Inflation? Archives of Internal Medicine 163(14):1637–1641.
Wenger NS, Solomon DH, Roth CP, MacLean CH, Saliba D, et al. (2003) The Quality of Medical Care Provided to Vulnerable Community-Dwelling Older Patients. Annals of Internal Medicine 139(9):740–747.
Werner RM, Bradlow ET. (2006) Relationship Between Medicare’s Hospital Compare Performance Measures and Mortality Rates. Journal of the American Medical Association 296(22):2694–2702.
Williams SC, Schmaltz SP, Morton DJ, Koss RG, Loeb JM. (2005) Quality of Care in U.S. Hospitals as Reflected by Standardized Measures, 2002–2004. New England Journal of Medicine 353(3):255–264.
If you are interested in this or any other ASPE product, please contact the Policy Information Center at (202) 690-6445 or email pic@hhs.gov.