[This Transcript is Unedited]

Department of Health and Human Services

National Committee on Vital and Health Statistics

Workgroup on Quality

Use of Administrative and Clinical Electronic Data for Quality Assessment Hearing

June 19, 2007

Hubert H. Humphrey Building
Room 505A
200 Independence Avenue, SW
Washington, D.C. 20201

Proceedings By:
CASET Associates, Ltd.
10201 Lee Highway, Suite 180
Fairfax, Virginia 22030
(703) 352-0091

P R O C E E D I N G S [8:30am]

Agenda Item: Introductions and Purpose

DR. CARR: I think we are ready to begin. Good morning. This is a hearing of the Quality Workgroup of the National Committee on Vital and Health Statistics. I am Dr. Justine Carr, from Beth Israel Deaconess Medical Center, chair of this workgroup and a member of NCVHS. I have no conflicts on what is being discussed today. I will ask each person, going to the right, to introduce themselves.

[Introductions around the room.]

DR. CARR: Welcome everyone, and thank you for being here. I know we have a couple of workgroup members who are splitting their time with the Privacy Committee, and we have another workgroup member in a cab arriving momentarily. I think we want to make sure that everyone has the time that is due them today. So, with that, we would like to stay on time. I actually have a pre-introductory slide to help frame the day. As we know, there is strong momentum from many sectors to measure and report the quality of healthcare. As we will hear more about today, the burden of collection and reporting is high. With the goal of electronic health records by 2014, the hope is that some of the burden will be lifted. Our focus today is a discussion of how we are functioning in the hybrid era, where quality reporting is derived from both administrative and electronic sources. Our key questions are: how are we doing, and how is care improving? The questions for the speakers are, briefly: one, describe your initiative, but then, what data did you select and why? What resources were required? How did you ascertain data reliability?

A second question, and the most important: how did you use your data? What interventions were triggered, and how did they affect the quality of care? In other words, as we hear about the burden of collection and the various configurations of data collection, we do not want to lose sight of the fact that the "there" we are getting to is quality of care. We will be very interested in lessons learned. What works, and what things might inform the configuration of the electronic health record going forward?

This one you just have to keep tapping. This is my picture of measuring quality in the hybrid world. Up in red there is safe, effective care. As you know, we create and define measures. We collect the data and put it in electronic format. As you can see, we sometimes abstract from paper records, and abstract even from electronic records. We depend on administrative data, and we will hear today about how adding some electronic elements, such as lab or medication data, has helped us.

The goal is that the data can be aggregated, reported back, and acted upon, with the final goal of improving safe and effective care. I think we are well aware of the fact that there is a lot of activity going on around aggregating data and the electronic health record. I just wanted to focus on what we want to talk about today. What is getting better? What is the burden of collecting this data? What is the return once we have this data?

So, I want to give special thanks to Marybeth Farquhar from AHRQ and Cynthia Sydney, who were indispensable in getting all of this organized, and of course the members of the Quality Workgroup for their insight and recommendations. So, again, as a reminder, I ask the speakers to keep to twenty-minute presentations and leave ten minutes for discussion. I am also reminded to ask you to keep your Blackberries away from the speakers. If you do not, you will find out what happens. I would also like to welcome two additional members. Carol, do you have any additional comments? And then Simon?

MS. MC CALL: My name is Carol McCall. I am a member of the Quality Workgroup as well as a member of the NCVHS full committee. I just want to thank you for taking the time to be with us today. We are very excited to hear the stories. As Justine has said, it is not about the data. It is about how you use it. So, we are anxious to hear the stories about what works, what you have been able to achieve that can inform some of the policies and processes as we move forward. So, thank you.

DR. COHN: I am Simon Cohn and I chair the full committee. I am here as a guest of the workgroup today. I will be here for most of the day.

DR. CARR: Okay, well I think we are even a little ahead of schedule, but I would like to invite Crystal and David and Allison to move forward then with their framing of the testimony that we will hear today.

Agenda Item: Framing the Testimony - AHIMA

MS. VIOLA: Dr. Carr, members of the Quality Workgroup, ladies and gentlemen, good morning. I am Allison Viola, Director of Federal Relations at the American Health Information Management Association (AHIMA). Joining me this morning are Crystal Kallem, Director of Practice Leadership at AHIMA, and Dave Gans, Vice President of Practice Management Resources at the Medical Group Management Association (MGMA). They will be providing detailed testimony regarding the issues surrounding healthcare data collection and reporting. On behalf of AHIMA and MGMA and their members, thank you for allowing us this opportunity to provide input on the issues and challenges associated with collecting and reporting healthcare data. Although we have developed written testimony, and you should have this documentation with your handouts, I would like to turn the discussion over to Crystal, who will delve a little deeper into the issues and provide a more practical overview of the challenges posed by increased quality measurement and reporting initiatives.

MS. KALLEM: Thank you, Allison. Thank you, Dr. Carr, for inviting us. AHIMA and MGMA are very aware of the current environment related to healthcare quality. A large number of our members manage the data collection and reporting responsibilities within healthcare facilities on a daily basis and continue to express concerns surrounding the ever-mounting requests for data. As a result, we began to see the need to highlight this critical issue on a broad scale. So, AHIMA and MGMA formed a partnership and approached the Agency for Healthcare Research and Quality with a proposal to gather key stakeholders from the industry to help us identify solutions and direct change. We greatly appreciate the decision by AHRQ to fund both an invitational conference and a taskforce of our members to develop supporting conference materials. The taskforce was composed of a group of health information and office management experts who helped identify the issues and variations associated with performance measurement, data collection, and reporting.

The findings from the taskforce address the impacts on healthcare providers and organizations forced to respond to the ever-increasing reporting requirements, including the lack of uniform data collection and analytic specifications, the lack of qualified staff to support the requirements for data, technological challenges, organizational challenges, economic pressures, and other competing priorities.

These findings laid the foundation for productive dialogue during the conference. Although the taskforce did not have enough time to quantify the specific costs of the current obligations on providers at the time we created the report, we were able to categorize and describe the scope of these issues that contribute to the increased costs and demands. This invitational conference was held last November. We brought together over 50 experts from public and private healthcare organizations to address how best to collect and report data for quality, public health, and performance initiatives. The participants represented a wide array of stakeholders, including hospital and physician organizations, payers, employers, government agencies, accrediting agencies, and other stakeholders with performance measurement and data management background and expertise. The briefing paper developed by the taskforce and the full conference report can be obtained at the link provided on this slide.

So, as the industry moves forward, there are a large number of issues and challenges related to this topic. With the widespread adoption of electronic health records, interoperability, and pay for performance programs, the need to align these initiatives is becoming vital. Dr. George Isham from HealthPartners allowed us the opportunity to share this slide with you; it provides a visual depiction of the various demands on healthcare organizations and providers as they deal with the increasing and disparate requests for data.

At the same time, providers continue to struggle with staffing shortages, tighter reimbursements, and pressures to accomplish more with less, making their ability to meet these various requirements an increasing concern. Not only are there a large number of organizations demanding data, but each requestor of data has its own set of requirements and specifications to comply with. One quality measure can have varying specifications among two or more requesting organizations. A healthcare provider must master each measure's numerator and denominator statements, data elements and abstraction specifications, allowable data sources, data submission deadlines, analytic specifications, and the list goes on. So, you can have one measure, for instance diabetes hemoglobin A1c, that is requested by five different performance measurement requestors, and they could all have different specifications.

These issues are present in both the paper and electronic environments. In an electronic environment, providers must map the data from their existing systems to the various performance measurement data requirements in an effort to obtain appropriate high quality data. Not only does this data need to be mined from their electronic systems, but it has to be formatted to comply with each requestor's data submission requirements.
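
[Illustrative sketch, not part of the testimony. To make the variation Ms. Kallem describes concrete, the following shows the "same" measure carried under two requestors' differing specifications. All requestor names, thresholds, and deadlines are hypothetical, and the structure is a simplification, not any program's actual format.]

```python
from dataclasses import dataclass

@dataclass
class MeasureSpec:
    """One requestor's specification for nominally the same quality measure."""
    requestor: str
    numerator: str                 # what counts as success
    denominator: str               # the eligible population
    allowed_sources: tuple         # where abstractors may pull the data
    submission_deadline_days: int  # days after quarter close

# Hypothetical: one diabetes HbA1c measure, two incompatible specifications.
spec_a = MeasureSpec(
    requestor="Requestor A",
    numerator="most recent HbA1c < 8.0%",
    denominator="diabetics age 18-75 with a visit in the period",
    allowed_sources=("lab report", "progress note"),
    submission_deadline_days=45,
)
spec_b = MeasureSpec(
    requestor="Requestor B",
    numerator="most recent HbA1c <= 9.0%",
    denominator="diabetics age 18-64, continuously enrolled",
    allowed_sources=("lab report",),
    submission_deadline_days=60,
)

# A provider reporting to both must abstract, compute, and format twice.
for spec in (spec_a, spec_b):
    print(spec.requestor, "->", spec.numerator, "|", spec.denominator)
```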

In preparing for today's discussion, I wanted to provide some specific details regarding how the data collection process works. Hackensack University Medical Center allowed me the opportunity to share with you a copy of their manual data collection workflow diagram. This diagram depicts the actual process used by the organization to manually collect and report data for the CMS and Joint Commission measures. In addition, this organization voluntarily participates in the CMS/Premier project as well. Thankfully, in this particular example, the CMS and Joint Commission measures are more closely aligned than some of the other performance measurement initiatives. The flow depicts four topic areas that are being monitored: heart failure, acute myocardial infarction, community-acquired pneumonia, and surgical infection prevention.

Each quarter, the organization identifies the sample of cases that need to be abstracted for each topic. Some topics require that 100 percent of the cases be abstracted, while others allow for sampling. Even for sampling requirements, different organizations can require different sampling mechanisms. This step alone can be a confusing and complicated process.

After the population is identified, patient lists are created so that the medical records can be located and pulled for abstraction. On average, it takes approximately 27 to 43 hours per month, per topic, to pull charts in preparation for data abstraction. Approximately two weeks after the end of the discharge month, charts are ready for data abstraction, and it takes just over three weeks to complete the actual data abstraction requirements. It does not end there. After the charts are abstracted, the data are grouped for data submission. Following data submission, variances and data errors are identified, corrected, and resubmitted.

Additional requirements are depicted on this particular flow chart as well. Some of the additional steps required include validation activities. Periodically, CMS identifies a sample of cases that are requested for data validation. Hospitals are required to submit paper copies of their medical records to the CMS central data abstraction center, which then re-abstracts the data to determine whether or not the data was abstracted correctly. All of these steps contribute to the process of this manual data collection activity.

In September of 2004, AHIMA testified before this very workgroup regarding the challenges associated with quality measurement data collection and reporting. Barbara Seagle, Director of Health Information at Hackensack University Medical Center and a member of AHIMA, testified to this workgroup about organizations' experiences at that time with voluntary and mandated reporting requirements. Three years later, the issues remain the same and the demands for data continue to increase. Hackensack University Medical Center allowed me the opportunity to share with you some of the numbers from 2004 compared to 2007.

Barbara reported in 2004 that the number of cases required for manual data abstraction was 500 cases per month. Since then, this number has increased to over 650 cases per month abstracted for the data reporting requirements. The number of full-time staff required to collect and report the data has increased by two full-time equivalents, and Barbara's department is in the process of requesting an additional FTE to support the increasing performance measurement demands. Fortunately, Barbara has managed to retain her highly qualified and trained staff throughout the years, but her salaries now average $42.00 an hour, an increase of $10.00 since 2004. In all, Barbara's organization has experienced a 72 percent increase in the financial resources required to collect and report these demanded data elements.

In preparation for today's discussion, I reviewed the data collection specifications for four of the hospital quality heart failure measures. There are a total of 34 data elements required to calculate the results for the four heart failure measures, and I identified 13 key clinical elements for each of these four measures. There are 24 different data sources within a medical record from which abstractors are allowed to pull data. When manually abstracting these data, abstractors must have a clear understanding of each data element's specifications to know where they can pull information and when. This does not include all of the corresponding inclusion and exclusion criteria with which abstractors must also be familiar, including all of the synonyms and varying terms they must be knowledgeable about as well. In a small hospital, one abstractor may be abstracting data for all measures for all topics and will have to be familiar with all of the requirements for all of the data elements. You can see that there is a potential for error in this situation.

The current environment does not make electronic retrieval of data any easier. In an electronic environment, healthcare providers must identify how the data are stored in their electronic systems and map the data to the corresponding data abstraction guidelines and variables. Without broadly agreed-upon standards for defining data content, variations in the taxonomy of terms among performance measurement systems are difficult to interpret and often require costly and laborious data mapping activities to link and extract data from these electronic systems.
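
[Illustrative sketch, not part of the testimony. The mapping burden described above can be pictured as a translation table: each local system stores the same clinical fact under its own code, and anything unmapped silently drops out of the measure. The local codes and element names below are invented for illustration.]

```python
# Hypothetical local codes: two source systems store the same test differently.
LOCAL_TO_CANONICAL = {
    "LAB_HBA1C_01": "hba1c",           # system A's code for the test
    "GLYHB": "hba1c",                  # system B's code for the same test
    "EF_ECHO": "ejection_fraction",
}

def to_measure_record(local_code, value):
    """Map a local result to the canonical element a measure specification expects."""
    canonical = LOCAL_TO_CANONICAL.get(local_code)
    if canonical is None:
        return None  # unmapped data quietly disappears from the measure
    return {"element": canonical, "value": value}

print(to_measure_record("GLYHB", 7.2))     # {'element': 'hba1c', 'value': 7.2}
print(to_measure_record("A1C_MISC", 7.2))  # None: the mapping gap described above
```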

I hope that some of these examples are helpful as you prepare for today's discussion. I will now turn the presentation over to Dave Gans, who will describe the remaining challenges and our recommendations. The information that I have provided so far has been focused primarily on the hospital setting. Although hospitals are the focus of today's meeting, Dave will highlight a few of the key challenges faced by physician practices, and we hope that this committee will consider addressing these challenges during a future meeting.

MR. GANS: Thank you, Crystal. Our meeting last November examined the needs of healthcare data collection and reporting. One important element was to examine what happens if we look at clinical data versus, as many healthcare payers have, administrative information drawn predominantly from the billing record. What we know from looking at clinical information, as Crystal described, is that it oftentimes has data not available on the insurance claim form. It has the laboratory test results. It has imaging results. It has the nursing notes. It has diagnostic information that may or may not be replicated on the billing form. Also, while this information is extremely rich, in many cases it must be manually obtained from the medical record. Even in an automated environment, that still requires personal intervention to acquire the information. Contrasting with that would be the ease of collecting administrative information, because it is already intact in the billing record. That makes it very attractive for many payers and others looking at quality data. However, there are problems insofar as there may or may not be uniform coding rules. There may or may not be uniform conventions, guidelines, and definitions. In fact, brought out during discussions at our meeting was the fact that different payers have different standards and different definitions for the same topics. They use the same terms to mean different activities, so there is a definite need, which you will see in a recommendation, for standardization in the data collection and reporting mechanism.

Also, information in the billing record is not necessarily complete. Take, for example, the prescription drugs filled by patients. If the patient fills a prescription at the Veterans Administration, that will typically not be shown in the payer record; the same is true at a military treatment facility, because the patient is a military dependent, beneficiary, or retiree. If the patient chooses to purchase their drugs using a generic drug discount at a store such as Wal-Mart, it will not appear in the billing record. The billing record also misses patients who refuse therapy. Discussions with an MGMA member, Cardiovascular Associates in Clearwater, Florida, described the problems they have experienced with patients who may be in a protocol that would normally require blood-thinning therapy, but who have contraindications to blood thinners. Consequently the medical record will show the contraindication and the need for an alternative therapy; however, the billing record will show a deficiency. So, the billing record has opportunities, but also substantial disadvantages in examining quality of care information, because it is oftentimes incomplete.

What we know is that there are substantial variations in performance measurement systems and reporting standards. While each payer and each data and quality collecting organization is well intended, the many different standards from many different organizations have caused many of these problems to occur. Each often has required formats, and the processes for updating performance measures are not streamlined and not standardized. Updates to performance measures are oftentimes required by a payer, and certain payers utilize so-called "black box" systems, where the actual protocol is hidden behind a computer algorithm known only to the payer. It cannot be replicated by the providers reporting the data. So you only know the results of the black box edit; you do not know the criteria.

All of this goes on at a time of increased economic pressure on physicians and hospitals. We have higher costs of doing business. The Medical Group Management Association has a long series of surveys examining the economic costs and efficiencies of medical group practice. In the past year, we have observed cost increases of 6 to 8 percent among physician practices, even with the lessening of pressures due to malpractice insurance expenses.

At the same time, we are seeing declining reimbursement. Medicare currently pays physicians at the same rate as in 1999. We are potentially looking at substantial decreases in Medicare payment in 2008. At the same time, there is an expectation, in fact in many cases a mandate: electronic prescribing systems have been shown to be of substantial benefit to quality of care, and the use of electronic health records has the potential to improve quality of care. However, the cost of implementing these systems is borne by the doctor without subsidy, oftentimes with increased inefficiencies, not necessarily increased efficiencies. Physicians, hospitals, and all providers face an expectation to do more with less.

Examining the physician office environment, the first thing we observe is that most doctors in the United States are in solo practice or small medical groups, especially among primary care physicians. These organizations are relatively unsophisticated. We are talking small business, and in many cases family business. Most of these organizations do not have an electronic health record. One in seven medical groups has an electronic health record, based on a study the Medical Group Management Association conducted under AHRQ funding two years ago. We are observing increased interest in electronic health records, but small practices, and especially solo physicians, still live in a manual environment.

Also, very few medical group practices have a certified coder on staff. Certified coders are very expensive, they are an un-reimbursable expense, and they are not seen as necessary in a small primary care practice. These same small practices do not have a chief information officer. They typically will not have information staff, which makes it even more complex to use an electronic health record or to obtain information from the medical record. On that comment regarding sophistication: even in a moderate-sized medical group where you have a medical record supervisor, this is a staff supervisor, not necessarily a trained medical records expert. Consequently, data extraction for research purposes, for clinical or device research, is oftentimes designed by the practice administrator or the physicians because of the lack of sophisticated, trained staff in medical record extraction.

Even in electronic health record environments, the software is oftentimes maintained by the electronic health record company, not by the practice. Consequently, access to the databases of the electronic health record is controlled by the EHR company even though the data may be held physically on servers in the practice, and the practice may not even have access to that information without the permission, authority, and codes passed on to it by the electronic health record company.

As I mentioned, given the cost of providing health care services today and the relatively low level of reimbursement, medical practices oftentimes examine cost first, as long as a system performs the minimal accepted function. They will not look at maximizing functionality, especially when it comes to measuring quality, because it is an un-reimbursed activity for the most part today. Practices are focused on minimizing costs and increasing efficiency.

Our taskforce and our conference last year produced a series of recommendations. These recommendations emerged at the conclusion of our November 11th meeting and were codified over the next two months in a series of communications with our attendees. We had three major recommendations. The first was to form a public/private entity to oversee and evaluate policies and procedures for the collection and reporting of healthcare data and performance measurement information.

We also recommended providing funding to support research on the quality of data, and providing funding to support additional research on the costs associated with performance measurement data collection. I was most pleased that the Agency for Healthcare Research and Quality stepped forward to further gather information in these three areas. These recommendations occurred last November. On the 4th of June, AHRQ issued an RFI under the health data stewardship title; I saw a copy of the Federal Register announcement in the handouts. The health data stewardship RFI will gather information to foster broad stakeholder discussion on these topics. While the RFI is outstanding for data collection, it notes that there are currently no intentions or plans to issue a related request for proposals.

MGMA and AHIMA encourage this committee to recommend that such a request for proposals be issued, following an appropriate time to understand the dimensions of healthcare data collection, leading to the creation of this public/private entity. Also, on June 6th AHRQ issued task orders that will fund an examination of the cost of acquiring healthcare data in primary care practices. We are extremely pleased to see affirmative action in this area and await the results with great interest.

We noted several opportunities for action in our report. We feel, first, that there is an opportunity for the public/private entity to provide the policies for healthcare data measurement. We need an organization that is impartial and that can create core data content standards as a prerequisite for reliable and consistent data collection and reporting. We feel it is extraordinarily important to assist providers in their data collection efforts by standardizing performance measurement systems. Also, we feel it is extremely important to continue the collaboration among critical stakeholders in healthcare data performance.

This entity needs to be empowered. It also needs to be held accountable: to collect and prioritize input from key stakeholders; to facilitate a process to obtain regular input regarding measurement standards; to develop a plan with short, mid, and long term goals and tactics; to reach a national consensus on a starter set, a basic set of uniform data that measure healthcare quality and performance; to coordinate health information exchange and quality initiatives at all levels (national, state, and local) for both data integrity and responsible use of information; and, last, to conduct all business in a very public and transparent manner.

Crystal and I anticipate questions, and we welcome them. Thank you so much.

DR. CARR: Thank you. That was a very thorough, powerful, and sobering presentation. I appreciate the time you took on it. It was of particular interest to get the update on Hackensack, because, as you say, they were here a few years ago and enlightened us at that time on the burden.

My question is what your thoughts are on how we get to quality. I think there are two approaches. One is starting in the clinical setting, where we read the literature and know what the evidence-based best practice is. Then we have the challenge of finding a way to get that data, so one approach is just combing through records. The other is using administrative data, which is easy; it helps us get to categories of patients. There is a danger in sort of measuring what is available.

So, I see this dichotomy of measuring what is available versus taking the data on quality and finding a way to measure it. I am thinking in particular about where we will go. We will hear from Kelly this afternoon, but should we be thinking that some day the electronic health record will be able to remove this burden and give us everything we need, or will we always need to be in a hybrid state because of the issues you described, the particulars of an individual that make them the exception?

MR. GANS: I will make one comment and then let Crystal conclude. Looking at the electronic health record, it may give us a substantial improvement in the collection of quality data. In discussions with MGMA members who have electronic health records, developing a query can take as long as three weeks, because of the need to understand the multiple databases that the EHR maintains and how to extract the data from them. Once the queries are written, as long as they remain standard, they can be repeated with relatively little cost to the organization in efficiency or time. The major problem organizations have had is the lack of standardization in developing the queries. As I said, even in a very sophisticated practice, because of the complexity of the electronic health record and the multiple databases that maintain the information, it can take substantial time to write the queries necessary to gain the information you need. Once they are written, they can be maintained. Consequently, with an electronic health record and a standardized set of guidelines for clinical data measurement and data collection, we have an opportunity to make data collection comprehensive and efficient. When the guidelines change, or when different provider groups or different organizations measuring quality have different standards, the cost of acquiring the information is so excessive that even with an EHR it is going to be difficult, too complex.
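
[Illustrative sketch, not part of the testimony. Mr. Gans' point is that the expensive step is working out where the data live; once written, a query is cheap to rerun with only the reporting period changed. The table and column names below are invented; real EHR schemas vary by vendor.]

```python
import sqlite3

# Hypothetical schema standing in for an EHR's many internal databases.
HEART_FAILURE_QUERY = """
SELECT p.patient_id, d.discharge_date
FROM patients p
JOIN discharges d ON d.patient_id = p.patient_id
WHERE d.principal_dx LIKE '428%'         -- ICD-9 heart failure family
  AND d.discharge_date BETWEEN ? AND ?   -- only the period changes each quarter
"""

def heart_failure_cases(conn, period_start, period_end):
    """Reusable once written: rerunning for a new quarter costs almost nothing."""
    return conn.execute(HEART_FAILURE_QUERY, (period_start, period_end)).fetchall()
```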

MS. KALLEM: I concur with what Dave has mentioned, and I would like to extend upon that. The use of electronic health records will be very beneficial when providers adopt them, but there is a level of risk associated with adopting and implementing these electronic products. In turn, EHR vendors need direction when it comes to incorporating performance measurement requirements into their products. They need clear guidance on how the data should be collected within their products and how the data should be exported from their products. There are some initiatives going on within the industry to start tackling those issues. We need to continue to move those efforts forward, and then that information needs to be standardized.

So, measure developers need to develop their measures in a standardized format so that the vendors can take that information and incorporate it into their EHR products. In addition, there is a need to standardize the data itself, the clinical data within the EHR products. We need to identify the key clinical elements and then formalize and standardize the clinical data that needs to be captured within the EHR products, and also share that information with the vendors.

MS. MC CALL: I have a question, and I would like to build on the comment you just made. I would preface it by saying that I completely agree with the need for two things. One is to look at the design of measurement itself and the fact that there seems to be a lot of variation in that. The other is the variation and lack of standardization within the data. Those are very different things. If you think about the analogy of putting together a meal, you have the ingredients for your recipe, then you have the actual act of putting it together, and then you have the joy of eating it. We will call that quality. My question is not about EHRs. My bias is that I think we have talked too much about the record; the record just provides the ingredients, but what we seek is a delightful meal. How much variation, when you go out and talk to people, how much natural variation do you think there will be in the measures themselves? Obviously, we want first to try to standardize, and then take a standardized plan-do-study-act approach. How much natural knowledge creation should we anticipate as we build entities to design measures?

MS. KALLEM: I have been spending quite a bit of time actually evaluating the different performance measures to assess what variations exist between them. There are a lot of similarities. A lot of the differences are in the specific ways that the data should be collected and reported; the overall measure itself has the same goal in mind. The challenge I found is actually in locating some of the specifications for the measures. They provide such a high-level overview of how the information should be collected that, for specific requirements, it is difficult to know how I would collect that information within an office. I cannot imagine small physician practices having to jump in and try to collect and report information when they have to spend hours trying to find the required specifications. Having an overarching entity to help direct that, and even provide a central location where information about the measures is stored and can be gathered, would be useful. I am not sure that has appropriately addressed your question, but that is what I know from my experience.

DR. GREEN: I would like to ask you a question about something I am going to call the Charlie Effect. The Charlie Effect goes like this: you have a group of providers who want to improve the quality of their care related to unhealthy behaviors like smoking cigarettes, being physically inert, and eating unhealthy diets. They work with a large electronic health record vendor and have a delightful design for how to create prompts in the record, and to drop out of those prompts measures of whether or not they are getting to their goals, the place where they want to arrive. The vendor decides that it is just too complicated and cannot be done for at least two years. They go 112 miles away to where there is a guy named Charlie, who is the informatics program person for a hospital-based health information system, and 25 minutes later it is over. They have got it. They are ready to rock and roll and move on to implementation. What is your opinion about the size of the impediment that relates to simply not having the capacity on site, in the hospital or in the practice, to do a little bit of programming?

MR. GANS: I will make one comment: I concur exactly with your description of the Charlie Effect. In interviews with practices, I have talked to organizations that hack their own electronic health record to get quality information out, because they are not authorized by their vendor to have full and unlimited access to their own data.

What they will do is find an individual who has great programming skills, buy their time, and implement and obtain the information they need for their own purposes. Also, what happens as practices become more sophisticated and move from a manual paper record is that they exchange staff: in place of file clerks, they have information system staff members in the practice. Those information staff members are invaluable.

We have examined organizations that have implemented electronic health records. For the ones with the greatest success, one of the key factors has been having IS staff in-house who work for the practice, work for the doctors, and consequently are available to meet the needs of the physicians in that organization. I concur exactly that IS staff are essential. However, the problem is that they are expensive. They are not a reimbursable cost to the practice. They individually may add very little to the patient environment, but they are absolutely essential to maintaining information in an electronic health record.

DR. CARR: I want to be cognizant of time. Crystal, did you want to add a comment? Okay, and Carol, briefly?

MS. MC CALL: I had one more question. In your discussions with hospitals and physician groups, what do they say regarding the emerging world of personalized medicine, and whether or not they believe, or you believe, that it will materially change the burden, or whether the burden will essentially remain the same?

MR. GANS: My contacts have been very concerned about personal health records because of the difficulty of getting information into the personal health record. I presume that is where you are going?

MS. MC CALL: It was really around personalized medicine and what is happening with certain types of tests.

MR. GANS: First of all, on medicines and testing of patients to identify an appropriate regimen customized to their genome: the thinking has been that, if patients agree, the next goal of medication is to personalize the treatment to the specific genome of the patient.

DR. CARR: Let us hold a little bit more on that and continue. Actually, Bill Scanlon has a question. I would also like to invite anyone in the back who would like to ask a question to come up.

DR. SCANLON: I wanted to ask about this public/private entity to build a consensus, because I feel a little déjà vu. This could be 1995, talking about the promise of administrative simplification and standardization that was written into HIPAA. I think some people would argue we have not necessarily realized all of the promises of HIPAA. I guess I am a little worried that we need to identify what teeth this entity needs in order to make that consensus really be effective. A consensus may be a necessary condition, but it might not be sufficient. How do we get everyone to participate or play in this process? You are speaking from a provider's perspective; how do we get everyone to play? I think we are going to have to at some point listen to what the payers have to say about this, because part of why HIPAA has not been fully realized is that they had a different perspective on what they need and want.

DR. CARR: Thank you very much. Now, I invite Denise Remus to speak with us. Denise?

Agenda Item: Performance Measurement and Quality Improvement – BayCare Health System

DR. REMUS: Good morning. I appreciate the opportunity to speak with you this morning. I actually did not create a presentation. I have some written notes that I will be speaking from, and I will take the prerogative to expand on some of the comments made by earlier speakers this morning.

I am the current Chief Quality Officer of BayCare Health System. I am a registered nurse with a doctorate in nursing and have spent the last 15 years of my career focused on quality measurement and quality improvement using data from a variety of sources, primarily a hybrid of clinical and administrative data. I have worked with hundreds of hospitals across the country. Prior to taking the BayCare CQO position a few months ago, I was Vice President of Clinical Informatics at Premier, where I was responsible for the analytics and methodologies underlying their performance measurement products and oversaw the analyses conducted for the CMS/Premier Hospital Quality Incentive Demonstration Project, which evaluated the impact of pay for performance on quality improvement. You will hear more about that from my colleague, Dr. Wynn. I also conducted research, referred to as the Performance Pay Study, which identified the relationship between reliable care and improved outcomes.

BayCare is a nine-hospital system located in Florida, specifically the Tampa/St. Petersburg area. It was formed ten years ago as a larger health system made up of three independent health systems. In 2006, our hospitals had 2,707 beds; 121,700 inpatient discharges; nearly 49,000 outpatient surgeries; over 350,000 emergency room visits; and over 500,000 home health visits. We have 17,000 team members. We are the largest private employer in the Tampa Bay area. While the majority of my comments are based on my professional experience, since I have only been in the CQO role for the last two months, they will be framed within the BayCare context.

Regarding BayCare's experience in quality measurement and improvement: BayCare is committed to clinical excellence and quality improvement. Several years ago BayCare created a clinical outcomes warehouse comprising all patient administrative, clinical, and financial data. The administrative data elements are pulled from internal systems such as TSI and Envision. We maintain patient data regardless of whether a bill is generated; we derive the data straight from our administrative systems, which serve as a secondary billing source. The clinical data are based on national definitions and abstracted by more than a dozen dedicated team members based at the hospitals. Actually, in thinking about the Hackensack story, I realized that we probably have about 16 FTEs across the system who are dedicated full-time to just clinical record abstraction. These individuals support the clinical data needs associated with all regulatory and accreditation requirements, including the Joint Commission Core Measures, Joint Commission Centers of Excellence, the CMS Annual Payment Update, the Quality Alliance, and State of Florida reporting requirements. They also abstract data for other measurement programs, including the Society of Thoracic Surgeons, the American College of Cardiology, the American Heart Association's "Get With the Guidelines," and the American Nurses Association's Magnet program, as well as clinical trials, clinical research, and of course additional registry data. BayCare is an active participant in the CMS/Premier Hospital Quality Incentive Demonstration Project, which evaluates the impact of financial incentives on quality improvement.

The clinical outcomes warehouse provides critical information to support our internal quality improvement efforts and operations. We use administrative data to evaluate outcomes; it is the primary source for looking at mortality, readmissions, and complications, and we use the Agency for Healthcare Research and Quality's Quality Indicators.

There are many challenges associated with maintaining this clinical data system. The billing data we use, the administrative data we pull from, is dynamic, with updates and modifications occurring frequently based on ICD-9 coding and processes, time lags, different decisions made on reevaluation of the records, and changes in charges and payments. Any data pull, even for our own internal purposes, represents a snapshot in time. We do not have an electronic health record system yet and are dependent on medical record abstraction for all clinical data. It is difficult to abstract data during the care delivery process due to the management of a paper record and the operational definitions of the measures. In fact, even after the patient is discharged, I frequently hear from the abstractors that we cannot find the record; it is somewhere, and we are trying to find it within our time frame. Getting all of the components of the record together remains a challenge.

The operational definitions are also difficult. When the patient is actually in house, it is a challenge to understand whether they are going to end up in a given clinical population or not, under the definitions. The clinicians are often saying, just tell me where the patient is going to fall. So there is a dynamic balance between evidence-based care and what we are actually trying to measure.

Most abstraction is done after the patient is discharged and the clinical documentation has been coded in the HIM system and transferred into ICD-9 codes. Inter-rater reliability of abstractors is a continued concern. There are many challenges in interpreting the national definitions and applying the algorithms to patient records, within which clinical documentation rarely follows an ideal path.
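
[Illustrative sketch, not part of the testimony. Inter-rater reliability of abstractors is commonly quantified with a statistic such as Cohen's kappa, which discounts the agreement two abstractors would reach by chance. A minimal version for two abstractors coding the same charts pass/fail; the chart data are invented.]

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters coding the same items 1 (met) or 0 (not met)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal rate of coding 1.
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return (observed - expected) / (1 - expected)

# Two abstractors, ten charts.
abstractor_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
abstractor_b = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(abstractor_a, abstractor_b), 2))  # 0.6: raw agreement
# is 80%, but chance alone would produce 50%, so corrected agreement is moderate
```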

We have nine hospitals, and each of them has a standard form and structure for the medical record. But any time I go on a unit and pull a chart, you will find variation in how the information is compiled and in what is documented where. So the abstractors and their training and education remain key, to help them understand where to look in a medical record and how to interpret the different components, following the national guidelines and algorithms.

We have case managers and quality team members who review patient lists and records, and during rounds when the patient is actually in the hospital they attempt to proactively identify patients who will fall into clinical populations such as heart failure and AMI, to ensure delivery of evidence-based care in a timely manner. We utilize standardized order sets and reminders. We have medication reconciliation forms. We still have a challenge ensuring that all of the right care is delivered at the right time.

We have extensive educational programs, and we maintain a dedicated team of educators regionally as well as within each of the hospitals to focus on understanding what evidence-based, high quality care means. Performance on evidence-based measures is incorporated into our organization's Key Performance Indicators and Quality and Safety Plan goals, which are reported to senior management and our Board of Trustees on a regular basis. We do incorporate our performance metrics into our management performance evaluations, so a component of management compensation is tied to these metrics. However, our performance on evidence-based measures, while very good, is not as high as we would like it to be. One of the things that we recognized early on, from the comparative data in the hospital quality demonstration, is that we did very well in the first year of the project and actually earned incentive payments in several of the areas. We were not able to sustain that improvement, and as we stepped back and looked at the processes, we found too many of them were dependent on a case manager dogging the patient through the system and reminding the physicians; we had not established reliable systems of care. That has become a huge focus of what we are doing currently.

We recognized the need to approach our improvement efforts in a methodical manner. So two years ago, BayCare implemented Six Sigma across our organization: we employ seven permanent black belts and master black belts, have 20 black belts in training, and have provided, or are in the process of providing, Green Belt training to all Directors. The goal of the current CEO is to ensure that all of our team members understand that we need to be a data-driven organization and that we need to inform our decision making with the best information possible. We have conducted over 115 Six Sigma projects in the last two years, focused on patient satisfaction, throughput, clinical quality, and financial/operational processes. Last year the projects identified more than $7 million in forecasted savings.

We continue to expand the use of this methodology and, as I mentioned, to make sure we focus on building reliable care and stable systems, which is not easy to do. One of the challenges is that clinicians in the delivery of care do not think about quality measures. They are taking care of one patient at a time, which gives the individualized focus that we want them to have, but their forethought is not necessarily always on the evidence-based measures. So, we are continuing to try to bring that into alignment. The physicians are eager to understand where the variances are. We do generate physician-specific profiles on all of the measures, but we really would like to strike a better balance between the care delivered at that point in time to that patient and immediate feedback and triggers. We have been in a paper environment. That has been difficult to do.

What are some of our future steps? Education, education, education is always key: continuing to help the clinicians understand the evidence-based measures and best practices and how to build reliable systems of care delivery. For example, for this year's Key Performance Indicator focused on clinical quality, we are looking at the overall appropriate care score. That covers all patients who fall into the populations of AMI, CABG (we have two hospitals that do CABG), pneumonia, heart failure, and hip and knee, rather than individual quality measures or the composite quality score. By helping them understand where the care delivery system breaks down, and ensuring that the patient gets the appropriate care from start to finish, we are sharing insight on where the system failures are occurring, and that way we can begin our interventions more quickly.
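
[Illustrative sketch, not part of the testimony. The appropriate care score Dr. Remus describes is all-or-none per patient, while a composite score pools individual measure results; the difference is easy to see in a small calculation with invented patients.]

```python
# Each row: one patient's pass/fail on the measures applicable to that patient.
patients = [
    [True, True, True, True],   # received every indicated element of care
    [True, True, False, True],  # missed one element
    [True, True, True],
    [False, True, True, True],
]

# Composite score: passed opportunities over all opportunities.
opportunities = sum(len(p) for p in patients)
passed = sum(sum(p) for p in patients)
composite = passed / opportunities                           # 13/15, about 0.87

# Appropriate care score: a patient counts only if every measure was met.
appropriate = sum(all(p) for p in patients) / len(patients)  # 2/4 = 0.50

print(f"composite {composite:.2f}, appropriate care {appropriate:.2f}")
# The same care looks strong as a composite, yet half the patients
# hit a system failure somewhere between start and finish.
```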

We are working on enhancing documentation in a paper world. Paper records often contain incomplete and inconsistent documentation and are difficult to read. I have had the experience, not sure I would say the privilege, of reviewing records for different opportunities, and I have established a new admiration for the abstractors as I tried to make my way through physician handwriting, documentation, and abbreviations, each physician's unique abbreviations for describing the clinical presentation of a patient. One of the ongoing challenges is the documentation of times and their reconciliation across multiple forms. As we look at many of the timing metrics, what we find throughout our system is that, depending on what clock you look at on the wall, they vary. We have instructed our clinicians not to rely on their watch for anything other than the second hand if they need it, but to use the standardized clocks that we have in the system. What we find is that the clock on the lab machine that punches or indicates the time of a test may vary dramatically from the wall clock in the critical care unit. I mean literally ten minutes. When you are looking at timing measures where you are trying to meet a target, it is difficult to do that kind of reconciliation.
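
[Illustrative sketch, not part of the testimony. Timing measures compare timestamps stamped by different devices, so an uncorrected ten-minute skew can flip a pass into a fail. Device names, offsets, and the 30-minute target below are hypothetical.]

```python
from datetime import datetime, timedelta

# Hypothetical measured offset of each device clock from the house standard.
CLOCK_OFFSET = {
    "ccu_wall_clock": timedelta(0),
    "lab_analyzer": timedelta(minutes=-10),  # lab machine stamps ten minutes slow
}

def corrected(stamped, device):
    """Normalize a device-stamped time back to the house standard clock."""
    return stamped - CLOCK_OFFSET[device]

arrival = datetime(2007, 6, 19, 8, 0)    # stamped by the CCU wall clock
lab_done = datetime(2007, 6, 19, 8, 25)  # stamped by the lab analyzer

raw = lab_done - arrival                 # 25 minutes: appears under a 30-minute target
true = corrected(lab_done, "lab_analyzer") - corrected(arrival, "ccu_wall_clock")
print(raw, true)                         # 0:25:00 vs 0:35:00: the pass becomes a fail
```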

Some lessons learned have helped us as we start to transition to the future state. BayCare is implementing an electronic medical record system. We are in the second year of a seven-year project. It has been very interesting to me to step into the project when it was already underway, because I come with an unbiased and more reality-based perspective than the team members who are actually involved in the implementation. They are so eager and enthusiastic, and knowing what I know about some of the data challenges, I am just trying to help them with the reality base.

The projected costs are in excess of $200 million for complete transformation across our nine hospitals. We do have a major proposal for a 10th hospital; again, it can add $30 million to the cost of building a new hospital to ensure that today's technology is put in place to support the electronic medical record. Our EMR will integrate rules and alerts into clinical workflow to enable better clinical decisions; the project includes CPOE; electronic physician, nursing, and other clinical documentation; and clinical order entry.

One of the things I want to emphasize, in looking at the system, and this is not this vendor's first install, this is a national system, is that I am astonished at how much build is going on in the install. I recognize that the electronic medical record is only as good as the design. It is only as good as the programmer. It is only as good as the forethought of those who are building and designing the system and their understanding of what we mean by the clinical data we need to store and to consider for the delivery of care as well as for retrospective evaluation, and that is not always at the forefront of their minds. The alignment of the data we are going to be storing with the performance measures remains a challenge. When an electronic medical record is put in place, it is a vendor program, and the vendor will make all kinds of decisions along the way about what data fields are allowable, the ranges, and so on. Then the decision needs to be made on what to store: what will we actually dump into our clinical warehouse, our clinical outcomes database, that we can use for subsequent analyses.

It is that subsequent analysis that requires some of the highest programming resources that we referred to. We already maintain several full-time programmers to support our clinical warehouse. Pulling those queries is not as complicated as you would think from a programmer's perspective. The biggest challenge is the translation of what you need from a clinical standpoint: helping the programmers understand what we really mean by some of the inclusions and exclusions. What are the ICD-9 codes we need to look at? What are the other characteristics of the patient we need you to look at? That is where I find myself continuing to go back and visit with the programmer to see what they really caught or heard, because that translation between the clinical need and the information system programmer is extremely difficult.

One of the other things that I am finding is that the vendors often have specifications such that, for example, they do not allow certain fields to have a null value, even though that would be appropriate in the real world of clinical care. So defaults are put in that, depending on how they are defined, could be read as true values. Unless you watch for those with due diligence, you can end up with a lot of information that is not very relevant, and in fact could give you a false picture of the care of the patient.
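
[Illustrative sketch, not part of the testimony. When a system refuses nulls and forces a default into every field, the default is indistinguishable from a real observation unless it is screened out before analysis. Field names and default values here are invented.]

```python
# Hypothetical vendor defaults written when a clinician records nothing.
SUSPECT_DEFAULTS = {
    "ejection_fraction": 0,  # a forced 0 is not a measured ejection fraction of 0
    "weight_kg": 999,
}

def screen(record):
    """Replace likely placeholder values with None before any analysis."""
    return {
        field: (None if value == SUSPECT_DEFAULTS.get(field) else value)
        for field, value in record.items()
    }

print(screen({"ejection_fraction": 0, "weight_kg": 82}))
# {'ejection_fraction': None, 'weight_kg': 82}: the default no longer
# masquerades as a true clinical value
```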

The project has also required a thorough evaluation of our physical environment. When you think about it, we have an existing physical structure, and we are trying to place within it an electronic world to replace the whole paper chart. We have to think about the PCs, laptops, monitors, and printers that we need. Where are they placed? What is portable? How many of those portable units do we need? Where are the plug-ins? How do we store them for recharging? What are the handhelds of the future? Where are we going there? How do we communicate throughout our system and transfer information in a more live state, so that if a patient goes from critical care down to x-ray, that information is readily available in real time electronically? Those are all challenges of trying to do this within an existing physical plant. We have had to look at our work stations. We believe that, with the scope of information the clinicians will be looking at, they need very large monitors or two monitors. Again, trying to find room for that in your existing work space is extremely difficult.

Change is difficult under the best of circumstances. One of the other things that we are doing is training all of our team members in change management methodology. We have adopted Kotter's eight steps, with a focus on the simple phases of Prepare, Engage, and Sustain.

Another change that BayCare is making is to be fully transparent in what we consider to be quality of care. We will be publishing our quality measures publicly, including tests of statistical significance against national comparative data when available. Finding national comparative data to help evaluate our opportunities for improvement has been one of the real challenges.

Some suggestions I thought of in preparing this presentation; I will run through them. Please continue to enhance the administrative data. Administrative data will remain the base of all health care information in our future. I do not think it is going to go away in my career, and I doubt it will go away much after that. We have a lot of opportunity to look at how we can use the ICD-9 coding system more effectively. Obviously, moving to ICD-10 will be extremely helpful. How do we think about it? What are the clinical data elements that we could actually move into a code that would move us away from medical record abstraction? One of the frustrations I have had is that you look at certain medications for the patient in heart failure, and you look to see what their ejection fraction was. Why can we not categorize that, like we do many other things, and move it into an ICD-9 code, so that we know what that patient's ejection fraction is and do not have to rely on pulling in an old record? If there is a question, we certainly do the appropriate thing, but if we already have that information, we can follow the patient through and be much more effective than continually trying to find these records.
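
[Illustrative sketch, not part of the testimony. Dr. Remus' ejection fraction example amounts to binning a continuous measurement into a small set of codeable categories so it can travel with the administrative record. The cut points below are illustrative only, not a coding standard.]

```python
def ef_category(ef_percent):
    """Bin a measured ejection fraction into a reportable category (illustrative cut points)."""
    if ef_percent < 30:
        return "severely reduced"
    if ef_percent < 40:
        return "moderately reduced"
    if ef_percent < 50:
        return "mildly reduced"
    return "preserved"

print(ef_category(25))  # 'severely reduced': one coded category instead of a chart pull
```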

Present-on-admission is an enhancement to the current ICD-9 codes and will assist greatly in distinguishing comorbidities from complications. However, early experience in implementing this in Florida has identified challenges in definitions, and we are really struggling with the HIM team to help them understand what we mean by present-on-admission. The guidelines that have been published are not sufficient for them to really understand and apply it. When you actually look at the initial data that has been submitted, you find very few codes that are not present-on-admission. If nothing is ever flagged as not present-on-admission, where are we in capturing our complications? That is a real concern with the way this has been set up.

The other piece is the clinical complexity of the patient: at what point can we actually identify some kind of a system that will allow us to understand sequelae of disease that are not necessarily potentially avoidable complications? It might be a sequela of the disease process, new changes in that patient that we want to be able to identify, but if present-on-admission coding is not done appropriately, it could be counted as a complication. That distinction is making the clinicians extremely nervous, and I think it is pushing coding toward a default of "we believe it was present-on-admission." We need to take a real look at that.

Continue to expand the detail in operational definitions. Focus on inter-rater reliability. We need to standardize definitions across measurement programs, and encourage all programs, public and private, to do so. There has been great effort on the part of CMS and the Joint Commission to standardize their definitions, but there is still a gap in other national programs. For example, the Joint Commission's Stroke Centers of Excellence definitions differ from the AHA's stroke data in Get With the Guidelines, so we have to maintain two separate data collection systems for those two sets of definitions.

We need to support measures that have transparent operational definitions and are in the public domain. Please do not force hospitals to have to adopt proprietary databases. We need to encourage those organizations to put the measures that are clinically relevant and evidence based within the public domain. Please let them compete on something else, but not on the operational definitions of the measures. If they can do a better job collecting and reporting the information, that is wonderful. If we can do it with someone else or internally, we would prefer the opportunity to do that as long as again, there are the appropriate checks for validity.

We need to focus on standards for health information technology. Again, I am astonished that we are doing so much custom building with systems that have supposedly been implemented across the country, astonished at the lack of health information technology guidelines. What I see appears to be vendor driven and vendor centric. There is a gap between the data needed for clinical care delivery, for quality measures, and for performance improvement. Will it all come together in the future? At what cost?

Evidence-based measures and practice should drive what is in the health systems, not the other way around; too often we have been forced to create measures based on what we happen to have available. That is looking at quality of care and the clinical delivery of care the wrong way.

One of the other areas that I want to talk briefly about is the challenge of identifying the patient. In an electronic health record, even the best quality measures all depend on identifying the right patient, making sure that that patient's history is pulled in and that we have good clinical information. I can tell you story after story where we have individuals coming into the system using a veteran's ID card as their identification, and we cannot link our records across sites and across time. We have a true integrated delivery system within BayCare. We provide home care services, behavioral health, and inpatient and outpatient care in these areas, and we are challenged with tracking patients across time. We have two large primary care groups that we work with across our system, and we cannot link the information. We thought HIPAA would save us; we thought we might get a national patient identifier. It remains a concern. We are starting to look at biometrics: there is a healthcare system that we have talked with that uses palm prints to try to identify individuals. Again, when we move into an electronic world we have to be very careful to make sure we are pulling together the right information. One mistake, a typo in a date of birth or a middle initial, could easily pull in another individual entirely. It is a real challenge, and it is going to be harder to track in an electronic world than with a paper record, where there might be other cues that help you recognize that you are linking information from the wrong person.
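
To make that failure mode concrete, here is a minimal sketch of the linkage problem; the fields, weights, and output are invented for illustration and are not BayCare's actual matching logic:

    # Minimal record-linkage sketch: exact matching calls these two records
    # different people because of one transposed digit in the date of birth;
    # a scored match flags them for human review instead.
    def match_score(a, b):
        score = 0.0
        score += 0.4 * (a["last_name"] == b["last_name"])
        score += 0.2 * (a["first_name"] == b["first_name"])
        score += 0.3 * (a["dob"] == b["dob"])
        score += 0.1 * (a["middle_initial"] == b["middle_initial"])
        return round(score, 2)

    rec1 = {"last_name": "SMITH", "first_name": "JOHN",
            "dob": "1941-07-02", "middle_initial": "A"}
    rec2 = {"last_name": "SMITH", "first_name": "JOHN",
            "dob": "1941-07-20", "middle_initial": "A"}  # one transposed digit
    print(match_score(rec1, rec2))  # 0.7: a near-miss worth reviewing, not two strangers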

We do need to look at expanding measures into other clinical areas. BayCare is not at all opposed to the concept of value-based purchasing. We want to recognize and acknowledge when we are doing well and when we are not, and we want to identify our opportunities for improvement. But there is only a small set of metrics today that actually covers the clinical population. We know that expanding will create some additional burden; still, to the extent that we can create those quality measures using administrative data, enhance that administrative data, gather the information more effectively, and work from more detailed operational definitions, it will make a big difference in how we ensure that we deliver high quality care. We do have gaps in a lot of other clinical areas. We are looking at oncology right now, for example, and there is just huge variance in the clinical information that could be collected there and in what those quality measures might look like. Behavioral health is another challenging area for us, where we know we have a lot of opportunity and it is a real challenge to come up with sound evidence-based measures and good information.

I appreciate the time to talk with you today. I would certainly be open to questions. Thank you.

DR. CARR: Denise, thank you very much. That was very informative. Carol?

MS. MC CALL: Thank you very much. I would echo Justine's comments. This is a delightful set of testimony. I have some specific questions, which go to some of what we are trying to learn throughout. You obviously have a very robust health statistics enterprise. You recognize this is what you want to do, and it sounds like you have a really good handle on what you are doing and where you want to go. Within that, my question is: can you pick one or two areas where you have made a specific effort and you have really seen either dramatic improvement, or you have learned something really profound? I was intrigued by your comment that you had created some evidence-based measures, had done very well, and then found that you were not able to sustain them. So maybe part of your answer could also include: do you and the folks at BayCare think of that as a success or a failure?

DR. REMUS: In the world of continuous quality improvement, we see this as an opportunity. One of the examples I used was heart failure. We did much better in heart failure in the initial year of the demonstration than we have subsequently, and we have drilled into that. We found that most of our failures centered around the discharge instructions: the model of ensuring that medication reconciliation occurred, that the patient had all of the appropriate discharge instructions, that the right prescriptions were sent home, all of those components; we also looked at whether we could help facilitate a follow-up visit. In looking at one of our hospitals that was extremely challenged in the second year, we found that they had originally had one dedicated case manager who followed all of these patients through the hospital and made sure that everything was done. I have no idea what happened when the poor woman had a day off. What we found was that the role changed: she left the organization, they restructured a little how they handled things, and without recognition of the system-ness of it, when someone quit watching, it fell through the cracks. We really fell down on that measure of heart failure discharge instructions.

So, one of the things we have done is look at that and implement a standard medication reconciliation form across the system. We have had educational programs for our physicians on what we mean by reconciliation and why it is critical: to ensure that when a patient goes home, we understand what they were using at home, how that might differ from what they are going home with, and the dynamics of changing a medication dose. The doctor often does not think about it and will prescribe a different dose; the patient looks at it and says, I have the 20 milligrams at home, I am not going to change it, I have already paid for that, I will try to split it or do whatever I can, and you end up with a patient who is readmitted. We need to look at the system-ness.

We do have some wonderful successes in some of the clinical areas, for example MRSA infection and ventilator-associated pneumonia, where we have been doing improvement projects. What is interesting is what we found as we look at what we call our report cards to the board, which we are enhancing. We have been presenting this information for a while, and I started to drill into it because these are not national measures. What I found is that, even across our own system, we did not have apples to apples. Some comparisons were not too bad, but one was a red apple and one was a green apple: close, but one of the challenges as we move to national benchmarks is how we make sure that we consistently use an operational definition internally that we can compare across ourselves as well as nationally. That is one of the areas where we have done well, but we need to enhance how we do the measures and keep that flow going. We have put some great work in place in the critical care units. We have a new software tool we are implementing, called IQ Tracker, that helps us collect information at the exact time of the patient's admission to the ICU, so that we can look more clearly at the patient mix and see where the opportunities are. We have done a lot of training around the ventilator-associated pneumonia bundle, so there has been some good success there.

DR. GREEN: I would like to ask you to return to something you said about fifteen minutes ago, where you were pointing out that you thought for the rest of your career, and your lifetime, we would be dealing with an administrative dataset.

DR. REMUS: Some administrative data, yes.

DR. GREEN: Could you just elaborate your own thinking a little more about the constraints we face in trying to get to quality measures and clinical phenomenology as long as we are bound by administrative data that is basically, as I hear it, the analysis of claims in a commercial transaction? Could you say how you think about that?

DR. REMUS: The way I think about administrative data is not just as the billing; it is the internal data that is collected in hospital system operations as well as in subsequent billing. So when I think of administrative data, I think of the patient demographics, discharge status, date of birth, and the physician who took care of them, the key demographic and operational data elements, but additionally all of the ICD-9 codes. Unlike a bill, for example, our internal system will hold unlimited ICD-9 diagnosis codes and procedure codes. As for the accuracy and integrity of ICD-9 codes, when you hold them up against clinical definitions there is just as much variance there; still, the ICD-9 codes provide us some comparability for looking at the patient history, the things that were done to the patient that are translated into a code. So I see all of that information continuing into the near future: we still need the demographic information on a patient. We need to know who is paying. We need to know the physicians who took care of them. We need to enhance that; I want to know the doctor who did the procedure, and the date and time of the procedure. We can capture that administratively in the system. It may never end up in a bill that is submitted, but to the extent that I have it internal to my care delivery system and can use it for evaluation, that administrative data has a lot of value. On top of that, the clinical data has to do with lab values, with medication administration, with some of the patient's responses, with pain management and other things that we can collect clinically that enhance that information base. I think we are still going to be capturing patients and looking at them in ICD-9 codes for a while. Does that answer your question? I do see it as different: the administrative data that we have in hospitals is much more extensive than a secondary administrative data set.

DR. GREEN: Do you have anything you would like to say about how sticking with a core administrative database actually enables quality?

DR. REMUS: Enables? To the extent, one of the advantages that we have with administrative data today is that it is a known entity with consistency. We have good training. We have good systems that capture the information. There is a lot of integrity. The concern I have about some of the transition to clinical data is that we do not have that experience and history. We have vendor-centric systems. We have a lot of variability there. We have challenges with abstraction and inter-rater reliability of the data. Until we have more experience with that world, we still have this base to be able to pull our patients: to identify heart failure, patients who have had hip and knee replacements, patients who had an AMI. There are certain things we can pull from it that make it easy to say: in that patient population, this is what we want to look at retrospectively for quality measures. The clinical data is going to be critical, no question, and as we move to better electronic health record systems that can inform our care delivery, hopefully we will have that panacea of clinical alerts. We will know very quickly: this is a pneumonia patient, they should have the antibiotic; this is a patient who is eligible for a PCI, we need to get him in there, and we need to activate all the systems so that we do not rely on human memory and human factors to start those better processes. We will have an alert when the patient's troponin assay is elevated, telling us what is critical and what we need to pay attention to. Those are the things a good clinical system can do: help us deliver higher quality care more reliably, as well as look at it retrospectively. Maybe it is just my experience, but I am a little suspicious that it is going to take us a while to build a system that does that well, that we can maintain, and that we can use to retrospectively evaluate quality of care.
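
As a sketch of the kind of alerting being described here (the diagnoses, field names, and troponin cutoff are invented placeholders, not any vendor's actual rules):

    # Minimal sketch of rule-based clinical alerts of the kind described above.
    # All field names and the troponin threshold are hypothetical.
    def check_alerts(patient):
        alerts = []
        if patient.get("diagnosis") == "pneumonia" and not patient.get("antibiotic_given"):
            alerts.append("pneumonia patient has not received an antibiotic")
        if patient.get("troponin_ng_ml", 0.0) > 0.04:  # hypothetical cutoff
            alerts.append("elevated troponin: flag for urgent review")
        return alerts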

DR. CARR: I will take my privilege of making the final comment before we ask Mark Wynn to speak. What I am impressed with from all of the speakers this morning is that, independent of the precision of getting at specific quality issues, the act of collecting and reporting data has had a transformational effect. It is pointing us not just to the electronic health record but to a new workforce that is sophisticated in information systems as well as clinical matters and coding, and beyond that to individual definitions and exclusions and, to your point, organizational change in institutions. Then, I think most compelling is that this measurement is here for the long haul. You cannot study for the exam: you can do it for a year, but you cannot sustain it unless you understand it. Coming out of our current state, there is this transformational recognition that you need Six Sigma, that you need to change what you do and look more carefully at what you do. I think that is a huge quality impact.

We will move now to Mark Wynn. Thank you. Mark?

Agenda Item: Performance Measurement and Quality Improvement - CMS

DR. WYNN: Thank you very much. I will briefly talk about what we have learned at CMS about incentives for hospital quality, especially what we have learned from our Premier demonstration that Denise was involved with, and where we are going from here. I will be talking a little bit about the hospital pay for performance agenda, the Premier payment model, some of the changes we have made in that payment model, what we have learned from them, and some of the challenges and issues as we move forward.

First of all, we have a very active demonstration program within CMS on the Medicare side. Those demonstrations have included a number of trials of things that are now rolled out into the program at large, including the DRG system, the initial Medicare managed care program, our Critical Access Hospitals in rural areas, and a number of other areas.

We currently have about 30 demonstrations operational, with another dozen or so under development, most of them required by law and being busily implemented. We have a number of pay for performance demonstrations ongoing: not only the Premier demonstration with hospital quality incentives, but also a large physician group practice demonstration, the PGP demonstration, and a smaller physician demonstration, the Medicare Care Management Performance Demonstration. Another demonstration on nursing home and home health quality is in development. The Deficit Reduction Act of two years ago requires a report on how CMS would implement Medicare pay for performance in fiscal year 2009. We have been busily working on that project with a good deal of attention and contractor and subcontractor support. Tom Valuck is in charge of it, and I am a member of that committee; it is not entirely a coincidence that a number of the things we learned in the Premier demonstration have been proposed for implementation in the Medicare value-based purchasing program. We expect to send a report to Congress, we hope in August of 2007, depending on clearances and further developments.

In addition, we are continuing to work on closing out Phase 1 of the Premier demonstration, with data collection and verification and so forth, and we have just started the second phase of the Premier demonstration. I will go into some of the differences there in a moment.

The basic background on the Premier Hospital Quality Incentive Demonstration: this is a demonstration with Premier, Inc. We use financial incentives to encourage hospitals to deliver high quality inpatient care, and we report quality measurement data on the CMS website for roughly 260 hospitals in the demonstration. It is intended to test the impact of quality incentives. The initial implementation was in October of 2003, and the performance period of the first phase of the demonstration ended in September of 2006. We have just started phase 2, which runs another three years, through September of 2009. In the initial phase of the demonstration, we score hospitals on quality measures related to each of five clinical conditions, roll up the individual measures into an overall score for each condition, rank hospitals to determine the top performers for each condition, and pay incentives accordingly.

Our recognition and financial rewards system for the initial phase of the demonstration is to give a two percent bonus to the hospitals in the top decile for each of those five conditions and a one percent bonus to the second decile, paid as a single check for the annual bonus amount.

Here we are paying the incentives in proportion to the Medicare payment amounts. That is, the one or two percent is applied to the basic DRG payment, adjusted by area wages but not including direct or indirect medical education, disproportionate share, or other add-ons to the system. We are looking at cases defined by the principal diagnosis or procedure, and where there is a conflict, for example somebody who presents with an AMI but has a CABG, the procedure trumps the diagnosis. We have some hospitals that are not paid under the DRG system, such as the hospitals in Maryland or the Critical Access Hospitals; in those cases, we simulate what they would have been paid under the DRG system and pay them under the same type of procedures. All of the Medicare Fee For Service cases in the clinical category are included in the payment base. There is sometimes a disconnect between the payment base and the quality measures, so let me briefly go over that. For example, all hip and knee cases are brought into the payment base, but not all of them are included in the clinical measures: accidents and secondary procedures are excluded from the quality measures, yet all of the Medicare Fee For Service beneficiaries who come in and get a hip or knee replacement are included in the payment amount.

The steps we use to determine the payment amount are simply these: list out the ICD-9 codes for each of the clinical categories, run the data to determine the payment amounts for all the Medicare discharges with those principal diagnoses or procedures, determine which are the highest quality hospitals, and then calculate the two percent or one percent amounts.
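
As a reading aid only, those steps amount to something like the following sketch; the hospital inputs are invented, and this is my rendering of the described arithmetic, not CMS's implementation:

    # Sketch of the phase-1 bonus arithmetic: 2 percent of the DRG payment
    # base for the top decile on a condition, 1 percent for the second decile.
    def premier_bonus(drg_payment_base, quality_scores):
        # Rank hospitals by composite quality score, best first.
        ranked = sorted(quality_scores, key=quality_scores.get, reverse=True)
        n = len(ranked)
        top_decile = set(ranked[: max(1, n // 10)])
        second_decile = set(ranked[max(1, n // 10) : max(2, 2 * n // 10)])
        bonuses = {}
        for hospital, base in drg_payment_base.items():
            if hospital in top_decile:
                bonuses[hospital] = 0.02 * base   # two percent bonus
            elif hospital in second_decile:
                bonuses[hospital] = 0.01 * base   # one percent bonus
            else:
                bonuses[hospital] = 0.0
        return bonuses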

Now, one of the interesting things about this demonstration is that we have a penalty box performance area. That is our shorthand for saying that in the third year, the quality score for each of the hospitals must exceed the baseline lower two deciles. We set the clinical thresholds at the year-one scores for the lower ninth and tenth deciles, and we insist that the hospitals exceed those performance scores in the third year of the demonstration. Otherwise, there is a penalty of one or two percent.

Here is our anticipated scenario. In this chart, on the left-hand side, you see that in the first year the ninth and tenth deciles are determined for one of the clinical areas, say AMI: you must exceed 80 quality points to be above that ninth decile. Then we measure that. It takes us several months, so we are into the second year by the time we can report the cutoffs that define the ninth and tenth deciles. We hope that all of the hospitals exceed this baseline in the third year and that nobody is in the penalty box. If anybody is below that ninth or tenth decile in quality score, as measured against the first year, there is a penalty of one or two percent. We have not done our final measurements, but I think the glass is 95 or 96 percent full: there are very few hospitals potentially in that penalty box. Almost all of them have exceeded it, and that is, I think, good news.
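
In pseudocode form, and only as my reading of the penalty box with invented inputs:

    # Year-3 scores are compared against the year-1 lower-decile cutoffs;
    # falling below them costs 1 or 2 percent of the payment base.
    def penalty_box_adjustment(year3_score, ninth_decile_cutoff,
                               tenth_decile_cutoff, payment_base):
        if year3_score < tenth_decile_cutoff:    # below the bottom decile cutoff
            return -0.02 * payment_base
        if year3_score < ninth_decile_cutoff:    # below the ninth-decile cutoff
            return -0.01 * payment_base
        return 0.0                               # out of the penalty box

With the AMI example above, a hospital scoring 82 in year three against an 80-point ninth-decile cutoff owes nothing.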

Another thing about the demonstration is that we have really seen quarterly improvement. We track a composite quality score for each of AMI, CABG, pneumonia (that is, community-acquired pneumonia), heart failure, and hip and knee replacement. As you can see, in each of the quarters reported here, for each of the five categories, we see continued quality improvement. I think that is very good news. I would also argue, on the basis of the research, that we have seen quality improvement not only in these measured areas but also in areas that are not being measured. There is a spillover effect throughout the hospital and continued quality improvement; we are not just teaching to the test.

Another thing that we have seen, and that I think is very good news, is that all the hospital categories have improved. As you might expect from the quality improvement literature, we have seen reduction in the variance, improvement in the mean score, improvement in the bottom scores, and improvement in the top scores. We think it has been a very successful project so far.

The first year and second year of the demonstration, we have distributed a little bit less than $9 million to about 123 hospitals in the first year, 115 hospitals in the second year. These top performers have represented large and small facilities across the country.

One of the things that we do in our demonstrations is formal analysis. We think that is very important for learning what we can and transferring it to the rest of the program. In addition, a lot of what we learn is informal, ongoing, seat-of-the-pants implementation learning. Although we have not completed our formal and objective evaluations, we did learn that there were some opportunities for improvement in our incentive policies. So, in the second phase of the demonstration, we changed the incentive policies a little bit. We think it is going to be fairer, with a broader distribution, moving from what you might call a tournament model to something a little more broad based, and especially one that encourages improvement by all hospitals, including those that were below the average hospital in the demonstration.

Therefore, we have a three-part payment system in the second phase of the demonstration. First, we are going to pay incentives to all hospitals that exceed the baseline mean as defined a few years earlier; we are reserving 40 percent of our funds for that. Second, similar to what we have right now, we are going to pay for high attainment: the top 20 percent of attainers will get a bonus. Third, objectively speaking, you really get more bang for the buck if you are able to take a mediocre hospital and raise it from the 10th percentile to the 55th percentile. That is an enormous change, compared to taking a hospital like the Hackensack example we were talking about, one with very good quality scores in the past, and moving it from the 95th to the 96th percentile. That is good; hats off to them, it is really great, and we want to encourage it. But you get more improvement in the total system if you include and encourage those mediocre performers to move up. I have visited some of those hospitals, and I have to tell you that some of them have done a very good job of really getting behind this and using it, not just for the money, which frankly is a minor part of it, but as a focus and an impetus to improve quality.
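
Schematically, and only as one reading of that three-part structure (the improvement test below, beating one's own prior score, is my assumption rather than a stated rule):

    # Which of the three phase-2 awards might a hospital qualify for?
    def phase2_awards(score, prior_score, baseline_mean, top_quintile_cutoff):
        awards = []
        if score > baseline_mean:
            awards.append("exceeds baseline mean (the 40 percent pool)")
        if score >= top_quintile_cutoff:
            awards.append("top 20 percent attainment bonus")
        if score > prior_score:
            awards.append("improvement award")  # assumed criterion
        return awards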

What sorts of lessons have we learned? Well, pay for performance can work. It provides focus and incentives to improve quality, and we have seen substantial quality improvement in the hospital area. Second, we think it is inevitable; it is coming, and we want to prepare ourselves to move forward in this area. Third, very modest dollars can have big impacts. The incentives are substantially less than one percent of the payments going to these hospitals, and we are seeing big impacts from minor dollars. We have seen continued improvement overall in the second, third, and fourth years of the demonstration. As Denise said, there are some difficulties, but the overall improvement has continued. Finally, I would argue that the precise methodologies are only somewhat important; frankly, they matter less than getting the signal out there, the choices of measures, and the overall perception of fairness. The exact use of deciles and percentiles may be less important than just doing the demonstration and getting it started.

We have a number of challenges for pay for performance. I am not going to take the time to go into all of them; I just want to acknowledge that we are very aware there are operational issues, financial issues, scoring issues, measure selection issues, lots and lots of issues as we work through these policies, and it will take continued work from the hospital quality measurement community, from CMS, and from all of the stakeholders to move forward in this quality area. Thank you very much for the opportunity to talk to you this morning.

DR. CARR: Thank you. That was very efficiently presented and very informative. Again, it goes back to this theme that having the measures creates improvement within each measure, but also transformational spillover into other areas. I think that is impressive. Do we have any questions?

DR. W. SCANLON: In terms of what we have heard this morning, there is this whole issue that there are so many different requests for information that it becomes problematic, and it becomes more problematic as those requests shift over time. I am wondering where you are in terms of thinking about changing measures. I do not know whether they have changed at all in phase two, or whether you deliberately avoided that. There is the issue of retiring measures, which has come up, when performance reaches a peak: should a measure be retired, or should it be retained? Then there is a second issue, which is important from a payer's perspective, which is the integrity of the data and what kinds of investments need to be made to assure that one has good, reliable information.

DR. WYNN: Thanks, Bill. A couple of things here. First of all, yes, we are adding measures in the second phase of the demonstration, and we are looking at a number of measures right now. Some of those areas include HCAHPS, the patient experience of care questionnaire. I think that is really good to include; we are going to start rolling it out in a few months for all of the hospitals, and we are looking at how we can include it next year in the demonstration. I think there is a lot of interest in including it in the Medicare systemwide policy proposal as well.

One of the reasons I think that is good is that there are some very important areas that are otherwise difficult to measure. I wish we could just look at, say, proficiency of discharge planning: at Acme Hospital it is 92 points and at St. Mary's it is 95 points. Well, that sort of system does not exist. So this is at least a way to get at some of those very important issues regarding discharge planning, information given to the patient, and patient perception of quality of care, which matters not just as perception but also as a measurement of objective areas that are hard to measure otherwise.

In terms of the retirement of measures that have topped out, that is a continuing discussion, and I do not know that there is a single answer. I personally think it is a good idea to retain some of these measures that are running high, because it keeps continued attention on them. Some of these measures, I have been told by physician friends, are cookbook medicine: giving an aspirin to a patient who comes in with a potential AMI. It may be cookbook, but if you look back a couple of years to some of the work that came out of the QIO programs, reported by Steve Jencks about three or four years ago, there was shockingly low adherence to doing these simple, cookbook types of things. Continued attention, especially where something is relatively easy to measure, seems to me a good idea. We go back and forth on that, and I know there is some tension about retiring some of these measures.

In terms of where we go in the future, we will have a report published and sent to Congress, we hope in August; we never know about clearances. We are hopeful it will go through, because there is a lot of support for it, and even if it does not go through instantaneously, sometimes it takes a little longer to get these policies put into law and moving forward. We will be working on a continuing basis to expand our pay for performance demonstrations, test new measures, and look at adjustments and potential policies from which we can learn for future nationwide policies.

DR. CARR: Just to add on to that: given the cost and burden of data collection we heard about this morning, has there been any thought to letting high-achieving hospitals report less frequently? It would be a kind of incentive if the high-performing hospitals could report less often.

DR. WYNN: We have looked into a number of these areas, and sampling is part of it, too. But even using sampling procedures you still have a very high data burden, even if you are sampling 400 cases in a large hospital. And most hospitals do not have 400 cases in a given clinical area, so it takes a pretty large hospital to reach the point where sampling helps at all. In terms of frequency, no, we have not been looking at that; I am not quite sure how to do it and still hold those hospitals to the same standards as the other hospitals in the system.

The quality of the data has been quite good. We have been doing random sample evaluations of the data, and the inter-rater reliability has been over 90 percent consistently, despite the challenges of working with paper records that Denise talked about a moment ago. We have heard very few reports of problems in this area; it is a challenge that I think we have been able to meet. We do have some proposals for doing data verification on a more targeted basis, perhaps some random sampling plus more focused sampling for hospitals that have shown rapid improvements or aberrant data. Those are under continuing discussion; we do not have a published plan on that yet.

DR. W. SCANLON: This issue of giving some relief to a higher performer. I am afraid that Denise's testimony indicated how fragile high performance may be.

DR. CARR: It was defined year over year.

DR. W. SCANLON: Right, yes. Changes in the measures may be something that triggers a significant change in performance.

DR. CARR: Carol?

MS. MC CALL: Congratulations on some early success in what I think is something pretty exciting. I have a couple of very specific questions. On your last slides you outlined a number of challenges, and I was assuming behind those challenges a number of plans to address as many as possible. So my question for you is really a two-part question. Some of these things will happen naturally and some of them will not, and some hold greater opportunities than others. If you had to pick just three things from that list, which do you think will not happen naturally and therefore need our support to remove barriers? Where can we help, and specifically, how? Second, which hold the greatest opportunity for synergy with something else, something that could happen naturally but that, if we could amplify it, would yield more than if we did not?

DR. WYNN: That is a tough couple of questions. In terms of what is going to happen naturally: nothing happens naturally. It takes continued effort on every single one of these issues. In terms of where we need to work for synergy, I think one of the areas where the government has a special role to play is the standardization of the measures and the reporting systems. Certainly CMS is well aware of that, and every time there are differing measures, there is an attempt to standardize them precisely. Nothing drives a coder crazier than minor changes, as small as a comma or a decimal point. That really speaks to the continued need for standardization, and I know there is an enormous amount going on in that area: standardization of measures, of the reporting system, and of the incentive system. We do have a responsibility, as the largest payer for health services, to lead the way here and, to the extent possible, to lead in such a way that private employers and any other stakeholders can join with us, not mandated to, but able, if they wish, to join the train, move in the same direction, and incentivize in much the same way in their own programs.

MS. MC CALL: Thank you very much.

DR. GREEN: Mark, I am wondering if you could teach me something about the long-established persistent pattern in Medicare beneficiaries of expenditures being concentrated in a relatively small percentage. It looks to me as if you have enough demos in your package that you might know what the impact of this quality measurement is on that subgroup of beneficiaries. Can you tell us anything about that?

DR. WYNN: Well, certainly there is an 80/20 rule that works in spades in health care; everybody knows that. Number one, in some cases quality measurement and quality improvement may not do a lot of good: some of these patients are so frail and so much in need of health care that even the best preventive care, quality improvement, and so forth may have marginal impact on them, at least in many clinical categories.

Another thing that we have learned, and unfortunately this is especially true for Medicare beneficiaries, is that in many cases these folks are so frail, hard to reach, and hard to engage, with mental as well as physical impairments, that it is hard to bring them into the types of case management that may do them the most good. It is relatively straightforward to do in an employed population. A third thing we have learned is that the 20 percent does not stay put. What you see in one year is not what you see in the next year; there is a lot of regression toward the mean. I am afraid a lot of folks do not quite understand that the extreme groups at time one, in the measurement of almost any type of phenomenon, turn out to be less extreme at time two. So in order to really measure whether what you are doing for these folks, in terms of quality services and support, reduces future expenditures, you cannot just do a simple time series analysis. You really have to do a demonstration with random assignment and control groups, and some of that is hard to do. Sometimes the answers do not turn out as well as they seem to if you do not do that kind of controlled experiment. Those are some of the things we have learned. It is a hard nut to crack.
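
The regression-to-the-mean point is easy to demonstrate with a toy simulation; all numbers below are invented, and the only claim is the statistical pattern itself:

    # Pick the top 20 percent of spenders in year one; with no intervention
    # at all, the same people look cheaper in year two.
    import random
    random.seed(0)
    n = 10_000
    stable = [random.gauss(10_000, 3_000) for _ in range(n)]   # persistent component
    year1 = [s + random.gauss(0, 5_000) for s in stable]       # year-1 noise
    year2 = [s + random.gauss(0, 5_000) for s in stable]       # independent year-2 noise
    cutoff = sorted(year1)[int(0.8 * n)]
    top = [i for i in range(n) if year1[i] >= cutoff]
    mean_y1 = sum(year1[i] for i in top) / len(top)
    mean_y2 = sum(year2[i] for i in top) / len(top)
    print(round(mean_y1), round(mean_y2))  # the year-2 mean is markedly lower

That is why a simple before-and-after comparison overstates savings, and why a control group is needed.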

DR. GREEN: Thank you very much.

DR. CARR: I would like to thank our first panel of speakers. It was incredibly informative and you answered our questions and you made it very understandable. It really set the stage for the questions that we want to ask so thank you very much. We are going to break now. It is 10:20 and we will break until 10:35. We will be five minutes behind, but hopefully not too much more than that.

[Brief recess.]

DR. CARR: Okay, I think it is time to regroup and take your seats. Welcome to Dr. Tang, who is also doing double duty with the privacy hearing, along with Dr. Cohn; thanks to both of you for coming up. We will start off now with Dick Johannes on the hybrid clinical data model.

Agenda Item: Performance Measurement and Quality Improvement

DR. JOHANNES: Thank you very much and good morning. My name is Dick Johannes. I am a clinical gastroenterologist with a master's level training in computer science. I have had academic appointments at Johns Hopkins and still hold one at Harvard Medical School where I still practice gastroenterology. But 80 percent of my effort goes into supporting an outcomes research group and clinical research group within Cardinal Health that supports quality initiatives in all hospitals in the State of Pennsylvania as well as many hospitals beyond Pennsylvania. My comments this morning are going to be principally related to inpatient data and reporting at the hospital level.

Shortly before his death in 1934, William H. Welch, one of the founders of the Johns Hopkins School of Medicine and the only one of them to live long enough to be recorded on film, credited his success on that film to the intersection of good fortune and preparedness. That somewhat familiar twist describes how we at Cardinal Health have actually come to have data relevant to today's discussion.

In the early 1980s, several meetings occurred among the founder of MediQual; Donald Fetterolf, who was then the Chief Medical Officer at Highmark Blue Cross Blue Shield; and Ernie Sessa, who founded the National Association of Health Data Organizations and was to become the first Executive Director of the Pennsylvania Health Care Cost Containment Council, or PHC4.

All three of these people shared a belief, which was then only a belief, that the validity and precision of adjustment models used for public reporting depended on clinical data that went beyond claims data. PHC4 acted on this belief. Pennsylvania is the only state to have performed uninterrupted public reporting of hospital performance for over 20 years across more than 50 medical conditions, and Cardinal Health has supported the data definitions, data collection, and risk adjustment methodology over that entire time. Through this pathway, we have had data on all discharges from roughly 190 Pennsylvania hospitals: the standard UB-92 data, but coupled with laboratory results, vital signs, and selected abstracted data elements such as the presence or absence of pressor infusions, assessment of level of consciousness, or left ventricular ejection fraction. It is worth noting that both the Fine pneumonia severity index and the failure-to-rescue metric were developed using these data.

I will try to cover four central themes during my time this morning. First, the timing of clinical data is important to admission-based severity stratification. Second, laboratory data and, to a lesser extent, vital signs are highly objective and powerful predictors. Third, laboratory data are becoming widely available in electronic format. Finally, the influence of clinical data on face validity should not be underestimated.

Turning to clinical data and timing: since vital signs and laboratory data are both time stamped, they provide better identification of risk in the peri-admission period. They free risk adjustment models from the criticism that late hospital events are used in the adjustment process, making it impossible to separate complications from comorbidities. The issue is not simply whether an event appears as an ICD-9 code but when it arose, because if a code reflecting a late complication is also related to mortality, including it can actually improve the C statistic while corrupting the adjustment. This is why I think the methods discussion needs to move away from the battle of C statistics. We need to continue to pursue methods that identify mortality attributable to the patient's severity at admission.

One new approach on the near horizon, already mentioned, is the use of present-on-admission, or POA, coding to identify whether secondary diagnoses were or were not present at the time of hospital admission. While an important addition, some pause is warranted: recall that for a POA flag to be added, the code must first be collected.

Let us examine how this might work for a condition such as hyponatremia. Hyponatremia, or low serum sodium, is defined as a sodium level below 135 milliequivalents per liter. There is an ICD-9 code for it which, if used, can actually advance cases into higher-level DRGs for reimbursement; hence, there is an incentive to get it right. These data show the results from 578,878 cases from 83 hospitals reporting data electronically over the years 2002-2004. We asked: what fraction of cases with laboratory-documented hyponatremia on admission also carried a secondary diagnosis of hyponatremia? That is the column shown here.

As can be seen, the sensitivity is only 11.8 percent, and it improves only marginally when looking across the full length of the hospital stay. Nearly nine of ten cases lack a secondary diagnosis to which a POA flag could be affixed. Since patients with abnormal laboratory results often have them repeated as their clinical course unfolds, we also looked at cases where the diagnosis was repeatedly confirmed by laboratory data and again asked: what is the detection rate? Sensitivity for hyponatremia rose only to 30 percent even when more than ten determinations of the abnormality were known. Similar results can be seen on this slide for a variety of other laboratory findings.
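
To make the arithmetic concrete, the sensitivity being reported is the usual ratio (the notation is mine, not the slide's):

    \text{sensitivity} = \frac{\text{lab-documented hyponatremia cases that also carry the ICD-9 secondary diagnosis}}{\text{all lab-documented hyponatremia cases}} \approx 0.118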

Let me now turn to the laboratory studies our group has relied on over the years. Two hundred thirty laboratory studies are represented in this table. They are all common tests that, with the exception of cardiac enzymes and blood gases, are collected in over 90 percent of inpatient admissions, which means the issue of missing data is minimized. At present we have used these data primarily to examine clinical status on admission. However, using automated laboratory data collection, there are opportunities to examine changes during the hospital stay. For example, tracking creatinine longitudinally could provide insight into renal function throughout the hospital course.
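
As one concrete sketch of that longitudinal use: the 0.3 mg/dL rise within 48 hours below is borrowed from common acute kidney injury criteria such as KDIGO, my addition rather than anything in the testimony:

    # Flag a clinically meaningful creatinine rise during the stay.
    from datetime import datetime, timedelta

    def rising_creatinine(results, window=timedelta(hours=48), delta=0.3):
        # results: time-ordered list of (timestamp, creatinine in mg/dL)
        for i, (t0, c0) in enumerate(results):
            for t1, c1 in results[i + 1:]:
                if t1 - t0 <= window and c1 - c0 >= delta:
                    return True
        return False

    labs = [(datetime(2007, 6, 1, 8), 1.0),
            (datetime(2007, 6, 2, 8), 1.2),
            (datetime(2007, 6, 3, 8), 1.6)]
    print(rising_creatinine(labs))  # True: a 0.4 rise within 24 hours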

Having now looked at which laboratory values we have found useful, I wanted again to ask: what is the predictive power of laboratory values, compared with what a particular ICD-9 code tells you about whether something such as renal failure was or was not present? Laboratory data also scale with severity. One of the steps in constructing our risk adjustment models entails examining potential variables in a univariate manner prior to testing them in a multivariate manner.

Here are the results for one such analyte, serum albumin. This is 2003 data on roughly a million patients from 218 hospitals, plotting serum albumin levels against mortality. The dotted blue line shows the normal range, and the dotted red lines are the cutoff points used to transform the continuous data into five discrete ranges. As you can see, there is markedly elevated mortality risk once the albumin drops below the normal level, rising to nearly 17 percent for albumins lower than 2.5 grams per deciliter. To help put this into perspective, the in-hospital mortality risks for several diseases, heart failure, myocardial infarction, and sepsis, are shown as references.
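
A minimal sketch of that discretization step; the 2.5 g/dL cutoff is from the testimony, while the other cutpoints are invented placeholders:

    ALBUMIN_CUTS = [2.5, 3.0, 3.5, 4.0]  # g/dL; only 2.5 appears in the testimony

    def albumin_band(albumin_g_dl):
        # Map a continuous serum albumin value to one of five discrete ranges.
        for band, cut in enumerate(ALBUMIN_CUTS):
            if albumin_g_dl < cut:
                return band          # 0 is the lowest, highest-risk band
        return len(ALBUMIN_CUTS)     # band 4: albumin of 4.0 or above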

Having now begun to define curves like this one, we are moving on to do it by specific disease. That is, does a creatinine of 3.5 carry the same risk in a patient with chronic renal failure who has had it for three years as in someone with acute renal failure who has had it for three days? There are strong arguments for using disease-specific cutoffs for these laboratory values.

Lisa Iezzoni is a major contributor in this area and used the term "dimensions of risk" to characterize the various classes of data used in the risk adjustment process; I borrowed that term and concept from her. This can be shown as a stack of cylinders, each of which represents data of a different type. As you go up that stack, the cost and difficulty of collecting the data, and collecting it accurately, clearly increase. It is often thought of as a relatively thin slice of added clinical data atop a larger base of claims data. We will come back to that formulation near the end of my comments.

There has been a recent rekindling of interest in attempts to quantify the benefits of clinical data. A series of studies resulting from an AHRQ-sponsored contract, led by Dr. Anne Elixhauser, who is with us today, is making this happen. The contract was let to Abt Associates with a subcontract to Michael Pine and Associates. The first results were reported publicly in December at the annual NAHDO meeting, and two publications have appeared, one in the American Surgeon and the other in the Journal of the American Medical Association at the beginning of this year.

What was done in these studies? To begin to appreciate this literature, it is important to understand the design at least at some level. Data from 2000 through 2003 were used, and multiple models were developed for several disease groups. Eight conditions were examined, five medical and three surgical: myocardial infarction, congestive heart failure, stroke, gastrointestinal hemorrhage, and pneumonia; and, on the surgical side, abdominal aneurysm repair, coronary artery bypass surgery, and craniotomy. Models were built up progressively into a family of models, beginning with age only, then standard administrative data, then administrative data complemented by an imputed POA flag created from clinical data, which in my mind may represent an upper bound for POA performance. Laboratory data, vital signs, and clinical elements, including compound clinical elements such as level of consciousness and left ventricular ejection fraction, were then progressively added.
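
Schematically, the nested model family reads like this; the feature names are invented placeholders, and the study's actual variable lists were far longer:

    MODEL_FAMILY = [
        ("age_only",       ["age"]),
        ("administrative", ["age", "sex", "icd9_secondary_dx"]),
        ("admin_plus_poa", ["age", "sex", "icd9_secondary_dx", "imputed_poa"]),
        ("plus_labs",      ["age", "sex", "icd9_secondary_dx", "imputed_poa",
                            "admission_labs"]),
        ("plus_vitals",    ["age", "sex", "icd9_secondary_dx", "imputed_poa",
                            "admission_labs", "admission_vitals"]),
        ("full_clinical",  ["age", "sex", "icd9_secondary_dx", "imputed_poa",
                            "admission_labs", "admission_vitals",
                            "level_of_consciousness", "lv_ejection_fraction"]),
    ]
    # Each model is fit on the same discharges; the richest model serves as
    # the gold standard against which the leaner ones are compared.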

An example of the results can be seen in this slide, which was presented at the NAHDO meeting. Despite its deceptively simple appearance, I find this slide usually takes some time to understand. The goal was to examine the degree to which systematic bias associated with inadequate risk adjustment could result in misclassifying hospitals. Since hospitals are often compared on standardized rather than absolute differences, the observed-minus-expected differences for each model level are shown against the gold standard in standard deviation units. The Y axis represents the percentage of hospitals that would exceed any selected upper boundary of standard deviation; it can be thought of as the fraction of hospitals subject to misclassification as a result of systematic bias in the models.

For example, consider what is meant at the three standard deviation level. This would represent a hospital that was actually one standard deviation from the mean as a positive outlier moving three full standard deviations in the other direction to become a negative outlier. Even raw, unadjusted data can handle that.
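
In symbols, purely as a reading aid for the chart (the notation is mine, not the study's): with O_h and E_h the observed and model-m expected rates for hospital h,

    z_h^{(m)} = \frac{O_h - E_h^{(m)}}{\mathrm{SD}}, \qquad
    \text{fraction misclassified}(t) = \frac{\#\{\,h : |z_h^{(m)} - z_h^{(\mathrm{gold})}| > t\,\}}{\#\{h\}},

and the curve plots that fraction against the threshold t in standard deviation units.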

However, as you move toward smaller differences, particularly toward one standard deviation, where a hospital is in fact at the cusp of the inter-quartile range, you can see that to keep the bias below a 10 percent level you need to add at least the green line, the laboratory values, stepping up from unadjusted data, age only, administrative data, and administrative plus POA, toward vital signs and full clinical data.

It is also clear that if you wanted to keep it below 10 percent at every level of standard deviations, you would have to add vital signs as well. The full clinical models perform across the board at all levels. The question is no longer whether they work but how practical they are, and I think Dr. Elixhauser will speak to that point.

Since my group assisted in the generation of this dataset but did not know what form the study would take, we chose to ask a separate and somewhat different question: namely, what are the relative values of the various blocks of data? We collaborated with Dr. Jeffrey Silber at the University of Pennsylvania, who was instrumental in introducing us to a method called the omega statistic, which measures the relative contribution of two different groups of explanatory variables to the overall power of the prediction. This is of interest because it could be argued that, rather than adding clinical data to claims data, one might maximally use the objective and least gameable laboratory data first and add the other data elements on top of it. We studied six conditions, including ischemic and hemorrhagic stroke, congestive heart failure, pneumonia, and sepsis, so there is considerable overlap with the Pine work. Hierarchical models using random-intercept logistic regression were constructed, and the relative contributions of the various blocks were compared using this omega statistic. The work has been accepted by Medical Care and will appear in the August issue.
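
Schematically, in my notation rather than the paper's, the model for patient i in hospital j is

    \operatorname{logit} \Pr(y_{ij} = 1) = \alpha + \beta_L^{\top} x_{ij}^{L} + \beta_A^{\top} x_{ij}^{A} + u_j,
    \qquad u_j \sim \mathcal{N}(0, \sigma_u^2),

with x^L the laboratory block and x^A the administrative (ICD-9) block, and the omega statistic can be read, roughly, as comparing the variances of the two blocks' contributions to that linear predictor, on the order of \omega \approx \operatorname{Var}(\beta_L^{\top} x^{L}) / \operatorname{Var}(\beta_A^{\top} x^{A}).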

You can see the results here. If we compare the relative contribution of the laboratory data to the ICD-9 data, you get 7, 2.6, 3.6, 57, 14, and 8 across the six conditions. Interestingly, the vital signs come in across the board at around a level of three, and that seems to hold true even in situations where one might expect vital signs to move the values, such as cardiac conditions. One of the elements that is difficult to obtain, assessment of mental status, is terribly important for the neurologic conditions, where those elements carry the day.

One way to look at this is that these results should not be too surprising. Laboratory data are commonly used clinically because they support robust estimation of the function of critical organ systems: the kidney is typically evaluated by BUN and creatinine and, to a lesser extent, sodium and potassium; the liver by standard liver function tests. Where do my lab values need some help? The answer is perhaps the heart and the brain. BNP and cardiac enzymes are valuable, though a lot of clinicians would argue about how much impact they carry, and for neurologic issues laboratory data are really quite limited.

With all this said, let us go back to my cylinders, using congestive heart failure as an example. This shows what one of our actual models looks like in terms of the distribution of the data actually used. As you can see, it is really quite clinical: the distribution tends toward a large number of laboratory values, with smaller contributions coming from the other three groups.

One final point before I summarize: there is distrust among physicians, rightly or wrongly, of results that come exclusively from claims data. By providing models with greater transparency and consistency, we may be able to do a better job of recruiting the full clinical community into the quality agenda. This has certainly been true in Pennsylvania, where the Pennsylvania Medical Society is an outspoken supporter of the public reporting, largely because of the method of using claims data in combination with clinical data.

In conclusion, clinical data have several advantages. They are objective, precise, and time stamped; they suffer from few missing data; they are not susceptible to gaming; they are easily verified and well grounded in the medical literature, and much accepted by the clinical community; and there is a tremendous opportunity, particularly for the laboratory data, for automated data collection. In my mind, at least for laboratory data collection, we should be moving the discussion from a question of whether to a question of when and how. Thank you.

DR. CARR: Thank you very much. Questions? Go ahead. Well, I wanted to ask, following up on Mark Wynn's testimony where he talked about the 80/20 rule and the 20 percent who are frail: how would this synergize with, or be applied to, the CMS measures? Or have you done that?

DR. JOHANNES: We have not at this time. We would love to do a comparison of several of the methods to look at just that question. The low albumins in congestive heart failure would represent cardiac cachexia, which puts those patients at the far end of the severity spectrum. It is one of the reasons I think the lab data are so helpful in identifying the particular type of patient that you are trying to severity-stratify.

DR. CARR: In terms of the issue you raised about physician buy-in, can you say a little bit more about having this clinical data added to the claims data and what the response has been?

DR. JOHANNES: As you might imagine, since the hospitals in Pennsylvania are publicly reported annually across a variety of these diseases, when they happen to fall into an outlier range, particularly on the negative side, they are quick to contact PHC4 and ourselves. They first go through that whole process of questioning the data, questioning the method, questioning the coding, and then eventually get to a point where they do or do not recognize that they may have a care issue and effect a change. It is much easier to get the clinical staff on board when, if they argue that their cases were or were not sicker than a neighboring hospital's, you can show them hard data regarding the proportion that had renal failure and those that did not, and how that compares with their peers. It just makes that part of the argument go smoother.

DR. CARR: Just one other question before we get to Carol: you have been collecting for 20 years. Were there data elements that you were collecting initially that you ultimately excluded?

DR. JOHANNES: That is a great question. Thank you for asking, Justine. I almost put some comments about that into this presentation. The answer is unquestionably yes, and not only some but many, which is why it surprised me so much that the people organizing things such as the core measure data went first to what we consider the most difficult data to collect, the KCF data.

DR. CARR: What is KCF?

DR. JOHANNES: Key Clinical Findings: elements such as ejection fraction, things that really require chart abstraction. Those are the ones where maintaining the glossary and hospital-to-hospital inter-rater reliability, which we measure in Pennsylvania, is difficult to do. The clinical elements that I believe are closest to being ready for widespread use are the laboratory data; they do not share those problems. And yes, in 1996 all of the data was collected through abstraction, and there were almost 200 abstracted data elements. When I got there it was 210; five years ago it was down to 67. We are planning to take that down further. If you say vital signs represent four or five elements, calling blood pressure two numbers, we will probably keep no more than five others, somewhere around 10 in all.

DR. CARR: Thank you. Carol?

MS. MC CALL: Thank you for sharing the information. It was absolutely fascinating. I for one, being kind of a data dog, could spend all afternoon learning about things like that. I will try to hold back. A couple of questions. A lot of this was focused around the topic of risk adjustment and getting things apples to apples. My question would be, are there additional uses outside of or in addition to performance measurement -- where there could be intermediate measures related to quality or evidence-based medicine, where the measure itself is used for more than making sure comparisons are fair, and can be used to make decisions or provide clinical guidance to a physician? Can you talk about some of those?

DR. JOHANNES: I can tell you a couple of them. We have been examining a number of disease states to look at more bedside methods that you could use to do stratification to affect care, as well as using the data on the back end to do observed-to-expected ratios for comparison. We have been looking heavily at pancreatic disease, and we just presented our results at the spring HEA(?) meetings, where we came up with a 5-element approach that we call BISAP. It is a wonderful double entendre because it stands for bedside index of severity in acute pancreatitis, but the letters also stand for the elements: BUN, impaired mental status, SIRS, advanced age, and pleural effusion. That is just easy to remember.

Using that, it is a score between 0 and 5: each of the five elements is scored 0 or 1, and the points are summed and used in that manner. In addition, we have used the data to answer a variety of clinical questions surrounding quality over the years. I mentioned two of them. Another one we looked at is the relationship between hemoconcentration and mortality in a variety of diseases. Once you have these data, there are a variety of clinical questions you could ask beyond immediate quality measures.
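
To make the scoring concrete, here is a minimal sketch in Python of a five-element additive index of the kind Dr. Johannes describes. The cutoffs shown (BUN above 25 mg/dL, age over 60, and so on) follow the published BISAP description and should be read as illustrative assumptions, not as the exact specification discussed at the hearing.

    # Illustrative sketch of a BISAP-style additive severity score.
    # Each of the five elements contributes 0 or 1 point; the total runs 0 to 5.
    # Cutoffs follow the published BISAP description and are assumptions here.
    def bisap_score(bun_mg_dl, impaired_mental_status, sirs_present,
                    age_years, pleural_effusion):
        points = 0
        points += 1 if bun_mg_dl > 25 else 0          # B: elevated BUN
        points += 1 if impaired_mental_status else 0  # I: impaired mental status
        points += 1 if sirs_present else 0            # S: SIRS criteria met
        points += 1 if age_years > 60 else 0          # A: advanced age
        points += 1 if pleural_effusion else 0        # P: pleural effusion
        return points

    # Example: a 72-year-old with a BUN of 30 and a pleural effusion scores 3.
    print(bisap_score(30, False, False, 72, True))  # -> 3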

MS. MC CALL: I know a lot of this was around inpatient care. How much of it would extend to an ambulatory setting? Part two of the question is, can you get to a frame where, with a person or a patient, this becomes kind of a know-your-numbers challenge?

DR. JOHANNES: Well, for people with chronic diseases, I think if you have been educating them well, they do know their numbers. The idea that you can get people closer to these numbers I think is a very sound idea. I think it is tractable.

DR. TANG: Thank you so much for the presentation. It was very cogent, well presented, and compelling. I do have one quick question -- really two. One, could you enumerate the ten elements you think you are going to keep, or the five additional ones besides the vital signs you are going to hold on to? The other is, recognizing that LVEF or cardiac dysfunction is so central to a lot of the cardiac measures, is there some other substitute for it, given that right now it is uncoded and not easy to abstract from electronic systems?

DR. JOHANNES: I will try to answer both of those. First the five: I think we would keep evidence of level of consciousness at the time of admission. We would still collect left ventricular ejection fraction -- to answer your question, I do not think there is a substitute; I do not even think BNP is yet a totally good substitute, even in congestive heart failure. Then the presence of ascites, and the presence of certain drugs in ambulatory settings, particularly immunosuppressives, anticoagulants, and insulin or oral hypoglycemics.

DR. TANG: Did you answer Carol's question about how this would apply to the ambulatory?

DR. JOHANNES: I will answer that question. I think that is a reach. I think vital signs, or any of the clinical data in today's world, collected broadly, are a reach. I think it needs to move in that direction, and I think there are great opportunities there, but I have not seen any data -- I have very limited data in the ambulatory setting -- to be able to answer that. I am now becoming more interested in looking at laboratory data as a function of time through the hospital stay, in an effort to understand potential complications mid-stay. I must say I still remain heavily focused on the inpatient side.

DR. CARR: Bill?

DR. W. SCANLON: I think this is fantastic in terms of the potential here. Where I am interested in going is whether we are drawing an artificial distinction between administrative and clinical data, and whether we should be redefining administrative data to include some of these clinical measures, as long as they are not ones that you have to abstract or ones that are subject to a lot of variability. It seems that the power that is there is something we should not be forgoing. I guess I am wondering, if this were a proposal for the hospitals you are dealing with, where we were going to require this, how would it be received? Maybe you have to think back in time, because the hospitals we deal with have mostly become accustomed to reporting.

DR. JOHANNES: I think if you constrain it to the laboratory data, you will have a substantially different discussion. I fully agree. I actually hate the term administrative versus clinical data. I think they are all clinical. It is a question of what information is in each of those buckets that the other bucket does not have. In no way would I argue that a purely physiologic measure of severity would supersede one that was also coupled with classical administrative ICD-9 data.

I think that ICD-9 can and should continue to be probed, but I would argue that I would rather see more time going into improving our understanding of how to recognize lower gastrointestinal hemorrhage from different causes and get those codes right than try to code hyponatremia. Given that it is not coded well, I think the pendulum may be tipping to the point where obtaining the lab data is less expensive in both time and money, and more accurate, than improving the ICD-9 codes.

DR. CARR: Thank you very much. With Bruce's permission, we are going to switch the order and ask Anne to present now because of the synchrony with Dick's presentation. So thank you Bruce and thank you Anne.

Agenda Item: Performance Measurement and Quality Improvement - AHRQ

MS. ELIXHAUSER: Okay, thank you very much. My name is Anne Elixhauser and I am a Senior Research Scientist at the Agency for Healthcare Research and Quality. I am going to be talking about an initiative that we have been working on for the past several years on improving the value of administrative data, specifically for the purpose of reporting quality of care at the hospital level.

Here is what I want to be talking about for the next 20 minutes. I will first provide you with some background on the issues, and then I am going to summarize a research study that was conducted by Michael Pine -- who was also referred to by Dr. Johannes earlier -- and his staff at Abt Associates. I was the AHRQ lead on that, but Dr. Pine was the principal investigator. Then I am going to describe where we are going next to actually implement the results of these research studies.

Hospital administrative data, as we have talked about -- or hospital bills or claims data -- are available for a near census of hospital discharges from currently about 45 states in the United States. These data provide information on every hospital stay in those states, including basic patient demographic information; how the patient was admitted -- was it a routine admission or from a nursing home; what happened to the patient during the stay; what sort of resource use occurred; and how they were discharged -- routine, to a facility, or whether they died during the stay. Now, AHRQ has been working in collaboration with 38 of these states to collect all of these discharge abstracts in each state, to convert them into a uniform format, and to make them available for research. This is what is called the Healthcare Cost and Utilization Project, or HCUP. As part of our work with the HCUP data, AHRQ has sponsored the development of a set of measures that use solely hospital administrative data to assess quality of care. These are the AHRQ quality indicators. There are currently four modules. Prevention quality indicators are measures of ambulatory care sensitive conditions. Inpatient quality indicators look at mortality, utilization, and volume of services. Patient safety indicators look at potential safety problems that occurred in the hospital, and the newest module is the pediatric quality indicators, which are focused solely on children. Since these measures have been released for public use, a number of organizations have adopted them for purposes of quality assessment. About nine states currently use the quality indicators for publicly reporting quality of care for hospitals in their state.

It has long been known that despite the wealth of information that is provided by hospital administrative data, the data do have some critical limitations. One is that while the administrative data do contain some clinical information, it is limited to what is contained in ICD-9-CM codes. So, while we may know that a patient has uncontrolled diabetes, we do not know how badly out of control their diabetes is. We know the diagnosis, but we do not know the actual readings.

Furthermore, while we have a list of diagnoses, we do not know whether those diagnoses were present on admission or whether they developed during the stay. So, we know the patient had pneumonia in the hospital, but we do not know if they were admitted with pneumonia or if it occurred as a complication of their care. Algorithms like the AHRQ quality indicators can do a lot of good with this information. For example, if a patient had a major elective surgery on day one or day two of a hospital stay and also had a diagnosis of pneumonia, we can assume that they were not admitted with the pneumonia, since elective surgery would presumably not be performed on a patient with active pneumonia. The AHRQ QIs in this situation would assume that this case is a hospital-acquired pneumonia. Nonetheless, because of concern about inadequate risk adjustment, and because of concern about penalizing providers who have the sickest patients, questions have been raised about using solely administrative data for quality reporting.
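
As an illustration of the kind of inference rule Ms. Elixhauser describes, here is a minimal sketch in Python. The record layout and field names are hypothetical, and the single ICD-9 code shown stands in for the indicator's full code list.

    # Sketch of the elective-surgery inference rule described above.
    # The record layout (procedure day, elective flag, diagnosis list) is hypothetical.
    def pneumonia_likely_hospital_acquired(record):
        had_early_major_elective_surgery = any(
            p["elective"] and p["major"] and p["day_of_stay"] <= 2
            for p in record["procedures"]
        )
        has_pneumonia_dx = any(
            dx.startswith("486") for dx in record["diagnoses"]  # ICD-9 486: pneumonia
        )
        # Elective major surgery would presumably not be performed on a patient
        # with active pneumonia, so the pneumonia is assumed hospital-acquired.
        return had_early_major_elective_surgery and has_pneumonia_dx

    record = {
        "procedures": [{"elective": True, "major": True, "day_of_stay": 1}],
        "diagnoses": ["486"],
    }
    print(pneumonia_likely_hospital_acquired(record))  # -> True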

How do we get more clinical detail? We have been talking a little bit about that already. Two states, California and New York, already are collecting information on whether the diagnoses are present on admission. Now CMS is mandating that this information be collected for all Medicare patients starting in January of next year. In one state, Pennsylvania, clinical information is being manually abstracted from the medical record using an outcomes system. Questions have been raised by hospitals about the cost and the burden of such manual data collection. Although EMRs are being more widely adopted, we are still years away from routinely being able to rely on EMRs to provide the information that we need for quality assessments. The one exception is lab data, which are available electronically in about 80 percent of all hospitals.

Given this context, AHRQ sponsored a study that was conducted by Michael Pine and his associates to assess the impact of adding clinical information to the administrative record for purposes of quality reporting. We examined incrementally more complex and more expensive-to-obtain data to identify the most cost-effective enhancements to administrative data. Now, because POA information is collected at the same time and by the same personnel who abstract the medical record for the claim and code diagnoses into ICD-9 codes, we added POA information early in the modeling process. We then added lab values at the time of admission, assuming that numeric information from a single point in time would be relatively easy and inexpensive to obtain, given that lab data are available electronically from the majority of hospitals.

We also assessed the impact of simply increasing the number of diagnosis fields to see what kind of impact that would have. We then examined the impact of improving the documentation of diagnostic information using ICD codes. Currently, coding rules stipulate that when there is a final diagnosis, for example stroke, symptoms like coma are no longer coded. We wanted to see what would happen if these findings were coded in addition to the final diagnosis.

We then added information on vital signs at admission -- again, numeric values at one point in time, but ones that are less routinely available electronically. Finally, we added more clinical information that was more difficult to obtain. Then, through cost-effectiveness analysis, we assessed the most cost-effective enhancements to administrative data. These studies have been reported in a number of manuscripts already -- one that Dick mentioned that I did not include here, but there has been one published in JAMA in January of this year, one that came out in June in the Journal of Patient Safety, another that will be coming out in the Annals of Surgery shortly, and we have submitted a fourth for publication.

The results I present here will highlight some of these findings and will go a little bit beyond what Dick provided as well.

The data that we used for the study were supplied by the Pennsylvania Health Care Cost Containment Council. We really do appreciate their generosity in providing the data. They provided us with all administrative data from 188 hospitals over a three-year time period spanning 2002-2003. In addition, for all of these records, they also supplied detailed clinical data that were abstracted from medical records using their outcomes system, which records a hospital day corresponding to each data element. We also used New York and California claims data, which identify those conditions that are comorbidities -- that is, present on admission -- versus complications that originated during the stay. We applied what we learned from the New York and California data so that we could model a POA modifier for the Pennsylvania data. We studied eight mortality measures and four patient safety measures: three surgical measures, five medical measures, and the four patient safety measures that you see here.

As I mentioned, we developed incrementally more complex models. The sequence that I outline today is actually one of a number of different sequences of models that we tested that were reported in the various manuscripts. The ones that I report today are illustrative of the findings throughout.

We began with a model that was based just on routine administrative data and up to eight diagnosis fields. We then added POA information. Third, we increased the number of diagnosis fields to 24 to see if more diagnostic information was helpful. Fourth, we added information on conditions that were present in the Atlas(?) data but did not appear in the ICD codes because of coding rules. This includes conditions such as coma, immunosuppression, pleural effusion, and history of chronic lung disease. It also included some coding of numeric values, like hyponatremia.

We then added numerical laboratory data that were obtained on the day of admission to the model that included POA and both 8 and 24 diagnosis fields. Then we added lab data to the model that assumed improved coding of the claims data. Finally, we added full clinical information, vital signs, other lab data, key clinical findings, and composite clinical scores like the ASA classification.
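
A minimal sketch of this incremental sequence, assuming a hypothetical discharge-level table with made-up column names: each successive model adds one block of predictors to everything that came before, and the discrimination of each cumulative model can then be compared.

    # Sketch of nested risk-adjustment models, one per enhancement step.
    # Column names are hypothetical; df is a pandas DataFrame of discharges.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    blocks = {
        "administrative": ["age", "sex", "dx1", "dx2"],    # routine claims fields
        "poa":            ["dx1_poa", "dx2_poa"],          # present-on-admission flags
        "lab":            ["albumin", "sodium", "bun"],    # admission lab values
        "full_clinical":  ["sbp", "pulse", "ejection_fraction"],
    }

    def fit_nested_models(df, outcome="died"):
        cols, c_stats = [], {}
        for name, block in blocks.items():
            cols = cols + block  # cumulative predictor set
            model = LogisticRegression(max_iter=1000).fit(df[cols], df[outcome])
            risk = model.predict_proba(df[cols])[:, 1]
            c_stats[name] = roc_auc_score(df[outcome], risk)  # C-statistic
        return c_stats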

Now, other analyses that were done as part of this study broke out these clinical models in more detail. We ran separate models for vital signs, for lab results, for clinical findings, and for composite scores, and details are available in some of the other papers that I mentioned.

Here are the C-statistics for the mortality models. These are mean C-statistics across the eight mortality models that we looked at. The C-statistic measures the discriminative ability of a model: 0.5 is a pure guess and 1 is perfect discrimination; 0.7 to 0.8 is good, and 0.9 is excellent. The C-statistic for the current administrative model was about 0.79, which is really quite good. When we added the POA information, it increased to about 0.84. When we added lab data, it was about 0.86. The full clinical model had a C-statistic of 0.88. So, what we see here is the biggest jump in the C-statistic when you add the POA information. The next biggest jump is when we add lab data, or when we model a revision in ICD-9 coding to allow for coding of symptoms like coma or immunocompromised status. We got another smaller jump when we combined improved coding and lab values together. The full clinical model added relatively little additional discriminative ability. I do not present the patient safety models here, but it is a very similar sort of pattern.
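
For reference, the C-statistic is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. A minimal sketch computing it directly from that definition:

    # The C-statistic as a concordance probability: among all (death, survivor)
    # pairs, the fraction where the death had the higher predicted risk
    # (ties count half). 0.5 is a pure guess; 1.0 is perfect discrimination.
    def c_statistic(risks, outcomes):
        deaths    = [r for r, y in zip(risks, outcomes) if y == 1]
        survivors = [r for r, y in zip(risks, outcomes) if y == 0]
        concordant = sum(
            1.0 if d > s else 0.5 if d == s else 0.0
            for d in deaths for s in survivors
        )
        return concordant / (len(deaths) * len(survivors))

    # Every death was ranked above every survivor, so the C-statistic is 1.0.
    print(c_statistic([0.9, 0.4, 0.7, 0.2], [1, 0, 1, 0]))  # -> 1.0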

We used another measure of model performance, which we termed hospital-level bias. Basically, for each hospital, we calculated the difference between the number of adverse events predicted by the full clinical model, where we had the most information, and the number of adverse events predicted by each alternative model that had less than full clinical information. Then we expressed this difference in standard deviation units, by dividing the difference between the number of adverse events predicted by the full clinical model and the number predicted by the less complete model by an estimate of the standard deviation of the number of events predicted by the full clinical model.

So, the use of these standard deviation units basically takes into account variations in the number of cases at the various hospitals, as well as the predicted events for those cases in the full clinical model. We are always comparing a less-than-complete model to the full clinical model and seeing what kind of bias we are introducing by using less than complete information. Okay?
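
In symbols, the statistic for hospital h is bias_h = (E_full[h] - E_alt[h]) / SD_full[h]. A minimal sketch with hypothetical numbers:

    # Hospital-level bias in standard deviation units, as described above.
    def hospital_bias(pred_full, pred_alt, sd_full):
        return {h: (pred_full[h] - pred_alt[h]) / sd_full[h] for h in pred_full}

    def pct_exceeding(bias, threshold=0.5):
        flagged = [h for h, b in bias.items() if abs(b) > threshold]
        return 100.0 * len(flagged) / len(bias)

    # Hospital A: (12 - 10) / 3 = 0.67 SD units (flagged); B: 0.2 / 2.5 = 0.08.
    bias = hospital_bias({"A": 12.0, "B": 8.0},   # events predicted, full model
                         {"A": 10.0, "B": 7.8},   # events predicted, alternative
                         {"A": 3.0,  "B": 2.5})   # SD of full-model prediction
    print(pct_exceeding(bias))  # -> 50.0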

We then reported the number of hospitals with differences greater than 0.5, 1.0, 1.5, and 2.0 standard deviations, in order to give readers different thresholds for how much error they can live with. Then we examined improvements in model performance in terms of the reduction in the percentage of hospitals with unacceptable bias. These hospital-level predictions really are the most relevant measures for this study, because they show how hospital rankings will change under various models. For simplicity, let's focus on the first column, which corresponds to a threshold of 0.5. What we see here is that the mean percentage of hospitals with bias, compared to the full clinical model, was just under 70 percent for the raw data -- that is the red line at the top. That means if we just use raw data, 70 percent of the hospitals are classified inappropriately compared to the full model.

For the basic administrative model -- the light blue line, based on just eight diagnosis fields -- about 45 percent of hospitals had bias exceeding 0.5 standard deviation units. When we added POA to the model, regardless of whether we used 8 or 24 diagnosis fields -- this is the yellow diamond superimposed on the blue square -- about 38 percent of hospitals had bias exceeding 0.5 standard deviations. What this is telling us is that adding POA is really the most important factor here; it is not the number of diagnoses that we are adding.

When we added improved coding, which is the pink line, about 22 percent of hospitals still had unacceptable bias, and when lab data was added to the POA model, about 18 percent of hospitals were still biased. Then, when lab data was added to the POA model with improved ICD coding -- that is the very bottom line -- only about 5 percent of hospitals still had unacceptable bias.

We see a very similar pattern for the patient safety measures, but we did not perform the same depth of analysis in terms of improved ICD coding for patient safety.

Which specific variables were of most importance here? What we found was that results of 22 lab tests entered at least one model. The results of 14 of these tests entered 4 or more models. These are the lab tests that were important and the number of models that they entered into. All vital signs entered 4 or more models, but these vital signs were the most important. Ejection fraction and culture results entered 2 or more models. The composite scores entered 4 or more models.

In terms of abstracted key clinical findings, there were 35 clinical findings that entered at least one model. Only 3 findings entered more than 2 models and that was coma, severe malnutrition, and immunosuppressed. What was really interesting was that we found that 14 of these clinical findings actually have existing corresponding ICD codes associated with them. It is only because of coding conventions and coding rules that those symptoms are no longer coded.

We also looked at the marginal cost associated with incremental additions of clinical data. The top line summarizes the hospital-level bias that we just saw. The bottom two lines are two different ways of looking at cost-effectiveness. We did a sensitivity analysis on the cost-effectiveness analysis across three different cost-assumption scenarios. For one scenario, our high-cost scenario, we interviewed clinicians and medical record abstractors and got information on the costs associated with abstracting specific types of data elements.

The low-cost scenario was based on studies that were sponsored by PHC4 to determine the costs associated with abstracting data through their outcomes system. We did a midrange scenario as well. What you can see here is that cost remained relatively low for collecting administrative data, the present-on-admission information, the lab values, and the improved ICD coding, but increased dramatically once we added the full clinical information. These findings held across all cost scenarios that we tested.

So what we found was that administrative data can be improved at relatively low cost by adding POA modifiers, by adding numerical lab data on admission, and by changing our coding convention and coding rules.

In order to implement these findings and to encourage adoption of the results of these studies, AHRQ this month released two RFPs aimed at expanding the data capacities of the statewide organizations that currently participate in HCUP. These are the 38 state data organizations that collect data from the hospitals in their states and provide data to AHRQ. These 38 states comprise about 85 percent of all hospital discharges in the United States. In one RFP, AHRQ is going to support pilots in up to two states to add clinical information to their administrative data. The other RFP is going to support planning efforts in up to five states that are interested in enhancing their administrative data but are not yet ready to engage in a pilot. Every state is different. There are different relationships. There are different coalitions that need to be built. These planning projects are really intended to help those states at the initial stages of enhancing their administrative data.

Let me just provide you a little bit of information about what the pilots are going to be doing. The major objectives of the pilot studies are to establish the feasibility of linking clinical and administrative data in the field, in the hospital; to develop a reproducible approach that can be exported to other states; and to set the stage for integrating the clinical and administrative data streams in the future, because we do not really see this as simply the addition of a few data elements to administrative data. We really see this as a way of identifying what sort of information from the EMR is going to be most valuable for quality assessment and quality reporting, and then helping to specify how the analytic capacity of the EMR needs to be developed to allow easy access to these key clinical data elements.

The specific activities that the pilots are going to be involved in will be to identify and select data elements, and to capture the clinical data in electronic format -- except for the POA information, we specifically want to avoid manual abstraction of data elements in these cases. They have to figure out a way to electronically transfer the data from a minimum of five hospitals to the data organization and process those data into a multi-hospital database. During this entire process, they are going to be required to collaborate with stakeholders, hospital representatives, and state government agencies, and they are expected to work with researchers and quality measurement professionals, with healthcare quality organizations, and with regional health information exchanges. Then, they are going to be engaging in peer-to-peer learning, information sharing, and dissemination, in order to allow pilot and planning states to learn from one another and then to disseminate what they have learned to other states in the future.

In conclusion, what we found was that the judicious addition of just a few clinical data elements can significantly improve our ability to do quality assessment using administrative data. By just adding POA information, lab values, potentially vital signs, and improved ICD coding, we can get close to the full clinical model for the mortality and patient safety measures that we looked at here. Also, through the pilot and planning contracts, we hope to jumpstart the process of adding clinical data elements to statewide hospital data. Thank you. I will take your questions now.

DR. CARR: Thank you that was very exciting, very promising, and very logical.

MS. ELIXHAUSER: We were hoping for logic.

DR. GREEN: That was gorgeous. Thank you for coming. Just a moment ago, you said except for POA data we want to avoid manual data collection. My question is why go to manual data collection for Present On Admission?

MS. ELIXHAUSER: I guess because the same people will be doing it who do the abstraction off the medical record for ICD codes. There has to be a process of converting the written text in a medical record into ICD codes, and those same people who do that conversion of text to ICD codes are the ones who will also be assigning POA information for each of those diagnoses on the record. Do you see what I am saying?

DR. GREEN: I understand it. It seems to me that that is the way the world has always worked before. It makes no sense with where we are going, and by the time your pilots are done it will be irrelevant, at least if we do our jobs right. Looking across the entire NCVHS, the Populations Committee, and all this stuff we are examining: surely we individuals are going to hold some of our own personal health information. Surely it is going to be in electronic format, and surely when we get admitted to hospitals someone will have the human decency to look at it.

MS. ELIXHAUSER: I think you are absolutely right. I think that once we actually have personal information that a person carries around with them, we will know when their diabetes was present on admission. But there are always going to be some conditions, like pneumonia, that are not going to be a part of that patient's health record until the time they present to the emergency room and are admitted to the hospital. There will still be some need for evaluation at the time of admission for certain conditions, to distinguish complications from conditions that were present at the time of admission to the hospital.

DR. GREEN: My apologies for getting too far into advocacy. I just so hope that AHRQ -- this is so exciting. As you move into these other pilots, I so wish that you would look at the opportunities to capture those highly personal data in electronic formats, in the mode and manner in which we think we are trying to design a nationwide electronic health information structure, and begin to plan and pilot for the future as opposed to planning for the past.

MS. ELIXHAUSER: I agree with you 100 percent. Even in the short term, even before we reach the point of having a real personal health record, we can improve how we link our data. For example, we can link past admissions to the current admission, so that we know what conditions were present during a prior hospital stay and can apply that to the current hospital stay. We should not even wait until we get the personal health record; we should look at the data capacities that we have available right now and try to link those together. Thank you. I agree 100 percent with you.
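
A minimal sketch of that short-term linkage idea, assuming a hypothetical table of stays keyed by patient: a diagnosis already seen on a prior stay for the same patient can be treated as present on admission for the current stay.

    # Sketch of inferring present-on-admission status from linked prior stays.
    # The table layout (patient_id, admit_date, diagnoses) is hypothetical.
    from collections import defaultdict

    def infer_poa_from_history(stays):
        seen = defaultdict(set)  # patient_id -> diagnoses seen on earlier stays
        for stay in sorted(stays, key=lambda s: s["admit_date"]):
            pid = stay["patient_id"]
            stay["poa_inferred"] = [dx for dx in stay["diagnoses"] if dx in seen[pid]]
            seen[pid].update(stay["diagnoses"])
        return stays

    stays = [
        {"patient_id": 1, "admit_date": "2007-01-03", "diagnoses": ["250.00"]},
        {"patient_id": 1, "admit_date": "2007-05-19", "diagnoses": ["250.00", "486"]},
    ]
    # The diabetes code (250.00) recurs from the prior stay, so it is inferred POA;
    # the pneumonia code (486) is new to this stay.
    print(infer_poa_from_history(stays)[1]["poa_inferred"])  # -> ['250.00']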

DR. CARR: We have Simon and then Paul and then Marjorie.

DR. COHN: Thank you for your presentation. I fear that I might be sounding a lot like Larry. I remember coming into my administrative roles, having practiced medicine for many years, back in the late 80's. At that point, one of my mentors was a gentleman named Mark Umberg, who was able to establish this POA capability in California in the mid 80's. Now what we are talking about is new and different and exciting; I guess he lived long enough to see it happen again. My question here is, are we at the point where we need pilots, or are we really past pilots and planning contracts and on to implementation? How long does it take to begin to see some of this happen?

MS. ELIXHAUSER: There are several reasons that we are doing planning and pilots. One is that states are all at different points. They are all at different stages of decision making about these sorts of efforts. My understanding is that there are 10 states that will be adopting POA coding within the next year, so we are going to see it. There are other states that are lagging very far behind that. There are still some states that do not have statewide data collection efforts; we would hope to see those develop in the near term as well. Secondly, there is limited funding within AHRQ to support full-scale implementation. The best we can do at this point, with the limited funding we have for this project, is to provide seed money to get going on something everybody agrees needs to be done, and hopefully put together the information so it can be disseminated more broadly in the future. I agree with you: taking 17 years to implement POA coding is just unconscionable. Hopefully things will speed up from now on.

DR. CARR: Then we will have ICD-10. Paul?

DR. TANG: I want to thank you also, Anne, for such a wonderful presentation. I am also going to come to Anne's defense on this question of why we do not just get on with it. It is like saying, why do we not have EHRs now? There is so much to learn and so much culture to get through that we have to learn how to do it in ways that can be scaled to the rest of the country. I think she is on the right track.

I have two questions. One is just a confirmation. When you talked about the bias graphs, as you went along, when you got to lab that meant it was cumulative with the POA measures, etc. Is that correct?

MS. ELIXHAUSER: Not all of them were cumulative.

DR. TANG: Lab in particular I am interested in, is that cumulative?

MS. ELIXHAUSER: You know, it actually varies. There are two lines for lab: the green line is lab with POA coding, and the black line is lab with POA coding plus the improved coding that we were talking about.

DR. TANG: So the one question is whether lab alone would also get you almost as much, since POA involves human effort and that definitional question.

MS. ELIXHAUSER: I do not believe we ever tested lab alone. That is something that we could do. We really assumed that POA was sort of on the threshold of being here and that given how powerful it was--

DR. TANG: Yes, I certainly agree about its value and all those things. The other piece is, when you talked about the cost-effectiveness, I noticed that the ICD coding enhancement and POA had a very low cost. Did you include the cost of retraining all of the folks who have to do this abstraction?

MS. ELIXHAUSER: That was sort of in our high cost estimate.

DR. TANG: So even with that, it was still low?

MS. ELIXHAUSER: It is primarily because the information that you have to get is so targeted. You do not have to go through very difficult parts of the medical records. It is usually in one place.

DR. TANG: It is easy to get, but we have to retrain the entire workforce.

MS. ELIXHAUSER: We did not do a national cost estimate as to what it was going to be.

DR. CARR: Marjorie did you have something to add?

MS. GREENBERG: I wanted to add to the chorus of thanks, not only for your really excellent presentation but for doing this study. I think we heard that this study was being launched in June of 2004 or something, so it is really exciting to see the results of it. It is sort of déjà vu, in the sense that in 1992 this committee, based on a recommendation from your mentor, recommended adding POA. I think we need to think about that; it is a long time to move some of these things forward. I hope that what you learn from this not only will enhance administrative data, but will really be factored into the architecture of electronic health records. Those things that really are useful for quality and performance measurement need to be structured in a way that they can easily be put into that architecture. Also, because you were talking about ICD-9, I am emailing with Donna Pickett about whether she has discussed some of this with you. She is responsible for ICD-9-CM and worked on the coding guidelines as well. Some of these are coding guideline issues, and some of them actually could be collected under current coding guidelines. We will follow up with you about that.

DR. CARR: Thank you, Marjorie. And cognizant of our time, Carol has a closing comment and then we will get two additional very exciting presentations.

MS. MC CALL: Thank you very much. I will try to be brief. It is not a question as much as it is a request. As you go forward into these pilots, two things call out for attention. One is that, in addition to building something that is reproducible, it should be built to be intentionally dynamic, as a learning process. What I mean by that is, we have heard a lot today about the need for standards, and those are important, but we also need intentionally dynamic processes that recognize the fact that things will change. What you introduced today was a third level. We have heard about data: it needs to be standardized, but then it will change. We have heard about measures: they need to be designed and standardized, but they too will change. The third is models. I would ask that as you go through this, you think about the dynamism and the transparency needed in getting the models set, how they will change, and what the process for that will be.

The second is that you talked about engaging in peer-to-peer learning, information sharing, and dissemination. I would also add collaboration, and thinking about brand new architectures -- architectures that include the collaborative wisdom-of-crowds environments and information markets, and that take into account what it actually means to discover something, to have peers who are truly knowledgeable, and essentially to aggregate their opinion and consensus -- so that things can in fact proceed apace, much more quickly than before.

MS. ELIXHAUSER: Thank you.

DR. CARR: Thanks very much. Bruce?

Agenda Item: Performance Measurement and Quality Improvement – Niagara Health Quality Coalition

MR. BOISSONNAULT: My name is Bruce Boissonnault and I am President and CEO of the Niagara Health Quality Coalition. I am also the publisher of myhealthfinder.com, which we will talk about very briefly. There are two prongs to what I am going to talk about. The first is our work from the very beginning as a beta test site for the AHRQ QIs, which I think are a great step forward. That is the context for a view that our employers have, and ours is a multi-stakeholder collaboration, probably one of the longest running in the country that is actually data savvy. That view is to move away from the notion of administrative data toward the notion of a data highway that can provide not only quality data but also population screening. Again, the notion is that in any other industry -- I had a corporate background with Disney and McKinsey -- if you are in an operating area and you do a good job but you do not provide the necessary metrics to the appropriate folks, you did not do a good job. We are in an industry where not providing the data is the norm; we assume it is still a good job even though we have to take your word for it that it was okay. So, that is the two-pronged approach.

We have 2,000 employers statewide, some of them like General Motors and General Electric. We work with 30 health plans; with government at the state, regional, and federal levels; and with hospitals. We also do disease management. For those of you who are starting to see the estimated GFR calculation, that was our project: they piloted it in our region and expressly mentioned in the media that we were the ones who helped them sort out which of the 32 measures ought to be on the lab report.

Let us jump forward. We are in a new era. People are looking at data. As long ago as 1998, we started publishing individual hospitals' risk-adjusted mortality rates with the Ford-General Motors hospital-profiling project, where I was one of the National Advisory Panel members.

I want to launch right into this for the sake of time. I often am asked the question, do people actually use health care performance reports? It is sort of funny, because while that debate was raging back in 2001-2002, we were getting up to 15,000 discrete users per hour on our website. That is up to 3 million hits a day, for a report on one state's statistics. Keep debating the question -- I am not challenging you about that. I just think it is a funny thing to be talking about.

Public performance measurement systems must be judged by the degree to which they positively affect the status quo. As an illustration, we had the longest-running patient survey project, with 100 percent voluntary hospital participation; the hospitals paid for their own surveys. When aggregated, we ranked as one of the worst regions in the United States, and our region consistently has hospitals that are among the worst financial performers in the United States. We did not throw a lot of money at this, but I think we did this right, and we did understand pay for performance. We went from one of the worst regions in the United States to one of the best. Our numbers have not moved much since 2004. We are continuing.

New York State Hospital Report Card -- most of you know the coverage is widespread, so I am going to zip through. You can pretty much go to it at myhealthfinder.com, but we typically are on the front page of every regional newspaper, and we get a lot of electronic coverage when we release the report. We have credibility with the media, which was established before we were involved in health policy. We do not sensationalize the report, and on the other hand we do not let politics make it so obtuse that no one can understand it. I think we found the right balance of stakeholders.

All of you know who Don is. I believe ours was the first public report with specific outcomes measures by provider that Don had ever endorsed.

One of the questions I think you were searching for was, why did we select administrative data? It is not based on secret input factors. It is superior for public reporting because it is tied to something and is therefore more difficult to game -- that is, it is tied to the billing information. So, if you cheat too far, you are committing fraud, and you are in danger of having more consequences than just a talking-to. It is a sustainable data platform because it is exempt from what I consider to be the extensive secrecy provisions of the Patient Safety and Quality Improvement Act. I would caution you: it is a risky venture not to define data as billing and discharge data, because it looks like provider identifiers remain a little bit at risk, especially for those outside of government. I do not think state governments have to worry, but you have to look out for folks like us. We have value too. We have been publishing these data for five years, and we were used by the NQF as an example of what should not happen. What I am saying is, there is room for diversity of thought, and the key to the database is not the definition of perfect measures. It is very cost-effective, so I am not going to spend a lot of time on things you know.

What resources were needed to make the project worthwhile? I will just say that money is not the key factor. Because administrative data are being used, I think the key factor is integrity and independence. I do not remember who said it, but someone said it is hard to be flexible in your thinking if you are paid to think one way. We are designed to be outside of that paradigm.

When I was at Disney, one of the areas that reported to me was a large segment of the data management function, and so I use some of those practices from industry to make sure that we do not publish mistakes in the New York Times. There are hundreds of thousands of numbers out there over the course of the year.

How do you use the data? We use it across that whole spectrum. I was one of the external reviewers for the Institute of Medicine's Pay for Performance report. One of the things I am a long-standing advocate for is understanding the difference: pay for performance is part of the spectrum of how you change behavior, but it really is the 600-pound gorilla, because it is expensive and politically difficult, so you do not want to do pay for performance on everything. So, we use a spectrum in what we recommend, from attaboy letters at the light end, to public reporting, all the way over to pay for performance. The key for us has been getting CEOs' compensation tied to it. I would add one other comment: if you can get the CEO's compensation tied to a measure, you no longer need to do pay for performance on it.

What interventions were triggered? I am just going to use one illustration. When we published these measures, I felt a little like rock and roll, because when we started this we were the cause of the end of civilization according to some of the provider community. Today, we are very mainstream -- not quite elevator music yet, but definitely mainstream. This kind of thing still happens at least weekly. When we looked at the data -- I did not go to the media -- I noticed that one of the regional stroke centers had terrible results. We were driving ambulances up to 20 minutes extra to get to this hospital, which appeared to have risk-adjusted results that were statistically significantly below the norm. We always approach this with some caution, because notwithstanding everything else, we know we do this within a confidence interval; even if the underlying data are perfect, this could be due to random chance. So, we put the data up, and we invited the employer leaders, all the hospital leaders, all the health plan leaders, and some community leaders to a closed meeting. We said, these are the data, and we are concerned about the stroke mortality statistic. Among the physicians and all the people in the room who knew the situation, it became apparent that 24-7 radiology coverage for stroke had sort of fallen off the cart due to a negotiation problem with radiology. This was a longstanding, on-again-off-again problem. Long story short, it was fixed within three days via offsite radiology reading at this hospital; it has never surfaced again, and the numbers improved. I am not going to go through them, but I could give you hundreds of anecdotes of people calling me and saying, our system is ten times more expensive than yours seems to be, and it is designed to tell us what we want to hear; you sometimes seem to be telling us what we need to hear.

We are seeing a drop in overall statewide mortality. Obviously some of that is related to the improvement in clinical care, and we also have indications that some of it is due to transparency. However, there is still the volume problem: hospitals doing AAA or carotid artery surgery at low volumes, because New York is a place where there are too many hospitals and these folks just do not want the beds to be empty. I am not as happy with the progress, even though there is some, on hospitals getting out of procedures they should not do at low volumes. We have done some research, and the issue is more real than I think people realize -- the difference between those performing above and below a volume threshold. I know people realize it is big, but our data suggest that in some instances it is bigger.

I will just give one example of the savings; again, for the sake of time, I want to make sure we hit part two. We get calls from hospitals saying, you counted several stroke patients as mortalities charged against us, and they were hospice patients. My comeback still is, okay, if we rerun the whole report card, are you going to reimburse the people that you over-billed? They said, we did not over-bill anyone. I said, this is the billing system, and you billed them as ICU patients. How could they have been hospice patients when you were billing them as ICU patients? Nothing sanitizes like sunshine, even with imperfect data. I think the key here is that even if the data are a little misleading sometimes -- and I do believe these are at times good people being honest about their data -- sometimes good comes out of that too.

I want to move on to the macro suggestions.

Again, I have a unique background, in that I am not really affiliated with a hospital or an association or government. We are truly independent; we are structured to be independent. This is just my view as someone uniquely familiar with the details of risk adjustment and how the data come together. We do the computations ourselves; we do not outsource this. I am not talking as someone who hires someone else to do it. We actually are hired by states asking us how to do this.

Winston Churchill was one of my favorite characters in history, a great man who lived in a great time. The first lesson that you must learn, Winston Churchill said, is that when I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. That is a political statistic.

We do not want to run healthcare based on political statistics. This is something that I think we all can embrace. Government, when it operates in the dark, is I think susceptible to undue special interest pressures. So, I hope the policies that you all create will focus on what you are doing, but I hope you do it transparently, because -- at least on behalf of our employers -- some of the government statistics that were coming out, when we double-checked them and understood how to run the numbers, had some problems. Measures which cannot be replicated independently based on public data sets will, I believe, remain in doubt, and deserve to.

Current US healthcare policy is -- failing is a strong word, and not living up to expectations did not fit, but anyway. I know there is a lot of science and argument about who was right, Deming or Six Sigma or whatever. Deming observed that if you focus on quality first, then over time quality will improve and cost will go down. If you focus on cost first, then cost will go up and quality will go down over time, eventually leading to a loss of trust in the system. Again, having worked in corporate strategy, I remember when everybody was into cost cutting. It is a terrible metaphor, but it will work: when I consulted for the tractor industry, I remember that at one point, for a tractor we sold 200 of, we had 36 different foot pedals. No one was counting the cost of all the inventory and all the other things; it was just a short-term thing.

Quality equals result of effort times cost. I think sometimes in our measurement, we confuse effort with results. That can drive up cost.

Where are we today? Today, I think what we have is disconnected -- we do not have a US health care data highway, which I think is where we are headed. We should think about creating a data highway the same way we created our national highway system in the forties and fifties. What we have now is dirt roads, some paths, and some unexplored woods. Each of the groups on the left provides some information of some sort into some components of what is on the right, into this sort of disjointed thing, much of it after the fact. That has been discussed today at some length. What we really need is for data to be a byproduct of care, under the assumption that without adequate data, you are not providing adequate care. I would add that it is not just the EHR and performance reporting; it is also population monitoring.

In the interim, I think there is a place for a hybrid system. How many of you are familiar with normalization? What we are talking about is building databases. Normalization is to the science of database management what supply and demand is to economics: you cannot study economics, and you cannot implement economic policy, without understanding supply and demand. Third normal form, I think, should be required now. If you do not know what that means, just talk to one of the mathematicians involved in your database management. That needs to be a policy thing. I think data should be defined as billing and discharge data first; a case needs to be made that something should be secret, rather than the reverse. Sometimes what I see in our work is a willingness to compromise things that should not be compromised. The system of data should support the system of care, not the other way around. We are too willing to accept what data exists as a real limitation. As a data guy, I know it is political, but the data issues are within our grasp quickly.
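
A minimal illustration of the third-normal-form point, with hypothetical tables: in a flat discharge table, hospital attributes are repeated on every row and can drift out of sync; in third normal form, every non-key attribute depends only on its own table's key, so each fact is stored exactly once.

    # Unnormalized: hospital attributes repeated (and drifting) on each row.
    unnormalized = [
        {"discharge_id": 1, "hospital_id": "H1", "hospital_name": "St. Mary's", "dx": "486"},
        {"discharge_id": 2, "hospital_id": "H1", "hospital_name": "Saint Marys", "dx": "250.00"},
    ]

    # Third normal form: the hospital name lives in one place, keyed by hospital_id.
    hospitals  = {"H1": {"hospital_name": "St. Mary's"}}
    discharges = [
        {"discharge_id": 1, "hospital_id": "H1", "dx": "486"},
        {"discharge_id": 2, "hospital_id": "H1", "dx": "250.00"},
    ]

    # Joining back recovers the flat view without duplication or drift.
    flat = [{**d, **hospitals[d["hospital_id"]]} for d in discharges]
    print(flat[0]["hospital_name"])  # -> St. Mary's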

What could we do today? I hope there is a charge to take a look at what we really have now. I will end with this. I think by the middle of next week we could sit down with the VA's electronic medical record and say, which of these elements do we have now, and yellow-highlight those we do not. I am not saying the VA is the end of the road, but it is a starting place, and it is public. The week after that, look at population screening and do the same thing: yellow-highlight what we are missing. I think you would see a lot of yellow, a lot of missing data elements, but at least we would start on the road. In week three, I think you could take the week off. I do not get any funding from AHRQ, but I really feel we have built the best possible performance reporting system for hospitals, and some community-wide measures, that can be built with the available data. There is more in here, but I will stop now for the sake of time, and just end by saying I went back and forth on whether to get into some of our data issues, because we are very data savvy, but I thought this was more productive.

DR. CARR: Thank you, Bruce. I appreciate your very detailed slides; they were very helpful in your overview. I think what we will do, unless there is one quick question, is follow your segue right into the VA and get right back on track. Thank you.

Agenda Item: Performance Measurement and Quality Improvement - EHR

DR. EISEN: While Cynthia is getting this set up, I will talk a little bit about myself. I am an internist and rheumatologist. I have been at Washington University in St. Louis since the late 1970's and at the St. Louis VA since the early 1980's. My areas of research interest are focused on psychiatric epidemiology. Because I started at the VA in the early 1980's, I was there when the VA was still an all-paper system, and I have lived through the transition from the all-paper system to an all-electronic record. The difference is really quite remarkable. Because the VA is an all-electronic medical record system, the databases are used extensively by our researchers for a wide variety of research projects.

So, just a little about the VA itself. The VA is one of the largest health care systems in the United States. There are 1,300 facilities across the US, including 153 medical centers, 105 of which are affiliated with academic institutions. You can see that we train 81,000 professionals and support 9,000 residency positions. There are also some freestanding counseling centers, particularly for individuals with emotional problems from the Vietnam War and from the first and second World Wars -- all of the wars. I think the one thing to note about the large size of the VA system is the potential for having substantial influence, if only by virtue of that size.

There are about 6 million veterans; maybe 20 percent of them still come to the VA as users. They are getting older, although the recent wars are decreasing the average age. They are primarily male, with a small percentage female; it is predicted that by 2015, 12 percent of users will be female. They tend to have lower sociodemographic characteristics, and our patients tend to be complex, with many psychiatric as well as medical comorbidities.

The name of our record system is VistA: Veterans Health Information Systems and Technology Architecture. A brief background: the VA actually started getting computers in 1969. In the late 1970's, investigators started creating VA databases for their medical center administrators. A big advance occurred in 1982, when Congress endorsed the development of VA computer systems. In 1985, a hundred VAs started using computers for various administrative functions, and by 1999 what we call CPRS, the Computerized Patient Record System, was introduced. The major advance there was an expanded graphical user interface, which really made it much more widely acceptable. I remember when this was first introduced at the St. Louis VA. The physicians were initially told they could use it or not; it was up to them. It was never said that at some point they would have to begin using the computers, but I think everyone knew what was coming. After a year or so, it happened: paper records stopped appearing in clinics. Since I am a pretty good typist, I was one of the early adopters. Within the VA system, some of the older physicians are not typists, so there was a lot of resistance for many, many months. It is just that now you do not hear of it any longer. The younger physicians are all used to typing. It can be time consuming, but that is the process that is used.

Also, initially, when use of the system was mandated, there were times when the computers were pretty slow. They crashed, and so I would say there was maybe a year-and-a-half period when there was a lot of unhappiness with the computer system. Eventually, better equipment was installed, and it rarely crashes now.

It has virtually all of the medical record: all vital signs, all hospital discharge diagnoses and procedures, all outpatient diagnoses by ICD code, progress notes by all the health professionals, all orders, all lab test results, and all radiology reports; also, all x-rays are digital. The image quality is not as good, because of the lower resolution of the monitors in the individual offices, but it is commonly satisfactory for what individual physicians do, and it can be useful in illustrating to patients what is going on and what the concerns are. There are also all consultation requests and results, all pathology reports, all medications since 1997, procedure consents, and Medicare data (for research), and all data are available from any place in the VA system. This was dramatically demonstrated with the events in New Orleans and on the southern coast, when people were displaced, including many veterans. Their records could be picked up almost as if they were at their home medical center.

So, this is a screenshot. Some things to note: these are the tabs, and this is the face page. These are the active problems. This is allergies, where there is a specific note citing a description of what happened. This indicates that the patient sometimes has behavioral problems. The VA has an extensive list of clinical reminders, although most of the patients I took care of do not have many of them. The physician would click on this and see what procedures are necessary. This indicates that there are lab results. These are appointments, both past and future. This is the so-called coversheet.

This is the list of problems, with some detailed information about the activity of the problems -- when they were first diagnosed and when they were most recently dealt with medically. I should point out that there is this remote data available; this is one of the ways that the clinician can gain access to data that is remote. Something like 20 percent of our patients are seen at more than one VA medical center. Patients do travel to Florida and spend winters there and summers in other climates. We also have patients who abuse medications and go to different medical centers to obtain pain medications, and this would be visible through this mechanism.

This is the medication window -- the list of medications and their status. This is non-VA medications, and this is in-hospital medications. When the patient is hospitalized, this window expands and this window narrows. These windows are moveable with a mouse. This is the order window. There are also summary windows.

DR. CARR: In the interest of time, I want to make sure we get to a little bit about how we use this and how we are better for it. Obviously, it is phenomenal. I do not want to forgo the question of what got better with this as we describe the many capabilities.

DR. EISEN: These are some of the laboratory windows and how the data can be presented. One of the things that has been implemented as a result is these reminders that I described briefly earlier. This is an example of a reminder for patients with occult blood in the stool. With the implementation of this reminder, the lag in appropriate follow-up for colon cancer screening has decreased substantially. This is our VA corporate data warehouse. This is a bit of the ideal; eventually, it will be implemented on a national basis. It has been implemented regionally in several regions, and you can see the sources of the data that go into a data warehouse, which is made available for both research and non-research purposes.

The data has been used extensively to address a number of issues -- for example, racial disparities within the VA, so that changes and programs can be implemented to improve them. There are a number of disease-based studies that have been published, for example evaluating the prevalence of psychiatric disorders, the quality of care associated with psychiatric disorders, and psychiatric comorbidity; depression in particular is a major area of interest to the VA.

Quality-of-care standards in terms of utilization of medications for treatment of hypertension and appropriate control of hypertension. Pharmacovigilance is a major area of investigation within the VA, because of the VA's integrated system and because almost all of our patients get their medicine through the VA, given the economic advantages of doing that, although we are finding more of our patients are receiving medications outside the VA system. There have been studies demonstrating the use of reminders in improving immunization success, particularly with flu immunization, and in implementing improved methods for monitoring dermatologic disorders. There is a major issue within the VA of access to care. One of the specialties in short supply is dermatology, and the issue is how you get the dermatologists and patients together. It is typically very challenging. VA researchers have evaluated using teledermatology, which is relatively easy to implement within our electronic medical record system, with the photographs becoming part of the electronic medical record. Remote psychiatric consultation has been effectively implemented as well.

The strength of the data is that it facilitates both organizational evaluation and efficient medical care. I think the major issues related to the VA's electronic medical record are that, because it is so large now, there is an inhibition on innovation. In terms of improvement of the medical record, that is in part because of the data security issues that have occurred with the VA's databases over the last couple of years. Researchers feel they have lost some control over the electronic medical record, and responsibilities are increasingly transferred to the technocrats. Depending upon what your interests are, that can be considered a really good thing; from a researcher's point of view, and from the standpoint of innovation, I think it is a difficulty. I think another major issue relates to the fact that a lot of the data in the electronic medical record is in free-text form. Investigators have certainly done simple string searches with the free-text data. That is quite feasible, but it is very limited. One of the research programs now being developed within the VA's research division is a programmatic goal of encouraging researchers to do sophisticated text searches, because of the incredible amount of information that is locked away in medical records -- not only in progress notes; there is also no ready way of getting access to radiology report results, pathology, etc. Finally, the existence of large data sets increases our vulnerability to data loss. Just a simple hard drive you can carry in your pocket can contain millions of pieces of identifiable information.
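
[Editor's note: a minimal sketch of the kind of simple string search over free-text notes described above. The patient IDs, note text, and search pattern are hypothetical illustrations, not the VA's actual tooling.]

import re

# Hypothetical free-text progress notes keyed by patient ID.
notes = {
    "1001": "Pt reports nightmares and flashbacks; PTSD screen positive.",
    "1002": "Follow-up for hypertension; BP 128/82, well controlled.",
}

# A simple case-insensitive pattern search -- feasible but limited, as the
# testimony notes: it misses negations, abbreviations, and misspellings.
pattern = re.compile(r"\b(ptsd|posttraumatic stress)\b", re.IGNORECASE)

hits = [pid for pid, text in notes.items() if pattern.search(text)]
print(hits)  # ['1001']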

DR. CARR: Thank you. I have a question, which is: what advice would you give today to those building electronic health records outside the VA system? What would be the three things you would advise, to make such a system flexible and usable in ways that the VA's has not been?

DR. EISEN: I think that from the very beginning a mechanism has to be developed so that the data that characterizes the patient -- which goes far beyond just the administrative data -- is readily accessible. Some of it can be done in a way that puts it into a formatted process, which makes it easy for the programmers. But health care providers and the health system are so variable that I think there is really no single -- it is not feasible to create a formatted method to collect all the data that researchers and organizational individuals would want in order to adequately evaluate the quality of care. I think it is necessary to develop some sort of sophisticated free-text search methodology as an integral part of the record.

DR. CARR: Okay, as opposed to structured fields?

DR. EISEN: I think structured fields are also relevant, but the structured fields will not cover everything that is going to be necessary, some of which we know already.

DR. CARR: One other question, from your problem list, do you use ICD-9 codes?

DR. EISEN: Yes. The healthcare provider cannot close out the record until he or she fills out the problem list. The problem lists are narrative in structure; behind them are ICD-9 codes. The quality of the diagnoses typically varies with the diagnosis: some have very low validity and others have high validity. There have been a number of studies that examine the validity of ICD-9 diagnoses by doing detailed evaluations of the medical record, with reasonably predictable results.
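
[Editor's note: a minimal sketch of the dual representation described here -- a narrative problem-list entry backed by an ICD-9 code. The field names are illustrative, not the actual VistA schema.]

# Each problem-list entry pairs the clinician's narrative with a coded diagnosis.
problem_list = [
    {"narrative": "Diabetes mellitus type 2, on oral agents", "icd9": "250.00"},
    {"narrative": "Essential hypertension, well controlled", "icd9": "401.9"},
]

# Validation studies compare the codes against chart review; validity
# varies by diagnosis, as noted in the testimony.
for entry in problem_list:
    print(entry["icd9"], "-", entry["narrative"])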

DR. GREEN: Process question. I want to thank you, Seth, for making a point of that: structured data is not quite sufficient. Bruce, going back to you, you had a slide that you did not show that we have in our handout here. It starts out with, "Without HHS Leadership" -- there are several things there. I want to ask you to comment about one of your bullets that says, "make population-based surveillance mandatory as a byproduct of care." Could you say what you mean?

MR. BOISSONNAULT: All I am saying is, everyone is focused right now on electronic health records. I think as we define what a data superhighway might look like, we should think not only about quality measures, cost measures, and the EHR, but also about population-based screening. What brought it to light for me was work I did with the IOM on the safety of medical devices for children. That surveillance system does not need to be built separately; it could be a byproduct of care. Feeding in the information when a device fails should be a byproduct of care.

DR. CARR: I think we are not too far behind schedule. We will break for lunch now and resume then at 1:30 back in this room. Thanks very much.

[Whereupon, the meeting adjourned for lunch.]


AFTERNOON SESSION

DR. CARR: Good afternoon. Welcome back.

We have an equally exciting afternoon panel of speakers. I would like to stay on time as much as possible. We are starting on time now, at 1:30. We will do the same as this morning -- 20 minutes of presentation, followed by 10 minutes for questions.

Seth, thank you for coming back this afternoon, wearing another hat -- same hat, but different topic. I will turn it over to you.

Agenda Item: Performance Measurement and Public Reporting - NSQIP

DR. EISEN: These slides were provided to me by Bill Henderson, who is the director of NSQIP, based in Denver, Colorado. I made some modifications, so I share responsibility for them. Bill Henderson has been affiliated with NSQIP for at least the past 15 years.

The basic goal of the National Surgical Quality Improvement Program is to develop a standardized methodology that will permit evaluation of the quality of surgical care within a medical center, within a medical center across time, and across medical centers. The key points of that are that it is a standardized methodology that is collected independently, for the most part, of the surgeons and the other staff who are actually doing the procedures, and it can be used to effectively measure the most critical outcomes -- surgical morbidity, mortality, length of stay, complication rates.

It provides patient risk-adjusted surgical outcomes to surgical programs that permit evaluation with other programs. The data collection has to be reliable and believable. That means that, for the most part, it is collected by individual nurses who are trained and committed to collecting the data in an unbiased fashion.

It empowers surgeons to review the quality. This is presented to the surgeons in a supportive manner. It's critically important for them to use this information, of course, to identify where the problems might be in terms of the surgical quality and to figure out what to do about it.

The NSQIP data is primarily intended for programmatic uses. It's not really intended to provide feedback to individual surgeons. In that way, it hopefully is less threatening, at least to individuals, although it could certainly be potentially threatening to programs.

NSQIP develops performance measures for surgery used by the program administrators. It also maintains a registry of major operations and makes this data available to researchers. One of the important byproducts of the process -- not only is it internal for use by the program surgeons, but it's also external and contributes to the overall knowledge about what is necessary to provide high-quality surgical care.

Just a brief history. In the mid-1980s, Congress mandated the VA to compare their surgical outcomes to the private sector. The problem at the time was that there was no methodology for assessing surgical quality, not only within the VA, but there was no method for assessing surgical quality outside the VA either.

A couple of years later, the VA funded a couple of health-services research investigators to develop risk-adjusted quality outcome data for cardiac surgery. Initially, they focused on administrative data, but it became clear that administrative data just wasn't sufficient. The administrative data they wanted most commonly didn't exist, and the data that was available didn't provide the outcomes they were interested in.

In 1991, there was the start of what we consider the National Surgical Risk Study. By 1994, there were 132 VAs that were participating. AHRQ also joined the group, and now there are a number of non-VA hospitals that are participating, and the American College of Surgeons is encouraging surgical programs to participate nationwide.

The primary groups involved -- the greatest interest is in major operations that require general, spinal, or epidural anesthesia. But minor operations are also of interest, although those that are well-known to be associated with very, very low morbidity are generally excluded. For some of the more common operations, such as TURPs and inguinal hernia repairs, while this data is collected for these procedures, the number of patients on whom data is collected is limited.

Finally, for high-volume programs, there is a limit to the number of cases for which data is collected. But within the higher-volume ones, an appropriate systematic sample is taken, with the intent that the data collected is truly representative of surgical procedures.

I have been told that some surgeons can game it a bit. But I have also been told that it is difficult, but possible.

A number of risk factors are collected in a standardized fashion. Nurses who are committed to the program collect the data independently from the medical records. These are the variables that are collected as preoperative risk factors, plus the basic laboratory values. Data is also collected about variables associated with the operative intervention and postoperative outcomes -- vital status, length of stay, whether or not the patient has to return to the operating room, and complications. All the laboratory data is automatically downloaded into the database without further intervention.

Statistical models have been developed; the most important predict the mortality and morbidity associated with the surgical procedures. The results are put into a standard observed-versus-expected-events kind of evaluation.
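
[Editor's note: a minimal sketch of an observed-to-expected (O/E) evaluation of the kind described above. The predicted probabilities would come from models fit on the preoperative risk factors; the numbers here are made up, not the actual NSQIP models.]

# Patient-level records: (died within 30 days, model-predicted probability of death).
cases = [
    (0, 0.02),
    (1, 0.15),
    (0, 0.05),
    (0, 0.01),
    (1, 0.30),
]

observed = sum(died for died, _ in cases)   # actual deaths
expected = sum(p for _, p in cases)         # model-predicted deaths

oe_ratio = observed / expected
print(f"O/E ratio: {oe_ratio:.2f}")  # a ratio above 1 suggests worse than expected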

This is an example of some of the data analyses that have been performed. This is a 30-day predictor of mortality and overall morbidity. This is a ranking of the risk factors. You can see that for both mortality and morbidity, serum albumin is actually the most significant. This is a surgical class evaluation. You can see some of the characteristics for mortality -- disseminated cancer, emergency operation, age, renal insufficiency, et cetera.

Several feedback mechanisms have evolved. There are quarterly reports, which focus primarily on the observed/expected ratios for the various operations. There are more summary annual reports. Because they have more data, they are more accurately representative of the surgical morbidity and mortality. Chart audits are standard by the NSQIP nurses for patients who experience unexpected adverse events. Occasionally, surgical programs ask for onsite evaluations to help figure out why certain goals are not being attained.

This is an example of the unadjusted 30-day mortality rate for major non-cardiac surgery, as a function of time from 1996 to last year. You can see how there has been a progressive decrease in mortality among programs that have been participating in the NSQIP process.

This is unadjusted 30-day morbidity rate. You can see, shortly after the program began, there was a marked drop, and now it seems like it's a fairly flat line. There seems to be a more dramatic impact of the program on mortality than on morbidity.

NSQIP, as I mentioned, is also available for research use. There are now over 1 million cases. So even rare surgical procedures can now typically be found in the database. There are standardized procedures for applying for access to the data. Over 100 scientific publications have so far resulted.

The publications are in a wide variety of areas. This is an attempt to summarize their scope. Of course, mortality and morbidity are primary outcomes, but there is also the relationship between volume and outcome, surgical outcomes in teaching versus non-teaching hospitals, and modeling risk factors for various operative procedures.

There have been a number of articles published about specific risk factors and various complications and surgical outcomes in certain comorbidities or subsets. Again, because of the large number of surgical procedures that have been collected over the last 10 years, it's feasible to do this sort of evaluation.

This is a summary of some of the highlights of some of the research that has resulted. There is good evidence that a strong level of feedback and programming to the surgical participants indeed does have an impact on morbidity, but there is not a significant impact on mortality. Laparoscopic cholecystectomy has been evaluated. With the introduction of laparoscopic cholecystectomy within the VA, the indications for actually performing the surgery apparently have not changed, because the volume has not increased, whereas in private care, the number of laparoscopic cholecystectomies has increased.

Serum albumin has continued to be by far the single most important predictor of surgical morbidity and mortality. Surely that reflects the underlying health of the individual who is undergoing the procedure.

There is no relationship between surgical volume and risk-adjusted outcomes in eight major types of operations within the VA system. Not surprisingly, administrative data is not nearly as useful in terms of evaluating surgical risk factors as the systematically collected NSQIP data. This helps ensure that NSQIP will continue; it has proven its initial justification.

NSQIP reasonably predicts postoperative morbidity and mortality, both in VA and in non-VA hospitals. Also of interest, postoperative morbidity and mortality is higher early in the academic year compared to late in the academic year. So try not to have your surgical procedure in July or August; wait until March or April, at least if you are going to an academic facility.

It does cost money to collect this data. Whether you think it's costly or not depends on your point of view -- about $40.00 for a major surgical case done within the VA. Compared to other surgical costs, this is a small amount. Presumably, the data that is collected and its impact on quality of care offsets this cost, although, as far as I know, there has been no analysis to demonstrate that that is actually true. Of course, how you cost the benefits depends on how you cost the outcomes. If you include patients being able to go back to work and their satisfaction with care, et cetera, the cost of data collection relative to the benefits decreases.

This is the surgical cost at an anonymous VA, but it's a real one. This is another example of the kind of data that can be collected and analyzed from NSQIP. This is one modest-sized VA, the accumulated data over the last several years. Not surprisingly, there is an increased hospitalization cost associated with postoperative complications -- infection, cardiovascular complications, thrombotic complications, and respiratory complications.

This is an article that either was just published or is about to be published in JAMA, looking at the collected data. The investigators looked at the relationship between hematocrit at the time of the surgical procedure and the subsequent postoperative 30-day mortality. I believe surgeons try to transfuse to a hematocrit of 30. But the data from these investigators indicates that there is an improvement in surgical mortality when transfusing preoperatively to a hematocrit of at least 35.

Of course, this is a retrospective study and would likely not be the basis for introducing national policy. But it's this kind of longitudinal data that provides the basis for justifying more expensive prospective randomized studies.

Overall, NSQIP has been around for a long time now. It's a well-established, well-oiled machine. I think there is good data that it has been effective in helping surgical programs evaluate their quality of care, and when it doesn't match their own standards or standards by comparison to other surgical groups, to encourage them to try to figure out what the problem is.

Another indication of success is its widening use nationally.

DR. CARR: Thanks. That was a great summary.

I think it has been taken up very much nationally. Many programs outside of the VA are now adopting it.

One thing that strikes me is that we heard this morning about the importance of physician involvement in the Pennsylvania project, that without that buy-in, it doesn't have credibility. Now we are hearing it from the other side. It began with the physicians developing what they wanted, and they have tremendous buy-in.

I feel, as we hear the different themes of the day, here is a great example of an incredible embrace of this within and outside of the VA, and very, very important data coming out of it.

It's interesting. What you were talking about is sort of administrative data with other things, the labs, added in. It makes me wonder if there wouldn't be a way to ultimately capture some of the things in NSQIP, even though NSQIP began by saying administrative data is not sufficient. But with the improvements that we have heard about, the refinements, you wonder if there isn't a middle ground that we could come to.

Questions?

DR. EISEN: It sounds like you are raising the question -- potentially, with the increasing use of the electronic medical record, the risk factors that have been identified can be automated, rather than designating a person to actually physically collect the data.

DR. CARR: Yes, I think that's right. Also, what we heard this morning from Anne is that there is logic that can be embedded into administrative data sets: if you have pneumonia and you are here for an elective hernia, you probably didn't have it on admission. Some of that sophisticated logic, I think -- I am not saying we are there yet, but I see two parallel universes beginning to intersect.
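
[Editor's note: a minimal sketch of the kind of embedded logic Dr. Carr describes -- inferring from an administrative record that a complication was probably hospital-acquired. The field names and the rule itself are hypothetical illustrations.]

# One admission, as it might appear in an administrative data set.
admission = {
    "admission_type": "elective",
    "principal_procedure": "inguinal hernia repair",
    "secondary_diagnoses": ["pneumonia"],
}

def likely_hospital_acquired(rec):
    # Rule: pneumonia coded on an elective hernia admission was almost
    # certainly not present on admission.
    return (
        rec["admission_type"] == "elective"
        and "pneumonia" in rec["secondary_diagnoses"]
    )

print(likely_hospital_acquired(admission))  # True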

MS. MCCALL: First, a comment. What a lot of this suggests -- you made a comment at the very end, with the new paper that -- did you say it's soon to be published?

DR. EISEN: I have seen the preprint. I don't know whether it has been published yet.

MS. MCCALL: There is a kind of circle of life here. We have been talking a lot about practice. Yet this is something that actually begins as research and, through what might be called a pilot study, might suggest prospective research. It's not enough by itself to sufficiently change practice. But the prospective study, then, could ultimately get put into an electronic record that had some automatic data capture, once some of those decisions have started to be made.

What it seems to suggest overall -- and I would like your comment on it -- are different approaches and policies around intentionally creating the research-to-prospective-study-to-decision-to-policy move. I would like your thoughts on that. I would like to know if you think that there are things that you would recommend to take this to the next level, to close that loop.

DR. EISEN: I don't fully understand your question. To some extent, research certainly can drive the elements of data collection in a database like this. But researchers are so creative that I think that the greater the flexibility in a database, the greater the potential for really getting informative results.

MS. MCCALL: Are you talking about a research database or --

DR. EISEN: A database such as this.

DR. GREEN: I would like to ask you to teach us a little more about two of your slides. You showed a slide that showed trends in adjusted 30-day morbidity rate, which, for about 10 years, looked like it was insensitive to anything. It was pretty much a flat line going across there.

DR. EISEN: Yes.

DR. GREEN: While mortality was going down. Later on you talked about selected findings -- that surgical services with high feedback in the program had lower morbidity observed-versus-expected ratios.

DR. EISEN: I'm aware of the conflict.

DR. GREEN: It sort of looks like it flipped.

DR. EISEN: I can't adequately explain why those two observations were made.

DR. GREEN: Do you know anything about what those high-feedback programming events were?

DR. EISEN: I don't know the details of how the surgical programs went back to their participating surgeons and provided them with feedback information.

DR. STEINWACHS: I will take you off in another direction, since you show data like this to researchers. Back when Congress mandated that the VA do this was probably the era when Jack Wennberg was producing all the small-area variation studies around surgical procedures and so on, which he still does. Today the issue is talked about in terms of comparative effectiveness of alternative treatments.

Has there been anything done using this? You are looking at a set of severity measures that could be applied to a person who doesn't get surgery, even though they are potentially eligible, or that could potentially be applied -- if you could put a denominator population on it -- to comparing high surgical rates adjusted for specific risk factors versus low ones. Has anyone tried to use this severity measure to broaden out and deal with the concern about whether you should or should not have operated, or whether it was timely, and so on?

DR. EISEN: As far as I know, the data collection does not permit examining that kind of issue -- that is, broadening the denominator beyond those having a surgical procedure. No one has looked at the advantages or disadvantages of undergoing a particular procedure versus not undergoing it.

MS. GREENBERG: Thank you for your testimony. I am going in yet a different, more techie direction, back to your pre-lunch presentation. You mentioned how there is a lot of free text in the records. I just wondered if you could tell us whether the VA is in the process of implementing a structured terminology or some interface to that, what the status of that is.

DR. EISEN: There have been some initial attempts to address the issue of free text. In one recently published article, an analysis of free text was performed to evaluate the quality of examination for posttraumatic stress disorder. So there are those skills and interests.

This has not been a focus of VA research, but I think it's a very important one. The VA has researchers in informatics scattered in various parts of the country. One of the locations is Indianapolis, also in Salt Lake City and in Ann Arbor, and other sites as well. One of the issues is, can we somehow join this intellectual and experiential group into a coherent and focused research effort?

So I would hope that that would go forward beginning in the calendar year. How long it might take to have some initial useful results, I don't know. But my guess is that this is an area of research that will continue for many years, unless the VA decides to abandon its free-text structure, which I think is doubtful. At least there is nothing on the horizon right now.

MS. GREENBERG: Thank you.

DR. CARR: Thank you very much, Seth, for doing double duty. We appreciate it.

Now we will hear from Michael Lundberg on state initiatives.

Agenda Item: Performance Measurement and Public Reporting - State Reporting Initiative

MR. LUNDBERG: I am Michael Lundberg. It is a privilege to be here today to speak with you.

While the title has to do with consumer health transparency, actually I want to talk to you about the underpinnings of this -- or PSP. For those of you who have teenage sons or daughters, I am not talking about the game console. What I am really talking about is politics, science, and public reporting. Without all three of those combined, it's very difficult to move forward.

In order to do that, I need to give you a little history about the organization. I would like to talk to you about some of the guiding principles we have had through our years and the things we have done. I would like to talk to you about some of the specific things we do. While I will do a little round robin about some of the things we do with HMOs and others, I want to stay focused on hospital reporting. Then I want to talk a little bit about the direction we are moving in, as well as what we are seeing on the national landscape.

For those of you who received our PowerPoint presentation, it's rather extensive. So while I am going to try to avoid "death by PowerPoint," we will go through this as quickly as we can and hopefully give you a little background and maybe stimulate some questions.

Our mission statement has been unchanged for 14 years. We exist to create health information that helps Virginians -- consumers and businesses -- make more informed health-care purchasing decisions, and to enhance the quality of health-care delivery through the information we provide to hospitals, physicians, and others.

We have been around since 1993, which is something that we are pleased to say to start off with. We are in a political atmosphere. There are all types of things -- not only public reporting, but funding. There is also volatility and a changing landscape. We have been fortunate to have the support of the General Assembly, our health-care stakeholders, and others throughout the process, to help us move forward and do some things that we are proud of and some things that we want to change and improve.

We do work through contracts with the state health commissioner, private contracts, sales and services, and other government agencies, here in D.C., as well as in Richmond and others.

I work for a board of directors. These folks are nominated by their trade associations. I want to point out that that is a very important thing. While you can have gubernatorial nominees that look and smell like a business representative or a hospital representative, it's critical to actually have folks that are nominated by those associations, because the decisions made on this board affect their direction. So this is done in order to honestly get their input and help guide what we do.

When we started out in 1993, we were 100 percent funded by general appropriations, which is a four-letter word that means taxpayer dollars. We were at around 12 percent in the last fiscal year. We will be up a little bit more, because they did provide some funding for transparency, which we have been working towards for a number of years. They did provide some funding for additional information; I will talk about that.

For the balance, provider fees help support something called the EPICS system. It is, I think, the first mandatory hospital, nursing home, and surgical center efficiency and productivity ranking. I have copies of some of those. That is called special dedicated revenue; basically, they pay for the privilege of us using their data. It's a nice position to be in. But it also supports them in many different ways, both through contractual negotiations with the Anthems and the Medicaids of the world, as well as with their own internal quality improvement and performance improvement.

Products and sales I will talk a little bit more about. That has to do with licensing the databases, special services. We work a lot with Anthem Blue Cross/Blue Shield in their multi-state pay-for-performance program, and there are a number of other things that we do.

The information we get:

We have a series of performance measures that we have put out over the years on hospitals: coronary care mortality, 30-day readmissions, and actual/expected length of stay and charges for different service lines. We also have information on obstetrical care. We tend to put these things out in consumer guides, so while we do have reports within those, it's primarily an educational process.

I think there may be some copies of this. We are on our third version of this now; this is the second version. It is hospital- and physician-specific. The first ones were pre-Web. This is a hybrid, in that it was published both on the Web and in hard copy. It has rates of cesarean delivery and length of stay by physician -- about 600 of them -- and by hospital -- about 90 in Virginia. We are in the process of updating that information as we speak.

For nursing facilities: CMS quality measures, which we applaud, efficiency ratings, private dollars per day, Medicaid participation, and profits. For HMOs: quality and satisfaction measures, on which a lot of things are actually being based today.

So we have a number of things. What really guided us back in 1993, when we decided how we were going to decide what to do -- again, we were funded at a rate of $300,000 in 1993. That was the big year. We are today a swollen and bloated staff of six and a half; we are going to seven, and we are working towards that.

But what was important was to have things that would be meaningful to folks, and we translated that as affecting significant portions of the population, which is really why we started with OB as the very first one -- because that is about one in four hospital admissions. Actually, it starts out as one in eight, where the mother is admitted, and one in four where there is a discharge, once you count the newborn. So it depends on how you look at it.

Cost: it should either be a high-cost condition or represent a high total cost burden to society.

Variation: The outcome of interest should actually have variation, or why bother to look at that? If everybody is all the same, then you just say it's great and you pack up.

The ability to adjust for severity has always been a key and driving force. That is particularly true given that we have been limited primarily to administrative claims data, plus the tweaks, I think, that someone was mentioning -- the ability to use the data in different ways. We happen to exclude hospice patients in certain cases. We exclude certain transfers that are very, very high-risk. We expanded the number of secondary diagnoses from nine to 24. All those things are intended to enhance something that was not intended for measuring outcomes, but is being used nonetheless.
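
[Editor's note: a minimal sketch of the kind of exclusion logic described above -- hospice cases and very high-risk transfers dropped before rates are computed. The record fields are hypothetical, not Virginia Health Information's actual specification.]

# Hypothetical discharge records with exclusion flags.
records = [
    {"id": 1, "hospice": True,  "high_risk_transfer": False},
    {"id": 2, "hospice": False, "high_risk_transfer": True},
    {"id": 3, "hospice": False, "high_risk_transfer": False},
]

def include_record(rec):
    # Exclude hospice patients and very high-risk transfers, as described.
    return not rec["hospice"] and not rec["high_risk_transfer"]

print([r["id"] for r in records if include_record(r)])  # [3]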

In developing the report, I really want to stress something that we learned, which is that collaboration is important. There have been instances where people have sat in a room, designed a report, and published it -- and they have done that once. You can usually do that once. But you lose, because you don't have involvement from the folks that are involved with the process, who know the data as well as possible, who have the links to the support that you actually need for this. So collaboration is key.

It takes longer, but it can result in better information, because you have the input of all the different folks. They ask questions that you could never think of. They know this information.

Good science: we tend to work with health-services researchers. I mentioned the very swollen and bloated staff of six and a half; that does not include Ph.D. statisticians or physicians. We tend to contract with folks like that, either at the University of Virginia or Virginia Commonwealth University's Williamson Institute. We find and work with the folks that can lend their expertise, so as to help us make things as good as possible.

Surprises -- just don't have surprises. You can do that by keeping people involved. When you work with physicians, when you work with hospitals, HMOs, nursing homes, whatever, you keep them apprised of the process. It is hard to do this, and it is slower. We are in the process of returning outpatient surgical records to 2,600 physicians in the state of Virginia. Most of them have never heard of this because a lot of those were reported by the hospitals and/or surgical centers. So you have interesting discussions with them when they see this.

But it's important to keep them apprised so they are not surprised. We work with them, and they know where we are going. There are no surprises.

Follow-up: every time we publish cardiac care mortality rates and readmission rates, we will hear from the facilities. It's not usually the ones that look better than expected who call; it's normally the folks that are not as good as they think they should be. There are challenges. They would like medical record listings. They want the statistical properties. They want everything that we have done on this.

Although we publish most of this, we will follow up on each and every one. We will give them the listings; we will give them whatever. We will sit down with our scientists and go over them. We have never had an issue where, in the end, they didn't understand, appreciate, and respect the approach. There may still be some issues, some concerns, but that helps keep everything level.

I started off with collaboration. It's absolutely critical, especially with a small organization. Here is an example of good collaboration -- people sitting down, looking around, working together. There are no surprises here. They are listening. They are paying attention.

Here is an example of collaboration that is not so good. If you don't work with people, they will work over you. So it's really, really critical that when you work with people, you are honest and you are open and sincere. Yes, I have been the cat; I have been chased from the cake.

The data is easy to get hold of today. That is one of the things we hear about administrative claims data: it is relatively easy to get hold of. It's true. But it's not so easy to use. Collaboration is important; you can be guided in the right direction by working with people. I have never had someone not work cooperatively with us, in everything we have ever done, as long as you are honest about what you are going to do and honest about your approach.

Very briefly, Anthem has a pay-for-performance program in multiple states. We have nurses do the medical review. We have Web tools to collect the data; nurses do the evaluation. They used administrative claims data in the past. Now they are doing hybrid approaches, with some primary data collection.

We have a series of consumer publications. Some of these are over on the desk. We tend to go print and then online. Some of the things are just online -- anything from HMOs to hospitals to cardiac care.

I mentioned this earlier, the efficiency and productivity. Essentially, this is not a consumer product. It is designed for large employers who purchase care. It does have contractual allowances, which is another way of saying discount rates. It has the profits. It has charity care and others.

The reason I mention this is that we use some of this in some of our publications.

We rank hospitals in their area by their cost per day. We are adjusting this using the APR-DRG severity index. We have service lines and other things that I will show you, too.

Cardiac care: again, it's an open process. We work with researchers. It took a number of years to get people to accept this process. We are currently working towards expanding to 30-day mortality by linking data from vital records. We use a modified approach to the APR-DRG risk of mortality, and the severity index for the readmissions.

That is about one in seven hospital admissions, so it is significant. Cardiac care -- medical cardiology -- is the single largest service line. There are about 6,000 in Virginia. There are about 860,000 admissions. There are about 30,000 angioplasties. So we are looking at volume here. By wrapping all this under cardiology, cardiac care, we are providing information that is more significant for the population.

Essentially, when you go through there, you pick the service line, the region, and the hospital, and make a report. Essentially, this is a consumer focus: what you see at first is really what the readmission rate is. If you click on "Show detailed view," it will then take you to the actual-to-expected ratio. That's what is reflected in the consumer report.

We also take the medical cardiology and break it down into subgroups that people can understand a little bit more, like heart attack or AMI and other things.

The service line report grew out of what we do for our industry report that has length-of-stay and charge information. Consumers have said that they would like to know what percentage of these cases hospitals do; they are looking for areas in which the volume is significant.

I think we all know that volume isn't always important, but it sure doesn't hurt.

So they have that. That is the consumer version. We put that up because the consumers said they wanted it, which surprised us.

Here is a version from the CD. This actually started out as a 1,200-page report. Now it's 34 pages, and everything is on CD. That is not consumer information, but it's something that researchers and providers and businesses use that takes the information and gives you the length of stay and the charges.

I just wanted to show you the different flavors.

I mentioned briefly outpatient surgery. We collect seven procedure groups that are based on their volume, their cost, their actual and perceived risk, and their likelihood of moving to an outpatient basis -- things like colonoscopy, laparoscopic surgery, facial surgery -- which actually gets you into the retail market of physician services -- as well as liposuction, knee surgery, and others.

A focus group that we held at a private company was asked for the five most important things they would like to know, so we geared the way we flavor this thing based on those things. An example is laparoscopic surgery, which covers a wide variety of procedures. This is written at a sixth-grade level, which is not easy. You can't use the word "abdomen"; you use the word "belly." How many people cringe when they hear that?

The point is, if you get down to those levels for consumers, you are tailoring it to something that is very easy -- very short sentences, few syllables. We then tend to allow people to get more information if they want it.
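
[Editor's note: the transcript does not say how the sixth-grade level is verified; one common check is the Flesch-Kincaid grade formula, sketched below with a crude syllable heuristic.]

import re

def syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    # Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    return 0.39 * len(words) / sentences + 11.8 * syl / len(words) - 15.59

print(round(fk_grade("The doctor looks inside your belly with a small camera."), 1))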

They are interested in the risk. They are interested in how it's done, why it is done, and others, and recovery.

If you were looking for it, we tend to use this flavor. In this case, you would pick information by area, pick your procedure, and then, in this case, physician's office -- I skipped a step just to try to make this a little shorter. This is the type of thing: you would have been presented with a list of physicians that met your criteria. You would pick the one you want, and you could see how many they did in their office and their average charges, compared to the minimum and maximum for other physicians. If they did them in a hospital, it would actually let you know that. Then you could link over to their performance within the hospital, and you could compare the charges with hospital charges. They are very, very different.

I mentioned the EPICS. We have information on contracted discounts. That's the difference between the gross charges, which you typically get in administrative claims, and the payment amount: the gross minus the payment, divided by the gross, is the discount rate. We have that for Medicare, for Medicaid, for all commercial wrapped together, and for all other, which happens to be primarily self-pay. We don't know what United's discount rate is; we know all commercial together. But we separate those categories.
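
[Editor's note: a worked example of the discount-rate arithmetic described above, with made-up numbers chosen to land on the 46 percent figure mentioned below.]

# Hypothetical totals for one payer class across a whole book of business.
gross_charges = 2_500_000.0  # total billed charges
payments = 1_350_000.0       # total amount actually paid

discount_rate = (gross_charges - payments) / gross_charges
print(f"Discount rate: {discount_rate:.0%}")  # 46%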

It's an important addition to what we are doing. It's the first time we have done it. We have been working with hospitals and others about using this information. We just try to do that carefully. But we are trying to get consumers better information.

That discount is based on their entire book of business. We don't really know that their discount rate is 46 percent for an inguinal hernia; we know it for the whole book of business.

When we field-tested this with consumers, we really didn't know what they would think about it. They said this was a lot better -- they understood the retail world. They understood the difference between a car's sticker price and what you pay. If they know the discount, they can do the math and use it. It gives them information to discuss further.

We are working to change our website into a consumer health portal. Right now we serve business, consumers, and providers; we try to do everything for everybody. We have information that is geared to all of those. We are focusing primarily on consumers. We will port the other folks off, not to another site, but to another section. So it will be primarily for consumers.

Consumer interest in price and quality goes well beyond hospitals. A sixth-grade reading level makes the information easy to understand. We integrate information from other sources -- as I mentioned, the contractual allowances, but also the CMS websites and other websites that have been vetted for the different groups that we serve.

Other languages, compliance -- that is another whole deal all by itself, with a lot of effort associated with it.

For those folks who don't have access to the Web -- there are still quite a few of them -- we have had, for a number of years, a toll-free number.

Later this year or early in 2008, we will have another consumer guide to obstetrical care that will have all this and more. It does have years in practice; it has information on board certification for physicians. There is a pretty detailed survey of hospitals on their capabilities -- what they can do as far as taking care of different levels of babies, their educational process, breastfeeding, and others. We are looking to update that. We will also have rates of cesarean delivery, length of stay, and charges.

We are looking towards incorporating some of the AHRQ trauma indicators, which are also very similar to the JCAHO indicators. We are also looking at episiotomy rates to include on this. That would be hospital- and physician-level.

That takes just a little while to do. There are a lot of issues on episiotomy and some of the other ones.

Cardiac care and 30-day mortality -- we are linking that information. We apply the risk-adjustment factor.

We are looking to test independently AHRQ indicators for public reporting. We are waiting for the feedback from the NQF. We are looking for them, hopefully, to do a great job of doing some more work for us.

We are looking for the portal.

The other thing we are very excited about is present-on-admission and lab values. We think those are realistic to be working on. Other clinical information is going to be more difficult, because it's harder to get electronically.

I think you are probably aware that all-payer data, primarily administrative, is available in 48 states. This actually complements an organization that I have worked with in different capacities called the National Association of Health Data Organizations. They represent organizations such as mine, as well as the other groups that do this work.

A number of states produce at least one sort of quality report. Medical error reporting is becoming a big deal -- adverse events, like they have in Minnesota, as well as hospital-acquired infections.

Speaking of that, here is some information on hospital-acquired infection legislation in the states. Those that are in gray are looking to use the CDC NHSN system for reporting; Virginia is one of those. We are also looking towards other things.

This is an example of an AHRQ area-wide indicator on selected infections due to medical care, where a lot of work has actually been done to try to adjust out infections that were brought in from an outside source. Without accurate POA, it's still hard to do.

We can very well say that this hospital treated this number of people with infections, but we really don't know whether they acquired them there. You can do some things with nursing homes and others. POA, with the right training of physicians and hospitals and the right follow-up, will be a good thing.

Certainly, legislating it is a good start, and we will work towards doing that. But if you don't have the training, if you don't have the follow-up, it's not going to be very good.

Here is a little information, simply taking similar patients with and without infections and looking at length of stay -- a fourfold increase in length of stay with infections -- the percent who died, as well as the total charges.

The same thing by hospital. You can see the variation. The red bar shows the statewide rate. You can see hospitals up and down, and you can see their volume.

Postoperative sepsis is another one that has had a lot of attention from the folks at AHRQ working on it. They are doing what they can with what is there. It's showing similar differences.

The fact is, whether or not you want to hang a hospital for this, there need to be ways to control infections before they get into the hospital or before they occur in the hospital. It's clear what happens when people have infections.

Again, just showing hospital variation. We have seen some volume relationships.

Here are things you have all been hearing about today.

The other thing you don't hear about is that the data is actually mandated to be accurate; there are conditions of participation that speak to some of this. I had a business rep tell me that one time. He said, "At least here you have some underlying sense of accuracy," which is interesting. We like that because it supports what we do. And, honestly, there is something to be said for that.

In Virginia, it takes about six months from the close of a discharge to where we have the data up and ready. That lag time is something that bothers people -- it's a lot better than in some states, but worse than in others.

I saw this quote recently. It's clear that moving to clinical data will be like this. A hybrid is a good way to start. I think a lot of people here have been thinking about that, in the audience and elsewhere.

NAHDO has come up with a vision that by 2010, all states would have some form of POA, lab values, and others, and it works very hard with its stakeholders -- sponsoring workshops, discussion groups, and legislative forums -- to help support this.

National standards are important. Very little of what we are seeing started out because of national standards; it started because of individual efforts, with the exception of the UB-92. But the idea is, there is a lot of innovation that takes place in the states. It scares some folks when they hear that someone is going to come and help us by having a national standard for everything. It's good to have standards. But the fact is, you need to have local innovation.

This is intriguing. Those of you who know Dr. Goldbeck know that he was big with the Washington Business Group on Health for many, many years. The statement he made here was actually from 1985 -- about assumptions about waste and variation, the importance of doing things right and having standards, how important volume can be, and how social disparities drive different health-status indicators. This has been found to be true. About 12 years ago, he came back and reminded us about it. It continues to be true.

The simple fact of the matter is that health is something that has been a concern to many for many, many years, including Thomas Jefferson. It's something we need to pay attention to, and it's the reason I came here and you are here, too.

Thank you.

DR. CARR: Thank you very much.

May I ask, this wonderful work -- we didn't talk about what happened when you publicly reported it. Do you have a couple of stories or highlights of what got better with this when it is published?

MR. LUNDBERG: It's always so difficult to say. We know that the 2005 cardiac care overall mortality rate we published dropped 12.7 percent over the last three years. Should we take full credit for that? Of course not. There are so many different things that go into that.

I would fall back on the growing body of research showing that public reporting enhances pay-for-performance. I would say public reporting enhances everything; people want to look better. But I cannot give you a measurable response to that.

I can say that length of stay has dropped since we published that. But can we take credit for that? Of course not. We are not scientists ourselves, so it's very hard to tease this out.

Just the cardiac care stuff itself, beta blocker use and all these other process measures have certainly had an effect.

MS. MCCALL: I know it's difficult to measure, but have you had a chance to look at patterns of information consumption behavior, and what people are looking at, what they are paying attention to, and if those are the things that, in fact, seem to be changing? Is there a relationship at all?

MR. LUNDBERG: I can tell you more about patterns in what people look at. What people love to look at is physician information -- we have physician information on education and years in practice. We have about 1,000 visitors a day, and sometimes 40 to 60 percent of them are looking at physician information.

Right behind that is the hospital information and nursing home information.

MS. MCCALL: Is there a relationship between the specific things, and pages and metrics they are looking at, and that drop in mortality?

MR. LUNDBERG: I can't answer that.

Of course, we see, whenever we have a good press release or something like that, everything goes way up.

What is in the news is what people look at. Physician information is huge; so is nursing facility information.

DR. GREEN: Michael, could you go further with your future slides and what you are anticipating and say a few words about your thinking about public reporting and performance of the insurance industry?

MR. LUNDBERG: Public reporting by the insurance industry?

DR. GREEN: No, of the insurance industry.

MR. LUNDBERG: We are thrilled that NCQA is stimulating PPOs to participate in performance measurement. In Virginia, there are still 1 million people in HMOs. There are 70 measures there: Are you happy with your doc? Are you happy with the HMO?

So we think that the direction towards PPO information is very exciting. We will be embracing that.

DR. GREEN: I'm not talking about PPOs. I'm talking about the rate at which an insurance company pays a claim, the amount of money that they pay compared to another insurance company -- the very same sort of information for public consumption about the payer side of the health-care industry that you are now doing such a nice job of reporting about the provider side. What is your thinking about where that goes?

MR. LUNDBERG: Where we would have what is actually paid? We currently have per member per month, which is as close as we can get right now with the data we have.

Are you talking about provider reimbursement?

DR. GREEN: No, I'm not talking about providers at all. I'm talking about the payers. I'm talking about the performance of Medicare, the performance of Virginia's Medicaid program, the performance of Anthem in Virginia, and what its performance measures are for its members. Are you doing anything about public reporting about that?

MR. LUNDBERG: So United's performance measures for its clients? I'm sorry. I'm having a hard time.

DR. GREEN: I think that's the answer. I got it.

DR. STEINWACHS: I like very much your focus on consumers and trying to provide information that consumers might use. It sounds like you have at least been successful in getting them engaged in your website. I am not surprised that they like to know something about physicians, since it's hard to get any information most places about physicians.

As you look to the future, are there ways to make the information more relevant to consumers? From what little I have done in the area, sometimes consumers -- at least with mental illness -- tell me they would like to know about the treatment or the outcomes for people like themselves, and some way to be able to go into a database, possibly, or to be able to characterize your database: Here are people who have certain sets of conditions or problems. How their outcomes look is different from someone else who doesn't have that when they go into a surgical procedure.

MR. LUNDBERG: By payer right now, for mental health, you can see information, at least for HMOs, on how well they do at medication management and other things, which are actually very important process measures. Crossing those with readmissions and recidivism is something that -- we do readmission rates for mental health care and other things like that.

Does that mean a lot to the consumer now? I don't know.

DR. STEINWACHS: I was trying to push you out to your plan for the future. Do you foresee being able to structure things, certainly with a database, and sometimes structure inquiry, where if I sat down and told you I had schizophrenia and I also had diabetes and congestive heart failure, and my physician has recommended that I have X procedure, what the outcomes would be for other people like me having that procedure? You might not be able to do it by hospital, because the numbers might be too thin. But you can at least go in the aggregate and say, "You are likely to have worse outcomes in certain ways or better outcomes in certain ways."

MR. LUNDBERG: Given that psychosis is the leading mental health reason for admission, I think it would be very easy to show differences in length of stay and things like that, without jumping backwards into the outpatient arena, talking about the different therapies and interventions and combination of medical versus psychiatric.

The only thing we are really doing specifically on mental health right now is developing this psychiatric-bed registry, to help the people who place patients know where there is an empty bed.

DR. CARR: Thank you very much.

Betsy Clough is on the line now. We will present her slides.

Agenda Item: Performance Measurement and Public Reporting - Public Reporting, WCHQ

MS. CLOUGH: Just briefly, the purpose of my presentation and discussion today: I will go over a really quick background about WCHQ, talk a little bit about how we collect and compile the data for public reporting, and how organizations are using the information.

This slide just briefly gives an outline of our mission. We are a voluntary consortium of organizations -- hospitals, physician groups, health systems, health plans, and various employers and purchasers from around the state working together to improve the quality and cost-effectiveness of health care for the state of Wisconsin. We do this by developing and publicly reporting measures of health-care quality.

The four buckets on the bottom half of the slide depict our strategic priorities for 2007 and 2008 -- primarily a focus on performance measurements and reporting, continuing to develop our portfolio of measures, as well as using the data for improvement and then having other stakeholders use it, whether that's consumers, purchasers, or payers.

Just as a reminder, the collaborative was founded in late 2002 by nine health systems from around the state of Wisconsin, with the goal of being transparent, publicly reporting outcomes, as well as improvement.

Since our inception in 2002, we have grown to represent about 40 percent of all of the physicians in the state and 21 hospitals around the state. Many of them are in competing markets. It is our goal by 2010 to have over 75 percent of the primary care physicians in the state represented.

As we were founded, each of the health systems that founded the collaborative brought with them a business partner. The purpose of bringing them into the mix was really to have them aligned with our effort, rather than having multiple initiatives occurring within the state regarding transparency and improvement. We thought that it would be best to get everybody aligned, rather than having separate or competing initiatives. Most of these business partners are thoroughly engaged in our work. Many representatives serve on our various board groups. There are two business partners on our board of directors. They all regularly attend our monthly meetings.

This is just a brief history of WCHQ. I won't go through that.

As I mentioned, the first meeting of the CEOs to create the collaborative was in the fall of 2002. By the fall of the following year, we had released a public report.

As we think about the catalytic sparks that really spurred the development of WCHQ, there are a few things. One was just transparency overall. We knew that hospital reporting was coming sooner or later for us, so we needed to do something about this, to be part of the solution rather than part of the problem. There was also a lot of internal pressure, as well as market pressure, to improve and be transparent. We had our business partners kind of pushing us -- every other industry is transparent, and we know how everyone performs, but we don't know much about health care.

Also during that time, there was a state-mandated physician claims database, where physician groups had to submit data regarding outpatient visits to a claims database.

So that was occurring, as well as simply the vision, that physicians had to know that unless they did something and created it, someone else would do it to them.

I would say that the physician leadership and vision really was key to this. That it was created by physicians and has complete physician engagement has been important. We started small, representing geographically distinct and separate markets, and have grown to represent most of the state of Wisconsin.

As I mentioned earlier, it was about a year from when the first meeting occurred to when our first report was released. It wasn't until about February of 2003 that the CEOs pulled in the quality folks and said, "Okay, we're going to release a report. Go figure it out." Initially what we did was to develop a set of criteria to evaluate which measures to use. The criteria included the feasibility of collecting or harvesting the data, the impact on populations, the potential for improvement on the measures, whether there was clinical evidence, and then the value to various stakeholders, including employers, consumers, and providers.

So we used those as the focal points as we evaluated the measures. We also realized that we had about seven months to get a report ready, so it came down to what data and what information we already had in place that we could look to.

After our release of the first report in the fall of 2003, there was a fair amount of pushback from physicians. They didn't necessarily believe that we were using the best data. So we set about developing a methodology that would represent all physicians, all payers.

Also there was a desire, at that same time, from the medical community, as well as the employer communities, to begin to look at some efficiencies, in addition to effectiveness of care. They didn't really define efficiency. It was a hard thing to define.

As we started developing those measures, it was really about kind of engaging the volunteer army, if you will, the data staff and quality-improvement staff and medical directors and other clinical professionals to begin to develop these measures.

What we found was really important was to make sure that no matter what we were talking about, whether it was the efficiency measures or developing the ambulatory care measurement, we had physicians engaged in the process all the way along the measure selection and development.

This just depicts the public reporting that occurs with WCHQ at the physician group level. The methodology that we developed enables the physician group to harvest the denominator administratively and then really go on a treasure hunt for the numerator, to complete the clinical data. The results are submitted online, using a secure Web-based tool. Groups submit aggregated results at this time, but we are working towards moving to an individual patient-level submission.
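
As a minimal sketch of that hybrid calculation -- the identifiers and the measure here are illustrative, not WCHQ's actual schema -- the arithmetic looks like this:

    # Denominator: patients identified from administrative/claims data.
    # Numerator: patients confirmed via chart review (the "treasure hunt")
    # to have the clinical result of interest (e.g., HbA1c under control).

    def hybrid_rate(admin_denominator_ids, chart_confirmed_ids):
        """Rate = chart-abstracted numerator over administratively harvested denominator."""
        denominator = set(admin_denominator_ids)
        # The numerator can only count patients who are in the denominator.
        numerator = set(chart_confirmed_ids) & denominator
        return len(numerator) / len(denominator) if denominator else None

    # Only the aggregated result is submitted to the secure Web tool:
    rate = hybrid_rate(["p1", "p2", "p3", "p4"], ["p2", "p4"])
    print(f"{rate:.0%}")  # 50%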

At that same time, once data are submitted, everyone goes through a data-validation process to make sure that we are all measuring apples to apples, and results don't go live until that process has been completed.

The next portion of my set of slides is really focused on the improvements that we have seen and the results that we have seen over the last four years.

If you go to WCHQ.org and click on "Reports," you will be taken to the most important part of our website. Here you can view our measures by either the type of provider or clinical topic, and we also have them separated by category.

Just a quick overview for each of the measures that we do display. We publicly report at the system level, when we are talking about a physician group, and then at the hospital level when we are talking about the inpatient side. For each measure, whether we are talking about inpatient or outpatient, we make sure that we are displaying the name of the system, obviously, and then the population for each measure we are talking about.

If you click on the historical data link, you will be able to see historical results for as long as we have had the data.

We made a decision early on to publicly report at the group level. Initially, we just thought that there were simply too many political issues, as well as scientific issues, with reporting at a more granular level publicly. Due to how we measure, groups are able to report internally at the individual provider level. But until we have more empirical evidence about whether or not we should publicly report at an individual level, we have chosen not to do so.

I would say that there really haven't been any complaints or any issues with reporting at the higher level.

This slide just depicts Bellin Medical Group. They are a group based out of Green Bay, Wisconsin that has seen quite dramatic improvements in the share of their diabetes patients whose hemoglobin A1c is under good control. As we have spoken with them and have begun to understand how they are seeing such improvements, it was really about using the data that they have. They used the WCHQ measurement methodology as a framework to begin to really understand their practices around how they are taking care of patients with diabetes. Rather than just measuring for measurement's sake, they are really studying it, understanding why these patients didn't have a hemoglobin A1c test and why those who did weren't under control. They are feeding those results back to the individual providers.

Another example of a dramatic improvement in diabetic care is Advanced Healthcare. For them, the story really starts even before the results went public. This really started when they decided to participate in the collaborative. It's about a strong commitment to transparency and also a strong commitment to improvement. They really started to align their board, their leadership, and their quality-improvement staff, as well as their medical staff, around this transparency and improvement.

Similarly to what Bellin has done, they have also used the WCHQ measures as a framework. Instead of kind of floundering around saying, "Where do we start? What measures do we use? What disease do we start on," they have said, "This is what we have decided with WCHQ, and here's the list," diabetes, hypertension, preventive cancer screening measures.

Regarding diabetes, they simply used the data to, number one, build a registry using the WCHQ methodology, and then are publishing internally the comparative reports for each physician and then are starting to prioritize follow-up lists for patients. They have developed patient-notification processes.

What also developed as a result of this was a way for them to better understand their data, understand how to better use the EHR for both collecting and reporting data, and then also the issues around documentation by providers.

We will release another round of results for our diabetes measures in the fall. I am quite confident that we will continue to see an improvement in the results that we are reporting.

This slide just depicts the population focus. A really big, important focus for us is the impact that we are having on the overall population of Wisconsin and those patients being treated by providers. For every measure that we report on the ambulatory side, we give population results.

If you click on the historical link, what you will see is the improvement that has been made over the last three years. When you look at the entire WCHQ population, I must say, it's quite impressive. Even with the fact that we have continued to add physician groups and providers, we see an improvement.

This is just another example of overall population improvements.

This slide just depicts the pneumonia composite score summary. I just want to talk for a minute about the work that we have done regarding efficiency measurements on the hospital side. As I mentioned earlier, there was a desire by many of our stakeholders to begin to understand not only how effective we were at taking care of patients, but also how efficient. So we convened a workgroup with a lot of different stakeholders. They met for over a year, trying to decide how we would measure and then subsequently report the information publicly. We looked at EPCs and different risk-adjustment models.

What we concluded was that, due to the fact that we are just physician groups and hospitals, we are missing some data. Where we landed was what we refer to as our first attempt at an efficiency measure.

What we start with, as a first step, is the Joint Commission's measures for different conditions. We report for three conditions: congestive heart failure, pneumonia, and heart attack.

Using the composite score methodology developed by Premier, we calculated a composite score for each participating hospital. What is shown on this slide are the measures that make up the pneumonia composite score.

We partner with MetaStar, our state's QIO. They actually harvest the measure results for each hospital. They submit those to WCHQ, and then a composite score is calculated in our database automatically.

The second step is to harvest the length of stay and charges. A business partner does that for us in a similar way. They have access to our state database, and so they harvest that information. Then the results are risk-adjusted by one of our business partners using APR-DRG risk-adjustment methodology.

Both sets of data are then combined to create a dot for each hospital. I would also add that both pieces of data are validated. We rely on the core measures to serve as our validation. We also do validate and audit the process that our business partner uses to harvest and risk-adjust the length of stay and charges.

On this slide what you see is the quadrant for pneumonia and the resulting dot, if you will, for each hospital, their composite score plotted against the risk-adjusted length of stay.
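
As a rough illustration of how each dot might be computed -- the exact Premier methodology isn't spelled out in this testimony, so this assumes a common "opportunity model" composite (passed opportunities over total opportunities across the component measures), and the numbers are invented:

    # Hypothetical pneumonia component results for one hospital:
    # (numerator = times the care process was done, denominator = opportunities)
    components = {
        "oxygenation assessment": (120, 120),
        "initial antibiotic timing": (95, 110),
        "pneumococcal vaccination": (80, 105),
    }

    passed = sum(num for num, _ in components.values())
    opportunities = sum(den for _, den in components.values())
    composite = 100 * passed / opportunities  # opportunity-model composite score

    # Risk-adjusted length of stay is supplied separately (APR-DRG adjusted
    # by the business partner); here it is just a placeholder value.
    risk_adjusted_los = 4.2

    # The hospital's "dot" on the quadrant plot is this pair:
    print(f"composite={composite:.1f}, risk-adjusted LOS={risk_adjusted_los}")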

Because we don't have a good way to display the results historically, I have shown the improvement for one hospital, Gundersen Lutheran, the improvements that they have seen from 2004 through 2006. Really, what happened for this particular hospital -- on their composite score, they were in the low 60s, and they said, "This isn't acceptable." So they convened a multidisciplinary team and really began to understand the processes around taking care of a patient with pneumonia when he or she comes to the hospital. By understanding the data, they realized that not much was standardized. They implemented changes and were able to show improvements.

This group continues to meet monthly to evaluate outliers and patients that don't meet the criteria, and then they work on various improvements.

This is just another example of the quadrant, for heart failure.

Then an example for heart attack as well.

This next set of slides just kind of wraps up and talks a little bit about some lessons learned that we have been reporting over the last four years.

I would say our lessons learned are in a couple of categories, one being about the data and the information that we are reporting:

• Data must be equally available and accessible; if we are already capturing it for something else, or if there is a way to tap into information that is sitting in a state database, we should get it from there.

• Whatever we are going to be reporting must be supported by sufficient evidence.

• Similarly, if we are going to report something, it must be something that can be improved upon.

• It's also important that we are thinking about different audiences when we are publicly reporting, so if they read it, they can interpret it and get some meaning out of what they are seeing without significant explanation.

I have said a couple of times that multi-stakeholder involvement and buy-in is key. Obviously, physician engagement has been important for us, as we have moved throughout the last four years of our evolution.

One important thing that we learned, probably the hard way, regarding data that we are reporting is that before anything goes live on a website or in a paper report format, it's incredibly important that physicians have the ability to see the results in context with everybody else's. Otherwise, if they haven't seen it and they thought they were on the top and they are really on the bottom, we get nasty phone calls. We don't want that to happen.

There are lots of issues with display that I think we have had to work through. Nomenclature is important, that we are defining what we mean by efficiency, what we mean by charges or the costs. There are lots of issues there.

The most important takeaway message here is really about making sure that we have credible, reliable data. If we don't have that, then we lose the engagement of all of the different stakeholders.

Lastly, just a constant reminder of vision, that we are doing this not just for measurement's sake, but for improving the quality of care that we provide to our patients.

Just briefly, some plans we have for 2007:

We are continuing to refine and further reengineer our audit and validation process. That is kind of ongoing, and we continue to work on that.

We continue to expand our measures portfolio. We are currently working on measures for CAD. Then we will move on to asthma and depression on the physician group side. On the hospital side, we are trying to align with what our hospital association in the state of Wisconsin is doing in terms of measuring and reporting.

Obviously, we will continue to update measures.

We are working really hard to develop a formal quality-improvement infrastructure. What has happened so far has really been more organic, if you will. We have had some quality-improvement workgroups kind of formed on the side. We have fostered networking and collaborating, but we haven't focused on projects. So it's our goal to figure that out this year.

We do have plans for a number of research projects to be implemented around this idea of a regional coalition, as well as looking at ways to begin to understand the physician-level reporting piece.

In summary, I would say that the work we are doing is really all about improvement and making sure that we have all of the right stakeholders engaged, particularly the physician community, and making sure we are all about collaboration rather than competition.

With that, I would be happy to answer questions.

DR. CARR: Thank you very much.

We are running behind. I think what we are going to do is move on to Dr. Yandell and invite you to stay on, if that's possible.

MS. CLOUGH: That's fine.

DR. CARR: Great.

Agenda Item: Performance Measurement and Public Reporting - Public Reporting, Norton Healthcare

DR. YANDELL: I'm Ben Yandell. I work at Norton Healthcare in Louisville, Kentucky.

I want to talk to you a little bit about some work we have done in public reporting. Every now and then, I have to sort of stop and think about the path I have walked to where I am. Those of you who have worked in hospital settings probably have an experience somewhat like mine in quality, where you go to a quality meeting and something gets passed around -- maybe the copies are numbered -- and at the end of the meeting, you pull the copies back up and you make sure you have accounted for them all. That's the world I grew up in, in quality.

I see a lot of smiles. You guys have lived it.

I knew things had changed when I was starting to work on our public report. Toward the end of 2004, we decided that if we knew all this information about clinical quality, about ourselves, the public that we talk about all the time had the right to know it, too. Without doing collaboration and without doing extensive work with stakeholders, we basically told a couple of key folks we were going to do this. We contacted the local newspaper and told them we were going to do it, which then committed us to do it when we started to get cold feet, along about January of 2005.

I knew the world had changed when I had pulled together into a report all the indicators I could find that claimed to be saying something about clinical quality, about our hospitals. I shared that internally. I made one change to that report before it went public. I took the words "proprietary" and "confidential" off the piece of paper.

For me, that was a defining moment, to suddenly realize that I am in a very, very different world than I was in 20 minutes ago.

With that said, what I want to do is tell you a little bit about what we did, first of all, so that you have a feeling about that. I think this is a report from maybe August 2008 that I am giving you, because I think we are a little ahead of the curve; that's about how far ahead of the curve I think we are.

I believe we have learned a little bit about trying to live with these measures. We don't invent indicators. In fact, it's a rule of ours that we do not invent indicators. I cannot tell you how much grief that rule has saved me: "I agree with you, it's stupid. It's not my rule. But the conversation is over."

What I want to do is start by acknowledging some folks whose primary role in this, I guess, was, when everybody was telling them this was a dumb idea, to say, "Okay, do it anyway." One of those is Steve Williams, who is the president and CEO of Norton Healthcare. He has been interested in quality in the hospital setting since the late 1980s. He had the courage and the leadership to say, "I know nobody else is doing this. I still think it's the right thing to do. Let's do it." I just want to acknowledge that, because all the fun I have had since then would not have occurred if he had not said okay.

Bob Goodin is a physician and the chairman at the time of our board of trustees and the quality committee. I said we did this without collaboration. We didn't do it without approval. These are the folks that said, "Yes, do this." Even when the going got a little bit tough, as we got closer and closer to our launch in March of 2005, they still said, "Do this."

In particular, I want to recognize the work of Dan Varga, who was a physician. I think I probably learned more about leadership in working with him than practically anybody I have worked with in a long career in health care. It comes down basically to deciding to do the right thing because it's the right thing to do, not because everybody else thinks it's a good idea and so forth.

With that said, a quick background on Norton Healthcare. We are tiny. We are a little hospital system in Louisville, Kentucky. We currently have three adult hospitals. We are building a fourth. We have Kentucky's only designated children's hospital.

By the way, I haven't heard it said today, so I'm going to say it while I'm thinking about it: As bad as all of the work is in the world of adults, it's terrible in the world of pediatrics -- in fact, shamefully terrible. The work that has not been done in developing indicators and doing background work has to get fixed, for a population that everybody agrees we need to be paying attention to.

We have both owned physician practices and an independent medical staff.

What we did: We were determined to publish an objective evaluation of our performance and make it public. We initially went public with about 200 quality indicators. By the way, I don't know how to count some of these. Is that one indicator or six? I don't know.

Being conservative, we are currently putting data out on about 400 quality indicators. We are in the process of adding some others. I will tell you why so many.

I guess the thing that surprised people so much was that nobody made us do this. We did this voluntarily. Part of the reason we did it was some of the same frustration that I have heard around the room today. Let's get on with it. We have been talking about this forever. We thought, what can we do? The one thing we can do is, with the part of the world that we control, which is our data and what we do with it, why don't we go on and do our part to move this agenda along?

By the way, one of the things we have managed to do is to be kind of an object lesson for doing this kind of work: We are still in business after having done this. We were told this was going to be a field day for plaintiff's attorneys, that we were putting this out there, that this was a terrible thing to do and so forth. We are still in business.

Another thing is, I know how Neil Armstrong feels. I have been in meetings about this topic where some speaker said, not knowing I was in the audience, "If little Norton Healthcare in Louisville, Kentucky can put all these indicators out in the public, don't tell me that the logistics of this are such a barrier that you can't do it." It used to be, "If we can put a man on the moon"; now it's, "If Norton Healthcare can put quality indicators out in public." Okay, I may have delusions of grandeur.

Anyway, what does it look like? If you go to nortonhealthcare.com, you get our flash page. One of the things I am really proud of is that we have real estate on our home page. For those of you who have ever lived in the world of Web, to have any real estate on a home page is a pretty impressive thing. We live there permanently. You can always get straight to us from our home page just by clicking on "Quality Report."

When you click on it, what you come up with is a page that is obviously designed by a statistician, not by a graphic artist, which is a list of all of the different areas that we publish things about.

I remember when we were trying to convince the local newspaper that what we were about to do was something interesting. We actually, with the reporter in the room, called the National Quality Forum, for the first time ever, and said, "We're about to publish every single one of your quality indicators."

There was this long pause, and the person said, "Every one of them?"

I said, "Yes."

He said, "Just a minute." I heard shuffling paper. "So you're going to do" -- and he literally went through every single one of their groups -- "you're going to do every one of the hospital consensus standards."

"Yes."

"Every one of the cardiac surgery."

"Yes."

"Nursing-sensitive?"

"Yes."

He just worked down the list, and we said we were going to do every one of them. He was just flabbergasted. That was the first time I actually saw the reporter believe we were doing something other than a PR kind of thing, that we were about to turn this thing loose.

So we do all the NQF stuff. We do the AHRQ quality indicators. CDC doesn't have a lot of guidance about what to do in infection control, but they do have a position paper about what states should do. We took that position paper and tried to do what they had to say about that.

In the world of pediatrics, I was desperate to do something. We have a children's hospital. So we put some ORYX indicators out there, and so forth.

We ended up adding patient satisfaction. We actually have our antibiogram, our antibiotics susceptibility chart, on our public website, which is kind of interesting.

By the way, CDC really does say to do that. I can show you where they say to do that.

Patient satisfaction is now out there. We put our balance sheet out there for the public to see. We do the ambulatory indicators, cancer survival rates -- you get the idea.

Basically, what has changed in the time that we have done this, starting in 2005 to the present -- when we started, the conversation was, why would we put that out there? The conversation very quickly became, why wouldn't we put that out there?

It's very interesting. After I launched the website, the first phone calls I got from affected parties at Norton Healthcare were not, "You idiot, what have you done?" They were, "Where am I? I can't find myself in the report," which was fascinating to me. I thought I was going to get the phone calls, "How could you put this out there." "I'm trying to market this service line. Why would you put something out there like this?"

Just to let you know, a lot of the doomsday scenarios that you hear about public reporting are not true.

What does it look like? This is a cardiovascular procedure page. We report our STS data. We report our ACC data. We are now members of NDNQI on the nursing-sensitive stuff. We report that.

It's interesting, by the way, to be where we are. One of the things you run into with this is finding something to compare yourself to. That's tough. The second thing that's tough is being allowed to tell anybody else that you have something to compare yourself to. A lot of the databases do not allow you to publish anything out of their database. You are allowed sometimes, with a lot of coercion -- I have had some unusual phone conversations with owners of databases: "Can I at least put my data out there? If I don't put anybody else's, can I put my data out?" "I don't know. I'll have to talk to our attorney." It's my data. Why do I have to ask permission?

What we can't do is display anybody else's. I can show you that we are red or green, but I can't tell you why and you can't audit my books. That really bothers me, that I am not allowed to show you the national average.

There are a couple of things that I want to point out that I like about what we do. We use some interesting words, like "better" and "worse." "Worse," I think, is an interesting word for a hospital system to use about itself, but we do. When we are significantly worse than the national average, we say so. In general, what we have tried to do with this report is be blunt and not pretty it up and not put spin on it, but -- Joe Friday -- just the facts. Put it out there and show people.

A couple of things I am proud of. You are looking at a page that takes you two clicks to get to. Some folks who publish their data -- I felt like I needed a machete to get through all of the marketing, to get to any actual data. Sometimes I gave up and went to the next website.

It takes two clicks.

The other thing is, it's data; it's not text. I am not telling you what you are seeing; I am just showing it to you. I think that's valuable.

There is text. There are questions that people need to have answers to. We put that in pop-ups. If I click on something and I want to know the definition of it, for example, it tells the definition. We have tried to divide these definitions into a relatively publicly oriented description and a technical definition.

Why would you do that on a public website? I just want to use this chance: Public reporting isn't just about the public. Our own staff did not know these statistics until we made them public. I actually consider the public our third audience. Our first audience is our own staff. Our second audience is our medical staff, who also did not know these numbers until we went public. The third audience is the public.

I get challenged a lot by somebody who is looking at this for the first time, who says, "How much does the public really care about this anyway?" I don't know. Do they care if they live or die? Do they care if they get an infection or not? If they do, then they care that we are doing this, whether they ever look at the report or not. I do think that's what public reporting does for you. It moves the agenda along. It gives it a kind of urgency that it doesn't have if you are not public.

By the way, one of the things that people struggle with -- I struggle with it -- composite versus a bundle versus an index. How on earth do you combine indicators? I get told, usually in the same sentence, "That's way too many notes. Four hundred indicators is way too many. And, oh, by the way, I can't find what I'm looking for. You don't have any indicators about it." And that's usually in the same sentence.

One of the things that is interesting, if you do what we have done, where you have this confusing and terrible way of presenting data to the public, which is this matrix -- if you traffic-light it and it's red and green, you can kind of blur your eyes and not even read the words, and it begins to create a kind of composite, a kind of bundle score. If you flip around our website a little bit, it's interesting.

I also want to point out the bottom line down there. Because as soon as we have data and we trust it, we make it public; we are already out ahead in releasing our 30-day mortality. We happen to have a single provider number. Unfortunately, I can't separate the data by our hospitals on this, which is kind of frustrating to me, because when CMS does its analysis, it does it by provider number. So we are all one group. So when the CMS report comes out, I am told that you are not going to be able to tell individual hospital performances, but in our report you can.

We have developed some principles about the report as we have gone along. I talked with our quality committee of the board about this. I think that you have to have principles something like this, or what you are doing is advertising; it's not transparency. The principles are these sorts of principles:

For example, we don't decide what to make public on the basis of how it makes us look. I actually have a standing order from the board of trustees: If the National Quality Forum endorses an indicator, I have a mandate to measure it, get the data right, and get it on our website. I am not supposed to ask anybody, including the board of trustees. That's the kind of commitment that, to me, is about transparency as opposed to bragging about how good you are. We have been doing this for a little over two years now. We update it at least once a month. We don't know yet some of the things that are going to be coming out, and we are already committed: When they do, we are going to publish them.

We give equal prominence to good and bad results. I have warned the board. One of the things that is frustrating to people who don't understand significance testing is to be really, really good at something and not be significantly different from the national average. That's very frustrating. How can I not be significantly different? I didn't have any deaths. I had no deaths. How could I not be significantly different? Well, it's a really rare thing to have deaths in this particular area.
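
His zero-deaths point is easy to check with an exact binomial calculation. As a minimal sketch -- the case count and national rate here are made up for illustration, not Norton's figures:

    # Probability of observing zero deaths purely by chance if a hospital's
    # true mortality equals the national rate -- standard library only:
    def prob_zero_events(n_cases, national_rate):
        return (1 - national_rate) ** n_cases

    p = prob_zero_events(25, 0.02)
    print(f"P(0 deaths | national rate) = {p:.2f}")  # ~0.60

    # A one-sided p-value of ~0.60 is nowhere near significance: with so few
    # cases and such a rare event, zero deaths is close to the expected
    # outcome, so the hospital cannot be declared significantly better
    # than the national average.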

So we now give them a quality ribbon. If you are the best possible outcome -- if 100 percent of the time you get the right medication to somebody, or 0 percent of the time you get an infection -- we give them a quality ribbon.

I have warned them that this second bullet point about equal prominence means that if we ever kill everybody or fail to give anybody the right drug, there is going to be a little bomb or something like that in the report.

You see the other principles. They come down to not picking and choosing. I get a lot of phone calls from other hospitals about trying to do a public report. A very common first question is, "How do I pick which indicators to report?" I say, "You know what? As soon as you start picking indicators, you are in danger that you are into the world of marketing and advertising, not in the world of transparency."

You are going to find that the indicators that don't make you look good are also the ones that aren't that valid. That's why, even though there are so many of these indicators that we don't agree with, that we do find fault with the definition of, we report them anyway.

That's how we ended up with a big report. People ask me sometimes how this report got so big. Because we report whole lists of things. We don't pick and choose. That's why it's big. It's big to be unbiased, not because I really like a big, long report -- although, by the way, I do like a big, long report, and we are going to make it a little bigger.

You can't read this. You don't have to. This is a list of SPC charts, statistical process control charts, that are part of an internal report. We have spent most of our time since we launched the public website on internal reporting, which is interesting to me. Supposedly, the report is too big, and that's one of the issues with it for the public, and, mainly, it's nowhere near big enough to do the internal work that you have.

So what we routinely do on all of the indicators is statistical process control charts, a patient listing. It's not a special report. It's available every month. It's a patient listing of everybody who hit the numerator of the indicator. We always break things down by physicians. We are about to launch a physician intranet site so that every physician can get to their own data on everything that is in our quality report.
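
For rate-based indicators, the standard SPC tool is a p-chart; as a rough illustration of the three-sigma limits involved (the transcript doesn't specify Norton's exact chart types, so treat this as a generic sketch with invented numbers):

    # p-chart: plot each month's proportion against control limits derived
    # from the overall mean proportion (p-bar) and that month's sample size.
    from math import sqrt

    def p_chart_limits(p_bar, n):
        sigma = sqrt(p_bar * (1 - p_bar) / n)
        lower = max(0.0, p_bar - 3 * sigma)
        upper = min(1.0, p_bar + 3 * sigma)
        return lower, upper

    monthly = [(0.92, 48), (0.88, 52), (0.95, 40)]  # (proportion compliant, cases)
    p_bar = sum(p * n for p, n in monthly) / sum(n for _, n in monthly)
    for p, n in monthly:
        lo, hi = p_chart_limits(p_bar, n)
        flag = "in control" if lo <= p <= hi else "out of control"
        print(f"p={p:.2f}, limits=({lo:.2f}, {hi:.2f}) -> {flag}")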

I also get asked, has this made any difference? What you are looking at is a slide that is not from our quality report. I am trying to condense it even more. You see a lot of red on this slide. Remember, red means we are significantly worse than the national average. This slide is our data from the last half of 2003 compared to 2005-2006 statistics. I am about to show you what it looks like now, and I want to make it apples to apples.

There was actually a version of this before this that I call "the Bloody Mary slide." We actually started collecting these data in the middle of 2002, and our data quality was just awful. Most of the changes from that data collection to the slide that you see now -- it does have a spattering of green -- were not changes in care, but changes in data quality. I want to show you things that I think might actually have something to do with what happens to patients, not just what happened on the database somewhere.

So this is what it looked like then. Watch this. I think this is very cool. It's mostly green. That is 2005. I will tell you -- not because they turned green, but because I know what we did -- that's real. We really did change what happened with patients, at least in part, because the data were public.

I had the experience of sending out reports before and after the data were public. I will tell you, sending the exact same report to the exact same set of managers, it was a very different reaction in the world of "it's not public" versus "it is." In the "it's not public," it's, "Let me get this straight, Ben. You're the only one looking at this?" "Yes." "Thanks. Nice report."

This is the last half of 2005.

Now I want to show you both good news and bad news. We have maintained, though we have not improved since 2005. You see a little bit of random variation, but it's essentially the same thing. I liken this to being on a diet and trying to lose the last five pounds. With a lot of this stuff, that's where we are. We have done the stuff that -- "you moron, you didn't have something in place to accomplish this?" So we put that thing in place and we get better. What we are working on now is tougher stuff.

By the way, we still struggle with making this stuff stay that way. Our folks have kind of gotten used to public reporting and the Hawthorne effect that comes from the initial public report. They are used to it now. In fact, they are more surprised when something is not public.

Limitations: Obviously, because this is just our self-report, this is not a model that we can use for quality improvement for the whole country. I don't think every hospital putting out its own personally developed public quality report is the way to go. I wouldn't advocate that.

One of the things that I do want to point out is that we compare ourselves to the state of Kentucky and to the United States in our report. What we don't do is compare ourselves to competitors, because I am not going to decide for them to publish their data, obviously. But that obviously limits the usefulness of the report, if what you are trying to do is comparative shopping.

By the way, we seem to keep forgetting that the risk-adjustment methods that we use do not allow hospital-to-hospital comparisons. We keep forgetting that. At their root, they are indirect risk adjustment, which means we are all being adjusted to our own personal population of patients. If I adjust to a different population from you, I can't compare my results directly to yours. So rank orderings and the things we want to do with these -- we are actually a little outside of what the science says you ought to do.
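
To make the indirect-adjustment point concrete, here is a minimal sketch with invented strata and rates: each hospital's observed events are divided by the events expected for its own case mix, so the resulting ratios are anchored to different populations and don't support head-to-head ranking.

    # Indirect standardization: expected events = sum over risk strata of
    # (reference rate in stratum) x (this hospital's patients in stratum).
    reference_rates = {"low": 0.01, "medium": 0.05, "high": 0.15}

    def o_to_e(observed, case_mix):
        expected = sum(reference_rates[s] * n for s, n in case_mix.items())
        return observed / expected

    # Hospital A treats mostly low-risk patients; Hospital B mostly high-risk.
    ratio_a = o_to_e(observed=6, case_mix={"low": 300, "medium": 50, "high": 10})
    ratio_b = o_to_e(observed=30, case_mix={"low": 50, "medium": 50, "high": 200})

    # Each ratio says how a hospital did against its *own* expected count;
    # ranking A directly against B compares apples to oranges.
    print(round(ratio_a, 2), round(ratio_b, 2))  # ~0.86 vs ~0.91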

Some quick thoughts about some concerns: I think these are the wrong indicators. We have 400 of them, and I think probably 380 of them won't be here five or 10 years from now. They are mostly the wrong ones because we spend all of our time trying to define the indicator into nonexistence. It's like, "But you haven't thought about this. We need to exclude this. It's not 100 percent yet. It's not zero yet. We obviously have some definitional work to do. We know we have hit nirvana when it becomes like transfusion reactions, or 'left surgical instrument unintentionally in a patient,' or we gave an aspirin to somebody who shows up at the hospital with a heart attack. Those are great indicators, because they are zero or 100 percent."

No, they are not. Those are lousy indicators. They are lousy indicators. They don't distinguish among anybody. You might as well say, "How many fingers does your surgeon have on his right hand," as an indicator.

You need indicators that are much closer to 50 percent. That's a good indicator. The things that we currently call known complications of care -- when I hear that, I don't hear, "Exclude it." What I hear is, wow, that will make a great indicator, because that is where the quality frontier is. That's where the difference is between a decent hospital and a great hospital, managing those things that are, quote, known complications of a procedure.

I don't know how much we can trust these definitions yet. A lot has already been said about the loose definitions. I will tell you, as the guy who is trying to live by them, in a very obsessive-compulsive way, you just go nuts trying to figure out what they really mean here. It's very hard sometimes to know who is in, who is out, what to measure.

Incidentally, at the root of it all is physician documentation, about which there are precious few standards.

I want to say that I think we have the wrong mental model for a lot of what we are doing right now. We think that what we are doing is building a comparative shopping guide for consumers. Someday, yes, that will be wonderful. We are not ready. Do you really want consumers deciding not to go to a hospital today on the indicators that we have today? I am not sure they should be using them that way. We tell them not to, in the very first page of our report. I don't think it's ready for that yet.

I do think it's very important to think about a model like Consumer Reports. You do not have to ever have read Consumer Reports to be able to buy a better microwave, because Consumer Reports exists. You drive a safer automobile whether you have ever read an automobile crash test result in your life or not. To me, that's the right model to think of. Yes, publicly reported, a lot of attention to the science behind it, but not, "Kill it because the public doesn't understand it." The public doesn't understand crash test results, but the public benefits. I think that is the key question.

By the way, how about the people who actually deliver the care? Do they have the data, the feedback loop, to tell them it's working or it's not working?

One last thing. Some of the things that we are all worried about -- everybody worries about the unintended side effects of this stuff, "teaching to the test," if you want to call it that. If I only measure these six things, which, by the way, I think are relatively trivial, does that mean I now ignore all the things that are really, really important? We haven't found that. I would say our performance-improvement efforts at the hospital are probably only -- maybe a fourth of them were driven by the quality report and three-fourths of them were driven by the same things they were always driven by: We think we can do a better job clinically on this, so let's get to work on it.

I think we all worry about how real this is. Are we measuring real quality yet or are we just improving the indicators? I think that's a legitimate concern. My gut -- but I have no data to offer -- tells me it's both. We are both doing better data -- some of which isn't trivial data, by the way. Some of the important improvements in capturing the core measures were capturing contraindications. We act like that is just a data improvement. That's an improvement in what is in the chart about that patient for the next caregiver who encounters that patient. That is not just data improvement. That's quality improvement.

I guess the thing I worry about the most -- and I will stop with this -- is that the problems that we see in this will make us kill it too early. I think this has incredible promise. It has already shown some early returns that I find very promising, in terms of informing the public, informing the people who deliver care, and improving care. I don't want things like a concern about administrative burden or that sort of thing -- or the science isn't quite there yet -- I don't want to kill it too early. That's my biggest concern about all this stuff.

I am very optimistic about what we have done so far. I think it won't look anything like this a few years from now, because it's so embryonic. But I think it's really, really important to stay on the path we are on.

I would be happy to answer any questions.

DR. CARR: Thanks. That was inspiring, as well as informative. I have to say that because in my institution, Beth Israel Deaconess, our CEO led the charge by beginning a blog called "Running a Hospital." On any given day, anybody's outcome, project, or initiative might appear. So we start the day reading that.

But, in fact, last Friday, we also just went to our report. It is empowering and it changes the culture.

A question I have is, although I know you don't want to be concerned about the administrative burden, could you say a little bit about what it takes to get all this done?

DR. YANDELL: Sure. It's a whole lot easier than it seems. It really is. I won't say that it doesn't take some money and staff, because it does. We launched our initial report -- we launched it and we maintain it -- with probably two to three FTEs. It's hard to quantify that, because this isn't anybody's only job. Everybody involved in this is doing something else.

The reason for that is, most of this we were already doing. It's either administrative data, like the AHRQ stuff, or it's core measures that you had to do if you were a Joint Commission hospital, or it's infection-control data that your infection-control nurses are already collecting, and so forth. So most of it was data that we already had, and what we needed was somebody to put it into some sort of shape and put it out there.

We literally did this in a giant Excel spreadsheet. That's how we launched this.

DR. CARR: Carol?

MS. MCCALL: Absolutely fabulous. I have not so much a commentary on the specific content, but a question for you generally. You have three wishes for things that we might be able to help with, to help you further your cause, whatever that is. What are they?

DR. YANDELL: Number one is to keep the pressure up. If there isn't the external pressure that we have to do this, "we" being hospitals -- and other folks, too -- then we will quit. So whatever it takes to keep that pressure going, I think that is absolutely number one on the list.

The second thing, I guess, is all the things that you have heard about already in terms of very clear specifications, trying to get things aligned, so that I am not having to keep a slightly different variation on this analysis, because it's organization A versus organization B. That's an unnecessary addition to the administrative overhead.

If I have a third one, I guess it's to help get the message out that this does not have a monolithic purpose, that this creates all sorts of benefits beyond public reporting, beyond pay-for-performance. Keep always tying it back, which, I know, anybody clinical does, to the fact that a real human being got a beta blocker that they might not have gotten otherwise and had a better clinical course because they got it, because some bureaucrat somewhere said, "I think you ought to have to report on whether or not you give people beta blockers."

It doesn't have to be that the public understands it. It doesn't have to be that physicians buy in to 100 percent of it. It creates databases that people live off of. It has so many desirable side effects that I think it's important not to come up with any criterion to judge it against and say thumbs up or thumbs down based on that one criterion.

MS. MCCALL: I also heard you say in there to personalize it, so it's the Consumer Reports. You may have been in a car wreck, but it was yesterday, not 20 years ago, and you walked away and you still celebrated Father's Day.

DR. YANDELL: Right. Here is what I would love to do with ours, for example. In fact, I am doing some work right now to try to figure out how to do this. I would love to create a front end that is essentially a natural-language search, à la Google, where I come to the website and I don't have to have Ben Yandell's vision of how you display quality. I type in "knee surgery" and up comes anything relevant I have on the website about knee surgery. Better yet, I then start asking you some questions that you can answer, or not: How old are you? What kind of knee surgery? What have you been told? Here are some drop-downs about it. By the way, do you happen to be diabetic?

That all sits on top of a database that then gives back to the person, "You know what? At our place, you have a 50-50 chance that you are not going to be any better off six months from now, with your particular set of" -- things that are helpful in the clinical decision-making process. We are a long way away from doing that.

But we have an internal agreement to try. In case you want to know where we are trying to go next, that's where we would like to go with it. But, my gosh, do we have some work to do to get from here to there.

By the way, just so you know, we do not display and don't intend to display anytime soon physician-specific data. That is not just because we are wimps. That is because when we try to even produce internal reports that are physician-specific, the logistics of that are just maddening. It has nothing to do with politics.

DR. CARR: Thank you very much.

Our next two speakers are going to tell us about nurse-sensitive measures. We will start back at 3:50.

(Brief recess)

DR. CARR: We are ready to reconvene. Isis, please.

Agenda Item: Performance Measurement and Public Reporting - NDNQI

MS. MONTALVO: Good afternoon, Madame Chair and members of the Quality Workgroup. It's a pleasure to be here to be able to share with you the rest of the story regarding acute-care setting and measures that affect patient outcome.

I am Isis Montalvo, a registered nurse and manager of Nursing Practice & Policy at the American Nurses' Association in Silver Spring, Maryland. I provide oversight of the National Database of Nursing Quality Indicators, NDNQI, which collects and reports on nursing-sensitive measures.

Thank you for the opportunity to share our experience that we have had over the last nine years in collecting and reporting nursing-sensitive measures nationally.

The ANA is the only full-service professional organization representing the interests of the nation's 2.9 million registered nurses. Our members include RNs working and teaching in every health-care sector across the entire U.S.

The ANA's work in nursing quality measurement really predates the buzzwords that we hear frequently in this day and age regarding quality and performance improvement. In 1994, ANA launched the Patient Safety and Quality Initiative to evaluate and explore linkages between nursing care and patient outcomes. ANA fully funded the multiple pilot studies that were done across the United States, in seven states, to evaluate those linkages and subsequent nursing-sensitive measures. Multiple publications were generated as a result of this work. The Nursing Care Report Card for Acute Care proposed 21 measures of hospital performance with an established or theoretical link to the availability and quality of nursing services in acute-care settings. A final 10 measures were recommended as being nursing-sensitive.

In 1998, ANA established NDNQI, which currently is administered by the University of Kansas Medical Center, under contract to ANA. The database is the only national-level database that provides nursing data and patient outcomes at the unit level. Data collected are structure, process, and outcome indicators, based on Donabedian's quality framework.

As of June 11, over 1,100 hospitals of all sizes participate in NDNQI in all 50 states and the District of Columbia. We also have international hospitals participating in the database.

NDNQI's mission is to aid the registered nurse in patient safety and quality-improvement efforts by providing research-based national comparative data on nursing care and its relationship to patient outcomes.

We started off with 30 hospitals, which joined when we established the database in 1998 and continue participating, and have grown to 1,100 hospitals for 2007. So you can see the growth.

The NDNQI participants are voluntary. They may be interested in quality, or they might be interested in satisfying the Magnet requirements related to the reporting of nursing-sensitive measures. Participation in the database is a primary quality-improvement tool; it can aid the hospital and the nurses in meeting Magnet requirements, and it can also help in meeting regulatory requirements.

Forty-eight percent of the hospitals in NDNQI are academic teaching hospitals, 86 percent are not-for-profit, 20 percent are Magnet, 80 percent are urban, and we have good distribution across all bed-size categories, from fewer than 100 beds to greater than 500.

The NDNQI program is multifaceted. It includes database participation -- indicator development, Web-based submission of data and other supporting information, a high level of accuracy in reporting, on-time electronic reports, and the availability of many NQF-endorsed nursing-sensitive measures. There is an optional RN satisfaction survey for all RNs.

The program also includes pilot testing. Because the indicator development is research-based, the hospitals have the opportunity to participate in the development and implementation of an indicator. Not only do we want to ensure data validity and reliability, but we also want to ensure feasibility from a data-collection perspective.

There is education and research that is ongoing with NDNQI. There are quarterly conference calls that are held with all the facilities to support them in their work. There is an annual conference that we have started; our first conference was in January 2007, which I will talk about momentarily. There are publications where we have started publishing best-practice exemplars, sharing the experience of hospitals that have had sustained improvement and how they did it.

There is also internal and external research done on NDNQI via NINR, NIOSH, as well as internal studies that are done by internal researchers.

The NDNQI measures that we include are multiple. Indicator development and implementation is ongoing in NDNQI. Currently, data are collected on 13 indicators, with four more scheduled for implementation in 2007. Several of the NDNQI indicators were submitted to the National Quality Forum and were accepted as part of its consensus process for evaluating nursing-sensitive measures, and NQF collaborated with the Joint Commission, via a grant received from the Robert Wood Johnson Foundation, to develop the detailed specifications of the NQF measures.

Other NQF measures have also been included in the database. The indicators that we currently have are shown on this slide.

Indicators in development are voluntary turnover, which is scheduled to be implemented in quarter three of 2007; and the three nosocomial infection indicators, which are scheduled to be implemented in quarter four of 2007.

NDNQI requests that hospitals provide data from administrative record systems or from special studies. Some data elements come from medical record review, and hospitals with electronic health systems may pull some of the data from those systems. Usually we ask them to take a look at our definitions. A lot of times they are already collecting the information. So it's just a matter of realigning their processes to meet the reporting mechanisms for NDNQI.

Some examples:

What we did this year, actually, to facilitate the acquisition of this information is, when the quarterly prevalence study is done, we encourage the hospitals to then do the restraint prevalence at the same time. That way, it minimizes the frequency of data collection that they have to do. They do it once and then they can report it at one time.

Data submission to NDNQI is done via a secure website. Hospitals may enter their data by hand in Web forms or upload their files via XML. The programmers will work with their IT people at the hospitals to give them the necessary coding.
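
As an illustration of what such an upload might look like -- the actual NDNQI schema is not described in this testimony, so every element and attribute name here is hypothetical -- a hospital could assemble unit-level data with a short script:

    # Build a hypothetical quarterly submission file using Python's
    # standard library; the structure below is invented for illustration.
    import xml.etree.ElementTree as ET

    root = ET.Element("ndnqi_submission", hospital_id="H12345", quarter="2007Q2")
    unit = ET.SubElement(root, "unit", unit_type="medical", unit_id="3W")
    ET.SubElement(unit, "patient_days").text = "1450"
    ET.SubElement(unit, "falls").text = "4"
    ET.SubElement(unit, "falls_with_injury").text = "1"

    # Written to disk, then uploaded over the secure website.
    ET.ElementTree(root).write("submission.xml", encoding="utf-8",
                               xml_declaration=True)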

These data sources were selected for two reasons. They contain the standardized information required by NDNQI, which facilitates data reliability and validity when it comes to data-collection processes, and they have a known level of reliability.

Specific processes were established to attain the project goals: collecting standardized, reliable data from hospitals across the nation, so that we can provide the hospitals with comparative reports that they can use in quality-improvement initiatives and in analyses of the relationship between aspects of the nursing workforce and nursing-sensitive patient outcomes. We use standardized definitions and data-collection guidelines to collect comparable data from each hospital. Tutorials need to be completed prior to any data collection, so that we can ensure consistency.

We also use in-person interviews with hospital site coordinators to correctly classify units into unit types. The reports are done based on patient and unit type, and hospital size or academic teaching status. To make sure that we are collecting information on, say, a medical unit, the hospital will actually have a conversation with an NDNQI liaison, to ensure that they are allocating those units appropriately.

We solicit input from hospitals about data that they would like in the reports they received from NDNQI. We appreciate that they are the end users of this information. How is it meaningful? How is it relevant? What information are they looking for? What is on the horizon? What are those adjustments that we need to make?

We guarantee the confidentiality of data so that hospitals are motivated to provide accurate data.

The resources that are utilized are pretty extensive:

Ascertaining data reliability: We initially used the ANA indicators that we had identified because of the research that had already been done. Subsequently, we have been incorporating the NQS indicators, which have been through expert review for reliability and validity.

There are annual reliability studies that are done on the indicators; they include a survey on data-collection practices, rater-to-standard reliability assessments, and audits of reported data against original records. The question is, how can you be sure that what one hospital reports is reliable compared to another? There was a pressure ulcer reliability study done recently that actually demonstrated moderate to near-perfect reliability in the data collection. That information was published.

We also learned that certified wound ostomy continence nurses demonstrated better reliability in wound assessment. That is kind of common sense -- more certification, more education, greater expertise when it comes to assessing your patients and wound surveillance. What is meaningful about that is, if you can assess the patient more accurately, you can then report those findings and intervene more appropriately.

We also recognize that there is an opportunity to educate the everyday staff nurse. So we actually created a computer-based learning module on pressure ulcer assessment and evaluation. We disseminated that tutorial to all the hospitals, so they could incorporate that into their own educational medium. Then we posted it on the website publicly, so any staff nurse could go on to the NDNQI website and learn more about pressure ulcers. It was meaningful to educate the nurse.

The response has been very favorable. We have had between 5,000 and 6,000 nurses complete this tutorial already. It has been very well received.

Data use: Hospitals primarily use the data for quality-improvement purposes. We provide them quarterly reports on the indicators, which give them trend data: eight rolling quarters, with an average for those quarters. The size of the report depends on the size of the hospital; it can be anywhere from 50 to 200 pages. There can be 26 tables altogether if you are reporting on every quarterly indicator.

The quarterly reports are separate from the annual survey report, because that is administered annually.

The reports provide statistical significance, means, quartiles, and national comparisons at the unit level where care occurs.

This is a significant finding. In the research that we have done, unit type does make a difference when you look at workforce, when you look at skill mix, when you look at those structural indicators that need to be considered when you are evaluating patient quality, as recommended by Donabedian's quality framework.

So there are details on structure and process measures in the quarterly report.

The RN satisfaction survey is reported annually, and there is a lot of pre-work that is done for the administration of the survey.

The reports really help to aid the staff, the nurse manager, the CNO, the CEO in the decision making and help them measure sustained changes and improved quality.

We also provide specialty and system reports, as a separate service. We are also contracting with states to provide statewide reporting for public reporting. If there is a state that has mandated public reporting and there are hospitals that are participating in the NDNQI, then we will facilitate the reporting of that information to the state subsequently, so there isn't dual reporting. The hospitals only have to enter that information once. It minimizes the burden that they experience.

The national comparison data, again, is at the unit level where care occurs. The reports are provided by unit type. It can be critical-care, step-down, medical, surgical, combined medical/surgical, rehabilitation, psychiatric, pediatric. It is grouped by hospital size or teaching status.

The database has really grown. Take a look at RN satisfaction: when we started off with our pilot in 2002, we initially had a response rate of 55 percent, with 64 hospitals participating and close to 20,000 nurses responding to the survey. It's a confidential survey. When it comes to the unit reporting, they don't get the unit-level information if there are fewer than five staff in that area, so individuals are not easily identifiable.

Our response rates have been pretty stable, 63 to 64 percent over the last few years, with 494 hospitals participating in 2006 and 176,000 nurses completing the RN satisfaction survey. There are some hospitals that have a 95 percent response rate. They do work for that and they are very proud of it.

The number of units reporting: This is just a sample of the number of units reporting that we have for the RN satisfaction survey, over 7,000 adult units, over 1,000 pediatric units. When you look at the quarterly indicators, in any given quarter, about 9,000 nursing units are reporting on any given indicator.

The outcomes: The research done on NDNQI has demonstrated significance at the unit level. Studies done related to falls and pressure ulcers demonstrated which staffing or workforce element was statistically significant at the unit for the patient outcome. These are a few examples:

The difference is, when you are looking at total nursing hours, that component takes into account RNs; LPNs and LVNs, licensed practical nurses and licensed vocational nurses; and unlicensed assistive personnel.

You are also able to drill down to skill mix. A higher percentage of RN hours was statistically significant for the step-down and medical units in helping them have fewer falls.

In another study that we did, there was higher reliability with certified nurses assessing wounds. For every percentage-point increase in percent RN hours, the pressure ulcer rate declined by 0.3 percent. This is just a sample of some of the findings. Staffing does make a difference. The workforce does make a difference.
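To put a number on that coefficient: taken literally, and assuming the relationship is linear, a unit that raised its RN share of nursing hours by ten percentage points could expect its pressure ulcer rate to decline by about three percent (10 x 0.3 percent).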

When you look at quality indicators, you need to look at the entire package. You need to look at the structural elements related to nurse staffing or certification or education, as well as skill mix.

Other outcomes: The program has grown. As I mentioned previously, we had our first national conference in January and had over 900 people attend. It was just fabulous and exciting to feel the energy in the room, with all the nurses walking away with helpful hints and tools: this is what I can implement on my nursing unit to make a difference today when it comes to patient outcomes.

That focus was transforming nursing data into quality care.

Our second conference is scheduled for January of 2008, and the call for abstracts is currently open. The theme is workforce engagement and using data to improve outcomes.

The other thing that we did is, we published a best-practice exemplar, which profiled 14 hospitals in the database that had sustained improvement on a specified nursing indicator, and they shared their stories. They shared with us what it was like to use the data, what they needed to incorporate to get staff buy-in, how successful they were, and what the lessons learned were. We published a lot of the helpful tools that they used within their own practice settings.

Future plans for NDNQI: Methodology development is one of them. We believe that we need to develop methodology for unit-based acuity or risk adjustment. This is needed to include mixed-acuity units and universal beds, critical access hospitals, and hospital rollups. We appreciate that there is a difference at the unit level based on the type of patient population and unit type. But for these other areas, there needs to be some further stratification to make those comparisons meaningful. It also gives other types of facilities and hospitals the opportunity to participate.

Hospitals like to have a hospital rollup. We appreciate that the statistical significance is at the unit level where care occurs. But that's the reality. So it's something that we want to be able to provide.

Indicator expansion: We have been adding indicators every year since we started the database. What we are focusing on over the next 18 months is to really expand the current indicators to other relevant units. Since it is research-based, indicators developed and implemented are based on the appropriateness for that particular unit. For example, it would be highly inconceivable to think that you would implement a fall indicator on a neonatal unit. No one should be dropping babies. So that is not appropriate.

It's a bad example, but I think it makes the point.

So when we take a look at other indicators, it's really looking at the appropriateness of that particular unit. For example, the assault indicator is very appropriate for the emergency department. That is what we are looking to implement over the next 18 months.

Report enhancement: As the database grows and hospitals grow, they like to be more sophisticated in their reporting. Currently, you can download the reports as a PDF or an Excel file, so you can actually take your information and put it in whatever medium or graphics you need internally for your organization, which is very meaningful. But what we hear from the hospitals is that they want to be more granular with their comparisons. They want to be able to compare a coronary care unit with a coronary care unit, not a medical ICU. So that's something that we are looking to work on in the database over the next 18 months.

Lessons learned: You should never underestimate the level of staffing required to operate a national database. Accurate data collection requires a high level of technical assistance, diligence, and monitoring when it comes to managing the database. There need to be ongoing quality-monitoring checks.

Indicator development and implementation requires time and resources to ensure data validity and reliability. It takes time to develop an indicator. We just can't implement it in a month. It takes time because it does go through its process.

The significance and importance of implementing and evaluating indicators at the unit level where care occurs cannot be overstated.

NDNQI is in a state of continuous quality improvement:

Collecting structure, process, and outcome indicators provides a comprehensive means for evaluating the quality of nursing care and patient outcomes. There is good distribution and representation of all bed sizes in the database to provide meaningful comparisons at the unit level. With that, it is very important to have a definition of a hospital to maintain data comparability and validity.

Thank you very much for this opportunity.

DR. CARR: That was a great presentation, very informative.

It, in a way, harks back to one of our opening speakers talking about BayCare and how a data element prompted a look back at what was going on. It wasn't just the care; it was the systems of care. I think that you have really driven that home very much.

I like the blended aspect of what you have. It's not just ulcers; it's ulcers in a care unit with this staffing. I think it takes us to a new level. I commend you for it.

Questions?

[No response]

Sharon Sprenger.

Agenda Item: Performance Measurement and Public Reporting - JCAHO

MS. SPRENGER: Thank you. I'm Sharon Sprenger from the Joint Commission.

I should note for all of you that as of January I can't even use that acronym. We are the Joint Commission. That's one thing I want you all to think of today.

I did want to comment, when Dr. Yandell started, he talked about, "Remember in the days of quality assurance or improvement, we would pass the report around." While I started in quality improvement in grade school, I remember the days when it wasn't just passing the report around, but it was important, the amount of paper that you had, because the more paper, obviously, the better you were doing. I think that's important for today's conversation, as we look to electronic health records.

What I would like to talk about is the nursing project we are working on in alternative data sources. But before I do that, I just want to talk a little bit from the perspective of the Joint Commission and hospitals, what we see as some of the barriers and challenges to electronic data.

I am sure you have heard many of these things, as you have had different presentations today. First of all, one of the challenges is the fragmented health-information exchanges that we really need to address, looking across different physician practice areas, different settings, et cetera.

We also need to be very concerned with the privacy of health information. We need to be sure that we effectively protect privacy, while assuring broad access to meaningful and relevant performance-measurement data, and ways to provide information that provides longitudinal views of quality and safety across the continuum of care.

Data quality is a very important issue at the Joint Commission. I will speak to that in just a few moments.

Also there is a need for national measurement priorities, with a standardized data dictionary with common data elements and definitions across multiple venues of care. In the time I have been here today, I have clearly heard that message.

I would really like to applaud the work of the American Health Information Community and the charge that it gave to the National Quality Forum, which, at the end of May, convened an expert panel that Dr. Tang chaired and that I had the opportunity to sit on, to begin to identify core data elements for the electronic health record and to begin to prioritize the performance measures that we should start with. So I look forward to the work coming from that group.

In many ways, the current measure specifications are not designed for an electronic health record. I think we have heard clearly today, for example, that as we move to an electronic health record, identifying patients concurrently will be very challenging in terms of how we identify the population that we are going to measure, and that we need to be able to apply exclusions automatically, along with all the data issues that raises.

Then there are measure-construct issues. I could spend days talking about this topic. We have heard from some speakers on the issue of measure exclusions. They are all over the board with the different measure developers and how you identify them, et cetera.

But again, there are some efforts that you may be aware of that are going on. Just this past Friday, there was actually a meeting at the National Quality Forum with a workgroup that NQF has convened of the major measure developers, which included the AMA Physician Consortium for Performance Improvement, CMS, the Joint Commission, and NCQA. What we were trying to do was to advise them in terms of measure construct: What should be some of the rules of the road that every measure developer needs to follow? What is the minimum information on a measure that should be submitted to NQF? Wouldn't it be interesting if every measure submitted to NQF looked the same way, so that as you went to the form to find the denominator or the exclusions, you could find them in the same place?

We even discussed some guidelines for every measure having an algorithm. It's one thing to standardize on paper. It's another when you bring a measure through a calculation algorithm. If you use a different sequence of steps to retrieve that data, you can, in fact, end up with a very different measure rate.
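A minimal sketch of that point, with invented records and field names rather than any developer's actual specification: two algorithms that use the same data elements but check them in a different sequence, and so handle missing data differently, produce different rates.

    # Minimal sketch (invented records and field names, not any measure
    # developer's actual specification) of how the sequence of steps in a
    # calculation algorithm can change the measure rate.
    records = [
        {"med_given": True,  "contraindication": False},
        {"med_given": False, "contraindication": True},
        {"med_given": False, "contraindication": None},  # missing data
        {"med_given": True,  "contraindication": None},  # missing data
    ]

    def rate_exclusion_first(recs):
        # Check the exclusion first; records with missing exclusion data
        # are dropped from the denominator.
        denom = [r for r in recs if r["contraindication"] is False]
        return sum(r["med_given"] for r in denom) / len(denom)

    def rate_element_first(recs):
        # Treat missing exclusion data as "not excluded", so those
        # records stay in the denominator.
        denom = [r for r in recs if not r["contraindication"]]
        return sum(r["med_given"] for r in denom) / len(denom)

    print(rate_exclusion_first(records))  # 1/1 = 1.00
    print(rate_element_first(records))    # 2/3 = 0.67

Same data elements, two defensible readings of the specification, and a rate of 100 percent versus 67 percent.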

So I think there may be some very important work.

With respect to exclusions, I think there is some hope on the horizon. We actually agreed, as the major measure developers, that there needed to be some other people in the room to really talk about this issue of exclusions, which is important. So the NQF will look to actually convening a separate meeting, and the whole topic will be the issue of exclusions and seeing if we can standardize that. Hopefully, there is some sunshine on the horizon.

Then there is the ability to capture and link various data sets. Hopefully, as I talk about the nursing measures -- I think you have already seen some of that from the last presentation -- looking at not just clinical, but financial, administrative, or human resource-type data.

There is also, I think, a need for process changes in terms of documentation. For example, isn't it interesting, the measure in heart failure that looks at left ventricular ejection fraction and receiving appropriate medication. But in many EHRs, we can't even identify what the patient's ejection fraction is. So again, some opportunities for some process changes.

We need to, as much as we can, minimize human error associated with manual worksheets, record review, data abstraction.

We cannot forget the technology and implementation costs, in terms of developing the functionality of an EHR, but then the tools needed to capture performance data.

Also I think we have to keep in mind that the hospital environment is all over the map in terms of sophistication. For some hospitals, it may, for example, be easy to collect some of these measures, but for others it's very difficult. They don't have the resources. So we have to keep in mind that health-care organizations will need to adopt IT before the electronic health record can support performance measures.

To date, the pace of change to electronic data is slow. I think some people think that everyone will be automated tomorrow. It's not happening quite that fast. I think we are seeing some very positive things happening, but it is a very slow change.

I also think we need to think out of the box for future needs. The one thing I would really like to share with you and leave with you today is that I think we need to be really careful as we move to the electronic data -- even if it can really help us, on the other hand we need to be careful that we don't go backwards. We used to always tell people that a really bad way to develop measures is to say, what data do I have; thus, what could I measure? We need to be very patient-centered, and we need to look at the measures that are important to improve care, and thus what data we need. We cannot lose sight of that as we move to the electronic record.

I always like to illustrate all of the activities that are confronting hospitals with respect to quality and patient-safety efforts. This is a slide that Nancy Foster at the American Hospital Association put together a couple of years ago. I think it's really important to think about. I often tell audiences at particular hospitals, if you are tired, this is why.

These are all extremely important efforts, but we need to find more efficient ways to manage our patient-safety efforts.

Just to give you an example of how fast this environment is moving, just look at CMS. When they started out with the Medicare Modernization Act, in 2006 there were 10 measures. Now, if we look in 2007, there are a total of 21 measures. There are more coming in 2008. In 2009, we will have a total of 32. Right now we are looking at adding hospital outpatient measures; we are not sure what that number is. But that is just one initiative facing hospitals.

In terms of the Joint Commission requirements, I am sure that many of you are aware that the Joint Commission has done a lot of work in the hospital environment with CMS. We right now have four measure sets that we are aligned on. By aligned, I mean we have the exact same measure specifications, whether you go to our website or theirs. We also have two measure sets that we have unique to the Joint Commission.

Currently, for our accredited hospitals, we require them to collect data on three full measure sets. We will actually be moving, in 2008, to four sets. Currently, we have more than 3,800 hospitals that are collecting and reporting data to the Joint Commission on a quarterly basis.

I just want to talk for a moment about data quality, because data quality is extremely important to the Joint Commission. To date, in terms of collecting our measures, we use what we call a core vendor measurement system. They are really the data intermediary between the Joint Commission and the health-care organization.

I just want to highlight some of the things that we have done with respect to data quality.

The Joint Commission has always been attentive to data quality and to the growing national dependency on the quality of performance-measurement data being used for accreditation, for payment, and for public reporting. And there is a substantial dependency on the functions of our Joint Commission vendors for hospital data: if you are not aware, our vendors are currently data intermediaries for the data that is reported through the Hospital Quality Alliance. Ninety-two percent of that data comes through the Joint Commission vendors.

So we have a number of activities that we look at in terms of data quality. But the thing I really want to stress today is that we are actually ramping up our efforts this year. We do a lot of education. We do webcasts. We do monthly phone calls. We have always done vendor audits, but starting this year, we are actually beginning to watch the data on a quarterly basis. We will actually be assigning points to our different vendors. Depending on how many points one has in a quarter, you could get a call from us, we could do a desk audit of you, or we could do an onsite audit. So while we have always been attentive, we are really ramping up our efforts.

I just want to stress that even as we look to electronic data, we cannot lose sight of how important the data quality will be in the electronic record.

In terms of how the Joint Commission uses the data that we receive through our accredited hospitals' performance measures, first of all, we use it in our accreditation process. We have what we call the priority focus process, where we look at data that we have available to us on an accredited organization. Right now, for hospitals, about 50 percent of the data that we look at to really help us focus the survey is coming through our core measures. We are actually able to use the priority focus process and our performance-measurement data to really drive surveys. If you have heard about our accreditation process, we are doing patient tracers, where we actually identify patients and follow their care through from the point where they entered. We are actually using our ORYX data, our core data, to do that.

We also have what we call the ORYX performance-measure report. When the Joint Commission moved to unannounced surveys, we began posting the data four times a year -- because, again, we get our measure data quarterly -- on a secure extranet site for the hospitals, approximately a month after we receive it, so that the hospitals know how they are doing and there are no surprises. This is also the data that our surveyors are given about two weeks prior to an unannounced survey.

The Joint Commission also has what we call Quality Check. It is a publicly available report that tells you about our various accredited organizations, as well as how they are doing on our measures for hospitals and on our national patient safety goals.

Then we have a new report that was just published in March that contains data for 2005, which shows you how our hospitals have been doing since we implemented our core measures, as well as how they are doing on national patient safety goals, et cetera.

DR. CARR: Sharon, I'm just keeping an eye on the clock, and also on this material, which is terrific. But if you could look to highlight the most important things --

MS. SPRENGER: Okay. What I want to do -- and I'm just going to piggyback on what Isis said -- I am just going to give some real-world examples. If you do not have an electronic health record, while the nursing set is an extremely important set, it can really be challenging to collect.

Just quickly, the nursing-sensitive care set is a unique approach to assessing the quality of care. This is a very interesting set, as we do not have a single common population in the set. We are looking at patients, we are looking at nursing care, and we are looking at system factors in the set.

Isis already mentioned that we have funding from the Robert Wood Johnson Foundation to actually test this as an integrated set.

Just to highlight, there are 15 measures in the set. They came from eight different measure developers. What the Joint Commission did was to create standardized specifications and really put the 15 together as a set.

These are the 15 measures that we are going to test. Again, I think we have talked about how they have a different focus -- clinical, nursing, system. We have different populations.

Then we have different data-collection approaches. Some are aggregate counts, some use survey data, and some use clinical assessment, as noted, looking at pressure ulcer prevalence.

The other thing that is really interesting about this set that is different, too, is the frequency of data collection. Some measures are monthly, some are quarterly, and one is annual.

I just quickly want to give you some examples to think about why it's so challenging to collect this data, depending on the systems that you have in place.

If we look at the measure falls with injury -- you can look later at the numerator and denominator -- the denominator is just patient days by type of unit. For the numerator, we need falls and an injury level.
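As a rough illustration of the arithmetic, assuming the common construction of this kind of measure as falls with injury per 1,000 patient days (the unit names and counts below are invented):

    # Minimal sketch (invented unit names and counts) of the usual
    # construction: falls with injury per 1,000 patient days, by unit type.
    unit_data = {
        # unit type: (falls with injury, patient days)
        "critical care": (1,  900),
        "step-down":     (2, 1400),
        "medical":       (4, 2100),
    }

    for unit, (falls, patient_days) in unit_data.items():
        rate = 1000.0 * falls / patient_days
        print(f"{unit}: {rate:.2f} falls with injury per 1,000 patient days")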

Just think if you are in a hospital and you have no automation and you are collecting this monthly. First of all, you have to keep track of these five different units for every patient that falls. You have to keep track of the injury level. If you had an electronic health record, it could all be embedded in there -- the definitions of the fall, injury level.

This is information that isn't used to calculate the measure, but it starts you thinking about, if you had an EHR, the supplemental information you could collect to understand and use your data. For example, you could know whether the patient who fell had had a risk assessment prior to the fall, and if the patient had been assessed as a fall risk, whether you implemented any type of fall-prevention protocol.

I just want to highlight the infection measures that are in the set. They are all derived from the CDC. There is the catheter-associated urinary tract infection, the VAP, ventilator-associated pneumonia, and the central line-associated bloodstream infection. But keep in mind, again, if you are doing all of this manually, you report these by ICU location. So depending on how many ICUs you have, just think about the data-collection effort to do that. Again, if you had an electronic health record, it would be transparent.

Again, if you are doing this manually, you have to keep track every day of the number of patients that have a urinary catheter or a central line. Then you use that so that monthly, by adding up your log manually, you can determine how many patient days, catheter days, et cetera, you have.
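A minimal sketch of how such a manual log rolls up, with invented counts; the per-1,000-device-days construction follows the general CDC form:

    # Minimal sketch (invented daily counts) of rolling a manual log up
    # into device-day denominators for one ICU.
    daily_central_line_counts = [4, 5, 5, 3, 4, 6]  # one count per day
    daily_catheter_counts     = [7, 7, 6, 8, 7, 7]

    central_line_days = sum(daily_central_line_counts)  # 27 line days
    catheter_days     = sum(daily_catheter_counts)      # 42 catheter days

    # CDC-style rates are expressed per 1,000 device days.
    infections = 1
    rate = 1000.0 * infections / central_line_days
    print(f"{rate:.1f} infections per 1,000 central-line days")  # 37.0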

Again, I think you can see the inherent issues with some potential for human error and data-quality issues, et cetera. Think about going back and auditing this. Quite a challenge.

I just want to point out to you that within these measures, the infection measures, we also look at babies in NICUs. So we not only look at the type of device, but in this particular measure, the CDC, the way they designed the measure, wants the devices broken down five different ways. So again, just think about if you are doing this manually, and think what an electronic record could do for you.

Then I put in here the example of the nursing-care hours per patient day. Again, if you are collecting this manually, these are some of the worksheets that one would have to use to do it.

Again, the pressure ulcer prevalence. Again, we are assuming that everyone is using the National Pressure Ulcer Advisory Panel definition of what a pressure ulcer is. I think that is one of those opportunities that we have been hearing about throughout the day for standardization and everyone using the same definition, et cetera.

Again, just to give you an idea, if you were doing it manually, how you are keeping track of all these things.

Again, Isis mentioned the voluntary turnover. This is to give you an idea of, again, if you were doing this manually, what it starts to look like.

I think the last presentation certainly demonstrated that these measures make a difference. Wouldn't it be great if these measures could really be rolled out on a national level? I think, in order to do that, they certainly would need to be supported by electronic data.

So one of the things that I want to leave you with, just kind of a different way to think about an electronic record and what you could do -- just think about the opportunities that a health-care organization could have to assess and improve, with a standardized electronic data system. You really could start moving to systems thinking and really understanding the relationship among a system's parts, rather than the parts themselves.

I just give you a couple of little examples. Within this measure set there is a measure that looks at whether the patient was provided smoking cessation advice and counseling. If you had the opportunity, through electronic data, to explore some of those relationships, you could look at your smoking measure, but you could see, did you not do well because of your nursing skill mix? Was it due to the nursing-care hours per day? Is it because you have a lot of turnover? Is it because your nurses are dissatisfied?
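One way to picture that exploration, as a minimal sketch with invented unit identifiers and values: once the measure results and the workforce data share a common unit key, the relationships can be examined side by side.

    # Minimal sketch (invented unit IDs and values) of linking a
    # smoking-cessation measure to workforce data on a shared unit key.
    measure_rates = {  # unit -> smoking-cessation advice rate
        "3W": 0.92, "4E": 0.71, "5N": 0.84,
    }
    workforce = {      # unit -> (percent RN hours, annual turnover)
        "3W": (0.78, 0.08), "4E": (0.55, 0.21), "5N": (0.66, 0.12),
    }

    for unit in sorted(measure_rates):
        rate = measure_rates[unit]
        rn_share, turnover = workforce[unit]
        print(f"unit {unit}: advice {rate:.0%}, "
              f"RN hours {rn_share:.0%}, turnover {turnover:.0%}")
    # The low performer (4E) can now be read against its skill mix
    # and turnover rather than in isolation.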

Then if you start thinking longitudinally, again using a smoking-cessation measure, if the patient was readmitted and you had the opportunity to see that they were still smoking, you would have the opportunity to look and say, who provided the counseling on the previous admission? Should we use the same person? Do we need a different person? What type of counseling did we provide? Do we need to reassess our education plan?

I think one of the things that is so exciting, and the thing that we have to be really careful of -- you can see that there are so many things facing a hospital in terms of what they have to collect. But as we continue to add measures, we really need to have efficient systems so that they have time to use the data, because it really is all about the patient and improving the care.

DR. CARR: Thanks very much.

How many people who are collecting this data have this in electronic format?

MS. SPRENGER: For our test -- and we will actually be starting data collection July 1 -- we are looking at approximately 54 sites. Of the 54 sites that we selected -- we had over 200 volunteers -- we asked questions about different characteristics in terms of the electronic health record: Did they have a lab system, electronic patient assessment, et cetera?

Of those 54 sites, approximately 20 percent have 50 percent or more of the things we ask automated. It will vary in terms of what they have electronically. Some may have a full electronic health record. Some may have certain pieces. But within their institutions, at least 50 percent of the things that could be electronic are.

DR. CARR: So that means that they are findable electronically and that they are in a relational database?

MS. SPRENGER: If you ask me a year from now, I will be able to tell you. One of the things that was interesting, when we were asked to present today, is that we are just beginning the data collection. But one of the things that we want to assess during the 12-month period of data collection is the use of electronic data sources and impact on reliability, whether databases are linked, et cetera. Data collection ends next June, so we should have more information.

DR. CARR: For now, all the data goes to you and you have it, whether they got it electronically or manually.

MS. SPRENGER: Right.

DR. CARR: Then you put it in a relational database and report back.

MS. SPRENGER: Right. For purposes of the test, keep in mind that one of the things that we are doing is assessing how the measures work together and the reliability of the data. Then we will report back during the 12-month period on how they did. But we will have organizations in the test who are doing it electronically. That's one of the things that we want to assess, and also, through that, we have the opportunity to see, whether they are doing it manually or electronically, any changes that we would have to make to the classifications, et cetera.

DR. CARR: Very interesting. Thank you for updating us on this.

Actually, Dick, what we chatted about at the halftime there, what you were saying about mortality and publicly reporting hospitals versus others --

DR. JOHANNES: The question arose as to whether there were data that related to the efficacy of public reporting as it relates to mortality rates. Many people showed slides today indicating that mortality rates for multiple diseases are, in fact, declining.

We presented some work at the Academy Health meetings last year. I am going to have to take you back to this morning. Think of the cylinder I showed you that described the clinical parameters. Those are the parameters that are used to build the risk assessment. So you plot a particular patient's values and calculate that person's predicted mortality from the time of admission.

We have hospitals in Pennsylvania and other states that do public reporting, and hospitals in states that do not. By matching patients with equal risk of mortality on admission, you can create something called a multivariate match. What happens is that the clinical parameters for those two populations of patients appear very similar. They have the same pulse rates, the same blood pressures, the same BNP levels, and all that.

We did that for states in and outside of Pennsylvania and then did the pre/post analysis using a difference-in-differences method, and showed that the hospitals with intensive public reporting had lower mortality rates than those that did not.
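For readers who want the mechanics, here is a minimal sketch of a difference-in-differences computation, with invented mortality rates standing in for the matched populations:

    # Minimal sketch (invented rates) of the difference-in-differences
    # estimate: reporting vs. non-reporting hospitals, before and after.
    pre_reporting, post_reporting = 0.062, 0.049  # public-reporting group
    pre_control,   post_control   = 0.061, 0.057  # comparison group

    did = (post_reporting - pre_reporting) - (post_control - pre_control)
    print(f"difference-in-differences: {did:+.3f}")
    # (-0.013) - (-0.004) = -0.009: a 0.9-point larger mortality decline
    # in the publicly reporting hospitals.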

We have submitted that as a manuscript to the American Journal of Medical Quality. But I can make the presentation available.

DR. CARR: All right. I will take the chair's prerogative and just try to summarize some of the themes that we heard today and ask for comment on things that I didn't include.

Agenda Item: Wrap-Up

We began with, where are we today with the hybrid medical record, partially administrative, partially in some cases electronic, some of it data abstraction? What is the state of affairs today?

The things that I heard were:

We began the day, however, hearing about the administrative burden. It's still large, and growing. We also heard about the financial commitment, which is large and growing, both for FTEs to do data abstraction and for development and implementation of electronic health records.

Those are highlighted themes. I will just say a few more words about some of these things.

I think, number one, in terms of achieving quality, public reporting helps the consumer, but, more importantly, it informs the people who can actually do something about the quality we hope to measure. In Virginia, a lot of times the physicians are seeing their own data for the first time when it is publicly reported.

BayCare, who spoke this morning, said how they had had great success in their first year of the Premier initiative, only to find themselves falling behind and then having to look at themselves, realizing that they didn't have the systems and the infrastructure to support achieving the outcomes that they had sprinted to in the previous year. As a result, they had a major cultural change.

In terms of achieving quality, I think we heard that transparency has given credibility. Actually, this is something Simon and I heard last week. The commitment to transparency by institutions, regardless of what those measures are, demonstrates that there are measures and that an institution is committed to improvement.

So that was achieving quality.

The second thing is, there is no perfect system now and there is none in immediate sight. So we need to tweak our existing systems. It's not about all administrative or all electronic.

I think we heard very strong support of administrative data today. First of all -- something I hadn't thought about before -- administrative data is bounded by coding rules and billing rules, ensuring reproducibility and some structure.

We heard about the use of the AHRQ patient-safety indicators and quality indicators that have been very helpful in institutions looking at safety.

We heard about very elegant research on risk-adjustment models, I think strongly supported by publications in peer-reviewed academic journals.

Let me go on to convergence of approaches. We are getting more sophistication in the manipulation of the administrative data. We are making it better by adding supporting information such as present-on-admission flags, labs, vital signs, and timing.

Importance of acceptance -- I think we alluded to it before: having a CEO accept and have a willingness to report performance, as was the case in New York and also at BayCare in Florida. In Kentucky, the same thing, including the board of trustees.

I think a key thing that we heard today throughout was clinician engagement and acceptance. If we think about the NSQIP data, that began with physicians who walked away from administrative data, frustrated by that, and made their own data set, which turned out to be very powerful.

But what we are finding is that many of the measures that NSQIP includes are now probably achievable through this blended hybrid and, in some cases, administrative data.

In Virginia, the emphasis for success was collaboration, science, no surprises, open process, and follow-up on all inquiries. We heard that also from Wisconsin. When clinicians whose data is being reported are questioning it, it calls for a sober, thoughtful response that reassures the clinician or, in the case of some error, improves the measurement, because it has been looked at from all sides.

Things that have gotten better:

On to administrative burden: I think it was completely outlined this morning by MGMA. I think one burden that goes across all these areas is the challenge of uniformity of definitions, exclusions, as we have just heard, and uniformity of the reporting format.

Another issue is financial burden. Hackensack Medical Center -- we heard about their costs in 2004. This year they found that their costs are rising.

Bay Care told us that switching to an electronic health record throughout their nine hospitals, they expect, will be a $200 million investment.

One thing that I think was very interesting is that I think we are hearing about a new job description related to this -- data abstractors. These are no longer just clerical people looking at a particular note. There is really a growing sophistication in terms of understanding the data definitions, understanding exclusions and nuance, knowing where to look in a medical record and how to find it. That was perhaps a hidden thing. We think someone in medical records will get this, but it's not just someone; it's someone very sophisticated.

A second thing was IS expertise. With some store-bought electronic health records, you have to create your own internal hacking to get in to find the reports that are not canned reports, but reports that you want. We heard that programs that have IS resources and decision-support resources on site have a much easier time -- also building that IS-quality interface, knowing where the data is and making it query-able in the data set.

We talked about the resources needed for data validity, reliability, review, and revision, and then decision support for evidence-based care.

So I think one thing that is important as we move toward an electronic health record, as we refer to the work that NQF is doing, the conference that Paul Tang chaired, is that an electronic record is not going to have everything in query-able fields. As we heard in the presentation about the VA, perhaps the technology that we need is software for finding words in documents. That may get us from here to there.

Some of the next steps I heard:

I think I will stop there and invite any others who might want to add anything.

DR. GREEN: That was terrific. Congratulations.

One thing I would like to call attention to that I didn't hear you mention was the relative neglect of children in the use of administrative and clinical electronic data for quality assessment. It seems to me we should draw attention to that in some way.

To just reinforce where it seemed to me you came down with your last sentence, the abiding message for me, from virtually everyone who testified today, was that we can make progress right now with the administrative and clinical data sets we have. We should not be using their imperfections and the problems that they have to delay.

I heard several people ask the National Committee on Vital and Health Statistics to do whatever it could to motivate action.

DR. CARR: Don?

DR. STEINWACHS: Since I was only here in the afternoon, I probably missed some things. But it seemed to me that there were some very useful comments about how much of this information is useful to consumers, and to think about where we are in that process and maybe where we are going. I heard less about where we ought to be going than where we are.

In Virginia, they found a great interest, as you might expect, in learning something about individual physicians. Largely absent from public reporting systems, in my experience -- and others probably know better -- is information that helps you know a little bit more about a physician, other than asking your neighbor, your relative, or your best friend. That might be one thing that is worth taking away.

I think the question is, what else is really valuable to a consumer? Even more so, as you think about the possible growth of consumer-directed health plans that are asking consumers to make choices in economic terms, as well as quality terms, what should we be holding out there?

I think all of us see public reporting as a way to drive the provider system to be more accountable. But I'm not sure public reporting is really connecting that well with the consumer, who ultimately wants to know, what is this going to do for me, and what are the risks I'm facing? They may also be interested in the economic consequences, in some cases.

DR. CARR: I agree. I was struck by the comment that it seems like it's about the consumers, and ultimately it is about the consumers, because care will get better. But the actual public reports do more to change providers and administrators than to change consumers.

DR. STEINWACHS: I would argue that part of that is because the construct, the way in which we collect information and provide it, fits into the medical paradigm. It doesn't necessarily fit into, "I'm a person. How well am I doing? What is this going to do for me as a person?" That is different from a disease, an operation, a disorder.

The only reason I am raising it is that it seems to me that we ought to recognize that there is a challenge out there still to make this useful to all stakeholders. Particularly, what can we do to try to lay a path to help the consumer?

DR. CARR: I think we heard an example of that, where you could go on to a site -- was it the Virginia program? -- where you would type in "hip replacement" and get to an interactive field that would ask you some risk factors and then say, "In our institution, this is how you would do."

MR. SCANLON: I think that the initial burden is real. It's partly related to the fact that we are, in some respects, starting a process to improve what we know about care and how we can use that to effect better care. Whenever there is a transition, probably the most painful part is starting the transition. Maybe it's a physics phenomenon, that it takes the most energy to overcome inertia.

We shouldn't let that be the barrier and cause us to abandon this. But at the same time, there is a real question of whether some of this burden is totally unnecessary. How can we, in the short term, eliminate some of that unnecessary burden?

All those different reporting requirements, the fact that different requesters are demanding different things -- and many of them are in a position to demand it and various actors have to comply -- it is an issue.

We have had things like VHS versus Beta. It served no purpose, in some respects. It's unclear exactly how we can cut through this to try to create some kind of standardization that is protective of innovation but, at the same time, does not tolerate a lot of waste in the process. I think that's very important.

The other thing is that I think we need to really try to maximize what we can get out of the electronic record, in terms of what is retrievable. It actually makes me nervous to create an occupation of data abstractors. If you look at the demographics of the country, we are going to have labor shortages in the future. Health care is already sopping up an incredible amount of labor. We need to figure out ways to make health care less labor-intensive and more capital-intensive. I think there is going to be a lot of restructuring of jobs. Part of that restructuring of jobs in the future is going to be how you use capital like IT. It's going to start at the top with the most highly trained physician and it's going to work its way down, through different occupations.

I think we would like to avoid having to have too much labor support, as opposed to capital support. That's going to be an important thing for us to think about.

DR. CARR: Carol?

MS. MCCALL: I thought today was great. I loved everybody's comments.

A couple of themes, just to build on what you said and what Bill just said.

I think there is a paradox about how we go about this, and it leads us to a nation of data abstractors. I think sometimes how we think about this is maybe 10 or 15 years behind what is, in fact, technologically possible. I think it's important to bring what is here today into what is possible around health care.

But I do think we are at a new point. I think we are entering a new age. I think the pain of not doing something is going to become greater than the difficulty of doing it. I think we may finally be there.

I would echo that I think one of the biggest roles that we can play is how we can limit the unnecessary burdens so that we can, in fact, move forward.

I heard Denise this morning talk about systems. I heard a lot about "systemness." But there was something that actually kind of disturbed me about the discussion this afternoon. It was around some of the nursing-sensitive things. I think there is so much opportunity there, it's just exquisite.

Systems, to me, have three things. If the system is big, it has a way for me to get from A to B to C to D, and all the way around. It also has feedback loops. It also has explicit and built-in means to learn and adapt. Those are systems.

Back to the first one, these hooks that allow us to get from A to B: I think we need to make sure that our databases have hooks, common data elements that allow me to move from one view of the world to another. For example, these exquisite databases around nursing-sensitive processes are not all built from the bottom up today with the patient as the unit of analysis. I would love nothing more than to take the nursing data and turn to Ben and Norton and ask, can we marry that stuff up? I don't think today that we can. I would love to link it to socioeconomics.

So we have to think about hooks -- not just the measures, but how we then link it.

That was a big theme.

The second theme that I heard today was the hidden power of transparency. It does have hidden power. It's not just about turning people into good shoppers. There is a hidden power that comes by just simply revealing this to the people that are creating it. So there is a desire for everybody to find themselves in the data. Whether I want to find my twin, as a consumer -- a 44-year-old woman with kids, some woman in Illinois -- or whether it's for physicians to find themselves, who say, "Hey, I can't find my data" -- because we heard that, too.

I also heard about transparency. I think we need to push to have it extended, not just to data, but also to measures. Also, when Anne spoke, I found myself wanting to make transparent the methodologies, the analytics, the algorithms, the models themselves. I think that becomes an important theme, because I think they need to become standard. We can have the same measure, and yet I can run it through a bunch of weird science and come up with a different answer.

I think we can also harness more open models, à la Linux. I think there is an opportunity to engage the experts that today are not connected, and to do so in an open and innovative forum. I think that could be how we preserve both standardization and innovation. It is an operating system of a type, and at some point there are updates to it, at the right time and in the right way. But there are ways to innovate in that world, and there are models to get that done.

I think it taps the engagement of the people whose world they are creating. Engagement with something, whether you are designing a metric or a measure or an algorithm, or trying to change care, fundamentally changes your relationship with that thing. If the providers and nurses and the people who are delivering care had some forum in which they might have influence on the next generation of measures, they would feel a different relationship to that thing. It just changes everything.

With that, I will turn it over.

DR. CARR: Please.

DR. BICKFORD: Carol Bickford, American Nurses Association.

In the conversations that we have had this afternoon and the summary session, I would like to invite you to consider that there is another population that is not well represented in this conversation, and that is those in skilled nursing facilities and long-term care. We have no information-technology support for them, except in very rare cases. We are not taking a look at the indicators from the acute-care environment that could improve the quality of care in some of those settings.

The second thing is, we are focusing on pathology and what is broken. We have not taken a look at the significant valuable commodity known as prevention and health promotion. If we put that in our thinking caps, it perhaps creates a new framework for what our quality initiative is. Rather than dealing with the broken stuff, let's look at how we can make sure that it doesn't get broken.

DR. CARR: Thank you. Very good.

Thank you, everyone. I have learned a great deal. I think we have had a great collaboration and sharing. I thank you for taking the time to come today.

(Thereupon, at 5:08 p.m., the meeting was concluded.)