Overview of OASIS Measure Properties


Review Process

Dr. Bartlett explained that, at the end of the meeting, panel members would be given a form on which to indicate their top 12 measures for the NHQR and CMS public reporting. Prior to this exercise, panel members would discuss each of the 41 measures. They would have the opportunity to raise questions and seek clarification.

Ms. Terry asked why the number 12 was selected. Dr. Thoumaian explained that CMS is seeking 8-12 measures, based on its experience with Nursing Home Compare. Dr. Sangl added that AHRQ is seeking a robust set of measures according to the IOM criteria that will provide an overview of quality of care.

Shulamit Bernard, Ph.D., R.N., asked whether a composite of the measures could be created. When Dr. Shaughnessy indicated that this was possible, Dr. Bernard asked him to indicate which measures would combine well into composites. This could help panel members identify measures or composites to add to their lists.

Linda Scott, M.S.H.A., R.N., asked to see an example of the information in Nursing Home Compare, and this was provided.

Dr. Bartlett noted that the meeting workbooks included a plain-language list of the OASIS measures, the domains, the categories in each domain, and the measures in each category. The workbook provided a summary technical sheet for each measure and a criteria worksheet to help panel members take notes. It also included a large spreadsheet of all of the measures.

OASIS Measures: Download in PDF format (78 KB)
Statistical Properties of Individual Outcome Measures (Zipped Files)

Outcome Measure Properties Sheets

David Hittle, Ph.D., explained how to interpret the outcome measure property sheets for each measure. The explanations for each item in the worksheet are as follows:

Category—This is based on the IOM patient needs criteria.

Measure Name as it Appears in OBQI/OBQM Report—Each measure has an official name and one that appears in the reports sent to providers.

Operational Definition of Measure: How the Measure is Calculated—Improvement measures take the value of 1 when, at discharge, the value on the OASIS item that underlies the measure has improved from the start or resumption of care. Stabilization measures take the value of 1 when, at discharge, the value on the OASIS item is the same or has improved. The three utilization outcomes (discharge to the community, emergent care, and hospitalization) simply record what happened at the end of the episode: the measure is coded as 1 if the event in question occurred and 0 if it did not. Emergent care and hospitalization are negative outcomes, whereas discharge to the community is positive. (A schematic coding sketch appears after this list of definitions.)

Operational Definition of Measure: Patients for Whom Measure is Calculated—Improvement measures are not calculated for patients for whom improvement is not possible, because they are at the least disabled end of the scale. Similarly, stabilization measures are not calculated for patients whose condition cannot deteriorate any further.

Clinical Relevance—This section describes the process used to develop the measure.

Statistical Properties: Reliability of Underlying OASIS item(s)—Reliability is based on independent assessments by two different clinicians carried out within 24 hours of each other. A level above 0.6 is adequate, above 0.8 is quite good, and above 0.9 is almost perfect. This number provides a sense of the relative inter-rater reliability of different items.

Statistical Properties: Observed (Patient-Level) Outcome Rate (Standard Deviation)—This reflects the percentage of patients nationwide who achieved the outcome in question.

Statistical Properties: Risk Adjustment Statistics—This section provides the number of risk factors in the model. The variance explained (R2) statistic for a dichotomous outcome is generally lower than that of a continuous outcome. Some statisticians may regard these rates as low. Dr. Hittle and his colleagues are merely reporting these numbers, and the panel members would determine where to set the threshold for risk adjustment.

Statistical Properties: Outcome Measure Dispersion—This reflects the amount of variability among agencies.

Statistical Properties: Potential Redundancy—The researchers examined correlations between different outcomes to see if one measure reports the same result for each patient as another measure.

Home Health Agency Use as a Target Outcome—This section is related to the clinical relevance and actionability criteria. It provides statistics on how often agencies select the item and, when they do target it for improvement, how often they have an impact on the outcome. The investigators combined the results of two OBQI demonstrations with the results of the five-state OBQI pilot to calculate these rates, which are reported as the percentage of times that the measure improved. Hospitalization is not included in the number of times chosen as a target outcome, because all agencies in the demonstration projects were required to select this outcome. The percentage of times a measure was chosen as a target outcome ranges from 0 to 15 percent.
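
To make the coding rules in the definitions above concrete, the following is a minimal sketch in Python. The function names, scale direction (lower value = less disabled), and scale endpoints are illustrative assumptions, not the official OBQI/OBQM specification.

```python
# Minimal sketch of OASIS-style outcome coding. The scale direction
# and endpoints are illustrative assumptions.

def improvement(soc_value, discharge_value, best_value=0):
    """1 if the patient improved between start/resumption of care
    (SOC/ROC) and discharge, 0 otherwise. Returns None (excluded)
    when the patient was already at the least disabled end of the
    scale, so improvement was impossible."""
    if soc_value == best_value:
        return None  # excluded: cannot improve
    return 1 if discharge_value < soc_value else 0

def stabilization(soc_value, discharge_value, worst_value=5):
    """1 if the patient stayed the same or improved, 0 if the
    patient worsened. Returns None (excluded) when the patient's
    condition could not deteriorate any further."""
    if soc_value == worst_value:
        return None  # excluded: cannot worsen
    return 1 if discharge_value <= soc_value else 0

def utilization(event_occurred):
    """1 if the end-of-episode event (e.g., hospitalization, emergent
    care, discharge to the community) occurred, 0 otherwise."""
    return 1 if event_occurred else 0

# A patient scoring 3 at SOC and 1 at discharge improved; a patient
# already at the best value is excluded from the improvement measure.
assert improvement(3, 1) == 1
assert improvement(0, 2) is None
assert stabilization(3, 3) == 1
```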

Discussion

Dr. McGee asked about the number of agencies included in these statistics, given that they are based on agencies with at least 30 eligible cases. Dr. Hittle said that these rates are based on a percentage of all agencies in the nation. Dr. Shaughnessy added that about 7,000 agencies exist in the United States. Dr. McGee asked why the number of patients included varies from measure to measure. Dr. Hittle explained that the hospitalization outcome is the least exclusive, that is, the fewest patients were excluded from the computation of that outcome. Some agencies do not have 30 cases with which to compute outcomes.

Ms. Ketcham stated that when agencies receive outcome reports, they learn on which measures their performance was significantly better or worse. Dr. Hittle explained that the dispersion characteristics provide a rough idea of the extent to which agencies differ on a particular outcome.

Ms. Terry pointed out that the reliability of the underlying OASIS item for improvement of dyspnea is high. Dr. Shaughnessy explained that the study that is the basis for the reliability coefficients was rigorous, and the investigators made sure that follow-up occurred no more than 24 hours after the initial assessment. This was intended to show inter-rater reliability as opposed to concurrent reliability, in which the second rater may pick up on cues from the first rater. Concurrent reliability is not regarded as an appropriate measure of inter-rater reliability for OASIS. It would be wise to conduct this reliability study again in a year or two, when providers have more experience with OASIS.
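
The summary does not name the statistic behind these reliability coefficients. For ordinal OASIS items scored independently by two clinicians, a chance-corrected agreement statistic such as Cohen's kappa is one common choice; the sketch below, with made-up ratings, is illustrative only.

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who independently scored the same patients on one OASIS item."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    categories = np.union1d(a, b)
    observed = np.mean(a == b)  # raw agreement rate
    # Expected chance agreement, from each rater's marginal rates.
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (observed - expected) / (1 - expected)

# Two clinicians assess the same eight patients within 24 hours.
first  = [0, 1, 2, 2, 3, 1, 0, 2]
second = [0, 1, 2, 3, 3, 1, 0, 2]
print(round(cohens_kappa(first, second), 2))  # high agreement -> kappa near 1
```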

Ms. Terry noted that when agencies observe risk-adjusted outcomes, they are provided with data on expected and observed outcomes. For descriptive outcomes, however, agencies see only the national mean and the agency's own results. People who have not seen these reports may not realize how different an expected-vs.-observed comparison is from a national-mean-vs.-observed comparison. Dr. Hittle noted that risk-adjusted statistics are provided even for descriptive outcomes; panel members can judge whether those measures make the grade in terms of risk adjustment.

Summary Tables

Dr. Shaughnessy reviewed Tables 1-3 in the workbook, which provide an overview of statistics for all measures.

Table 1: Download in PDF format (95 KB)
Table 2: Download in PDF format (77 KB)
Table 3: Download in PDF format (48 KB)

Table 1 is a summary of the statistics presented on the second page of the form that Dr. Hittle had reviewed. The table lists all 41 outcome measures, grouped by the 13 categories under which they would be reviewed, and provides the following information:

Outcome Measure—The range of values in the scale used to define the measure. The improvement measures are defined by whether the patient improved according to that scale; patients who could not improve are excluded from the measure.

The Patient-Level Mean or Rate—This is the average improvement for this outcome based on all of the patients in the United States who provided these data.

The Reliability Coefficient for the Underlying Variable—This generally ranges from 0.5 to 0.8 or 0.9. The higher the value, the more reliable the item is in this test.

Validation R2—The R2 statistics are presented for both the patient and agency levels. Every outcome measure has a risk-adjustment model, although risk-adjusted values are not provided for measures whose risk adjustment was judged inadequate for the CMS report. For the patient-level risk models, 500,000 of one million patients served as the developmental sample. The R2 statistic represents the percentage of variation in the outcome measure explained by the risk factors; for example, the 51 risk factors examined explain 11 percent of the variation for the dyspnea measure. The second R2 is based on the agency as the unit of analysis, but the patient-level model is more important. A reasonable cutoff for this statistic at the patient level is 0.10. Some models drop below 0.10 for outcomes where the risk factors are not very explanatory; for those outcomes, what may really make the difference is quality of care. Dr. Shaughnessy cautioned that 0.10 should perhaps not be the driving force for inclusion or exclusion.

The Validation C-statistic—This is another measure of model fit; many use a cutoff of 0.7 or higher. (A computational sketch of these fit statistics appears after this list.)

Number of Risk Factors—These are presented for both the patient and agency levels and indicate how many risk factors were used in each model.
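
The exact R2 variant used for these dichotomous outcomes is not specified in the summary. The sketch below, on simulated data, uses logistic regression with McFadden's pseudo-R2 as a stand-in and computes the C-statistic, which for a binary outcome equals the area under the ROC curve; all data and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score

rng = np.random.default_rng(0)

# Illustrative data: 1,000 patients, 10 risk factors, binary outcome
# (e.g., improvement in dyspnea: 1 = improved, 0 = did not improve).
X = rng.normal(size=(1000, 10))
logits = (X @ rng.normal(size=10)) * 0.4
y = (rng.random(1000) < 1 / (1 + np.exp(-logits))).astype(int)

# Split into developmental and validation samples, echoing the
# half-sample design described above.
X_dev, X_val = X[:500], X[500:]
y_dev, y_val = y[:500], y[500:]

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
p_val = model.predict_proba(X_val)[:, 1]

# McFadden's pseudo-R2, one of several R2 analogues for a
# dichotomous outcome: 1 - loglik(model) / loglik(null model).
ll_model = -log_loss(y_val, p_val, normalize=False)
p_null = np.full_like(p_val, y_dev.mean())
ll_null = -log_loss(y_val, p_null, normalize=False)
pseudo_r2 = 1 - ll_model / ll_null

# C-statistic: probability the model ranks a random patient who had
# the outcome above a random patient who did not (equals ROC AUC).
c_stat = roc_auc_score(y_val, p_val)

print(f"pseudo-R2 = {pseudo_r2:.3f}, C-statistic = {c_stat:.3f}")
```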

Table 2 presents statistics for agency-level outcome measures, while Table 3 does the same for patient-level and agency-level adverse events. In Table 2, the coefficient of variation is the ratio of the standard deviation to the mean, based on the agencies with at least 30 patients.
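
As a quick illustration of this dispersion statistic (the agency rates below are made up):

```python
import numpy as np

# Illustrative agency-level outcome rates (fraction of eligible
# patients achieving the outcome) for agencies with >= 30 patients.
rates = np.array([0.58, 0.62, 0.49, 0.71, 0.55, 0.66])

cv = rates.std(ddof=1) / rates.mean()  # coefficient of variation: SD / mean
print(f"CV = {cv:.2f}")  # larger CV -> more dispersion across agencies
```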

Discussion

Dr. Gerteis explained that the focus group results are summarized under Tab 14 of the workbook, but the categories used in her study were reorganized under three domains for presentation of the statistics on the measures. In Dr. Gerteis's tables, the checkmarks indicate the measures that respondents most often selected as important and meaningful; selecting some measures necessarily meant leaving others unselected. The narrative under each table summarizes the main points of the discussion.

Consumers' and Intermediaries' Response to OASIS Measures of Home Health Quality: Download in PDF format (414 KB)

Ms. McCall pointed out that physicians and discharge planners sometimes do not appear to believe that home care agencies can do anything about particular problems. She agreed that if someone is short of breath, there may be little that an agency can do. Ms. Teenier pointed out that this addresses actionability.

Mr. Fitzgerald pointed out that the BearingPoint tables do not address stabilization. Dr. Gerteis explained that stabilization was not tested separately. All three groups found the stabilization measures difficult to understand, which is why few of them were selected by respondents.

Dr. McGee suggested, as a proxy for outcomes that home health agencies can improve, the number of agencies that selected the outcome and the percentage of those that improved. But Ms. Ketcham characterized this approach as potentially misleading. Kathy Crisler, R.N., M.S., pointed out that one of the criteria used to select target outcomes was clinical relevance. Ms. Ketcham explained that agencies identify the items on which their performance differs from that of other agencies. From that set, they select those that are most clinically relevant. They are not likely to choose an outcome in which they are outperforming others. But Ms. Crisler pointed out that if an agency is outperforming others, it can select that outcome to reinforce the care it is providing. In the QIO pilot, many agencies chose outcomes to reinforce, not just to improve.

Ms. Fredland identifies areas in which her agency is not doing as well as it would like on the OBQI reports. She does not address areas where the agency's performance is better than average, because the goal is to increase quality. Her agency performs at least as well as average on almost all measures and selects the measures on which its performance is lowest to target for improvement.

Ms. Scott pointed out that the OBQI reports address patient outcomes as defined by OASIS scoring. For agencies to select a measure to improve, they must understand the definition used for that outcome. Sometimes agencies appear to improve only because clinicians have improved their understanding of the terms used to define the status of the outcome.

Dr. Bernard stated that Dr. Gerteis's finding that physicians see these outcomes as not actionable presents a challenge. Agencies cannot cure lung cancer, but they can teach families how to address dyspnea. Managing pain is one of the top issues that a home health agency should address, and the fact that consumers do not see this as important is a concern.

Dr. McGee finds the measures selected by the agencies to be valuable information, because agencies must select something that is relatively important and amenable to improvement; in effect, agencies have spoken about where to devote resources. Dr. Shaughnessy noted that agencies do not like to choose certain measures, such as hospitalization. However, when they selected hospitalization in the demonstration because they were required to do so, they performed remarkably well.

Ms. McCall does not want to hold agencies accountable for outcomes they cannot control. Ms. Terry asked panel members to keep in mind as they select measures that some of the outcomes to which they will hold agencies accountable have not been seen by the agencies.

Task of the Expert Panel

Dr. Bernard pointed out that the expert panel is faced with two very different tasks in identifying indicators for the NHQR and CMS public reporting. Unless a measure shows variability between agencies, it should not be posted on the Web site. But for providing the Nation with a picture of home healthcare's current status, a measure with little variation might be important. Many people have pulmonary problems, for example, so knowing how to address them is very important. Dr. Bartlett explained that the panel members were asked to advise for both purposes. Dr. Bernard clarified that rather than 12 measures, panel members were actually asked to identify as many as 24.

Dr. Shaughnessy asked who NHQR's audience is. If the audiences for the two purposes are different, then the panel really had two tasks. Dr. Kelley agreed that the two lists of measures developed by the panel members were likely to be quite different. The NHQR is a national report on the state of healthcare quality across the entire Nation and at the state level. The audience for the report is Congress, policymakers, and analysts. In the future, AHRQ probably will develop products derived from the NHQR for consumers. Potentially, panel members could develop two lists, but AHRQ and CMS hoped that some overlap would occur. Dr. Shaughnessy pointed out that the audiences are somewhat similar. Dr. Bernard noted that the purposes are different.

Brian Lindberg asked whether providers are compared to others in their state. Dr. Kelley explained that the NHQR will not provide such comparisons. It will be a brief, 50-page document that offers a national estimate with some breakdowns, including state breakdowns where possible.

Dr. Thoumaian said that CMS is using the same criteria as AHRQ, but for a different audience. Dr. Sangl added that on the CMS Web site, consumers will be able to obtain detailed information on specific agencies and providers. AHRQ will produce a report that provides a national summary and some state-level information.

Mr. Lindberg asked the participants not to lose sight of the importance to providers of publicly reported data. Long before consumers become sophisticated enough to use these data, providers will review them. Ms. Scott explained that providers already obtain this information through the OASIS reports. However, Mr. Lindberg noted, public reporting has a different impact. Dr. Golden explained that the public report is for consumers. Other important audiences will include members of boards and chief executive officers (CEOs) of agencies who will assign resources.

Beth Kosiak, Ph.D., pointed out that the issue of accountability arises in recognition that, in addition to consumers, boards and providers will use the data. Agencies can observe how they are doing, and releasing this information has an important effect. Releasing information on state and national rates is oriented toward public health rather than accountability. The NHQR might be vital to Congress in highlighting areas to which the Nation and the states need to pay attention. It will not necessarily be geared to providers.
