U.S. Department of Health and Human Services

Alternative Risk-Adjustment Approaches to Assessing the Quality of Home Health Care: Final Report

Christopher M. Murtaugh, Ph.D., Timothy R. Peng, Ph.D., Gil A. Maduro, Ph.D., Elisabeth Simantov, Ph.D., and Thomas E. Bow, M.A., M.S.W.

Center for Home Care Policy and Research, Visiting Nurse Service of New York

July 2006



This report was prepared under contract #HHS-100-03-0011 between the U.S. Department of Health and Human Services (HHS), Office of Disability, Aging and Long-Term Care Policy (DALTCP) and the Urban Institute. For additional information about this subject, you can visit the DALTCP home page at http://aspe.hhs.gov/_/office_specific/daltcp.cfm or contact the ASPE Project Officers, William Marton and Hakan Aykan, at HHS/ASPE/DALTCP, Room 424E, H.H. Humphrey Building, 200 Independence Avenue, S.W., Washington, D.C. 20201. Their e-mail addresses are: William.Marton@hhs.gov and Hakan.Aykan@hhs.gov.

The opinions and views expressed in this report are those of the authors. They do not necessarily reflect the views of the Department of Health and Human Services, the contractor or any other funding organization.



TABLE OF CONTENTS

ABSTRACT
EXECUTIVE SUMMARY
INTRODUCTION
RISK-ADJUSTMENT OF HEALTH CARE QUALITY MEASURES
Theoretical and Conceptual Models
Current Approach to Risk-Adjustment of OBQI Quality Indicators
Other Empirical Research on Home Health Outcomes Using OASIS
Empirical Research on Nursing Home Outcomes
Reliability of OASIS Data
Project Framework
METHODS
Source of Data
Analytic Methods
DEVELOPMENT OF ALTERNATIVE MODELS AND RESULTS OF ANALYSES
Preliminary Analyses
Technical Advisory Group (TAG) Meeting
Final Data Analyses: Risk-Adjustment Models
Final Data Analyses: Agency Impacts
CONCLUSIONS AND IMPLICATIONS
REFERENCES
NOTES
LIST OF FIGURES (separate file)
FIGURE 1: Performance of Alternative and Current Risk-Adjustment Models for Improvement in ADL Outcomes
FIGURE 2: Performance of Alternative and Current Risk-Adjustment Models for Improvement in IADL Outcomes
FIGURE 3: Performance of Alternative and Current Risk-Adjustment Models for Improvement in Physiologic and Cognitive Outcomes
FIGURE 4: Performance of Alternative and Current Risk-Adjustment Models for Utilization Outcomes
FIGURE 5: Inclusion of OASIS-B1 Items in Risk-Adjustment Models
FIGURE 6: Inclusion of OASIS-B1 Elements in Risk-Adjustment Models
FIGURE 7: Mean Percentage Point Difference in Agency Performance Using Current versus Full Alternative Models
FIGURE 8: Spearman’s Rank Correlation Coefficient for Agency Outcomes Using Current versus Full Alternative Models
FIGURE 9: Mean Percentage Point Difference in Agency Performance Using Current versus Core Alternative Models
FIGURE 10: Spearman’s Rank Correlation Coefficient for Agency Outcomes Using Current versus Core Alternative Models
LIST OF TABLES (separate file)
TABLE 1: Outcome Measures Used in OBQI
TABLE 2: Preliminary Set of Core Risk Factors Used in Initial Analyses
TABLE 3: Final Set of Core Items Included as Risk-Adjusters in All Alternative Models
TABLE 4a: Risk-Adjusters Specific to ADL Outcome Models
TABLE 4b: Risk-Adjusters Specific to IADL Outcome Models
TABLE 4c: Risk-Adjusters Specific to Physiologic Outcome Models
TABLE 4d: Risk-Adjusters Specific to Cognitive and Utilization Outcomes
TABLE 5a: “Prior” Risk-Adjusters Specific to ADL Outcomes
TABLE 5b: “Prior” Risk-Adjusters Specific to IADL Outcomes
TABLE 5c: “Prior” Risk-Adjusters Specific to Physiologic Outcomes
TABLE 5d: “Prior” Risk-Adjusters Specific to Emotional/Behavioral/Cognitive Outcomes
TABLE 6a: Summary of Regression Models: Activities of Daily Living
TABLE 6b: Final Alternative Risk-Adjustment Models for Activities of Daily Living Outcomes
TABLE 7a: Summary of Regression Models: Instrumental Activities of Daily Living
TABLE 7b: Final Alternative Risk-Adjustment Models for Instrumental Activities of Daily Living Outcomes
TABLE 8a: Summary of Regression Models: Physiologic Measures
TABLE 8b: Final Alternative Risk-Adjustment Models for Physiologic Outcomes
TABLE 9a: Summary of Regression Models: Emotional/Behavioral Measures
TABLE 9b: Final Alternative Risk-Adjustment Models for Emotional/Behavioral Outcomes
TABLE 10a: Summary of Regression Models: Cognitive Measures
TABLE 10b: Final Alternative Risk-Adjustment Models for Cognitive Outcomes
TABLE 11a: Summary of Regression Models: Utilization Measures
TABLE 11b: Final Alternative Risk-Adjustment Models for Utilization Outcomes
TABLE 12: Summary of Agency-Level Analyses: Activities of Daily Living
TABLE 13: Summary of Agency-Level Analyses: Instrumental Activities of Daily Living
TABLE 14: Summary of Agency-Level Analyses: Physiologic Measures
TABLE 15: Summary of Agency-Level Analyses: Cognitive Measures
TABLE 16: Summary of Agency-Level Analyses: Utilization Measures


ABSTRACT

Background and Purpose. The Outcome-Based Quality Improvement (OBQI) program provides reports to all Medicare-certified home health agencies so that they can identify potential quality problems and devise appropriate strategies to address them. There are 41 OBQI quality measures. A data-driven “stepwise” approach currently is used to risk adjust the OBQI indicators with a separate set of risk factors included in the risk-adjustment model for each outcome. The purpose of this project was to use a theory and evidence-based approach to develop and test alternative risk-adjustment models for the OBQI quality indicators within the frame of the existing Outcome and Assessment Information Set (OASIS) instrument.

Methods. The data analyzed in this project were obtained from the Centers for Medicare and Medicaid Services’ (CMS) contractor at the University of Colorado. They drew the data from the OASIS National Repository at CMS to create discrete episodes of home health care during calendar year 2001. In this project, alternative models were estimated sequentially after replicating the current risk-adjustment models. The first model was limited to the admission (or baseline) value of the outcome indicator and a core set of risk-adjusters. Subsequent models included a small number of outcome-specific risk-adjusters. Following development of a final set of alternative risk-adjustment models, an agency-level analysis was conducted to determine the impact on agencies’ quality ratings.

Results. The alternative models that include outcome-specific risk-adjusters typically have slightly lower explanatory power than the current models. This finding is not surprising since the “stepwise” approach used to develop current models is likely to result in models with close to the best explanatory power possible for the data set analyzed. The number of OASIS data items required to risk-adjust all outcomes, on the other hand, is considerably higher for the current compared with the alternative models. The agency-level analysis examined how the alternative approaches to risk-adjustment of the OBQI indicators affect an agency’s quality ratings as calculated by CMS for public reporting. For most agencies and most outcomes, the adjusted proportion of patients with an outcome and the agency’s ranking relative to other agencies is similar regardless of whether the current or alternative model is used to risk-adjust outcomes.

Conclusions. The results suggest that the relatively small reduction in explanatory power of most of the alternative risk-adjustment models for the OBQI indicators is unlikely to have a substantial effect on the quality ratings of the majority of agencies. A theory and evidence-based modeling approach, then, has the potential to simplify risk-adjustment and provide a consistent and stable basis for risk-adjustment relative to the current approach. This should make it more understandable to providers and encourage individual agencies to risk-adjust their own outcomes. The reliance on a smaller number of OASIS data elements, in addition, would contribute to the Department’s efforts to streamline the OASIS instrument and potentially facilitate the identification of a parsimonious set of clinical measures appropriate for data exchange in an electronic health record environment.


EXECUTIVE SUMMARY

Background and Purpose

One of the central goals of the U.S. Department of Health and Human Services is to improve the quality of health care received by all Americans. In the home health care area, the Department has two key initiatives developed and implemented by the Centers for Medicare and Medicaid Services (CMS) to assess, improve, and report quality. The Outcome-Based Quality Improvement (OBQI) program provides reports to all Medicare-certified home health agencies so that they can identify potential quality problems and devise appropriate strategies to address them. The Home Health Quality Initiative (HHQI) uses a subset of the OBQI quality measures for public reporting.

There are 41 home health quality measures in the context of the OBQI framework including functional, physiologic, emotional/behavioral, cognitive, and health care utilization outcomes. The source of the data used in OBQI and HHQI is the Outcome and Assessment Information Set (OASIS). Since July 1999, home health agencies participating in the Medicare or Medicaid programs have been required to collect OASIS on all patients age 18 or older admitted to Certified Home Health Agencies. The two exceptions are persons receiving pre- or postpartum maternity services and those receiving only personal care, chore or housekeeping services.

Thirty of the 41 OBQI quality indicators now are risk-adjusted when comparing outcomes for patients from one agency with outcomes for patients from all agencies in OBQI reports. An additional OBQI patient outcome indicator (Improvement in Pain Interfering with Activity) is risk-adjusted for public reporting in HHQI but not in OBQI reports sent to agencies. A data-driven “stepwise” approach currently is used to risk-adjust the OBQI indicators with a separate set of risk factors included in the risk-adjustment model for each outcome.

The purpose of this project was to use a theory and evidence-based approach to develop and test alternative risk-adjustment models for the OBQI quality indicators within the frame of the existing OASIS instrument. Specifically, instead of using a separate set of risk-adjusters for each OBQI quality indicator where risk-adjusters are primarily determined based on their statistical fit to the model, this project used a core set of risk-adjusters in all models that theory and prior research suggest are important determinants of home health quality. Advantages of a theory and evidence-based approach include simplicity, understandability, stability of the risk-adjustment models over time, conceptual meaningfulness, and the potential for greater parsimony in data elements when a large number of outcome indicators are being risk-adjusted, as is the case in the OBQI program. Findings from the project will contribute to CMS’s future plans for continued refinement of risk-adjustment and outcome measures, and support the Department’s efforts to reduce regulatory burden by streamlining OASIS.

Methods

Analyses were conducted in two major phases: preliminary data analyses and final data analyses. Preliminary data analyses included replication of the CMS risk-adjustment models for the first set of 11 outcomes reported in HHQI, and development of alternative models for these outcomes. A Technical Advisory Group (TAG) meeting then was conducted with experts in home health care and risk-adjustment as well as policymakers and provider representatives. Based on the results of the preliminary data analyses, the TAG provided input on our initial approach. After the TAG meeting, final data analyses were conducted. The project team replicated the current models for the remaining 20 quality measures that are currently risk-adjusted in OBQI or HHQI. A final set of alternative risk-adjustment models then was developed for all 31 OBQI quality indicators, followed by an examination of the impact of alternative risk-adjustment models on agency quality ratings.

The data analyzed in this project were obtained from the CMS contractor at the University of Colorado. They drew the data from the OASIS National Repository at CMS to create discrete episodes of home health care during calendar year 2001. The file includes episodes of care beginning and ending within the calendar year. Approximately 1,500,000 OASIS episodes are present in the overall data set. The University of Colorado randomly assigned about a third of the episodes to the developmental sample for initial estimation of risk-adjustment models for most outcomes. The remaining 1,000,000 were used to validate the final models derived from analysis of the developmental sample.
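The random assignment of episodes to developmental and validation samples can be illustrated with a minimal sketch. It assumes a simple random split with a fixed seed; the function name, the seed, and the use of integer episode identifiers are hypothetical and are not taken from the University of Colorado's procedure.

```python
import random


def split_episodes(episode_ids, dev_fraction=1 / 3, seed=2001):
    """Randomly assign episodes to developmental and validation samples.

    dev_fraction and seed are illustrative assumptions, not values
    documented in the report.
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    shuffled = list(episode_ids)
    rng.shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    developmental = shuffled[:n_dev]   # used to estimate the models
    validation = shuffled[n_dev:]      # held out to validate the final models
    return developmental, validation


# Toy example: 1,500 episode identifiers standing in for the full file.
dev_sample, val_sample = split_episodes(range(1500))
print(len(dev_sample), len(val_sample))
```

Keeping the validation episodes out of model estimation is what allows the final models to be checked on data they were not fitted to.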

In the preliminary data analyses, six alternative models were estimated for each of the 11 initial HHQI outcomes. We began with a model limited to the core set of clinical, demographic and payment risk-adjusters, including the baseline value of the outcome measure if it was not already among the core variables. Outcome-specific risk-adjusters were added at subsequent steps: Model 2 included other clinical characteristics at baseline that might plausibly affect the outcome, and Model 3 included measures of clinical status prior to home health admission. Four clinical therapies at baseline (i.e., oxygen therapy, IV/infusion therapy, enteral/parenteral nutrition, and ventilator) then were added to the risk-adjustment models of the 11 HHQI outcomes. Living arrangements and social support indicators were added next. Finally, home health episode length of stay (LOS) was added solely to allow comparison of current and alternative model statistics and parameter estimates.

Only three alternative models were estimated for each of the 31 outcome indicators in the final data analyses.

Prior health status variables were examined separately from clinical measures on admission because of questions regarding the reliability of the former and their possible elimination from the OASIS instrument.

The decision to estimate only three sequential models, as opposed to the six estimated in the preliminary analyses, was based on the advice of the TAG and further analysis of the living situation and informal support/assistance measures following the TAG meeting. The analysis confirmed that these factors contributed relatively little to the explanatory power of risk-adjustment models with the exception that they very modestly improved the explanatory power of the Improvement in Medication Management risk-adjustment model. Following this analysis, the living situation and informal support/assistance measures were excluded from all alternative models.

Four sets of statistics were estimated for each current and alternative risk-adjustment model:

The total number of OASIS items and elements used to risk-adjust all OBQI quality indicators also was compared.

An agency-level analysis was conducted following development of a final set of alternative risk-adjustment models. The purpose was to determine how the different approaches to risk-adjustment affect an agency’s quality ratings. Approximately 5,000 agencies were included on the calendar year 2001 files provided to the project team by the University of Colorado. Two “adjusted” agency outcome rates were calculated for each of the 31 outcomes currently risk-adjusted in OBQI or HHQI. One of the adjusted rates was estimated using the current risk-adjustment model and the other was estimated using the “full” alternative model (i.e., Model 3 which includes outcome-specific and “prior” OASIS items, or Model 2 where there were no relevant prior items).
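The calculation of an "adjusted" agency outcome rate can be sketched as follows. The formula is not spelled out in this summary, so the sketch assumes one common observed-versus-expected convention: the agency's observed rate plus the difference between a reference rate and the agency's model-predicted (expected) rate. The function name and all numbers are hypothetical.

```python
def adjusted_rate(outcomes, predicted, reference_rate):
    """Return an adjusted agency outcome rate (as a proportion).

    outcomes: 0/1 indicator per patient (1 = patient had the outcome).
    predicted: model-predicted probability of the outcome per patient.
    reference_rate: outcome rate in the reference population.
    Assumed formula: observed + (reference - expected).
    """
    if not outcomes or len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must be non-empty and aligned")
    observed = sum(outcomes) / len(outcomes)
    expected = sum(predicted) / len(predicted)
    return observed + (reference_rate - expected)


# Toy agency with four patients, scored by two hypothetical models.
outcomes = [1, 1, 0, 1]
current_model_probs = [0.80, 0.70, 0.50, 0.60]
alternative_model_probs = [0.75, 0.70, 0.55, 0.64]

rate_current = adjusted_rate(outcomes, current_model_probs, reference_rate=0.60)
rate_alternative = adjusted_rate(outcomes, alternative_model_probs, reference_rate=0.60)
print(f"{rate_current:.2f} {rate_alternative:.2f}")
```

Under this assumed convention, the two adjusted rates for an agency differ only through the expected rates produced by the two risk-adjustment models.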

Development of Alternative Models and Results of Analyses

The preliminary set of theory and evidence-based core risk-adjusters in the first phase of the project, where we focused on the original 11 HHQI outcomes, was drawn from a number of domains covered by the OASIS instrument. The selection of the final set of core risk-adjusters was based on findings from the preliminary analyses, comments of TAG members, and examination of a small number of additional OASIS items provided by the University of Colorado following the TAG meeting. In addition to the core, approximately 2-3 outcome-specific risk-adjusters were included in the final, “full” risk-adjustment model developed for each of the 31 OBQI outcomes currently risk-adjusted by CMS. In addition, 1-3 directly related, conceptually important “prior” health status measures were included in the full risk-adjustment models of most of the health status outcomes. The great majority of core as well as supplemental risk factors are clinical measures at baseline suitable for inclusion in electronic health records. All risk-adjusters were constructed from routinely collected OASIS data elements.

Comparison of Current and Alternative Models

Overall results from the comparison of the current and alternative risk-adjustment models are described first, followed by results for specific domains (e.g., Activity of Daily Living (ADL) measures, physiologic indicators). In general, the “full” alternative models have slightly lower explanatory power than the current risk-adjustment models. Specifically, the R-squared statistic for the full model tends to be within 1-2 percentage points of the R-squared statistic for the model developed by the University of Colorado. There is a similar pattern for the c statistic. While the number of OASIS items and elements used to risk-adjust a given outcome is sometimes larger and sometimes smaller for the alternative model compared with the respective current model, the overall number of OASIS items and elements employed when risk-adjusting all 31 OBQI outcome indicators is considerably smaller for the full alternative models (64 versus 88 OASIS items, and 93 versus 135 OASIS elements).
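For readers less familiar with the c statistic, the following minimal sketch shows its standard definition: the proportion of all pairs of one patient with the outcome and one patient without it in which the patient with the outcome has the higher predicted probability, with ties counted as one half. The function name and the toy data are illustrative only and are not drawn from the project's models.

```python
def c_statistic(y_true, y_prob):
    """Concordance (c) statistic for a binary outcome.

    y_true: 0/1 outcome per patient.
    y_prob: model-predicted probability per patient.
    """
    with_outcome = [p for y, p in zip(y_true, y_prob) if y == 1]
    without_outcome = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for p in with_outcome:
        for q in without_outcome:
            if p > q:
                concordant += 1.0   # pair ordered correctly by the model
            elif p == q:
                concordant += 0.5   # tie counts as half
    return concordant / (len(with_outcome) * len(without_outcome))


# Toy example: three of the four pairs are ordered correctly.
print(c_statistic([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```

A value of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination. This pairwise version is quadratic in the number of patients, so it suits small illustrations rather than files of the size analyzed here.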

The ADL and IADL outcomes represent 23 of the 41 OBQI quality indicators and over two-thirds of the 31 outcome indicators currently risk-adjusted by the University of Colorado.

“Prior” OASIS items contribute little to the explanatory power of the risk-adjustment models for the remaining health status outcomes. The one exception is the risk-adjustment model for Improvement in Urinary Incontinence, a physiologic outcome in the OBQI framework. Among physiologic outcomes, the alternative risk-adjustment model for Improvement in Urinary Tract Infection (UTI) performs considerably worse than the current UTI risk-adjustment model. The R-squared statistic for the full model is 5.9% compared to 12.1% for the current model, and corresponding c statistics are 0.665 and 0.740. The main reason for this difference is the exclusion of home health episode LOS from the alternative model.

No “prior” OASIS items were included in the alternative models for the utilization outcomes (i.e., Acute Care Hospitalization, Discharged to the Community, and Emergent Care). As was the case with the UTI risk-adjustment model, the exclusion of LOS reduces the explanatory power of the alternative models for the three utilization outcomes relative to current models.

Agency Analyses

Regardless of whether the current or “full” alternative model was used to risk-adjust outcomes, the quality ratings for most agencies on most outcomes are similar. In particular, the difference between the current and alternative risk-adjusted percent of an agency’s patients with each outcome is within one to two percentage points for most agencies on most outcomes. It is the ranking of each agency relative to others, however, that is likely to be of greatest concern to providers. Our analysis found that the ranking of agencies using current risk-adjustment models and the ranking using the full alternative risk-adjustment models are in close agreement for most outcomes.
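Agreement between two rankings of agencies is summarized in the figures with Spearman's rank correlation coefficient. The sketch below computes that coefficient as the Pearson correlation of rank-transformed values, assigning average ranks to ties. The function names and the agency rates are hypothetical and serve only to illustrate the calculation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical adjusted rates (percent) for four agencies under each model.
current = [62.0, 58.5, 70.1, 65.3]
alternative = [61.2, 59.0, 69.4, 66.0]
print(spearman(current, alternative))
```

In this toy example the two models order the four agencies identically, so the coefficient equals 1 even though the adjusted rates themselves differ slightly.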

The agency-level analyses then were repeated using only the core risk-adjusters in the alternative risk-adjustment models. This was done in order to better understand the contribution of the outcome-specific and OASIS “prior” items to the finding of similar quality ratings regardless of risk-adjustment approach. The basic results hold. However, as would be expected, the quality ratings are not as close when outcome-specific and OASIS prior items are dropped from the alternative risk-adjustment models of the OBQI indicators.

Conclusions and Implications

There are important tradeoffs and differences between the current and alternative approaches to risk-adjusting OBQI quality indicators. The first is the generally higher explanatory power of the current models versus the simplicity of the alternative models and their overall reliance on a smaller number of OASIS items and elements. That current models generally have slightly better explanatory power than the alternative models is not surprising since the “stepwise” approach is likely to result in models with close to the best explanatory power possible for the data set analyzed. At the same time, however, it leads to the selection of a large number of risk factors when all outcome measures are considered. In addition, because the stepwise approach “fits” models to the data on which they are developed, the explanatory power of these models is likely to decline when they are applied to new data sets.

A second tradeoff is between the full alternative models that include the outcome-specific risk-adjusters and alternative models with only the core set of risk-adjusters. The latter tend not to predict outcomes as well as the full models. Measures of physical functioning prior to home health admission are particularly significant in the risk-adjustment models of ADL and IADL improvement. The “prior” OASIS items, however, are more difficult than many other items for home health agencies to collect and are thought to be less reliable than other clinical measures. Should they be dropped from the OASIS instrument, the explanatory power of the risk-adjustment models for most ADL and IADL improvement models would be reduced roughly two percentage points.

The decision to exclude home health LOS from the alternative models, in addition, has a significant impact on a small but important subset of risk-adjustment models (i.e., the utilization outcomes). LOS was excluded because it can be affected by problems in the care process that also affect outcomes (i.e., low quality care can cause a longer stay as well as worse outcomes). If LOS is included in risk-adjustment models, conclusions about the quality of agency care could be erroneous due to quality problems being risk-adjusted away. The TAG convened to review preliminary models developed by the project team strongly supported the decision to exclude LOS from risk-adjustment models. The consequence, however, is reduced explanatory power for some outcomes. A possible methodological solution, which has data burden and simplicity implications, is to collect information on the timing of all of the utilization outcomes (e.g., hospitalization) and estimate hazard models which take into account the time to the outcome of interest.

An agency-level analysis was conducted to examine how alternative approaches to risk-adjustment of the OBQI indicators affect an agency’s quality ratings, with two main findings. First, for most agencies and most outcomes, the adjusted proportion of patients with an outcome is similar regardless of whether the current or the full alternative model is used to risk-adjust outcomes. Second, the relative ranking of agencies using current risk-adjustment models and the ranking using the “full” alternative risk-adjustment models are in close agreement for most outcomes. One limitation of the agency analysis is that for some outcomes a relatively large number of agencies were excluded because too few patients at each of these agencies had the potential to have the outcome (i.e., fewer than 20 in the study sample).

The results suggest that the relatively small reduction in explanatory power of most of the alternative risk-adjustment models for the OBQI indicators is unlikely to have a substantial effect on the quality ratings of the majority of agencies. A theory and evidence-based modeling approach, then, has the potential to simplify risk-adjustment and provide a consistent and stable basis for risk-adjustment relative to the current approach. This should make it more understandable to providers and encourage individual agencies to risk-adjust their own outcomes. The reliance on a smaller number of OASIS data elements, in addition, would contribute to the Department’s efforts to streamline the OASIS instrument and potentially facilitate the identification of a parsimonious set of clinical measures appropriate for data exchange in an electronic health record environment.


INTRODUCTION

One of the central goals of the U.S. Department of Health and Human Services is to improve the quality of health care received by all Americans. In the home health care area, the Department has two key initiatives developed and implemented by the Centers for Medicare and Medicaid Services (CMS) to assess, improve, and report quality. The Outcome-Based Quality Improvement (OBQI) program provides reports to all Medicare-certified home health agencies so that they can identify potential quality problems and devise appropriate strategies to address them. The Home Health Quality Initiative (HHQI) uses a subset of the OBQI quality measures for public reporting. The purpose of HHQI is to provide useful information for potential home health consumers to make informed decisions when choosing a home health agency, and to provide an incentive for home health providers to improve the quality of care they provide.

The source of the data used in OBQI and HHQI is the Outcome and Assessment Information Set (OASIS). Since July 1999, home health agencies participating in the Medicare or Medicaid programs have been required to collect OASIS on all patients age 18 or older admitted to Certified Home Health Agencies. The two exceptions are persons receiving pre- or postpartum maternity services and those receiving only personal care, chore or housekeeping services. OASIS data subsequently are submitted to State Survey Agencies which in turn send the data to CMS where they become part of a National Repository. The Medicare Prescription Drug, Improvement and Modernization Act of 2003 suspended OASIS requirements, beginning December 2003, for patients who are not covered by Medicare or Medicaid.

There are 41 home health quality measures in the context of the OBQI framework. They include functional, physiologic, emotional/behavioral, cognitive, and health care utilization (e.g., hospitalization) outcomes (Table 1). Currently, 30 of the 41 OBQI quality indicators are risk-adjusted when comparing outcomes for patients from one agency with outcomes for patients from all agencies in OBQI reports.1 One of the OBQI patient outcome indicators (Improvement in Pain Interfering with Activity) is risk-adjusted for public reporting in HHQI but not in OBQI reports sent to agencies.

The quality indicators are risk-adjusted so that agencies serving different types of patients can be compared. The statistical modeling approach currently used to risk-adjust these measures is a data-driven “stepwise” approach with a separate set of risk factors used for each OBQI measure. One potential drawback of using a stepwise approach to risk-adjustment is finding a set of adjustors that are specific to the particular data set being modeled. Since the decision to retain a variable as a predictor in a given model is driven by the data being analyzed, there is a risk of an “overfit” of the data. The resulting model may predict the analytic data set well, but be a poor fit when applied to future data. To at least partially address this problem, the risk-adjustment models developed by the CMS contractor at the University of Colorado were estimated on a randomly selected subsample of the overall dataset, referred to as the “developmental sample.” The developmental sample models then were validated by applying them to data that were set aside for this purpose. In those cases where there was a substantial discrepancy in the explanatory power of the model between the developmental and validation samples, the model was re-estimated using the developmental sample.

The purpose of this project was to develop and test alternative risk-adjustment approaches to assessing the quality of home health care. A theory and evidence-based approach was used to develop risk-adjustment models for the OBQI quality indicators. Specifically, instead of using a separate set of risk-adjusters for each OBQI quality indicator where risk-adjusters are primarily determined based on their statistical fit to the model, this project used a core set of risk-adjusters in all models that theory and prior research suggest are important determinants of home health quality. Advantages of a theory and evidence-based approach include simplicity, understandability, stability of the risk-adjustment models over time, conceptual meaningfulness, and the potential for greater parsimony in data elements when a large number of outcome indicators are being risk-adjusted, as is the case in the OBQI program.

The alternative models were developed within the framework of the uniform data collection system (OASIS) at the time of the study. A project goal was to develop alternative models that could be implemented using existing data sources, and project resources limited analyses to OASIS data elements. Within this framework, clinically relevant measures that may be included in future electronic record systems were distinguished from other measures in the model-building process. We identified the relative contribution of OASIS items supplementing the core set of risk-adjusters to inform efforts to determine whether OASIS items can be excluded from the instrument without jeopardizing the explanatory power of the risk-adjustment models.

Findings from this project will contribute to CMS’s future plans for continued refinement of risk-adjustment and outcome measures. They also will provide home health care providers with a better understanding of current and alternative modeling approaches for risk-adjustment of home health quality indicators. Finally, the results will support the Department’s efforts to reduce regulatory burden by streamlining OASIS.


RISK-ADJUSTMENT OF HEALTH CARE QUALITY MEASURES

Risk-adjustment is a critical tool in the evaluation of health care quality. Its aim is to “level the playing field” so that providers serving different patients can be meaningfully compared (Johnson, 2003). Many of the risk-adjustment methods developed and implemented to date are designed to account for differences in patients’ health status when determining payment rates in public programs. The limited diffusion of risk-adjustment methods for assessing the quality of health care may be due to the multiple dimensions of quality, cost of appropriate data, and technical complexity of risk-adjustment methods. Blumenthal and colleagues (2005) argue that greater attention needs to be paid to simplicity, practicality and the intuitive appeal of risk-adjustment methods to increase diffusion and the effective use of this tool.

Theoretical and Conceptual Models

The most widely employed theoretical model in health services research is Andersen and Newman’s conceptual framework (1973) for examining the determinants of medical care utilization. It describes the use of individual health services as a function of societal determinants (technology, norms), health system determinants (resources, organization) and individual determinants (predisposing, enabling and health status). The development of theoretical models of patient health outcomes has lagged behind although the Andersen and Newman framework can serve as a starting point for thinking about the contributors to patient outcomes.

Iezzoni (2003) recently outlined the concepts underpinning the risk-adjustment of health care outcomes, highlighting the importance of the “medical meaningfulness” of risk-adjusters. She lists 26 potential risk factors in the following five broad categories (page 35):

Iezzoni points out that data limitations will constrain the range of potential risk-adjusters and that an “a priori conceptual model of which risk factors should be in a risk-adjustment method for a given outcome…” is necessary to evaluate the credibility of risk-adjusted findings (page 33).

Current Approach to Risk-Adjustment of OBQI Quality Indicators

There is relatively little empirical research on the quality of home health care (Institute of Medicine, 2001). Important exceptions include the work of the team at the University of Colorado responsible for developing OBQI under contract to CMS. The method used by CMS to risk-adjust patient outcomes is logistic regression.2 The initial modeling approach involved conceptually and clinically specifying all possible risk factors that might influence the OBQI outcome from a large number of candidate risk factors derived from the OASIS instrument. The grouping of the potential risk-adjusters by OASIS content area represents a framework for thinking about contributors to patient outcomes. All but the “length of stay” (LOS) category fit within the five broad categories identified by Iezzoni.

A “stepwise” logistic regression approach is used to assess the relationship between each candidate risk factor and the outcome measure under consideration. Separate logistic regression models are used for each outcome measure and, in some cases, “submodels” are estimated for patients with different baseline values of the outcome indicator. For example, three sub-models are estimated when assessing Improvement in Transferring: one for patients who can transfer with minimal human assistance; one for patients who are unable to transfer by themselves but able to bear weight and pivot during the transfer process; and one for patients with higher levels of disability in transferring. Details of the risk-adjustment methodology are provided in Shaughnessy and Hittle (2002) “Overview of Risk Adjustment and Outcome Measures for Home Health Agency OBQI Reports” available for direct download at http://www.cms.hhs.gov/apps/hha/RiskAdj1.pdf with the risk-adjustment models for the 30 outcomes risk-adjusted in OBQI at http://www.cms.hhs.gov/apps/hha/riskadj1appa.pdf. The risk-adjustment model for “Improvement in Pain Interfering with Activity” (risk-adjusted in HHQI but not in OBQI reports) is at http://www.cms.hhs.gov/apps/hha/RiskModels.pdf. Risk-adjusted outcomes are reported in a recent analysis of whether home health quality changed following the introduction of the Medicare prospective payment system (Schlenker, Powell and Goodrich, 2005).

Other Empirical Research on Home Health Outcomes Using OASIS

Mathematica Policy Research (Cheh and Black, 2002), as part of Laguna Research Associates’ analysis of the impact of the Medicare home health interim payment system, also has analyzed home health outcomes using OASIS (or slightly modified OASIS) data. The investigators grouped OASIS items in their risk-adjusted models into the following broad categories:

They also included in their models “Measures of Patient’s Prior Service Use Before Home Health Admission” derived from Medicare claims data.

Fortinsky and Madigan (1997) analyzed home health outcomes using standardized items from the “transition” and “full” OASIS data system. They used Andersen and Newman’s conceptual framework for organizing their explanatory variables although only bivariate analyses were conducted.

Prior work by the project team at the Center for Home Care Policy and Research at the Visiting Nurse Service of New York includes a study conducted by Peng, Navaie-Waliser and Feldman (2003) that examined physical functioning (activities of daily living (ADLs) and instrumental activities of daily living (IADLs)), psychological functioning (anxiety and depression) and discharge outcomes among home health care patients using OASIS data. They used a subset of OASIS items as case-mix adjusters, based on Andersen and Newman’s conceptual framework, with a focus on differences among patients across OASIS-derived race and ethnic categories.

Other prior work at the Center for Home Care Policy and Research has focused on the outcomes of heart failure patients. We relied on OASIS data for baseline patient measures and survey data at a uniform point in time after home health admission for outcome measures (Murtaugh et al., 2005; Feldman et al., 2005). In these studies, OASIS data at baseline were grouped into broad domains similar to those described above with key variables from each domain included as risk-adjusters in our models.

Empirical Research on Nursing Home Outcomes

There is a growing literature on risk-adjustment of nursing home quality measures. Mukamel and colleagues (2003), for example, used Minimum Data Set (MDS) information on over 45,000 residents in 671 nursing homes in New York State to develop a risk-adjusted urinary incontinence outcome measure. A large number of potential risk-adjusters was examined and goodness of fit statistics improved substantially when separate models were estimated for each of three age categories (i.e., 65-74, 75-84, and 85 and older). In earlier work, Mukamel and Brower (1998) examined the influence of three different risk-adjustment methods on conclusions about nursing home quality of care. The three methods examined were: (1) no risk-adjustment; (2) risk-adjustment using only items needed to determine nursing home payment (Resource Utilization Group, or RUGs, items); and (3) “comprehensive” adjustment based on patient-level risk factors. The investigators found substantial disagreement in quality ratings depending on the risk-adjustment method employed.

Reliability of OASIS Data

Item reliability is an important issue when selecting risk-adjusters. The testing of OASIS items by the team that developed OBQI at the University of Colorado is an important source of information on reliability. In addition, inter-rater reliability of the full range of OASIS items has been examined by the Center for Home Care Policy and Research of the Visiting Nurse Service of New York (Kinatukara, Rosati and Huang, 2005), and selected items have been examined by Madigan and Fortinsky (2000).

There is considerable variation among OASIS items in their inter-rater reliability as measured by the percent agreement and Cohen’s kappa (a measure of agreement that adjusts for the extent to which the observed agreement is due to chance). This is particularly true when reliability statistics are reported for specific categories of multi-category items rather than the average over all categories. The results from these analyses can be used to identify potential risk-adjusters that are more (or less) reliable than others as well as content areas within domains that are more (or less) reliable than others.

Project Framework

The framework for selecting the core set of risk-adjusters is based on an integration of Iezzoni’s conceptual model, prior empirical research, and the input of a Technical Advisory Group (TAG) (see below). It is limited to patient measures although the context in which care is provided theoretically may influence patient outcomes. The specific domains and sub-domains of potential risk-adjusters are listed below and refer to patient status at the time of home health admission (or at the time of a subsequent baseline assessment) unless otherwise indicated:


METHODS

Source of Data

OASIS, as noted above, is the source of the data used in OBQI. Agencies are required to collect OASIS at different points in time over a patient’s stay. The reason for an assessment is recorded on the OASIS instrument from among the following categories:

01 = Start of Care (SOC)--further visits planned
02 = Start of Care (SOC)--no further visits planned (discontinued 12/2002)
03 = Resumption of Care (ROC) (after inpatient stay)
04 = Recertification (Follow-Up) assessment (every 60 days)
05 = Other Follow-Up (when there is a significant change in patient condition)
06 = Transferred to an Inpatient Provider--patient not discharged from agency
07 = Transferred to an Inpatient Provider--patient discharged from agency
08 = Death at Home
09 = Discharged from Agency to the Community
10 = Discharged from Agency--no visits completed after SOC/ROC assessment (discontinued 12/2002)

There is some variation in the data items collected depending on the reason for the assessment. Baseline data for risk-adjustment and the health status quality indicators are from SOC, ROC and Other Follow-Up assessments. Data from assessment types 06-09 are used to determine changes in health status as well as utilization outcomes. OASIS is described in detail at http://www.cms.hhs.gov/oasis.

The data analyzed in this project were obtained from the CMS contractor at the University of Colorado. They drew the data from the OASIS National Repository at CMS to create discrete episodes of home health care during calendar year 2001. The file includes all episodes of care beginning and ending within the calendar year. Approximately 1,500,000 OASIS episodes are present in the overall data set. The University of Colorado randomly assigned about a third of the episodes to the developmental sample for initial estimation of risk-adjustment models for most outcomes. The remaining 1,000,000 were used to validate the final models derived from analysis of the developmental sample.

The data set contains the OBQI outcome indicators and 143 potential risk-adjusters derived from OASIS. The University of Colorado replaced any missing values for the risk-adjusters with mean values from the sample used to develop their risk-adjustment models. A technical memo documenting the creation and coding of risk factors in the current models has been written by staff at the University of Colorado and can be downloaded from http://www.cms.hhs.gov/apps/hha/riskadjappb.pdf. All of the data needed to replicate the risk-adjustment models employed in OBQI and HHQI at the time of the study were included on the files.

The project estimated preliminary models using the 143 candidate risk-adjusters developed by the University of Colorado. Following the TAG meeting (see below), a small number of potentially important risk-adjusters available on raw data files edited by the CMS contractor were requested by the project team and provided by the University of Colorado.

Analytic Methods

Analyses were conducted in two major phases (i.e., preliminary data analyses and final data analyses). Preliminary data analyses included replication of the CMS risk-adjustment models for the first set of 11 outcomes reported in HHQI and development of alternative models for these outcomes. A TAG meeting then was conducted with experts in home health care and risk-adjustment as well as policymakers and provider representatives. The TAG provided input on our initial approach based on the results of the preliminary data analyses. Following the TAG, a final set of alternative risk-adjustment models was developed for all 41 OBQI quality indicators and the impact of alternative risk-adjustment models on agency quality ratings was examined.

Logistic regression is the statistical method currently used to risk-adjust OBQI outcomes. We also used logistic regression when estimating risk-adjustment models since the purpose of the project was to replicate the existing approach and compare it with a theory and evidence-based approach to selecting risk-adjusters. An R-squared statistic and c statistic were estimated to assess the explanatory power and fit of current and alternative models.

The R-squared statistic is the squared correlation between the observed and predicted value of the dependent variable. This pseudo R-squared measure is the one estimated by the CMS contractor at the University of Colorado and included in publicly released reports describing current risk-adjustment models. While it is not equivalent to the R-squared statistic estimated in ordinary least squares regression, throughout this report we refer to increases and decreases in the R-squared statistics as changes in the “explanatory power” of a model. The change technically represents an increase or decrease in the extent of the agreement between observed and predicted values.
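The two model statistics can be illustrated with a minimal sketch. It assumes the pseudo R-squared is the squared Pearson correlation between the 0/1 outcome and the predicted probability, as described above, and it uses the common definition of the c statistic as the share of (event, non-event) patient pairs in which the patient with the event has the higher predicted probability, with ties counted as one half. The function names and the sample values are hypothetical and are not project data.

```python
from math import sqrt


def pseudo_r_squared(observed, predicted):
    """Squared Pearson correlation between 0/1 outcomes and predicted probabilities."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("observed and predicted values must both vary")
    return (cov / sqrt(var_o * var_p)) ** 2


def c_statistic(observed, predicted):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [p for o, p in zip(observed, predicted) if o == 1]
    non_events = [p for o, p in zip(observed, predicted) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Illustrative values only (hypothetical, not drawn from OASIS).
observed = [1, 0, 1, 1, 0, 0]
predicted = [0.8, 0.3, 0.6, 0.4, 0.5, 0.2]
r2 = pseudo_r_squared(observed, predicted)
c = c_statistic(observed, predicted)
```

The pairwise loop is quadratic in the number of episodes, so a rank-based calculation would be preferable for samples of the size analyzed in this project.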

Preliminary Data Analyses

Preliminary analyses were conducted on the first set of OBQI outcomes publicly reported as part of HHQI.3 The 11 measures are:

Current risk-adjustment models first were replicated to ensure that the samples for each model and specifications for independent and dependent variables in initial models exactly corresponded to those used by CMS when reporting the first set of HHQI outcomes. After replicating the risk-adjustment models for the 11 outcomes (a total of 15 models, since three sub-models are estimated to risk-adjust each of Improvement in Transferring and Improvement in Pain Interfering with Activity), a theory and evidence-based approach was used to estimate alternative models for these outcomes.

Estimation of the theory and evidence-based models proceeded sequentially. A total of six models was estimated for each outcome. We began with a model limited to a core set of clinically relevant risk-adjusters, which included the baseline value of the outcome measure if it was not already among the core variables. We then added risk-adjusters at each subsequent step in the model building process.

The statistics below were estimated for the current and each of the alternative risk-adjustment models:

Technical Advisory Group Review of Preliminary Results

A one-day TAG meeting was convened with members, including industry representatives, having expertise in home health care quality, risk-adjustment, and home health care policy. The methodology and results of the preliminary analyses were summarized and provided to the TAG in a technical memorandum prior to the meeting. TAG members also received a technical memo reviewing the current CMS method for risk-adjusting OBQI outcome measures and other relevant literature on risk-adjustment of home health care outcomes. These documents served as the starting point for discussions at the TAG meeting.

The role of the TAG was to advise the project team on the development of the alternative risk-adjustment models, in particular, to provide advice on:

Final Data Analyses: Risk-Adjustment Models

The analytic methods for estimating a final set of alternative risk-adjustment models were very similar to those used to estimate preliminary models. First, the current risk-adjustment models for the remaining outcomes were replicated. Following refinement of the core and supplementary risk-adjusters, three sequential models were estimated for all 31 home health quality indicators currently risk-adjusted in OBQI or HHQI.

The decision to estimate only three sequential models (as opposed to the six estimated in the preliminary analyses) was based on the advice of the TAG and further analysis of the social support risk-adjusters following the TAG meeting. The analysis confirmed that these factors contributed relatively little to the explanatory power of risk-adjustment models (see below).

Ten of the 41 OBQI quality indicators are not currently risk-adjusted. Only a model with the “clinical core” (i.e., Model 1) was estimated for each of these outcomes. The model statistics listed above in the Preliminary Data Analyses section were estimated for all risk-adjustment models developed in the Final Data Analyses.

Final Data Analyses: Agency Impacts

An agency-level analysis was conducted to examine how alternative approaches to risk-adjustment of the OBQI quality indicators affect an agency’s quality ratings. The agency-level analysis employed the validation data set provided by the University of Colorado with approximately 5,000 agencies included on the calendar year 2001 files. Two “adjusted” agency outcome rates were calculated for each of the 31 outcomes currently risk-adjusted in OBQI or HHQI. For example, an agency’s adjusted rate for Improvement in Bathing (see formula below) first was estimated using the current CMS risk-adjustment model. The adjusted rate then was re-estimated using the full alternative model developed to risk-adjust Improvement in Bathing in this project (i.e., the final version of Model 3). Not all agencies have estimates for all outcomes. If an agency has fewer than 20 patients with the potential to have an outcome, that outcome is not included in agency OBQI reports or in HHQI. We followed this approach and did not estimate the adjusted outcome for an agency when there were fewer than 20 patients with the potential to have the outcome.

There were five steps in the calculation of the adjusted agency outcome rate:

  1. Identify the patients at an agency with the potential to have an outcome.

  2. Determine the observed percent with the outcome at each agency where at least 20 patients have the potential to have the outcome.

  3. Estimate the predicted probability of the outcome at the individual level using: (1) the current risk-adjustment model, and (2) the final alternative model.

  4. Calculate the average predicted probability of the outcome at each agency when the current risk-adjustment model is used, and then when the alternative model is used.

  5. Adjust the agency mean so that agencies can be compared to the national average for an outcome using the formula published by the University of Colorado:

Adjusted Agency Outcome Rate = Observed Agency Outcome Rate +
(Observed National Outcome Rate - Agency Predicted Outcome Rate)
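The five steps and the formula above can be sketched as follows. This is a minimal illustration, not the University of Colorado's implementation: the function name and arguments are hypothetical, the predicted probabilities are assumed to come from an already estimated risk-adjustment model, and all rates are expressed as proportions between 0 and 1.

```python
MIN_PATIENTS = 20  # agencies with fewer eligible patients receive no estimate


def adjusted_agency_rate(outcomes, predicted_probs, national_rate):
    """Return the adjusted agency outcome rate, or None for small agencies.

    outcomes        -- 0/1 outcome for each patient at the agency with the
                       potential to have the outcome (steps 1 and 2)
    predicted_probs -- model-predicted probability for the same patients (step 3)
    national_rate   -- observed national outcome rate
    """
    if len(outcomes) != len(predicted_probs):
        raise ValueError("one predicted probability is needed per patient")
    if len(outcomes) < MIN_PATIENTS:
        return None
    observed_rate = sum(outcomes) / len(outcomes)
    predicted_rate = sum(predicted_probs) / len(predicted_probs)  # step 4
    # Step 5: shift the observed rate by the gap between the national rate
    # and the rate expected for this agency's patient mix.
    return observed_rate + (national_rate - predicted_rate)


# Hypothetical agency: 12 of 20 eligible patients improved, the model
# predicted a 50% chance for each, and the national rate is 55%.
example_rate = adjusted_agency_rate([1] * 12 + [0] * 8, [0.5] * 20, 0.55)
```

Calling the function twice for the same agency, once with predictions from the current model and once with predictions from the alternative model, yields the two adjusted rates that are compared in the agency-level analysis.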

The following statistics then were estimated for each of the 31 outcomes:

A sensitivity analysis subsequently was conducted to better understand the impact on agency quality ratings of the inclusion of outcome-specific and OASIS “prior” items in the alternative risk-adjustment models of the OBQI quality indicators. Specifically, the agency-level analysis was repeated with only the core risk-adjusters included in the alternative risk-adjustment model for each of the 31 OBQI outcomes (i.e., the final version of Model 1). The results with and without the outcome-specific and OASIS “prior” items as risk-adjusters then were compared.


DEVELOPMENT OF ALTERNATIVE MODELS AND RESULTS OF ANALYSES

The preliminary set of theory and evidence-based core risk-adjusters in the first phase of the project, where we focused on the original 11 HHQI outcomes, was drawn from a number of domains covered by the OASIS instrument. In selecting the core set as well as supplemental risk factors, special attention was paid to variables that are clinically relevant and suitable for inclusion in electronic health records. The preliminary set of core risk-adjusters is listed in Table 2. Within this preliminary set, the only risk-adjusters that are neither clinical measures nor patient characteristics likely to be included in an electronic health record are those under the Informal Support/Assistance and Living Situation sub-domains.

Preliminary Analyses

Currently, different subsets of home care patients are assessed when determining an agency’s performance on each OBQI quality indicator. The three utilization outcomes are computed for all episodes except those ending in death (i.e., approximately 98% of episodes are included). For all other outcomes, two additional criteria are used to determine whether or not a given episode will be included. First, the episode must end in discharge to the community (approximately 70% of episodes), because the endpoint measures used to calculate improvement or stabilization on the non-utilization outcomes are collected only on the more comprehensive assessment made for those patients discharged to the community. Second, the start of care (SOC) assessment item for the outcome must permit the patient to have the potential to have the outcome. OBQI health status improvement measures are binary indicators of whether the patient’s status at discharge is better than at baseline. Individuals who cannot improve because they do not have any deficit in the quality indicator at baseline are excluded from estimates of improvement. OBQI health status stabilization measures are binary indicators of whether the patient’s status at discharge is the same or better than at baseline. Individuals who cannot deteriorate because they are in the worst category of the quality indicator at baseline are excluded from stabilization estimates.
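The improvement and stabilization definitions, including the exclusion of patients who lack the potential to have an outcome, can be sketched as below. The sketch assumes each OASIS item is coded as an integer scale on which 0 means no deficit and higher values mean greater impairment; the function names and the `worst_level` argument are hypothetical.

```python
def improvement_indicator(baseline, discharge):
    """1 if status at discharge is better than at baseline, otherwise 0.

    Returns None when the patient has no deficit at baseline and therefore
    cannot improve (excluded from improvement estimates).
    """
    if baseline == 0:
        return None
    return 1 if discharge < baseline else 0


def stabilization_indicator(baseline, discharge, worst_level):
    """1 if status at discharge is the same as or better than at baseline.

    Returns None when the patient is already in the worst category at
    baseline and therefore cannot deteriorate (excluded from stabilization).
    """
    if baseline == worst_level:
        return None
    return 1 if discharge <= baseline else 0


# Hypothetical item scored 0 (independent) through 5 (most impaired).
improved = improvement_indicator(baseline=3, discharge=1)
stabilized = stabilization_indicator(baseline=2, discharge=2, worst_level=5)
```

Returning None rather than 0 keeps excluded patients out of both the numerator and the denominator when agency rates are computed.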

The initial developmental sample from which the University of Colorado identified individuals with the potential to have an outcome is 125,000 episodes. However, the developmental sample was supplemented by the University of Colorado for four of the 11 HHQI outcomes due to low numbers of episodes where patients had the potential to have the outcome. The developmental sample was 250,000 episodes for Improvement in Upper Body Dressing, Improvement in Transferring, and Improvement in Oral Medications, and approximately 350,000 episodes for Improvement in Confusion.

Respecification of Core Risk-Adjusters

After replicating the risk-adjustment models developed by the University of Colorado, alternative models were estimated using exactly the same coding of risk-adjusters as in current models with two exceptions where theory or prior evidence suggested other codings were likely to be more meaningful. Instead of a continuous measure of the age of the home care patient, four categories were specified: <65; 65 to <75 (reference category); 75 to <85; 85 or older. The other change was the creation of a single numeric scale from the individual OASIS ADL and IADL measures at baseline. Spector and Fleishman (1998) examined the psychometric properties of ADLs and IADLs and concluded that they represent a single construct. We approximated the scale developed by Spector and Fleishman by classifying persons as either independent or dependent on human help to complete each ADL and IADL. The scale is a simple count of the number of ADLs and IADLs that the patient needs human help to complete. It ranges from 0 to 14.
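A minimal sketch of the two respecifications follows. The age cut points are those given above. The item names are placeholders: the text states only that the scale counts 14 ADL and IADL items, so the sketch takes a dependent/independent flag per item without naming the OASIS items or the response levels that define dependence on human help.

```python
N_ITEMS = 14  # number of ADL and IADL items in the count (scale runs 0 to 14)


def age_category(age):
    """Assign one of the four age categories; 65 to <75 is the reference."""
    if age < 65:
        return "<65"
    if age < 75:
        return "65 to <75"
    if age < 85:
        return "75 to <85"
    return "85 or older"


def dependency_count(needs_human_help):
    """Count the ADL/IADL items the patient needs human help to complete.

    needs_human_help -- dict mapping each of the 14 item names (placeholders
                        here) to True when the patient depends on human help.
    """
    if len(needs_human_help) != N_ITEMS:
        raise ValueError("expected a flag for each of the 14 ADL/IADL items")
    return sum(1 for dependent in needs_human_help.values() if dependent)


# Hypothetical patient who needs help with the first five items only.
flags = {f"item_{i}": i <= 5 for i in range(1, N_ITEMS + 1)}
score = dependency_count(flags)
```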

After initial models were estimated, we examined the direction and consistency of the effect of the core risk-adjusters across the 11 HHQI quality indicator outcome models. A number of the original risk-adjusters were integer scales that did not appear to be linearly related to the HHQI quality indicators, or whose effect on the outcome measures was the opposite of what would be expected.

We also categorized dyspnea, which was included in the risk-adjustment models of the ADL outcomes. The original integer scale was not linearly related to these outcomes. In some models of ADL outcomes, the direction of the effect of dyspnea was positive, suggesting improvement in ADL outcomes as the level of impairment increased (although generally decreasing in magnitude as impairment level increased). In other models, the effect of higher levels of impairment on ADL outcomes was negative, although never statistically significant. Despite its unexpected and inconsistent effects, we left dyspnea in the preliminary alternative risk-adjustment models for ADLs because of its conceptual importance. Dyspnea did have the expected effect on the utilization outcomes, with the probability of Emergent Care and Acute Care Hospitalization rising as the severity of dyspnea increased.

Respecification of Baseline and Prior Values of Outcome Indicators

The baseline and “prior” values of the outcome indicators were treated as continuous variables, following the approach of the University of Colorado, in our initial analyses. Higher values always represent a “sicker” state. Subsequently, these indicators were respecified as categorical variables to test the assumption that baseline and prior variables are linearly related to the outcome indicators. The respecification improved the explanatory power of the risk-adjustment models--in a few cases, substantially.
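The difference between the two specifications can be sketched with simple indicator coding. Entered as a continuous variable, an item contributes one coefficient, so each one-level increase is forced to have the same effect on the log-odds of the outcome. Entered as a set of 0/1 indicators, each level has its own coefficient relative to a reference level. The function name, the variable prefix, and the choice of the lowest level as the reference are assumptions made for illustration.

```python
def indicator_code(value, levels, reference, prefix="baseline"):
    """Return 0/1 indicators for every level except the reference category."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of the item's levels")
    if reference not in levels:
        raise ValueError("the reference category must be one of the levels")
    return {
        f"{prefix}_{level}": int(value == level)
        for level in levels
        if level != reference
    }


# Hypothetical item with levels 0-3; level 0 serves as the reference.
coded = indicator_code(2, levels=[0, 1, 2, 3], reference=0)
# coded == {"baseline_1": 0, "baseline_2": 1, "baseline_3": 0}
```

A patient in the reference category has all indicators equal to 0, which is why one level is omitted from the coding.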

Summary of Preliminary Modeling Results

Six models were estimated for each outcome. We began with a model limited to the core set of clinical, demographic and payment risk-adjusters, including the baseline value of the outcome measure if it was not already among the core variables. Outcome-specific risk-adjusters were added at subsequent steps: Model 2 included other clinical characteristics at baseline that might plausibly affect the outcome, and Model 3 included measures of clinical status prior to home health admission. Four clinical therapies at baseline (i.e., oxygen therapy, IV/infusion therapy, enteral/parenteral nutrition, and ventilator) then were added to the risk-adjustment models for all 11 outcomes (Model 4). The living arrangements and social support indicators subsequently were added to all models (Model 5). Finally, LOS was added solely to allow comparison of current and alternative model statistics and parameter estimates.

By Model 3 (i.e., after the addition of the prior health status measures) the risk-adjustment models developed in the preliminary analyses generally approached but did not exceed the explanatory power of the HHQI risk-adjustment models developed by the University of Colorado. The effect of the measures of health status prior to admission on the explanatory power of the risk-adjustment models varied depending on the outcome indicator. They had a modest effect in the improvement in ADL models as well as the one improvement in an IADL model (i.e., Improvement in Management of Oral Medication). Prior health status risk-adjusters had virtually no effect in the remaining models of health status outcomes and were not included in the risk-adjustment models of the two utilization outcomes.

The social support indicators, while conceptually important, added almost nothing to the explanatory power of risk-adjustment models that already included clinically relevant variables. The one exception was the Improvement in Oral Medication risk-adjustment model where there was a one percentage point increase in the R-squared statistic after the addition of the core social support measures and a statistically significant improvement in the fit of the model (p < 0.001).

The generally lower explanatory power of the preliminary alternative models is not surprising since the “stepwise” logistic regression technique used to develop the current models is likely to result in models with close to the best explanatory power possible for the data set analyzed. In addition, the exclusion of LOS from the alternative models, because it can be affected by the quality of care provided and therefore is not an appropriate risk-adjuster, results in a reduced R-squared value for the alternative utilization outcome models relative to the current models.

Whether the alternative models are more parsimonious than the University of Colorado models depends on whether the models are considered individually or all 11 are considered together. Only two of the preliminary risk-adjustment models were more parsimonious than the corresponding models developed by the University of Colorado to risk-adjust the 11 initial HHQI outcome indicators. The total number of OASIS items and elements used to risk-adjust all 11 HHQI outcome indicators, however, was smaller.

Technical Advisory Group (TAG) Meeting

A TAG meeting was conducted in Washington, DC, on August 20, 2004. Members of the TAG, which included industry representatives, were experts in home health care quality, risk-adjustment, and home health policy. The TAG made a number of comments and recommendations based on a review of preliminary analysis results and other background documents.

Strong support was expressed for identifying a core set of risk-adjusters (for statistical reasons as well as for face validity and interpretation of risk-adjustment models). TAG members agreed that the original file of risk-adjusters obtained from the University of Colorado had some limitations and that additional OASIS data should be requested to allow further development of three types of risk-adjusters: diagnoses, social support, and payer. Diagnoses were aggregated into broad body system categories on the original file. With the specific diagnosis information collected on OASIS, it will be possible to specify diagnoses that occur frequently in the home care population (e.g., diabetes) as well as conceptually important medical conditions. It was pointed out that some important diagnoses typically are recorded as secondary, not primary, diagnoses (e.g., multiple sclerosis) and that diagnosis risk-adjusters should take OASIS secondary diagnoses into account.

The TAG also recommended further examination of living arrangement and social support risk-adjusters after the original OASIS variables are obtained because of their high face validity for clinicians. There was a discussion about more detailed living arrangement data and whether knowing that the patient lives with his or her spouse, as opposed to other family members, is likely to perform better as a risk-adjuster. TAG members pointed out that it is possible that too much assistance could delay improvement in some activities. Also, it was suggested that the project team think about whether it is possible to identify spouses who can help with care versus those who cannot or who may require their own care.

There was a discussion of the original payer data (M0150) as well. Medicaid as a payer is to some extent an indicator of economic status. It also is likely to be an indicator of more permanent disability and/or chronic disease. One of the industry experts also suggested that agency staff completing OASIS assessments tend to check Medicare as a payer if there is any chance that the episode might be billed to Medicare. A very large share of episodes (greater than 94%) on the file obtained from the University of Colorado report Medicare as a payer. In addition to home health agency coding practices, this is partly due to the way episodes of home health care are selected for OBQI outcome analysis. All episodes must start and finish in the calendar year. This eliminates many long episodes that are more likely to have Medicaid as the payer including episodes where home health was provided the entire year but admission and discharge are outside the calendar year.

The rationale for examining the baseline therapy measures (i.e., oxygen therapy, IV/infusion therapy, enteral/parenteral nutrition, and ventilator) separately from other clinically relevant risk-adjusters was discussed by the TAG. The risk-adjustment experts agreed that it generally is a bad idea to include actual services in payment or outcome risk-adjustment models since it may encourage inappropriate use of the services. The clinical and industry experts, however, pointed out that these services were invasive and would not be initiated without very clear clinical indications and medical orders. These measures generally had little impact on the explanatory power of the 11 HHQI risk-adjustment models but may be appropriate as outcome-specific risk-adjusters in some cases.

One TAG member indicated that sensory measures (e.g., vision, speech) tend to vary in their relationship with outcomes and that the project team may want to consider dropping them from the core set of risk-adjusters and including them as outcome-specific risk-adjusters when appropriate. It also was suggested that “Life Expectancy” be dropped from consideration since agencies questioned its reliability and it is unclear whether it will be included in future versions of OASIS.

Overall, there was agreement that the sequential model building approach used by the project team was logical. There also was agreement that LOS should not be included as a risk-adjuster. Members of the TAG also agreed that agency-level analyses are an important part of the assessment of differences between current and alternative risk-adjustment models.

Final Data Analyses: Risk-Adjustment Models

Development of Final Set of Core and Supplemental Risk-Adjusters

The selection of the final set of core risk-adjusters was based on findings from the preliminary analyses, comments of TAG members, and examination of a small number of additional OASIS items provided by the University of Colorado following the TAG meeting. The analyses conducted after receipt of additional OASIS data included respecification of the Living Situation and Informal Support/Assistance risk-adjusters. Specifically, alternative specifications were explored utilizing the more detailed data on living arrangements (with the “lives with spouse/family” category in initial models separated into two categories) and the person providing assistance.

The additional data and respecification, however, did not substantially affect the contribution of the living situation and informal support/assistance measures to the explanatory power of the HHQI risk-adjustment models that already included demographic, payer and clinical measures. The one exception is the risk-adjustment model for Improvement in Medication Management. When the living arrangement and social support measures were added to a model with demographic, payer and clinical measures (i.e., added to Model 3), the R-squared statistic increased from 15.7% to 16.7%. These conceptually important measures were excluded from the alternative models because of the limited contribution to the explanatory power of the risk-adjustment models.

Table 3 lists the final set of core risk-adjusters in the alternative models along with their specification. A total of 43 OASIS items were used to construct the core risk-adjusters. The demographic and insurance measures are likely to be included in electronic health records, and the remaining items are all clinically relevant. The one core risk-adjuster that varies from model to model is the baseline value of the outcome indicator. The baseline value, specified as a categorical variable, tends to make a relatively large contribution to the explanatory power of risk-adjustment models. It appears to be adjusting for differences in the probability of improving (or stabilizing) related to the number of levels of the OASIS item.

Risk-adjusters specific to each outcome, other than measures of health status prior to admission, are listed in Tables 4a-4d. They are reported by domain of the outcome indicator (e.g., Table 4a lists the risk-adjusters specific to ADL outcome models). Some items are common to all risk-adjustment models within a domain. For example, obesity is included in the risk-adjustment models of all ADL outcomes. Other items are specific to a single outcome. For example, whether a patient smokes is specific to the Improvement in Dyspnea risk-adjustment model. Generally, 2-3 outcome-specific items were added to each risk-adjustment model. All of these items are clinical factors.

Tables 5a-5d list the measures of clinical status prior to home health admission that were added to the risk-adjustment models of selected OBQI outcomes. As noted above, these OASIS items were examined separately from other outcome-specific risk-adjusters because of questions about their reliability and possible elimination from the OASIS instrument. There were no directly related, conceptually important prior health status risk-adjusters used for four OBQI outcomes (i.e., Improvement in Dyspnea and the three utilization outcomes).

Comparison of Current and Alternative Models

The OBQI quality indicators are grouped into six broad domains by the University of Colorado: (1) ADLs, (2) IADLs, (3) Physiologic indicators, (4) Emotional/Behavioral measures, (5) Cognitive measures, and (6) Utilization Outcomes (see Table 1). We first present results from all models and then by domain. The models developed by the University of Colorado are referred to as the “current” models; the two final alternative models are referred to as the “core” alternative model (which includes only core risk-adjusters) and the “full” alternative model (i.e., Model 3 which includes outcome specific and prior OASIS items, or Model 2 where there are no relevant prior items).

The “full” alternative models typically have slightly lower explanatory power than the current risk-adjustment models. Specifically, the R-squared statistic for the full model tends to be within 1-2 percentage points of the R-squared statistic for the model developed by the University of Colorado. There is a similar pattern for the c statistic. While the number of OASIS items and elements is sometimes larger and sometimes smaller for the alternative models compared with current models, the overall number of OASIS items and elements employed when risk-adjusting all 31 OBQI outcome indicators is considerably smaller for the full alternative models (64 versus 88 OASIS items, and 93 versus 135 OASIS elements).
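For readers less familiar with the c statistic used alongside R-squared in these comparisons, the following is a minimal sketch of how it can be computed from a model's predicted probabilities. The report does not spell out its computation, so the sketch assumes the standard pairwise-concordance definition; the function name and the example values are hypothetical and are not drawn from the project data.

```python
from itertools import product


def c_statistic(predicted, observed):
    """Concordance (c) statistic for a binary outcome.

    Share of (event, non-event) patient pairs in which the patient with the
    outcome received the higher predicted probability; ties count as 0.5.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted probabilities of improvement and observed outcomes.
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
observed = [1, 1, 0, 1, 0, 0]
print(c_statistic(predicted, observed))
```

A value of 0.5 corresponds to predictions no better than chance and a value of 1.0 to perfect discrimination, which is why the statistic can remain fairly high for highly skewed outcomes even when R-squared is low.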

ADL and IADL Outcomes. The ADL and IADL outcomes represent 23 of the 41 OBQI quality indicators and over two-thirds of the 31 outcome indicators currently risk-adjusted by the University of Colorado. The performance (i.e., explanatory power as measured by the R-squared statistic) of the alternative and current risk-adjustment models for ADL and IADL outcomes is presented graphically in Figure 1 and Figure 2. Table 6a and Table 7a summarize the model statistics for all ADL and IADL outcome models, respectively, and Table 6b and Table 7b present the detailed regression results for the full alternative models estimated for the 23 ADL and IADL outcomes (see Note 4).

As previously discussed, most of the full alternative ADL and IADL models have slightly lower explanatory power than the current models. This is not surprising since a “stepwise” approach was used to develop the current models. An exception is the alternative risk-adjustment model for the Improvement in Ambulation outcome where the R-squared statistic is more than six percentage points greater than the R-squared statistic for the current model. The ADL and IADL stabilization outcomes, it should be noted, are highly skewed (i.e., a very high proportion of those potentially able to stabilize do stabilize). This may explain the relatively low R-squared and relatively high c statistics for both current and alternative models.

The outcome-specific risk-adjusters generally contribute very little to the explanatory power of the ADL and IADL risk-adjustment models that already include the core risk-adjusters. In contrast, the prior OASIS items contribute substantially to the explanatory power (roughly two percentage points to the R-squared statistic) of almost all of the risk-adjustment models of improvement in ADLs and IADLs, but not stabilization in ADLs and IADLs. There is a similar pattern for c statistics.

Physiologic Outcomes. Figure 3 graphically presents the performance of the alternative and current risk-adjustment models for the five physiologic outcomes currently risk-adjusted in OBQI. Table 8a summarizes the model statistics for all physiologic outcome models and Table 8b presents the detailed regression results for the full alternative models estimated for the five physiologic outcomes that are currently risk-adjusted, and the alternative models with only core risk-adjusters for the four that are not currently risk-adjusted in OBQI.

The outcome-specific risk-adjusters tend to make a slightly greater contribution to the explanatory power of the physiologic outcome models compared to ADL and IADL outcome models. The effect of the prior OASIS items, on the other hand, is modest. Among the physiologic outcomes, the full alternative risk-adjustment model for Improvement in UTI performs considerably worse than the current UTI risk-adjustment model. The R-squared statistic for Model 3 is 5.9% compared to 12.1% for the current model, and corresponding c statistics are 0.665 and 0.740 (see Table 8a). The main reason for this difference is the exclusion of home health episode LOS from the alternative model.

Emotional/Behavioral Outcomes. None of the emotional/behavioral outcomes currently is risk-adjusted in OBQI. Only Model 1 (i.e., the model including only the core risk-adjusters) was estimated for outcomes that are not currently risk-adjusted. The model statistics for the alternative models for the three emotional/behavioral outcomes are reported in Table 9a. The detailed regression results for the final alternative models estimated for the emotional/behavioral outcomes are presented in Table 9b. The R-squared and c statistics for all three models are low.

Cognitive Outcomes. There are three cognitive outcomes in OBQI but currently only Improvement in Confusion Frequency is risk-adjusted. The right-most bar in Figure 3 graphically presents the performance of the alternative and current risk-adjustment models for Improvement in Confusion Frequency. Neither the outcome-specific nor the prior OASIS items contribute substantially to the explanatory power of the Improvement in Confusion Frequency model that already includes the core risk-adjusters. Table 10a summarizes the model statistics for all cognitive outcome models. Table 10b presents the detailed regression results for the full alternative model estimated for Improvement in Confusion Frequency as well as the alternative models with only core risk-adjusters for the two cognitive outcomes that are not currently risk-adjusted in OBQI. The R-squared and c statistics for all models are relatively low although the c statistic for the Stabilization in Cognitive Functioning risk-adjustment model that includes only the core risk-adjusters is 0.738 indicating adequate ability to predict what is a highly skewed outcome (i.e., over 90% of individuals who could stabilize did stabilize).

Utilization Outcomes. Figure 4 graphically presents the performance of the alternative and current risk-adjustment models for the three utilization outcomes (all three are risk-adjusted in OBQI). Table 11a summarizes the model statistics for all current and alternative utilization outcome models and Table 11b presents the detailed regression results for the full alternative models estimated for the utilization outcomes.

Two of the three outcome-specific variables at baseline (Dyspnea and IV/Infusion therapy) are highly statistically significant in the final, full risk-adjustment models for all three utilization outcomes (p < 0.001). Nevertheless, the outcome-specific variables as a group have only a very small effect on the explanatory power of the risk-adjustment models for the utilization outcomes. When added to models already including the core risk-adjusters, the R-squared and c statistics increase by at most roughly half a percentage point or 0.005, respectively. No prior OASIS items were included in the alternative models for these outcomes. As noted previously, the exclusion of LOS reduces the explanatory power of the alternative models for the utilization outcomes.

Comparison of Overall Number of OASIS Items and Elements Used in Risk-Adjustment

The overall number of OASIS items used in current and alternative risk-adjustment models (out of a total of 95 “M0” items) is graphically presented in Figure 5. The core OASIS items in the alternative models are in the lower left-hand corner shaded in the darkest color. On the diagonal (in the next darkest shade) are the OASIS outcome specific and “prior” items included in the full alternative models (i.e., Model 3 for the outcomes with “prior” OASIS items and Model 2 where there are no relevant “prior” items). The OASIS items for the additional variables used in one or more of the current risk-adjustment models but not in the alternative models are in the next darkest shade. Sixty-four OASIS items were used to construct the risk-adjusters included in one or more of the full alternative models, compared to 88 for the current models developed by the University of Colorado. There are seven OASIS items that are not used in either the current or alternative models (unshaded in the upper-right-hand corner of Figure 5). The “M0” items used for case-mix classification in the Medicare prospective payment system are in bold with an asterisk.

Some OASIS items include multiple elements with each element separately assessed and marked (i.e., the OASIS items with instructions to mark all categories that apply). The OASIS elements used in current and alternative risk-adjustment models are graphically presented in Figure 6 in the same manner as the OASIS items in Figure 5. There are a total of 180 OASIS elements with 93 used to construct the risk-adjusters in the full alternative models compared to 135 in the models developed by the University of Colorado. All OASIS elements in the alternative risk-adjustment models also are used in current models with two exceptions: the Current Payer elements “Medicaid traditional fee-for-service” (M0150_3) and “Medicaid HMO/managed care” (M0150_4), both of which are highlighted on the left side of Figure 6. The “M0” elements used for case-mix classification in the Medicare prospective payment system are in bold with an asterisk.

Final Data Analyses: Agency Impacts

The results of the agency analyses are reported by outcome domain in Tables 12-16. Overall, the results suggest that the quality ratings for most agencies and most outcomes are similar regardless of whether the current or alternative “full” model is used to risk-adjust outcomes. The difference tends to be minimal (no more than one to two percentage points) between the current and alternative risk-adjusted percent of an agency’s patients with each outcome (see Figure 7). For a small share of agencies (i.e., those below the 5th or above the 95th percentile of the distribution), however, differences exceed four percentage points for Improvement in Ambulation, Improvement in Light Meal Preparation, Improvement in UTI, Acute Care Hospitalization, and Discharge to the Community (see columns 3 and 4 of Tables 12-16).

The average of the differences at each agency is greatest for Discharge to the Community (0.374 percentage points) followed by Improvement in UTI (0.287 percentage points). In the case of the UTI outcome, the average percent of patients improving at each agency was 83.7% when estimated using the current risk-adjustment model and 83.9% when estimated using the alternative full model. Despite the very small size of average differences, they often are statistically significant because sample sizes tend to be large, ranging from a low of 771 agencies when comparing the risk-adjusted Improvement in UTI outcomes, to 4,798 agencies in analyses of the percent of patients with an Acute Care Hospitalization.

While the magnitude of the difference between outcome estimates using the two risk-adjustment approaches is important, it is the ranking of each agency relative to others that is likely to be of most concern to providers. The next-to-the-last column in Tables 12-16 reports estimates of Spearman’s rank correlation coefficient. These correlation coefficients are presented graphically in Figure 8. A value of one would indicate that rankings are exactly the same. For most outcomes, in fact, the correlation coefficient is close to one (i.e., it is above 0.950). The two lowest correlation coefficients are 0.912 for Improvement in UTI and 0.925 for Improvement in Ambulation.
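As an illustration of the rank comparison described above, the sketch below computes Spearman's rank correlation coefficient for two sets of agency-level risk-adjusted rates. It is a minimal sketch rather than the project team's code: it assumes tied rates receive the average of their ranks, and the function names and the five agency rates are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical risk-adjusted percent of patients improving at five agencies.
current = [62.1, 58.4, 70.3, 65.0, 55.2]
alternative = [61.8, 59.0, 64.9, 66.2, 54.9]
print(spearman(current, alternative))
```

In this made-up example only two adjacent agencies trade places, and the coefficient still falls noticeably below one because there are so few agencies; with hundreds or thousands of agencies, as in the analyses reported here, a single swap would barely move it.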

The final column of each of the agency-level analysis tables reports the number and percent of agencies that change two or more deciles in rank when the risk-adjustment method is changed. (An agency, for example, would have to decline from the top decile--or top 10% in ranking--to the third decile or lower to be identified as changing two or more deciles.) The outcomes with the greatest number of agencies shifting at least two deciles in rank, not surprisingly, are those with the lowest Spearman’s rank correlation coefficient. Among the agencies analyzed, 20.1% shifted two or more deciles in their Improvement in UTI ranking while 17.3% changed two or more deciles in their Improvement in Ambulation ranking.
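The decile-shift count can be sketched in the same spirit. The example below assumes that agencies are ordered from highest to lowest risk-adjusted rate, that decile 1 is the top 10%, and that tied rates are broken by input order; the report does not state how ties or decile boundaries were handled, so these choices and all names and values are illustrative only.

```python
def deciles(rates):
    """Assign each agency a decile from 1 (highest rates) to 10 (lowest rates)."""
    n = len(rates)
    order = sorted(range(n), key=lambda i: rates[i], reverse=True)
    result = [0] * n
    for position, idx in enumerate(order):
        result[idx] = position * 10 // n + 1
    return result


def count_shifting(current, alternative, min_shift=2):
    """Number and share of agencies whose decile changes by at least min_shift."""
    d_cur, d_alt = deciles(current), deciles(alternative)
    moved = sum(1 for a, b in zip(d_cur, d_alt) if abs(a - b) >= min_shift)
    return moved, moved / len(current)


# Hypothetical rates for 20 agencies: 100, 99, ..., 81 under the current model.
current = [float(v) for v in range(100, 80, -1)]
# Under the alternative model, the first and sixth agencies trade rates.
alternative = list(current)
alternative[0], alternative[5] = alternative[5], alternative[0]
print(count_shifting(current, alternative))
```

Here the two agencies that trade places move between the first and third deciles, so both are counted as shifting two or more deciles.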

Agency quality rankings differ the most where the difference in the explanatory power of the current and alternative risk-adjustment models is substantial. In the case of Improvement in Ambulation, the alternative risk-adjustment model explains considerably more of the variation in the outcome than the current model. It is the reverse for the Improvement in UTI outcome where the current model includes LOS among the risk-adjusters. Agency quality rankings for the utilization outcomes do not differ as much as might be expected given the exclusion of LOS from the alternative models and, as a result, the lower explanatory power of alternative versus current risk-adjustment models.

A sensitivity analysis then was conducted to better understand the impact on agency quality ratings of the inclusion of outcome-specific and OASIS “prior” items in the alternative risk-adjustment models of the OBQI quality indicators. Specifically, agency-level analyses were repeated with only the core risk-adjusters included in the alternative risk-adjustment models (i.e., the final version of Model 1 for each of the 31 currently risk-adjusted OBQI outcomes). The results of the sensitivity analysis are presented graphically in Figure 9 and Figure 10. The basic pattern of impacts is the same but, as expected, the difference in risk-adjusted outcomes using the current and alternative approaches increases (to between one and three percentage points for most agencies on almost all outcomes). For almost a third of the outcomes the Spearman rank correlation coefficient now is in the 0.900-0.950 range with the correlation coefficient for Improvement in Ambulation falling slightly below 0.900.

Finally, it is important to note that for many OBQI outcomes a relatively large number of agencies had fewer than 20 patients in the analytic sample with the potential to have the outcome. These agencies, therefore, were excluded when examining the impact of the alternative approaches to risk-adjustment on the percent of patients with the outcome. The number of agencies excluded is particularly large for two outcomes. All but 14.7% of agencies were excluded when examining the impact of alternative risk-adjustment approaches on estimates of Improvement in UTI and all but 19.5% were excluded when examining the impact on estimates of Improvement in Bowel Incontinence.
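The exclusion rule described above amounts to a simple filter on the number of eligible patients per agency. The sketch below uses the threshold of 20 stated in the text; the agency identifiers, counts, and function name are hypothetical.

```python
MIN_ELIGIBLE = 20  # minimum number of patients with the potential to have the outcome


def agencies_retained(eligible_counts, minimum=MIN_ELIGIBLE):
    """Return the IDs of agencies with at least `minimum` eligible patients."""
    return sorted(a for a, n in eligible_counts.items() if n >= minimum)


# Hypothetical counts of eligible patients per agency for one outcome.
counts = {"A": 35, "B": 19, "C": 20, "D": 4}
kept = agencies_retained(counts)
print(kept, len(kept) / len(counts))
```

Because eligibility is defined separately for each outcome, the set of agencies retained differs from one outcome to the next.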


CONCLUSIONS AND IMPLICATIONS

The purpose of this project was to develop and test alternative risk-adjustment approaches to assessing the quality of home health care. A data-driven “stepwise” approach currently is used to risk-adjust OBQI quality indicators with a separate set of risk-adjusters in each outcome model. In this project, a theory and evidence-based approach was used to develop alternative risk-adjustment models for the OBQI quality indicators. Advantages of a theory and evidence-based approach include simplicity, understandability, stability of the risk-adjustment models over time, conceptual meaningfulness, and the potential for greater parsimony in data elements when a large number of outcome indicators are being risk-adjusted as is the case in the OBQI program.

The alternative models were developed within the framework of the uniform data collection system (OASIS) in place at the time of the study. A project goal was to develop alternative models that could be implemented using existing data sources, and project resources limited the analyses to OASIS data elements. The examination of alternative risk-adjusters developed from other data sources (e.g., Medicare claims) is an important area of future research.

Based on theory and prior empirical research, a core set of risk-adjusters was identified from among the content areas covered by OASIS. These core items were included in the risk-adjustment models for all outcomes. A small number of outcome-specific risk-adjusters then was added to each model. The outcome-specific risk-adjusters are OASIS measures of patient status on admission, as well as status prior to admission, plausibly related to a specific outcome or outcome domain.

At the time of this study, 31 of the 41 OBQI quality indicators were risk-adjusted in either OBQI or HHQI. The analysis focused on a comparison of the current and alternative models for these 31 outcomes. In particular, it focused on statistics that measure how well a model predicts an outcome, as well as the number of OASIS items and elements needed to construct the risk-adjusters. While the OBQI quality indicators represent six broad health and functional domains, 22 of the 31 risk-adjusted outcomes (over 70%) are ADL or IADL outcomes.

There are important tradeoffs and differences between the current and alternative approaches to risk-adjusting OBQI quality indicators. The first is the generally higher explanatory power of the current models versus the simplicity of the alternative models and their overall reliance on a smaller number of OASIS items and elements. That current models generally have marginally better explanatory power than the alternative models is not surprising since the “stepwise” approach is likely to result in models with close to the best explanatory power possible for the data set analyzed. At the same time, however, it leads to the selection of a large number of risk factors when all outcome measures are considered. In addition, because the stepwise approach “fits” models to the data on which they are developed, the explanatory power of these models is likely to decline when they are applied to new data sets.

A second tradeoff is between the “full” alternative models that include the outcome-specific risk-adjusters and alternative models with only the core set of risk-adjusters. The latter tend not to predict outcomes as well as the full models. Measures of physical functioning prior to home health admission are particularly significant in the risk-adjustment models of ADL and IADL improvement. The “prior” OASIS items, however, are more difficult than many other items for home health agencies to collect and are thought to be less reliable than other clinical measures. Should they be dropped from the OASIS instrument, the explanatory power of the risk-adjustment models for most ADL and IADL improvement models would be reduced roughly two percentage points.

The decision to exclude home health LOS from the alternative models, in addition, has a significant impact on the risk-adjustment models for the small but important subset of utilization outcomes. LOS was excluded because it can be affected by problems in the care process that also affect outcomes (i.e., low quality care can cause a longer stay as well as worse outcomes). If LOS is included in risk-adjustment models, conclusions about the quality of agency care could be erroneous due to quality problems being risk-adjusted away. The TAG convened to review preliminary models developed by the project team strongly supported the decision to exclude LOS from risk-adjustment models. The consequence, however, is reduced explanatory power for a small number of outcomes. A possible methodological solution, which has data burden and simplicity implications, is to collect information on the timing of all of the utilization outcomes (e.g., hospitalization) and estimate hazard models that take into account the time to the outcome of interest.

Our agency-level analysis examined how the alternative approaches to risk-adjustment of the OBQI indicators affect an agency’s quality ratings as currently calculated by CMS. For most agencies and most outcomes the adjusted proportion of patients with an outcome is similar regardless of whether the current or the “full” alternative model is used to risk-adjust outcomes. Of greater potential concern to providers, however, is the ranking of each agency relative to others, irrespective of the size of the difference in risk-adjusted outcomes. Our analysis found that the ranking of agencies using current risk-adjustment models and the ranking using the “full” alternative risk-adjustment models are in close agreement for most outcomes.

The agency-level analyses were repeated using only the “core” risk-adjusters in the alternative risk-adjustment models. This was done in order to better understand the contribution of the outcome-specific and OASIS “prior” items to the finding of similar quality ratings regardless of risk-adjustment approach. The basic results hold. However, as would be expected, the quality ratings are not as close when outcome-specific and OASIS “prior” items are dropped from the alternative risk-adjustment models of the OBQI indicators.

One limitation of the agency analysis is that for some outcomes a relatively large share of agencies was excluded because they had too few patients with the potential to have the outcome (i.e., fewer than 20). Nevertheless, the results suggest that the relatively small reduction in explanatory power of most of the alternative risk-adjustment models is unlikely to have an effect on the ranking of the majority of agencies on OBQI quality indicators.

Overall, a theory and evidence-based modeling approach has the potential to simplify risk-adjustment and provide a consistent and stable basis for risk-adjustment relative to the current approach. This should make it more understandable to providers and encourage individual agencies to risk-adjust their own outcomes. The reliance on a smaller number of OASIS data elements, in addition, would contribute to the Department’s efforts to streamline the OASIS instrument and potentially facilitate the identification of a parsimonious set of clinical measures appropriate for data exchange in an electronic health record environment.


REFERENCES

Andersen, R. & Newman, J.F. (1973). Societal and individual determinants of medical care utilization in the United States. The Milbank Quarterly, 51, 95-124.

Blumenthal, D., Weissman, J.S., Wachterman, M., Weil, E., Stafford, R.S., Perrin, J.M., Ferris, T.G., Kuhlthau, K., Kaushal, R., & Iezzoni, L.I. (2005). The who, what, and why of risk adjustment: A technology on the cusp of adoption. Journal of Health Politics, Policy and Law, 30(3), 453-473.

Cheh, V., & Black, W. (2002). Did the balanced budget agreement of 1997 affect the quality of Medicare home health services? Final report (MPR Reference No. 8635-500). Princeton, NJ: Mathematica Policy Research, Inc.

Feldman, P.H., Murtaugh, C.M., Pezzin, L.E., McDonald, M.V., & Peng, T.R. (2005). Just-in-time evidence-based e-mail "reminders" in home health care: Impact on patient outcomes. Health Services Research, 40(3), 865-885.

Fortinsky, R.H., & Madigan, E.A. (1997). Home care resource consumption and patient outcomes: What are the relationships? Home Health Care Services Quarterly, 16(3), 55-73.

Iezzoni, L.I. (Ed.). (2003). Risk adjustment for measuring health care outcomes (3rd ed.). Chicago, IL: Health Administration Press.

Institute of Medicine. (2001). Improving the quality of long-term care (G.S. Wunderlich, & P. Kohler, Eds.). Washington, DC: National Academy Press.

Johnson, M.L. (2003). Risk assessment and adjustment: Adjustment for sick patients or a sick system? Medical Care, 41(1), 4-7.

Kinatukara, S., Rosati, R.J., & Huang, L. (2005). Assessment of OASIS reliability and validity using several methodological approaches. Home Health Care Services Quarterly, 24(3), 23-38.

Kramer, A.M., Shaughnessy, P.W., Bauman, M.K., & Crisler, K.S. (1990). Assessing and assuring the quality of home health care: A conceptual framework. The Milbank Quarterly, 68(3), 413-443.

Madigan, E.A., & Fortinsky, R.H. (2000). Additional psychometric evaluation of the outcomes and assessment information set (OASIS). Home Health Care Services Quarterly, 18(4), 49-62.

Mukamel, D.B., & Brower, C.A. (1998). The influence of risk adjustment methods on conclusions about quality of care in nursing homes based on outcome measures. The Gerontologist, 38(6), 695-703.

Mukamel, D.B., Watson, N.M., Meng, H., & Spector, W.D. (2003). Development of a risk-adjusted urinary incontinence outcome measure of quality for nursing homes. Medical Care, 41(4), 467-478.

Murtaugh, C.M., Pezzin, L.E., McDonald, M.V., Feldman, P.H., & Peng, T.R. (2005). Just-in-time evidence-based e-mail "reminders" in home health care: Impact on nurse practices. Health Services Research, 40(3), 849-864.

Peng, T.R., Navaie-Waliser, M., & Feldman, P.H. (2003). Social support, home health service use, and outcomes among four racial/ethnic groups. The Gerontologist, 43(4), 503-513.

Schlenker, R.E., Powell, M.C., & Goodrich, G.K. (2005). Initial home health outcomes under prospective payment. Health Services Research, 40(1), 177-193.

Spector, W.D., & Fleishman, J.A. (1998). Combining activities of daily living with instrumental activities of daily living to measure functional disability. Journal of Gerontology: Social Sciences, 53B(1), S46-S57.


NOTES

  1. Because of data constraints and methodological issues, 11 outcomes are not risk-adjusted. CMS is planning to address these constraints and issues so that these outcomes will be either risk-adjusted or modified so risk-adjustment is possible.

  2. CMS has tested several other risk-adjustment methods (e.g., multivariate standardization, discriminant function analysis, the classification and regression tree methodology). Logistic regression was determined to have several advantages over these methods, and thus, adopted for use.

  3. As of September 1, 2005, four of the initial 11 HHQI indicators were dropped from public reporting (i.e., Improvement in Toileting, Improvement in Upper Body Dressing, Improvement in Confusion Frequency, and Stabilization in Bathing). They were replaced by Improvement in Dyspnea, Improvement in Urinary Incontinence, and Discharge to the Community.

  4. Regression results for the core model (Model 1) and Model 2 are available from the project team upon request.


The HTML version of the Figures and Tables is available on a separate file.