A REVIEW OF SYNTHETIC HEALTH INDICATORS

Background paper prepared for the OECD Directorate for Education, Employment, Labour, and Social Affairs

Erik Nord, Ph.D.
National Institute of Public Health,
Oslo, Norway

June 1997



TABLE OF CONTENTS

Summary
Acknowledgement
Background
Issues in Health Status Measurement
The Present Study
Scope and Limitations
General Descriptive Data
The Conceptualisation of Health
Dependence and Correlations between Dimensions
Feasibility, Sensitivity and Reliability
Valuation Procedures
The Meaning of the Valuations
The Variation in Health State Valuations

The Validity of Health State Valuations
Measuring Population Health

Utility
Societal Value

Cost-Effectiveness Analysis and QALYs
Conclusions on the Validity of Health State Valuations
Possible Strategies to Improve Validity

Translations of the Instruments
Applications to Date
Development Work in Progress
Other Instruments
Conclusions
References



Summary



A number of multi-attribute utility instruments - here called synthetic indicators - are available for assigning values to health states on a scale from zero (dead) to unity (healthy). The values may be used in estimating health adjusted life expectancy (HALE) and in cost-effectiveness analysis in terms of QALYs. In general, the descriptive systems of synthetic indicators have good feasibility and reliability. However, the descriptive systems differ greatly with respect to their conceptualization of health and their sensitivity to differences and changes in health. In selecting or constructing a new instrument for use in OECD health statistics and policy analysis, there are political choices to be made with respect to the conceptualization of health, and trade-offs to be made between feasibility and sensitivity. Consensus in the OECD on a standard descriptive system is probably most easily reached if a simple descriptive system, focusing on disabilities in daily living and pain/discomfort, is adopted. This issue needs to be addressed more closely by the OECD as a follow-up to this report.

The health state values produced by synthetic indicators may be used to express either trade-offs between quality of life and life expectancy or trade-offs between different kinds of improvements in health (including avoiding death) and the number of people who get to enjoy the improvements. Data from studies of public preferences suggest that individuals are reluctant to sacrifice life expectancy to obtain better health, and that societies place very high value on life saving procedures relative to health improving ones. Values provided by most existing synthetic indicators fail to encapsulate this preference structure.

To obtain synthetic indicators that are valid and at the same time sensitive at the top end of the health scale, health state valuations should possibly express the trade-offs that people would want to make between quality of life and life expectancy if they were given the choice between different possible life scenarios at birth, or trade-offs between benefits in terms of either increased life expectancy or increased quality of life. An alternative strategy is to regard health state values as numbers that are supposed to express trade-offs in terms of societal value rather than utility. The rationale for this is that the preference for the preservation of life itself may be somewhat less absolute when people are asked to prioritise between different health care programs in a budgeting context than when they are asked about their willingness to sacrifice their own future life years or certainty of survival to be relieved of illness. Further reflection is needed on the meaning and purpose of synthetic indicators in cost-effectiveness analysis and in measuring population health. International collaborative empirical research could probably establish usable health state values within three years.



Acknowledgement



I am very grateful to Dr. Michael Wolfson of Statistics Canada, who helped me design the questionnaire used in this study. All errors or unjustified conclusions that the reader might find in this report are entirely my responsibility.



Background



At a High-Level OECD meeting on Health Care Reforms in Paris in November 1994, Ministers and Senior Officials laid great stress on the need to gauge the outcomes of health related public policies. The development and use of health outcome indicators was singled out as an essential tool for government policy analysis and development. The OECD Secretariat was requested to take stock of existing health status measurement instruments, to explore the possible development of selected outcome indicators, and to contribute to the demonstration of their usefulness in policy analysis.

To fulfill this request, advice was solicited at a meeting of experts from OECD Member countries in Oslo, Norway, in September 1995. The experts recommended starting with a state-of-the-art description of currently available methods and practice.



Issues in Health Status Measurement



Two applications of health status measurements are of particular interest for the OECD. One is measuring population health. The aim here is to allow comparisons of population health over time and across countries or social subgroups. The other application is in cost-effectiveness analysis of health care programs. The aim here is to aid decisions about resource allocation.

Health status measurement instruments may be divided in two main categories. One comprises instruments that measure individuals' health on a number of different dimensions and produce so-called health profiles. Examples are the Sickness Impact Profile, the Nottingham Health Profile and the SF-36. The other category comprises instruments that establish health profiles and then transform these into single index scores on a scale from zero (dead) to unity (healthy) by means of some mathematical formula. The transformations are based on statistical analyses of population preference data that show how highly different dimensions of health are valued relative to each other. We may call such instruments synthetic indicators. Examples are the Quality of Well-Being Scale, the Health Utilities Index and the EuroQol Instrument.

Profile instruments have the weakness that they do not always allow judgements of which of two profiles is better than the other, since one profile may have higher scores on some dimensions, and the other profile higher scores on other dimensions, and there is no way of judging which of the differences is more important. Synthetic indicators purport to resolve this problem by concentrating all profile information in one single number, according to which different complex health states may be ranked in terms of their overall value to the individuals concerned. Furthermore, the numbers produced by synthetic indicators purport to express trade-offs between quality of life and life expectancy. This is potentially useful both in measuring population health and cost-effectiveness analysis of health care interventions.
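To make the idea of a trade-off between quality of life and life expectancy concrete, the following minimal Python sketch weights years lived by a health state value on the 0-1 scale. The function name and the example figures are hypothetical and serve only as an illustration; they are not taken from any of the instruments reviewed here.

```python
def quality_adjusted_years(periods):
    """Sum of (years lived x health state value) over a sequence of periods.

    `periods` is a list of (years, value) pairs, where value lies on the
    conventional scale from 0 (dead) to 1 (healthy).
    """
    total = 0.0
    for years, value in periods:
        if not 0.0 <= value <= 1.0:
            raise ValueError("health state value must lie between 0 and 1")
        total += years * value
    return total


# Hypothetical example: 10 years in a state valued 0.8 versus
# 8 years in full health. Both count as 8 quality-adjusted years.
with_illness = quality_adjusted_years([(10, 0.8)])
full_health = quality_adjusted_years([(8, 1.0)])
```

On this reading, a value of 0.8 says that the two scenarios are equally good, which is the kind of claim whose validity is at issue in the rest of this report.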



The Present Study



As part of its assessment of health status measurement instruments, the OECD Secretariat asked me to conduct a survey among authors/developers of synthetic indicators, in collaboration with Dr. Michael Wolfson of Statistics Canada. The Secretariat was particularly interested in reviewing such instruments with respect to the feasibility, reliability and sensitivity of their descriptive systems, the structure and validity of the valuation algorithms which transform individuals' health profiles into summary index numbers, the existence of translated versions of the descriptive part of the various instruments, their use in actual studies and use experiences.

Dr. Wolfson and I constructed a questionnaire with a view to collecting the above data in a standardized way (see annex 1).

We were at the time aware of seven research centers in the world that had developed a synthetic indicator. The indicators include the Quality of Well-Being Scale (QWB: Kaplan and Anderson, 1988), the 15-D (Sintonen and Pekurinen, 1993), the Health Utilities Index, mark II and III (HUI2 and HUI3: Feeny et al., 1995), the EuroQol Instrument (EQ-5D: Brooks et al., 1996), the Index of Health Related Quality of Life (IHRQL: Stosser et al., 1992), the Quality of Life and Health Questionnaire (QLHQ: Hadorn, 1995) and the Australian Quality of Life (AQOL: Hawthorne and Richardson, 1996). The questionnaire was distributed in March 1996 by the OECD Secretariat to the constructors of these instruments, as well as to contact persons in Member countries' Ministries of Health. The latter were asked to forward the questionnaire to any groups in their country who might have done relevant work.

I was later asked to analyse the completed questionnaires and other materials received by the OECD between April and November 1996 and to prepare a note on the convergence and differential paths observed, including suggestions for greater harmonization.

I was not able to embark on this task until March 1997. The material made available to me by the Secretariat in the meantime includes the following:

  A. Fully completed questionnaires concerning 15-D, HUI2, HUI3, EQ-5D and AQOL with various publications attached.
  B. A partly completed questionnaire concerning the QLHQ with various publications attached.
  C. Partly completed questionnaires concerning the Nottingham Health Profile and the WHOQOL (a new instrument developed by the World Health Organization). Neither of these is a synthetic health indicator. They measure health only in terms of multidimensional health profiles.
  D. A fully completed questionnaire covering a Swedish Population Health Index, in which three items from the Swedish National Health Survey have been used ad hoc to quality-adjust Swedish life expectancy figures.

No material was submitted from the constructors of the QWB and the IHRQL.

A draft report was prepared and circulated to the instrument constructors, together with a short questionnaire that aimed at eliciting more information on the sensitivity of the instruments (annex 2). The draft report did not include the section on the validity of health state valuations. Responses were received from the constructors of the 15-D, the EQ-5D and the AQOL. These have been incorporated in the final report.



Scope and Limitations



In the following, I mainly analyse the material on synthetic indicators, i.e., those mentioned under points A and B above. That does not, of course, mean that the other material is without interest. On the contrary, each of the other instruments is a valuable methodological contribution to the measurement of health. The Nottingham Health Profile (NHP) has been extensively tested for its psychometric properties and is widely used in clinical studies. The WHOQOL represents a rare effort at measuring health by means of concepts that are shared by people across different cultures and is likely to obtain a prominent position in the field in the future. The Swedish Population Health Index (SPHI) is one of the first attempts in the world to combine morbidity and mortality data in a single indicator in national health statistics. However, the OECD has expressed an interest in possibly including one or more synthetic indicators in a standardized approach to the measurement of health and health outcomes in its Member countries. The following is primarily meant to shed light on this issue. I do, however, include a section on "other instruments," in which I briefly comment on the NHP, WHOQOL and SPHI as well as three synthetic indicators for which I did not receive any material for this study (QWB, IHRQL and the Rosser and Kind Index).

The aim of the report is not to suggest a "best choice." My personal ties to some of the constructors of synthetic indicators - particularly the EQ-5D (I am a former member of the EuroQol Group), 15-D and AQOL - are much too close for that. Also, some of the statistical material provided by the constructors requires much closer examination than I have had time for. I try to restrict myself to bringing forth what I perceive as the most important facts reported by the constructors themselves. In doing this, I mainly use data provided in the standardized questionnaire. References to data in publications which I have not had at hand or which have been difficult to find in available publications have not counted heavily in my evaluation.

The section on the validity of the various instruments' valuation functions involves subjective judgements on my part about criteria for validity. I emphasise that an overall assessment of the instruments relative to each other requires a lengthy process in which this report can only be a start.



General Descriptive Data



The six instruments for which I received data vary considerably in complexity (see table 1), partly because they were developed for different purposes. The reader should bear in mind that their usefulness relative to each other may depend heavily on the context in which they are used.

Table 1:

Instruments, years in use, dimensions, levels and completion times

Instrument  Years in use  Dimensions  Levels  Completion time (min)
15-D        15 (a)        15          5       5-10
EQ-5D       6 (b)         5           3       1-2
HUI2        15            7           3-5     3
HUI3        Recent        8           5-6     2
AQOL        New           15 (c)      4       5-10
QLHQ        Recent        2           4       1-2

(a) First with 12 dimensions
(b) First with 6 dimensions
(c) 15 items that form 5 dimensions/subscales

All instruments purport to be suitable for clinical decision making, cost-effectiveness analysis and measuring population health.

With all six instruments, a self-administered questionnaire is the main data collection mode, and all have in-person interview (using paper and pencil) as an additional mode. Some also use telephone interview (EQ-5D, HUI2, HUI3, AQOL) and computer assisted interview (15-D, HUI2, HUI3, AQOL).

The AQOL is a very recent development. Its 15 dimensions (items) form 5 subscales. Data collection to establish its valuation function is in progress.



The Conceptualisation of Health



Table 2 shows the health dimensions included in the six instruments (see table 3 for more details).

The dimensions are grouped according to WHO's definition of three different aspects of ill health: organic impairments (I), disabilities (D), and handicaps (H) (WHO, 1980). Organic impairments include all physical and mental dysfunctions "within the skin." Disabilities are operations that a person is unable to perform due to organic impairments. Handicaps are limitations in performance of roles that are normal in the individual's social surroundings. For example, being blind is an organic impairment. It leads to the disability of not being able to drive a car. In North American suburbia this in turn leads to a handicap in terms of not being able to participate in normal community activities.

In this terminology, conditions that are rectified by medication or technical equipment (epilepsy, myopia) are organic impairments, but not disabilities. Depending on the social environment they may constitute handicaps.

In table 2, I have chosen to modify the WHO classification system slightly by adding a fourth class that I call "mood problems." My reasoning is that health and quality of life problems form a hierarchy. Bodily dysfunctions (impairments) lead to problems with instrumental activities (disabilities), which again lead to problems with expressive activities (handicaps). All these three may contribute to bad mood, which conceptually is very close to subjectively perceived quality of life.

Table 2:

Dimensions in the Six Instruments (a)

Dimension                                      WHO class  15-D  EQ-5D  HUI2  HUI3  AQOL  QLHQ
Breathing                                      I          4     -      -     -     -     -
Seeing                                         I          2     -      1     1     10    -
Hearing                                        I          3     -      1     1     11    -
Speaking                                       I          7     -      1     1     12    -
Sleeping                                       I          5     -      -     -     13    -
Bladder/bowel function                         I          8     -      -     -     -     -
Fertility                                      I          -     -      -     -     -     -
Dexterity                                      I          -     -      -     5     -     -
Mobility                                       I          1     1      2     4     6     -
Mental functioning                             I          10    -      4     7     (12)  -
Pain                                           I          11    4      6     8     15    1
Discomfort                                     D          11    4      -     -     -     1
Vitality/energy                                D          14    -      -     -     -     -
Washing/toilet                                 D          -     2      5     -     4     2
Dressing                                       D          -     2      5     -     4     2
Eating                                         D          6     -      5     -     4     2
Performing household tasks                     D/H        9     3      -     -     5     2
Performing work tasks and leisure activities   D/H        9     3      -     -     -     -
Performing sexual activity                     H          15    -      -     -     -     2
Carrying out role in family                    H          -     3      -     -     9     -
Having close relationships/not feeling lonely  H          -     -      -     -     7, 8  -
Using medicines                                H          -     -      -     -     1, 2  -
Needing regular medical treatment              H          -     -      -     -     3     -
Anxiety/nervousness/distress                   Mood       13    5      3     -     14    -
Feeling depressed/unhappy                      Mood       12    5      3     6     14    -

(a) The numbers in the table refer to the number that each dimension has in the instrument in question. For instance, "breathing" is covered by the fourth dimension in 15-D, while "seeing", "hearing", and "speaking" are all covered by the first dimension of HUI2. A dash marks an empty cell.

As shown in table 2, the instruments differ widely with respect to which of the four aspects of health they focus on. Unlike the others, HUI3 mainly measures organic impairment. AQOL is the only one that includes the quality of social relationships and dependence on medical treatment. Four instruments include basic physical functions such as seeing, hearing and speaking, while two (EQ-5D and QLHQ) do not. Unlike the others, QLHQ leaves out mood-dimensions.

The most striking similarity between the instruments lies in the pain-dimension, which is included by all. Thereafter come mobility and depression/unhappiness, which are shared by all but the QLHQ.

A number of dimensions are outliers, in the sense that they occur in only one instrument. These include breathing, bladder/bowel-function, fertility, dexterity, vitality/energy, performing sexual activity, carrying out role in family, having close relationships and needing medicines/medical treatment.

It is difficult to judge which of the instruments conceptualize health "better" than others, as this will depend on the criteria one chooses for judging goodness, and this choice is largely a value judgement with political implications. For instance, the EQ-5D seeks to capture "health related quality of life as defined and valued by ordinary people." In the QLHQ, by contrast, the operational definition of health depended on the prior question of "what aspects of human experience society can reasonably hold medical and surgical services accountable for improving." Clearly the conceptualization of health in the OECD context needs careful consideration and ultimately requires some difficult political choices.



Dependence and Correlations between Dimensions



Individual scores on different dimensions may be correlated for two reasons. One is direct causal association. This is often referred to as "interdependence." For instance, the ability to perform work tasks may depend on the ability to see. The other is correlation due to common causal factors. For instance, sleeping problems and pain may both, independently of each other, be caused by exposure to certain negative factors in the work environment.

High correlations are undesirable for two reasons. One is that they imply redundancy in the data collection when the aim is to estimate an overall quality of life score on the basis of unidimensional scores. The other is that it becomes more complicated in the statistical analysis of the data to separate the effects of each of the dimensions on overall quality of life.

With all instruments except the HUI2, independence between dimensions was reported to be one of the criteria for selecting dimensions. In the construction of HUI3, this criterion was particularly strongly emphasized, and it is part of the reason why HUI3 focuses on organic impairments. The constructors of HUI3 claim that their dimensions are fully independent. This is probably correct with the exception of "emotion," which presumably is affected by organic impairments. Considerable dependence occurs with all the other five instruments. (The EuroQol Group (EQ-5D) claims that their dimensions are "logically distinct." This does not mean, however, that they are independent, which they certainly are not (for instance "mobility" and "usual activity").)

A number of empirical correlations in the order of 0.6-0.7 are reported for 15-D and EQ-5D. On the other hand, with the 15-D, 85 per cent of 105 possible meaningful intercorrelations are below 0.5. In the AQOL, the 15 dimensions are grouped in five subscales, each including three items. Correlations between subscales range from 0.2 to 0.44. With HUI2, HUI3 and QLHQ, data on correlations are missing.
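The figure of 105 intercorrelations is simply the number of distinct pairs that can be formed from 15 dimensions. As a minimal sketch, the following Python code counts the pairs and computes the share of pairwise Pearson correlations below a threshold. The response data are made up, the function names are hypothetical, and the use of the absolute value of the correlation is an assumption on my part; the sketch also assumes that every dimension shows some variation across respondents.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def share_below(scores, threshold=0.5):
    """Share of dimension pairs whose absolute correlation is below threshold.

    `scores` maps a dimension name to the list of respondents' levels.
    """
    pairs = list(combinations(scores, 2))
    low = sum(1 for a, b in pairs if abs(pearson(scores[a], scores[b])) < threshold)
    return low / len(pairs)


# Number of distinct pairs among 15 dimensions, as for the 15-D.
n_pairs_15d = comb(15, 2)

# Made-up levels for five respondents on three dimensions.
example = {
    "mobility": [1, 2, 3, 4, 5],
    "usual_activity": [1, 2, 3, 5, 4],
    "hearing": [2, 1, 3, 1, 2],
}
low_share = share_below(example)
```

In the made-up data, "mobility" and "usual_activity" are highly correlated while "hearing" is nearly unrelated to both, so two of the three pairs fall below the threshold.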



Feasibility, Sensitivity and Reliability



The empirical basis for judging the instruments' psychometric properties in the following varies. In their reports, the constructors of 15-D, EuroQol and HUI3 all refer to use of the instruments in large, representative population samples (N = 1,200 - 20,000). HUI3 reliability data are based on a general population sample of 506 people. With the other instruments the basis for the psychometric reports is as follows: QLHQ: 400 cancer patients; HUI2: 124 school children; AQOL: 129 in-patients and 228 persons from the general population.

As noted above, 15-D and AQOL have higher completion times than the other instruments. With 15-D, this corresponds with its completion rate: 17 percent of a general population sample failed to answer on all the 15 dimensions of 15-D. Refusal or failure was less than 3 per cent with the other instruments.

The sensitivity and validity of a MAU-instrument with respect to differences and changes in health depend on the number of dimensions (items) in its descriptive system and the number of levels on each dimension. As shown above, these vary considerably across the six instruments in question. For instance, the number of dimensions/items ranges from 2 (QLHQ) to 15 (15-D and AQOL), reflecting different philosophies and aims among the constructors. Instruments with many dimensions, such as 15-D and AQOL, are presumably more sensitive than, for instance, the EQ-5D and the QLHQ. Sensitivity is on the other hand obtained partly at the expense of ease of completion (see completion times in table 1), which in turn could mean at the expense of feasibility and response rates in collecting data (see the previous paragraph).

In the second questionnaire (annex 2), developers were asked to indicate - in relative terms - the sensitivity of their instrument in the following five contexts:

  A. Changes in general population health over time in wealthy, industrialized countries.
  B. Changes in general population health in developing countries.
  C. Differences in general population health between OECD countries.
  D. Symptom relief and functional improvements in severely ill patients.
  E. Cures for patients with moderate conditions.

I received responses only from the developers of 15-D, EQ-5D and AQOL. Generally (and unsurprisingly!) they hold their instruments to be highly sensitive. The 15-D purports to be particularly sensitive in context B. The EQ-5D purports to have somewhat lower sensitivity in context E than in contexts A, B and C. The instrument has not been tried in context B. The AQOL claims to be most sensitive in contexts A and E, and least sensitive in context B, with contexts C and D in an intermediate position. The developers provide various references to substantiate their claims. This material needs to be examined more closely.

For the development of standard health status measures in the OECD area, it may be useful to evaluate individual items and scales in addition to evaluating instruments as a whole, with a view to selecting items from different instruments that have proven to have satisfactory psychometric properties, or perhaps to constructing similar items. Table 3 gives data on the individual items of five of the instruments (data on the QLHQ are lacking). The focus is on three aspects: non-response, retest-reliability and discriminant capacity.

Table 3:

Psychometric properties of individual items

                                                        Distribution
Instrument/sample/dimensions  Levels  Non-response (%)  Level 1 (%)  Level 2 (%)  Two lowest (%)  Retest reliability (a)

15-D (general population)
  Mobility                    5       3.3               81           16           0.3             0.92
  Vision                      5       3.2               79           18           2.0             0.92
  Hearing                     5       3.3               83           14           0.2             0.92
  Breathing                   5       3.1               70           22           2.9             0.94
  Sleeping                    5       3.3               48           44           2.5             0.98
  Eating                      5       3.2               96           3            0.3             0.97
  Speech                      5       3.4               89           10           0.3             0.97
  Elimination                 5       3.4               73           28           0.3             0.92
  Usual Activity              5       3.6               71           20           2.6             0.92
  Mental                      5       3.3               73           24           0.6             0.97
  Discomfort                  5       3.9               40           51           1.6             1.00
  Depression                  5       3.6               54           41           1.5             0.98
  Distress                    5       4.2               56           38           1.1             0.95
  Vitality                    5       3.6               46           45           1.8             -
  Sexual activity             5       9.3               74           17           4.7             -

HUI2 (school children)
  Sensation                   4       0.0               75           15           10.0            -
  Mobility                    5       0.0               97           3            0.0             -
  Emotion                     5       0.0               88           6            0.0             -
  Cognition                   4       0.0               97           3            0.0             -
  Self care                   4       0.0               100          0            0.0             -
  Pain                        5       0.0               92           5            2.0             -
  Fertility                   3       0.0               -            -            -               -

HUI3 (general population)
  Vision                      6       1.1               48           50           0.7             0.73
  Hearing                     6       0.3               95           2            0.5             0.50
  Speech                      5       <0.1              99           0.7          0.2             0.14
  Ambulation                  6       0.0               97           1            1.0             0.69
  Dexterity                   6       0.0               99           0.8          0.3             0.35
  Emotion                     5       0.0               74           23           0.9             0.60
  Cognition                   6       0.0               69           6            3.2             0.59
  Pain                        5       0.0               83           6            6.2             0.63

AQOL (inpatients)
  Medicines                   4       0.8               42           29           27              0.62
  Medical aid                 4       0.4               36           20           43              0.55
  Treatment                   4       1.2               40           18           40              0.59
  Self care                   4       1.2               71           19           9               0.45
  House tasks                 4       0.4               58           29           11              0.56
  Mobility                    4       0.4               82           10           8               0.49
  Closeness of relations      4       1.6               70           24           5               0.27
  Loneliness                  4       0.4               46           43           11              0.37
  Family role                 4       1.6               64           23           12              0.51
  Vision                      4       1.6               59           38           2               0.34
  Hearing                     4       0.8               75           21           4               0.23
  Communicating               4       1.2               77           16           6               0.26
  Sleep                       4       1.6               40           28           30              0.33
  Mood                        4       1.2               39           43           17              0.46
  Pain/discomfort             4       0.4               32           58           10              0.33

QLHQ (cancer patients)
  Physical symptoms           4       -                 -            -            -               -
  Daily activities            4       -                 -            -            -               -

When comparing instruments in table 3, one should bear in mind that the samples of subjects underlying the data range from general population samples to patient samples.

The highest level of non-response is (unsurprisingly) on the dimension "sexual activity" in 15-D. In general non-response seems to be a minor problem (even if the percentages reported on HUI2 and HUI3 look strangely low).

Retest-reliability coefficients are very high in 15-D, while only moderate in HUI3 and AQOL. For subscales in AQOL, reliability coefficients are from 0.52 to 0.87 (not in table). Further information on the statistical measures used is required to explain this difference between 15-D and HUI3/AQOL. Data on reliability are missing for the other instruments.

The AQOL has data from in-patients, in whom one would expect a large percentage performing less than at level 1 ("no problem") and a considerable percentage performing at low levels. This is also what the instrument produced in the pilot survey.

In data from general population samples, we would expect the vast majority of responses to occur at the highest levels and very few responses at the lowest levels. This is true of 15-D, EQ-5D and HUI3.

However, comparable items in the latter three instruments differ with respect to how subjects spread on the two highest levels. HUI3 tends to have a higher concentration of subjects at level 1 than the other two instruments, see table 4.

Table 4:

Percentage of subjects at level 1

        Mobility  Vision  Hearing  Speech  Pain/discomfort  Emotion  Mental
15-D    81        79      83       89      40               55       73
EQ-5D   82        -       -        -       67               79       -
HUI3    97        48      95       99      83               74       69


Most of the differences in table 4 may be understood by comparing wordings at level 2 of the dimensions in question. For instance, 15-D speaks of "slight difficulties" with walking outdoors, while HUI3 speaks of "difficulty." So level 2 of mobility is an easier choice in 15-D than in HUI3. On the other hand, 15-D speaks of "slight difficulty" with reading, while HUI3 speaks of dependence on glasses. The latter is presumably a less severe condition than the former and makes level 2 an easier choice.

It is difficult to say how an instrument ideally should divide a population in different levels of functioning. The answer depends on the purpose of the measurement. This is largely a political issue. However, it seems that opportunities for describing health differences in a population are wasted if, on a multi-level dimension, the step from level 1 to level 2 is so big that nearly all subjects respond at the highest level. For this reason one might question the appropriateness of some of the level 2 wordings of the HUI3.



Valuation Procedures



Valuations of multidimensional states on the conventional 0-1 scale are obtained as follows:

15-D: Each level on each dimension has been assigned a score by means of a two-step rating scale procedure in a general population sample (N=1288). The scores on all 15 dimensions are added to estimate the value of a composite state. The estimates have not been validated through comparison with direct valuations of composite states (however, see section on development work in progress).

EuroQol: Selected composite states were valued by means of a rating scale and the time trade-off technique in a sample of the UK population (N=3395). An additive formula has been established by means of regression analysis to estimate the values of other composite states. Explained variance exceeds 0.95. Tables of rating scale and time trade-off values for all 243 possible states are available.

HUI2: Levels on single dimensions were scored by means of a rating scale procedure, and selected composite states by means of a rating scale and standard gamble. The sample was 194 parents of school children. Single dimension scores are put together in a multiplicative formula to estimate values for composite states. On four composite states, the mean absolute difference between estimated values and directly measured standard gamble values was 0.04.

HUI3: Seventy states that are common in the general population have been scored directly by means of a rating scale and standard gamble, yielding "values" and "utilities" respectively. A multiplicative model for estimating the value/utility of any state is being developed. The sample was 508 people from the general population.

AQOL: A multiplicative model is being developed based on time trade-off questions. Another model, based on person trade-off questions (see below), will be developed when funding is available.

QLHQ: All 16 possible states were scored directly by means of a rating scale in various convenience samples, total N=559.
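The two ways of combining single-dimension scores mentioned above, additive and multiplicative, can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores and importance weights are hypothetical, the function names are mine, and the plain product is the simplest possible multiplicative form; the published algorithms use their own coefficients and scaling. The sketch also shows how the number of composite states (243 for the EQ-5D, 16 for the QLHQ) follows from the number of levels on each dimension.

```python
from math import prod


def n_states(levels_per_dimension):
    """Number of composite states a descriptive system can describe."""
    return prod(levels_per_dimension)


def additive_value(level_scores, importance):
    """Weighted sum of single-dimension scores (weights assumed to sum to 1)."""
    return sum(importance[d] * level_scores[d] for d in level_scores)


def multiplicative_value(level_scores):
    """Plain product of single-dimension scores (simplest multiplicative form)."""
    return prod(level_scores.values())


# Five dimensions with three levels each, as in the EQ-5D.
eq5d_states = n_states([3] * 5)
# Two dimensions with four levels each, as in the QLHQ.
qlhq_states = n_states([4] * 2)

# Hypothetical single-dimension scores (1.0 = no problem) and weights.
scores = {"mobility": 0.8, "pain": 0.5, "mood": 1.0}
weights = {"mobility": 0.4, "pain": 0.4, "mood": 0.2}

additive = additive_value(scores, weights)      # 0.32 + 0.20 + 0.20
multiplicative = multiplicative_value(scores)   # 0.8 * 0.5 * 1.0
```

In the additive form a problem on one dimension lowers the value by the same amount whatever the levels on the other dimensions, whereas in the multiplicative form the loss depends on the levels on the other dimensions.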

The researchers behind the EQ-5D, HUI2, HUI3 and QLHQ found little variation in valuations across social subgroups.



The Meaning of the Valuations



There is much debate about the interpretation of numerical values for health states. We therefore asked the constructors of the instruments to indicate the kind of decision-oriented propositions that can be made on the basis of the numbers that their instruments yield: "Assume two health states X and Y, to which your instrument assigns the scores 0.8 and 0.6 respectively. Disregarding imprecision of measurement, which of the following propositions are in your view implied (at least broadly speaking) by these numbers?" Table 6 shows the instruments and the propositions their values imply according to the instrument constructors. (I emphasize that these propositions are not necessarily correct. The table reports what is being claimed by the instrument developers.)

Table 6:

Propositions that V(X)=0.8 and V(Y)=0.6 purport to imply

 

15-D

EQ-5D

HUI2

HUI3

AQOL(a)

QLHQ

A. All else equal, the average person in state Y is willing to sacrifice twice as much to become healthy as the average person in state X.

X

B. All else equal, the average person who is about to die is willing to sacrifice five times as much to be restored to full health as the average person in state X.
C. The average person facing a life in state X is willing to take a treatment that gives him/her at least an 80 percent chance of becoming healthy and at most a 20 percent chance of dying immediately (assuming that the treatment itself causes negligible inconvenience and discomfort).

X

X

X

D. The average person in state X would be indifferent between living the rest of his/her life in this state and living a 20 per cent shorter life in full health.

X

X

E. Society regards as equal in value (a) a program that restores one person from dying to full health and (b) a program that cures five people in state X (given equal life expectancy after the intervention).

X

X

X

(PTO)

F. The health related quality of life in state Y is 75 percent of that of life in state X.

X

X

X

G. The utility derived from state Y is 75 percent of that derived from state X.

(X)

X

X

(a) With the AQOL, positive responses apply to time trade-off based values, with the exception of proposition E, where the response applies to person trade-off based values.

Table 6 tells us that constructors of different instruments place quite different meanings on the numbers they offer. I return to the issue of meaning below.
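
The arithmetic behind the propositions in Table 6 can be made explicit. The following is a minimal Python sketch, not taken from any of the instruments; the function name `implied_propositions` and the dictionary keys are illustrative assumptions. It derives, for V(X)=0.8 and V(Y)=0.6, the figures quoted in propositions A-G, assuming that dead scores zero and full health scores unity.

```python
from fractions import Fraction


def implied_propositions(v_x: str, v_y: str) -> dict:
    """Return the figures that propositions A-G rely on.

    Values are passed as decimal strings so that Fraction keeps the
    arithmetic exact (no floating-point rounding noise).
    """
    x, y = Fraction(v_x), Fraction(v_y)
    return {
        # A: loss in state Y relative to loss in state X
        "A_sacrifice_ratio_Y_vs_X": (1 - y) / (1 - x),
        # B: loss from dying (value 0) relative to loss in state X
        "B_sacrifice_ratio_dead_vs_X": 1 / (1 - x),
        # C: minimum chance of cure and maximum chance of death (standard gamble)
        "C_min_chance_healthy": x,
        "C_max_chance_death": 1 - x,
        # D: share of remaining lifetime given up (time trade-off)
        "D_share_of_life_given_up": 1 - x,
        # E: number of cures from X that equal one life saved (person trade-off)
        "E_cures_per_life_saved": 1 / (1 - x),
        # F and G: value of Y as a share of the value of X
        "FG_ratio_Y_to_X": y / x,
    }


figures = implied_propositions("0.8", "0.6")
for name, value in figures.items():
    print(f"{name}: {float(value):.2f}")
```

The sketch only restates what the numbers would imply under each reading; it says nothing about whether any of the readings is empirically supported.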

The constructors were asked if the implications they claim that their numbers have are supported by direct preference measurements (questionnaire, p. 12). For instance, if implication C was claimed, the constructor was asked if there was evidence of the instrument's ability to predict actual chronic patients' willingness to risk death in order to get well. Table 7 shows the results for those of implications C-G which were claimed by each instrument (cfr. Table 6). Generally speaking, empirical support of this kind is weak. In my opinion, this is true also of 15-D, for which "some" support was claimed.

Table 7:

Empirical support of purported implications of health state values,
according to instrument constructors (cfr. table 6)

15-D

EQ-5D

HUI2

HUI3

AQOL

QLHQ

C

"some"

no

no

D

"some"

not yet

E

"some"

no

F/G

no

possible

no


The Variation in Health State Valuations



Constructors of different health state scaling instruments not only place quite different meanings on the numbers they offer. They also offer different numbers for the same states. The magnitude of this problem may be indicated by the scores that the following example states receive:

  1. Severe problem: a person who sits in a wheel chair, has pain most of the time and is unable to work.
  2. Considerable problem: a person who uses crutches for walking, has light pain intermittently and is unable to work.
  3. Moderate problem: a person who has difficulties in moving about outdoors and has slight discomfort, but is able to do some work and has only minor difficulties at home.

Table 9:

Health state scores according to different health status index models

MODEL                   Problem level
                        Severe     Considerable   Moderate
15D                     .77        .86            .91-.3
HUI2                    .40        .70            .90-.94
EQ-5D (rating scale)    .20        .60            .70
EQ-5D (TTO)             .20-.25    .40-.50        .80
QLHQ                    .30-.40    .50-.60        .60-.70
IHQL (3D)               .50-.70    .75-.85        .89-.93
IHQL (complex)          .70-.75    .80-.90        .90-.94
Rosser/Kind Index       .68        .94            .97-.98
QWB                     .45-.55    .65-.70        <.80


Given these large differences in valuations, there is clearly a need to judge which valuations are more valid than others and hence of greater interest for OECD health statistics. This is a highly controversial issue. I emphasize that not all researchers in the field will agree with the following analysis.


The Validity of Health State Valuations



The validity of health state valuations depends on the use to which they are put. As noted above, two applications are of particular interest for the OECD. One is measuring population health. The aim here is to allow comparisons of population health over time and across countries or social subgroups. The other application is in cost-effectiveness analysis of health care programs. The aim here is to aid decisions about resource allocation.

In each of these applications, health state valuations may be understood in two different ways. One is in terms of utility, the other in terms of societal value.

Utility is basically an emotional category: How good does a health state or a health outcome feel to the individuals concerned? Total utility for a group of people, for instance for a nation as a whole, is simply the sum of all individual utilities. Societal value, on the other hand, is a broader, ethical concept. While it is partially a function of total utility, it is also determined by concerns for fairness and hence by the distribution of utility across individuals.

In the following I evaluate synthetic health indicators both as measures of utility and as measures of societal value. I focus mainly on measuring population health, which I believe is the most relevant application of synthetic indicators from the OECD's point of view. However, the conclusions I draw regarding measuring population health also apply to cost-effectiveness analysis. A brief section is added to show this.


Measuring Population Health



Health state valuations may be used to measure population health in terms of health adjusted life expectancy. This concept is in use in Canadian health statistics (Roberge et al, 1996). Following the Canadian example, I shall call an indicator of health adjusted life expectancy a HALE. To understand the concept, consider for instance a person who gets to experience a life scenario S which roughly may be described as follows:

First 50 years as healthy,
then 10 years in a state A of slight discomfort,
then 10 years in a state B of slight discomfort and disability,
and finally 5 years in a state C of moderate discomfort and disability.

If states A, B and C are assigned values of 0.9, 0.8 and 0.6 respectively, the person may be said to get 50x1.0 + 10x0.9 + 10x0.8 + 5x0.6 = 70 health adjusted life years. In a whole population, one may calculate the average health adjusted life years per person on the basis of life expectancy tables and survey data on health status in different age groups. This average may then be used as an estimate of the health adjusted life expectancy at birth in that population.
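
The calculation for scenario S can be written out as a short Python sketch. The function name `health_adjusted_life_years` and the representation of a scenario as a list of (years, value) pairs are illustrative assumptions, not part of any of the instruments reviewed here.

```python
def health_adjusted_life_years(scenario):
    """Sum of years lived, each weighted by the value of the health state."""
    return sum(years * value for years, value in scenario)


# Life scenario S from the text: (years, health state value)
scenario_s = [
    (50, 1.0),  # healthy
    (10, 0.9),  # state A: slight discomfort
    (10, 0.8),  # state B: slight discomfort and disability
    (5, 0.6),   # state C: moderate discomfort and disability
]

total_years = sum(years for years, _ in scenario_s)
haly = health_adjusted_life_years(scenario_s)
print(total_years, round(haly, 2))
```

The sketch covers a single life only. A population-level HALE would in addition require life table and survey data, which are not modelled here.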

The construction of such a HALE assumes the existence of three trade-offs.

First, there is a trade-off between quality of life and length of life. For instance, in the example above, 75 years with scenario S, which includes considerable discomfort and disability, is regarded as equivalent to 70 years in full health. I call this a time-quality trade-off.

Second, there is a trade-off between quality of life and the number of persons involved. Consider for instance a situation in which 100 people all live 75 years in a state that scores 0.8. Their average health adjusted life years is 60. Consider two possible changes in this situation. One consists in 20 people getting to live as healthy, while the other 80 people remain at level 0.8. The other possible change consists in all 100 people getting to live in a state that scores 0.84. In both cases, the average health adjusted life years becomes 63, i.e. an average increase of 3 years. Effectively this means that, according to HALE, taking 20 people from 0.8 to healthy is equivalent to taking 100 people from 0.8 to 0.84. I call this a person-quality trade-off.

Third, there is a trade-off between length of life and the number of persons involved. Again this follows from the averaging procedure. Consider for instance a situation in which 100 people all live 70 years as healthy. Consider another situation in which 20 people live 74 years and 80 people live 69 years. Again the average health adjusted life years is 70. So compared to the first situation, the upward movement of 20 people from 70 to 74 compensates for the downward movement of 80 people from 70 to 69. I call this a time-person trade-off.
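
The person-quality and time-person trade-offs both follow from the averaging step, which can be checked with a small Python sketch. The function name `mean_haly` and the (people, years, value) tuples are illustrative assumptions; the numbers are those of the two examples above.

```python
def mean_haly(groups):
    """Average health adjusted life years over groups of (people, years, value)."""
    people = sum(n for n, _, _ in groups)
    return sum(n * years * value for n, years, value in groups) / people


# Person-quality trade-off: baseline and the two alternative improvements
baseline = mean_haly([(100, 75, 0.80)])
cure_20 = mean_haly([(20, 75, 1.00), (80, 75, 0.80)])
lift_all = mean_haly([(100, 75, 0.84)])

# Time-person trade-off: everybody healthy, only length of life differs
equal_lives = mean_haly([(100, 70, 1.0)])
unequal_lives = mean_haly([(20, 74, 1.0), (80, 69, 1.0)])

print(round(baseline, 2), round(cure_20, 2), round(lift_all, 2))
print(round(equal_lives, 2), round(unequal_lives, 2))
```

Because the indicator is a plain average, any two distributions with the same mean score identically, whatever their spread across persons.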

From a societal perspective, the value assumption underlying the time-person trade-off is not unproblematic. For example, it is not self evident that society would regard the two situations described above (100 people/70 years versus 20 people/74 years plus 80 people/69 years) as equally desirable. However, the time-person trade-off assumption in a HALE is independent of the values that are assigned to health states and therefore lies beyond the scope of this report. In the following I concentrate on the time-quality trade-offs and the person-quality trade-offs. In a HALE, these trade-offs follow mathematically from the values that are assigned to health states. One way to test the validity of these values is therefore to examine whether the trade-offs that are implied by the values fit with the trade-offs that people in society make if they are asked directly. If they do not fit, the values are unsuitable in the construction of a HALE.

To perform this validity test, we need to do one more thing. That is to specify the substance, or the object, of the trade-offs. There are two possibilities.


Utility



Let us assume that the trade-offs that one wishes to build into a HALE are in terms of utility. The proposition that 75 years of life at levels described in scenario S above is equivalent to 70 years in full health (time-quality trade-off) then means that the utility that individuals derive from these two scenarios is the same. Similarly, the proposition that taking 20 people from 0.8 to healthy is equivalent to taking 100 people from 0.8 to 0.84 (person-quality trade-off) means that the utility gained by the former change equals the utility gained by the latter. A policy maker may want to know how these interpretations can be verified. This implies checking that the utility assigned to the various health states involved is correct. How can this be done?

Since utility is a matter of subjective feeling, and the strength of feelings is not directly observable, verification of the utility assignments is not straightforward. However, in theory individual utility may be assessed indirectly by looking at a behavioral correlate to the subjective feeling of utility. One such behavioral correlate is the quality-of-life scores that subjects assign to themselves when asked to evaluate their own health on rating scales. A rating scale may be a "ladder" of numbers from for instance 1 to 10, where 1 stands for a very bad health state, and 10 for a very good health state, or simply a line where the end points similarly stand for a very bad and a very good state respectively. Subjects are asked to choose a point on the scale which they associate with their own health. However, rating scale scores have been shown not to have interval scale properties. That is, equally large intervals on different parts of such scales (for instance a movement from 0.4 to 0.6 versus a movement from 0.7 to 0.9 on a scale from 0 to unity) do not carry the same significance to the individuals concerned (Allison and Durand, 1989; Nord, 1991; Richardson, 1994). Hence, rating scale measurements do not allow a meaningful summation of utility across individuals. I therefore do not recommend the use of utilities operationalised as self ratings on rating scales in OECD health statistics.

In economic theory and decision analysis, a more widely accepted behavioral correlate to perceived utility is the individual's willingness to sacrifice life expectancy to be cured of a state of illness or to obtain a given health improvement. People's willingness to undertake such sacrifices is elicited by means of standard gamble or time trade-off questions. So one potential use of synthetic indicators in the OECD context is to estimate the disutility of states of disability and illness that occur in the population in terms of people's willingness to sacrifice life expectancy to be relieved of these states.

As noted in table 6, the 15-D, HUI2, HUI3 and AQOL all purport to do this, and the EQ-5D might be expected to do the same, given that it in one version is based on time trade-off questions. However, the empirical support that the developers claim for this interpretation of their values is weak or non-existent (table 7). Results from studies other than those cited by the developers suggest that values assigned by synthetic indicators in fact overestimate willingness to sacrifice life expectancy. Consider for instance the "moderate problem" described earlier: "A person who has difficulties in moving about outdoors and has slight discomfort, but is able to do some work and only has minor difficulties at home." As shown in table 9, synthetic indicators assign values to this state that range from 0.60 to 0.98, with the majority lying around 0.90. The latter corresponds to a willingness to sacrifice 10 per cent of life expectancy to become well. However, a number of studies suggest that people who actually are at such a moderate problem level are not willing to sacrifice any life expectancy at all to be relieved of their problems. For instance, Sherbourne et al collected time trade-off and standard gamble data from close to 17,000 patients visiting primary care clinics across the USA. On average, the patients had two chronic conditions. Their average score on a rating scale from zero ("worst possible health state") to 100 ("perfect health") was 75. However, 85 percent of the patients were not willing to sacrifice any life expectancy to be relieved of their condition. Even for patients with five different chronic diseases this percentage was as high as 65 (Sherbourne and Sturm, 1997). Studies by O'Leary et al (1995), Fowler et al (1995), Nord (1996) and Stavem (1996) show similar results.
The implication is that if health state valuations are to be used in measuring population health in terms of utility, and utility (or disutility rather) is measured in terms of willingness to sacrifice life expectancy in sick and disabled people, then none of the synthetic indicators in table 9 seem to provide valid valuations.


Societal Value



The trade-offs that one wishes to build into HALE may alternatively be understood in terms of societal value. To say that 75 years of life at levels of health in scenario S above is equivalent to 70 years in full health (time-quality trade-off) then means that society values these two scenarios equally much. Similarly, the proposition that taking 20 people from 0.8 to healthy is equivalent to taking 100 people from 0.8 to 0.84 (person-quality trade-off) means that society values these two improvements equally much. Again, there is a need to make sure that these trade-offs that become built into the HALE through the health state valuations that are used, fit with actual societal preferences.

There are very few data on the time-quality trade-offs that society wishes to make in valuing alternative life scenarios for other people. Almost all studies focus on individuals' personal time trade-off preferences. An exception is a small study by Richardson and Nord (1996), who did not find any significant difference between preferences expressed in these two perspectives.

For lack of better data, I shall assume that society will wish to make the same time-quality trade-offs in valuing population health as individuals do in their own lives, since there is no need to adjust for concerns for distribution across persons in choosing between time and quality. The question then becomes: Do health state valuations provided by synthetic indicators correctly reflect individuals' willingness to sacrifice lifetime in order to obtain better health? As we have seen in the utility section above, the answer is probably negative. The valuations generally seem to overestimate this willingness to sacrifice.

With respect to societal person-quality trade-offs, many more studies are available. The studies, which include work by Rachel Rosser and Paul Kind in the UK, Peter Ubel and colleagues in the US, Jose-Luis Pinto in Spain, Jeff Richardson and colleagues in Australia and myself in Norway, are reviewed in Nord (1996). The message from these studies is as follows:

Members of the public want their health care systems to produce as much health - or utility - as possible, but within certain constraints. One is that health improvements for the severely ill are valued more highly than equally large improvements for less ill people. People also tend to feel that their right to realize their potential for health is the same, whether the potential happens to be large or small.

The evidence reviewed by Nord (1996) is scattered and heterogeneous. However, it suggests the order of magnitude by which people in some industrialized countries value health improvements for people with different degrees of severity of illness and different potentials for health improvement. To picture this order of magnitude, consider four classes of outcomes, corresponding to the problem levels described earlier in connection with table 9:

  A. Saving a person's life to a life as healthy.
  B. Curing a person with a severe problem, for instance a person who sits in a wheel chair, has pain most of the time and is unable to work.
  C. Curing a person with a considerable problem, for instance a person who uses crutches for walking, has light pain intermittently and is unable to work.
  D. Curing a person with a moderate problem, for instance a person who has difficulties in moving about outdoors and has slight discomfort, but is able to do some work and has only minor difficulties at home.

In countries like Australia, England, Norway, Spain and the US the social appreciation of class A outcomes seems to be something like 3-6 times as high as that of class B outcomes, 10-15 times as high as that of class C outcomes and 50-200 times as high as that of class D outcomes. I emphasize that these numbers pertain to valuations of outcomes in decisions about future treatment capacity (as opposed to decisions concerning identified patients in current need). In OECD countries, a HALE that purports to indicate whether one state of affairs in population health is more desirable than another needs to reflect this structure of concern. To do this, health states need to be assigned values in the following order of magnitude in the construction of HALE:

Severe problem (cfr. outcome class B): 0.65-0.85
Considerable problem (outcome class C): 0.90-0.94
Moderate problem (outcome class D): 0.98-0.995

As shown earlier in table 9, most of the synthetic indicators are very far from satisfying these requirements. In general, they assign too low values to states of moderate and slight illness, which in turn leads them to assign too high values to improvements for people with severe or life threatening conditions.
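
The required values listed above appear to follow from the person trade-off ratios by simple arithmetic. If saving one life (a gain from 0 to 1) is valued k times as highly as curing one person in a state with value v (a gain of 1 - v), then 1 = k(1 - v), so v = 1 - 1/k. The Python sketch below applies this relation to the ratio ranges quoted in the text. The relation is an assumed reading of how the ranges were obtained, and the function name `value_from_life_saving_ratio` is illustrative.

```python
def value_from_life_saving_ratio(k):
    """Health state value implied when saving one life is worth k cures.

    Saving a life is a gain of 1 (from 0 to 1); a cure from a state with
    value v is a gain of 1 - v. Setting 1 = k * (1 - v) gives v = 1 - 1/k.
    """
    if k < 1:
        raise ValueError("a ratio below 1 would imply a value below zero")
    return 1 - 1 / k


# Ratio ranges quoted in the text for outcome classes B, C and D
ratio_ranges = {
    "severe (class B)": (3, 6),
    "considerable (class C)": (10, 15),
    "moderate (class D)": (50, 200),
}

implied = {
    label: tuple(round(value_from_life_saving_ratio(k), 3) for k in bounds)
    for label, bounds in ratio_ranges.items()
}
for label, (low, high) in implied.items():
    print(f"{label}: {low} to {high}")
```

Under this reading the computed bounds agree with the listed ranges up to rounding, which suggests the ranges were derived in broadly this way.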


Cost-Effectiveness Analysis and QALYs



In cost-effectiveness analysis, health state valuations are supposed to express utility. They are used to calculate the benefit of different interventions in terms of quality adjusted life years (QALYs) gained. If one person gets one additional life year in full health, that is a utility gain of 1 (1-0) for 1 year. This benefit is called a QALY. If an individual A gets an increase in utility from 0.6 to 0.9 for two years and from 0.6 to 0.7 in the next three years, his/her health benefit is 2x0.3 + 3x0.1 = 0.9 QALYs. If individuals B and C by similar calculation receive health benefits of 2.5 and 0.6 QALYs respectively, then the total health benefit for A, B and C is 4 QALYs (0.9+2.5+0.6).
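
The QALY arithmetic can be sketched in a few lines of Python. The function name `qalys_gained` and the (years, value before, value after) tuples are illustrative assumptions; the gains for individuals B and C are simply taken as given in the text.

```python
def qalys_gained(periods):
    """QALY gain: sum of years times the change in health state value."""
    return sum(years * (after - before) for years, before, after in periods)


# Individual A: (years, value before, value after)
gain_a = qalys_gained([(2, 0.6, 0.9), (3, 0.6, 0.7)])

# Gains for B and C are taken as given in the text
gain_b, gain_c = 2.5, 0.6
total = gain_a + gain_b + gain_c

# One extra year in full health for someone who would otherwise be dead
one_life_year = qalys_gained([(1, 0.0, 1.0)])

print(round(gain_a, 2), round(total, 2), one_life_year)
```

The total is a plain sum across persons, which is why the same values carry the person-quality implications discussed in connection with the HALE.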

The use of synthetic indicators in QALY calculations presupposes that the values they provide correctly express the utility of health states. As noted above, they in fact do not seem to meet this requirement. Utility needs to be measured in terms of willingness to sacrifice life expectancy, and values from synthetic indicators seem to overestimate people's true willingness to make such sacrifices.

It is common practice in health economics to rank health care programs according to their cost per QALY and to suggest that priority in resource allocation should follow this ranking (Weinstein and Stason, 1977; Williams, 1987). This means that QALYs are de facto interpreted as expressions not only of utility, but also of societal value. When this is the case, health state valuations have concrete policy implications in terms of person-quality trade-offs, as shown in the previous section. For instance, the assignment of values 0.6 and 0.8 to states A and B respectively implies that curing two people in state B would be valued as highly as curing one person in state A (assuming equal duration of the benefit). Such implications need to fit with the person-quality trade-offs that society actually holds. In the previous section I suggested that in fact, values provided by synthetic indicators fail to meet this requirement, due to lack of upper end compression. The values are then as unsuitable in cost-effectiveness analysis as they are in a HALE.


Conclusions on the Validity of Health State Valuations



Health state valuations may be used to express either (a) trade-offs between quality of life and life expectancy (time-quality trade-offs) or (b) trade-offs between different kinds of improvements in quality of life and number of people who get to enjoy health improvements (person-quality trade-offs). These two kinds of trade-offs may refer either to utility or to societal value, and they may apply to measuring population health in terms of health adjusted life expectancy as well as to cost-effectiveness analysis. Whichever of these applications one has in mind, health state valuations need to have strong upper end compression in order to reflect individual and societal preferences correctly. That is, states of moderate illness need to be assigned values close to unity, and even states of considerable discomfort and disability need to be assigned values around 0.90. This is necessary to capture the very strong valuation of life itself that presents itself in preference studies both at the individual and the societal level: Individuals are reluctant to sacrifice life expectancy to obtain better health, and societies place very high value on life saving procedures relative to most health improving ones. As shown in table 9, all existing synthetic indicators except for the Rosser and Kind Index fail to meet these valuation requirements. Among those indicators for which material was submitted for this study, the 15-D deviates the least from the required valuation structure.


Possible Strategies to Improve Validity



One response to the above state of affairs would be to adjust the valuation algorithms of synthetic indicators in such a way as to produce the necessary upper end compression of values. However, here we need to distinguish between the utility perspective and the societal-value perspective.

With respect to the utility perspective, the preceding sections are based on the common assumption in health economics and decision analysis that the disutility of a state of illness may be measured as the willingness to sacrifice life expectancy to be relieved of the illness. Given this particular definition, I am led to draw the conclusion that values from synthetic indicators generally lack upper end compression, since people with illness express great reluctance to make such sacrifices when asked directly in preference studies.

If this personal preference structure were built into the values assigned to health states, a serious sensitivity problem would arise: A great number of mild and moderate states would be assigned the value of 1. Real health improvements for people with such conditions would then not be captured by the synthetic indicators, since there would be no differences between the values for such conditions and the value for full health. This would be a problem both in measuring population health and in cost-effectiveness analysis.

Possibly, the problem can be resolved by looking at the utility of health states from a different angle. The reluctance to sacrifice life expectancy to be relieved of a given illness may derive from a general preference for maintaining status quo, or from a general aversion to accepting any kind of loss in health, including life expectancy. In long term health planning, such preferences are arguably a source of bias, which should be avoided in the valuation of health states. Instead of asking people how much they would be willing to sacrifice to be relieved of an illness they already have, one could therefore ask them which of different life scenarios they would prefer if they were at the start of life and therefore not yet attached to any particular scenario. Or one could ask them which of different scenarios they would prefer for their children. It is conceivable that with such a perspective, sacrificing life expectancy to gain quality of life would be an easier choice, in which case the trade-offs suggested by existing synthetic indicators would be closer to the truth.

Another possible strategy is not to ask people about their willingness to sacrifice life expectancy, but rather to elicit their trade-offs between two different benefits: improved quality of life and increased life expectancy. For instance, a 70 year old disabled person may not be willing to sacrifice any life years in order to become well. But he might be prepared to say that to become well would be just as valuable as getting an extra year of life.

I suggest these alternative utility perspectives as approaches that are worth looking further into as possible theoretical bases for valuing health states in the OECD context. Preference data based on these perspectives are to my knowledge scarce. Should one conclude that such perspectives are fruitful, there would be a need to collect new preference data. It would be natural to do this in a standardized way in several, if not all, OECD countries.

A third possible strategy is to regard health state values as numbers that are supposed to express trade-offs in terms of societal value rather than utility. The rationale for this is that the preference for the preservation of life itself may be somewhat less absolute when people are asked to prioritize between different health care programs in a budgeting context than when they are asked about their willingness to sacrifice their own future life years or certainty of survival. In other words, I am suggesting that more health states will be assigned values below unity if person trade-off questions are posed to the general population than if time trade-off or standard gamble questions are posed to people with illness and disability. Choosing the societal value interpretation of health state valuations may therefore be a way to assure both validity and sensitivity in the construction of synthetic health indicators.


Translations of the Instruments



The 15-D originated in Finnish and was later translated into English. The other instruments originated in English. Table 8 gives data on translations.

Table 8:

Translations of Instruments

                       15-D   EQ-5D    HUI2            HUI3            AQOL   QLHQ
Number of languages    9      9        6               7               0      0
Number of translators  2      3-4      unsure          several         n.a.   n.a.
Back translation       yes    mostly   In some areas   In some cases   n.a.   n.a.

The specific languages are as follows:

15-D: Finnish, Swedish, English, Norwegian, Greek, Czech, Japanese, Russian, Hebrew, Arabic
EQ-5D: English, Dutch, Swedish, Finnish, Norwegian, Danish, French, German, Spanish/Catalan, Greek
HUI2: English, French, Dutch, Spanish, Swedish, Norwegian, Japanese
HUI3: English, French, Dutch, Spanish, Swedish, Norwegian, Japanese, Hebrew


Applications to Date



15-D: Finnish National Health Survey 1995, 1237 complete responses. National Health Survey 1996, sample of 4,800 people. A number of clinical studies, including in the following areas: hip and knee replacements, depression, acute low back pain, gastrointestinal disorder, epilepsy, survival of malignancies of childhood, spinal cord injury, genetic skeletal dysplasias.

EQ-5D: Health surveys in general population samples in the UK, Holland, Finland, Sweden, Spain, Germany, the US. Brooks et al. (1996) list 35 areas in which the EQ-5D is being used in clinical studies. The following areas are highlighted in the response to the OECD questionnaire: menorrhagia, cystic fibrosis, vascular disease, migraine, rheumatoid arthritis, end stage renal disease, dystonia.

HUI2: One Provincial and two National Social/Health Surveys in Canada 1990-1994, N = 11,000 to 68,000. The completed questionnaire lists 18 clinical studies in the areas of childhood leukemia, other childhood cancer, brain tumors in adults, children admitted to intensive care units, survivors of extremely low birth weight, osteoporosis and neurosurgical patients.

HUI3: One Provincial and two National Social/Health Surveys in Canada 1990-1994, N = 11,000 to 68,000. The completed questionnaire lists 8 clinical studies in the areas of childhood leukemia, central nervous system tumors in childhood, brain tumors in adults, survivors of extremely low birth weight, osteoporosis and neurosurgical patients.

AQOL: The instrument is currently being employed or its use planned in a number of clinical studies, including pharmacoeconomic trials and studies in the areas of stress, stroke, Parkinson's disease, breast cancer, rehabilitation, ocular disease and heart disease.

QLHQ: Study of QoL in 400 patients with advanced-stage cancer.


Development Work in Progress



15-D: Two children's versions have recently been finalized, one for ages 8-11, the other for ages 12-15. Standard 15-D values are being compared with standard gamble, time trade-off and rating scale scores in 400 patients.

EQ-5D: Alternative scaling methods are being tested and compared. Translation procedures are being standardized. A fourth level on each dimension is being tried out. Sensitivity is being measured.

HUI2: Development work focuses on HUI3.

HUI3: The instrument is being tested for reliability, responsiveness and concurrent validity. Disease specific modules are being developed.

AQOL: Scaling of items and development of a valuation function will be finalized in mid 1997. Piloting will be conducted in a clinical trial with pre- and post-intervention measurement.

QLHQ: No development work indicated.


Other Instruments



I received material for two multidimensional quality of life instruments that do not yield a single index score (Nottingham Health Profile and WHOQOL), as well as material on an unfinished attempt by Statistics Sweden to health adjust life expectancy by means of data from the Swedish National Health Survey and assignment of quality weights based on rough judgement. These are briefly reviewed in the following, together with three synthetic indicators for which no material was submitted for the present study.

The Nottingham Health Profile

The Nottingham Health Profile was developed by Hunt et al (1980) for use in cost-effectiveness analysis and measuring population health. It is designed to assess perceived distress. The content was derived from interviews with patients and non-patients, who were asked to state what was important about health and illness. The instrument has 38 dichotomous items that form six dimensions: Energy level, pain, emotional reactions, sleep, social isolation, physical mobility. Data are collected by means of a self-administered questionnaire or personal interviews. Completion time for the questionnaire is 5 minutes. Response rate in a random sample of 1145 adults was 89 per cent. Test-retest reliability for dimensions ranges from 0.77 to 0.85 (Spearman rank correlation coefficients). The instrument has been translated into 14 other languages and has been used in hundreds of clinical studies.

The developers of the instrument write that each unidimensional score must be understood as ordinal rather than cardinal. They do not believe in the transformation of health profiles into single index numbers. Hence, they "do not believe that any current instrument can make valid comparisons between diseases" (McKenna in his response to the OECD questionnaire).

WHOQOL

The WHOQOL is an instrument being developed by the World Health Organization (WHOQOL Group, 1995) for clinical decision making, cost-effectiveness analysis and measuring population health. It is being developed as a collaborative project in numerous cultural settings and in more than 12 different languages simultaneously. The purpose of this approach is to increase comparability of health state measurements across countries that differ significantly with respect to culture and health values. The WHOQOL pilot instrument contained 235 core questions addressing 29 facets of life. In ongoing field trials, a 100 item measure is being used, producing 24 facet scores and 6 domain scores. The domains are: physical, psychological, level of independence, social relationships, environment, and spirituality/religion/personal beliefs. A sophisticated procedure is followed to ensure equivalence - and hence comparability - between different language versions with respect to the various levels on the response scales. Research to establish the psychometric properties of the instrument is in progress in a number of countries.

The WHOQOL is much more than a measure of health related quality of life. Furthermore, it is a health profile. The developers do not aim at establishing a scoring function that would allow the transformation of profiles into single index numbers.

Swedish Population Health Index (SPHI)

The SPHI is being developed by Hans Petterson at Statistics Sweden (Petterson, 1991). It combines survival (life expectancy) with health related quality of years of survival and is essentially a HALE. Health data are taken from the annual "Survey of Living Conditions," which has a sample of 7000 people and a response rate of 80 per cent. Three variables are used from this survey: Self perceived health, longstanding or chronic illness affecting ability to work, mobility. Scores on these variables are used to classify individuals in four health states: full health, slight illness, moderate illness and severe illness. These states have been assigned the values of 1.0, 0.9, 0.7 and 0.5 respectively on the basis of rough judgement.

There has been no collection of population preference data. The validity of the transformation of three dimensional profiles from the Survey of Living Conditions into one of four main health states is therefore unclear. Also, the health state valuations are without a clear empirical basis. Further development work has been postponed due to lack of funding.
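The way such state values enter a health adjusted life expectancy can be illustrated with a small calculation. The following is a minimal sketch only: it assumes a simple prevalence-weighting approach in which expected years lived in each age group are multiplied by the average state value in that group. The exact computation used for the SPHI is not described here, and the age groups and prevalence shares are hypothetical. Only the four state values are taken from the text.

```python
# State values as reported in the text (assigned by rough judgement).
STATE_VALUES = {"full": 1.0, "slight": 0.9, "moderate": 0.7, "severe": 0.5}


def mean_health_value(prevalence):
    """Average state value in a group, given the share of people in each state."""
    if abs(sum(prevalence.values()) - 1.0) > 1e-9:
        raise ValueError("prevalence shares must sum to 1")
    return sum(STATE_VALUES[state] * share for state, share in prevalence.items())


def health_adjusted_life_expectancy(age_groups):
    """Sum expected years lived per age group, each weighted by mean health value.

    age_groups: list of (expected_years_lived, prevalence) pairs.
    """
    return sum(years * mean_health_value(prev) for years, prev in age_groups)


# Hypothetical illustration: 80 expected years split over two age groups.
age_groups = [
    (60.0, {"full": 0.8, "slight": 0.1, "moderate": 0.1, "severe": 0.0}),
    (20.0, {"full": 0.4, "slight": 0.2, "moderate": 0.2, "severe": 0.2}),
]
hale = health_adjusted_life_expectancy(age_groups)
print(round(hale, 2))
```

With these made-up shares, the 80 expected years correspond to about 74 health adjusted years. Because even the worst state is valued at 0.5, the adjustment can never remove more than half of life expectancy, which shows how strongly the result depends on the judgement-based values.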

The Quality of Well-Being Scale (QWB)

The QWB was developed by Kaplan and colleagues at the University of California in San Diego (Kaplan and Anderson, 1990). It is intended for both measuring population health and cost-effectiveness analysis. It has been in wide use for many years, particularly in the USA. It has a health-state classification system consisting of 3 dimensions of functioning (mobility, with 3 levels; physical activity, with 3 levels; social activity, with 5 levels) and 25 symptom/problem complexes. Community surveys have been conducted in which respondents were asked to use a rating scale to indicate the disutility of a single day with each kind of dysfunction and symptom. On the basis of these unidimensional disutility judgements, the value of any composite health state within the classification system can be determined on the standard 0-1 scale by means of a simple additive formula.
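The additive principle can be sketched as follows. This is an illustration under stated assumptions, not the published QWB scoring: all disutility numbers below are hypothetical placeholders, the symptom names are invented, and the function name `additive_value` is introduced here only for the example. The sketch assumes that one symptom/problem complex is scored per day.

```python
# Hypothetical disutilities per level (index 0 = no limitation).
# These are NOT the published QWB weights.
MOBILITY = [0.00, 0.05, 0.10]                 # 3 levels
PHYSICAL_ACTIVITY = [0.00, 0.05, 0.10]        # 3 levels
SOCIAL_ACTIVITY = [0.00, 0.03, 0.06, 0.09, 0.12]  # 5 levels
SYMPTOM = {"none": 0.00, "symptom_a": 0.15, "symptom_b": 0.30}  # invented names


def additive_value(mobility, physical, social, symptom):
    """Value of a composite state: 1 minus the sum of the unidimensional disutilities."""
    disutility = (
        MOBILITY[mobility]
        + PHYSICAL_ACTIVITY[physical]
        + SOCIAL_ACTIVITY[social]
        + SYMPTOM[symptom]
    )
    return 1.0 - disutility


# A person with no dysfunction and no symptoms scores 1.0 (healthy).
healthy = additive_value(0, 0, 0, "none")
# A composite state with limitations on every dimension.
impaired = additive_value(1, 2, 3, "symptom_a")
print(healthy, round(impaired, 2))
```

In an additive model of this kind each disutility is applied regardless of the levels on the other dimensions, so interactions between dimensions are not captured.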

Completion time for the QWB questionnaire is approximately 10 minutes (John Anderson, personal communication 1991). Inter-day reliability coefficients and agreement coefficients of around 0.9 are reported for the descriptive system (Anderson et al., 1989). However, these results are questionable, as the data were collected in a retrospective manner rather than independently on different days. The sensitivity of the QWB is very high at the top end of the scale, due to its inclusion of 25 questions about symptoms and bodily impairments. While this may be seen as a virtue, it is also a weakness, inasmuch as it leads to a severe lack of upper end compression of the instrument's valuations, cf. Table 9.

The Rosser/Kind Index

The Rosser/Kind Index (Rosser and Kind, 1978) has been in wide use for many years, particularly in the United Kingdom. It covers 28 combinations of disability (7 levels) and pain/distress (4 levels) as well as death. The valuations were obtained by means of magnitude estimation. "No disability and mild distress" was chosen as a reference state. Each of the other states was scaled by asking a sample of 70 doctors, nurses, patients, and others "how many times more ill" a patient in that state would be than a patient in the reference state. To clarify the meaning of the question, subjects were asked to imagine that their answer would define the proportion of resources that should be allocated to the relief of each health state.

The time needed for a person to locate him/herself in the descriptive system of the Rosser and Kind Index is presumably only a minute. I am not aware of data on the reliability and sensitivity of the descriptive system.

The Index of Health-Related Quality of Life (IHQL)

The IHQL was developed by Rosser et al. (1992). In its simplest form it has three dimensions: disability, with 7 levels; physical discomfort, with 5 levels; emotional distress, with 5 levels. The instrument is thus an expansion of the Rosser and Kind Index. The standard gamble technique was used to value the 175 possible composite states. Potential users of the index may read values for all possible states directly from a three dimensional table. The IHQL also exists in a very complex version, in which each of the main domains is divided into a number of subscales. To my understanding, the IHQL is still at a developmental stage. The psychometric properties of its descriptive system are not clear. However, the instrument was reported to be in use in three different clinical studies in 1992.
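The structure of such a three dimensional table can be sketched as follows. The sketch only reproduces the state space described above (7 x 5 x 5 = 175 composite states) and shows a lookup by level on each dimension. The values in the table are produced by a hypothetical placeholder function and are not IHQL values, which would have to be taken from the published table.

```python
from itertools import product

DISABILITY_LEVELS = range(1, 8)   # 7 levels
DISCOMFORT_LEVELS = range(1, 6)   # 5 levels
DISTRESS_LEVELS = range(1, 6)     # 5 levels

# Every composite state is one (disability, discomfort, distress) triple.
states = list(product(DISABILITY_LEVELS, DISCOMFORT_LEVELS, DISTRESS_LEVELS))


def placeholder_value(disability, discomfort, distress):
    """Hypothetical stand-in for a published table entry; NOT an IHQL value."""
    max_steps = 6 + 4 + 4  # steps from the best state to the worst state
    steps = (disability - 1) + (discomfort - 1) + (distress - 1)
    return round(1.0 - steps / max_steps, 3)


# The "three dimensional table": one value per composite state.
table = {state: placeholder_value(*state) for state in states}


def lookup(disability, discomfort, distress):
    """Read the value of a composite state directly from the table."""
    return table[(disability, discomfort, distress)]


print(len(states))
```

A table of this kind needs no scoring formula at the point of use, but it requires a value for each composite state, which becomes demanding as dimensions or levels are added.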



Conclusions



A number of synthetic indicators are available for assigning values to health states on a scale from zero (dead) to unity (healthy). The values may be used in estimating health adjusted life expectancy and in cost-effectiveness analysis. In general, the descriptive systems of the instruments have good feasibility and reliability. The descriptive systems differ greatly with respect to their conceptualization of health and their sensitivity to differences and changes in health. In selecting or constructing a new instrument for use in OECD health statistics, there are political choices to be made with respect to conceptualization of health, and trade-offs to be made between feasibility and sensitivity.

One strategy to consider, which might increase the chances of achieving agreement on a measurement standard across 29 member countries, is to adopt a simple descriptive system that focuses on two widely accepted main dimensions of health, namely ability to perform activities of daily living and freedom from pain and discomfort. I recommend that this possibility be explored more closely by the OECD.

The health state values produced by the instruments may be used to express either trade-offs between quality of life and life expectancy or trade-offs between different kinds of improvements in health (including avoiding death) and the number of people who get to enjoy the improvements. Data from studies of public preferences suggest that individuals are reluctant to sacrifice life expectancy to obtain better health, and that societies place very high value on life saving policies and procedures relative to health improving ones. Values provided by most existing synthetic indicators fail to encapsulate this preference structure.

To obtain synthetic indicators that are valid and at the same time sensitive at the top end of the health scale, health state valuations should possibly reflect the trade-offs that people would want to make between quality of life and life expectancy if they were given the choice between different possible life scenarios at birth, or trade-offs between benefits in terms of either increased life expectancy or increased quality of life. An alternative strategy is to regard health state values as numbers that are supposed to express trade-offs in terms of societal value rather than utility. The rationale for this is that the preference for the preservation of life itself may be somewhat less absolute when people are asked to prioritize between different health care programs in a budgeting context than when they are asked about their willingness to sacrifice their own future life years or certainty of survival to be relieved of illness. Further reflection is needed on the meaning and purpose of synthetic indicators in cost-effectiveness analysis and measuring population health. However, I believe that researchers in the field can reach agreement on these issues and produce health state values that would be usable in OECD health statistics.

With the encouragement of the OECD, international collaborative empirical research could probably establish such values within three years.



References



Anderson JP, Kaplan RM, Berry CC et al. Interday reliability of function assessment for a health status measure. Medical Care 1989, 27, 1076-1084.

Brooks R et al. EuroQol: The current state of play. Health Policy 1996, 37, 53-72.

Feeny D, Furlong W, Boyle M, Torrance GW. Multi-attribute health status classification systems. Pharmacoeconomics 1995, 7, 490-502.

Fowler FJ, Cleary PD, Massagli MP et al. The role of reluctance to give up life in the measurement of the value of health states. Medical Decision Making 1995, 15, 195-200.

Hadorn D. Large scale health outcomes evaluation: How should quality of life be measured? Journal of Clinical Epidemiology 1995, 48, 607-618.

Hawthorne G, Richardson J. An Australian multi-attribute utility: Rationale and preliminary results. Working paper 49. Melbourne: Centre for Health Program Evaluation, 1996.

Hunt SM, McKenna SP, McEwen J et al. A quantitative approach to perceived health status: A validation study. Journal of Epidemiology and Community Health 1980, 34, 281-286.

Kaplan RM, Anderson JP. A general health model: Update and applications. Health Services Research 1988, 23, 203-235.

Morris J, Durand A. Category rating methods: Numerical and verbal scales. Mimeo. University of York: Centre for Health Economics. 1989.

Nord E. The validity of a visual analog scale in determining social utility weights for health states. International Journal of Health Planning and Management 1991, 6, 234-242.

Nord E. Health status index models for use in resource allocation decisions. A critical review in the light of observed preferences for social choice. International Journal of Technology Assessment in Health Care 1996, 12, 31-44.

Nord E. Time trade-off scores in patients with chronic disease. Comparison with the York hypothetical TTO tariff. Paper for the EuroQol Plenary Meeting, Oslo, October 1996.

O'Leary JF, Fairclough DL, Jankowski MK et al. Comparison of time trade-off utilities and rating scale values of cancer patients and their relatives. Medical Decision Making 1995, 15, 132-137.

Petterson H. Use of Health Expectancies in Sweden. Paper presented at the 1st Meeting of Euro-REVES, Leiden, June 1995.

Richardson J. Cost-utility analysis: What should be measured? Social Science & Medicine 1994, 39, 7-21.

Richardson J, Nord E. The importance of perspective in the measurement of quality adjusted life years. Medical Decision Making 1997, 17, 33-41.

Roberge R, Berthelot JM, Wolfson MC. Adjusting life expectancy to account for morbidity in a national population. Quality of life Newsletter no 17. Lyon: Mapi Research Institute, 1997.

Rosser R, Kind P. A scale of valuations of states of illness: Is there a social consensus? International Journal of Epidemiology 1978, 7, 347-358.

Rosser R, Cottee M, Rabin R, Selai C. Index of health-related quality of life. In: Hopkins A (ed). Measures of the quality of life, and the uses to which they may be put. London: Royal College of Physicians of London. 1992.

Sherbourne C, Sturm R. Utility measures from the screener. Mimeo. Paper presented at RAND, Santa Monica, February 1997.

Sintonen H, Pekurinen M. A fifteen-dimensional measure of health-related quality of life (15 D) and its applications. In Walker SR, Rosser RM (eds). Quality of life assessment. Key issues in the 1990s. Dordrecht: Kluwer Academic Publishers, 1993.

Stavem K. Valuing theoretical versus own condition: A comparison using time trade-off and the EuroQol Instrument in COPD patients. Paper for the EuroQol Plenary Meeting, Oslo, October 1996.

Weinstein MC, Stason WB. Foundations of cost-effectiveness analysis for health and medical practices. New England Journal of Medicine 1977, 296, 716-721.

WHOQOL Group. The World Health Organization Quality of Life Assessment (WHOQOL): Position paper from the World Health Organization. Social Science & Medicine 1995, 41, 1403-1409.

Williams A. Who is to live? A question for the economist or the doctor? World Hospitals 1987, 13, 34-45.





Last updated on: June 02, 2000
