Wired for Health and Well-Being: The Emergence of Interactive Health Communication

Editors: Thomas R. Eng, David H. Gustafson

Suggested citation: Science Panel on Interactive Communication and Health. Wired for Health and Well-Being: The Emergence of Interactive Health Communication. Washington, DC: US Department of Health and Human Services, US Government Printing Office, April 1999.

Chapter IV. Evaluation of IHC Applications

The Panel considers widespread evaluation to be the primary mechanism for improving the quality of IHC. Evaluation is the examination of evidence in a way that provides a full perspective on the expected quality, nature, experience, and outcomes of a particular intervention. Its purpose is to systematically obtain information that can be used to improve the design, implementation, adoption, redesign, and overall quality of an intervention or program. This chapter provides fundamental background on evaluation for developers, purchasers, and others who may need to conduct evaluations or interpret evaluation results.

Types of Evaluation

The design and implementation of an evaluation typically depend on its purpose, the status of the intervention, and the type of decision the evaluation is intended to address (Rossi and Freeman, 1993). The process of evaluation can be defined in the following stages.
The formative, process, and outcome evaluation model might be amplified by another perspective derived from training evaluation. Five levels or facets of evaluation for IHC can be conceptualized (Figure IV-1).
From the perspective of many stakeholders, particularly purchasers and users, evaluation of a proposed health intervention may focus on the central question, "Does this intervention provide enough measurable positive outcomes to justify the cost?" There are no widely accepted standards for measuring outcomes and costs associated with IHC applications. The Panel on Cost-Effectiveness in Health and Medicine, however, recently developed a framework for cost-effectiveness analyses that is applicable to the assessment of any health intervention (PCEHM, 1996; Russell et al., 1996). Their technical guidance may be helpful for developers and evaluators of IHC applications. Outcomes measured for any intervention should include both the benefits and the harms associated with it. In assessing the total costs of an application, it is appropriate to include costs associated with any change in both health- and nonhealth-related resource use. For IHC evaluations, both the actual costs of pilot projects and the projected costs of large-scale implementation should be considered.

Distinction Between Evaluation and Research

Research and evaluation are components of a continuum of disciplined inquiry that are driven by different goals. Research generally has two types of goals: theoretical and empirical. Research with theoretical goals is intended to explain phenomena through the logical analysis and synthesis of the results of scientific investigations, along with theories and principles from other fields and original insights, to develop new theory or refine existing theory. Research with empirical goals is intended to determine how and why phenomena occur by testing hypotheses related to theories, eventually leading to increased capacity to describe, predict, and control phenomena. Evaluation, in contrast, generally has two different types of goals: formative and summative.
Evaluation with formative goals is intended to support the development and improvement of innovative solutions to problems. Evaluation with summative goals focuses on estimating the effectiveness and worth of a particular program, product, or method for the purpose of making a decision about it in an applied setting. Typical decisions might be selection, purchase, certification, extension, or elimination. With a continuum of such goals in mind, the need to make sharp distinctions between research and evaluation is reduced. One point that must be clarified is that rigor and discipline are not distinguishing features between research and evaluation. The research-to-evaluation continuum represents a shift from theoretical goals to goals that are more action-oriented. Research is generally focused on adding to the body of knowledge about phenomena, whereas evaluation is usually focused on solving particular problems; rigor and discipline are important aspects of both.

Qualitative Methods and Statistical Process Control

Evaluation methods often focus on the need to prove rather than explain an effect. Hence, resources are allocated toward large sample sizes and one- or two-time assessments of effect. Such strategies are appropriate for stable applications whose effects need to be demonstrated beyond doubt. However, the field of IHC is evolving, and the content and even the structure of applications will change to keep up with new knowledge. Moreover, because IHC is in its infancy, the goals of evaluation should be not only to determine effectiveness but also to guide improvements. This implies the need for evaluative efforts that explain effect, offer guidance for improvement, and monitor the changing nature of IHC over time. Toward that end, it may be more valuable to monitor impact over an extended period of time on a smaller sample of users and to invest resources in understanding why things happened as they did.
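As a concrete sketch of such extended monitoring of a small user panel, a simple control chart can flag weeks in which outcomes drift outside the range established during a stable baseline period. The metric (a weekly mean symptom-management score), the data, and the three-sigma limits below are hypothetical illustrations, not drawn from the Panel's work:

```python
# Hypothetical illustration: monitoring a small user panel over time with a
# simple Shewhart-style control chart. Data and metric are assumptions.
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Center line and +/- k-sigma limits estimated from a baseline period."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def out_of_control(observations, lcl, ucl):
    """Indices of weeks whose score falls outside the control limits."""
    return [i for i, x in enumerate(observations) if not (lcl <= x <= ucl)]

baseline_weeks = [72, 70, 74, 71, 73, 69, 72, 75]  # stable early period
later_weeks = [71, 73, 70, 58, 72, 74]             # week 3 dips sharply
lcl, center, ucl = control_limits(baseline_weeks)
flagged = out_of_control(later_weeks, lcl, ucl)
print(f"center={center:.1f}, limits=({lcl:.1f}, {ucl:.1f}), flagged weeks={flagged}")
```

A flagged week does not by itself explain why scores dropped; it signals where the qualitative follow-up (observation, interviews) should concentrate.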
Qualitative research methods and statistical process control may be important resources for such evaluation strategies. Qualitative research relies on observation and interviews with stakeholders to better understand the underlying causes of success or failure. This understanding can be very important as ongoing improvements to the application are made. Statistical process control provides a strategy for monitoring application performance over time and identifying when the application is moving out of control. Such techniques could help monitor the dynamic nature of electronic support groups and identify whether discussions are having detrimental effects. They could also be useful for detecting significant changes in use patterns that may warrant further examination or even intervention. However, these strategies could also be used to assess the effectiveness of IHC applications. Because the applications are so dynamic and their impact may be cumulative, the goal may be to conclude not beyond a doubt at one point in time but beyond a reasonable doubt across the life span of the application.

Potential Benefits of Evaluation

From the perspective of potential stakeholders of IHC applications, the potential benefits of widespread evaluation include the following (Eng et al., 1999):
Developers of IHC applications also may benefit substantially from adopting a norm of evaluation (Henderson et al., 1999). From their perspective, evaluation may improve their chances of success in the following ways:
Psychosocial Theories and Models and Evaluation of IHC

The psychosocial theories and models summarized in Chapter III can be utilized in evaluations of IHC applications. For example, researchers have examined whether appropriate matching (tailoring) of psychosocial concepts to individuals influences behavior change and informed decisionmaking more than providing unmatched concepts (Curry et al., 1991; Velicer et al., 1993; Campbell et al., 1994; Skinner et al., 1994; Strecher et al., 1994; Brug et al., 1996, in press; Shiffman et al., 1997) or deliberately mismatched communications (Dijkstra et al., in press). These outcome evaluations provide information about whether the overall approach was successful. It should be possible to determine whether IHC applications are influencing targeted psychosocial concepts and whether these applications are moving individuals through the maps laid out by theory-builders. Assessing the concepts either before or as part of the IHC application, followed by post-treatment assessment of the same concepts, allows evaluators to examine changes in the concepts targeted by the application. For example, if perception of one's risk is viewed as an important factor in health-related behavior change, then it should be possible to determine whether the application is influencing this concept. In turn, it should be possible to determine whether changes in risk perception influence changes in the targeted behavior. Many current psychosocial theories are sufficiently organized to hypothesize the relevance of a construct based on the specific state of the individual. For example, in a number of models, risk perception would be more relevant to an individual not interested in changing a health-related behavior than to an individual ready to make the change.
Other concepts, such as self-efficacy, become more relevant once the person is interested in making a change (Velicer et al., 1985; Bandura, 1986; Weinstein, 1988; Prochaska et al., 1992; Strecher and Rosenstock, 1998; Dijkstra et al., in press). Assessment of motivational stage has been an important method of framing a broad spectrum of behavior change interventions (Velicer et al., 1985; Prochaska et al., 1992). Evaluative efforts could, in turn, determine whether the individual moves through these stages of change as a result of the IHC application. Standard evaluations of outcomes determine whether an application works. Evaluations that examine intermediate psychosocial concepts linked with a conceptual framework of the IHC application determine why an intervention did or did not work. Both are important as more powerful, relevant applications are developed. Understanding how and when to measure intermediate psychosocial processes requires an understanding of the relevant theories and the psychometric properties of the concepts within these theories. For this reason, it is important that individuals with expertise in behavior change and decisionmaking theories become involved at the earliest stages of IHC application development. Explicit development of the conceptual frameworks guiding the content of a program may lead to stronger applications and improve the quality of evaluations for IHC applications.

Link Between Application Development and Evaluation

Evaluation of IHC is an ongoing process that begins during the product development cycle and continues for the life of the product. Given the highly dynamic state of IHC, development and accompanying evaluations would never really end because content will become outdated and new technology-based approaches and delivery methods will emerge. In addition, there is a role for evaluation even after an evaluated product has been in the marketplace for a period of time.
As with drugs and medical devices, post-marketing surveillance data can alert developers and policymakers to potential harm associated with product use that may not have been detected in initial evaluations among limited study populations. It is helpful for developers to understand the relationship between development and evaluation activities during the product development cycle. An inventory of potential application development and evaluation activities is presented in Table IV-1. At each stage of application development, from conceptualization and design to assessment and refinement, there is a series of evaluation activities that are relevant and should be considered. An array of evaluation methods and tools can be used to implement these evaluation activities. As illustrated by Table IV-1, there may be some overlap between development and evaluation activities. Ideally, an evaluation plan should be formulated at the conception of an application. User needs and the objectives of the application should be clearly specified prior to implementation. Identifying intended effects helps define the outcomes of interest and the appropriate evaluation design to measure outcomes. Needs assessment is one of the initial stages of evaluation and the results of this analysis help determine product specifications. Evaluations during product development include component testing to ensure that all aspects of the system perform accurately and meet design specifications. Iterative usability testing to ensure that the product meets the needs of potential users with regard to usability and the facilitation of workflow or tasks is critical. Experience has shown that several 1- to 2-hour sessions where individual learners are observed as they use an IHC application, and then are personally interviewed, can provide accurate usability feedback. Just four or five participants can provide sufficient information to complete a study of an application. 
Because of the small number of participants, this approach is more easily arranged than those with larger groups, and can be completed in one to three days depending on the facility and personnel available. If there is sufficient funding, IHC designers should utilize the services of professional usability testers. If funding is modest, designers may choose to conduct their own usability testing using portable usability lab equipment. When conducting one-on-one usability studies, it is very helpful to maintain a relaxed and informal atmosphere that encourages both negative and positive participant feedback. Without proper rapport, participants will likely be less open and may unintentionally invalidate the study. Developers should realize, however, that a usability lab may be much more of a controlled environment than the home. With experience, any developer can learn the skills necessary to conduct usability testing at the minimum level of formality required to obtain strong evidence that can be used to improve an application. The next stage of evaluation is to measure outcomes during system use. At this stage, conducting a pilot evaluation to work out the implementation details of the evaluation and assessment tools is often helpful. Quite often, there are obvious misunderstandings of terms or unanticipated barriers that can be corrected before beginning the larger, more complete study. Because evaluation of IHC applications should be a continuous process, there is no "final" stage of evaluation. For many IHC applications, a long-term commitment to a process of updating and revision with ongoing quality-assurance evaluations is required.
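The claim above that four or five observed participants can suffice has a well-known rationale in the usability-engineering literature (Nielsen and Landauer), which is an outside reference rather than part of the Panel's report: if each participant independently uncovers a given problem with probability p, the expected share of problems found by n participants is 1 - (1 - p)^n. A short sketch using the commonly cited average discovery rate of about 0.31 per user:

```python
# Expected fraction of usability problems uncovered by n test participants,
# per the Nielsen-Landauer problem-discovery model (an outside assumption,
# not the Panel's own analysis). p is the per-user discovery probability.
def problems_found(n_participants, p=0.31):
    """Expected share of problems found: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n_participants

for n in (1, 3, 5, 10):
    print(f"{n:2d} participants -> {problems_found(n):.0%} of problems")
```

With p = 0.31, five participants are expected to surface roughly 84 percent of the problems, which is why iterative rounds of small tests tend to be more cost-effective than one large study.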
Partially adapted from: National Cancer Institute. Making Health Communication Programs Work. Bethesda, MD: National Institutes of Health, US Department of Health and Human Services. NIH Publication No. 89-1493, April 1989. Original version published in: Eng TR, Gustafson D, Henderson J, Jimison H, Patrick K, for the Science Panel on Interactive Communication and Health. Introduction to evaluation of interactive health communication applications. Am J Prev Med 1999;16:10-15.

Challenges of Evaluating IHC Applications

It would be misleading to suggest that high-quality evaluations of applications will be conducted if only developers would simply decide to do so. Indeed, there are several challenges to the evaluation of IHC applications, some technical and some related to external forces, that will need to be addressed. High-quality evaluations will require careful planning and implementation, along with consideration of the following factors:
Evaluation Criteria

A number of organizations and individuals have published, and in some cases implemented, criteria for evaluating the appropriateness or quality of health-related and other Web sites (Jadad and Gagliardi, 1998; Pealer and Dorman, 1997). Some of these criteria are the basis for tools used to produce a summary rating or grade to help potential users assess a site. There are literally dozens of criteria proposed in the literature (Kim et al., 1999), many of which are closely related. In selecting and prioritizing criteria for evaluating IHC applications, developers and other evaluators often will consider many factors, including the objectives of the application and the preferences and values of the evaluator and potential users.6 After identifying relevant criteria, evaluators may assign each criterion a relative weight that varies depending on the application. For example, for an application that provides information about clinical trials to the general public, accuracy and appropriateness of content may receive relatively heavy weighting. In contrast, evaluators of an application that focuses on enhancing peer support for a chronic health condition among a disabled population may choose to emphasize the usability of the program. For general purposes, key criteria that can be applied to most programs include (Henderson et al., 1999):
Standards of Evidence

Much of the controversy in the field of evaluation has to do with standards of evidence. An understanding of this concept is helpful in interpreting evaluation results. Two central concepts are the reliability and validity of the evaluation.

Reliability and Validity

Reliability can be seen as repeatability: If one asks the same question of the same people repeatedly, would he or she get the same answer? Poor reliability makes it much more difficult to measure the effect of an intervention. Thus, it is very important in evaluations to be certain that what one is asking is understood fully by those who are being asked, and that they can provide dependable or reliable answers. The validity of evaluation findings can be viewed as the truthfulness of the findings. Do the measures really reflect what is intended to be measured? Are the findings correct, or are they an aberration? Are they meaningful in this context? There are two types of validity: internal and external. Internal validity is the validity of the findings within the study itself. External validity is the validity of applying the findings to other situations. External validity often is referred to as "generalizability." If the people who tested a program liked it, will everyone else who uses it have the same overall reaction? Can the results obtained with the study sample be generalized to other groups? Generalizability can be critically important because, in some situations, developers rely on the findings or results obtained by others. For example, if tailoring improves message impact in similar settings, it may be more appropriate for a developer simply to adopt a proven approach rather than to conduct additional evaluations.

Judging Effect: Statistical Significance and Effect Size

Many evaluators emphasize the statistical significance of outcome findings, and some may conduct statistical tests on a variety of outcomes hoping to find a statistically significant result.
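The hazard in that practice, testing many outcomes in the hope that one comes out significant, can be quantified. A small sketch with illustrative numbers, assuming independent tests of outcomes that have no true effect:

```python
# With m independent null outcomes each tested at significance level alpha,
# the chance of at least one spurious "significant" finding grows quickly.
# Illustrative calculation only; real outcomes are rarely fully independent.
def chance_of_false_positive(m_outcomes, alpha=0.05):
    """Probability that at least one of m null tests appears significant."""
    return 1 - (1 - alpha) ** m_outcomes

for m in (1, 5, 10, 20):
    print(f"{m:2d} outcomes tested -> "
          f"{chance_of_false_positive(m):.0%} chance of a false positive")
```

At twenty outcomes the chance of at least one false positive exceeds 60 percent, which is why an evaluation plan should specify its primary outcomes in advance.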
Although statistical significance is an important measure of intended effect, it can be over-emphasized. The key concepts underlying statistical significance are as follows: To what degree are we confident that the results did not occur by chance? Is there really a connection between use of the program and the outcomes? What are the chances that the outcomes really are due to the intervention rather than to chance alone? The traditional metric of scientific studies is a p-value less than 0.05, which means that if the program truly had no effect, a result at least this large would be expected no more than 5 percent of the time, or 1 in 20 times, by chance. Reporting absolute probabilities often may be helpful. Statistical significance depends greatly on the size of the study sample (i.e., the number of participants in the evaluation). A larger sample size and/or a larger effect size both contribute to greater statistical significance. When judging the usefulness of an IHC application, effect size often is a more important concern. Effect size describes the magnitude of the impact the intervention has on its users. For example, for a program that encourages diabetics to monitor their blood sugar more carefully, just how much more (or less) carefully do they do it after using the program? If an application is designed to decrease utilization of a service, to what extent do users of the program utilize that service less (or more) than people who did not use the program? While the statistical significance of results is important, it may be more meaningful to know how strongly the program affected its users. Therefore, effect size should be considered along with statistical significance in evaluating outcomes.

What is a reasonable standard of evidence for IHC applications?
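The distinction drawn above between significance and magnitude can be illustrated with hypothetical numbers: the same small improvement (here an assumed 0.2 extra blood-sugar checks per week) produces an ever-larger test statistic as the sample grows, while the effect size (Cohen's d, a standard magnitude measure) stays constant:

```python
# Effect size versus statistical significance, with hypothetical numbers.
# Cohen's d measures magnitude of impact; the two-sample z statistic grows
# with sample size even though the underlying effect does not change.
from math import sqrt

def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized effect size: difference in means, in SD units."""
    return (mean_a - mean_b) / pooled_sd

def z_statistic(mean_a, mean_b, sd, n_per_group):
    """Two-sample z for equal-sized groups with a common SD."""
    return (mean_a - mean_b) / (sd * sqrt(2.0 / n_per_group))

d = cohens_d(5.2, 5.0, pooled_sd=1.0)  # d = 0.2: a small effect
for n in (25, 100, 2500):
    print(f"n={n:4d} per group: d={d:.2f}, "
          f"z={z_statistic(5.2, 5.0, 1.0, n):.2f}")
```

Only at the largest sample does z clear the conventional 1.96 threshold, yet d remains 0.2 throughout; a "significant" result from a huge sample may still describe an impact too small to matter to users.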
Subjecting all IHC applications to randomized controlled trials is neither practical nor appropriate. Although such trials produce the strongest evidence, they are not suitable for all interventions or for all stages of product development and dissemination. Developers face the challenge of balancing the need to conserve limited resources with protecting the safety of users and ensuring that the program is effective. One reasonable approach is to match the level of evaluation to the intended purposes of the application and the resources it consumes. That is, in the case of applications that have substantial potential risk or require a large investment, it seems appropriate to demand a higher level of evidence, such as an appropriately designed and implemented randomized controlled trial. The level of confidence in the evidence of safety and efficacy for such interventions (e.g., shared decision support applications for serious illnesses) should be "beyond a reasonable doubt." However, for interventions that have minimal potential risk and require few resources (e.g., Web sites that provide general information from trusted and reliable sources), formative and process evaluations may be sufficient to provide a "preponderance of evidence" indicating that the application will be beneficial to users. In addition, evaluation methods such as interviews and focus groups often may provide insights into how an application may benefit users that are as important as those obtained from randomized controlled trials.

Standardized Reporting of Evaluation Results

Prior to the Panel's work, there were no models for standardized reporting of evaluation results for IHC applications.
As a first step toward promoting appropriate evaluation and disclosure about IHC applications, the Panel developed an "evaluation reporting template" (Appendix A) and a "disclosure statement" (Appendix B) to serve as a guide for reporting essential information and the results of any evaluations about a specific IHC application (Robinson et al., 1998). The template is based on the rationale that all applications should undergo some level of evaluation, and that the results of such evaluations should be available to potential users and purchasers of the application. Disclosure of such information may enable potential users, purchasers, and others to judge the appropriateness of a given IHC application for their needs and to compare one application with another. The notion of disclosure of information about IHC applications is similar to the common practice of disclosing information about the use of a potential intervention or consumer product. Examples of this practice include health professionals informing patients about the risks and benefits of potential treatment options or experimental trials (Rodwin, 1989), and manufacturers disclosing product information (e.g., automobile specifications, nutritional content analyses) that may be critical to a potential buyer's decision. In developing the template, the Panel identified a critical set of information that would help inform decisions about use and purchase and also would apply to essentially all IHC applications, regardless of the specific technologies or communication strategies employed or the goals of the program. Some developers may find addressing all the elements of the template somewhat overwhelming, but not all IHC applications need to be evaluated in all of the categories specified in the template. To the contrary, evaluation targets should reflect the specific needs of the target audience and the objectives of the developer.
The Panel believes that all IHC stakeholders can benefit from a voluntary standard of reporting evaluation results. This template and its future versions can: 1) assist developers in planning, conducting, and reporting the results of their evaluations; 2) help users determine which applications are most likely to benefit them given their particular needs; 3) assist clinicians in selecting relevant applications for their patients; and 4) help purchasers, investors, and policymakers focus on the most promising applications and strategies for investment and dissemination. Will developers of IHC applications voluntarily disclose information about their products? As mentioned previously, there are several benefits to developers who conduct evaluations. With increased awareness among users and purchasers about the possibility of harmful effects or no effect from IHC applications, these groups will increasingly seek information about an application before using or purchasing it. If the current leaders in IHC development begin the process of public disclosure of information about their products, market forces may pressure other developers to follow. Although version 1.0 of the template arose from an extensive multiyear development effort, additional refinement is necessary, and the template will need to be updated as it is used and the field evolves. As with all instruments of this type, deficiencies will be identified and improvements can be made as the template and disclosure statement are circulated to, and used by, wider audiences.

5. There is limited scientific research on the impact that public release of evaluation results of goods and services has on subsequent sales, but anecdotal reports suggest that products rated highly by Consumer Reports tend to sell better and low-rated products decrease in sales (Shapiro, 1992; Kelly, 1994; Eldridge, 1997).