SECTION 3
METHODS OF EVALUATION

Qualitative Methods
    Introduction
    Personal Interviews
    Focus Groups
    Participant-Observation
    General Information
Quantitative Methods
    Introduction
    Counting Systems
    Surveys
    Experimental and Quasi-Experimental Designs
    Factors To Be Eliminated as Contributors to Program Results
    Schematics for Experimental and Quasi-Experimental Designs
    Examples of Experimental Designs
    Examples of Quasi-Experimental Designs
    Converting Data on Behavior Change into Data on Morbidity and Mortality
    Converting Data on Behavior Change into Data on Cost Savings
    Summary of Quantitative Methods
Tables
    2. Qualitative Methods of Evaluation
    3. Advantages and Disadvantages of Methods of Administering Survey Instruments
    4. Relative Risk for Death or Moderate-to-Severe Injury in a Car Crash
    5. Quantitative Methods Used in Evaluation
METHODS OF EVALUATION

QUALITATIVE METHODS

INTRODUCTION

Because qualitative methods are open-ended, they are especially valuable at the formative stage of evaluation, when programs are pilot testing proposed procedures, activities, and materials. They allow the evaluator unlimited scope to probe the feelings, beliefs, and impressions of the people participating in the evaluation and to do so without prejudicing participants with the evaluator's own opinions. They also allow the evaluator to judge the intensity of people's preference for one item or another.

Qualitative methods are also useful for testing plans, procedures, and materials if a problem arises after they are in use. Using these methods, evaluators can usually determine the cause of any problem. Armed with knowledge about the cause, program staff can usually correct problems before major damage is done.

For example, let us say you put an advertisement in the local newspaper offering smoke detectors to low-income people. Not as many people respond as you expected, and you want to know why. Conducting formative evaluation using qualitative methods will usually reveal the reason. Perhaps the advertisement cannot be understood because the language is too complex, perhaps your target population seldom reads newspapers, perhaps most people in the target population cannot go to the distribution location because it is not on a public transportation line, or perhaps the problem is due to some other factor. Whatever the cause, once you learn what the problem is, you are in a position to remedy it.

In this section, we describe three methods of conducting qualitative research: personal interviews, focus groups, and participant-observation.
Each has advantages and disadvantages.

PERSONAL INTERVIEWS

In-depth personal interviews with broad, open-ended questions are especially useful when the evaluator wants to understand either 1) the strengths and weaknesses of a new or modified program before it is in effect or 2) the cause of a problem should one develop after the program is in effect. Relatively unstructured personal interviews with members of the target population allow interviewees to express their point of view about a program's good and bad points without being prejudiced by the evaluator's own beliefs. Open-ended questions allow interviewees to focus on points of importance to them, points that may not have occurred to the evaluator. Personal interviews are particularly important when the target population differs in age, ethnicity, culture, or social background from program staff and when the program staff has a different professional background from those directing the program. Through the interview, the interviewee becomes a partner in, rather than the object of, the evaluation.5

The interviewer's objective is to have as much of the conversation as possible generated spontaneously by the interviewee. For this reason, interviewers must avoid questions that can be answered briefly.

Personal interviews are the most appropriate form of qualitative evaluation when the subject is sensitive, when people are likely to be inhibited speaking about the topic in front of strangers, or when bringing a group of people together is difficult (e.g., in rural areas).

Personal interviews should be audiotaped and transcribed verbatim. Most commonly, evaluators analyze the results of personal interviews by looking through the transcripts for insightful comments and common themes. They then give a written report to program management.
Thus, the interviewees' words become the evaluation data, with direct quotes serving as useful supporting evidence of the evaluators' assessments.

Examples of open-ended questions to ask during personal interviews begin on page 76. See also the focus group questions (page 81), many of which are suitable for personal interviews.

FOCUS GROUPS

Focus groups serve much the same function as personal interviews. The main difference is that, with focus groups, the questions are asked of groups. Ideally these groups comprise four to eight people who are likely to regard each other as equals.6 A feeling of equality allows all members of the group to express their opinions freely. Focus groups have an advantage over individual interviews because the comments of one participant can stimulate the thoughts and ideas of another. You must conduct several focus groups because different combinations of people yield different perspectives. The more views expressed, the more likely you are to develop a good understanding of whatever situation you are investigating.

As with personal interviews, focus-group discussions should be audiotaped and transcribed verbatim. The evaluator looks for insightful comments and common threads both within groups and across groups and uses direct quotes as the evaluation data. Also as with personal interviews, evaluators analyze the data and prepare a written report for program management. Many of the same questions may be used for personal interviews and for focus groups.

On page 81 are examples of questions that might be used with focus groups during formative evaluation of a program.

PARTICIPANT-OBSERVATION

Evaluation by participant-observation involves having members of the evaluation team participate (to the degree possible) in the event being observed, look at events from the perspective of a participant, and make notes about their experiences and observations.
Aspects to observe include physical barriers for participants, smoothness of program operation, areas of success, and areas of weakness. Observers should be unobtrusive and ensure that their activities do not disrupt the program. They should be alert, trained in observational methods, and aware of the type of observations of greatest importance to the program evaluation.

Participant-observation is particularly valuable to the study of behavior for several reasons:
A major disadvantage of participant-observation is that it is time consuming for the evaluator. Examples of events to observe begin on page 89.

GENERAL INFORMATION

Who To Interview, Invite to Focus Groups, or Observe: If you are evaluating your program's methods, procedures, activities, or materials, select people similar to those your program is trying to reach. Indeed, you could even select members of the target population itself, if that is possible.

If you are conducting formative evaluation because a large group of people dropped out of the program or refused to join the program, then select people from that group to interview, observe, or invite to focus groups. They are the people most likely to provide information about aspects of the program that need correction.

Number of People To Interview, Focus Groups To Conduct, or Events To Observe: The number depends on the size and diversity of the target population.7 The larger and more diverse the target population, the more interviews, focus groups, or observations are needed. In all cases, the more interviews, observations, or focus groups you conduct, the more likely you are to get an accurate picture of the situation you are investigating.

Trained Evaluator: For several reasons, all qualitative evaluation must be conducted by people trained in the particular method (interview, focus group, or participant-observation) being used:
See Table 2 for a summary of qualitative methods of evaluation, including the advantages and disadvantages of each.
Table 2. Qualitative Methods of Evaluation

Personal Interviews
    Purpose: To have individual, open-ended discussion on a range of issues. To obtain in-depth information on an individual basis about perceptions and concerns.
    Number of People To Interview or Events To Observe: The larger and more diverse the target population, the more people must be interviewed.
    Resources Required: Trained interviewers; written guidelines for interviewer; recording equipment; a transcriber; a private room.
    Advantages: Can be used to discuss sensitive subjects that interviewee may be reluctant to discuss in a group. Can probe individual experience in depth. Can be done by telephone.
    Disadvantages: Time consuming to conduct interviews and analyze data. Transcription can be time-consuming and expensive. Participants are one-on-one with interviewer, which can lead to bias toward "socially acceptable" or "politically correct" responses.

Focus Groups
    Purpose: To have an open-ended group discussion on a range of issues. To obtain in-depth information about perceptions and concerns from a group.
    Number of People To Interview or Events To Observe: 4 to 8 interviewees per group.
    Resources Required: Trained moderator(s); appropriate meeting room; audio and visual recording equipment.
    Advantages: Can interview many people at once. Response from one group member can stimulate ideas of another.
    Disadvantages: Individual responses influenced by group. Transcription can be expensive. Participants choose to attend and may not be representative of target population. Because of group pressure, participants may give "politically correct" responses. Harder to coordinate than individual interviews.

Participant-Observation
    Purpose: To see firsthand how an activity operates.
    Number of People To Interview or Events To Observe: The number of events to observe depends on the purpose. To evaluate people's behavior during a meeting may require observation of only one event (meeting). But to see if products are installed correctly may require observation of many events (installations).
    Resources Required: Trained observers.
    Advantages: Provides firsthand knowledge of a situation. Can discover problems the parties involved are unaware of (e.g., that their own actions in particular situations cause others to react negatively). Can determine whether products are being used properly (e.g., whether an infant car seat is installed correctly). Can produce information from people who have difficulty verbalizing their points of view.
    Disadvantages: Can affect activity being observed. Can be time consuming. Can be labor intensive.
QUANTITATIVE METHODS

INTRODUCTION

Quantitative methods are ways of gathering objective data that can be expressed in numbers (e.g., a count of the people with whom a program had contact or the percentage of change in a particular behavior by the target population). Quantitative methods are used during process, impact, and outcome evaluation. Occasionally, they are used during formative evaluation to measure, for example, the level of participant satisfaction with the injury prevention program.

Unlike the results produced by qualitative methods, results produced by quantitative methods can be used to draw conclusions about the target population. For example, suppose we find that everyone in a focus group (randomly selected from bicyclists in the target population) wears a helmet while riding. We cannot then conclude that all bicyclists in the target population wear helmets. However, if instead of a focus group we conducted a valid survey (a quantitative method) and found that 90% of respondents wear helmets while bicycling, we could then estimate that the percentage of bicyclists who wear helmets in the target population is in the 85% to 95% range.

Next we will explain four quantitative methods: counting systems, surveys, experimental designs, and quasi-experimental designs. We will also describe a method for converting quantitative data on changes in behavior by the target population into estimates of changes in morbidity and mortality (page 64) and into estimates of financial savings per dollar spent on your program (page 66).

COUNTING SYSTEMS

A counting system is the simplest method of quantifying your program's results and merely involves keeping written records of all events pertinent to the program (e.g., each contact with a member of the target population or each item distributed during a product-distribution program). Counting systems are especially useful for process evaluation (see page 27).
Simply design and use forms on which you can record all pertinent information about each program event (see Appendix B for sample forms).

SURVEYS

Description: A survey is a systematic, nonexperimental method of collecting information that can be expressed numerically.

Conducting a Survey: Surveys may be conducted by interview (in person or on the telephone) or by having respondents complete, in private, survey instruments that are mailed or otherwise given to them. Which method to use is determined by the objectives of the survey. For example, if you want to survey businesses or public agencies, the telephone may be best because staff from those organizations are readily accessible by telephone. On the other hand, if you want to survey people who received a free smoke detector, personal visits to their homes may be best since many people in poor areas do not have telephones. In this example, personal visits also have the advantage of allowing you to observe whether the smoke detectors are installed and working properly.

Response rates are generally highest for personal interviews, but telephone and mail surveys allow more anonymity. Therefore, respondents are less likely to bias their responses toward what they believe to be socially acceptable or "politically correct." Telephone surveys are the quickest to conduct and are useful during the development of a program. However, households with telephones are not representative of all households. Indeed, the people we most want to reach with public health programs are often the people most likely not to have telephones.

Purpose of Surveys: While a program is under development, surveys have several uses:
Selecting the Survey Population: Whom to survey depends in part on the purpose of the survey. To evaluate the level of consumer satisfaction with the program, the survey population may be selected from among those who use the program. To learn about barriers that prevent people from using the program, select a survey population from among people who are eligible to use the program but do not. Before the program is in effect, select a representative sample of the entire target population to determine what they like or dislike about the program's proposed procedures, materials, activities, and methods.

In all cases, you will need a complete list of the people or households targeted by the program. Such a list is called a sampling frame. From the sampling frame, you may select the people to be surveyed using statistical techniques such as random sampling, systematic sampling, or stratified sampling. You must use stratified sampling if you want a representative sample of both those who participate in the program and those who do not. A full discussion of sampling techniques is outside the scope of this book. However, several textbooks (e.g., Measurement and Evaluation in Health Education and Health Promotion2) can provide you with information on sampling methods.

Survey Instruments: A survey instrument is the tool used to gather the survey data. The most common one is the questionnaire. Other instruments include checklists, interview schedules, and medical examination record forms.

Methods for Administering Survey Instruments: Before designing a survey instrument, you must decide on the method you will use to administer it, because the method will dictate certain factors about the instrument (length, complexity, and level of language).
For example, instruments designed to be completed by the respondent without an interviewer (i.e., self-administered) must be shorter and easier to follow than those to be administered by a trained interviewer.

There are three methods for administering survey instruments: personal interview, telephone interview, or distribution (e.g., through the mail) to people who complete and return the questionnaire to the program. The advantages and disadvantages of each method are laid out in Table 3.

The best method to use depends on the purpose of the evaluation and the proposed respondents to the survey. Let's say, for example, you want to evaluate a training program. If class participants have a moderate level of education, having them complete and return a questionnaire before they leave the classroom is clearly the least expensive and most efficient method. On the other hand, if class participants have problems reading, a questionnaire to be completed in class would not be useful, and you may need to conduct personal interviews.

Likewise, if you are evaluating a program to distribute smoke detectors in a well-defined, low-income housing area, you may need to interview. In this case, face-to-face interviews would be better than telephone interviews, since income is an issue and some poor people do not have telephones.
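Once a sampling frame is in hand, the simple and stratified sampling techniques described above can be sketched in a few lines of code. The following Python sketch is illustrative only; the sampling frame, household identifiers, and stratum definition (program participant versus nonparticipant) are hypothetical, not taken from this text:

```python
import random

def simple_random_sample(frame, n, seed=0):
    """Draw n units at random from the sampling frame."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a fixed number of units from each stratum, e.g., to represent
    both program participants and nonparticipants."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, min(n_per_stratum, len(units)))
            for name, units in strata.items()}

# Hypothetical sampling frame: (household id, participated in program?)
frame = [(i, i % 4 == 0) for i in range(200)]

print(len(simple_random_sample(frame, 25)))             # 25 households drawn
sample = stratified_sample(frame, lambda h: h[1], 10)
print({name: len(units) for name, units in sample.items()})
```

Systematic sampling (taking every kth unit after a random start) could be added in the same style; the key point is that every draw comes from the complete sampling frame, not from whoever is easiest to reach.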
Table 3. Advantages and Disadvantages of Methods of Administering Survey Instruments

Personal interviews
    Advantages: Least selection bias: can interview people without telephones, even homeless people. Greatest response rate: people are most likely to agree to be surveyed when asked face-to-face.8 Visual materials may be used.
    Disadvantages: Most costly: requires trained interviewers and travel time and costs. Least anonymity: therefore, most likely that respondents will shade their responses toward what they believe is socially acceptable.

Telephone interviews
    Advantages: Most rapid method. Most potential to control the quality of the interview: interviewers remain in one place, so supervisors can oversee their work. Easy to select telephone numbers at random. Less expensive than personal interviews. Better response rate than for mailed surveys.
    Disadvantages: Most selection bias: omits homeless people and people without telephones. Less anonymity for respondents than for those completing instruments in private. As with personal interviews, requires a trained interviewer.

Instruments to be completed by respondent
    Advantages: Most anonymity: therefore, least bias toward socially acceptable responses. Cost per respondent varies with response rate: the higher the response rate, the lower the cost per respondent. Less selection bias than with telephone interviews.
    Disadvantages: Least control over quality of data. Dependent on respondent's reading level. Mailed instruments have lowest response rate. Surveys using mailed instruments take the most time to complete because such instruments require time in the mail and time for respondent to complete.
General Guidelines for Survey Instruments: When designing a survey instrument, keep in mind that it must appeal as much as possible to the people you hope will respond:
Steps Involved in Designing Survey Instruments: Instrument design is a multistep process, and the steps need to be done in order.

1. Clearly define the population you want to survey. (See page 15, A Description of the Target Population.)

2. Choose the method you will use to administer the survey. (See page 46 for more information.)

3. Develop the survey items meticulously. Survey items are the questions or statements in the survey. Items that are closed-ended are easiest for respondents to complete and least subject to error. Closed-ended items are multiple-choice, scaled, or questions answerable by yes or no or by true or false. (See page 104 for examples.)

4. Put items in correct order. Begin with the least sensitive items and gradually build to the most sensitive. Respondents will not answer sensitive questions until they are convinced of the survey's purpose and have developed a rapport with the "person behind the survey" (the person or group they believe is requesting the information).

Demographic questions such as those about age, education, ethnicity, marital status, and income can be sensitive. For this reason, these questions should be at the end. Not only are they more likely to be answered then, but when a survey has solicited intimate or emotional information, the demographic questions draw respondents' attention away from the survey's subject matter and back to everyday activities.

Survey items should progress from general to specific, which eases respondents into a subject and therefore increases the likelihood that they will answer and do so accurately and truthfully. If the survey instrument covers several subjects (e.g., seatbelt use, speeding, and driving while intoxicated), the survey items for each subject should be grouped together, again progressing from general to specific within each group. Put the least sensitive subject first and the most sensitive last.

5. Give the survey instrument an appropriate title.
This step is particularly important for survey instruments to be completed by the respondent, since the title is the respondent's first impression of the group collecting the information. To increase the number of responses you get, emphasize the importance of the survey in the title and show any relationship between your injury prevention program and the people you want to respond to the questionnaire. Examples of good titles are "Survey of the Health Needs of Our Community" and "Survey of Your Level of Satisfaction with Our Services."

6. Assess the reliability of the survey instrument. This step involves measuring the degree to which the results obtained by the survey instrument can be reproduced. Assess reliability by one of three methods: 1) determine the stability of the responses given by a respondent, 2) determine the equivalence of responses by one respondent to two different forms of the questionnaire, or 3) determine the internal consistency of the instrument, which is the degree to which all questions in the questionnaire are measuring the same thing.

Following are details on the three methods:
7. Assess the validity of the survey instrument. Validity is the degree to which the instrument measures what it purports to measure. For example, how well data on seatbelt use gathered from questionnaires completed by respondents agree with actual seatbelt use reflects the questionnaire’s degree of validity. Clearly, if data produced by responses to a questionnaire—in this example, the extent of self-reported seatbelt use—cannot be reproduced using a more direct method of gathering data (e.g., counting the number of people who are actually wearing seatbelts), then the questionnaire is not valid. There are three main types of validity: face validity, content validity, and construct validity.
If no related survey instruments exist, establish construct validity through hypothesis testing. For example, if you developed a survey instrument to determine how often people exceed the speed limit, you could hypothesize that people who most frequently exceed the speed limit are likely to have more traffic citations than people who do not often exceed the speed limit. You could then gather traffic citation data and determine whether the people identified by the survey instrument as the most frequent speeders had more citations, as hypothesized.

8. Pilot test the survey instrument. Before an instrument can be used on the entire target population, you must pilot test it on a group of people similar to the target population or, preferably, on a small group within the target population. The purpose is to determine whether the survey instrument is effective for use with the people who are potential respondents. The evaluator's job is to find out if any survey items are confusing, ambiguous, or phrased in language unfamiliar to the intended audience. The evaluator will also determine if certain words differ in meaning from one ethnic group to the next and if certain questions are insensitive to the feelings of many people in the target population.

If the survey instrument is not significantly modified as a result of the pilot test (a rare event), the information gathered from the people who participated in the pilot test can be added to the information obtained from the people in the full survey.

9. Modify. At each step of the design, modify survey items and the survey instrument itself on the basis of information gathered at that step, particularly information gathered during the pilot test.

Many good references are available on the design of survey instruments (see "Bibliography," page 117).

EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS

Introduction: In this section, we discuss research designs that you can use during several stages of evaluation:
How you operate your program will be influenced by how you plan to evaluate it. If you use an experimental or quasi-experimental design, impact and outcome evaluation will be a breeze because, in effect, you will be operating and evaluating the program at the same time.

Experimental Designs: The best designs for impact and outcome evaluation are experimental designs. Evaluation with an experimental design produces the strongest evidence that a program contributed to a change in the knowledge, attitudes, beliefs, behaviors, or injury rates of the target population. The key factor in experimental design is randomization: evaluation participants are randomly assigned to one of two or more groups. One or more groups will receive an injury intervention, and the other group(s) will receive either no intervention or a placebo intervention. The effects of the program are measured by comparing the changes in the various groups' knowledge, attitudes, beliefs, behaviors, or injury rates. Randomization ensures that the various groups are as similar as possible, thus allowing evaluators of the program's impact and outcome to eliminate factors outside the program as reasons for changes in program participants' knowledge, attitudes, beliefs, behavior, or injury rates. See "Factors To Be Eliminated as Contributors to Program Results" (page 54) for a full discussion.

Difficulties with Experimental Designs: Although experimental designs are ideal for program evaluation, they are often difficult (sometimes impossible) to set up. The difficulty may be due to logistical problems, budgetary limitations, or political circumstances.

To demonstrate the difficulties, let us consider the example of introducing a curriculum on bicycle safety for third graders at a certain school. Selecting children at random to participate in the program would cause many problems, including the following:
In addition, evaluation of the program's effectiveness would be compromised if children in the safety class shared information with the children who were not in the safety class.

Another difficulty with experimental designs is that participants must give their informed consent. People who willingly agree to participate in a program in which they may not receive the injury intervention are probably different from people in the general population. Therefore, program effects shown through evaluation involving randomized studies may not be generalizable (i.e., they may not reflect the probable effects for all people).

For example, let us suppose you want to test how effective a bicycle rodeo is at getting bicyclists to wear helmets. You ask a random sample of 500 children who do not own bicycle helmets to attend a bicycle rodeo you have organized for the following Saturday morning. Let's say 300 agree to go. The 200 who do not agree are probably different from the 300 who do agree: perhaps the 200 who do not agree have other activities on Saturday morning (if they are poor, they may work; if they are rich, they may go horseback riding), or they may be rebellious and refuse to listen to adults, or they may believe bicycle helmets and bicycle rodeos are not "cool," or they may have some other reason. Whatever the reason, it makes those who refuse to participate in the study different from those who agree. And because of that difference, the results of your study will not be generalizable to the whole population of children who do not wear bicycle helmets.

Quasi-Experimental Designs: Because of the difficulties with experimental designs, programs sometimes use quasi-experimental designs. Such designs do not require that participants be randomly assigned to one or another group.
Instead, the evaluator selects a whole group (e.g., a third-grade class in one school) to receive the injury intervention and another group (e.g., the third-grade class in a different school) as the comparison or control group.

As an alternative, if a suitable comparison group cannot be found, the evaluator could take multiple measurements of the intervention group before providing the intervention.

When using quasi-experimental designs with comparison groups, evaluators must take extra care to ensure that the intervention group is similar to the comparison group, and they must be able to describe the ways in which the groups are not similar.

FACTORS TO BE ELIMINATED AS CONTRIBUTORS TO PROGRAM RESULTS

Events aside from the program can produce changes in the knowledge, attitudes, beliefs, and behaviors of your program's target population, thus making your program seem more successful than it actually was. Therefore, anyone evaluating an injury prevention program's success must guard against assuming that all change was produced by the program. Experimental designs minimize (i.e., decrease to the least possible amount) the effects of outside influences on program results; quasi-experimental designs reduce those effects.

The two main factors evaluators must guard against are history and maturation.

History: What may seem like an effect produced by your program, an apparent impact, may often be more accurately attributed to history if the people who participate in your program are different from those who do not. For example, suppose you measured bicycle-helmet use among students at a school that had just participated in your injury prevention program and also at a school that did not participate. Let us say that more students wore helmets at the school with your program.
You have not demonstrated that your program was the reason for the difference in helmet use unless you can show that the students at the school with the program did not wear helmets any more frequently before the bicycle-helmet program began than did the students at the school without the program. In other words, you must show that the students at the school with the program did not have a history of wearing helmets more often than did the students at the school without the program.

Maturation: Sometimes events outside your program cause program participants to change their knowledge, attitudes, beliefs, or behavior while the program is under way. Such a change would be due to maturation, not to the program itself. For example, suppose you measured occupant-restraint use by the 4- and 5-year-olds who attended a year-long Saturday safety seminar, both when they began the seminar and when they completed it. Let us say that the children used their seatbelts more frequently after attending the program. You have not demonstrated that the program was effective unless you can also show that seatbelt use by a similar group of 4- and 5-year-olds did not increase just as much simply as a result of other events (e.g., the children's increased manual dexterity due to development, exposure to a children's television series about using seatbelts that ran at the same time as the seminar, or a new seatbelt law that went into effect during the course of the seminar).

SCHEMATICS FOR EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS

Introduction: The steps involved in the various experimental and quasi-experimental designs are presented verbally and then in schematic form. In each schematic, we use the same symbols:

R  = Randomization
O1 = The first, or baseline, observation (e.g., results of a survey to measure the knowledge, attitudes, beliefs, behaviors, or injury rates of the target population)
O2 = The second observation (O3 = the third, etc.)
X = Intervention
P = Placebo (usually in parentheses to indicate that a placebo may or may not be used)

The schematic for each intervention and comparison group is shown on a separate line. For example,

O1 X O2

means that there is only one group (one line), that the group is observed for a baseline measurement (O1), provided with the intervention (X), and observed again (O2) to measure any changes.

Another example:

R O1 X O2
R O1 (P) O2

means that people are randomly assigned [R] to one of two groups [two lines]. Both are observed for baseline measurements [O1]. One is provided with the injury intervention [X]; the other may or may not get a placebo intervention [(P)]. Both groups are observed again [O2] for any change.

A placebo is a service, activity, or program material (e.g., a brochure) that is similar to the intervention service, activity, or material but without the characteristic of the intervention that is being evaluated. For example, to test the effectiveness of the content of a brochure about the value of installing smoke detectors, the intervention group would be given the brochure to read and discuss with the evaluator, and the comparison group might be given a brochure on bicycle helmets to read and discuss with the evaluator. To ensure that the placebo conditions are comparable with those of the intervention, evaluators should give the same amount of time and attention to the comparison group as they give to the intervention group.

EXAMPLES OF EXPERIMENTAL DESIGNS

Pretest-Posttest-Control Group Design: Scientists often call this design a true experiment or a clinical trial. These are the steps involved:

1. Recruit people for the evaluation.
2. Randomly assign each person [R] to one of two groups: one group will receive the injury intervention [X] and the other will not [(P)]. To select at random, use a computer-generated list of random numbers, a table of random numbers (found at the back of most books on basic statistics), or the toss of a coin.
3. Observe (measure) each group's knowledge, attitudes, beliefs, behaviors, injury rate, or any other characteristic of interest [O1]. You could use a survey (page 44), for example, to make this measurement.
4. Provide the program service (the intervention) [X] to one group and no service or a placebo service [(P)] to the other group.
5. Again, observe (measure) each group's knowledge, attitudes, beliefs, behaviors, injury rates, or whatever other characteristic you measured before providing the program service [O2].

The schematic for the pretest-posttest-control group design is as follows:

R O1 X O2
R O1 (P) O2

The effect of the program is the difference between the change from pretest [O1] to posttest [O2] for the intervention [X] group and the change from pretest [O1] to posttest [O2] for the comparison [(P)] group.

To clarify, let's take a hypothetical example of a study you might conduct during formative evaluation. Suppose you want to pilot test a proposed brochure designed to increase people's awareness that working smoke detectors save lives.

1. Select a group of people at random from the target population. This group is your study [evaluation] population.
2. Randomly assign each person in the study population either to the intervention group or to the comparison group.
3. Test each group to see what the members know about smoke detectors.
4. Decide whether to give a placebo to the comparison group.
5. Show the proposed brochure on smoke detectors only to intervention group members and allow them time to study it. If a placebo is used, show a brochure, perhaps on bicycle helmets, to the comparison group members and allow them to study it. Give the same amount of time and attention to each group.
6. To see if their awareness has increased, test each group again to measure how much they now know about smoke detectors.

Unless the proposed brochure is a dud, the intervention group's awareness of the benefits of smoke detectors will increase.
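As a sketch of how the random assignment and the program-effect calculation above might be carried out (the group sizes and awareness scores below are hypothetical, not from this manual):

```python
import random

def program_effect(interv_pre, interv_post, comp_pre, comp_post):
    """Pretest-posttest-control group design: program effect is the change
    in the intervention group minus the change in the comparison group."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(interv_post) - mean(interv_pre)) - (mean(comp_post) - mean(comp_pre))

# Step 2: randomly assign recruits [R] to the two groups.
recruits = [f"person_{i}" for i in range(20)]
random.shuffle(recruits)
intervention_group = recruits[:10]   # will receive the brochure [X]
comparison_group = recruits[10:]     # will receive nothing or a placebo [(P)]

# Hypothetical awareness scores (0-10) at pretest [O1] and posttest [O2].
effect = program_effect(
    interv_pre=[4, 5, 3, 4], interv_post=[8, 9, 7, 8],   # rose by 4 points
    comp_pre=[4, 4, 5, 3], comp_post=[5, 5, 6, 4],       # rose by 1 point
)
print(effect)  # → 3.0
```

The subtraction of the comparison group's change is what removes gains that both groups would have made anyway, such as the placebo effect discussed next.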
However, the comparison group's test scores might also increase because of the placebo effect. For example, the comparison group might develop a rapport with the evaluators and want to please them, thus causing group members to put more thought into their responses during the second observation than they did during the first. In addition, just completing the survey at the first observation may cause them to think or learn more about smoke detectors and give better answers during the second observation.

The effect of the brochure is the difference between the change (usually an increase) in the intervention group's awareness and the change (if any) in the comparison group's awareness.

Variations on the Pretest-Posttest-Control Group Design: There are several variations on the pretest-posttest-control group design.

The pretest-posttest-control group-follow-up design is used to determine whether the effect of the program is maintained over time (e.g., whether people continue to wear seatbelts months or years after a program to increase seatbelt use is over). This design involves repeating the posttest at scheduled intervals. The schematic for this design is as follows:

R O1 X O2 O3 O4
R O1 (P) O2 O3 O4

For example, suppose you want to test the effectiveness of counseling parents about infant car seats when parents bring their infants to a pediatrician for well-child care. First, select a target population for the evaluation (e.g., all the parents who seek well-child care during a given week). Then, observe (measure) the target population's use of safety seats [O1]. Next, randomly assign some parents to receive counseling about car safety seats [X] and the remaining parents to receive a placebo (e.g., counseling on crib safety) [(P)].
At regular intervals after the counseling sessions, observe each group's use of infant car seats to see how well the effect of the program is maintained over time (let's say, 3 months [O2], 6 months [O3], and 9 months [O4]).

The cross-over design is used when everyone eligible to participate in a program must receive the intervention. Again, participants are randomly divided into two groups. Both groups are tested, but only one receives the intervention. At regular intervals, both groups are observed to see what changes (if any) have occurred in each group. After several observations, the second group receives the intervention, and both groups continue to be observed at regular intervals. Below is an example schematic for this design:

R O1 X O2 O3 O4 O5 O6 O7
R O1 O2 O3 O4 X O5 O6 O7

A program is effective if the effect being measured (e.g., an increase in knowledge) changes for Group 1 after the first observation and for Group 2 after the fourth observation.

For example, suppose you wanted to evaluate whether children who took a fire-safety class presented by the fire department had better fire-safety skills than children who did not take the class. To conduct such an evaluation you could, for example, test the fire-safety skills of all the children in the third grade of the local elementary school, then randomly select half of the children (Group 1) to attend the fire-safety class on September 15. You would test the fire-safety skills of all the children again on, say, October 15, November 15, and December 15. In January the other half of the class (Group 2) would attend the fire-safety class. You would again test the fire-safety skills of all the children on January 15, February 15, and March 15. If the class were to increase the children's fire-safety skills, the results of the evaluation might look something like this.
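The original figure of results is not reproduced here, but the hypothetical scores below sketch the pattern the cross-over design looks for: each group's skills jump only after that group attends the class.

```python
# Hypothetical mean fire-safety test scores at each observation.
# Group 1 attends the class after the September baseline [O1];
# Group 2 attends in January, after the December observation [O4].
months = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar"]
group1 = [50, 80, 79, 81, 80, 82, 81]   # rises after the September class
group2 = [50, 51, 49, 52, 78, 80, 79]   # flat until the January class

for month, g1, g2 in zip(months, group1, group2):
    print(f"{month} 15: Group 1 = {g1}, Group 2 = {g2}")
```

The program's effect shows up twice: once when Group 1's scores rise and again, months later, when Group 2's scores rise, which is what makes the design convincing without a permanent control group.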
The Solomon four-group design is useful when the act of measuring people's pre-program knowledge, attitudes, beliefs, or behaviors (getting baseline measurements) may affect the program's goals in one or both of the following ways: the baseline measurement itself may change participants' knowledge, attitudes, beliefs, or behaviors, or it may prime participants to be more receptive to the program's information.
To compensate for those possibilities, this design expands the pretest-posttest-control group design from two groups (one intervention and one control) to four groups (two intervention and two control). To separate the effect of getting a baseline measurement from the effect produced by the program, the evaluator takes baseline measurements of only one intervention group and one control group. The four groups are distinguished from one another as shown below:

Group 1: Provides baseline measurement and receives the intervention. [R O1 X O2]
Group 2: Provides baseline measurement and receives nothing or a placebo. [R O1 (P) O2]
Group 3: Provides no baseline measurement and receives the intervention. [R X O2]
Group 4: Provides no baseline measurement and receives nothing or a placebo. [R (P) O2]

Since the only difference between Groups 2 and 4 is that Group 2 provided a baseline measurement and Group 4 did not, the evaluator can compare the posttest results (O2) of Group 2 with those of Group 4 to determine the effect of taking a baseline observation (O1). Similarly, since the only difference between Group 1 and Group 3 is whether they provided a baseline measurement, evaluators can compare their posttest results (O2) to determine whether providing a baseline measurement primed program participants to be more interested in the program's information, thus increasing the program's effectiveness.

The schematic for the Solomon four-group design is as follows:

R O1 X O2
R O1 (P) O2
R X O2
R (P) O2

Unfortunately, however, since this variation increases the number of people required for the study, it also increases the study's cost, time, and complexity. As a result, people who are willing to participate in an evaluation with this design may be even less representative of the general population than people who would participate in an evaluation with a less complex, randomized design.

EXAMPLES OF QUASI-EXPERIMENTAL DESIGNS

Here are some examples of quasi-experimental designs.
These designs are useful when a randomized (experimental) design is not possible.

Nonequivalent Control Group Design: Sometimes it is difficult to introduce an injury-prevention program to some people and not to others (e.g., it is impossible to be sure that a radio campaign will reach only certain people in a town and not others). In such a case, the nonequivalent control group design is useful. It is similar to the pretest-posttest-control group design except that individual participants are not randomly assigned to separate groups. Instead, an entire group is selected to receive the program service and another group not to receive it. For example, a radio campaign could be run in one town but not in a similar town some distance away.

For this example, it is important to select two groups that are well separated geographically in order to reduce the likelihood that the effect of the injury intervention will spill over to the people who are not to receive the intervention. As the name of the design indicates, without randomization the groups will never be equivalent; however, they should be as similar as possible with respect to factors that could affect the impact of the program.

As with the pretest-posttest-control group design, pretest each group [O1]; the result of the pretest shows the degree to which the two groups are not equivalent. Next, provide the intervention to one group [X] and a placebo or nothing [(P)] to the other. Then posttest each group [O2]. The evaluator must look at history, in particular, as a possible way in which the two groups are not equivalent. See page 54 for a discussion of history as an explanation for change. The schematic for this design is as follows:

O1 X O2
O1 (P) O2

Time Series Design: Sometimes it is impossible to have a control group that is even marginally similar to the intervention group (e.g., when a state program wants to evaluate the effect of a new state law).
Although other states may be willing to act as comparison groups, finding a willing state that is similar with respect to legislation, population demographics, and geography is not easy. Furthermore, it is difficult to control the collection of evaluation data by a voluntary collaborator, and even more difficult to provide funding to the other state.

The time series design attempts to control for the effects of maturation when a comparison group cannot be found. Maturation is the effect that events outside the program have on program participants while the program is under way. See page 55 for a full discussion of maturation.

To minimize the effect of maturation on program results, take multiple measurements (e.g., O1 through O4) of program participants' knowledge, attitudes, beliefs, or behaviors before an injury-prevention program begins and enter those measurements into a computer. Then, using special software, you can predict the future trend of those measurements were the program not to go into effect. After the program is over, again take multiple measurements (e.g., O5 through O8) of program participants' knowledge, attitudes, beliefs, or behaviors to determine how much the actual post-program trend differs from the trend predicted by the computer. If the actual trend in participants' knowledge, attitudes, beliefs, or behaviors during the course of the program is statistically different from the computer-predicted trend, then you can conclude that the program had an effect.

The major disadvantage to this design is that it does not completely rule out the effect of outside events that occur while the program is under way. For example, this design would not separate the effect of a new law requiring bicyclists to wear helmets from the effect of increased marketing by helmet manufacturers. Although this design cannot eliminate the effects of outside events, it does limit them to those that are introduced simultaneously with the injury-prevention program.
The schematic for this design is as follows:

O1 O2 O3 O4 X O5 O6 O7 O8

Multiple Time Series Design: This design combines the advantages of the nonequivalent control group design (page 61) with those of the time series design (page 62): the effects of history on program results are reduced by taking multiple baseline measurements, and the effects of maturation are reduced by the combined use of 1) a comparison group and 2) predicted trends in baseline measurements. As with the nonequivalent control group design, a disadvantage of this design is that the groups are not strictly equivalent and may be exposed to different events that could affect results. The schematic for this design is as follows:

O1 O2 O3 O4 X O5 O6 O7 O8
O1 O2 O3 O4 O5 O6 O7 O8
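The trend prediction that the time series design relies on can be sketched with a simple least-squares line fitted to the baseline observations (the manual's "special software" would add a statistical test; the helmet-use percentages below are hypothetical):

```python
def fit_trend(baseline):
    """Fit a least-squares line through pre-program observations O1..On
    and return a function that predicts the value at observation t."""
    n = len(baseline)
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(baseline) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, baseline))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda t: intercept + slope * t

# Hypothetical % of cyclists wearing helmets at O1..O4, before the program.
baseline = [10.0, 12.0, 14.0, 16.0]
predict = fit_trend(baseline)

# Predicted values at O5..O8 had the program not taken place:
predicted = [predict(t) for t in range(5, 9)]   # [18.0, 20.0, 22.0, 24.0]

# Actual post-program observations O5..O8 (hypothetical):
actual = [25.0, 29.0, 33.0, 36.0]
gaps = [a - p for a, p in zip(actual, predicted)]
print(gaps)  # consistently positive gaps suggest a program effect
```

A real evaluation would also test whether the gap between the actual and predicted trends is statistically significant, rather than judging it by eye.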
CONVERTING DATA ON BEHAVIOR CHANGE INTO DATA ON MORBIDITY AND MORTALITY

You can convert data on changes in the behavior your program was designed to modify into estimates of changes in morbidity and mortality if you know the effectiveness of the behavior in reducing morbidity and mortality.

As an example, let us suppose your program was designed to increase seatbelt use. Let us also suppose that you counted the number of people wearing seatbelts at a random selection of locations around your city both before and after the program. You found that 20% more people in large cars and 30% more people in small cars are wearing seatbelts after the program than before.

To convert that 20% increase in seatbelt use (for people in large cars) to a decrease in deaths and injuries, you will need two sets of information: the relative risk for death or moderate-to-severe injury for people with and without seatbelts (Table 4), and the number of people who died or were severely injured in car crashes during the year before the program began. In our example, both sets of information are available.
Table 4. Relative Risk for Death or Moderate-to-Severe Injury in a Car Crash [10]

Car Size            | Seatbelt Buckled | Seatbelt Unbuckled
Large (>3,000 lbs)  | 1.0              | 2.3
Small (<3,000 lbs)  | 2.1              | 5.0
Let's say, for our example, that 125 people were severely injured or died in large cars and 500 in small cars during the year before the program began. Now the calculation:

1. Subtract the risk ratio for people wearing seatbelts in large cars (1.0) from the risk ratio for people not wearing seatbelts in large cars (2.3): 2.3 - 1.0 = 1.3. The result (1.3) is the amount of the risk ratio that is attributable to not wearing seatbelts.
2. Divide this difference (1.3) by the total risk ratio for people not wearing seatbelts (2.3): 1.3 ÷ 2.3 = 0.565.
3. Express the result as a percentage: 0.565 x 100 = 56.5%. This calculation tells us that, when riding in a large car, people reduce their risk for injury or death by 56.5% if they buckle their seatbelts.
4. Multiply the percentage of decreased risk (56.5%) by the increase in the percentage of people wearing seatbelts in large cars (in our example, 20%): 56.5% x 20% = 0.565 x 0.20 = 0.113 = 11.3%. This calculation shows that injuries and deaths are reduced by 11.3% among people in large cars when 20% more of them buckle their seatbelts.
5. Multiply the percentage of decrease in injuries and deaths in large cars (11.3%) by the number of injuries and deaths in large cars (in our example, 125): 11.3% x 125 = 0.113 x 125 = 14.125. This calculation shows that about 14 fewer people will die or be seriously injured as a result of a 20% increase in seatbelt use by people traveling in large cars.
6. Repeat the same series of calculations for people traveling in small cars.
7. Add the numbers for large cars and for small cars to determine the total number of deaths and serious injuries prevented.

CONVERTING DATA ON BEHAVIOR CHANGE INTO DATA ON COST SAVINGS

To convert data on behavior change (e.g., increased seatbelt use) into estimates of financial savings per dollar spent on your program, you can do the same set of calculations as those used to convert data on behavior change into estimates of changes in morbidity and mortality (page 64).
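Steps 1 through 7, together with the cost-savings conversion they feed into, can be sketched as follows, using Table 4's risk ratios and the example's casualty counts; the cost-per-casualty and program-cost figures are hypothetical, not from this manual:

```python
def casualties_prevented(rr_unbelted, rr_belted, belt_use_increase, casualties):
    """Steps 1-5: fraction of risk removed by buckling up, scaled by the
    increase in belt use and by the prior year's casualty count."""
    risk_reduction = (rr_unbelted - rr_belted) / rr_unbelted   # steps 1-3
    return risk_reduction * belt_use_increase * casualties     # steps 4-5

large = casualties_prevented(2.3, 1.0, 0.20, 125)   # ≈ 14.1
small = casualties_prevented(5.0, 2.1, 0.30, 500)   # = 87.0
total = large + small                               # steps 6-7: ≈ 101

# Cost-savings conversion: multiply the casualties prevented by the average
# cost per death or serious injury, then divide by the total program cost.
cost_per_casualty = 50_000.0   # hypothetical figure
program_cost = 200_000.0       # hypothetical figure
savings_per_dollar = total * cost_per_casualty / program_cost
print(round(total), round(savings_per_dollar, 2))
```

With these hypothetical cost figures, every program dollar would return about $25 in avoided injury costs; the structure of the calculation, not the dollar amounts, is the point.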
Then multiply the number of deaths and injuries prevented by the cost associated with those deaths and injuries, and divide by the total cost of the program. For example, if your program to increase seatbelt use produces an estimate that it saved 14 lives during the previous year, multiply 14 by the average cost-per-person associated with a death due to injuries sustained in a car crash, then divide the result by the total cost of the program.

SUMMARY OF QUANTITATIVE METHODS

Quantitative methods of evaluation allow you to express the results of your activities or program in numbers. Such results can be used to draw conclusions about the effectiveness of the program's materials, plans, activities, and target population. Table 5 lists the quantitative methods we have discussed in this chapter and the purpose of each one.
Table 5. Quantitative Methods Used in Evaluation

Method                                                               | Purpose
Counting systems                                                     | Count events, services, or behaviors (e.g., the number of people wearing seatbelts) before and after a program
Surveys                                                              | Measure the knowledge, attitudes, beliefs, behaviors, or injury rates of the target population
Experimental studies                                                 | Measure a program's effect while minimizing the influence of outside events
Quasi-experimental studies                                           | Measure a program's effect, with reduced influence of outside events, when randomization is not possible
Converting data on behavior change into data on morbidity and mortality | Estimate the decrease in deaths and injuries produced by a change in behavior
Converting data on behavior change into data on cost savings         | Estimate the financial savings per dollar spent on the program
This page last reviewed April 1, 2005. Centers for Disease Control and Prevention