
Chapter 5:
Prevention and Intervention

Promoting Healthy, Nonviolent Children

Methods of Identifying Best Practices

Scientific Standards for Determining Program Effectiveness

Strategies and Programs: Model, Promising, and Does Not Work

Cost-Effectiveness

Conclusions

Going to Scale

References

Appendix 5-A: Consistency of Best Practices Evaluations

Appendix 5-B: Descriptions of Specific Programs That Meet Standards for Model and Promising Categories

Model Programs: Level 1 (Violence Prevention)

Model Programs: Level 2 (Risk Prevention)

Promising Programs: Level 1 (Violence Prevention)

Promising Programs: Level 2 (Risk Prevention)



SCIENTIFIC STANDARDS FOR DETERMINING PROGRAM EFFECTIVENESS

The scientific community agrees on three standards for evaluating effectiveness: rigorous experimental design, evidence of significant deterrent effects, and replication of these effects at multiple sites or in clinical trials. For example, the level of evidence required to establish the effects of an agent or intervention in Mental Health: A Report of the Surgeon General (1999) was demonstration of the effects in randomized, controlled experimental studies that had been replicated. The U.S. Food and Drug Administration requires the same level of evidence before approving a new drug for use in humans. Unfortunately, this level of evidence has not been routinely required by agencies that recommend or fund youth violence prevention programs, though some organizations and most researchers are calling for establishment of meaningful criteria for program effectiveness (Elliott, 1998; Mendel, 2000, p. 74). Most researchers want evaluations to meet one or more of these three scientific standards for assessing effectiveness.

Rigorous experimental design includes, at a minimum, random assignment to treatment and control groups (Andrews, 1994; Center for Substance Abuse Prevention, 2000; Chamberlain & Mihalic, 1998; Howell et al., 1995; Lipsey, 1992a; Lonigan et al., 1998). A less stringent, but acceptable, study design is quasi-experimental, in which equivalent treatment and comparison groups are established but participants are not assigned to the groups at random (Center for Substance Abuse Prevention, 2000; Howell et al., 1995; Lipsey, 1992b; Sherman et al., 1997; Tolan & Guerra, 1994).

Well-designed studies should also have low rates of participant attrition, adequate measurement, and appropriate analyses (Andrews, 1994; Center for Substance Abuse Prevention, 2000; Chamberlain & Mihalic, 1998). High attrition can undermine the equivalence of experimental and control groups. It can also signal problems in program implementation. Adequate measurement implies that the study measures, including the outcome measure, are reliable and valid indicators of the intended outcomes and that they are applied with quality, consistency, and appropriate timing (Tolan & Guerra, 1994).

In clinical trials, replication means conducting both efficacy and effectiveness trials (Lonigan et al., 1998). Efficacy trials test for benefits to participants in a controlled, experimental setting, and effectiveness trials test for benefits in a natural, applied setting. In practice, this distinction is often blurred, but the principle of independent replication at multiple sites is well established. Replication is an important element of program evaluation because it establishes that a program and its effects can be exported to new sites and implemented by new teams under different conditions. A program that is demonstrated to be effective at more than one site is likely to be effective at other sites as well.

Statistical significance is based on the level of confidence with which one can conclude that a difference between two or more groups (generally a treatment and a control group) results from the treatment delivered and not, for example, from the selection process or chance. A probability value of .05 is widely accepted as the threshold for statistical significance; a probability below this threshold (p < .05) indicates that, if the treatment actually had no effect, a difference at least this large would occur by chance less than 5 percent of the time.
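As a concrete illustration of this criterion, the short Python sketch below tests whether offending rates differ between a treatment group and a control group. The counts, and the choice of a chi-square test, are purely illustrative assumptions and are not drawn from any evaluation cited in this report.

```python
# Illustrative only: hypothetical counts for a treatment/control comparison.
from scipy.stats import chi2_contingency

# Rows: treatment group, control group
# Columns: youths with a violent offense, youths without
table = [[18, 182],   # treatment: 18 of 200 offended
         [35, 165]]   # control:   35 of 200 offended

chi2, p_value, dof, expected = chi2_contingency(table)

# A p-value below .05 would let an evaluator conclude, at the conventional
# level of confidence, that the difference between the groups is unlikely
# to be due to chance alone.
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the .05 level.")
else:
    print("Difference is not statistically significant.")
```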

High-quality evaluations of youth violence prevention programs should be designed to demonstrate with this degree of confidence that a program is reducing the onset or prevalence of violent behavior or individual rates of offending (Andrews, 1994; Tolan & Guerra, 1994). Since serious delinquency is strongly related to violence, reductions in serious criminal behavior (or index crimes) are also considered to be acceptable outcome measures for identifying effective violence prevention programs (Andrews, 1994; Elliott, 1998; Lipsey, 1992a, 1992b). However, direct scientific evidence of a deterrent effect on violent behavior is certainly preferable.

Prevention programs are designed to prevent or reduce violent behaviors by acting on risk and protective factors. Reducing risk is a less stringent standard than reducing violence itself, but it clearly holds promise for preventing violence. Thus, significant changes in risk factors for violence are acceptable indications of program effectiveness (Gottfredson, 1997; Gottfredson et al., in press; Howell et al., 1995; Sherman et al., 1997). In addition, because most violence begins in adolescence, childhood interventions are concerned primarily with risk reduction.
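The effect-size thresholds that appear in the summary of standards later in this section (.30 and .10) refer to standardized measures of the difference between treatment and control groups on a risk-factor outcome. The sketch below shows how one common such measure, Cohen's d (a standardized mean difference), is computed; the report does not prescribe a particular statistic, so the choice of Cohen's d and all numbers here are assumptions for illustration.

```python
# Illustrative computation of a standardized effect size (Cohen's d)
# for a risk-factor outcome; all numbers are hypothetical.
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
                          / (n_treat + n_ctrl - 2))
    return (mean_ctrl - mean_treat) / pooled_sd

# Hypothetical aggression-scale scores (lower scores after treatment are better):
d = cohens_d(mean_treat=12.0, mean_ctrl=15.0, sd_treat=9.5, sd_ctrl=10.5,
             n_treat=150, n_ctrl=150)
print(f"d = {d:.2f}")  # ~0.30: would meet the Level 2 "large effect" threshold
```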

A less widely accepted but nevertheless important standard for demonstrating effectiveness is long-term sustainability of effects (Elliott & Tolan, 1999). Although this criterion may not be required to establish effectiveness in other disciplines, it is very important in evaluating violence prevention programs because beneficial effects can diminish quickly after youths leave a treatment setting or program to return to their usual environment.

Effective programs produce long-term changes in individual competencies, environmental conditions, and patterns of behavior. Thus, successful programs get youths off a violent life course trajectory. Demonstrating sustained effects is particularly difficult for early intervention programs, which can be implemented more than a decade before the peak age of onset for youth violence. Ideally, effects would be sustained through adolescence. On a practical level, programs in this report are considered to have demonstrated sustainability if the effects of the intervention continue for at least a year after treatment or participation in the designed intervention, with no evidence of a subsequent loss of effect (Elliott & Tolan, 1999).

Higher standards should be set for programs that are promoted and disseminated on a national level than for those being developed and implemented on a more restricted basis at the local level. Before a program is recommended and funded for national implementation, it is important to show clearly that it has a significant, sustained preventive or deterrent effect and that it can be expected to have positive results in a wide range of community settings (as long as it is implemented correctly and with the appropriate population). Programs that meet such high standards are designated Model programs. Those that do not quite meet these rigorous standards are recognized and encouraged as Promising, with the caution that they be carefully evaluated.

Identifying ineffective programs is another element of assessing best practices. It is as important to know which programs do not work—and should not be supported with limited prevention funds—as it is to know which do work. The same scientific standards are used in judging effectiveness and ineffectiveness. Because it is generally unlikely that a high-quality evaluation will be conducted on a program that shows little sign of effectiveness, only two specific programs have been designated Does Not Work in this report.

Some general strategies identified as ineffective in this report may not actually be flawed; rather, their lack of effectiveness may result from poor program implementation or a poor match between program and target population. Alternatively, some approaches may appear ineffective when used in isolation because their effects are quite small and difficult to detect. These approaches should not be used alone, but they may be useful as components of more comprehensive strategies that have positive preventive effects. In other cases, however, a program or approach may be ineffective because the basic strategy is flawed—that is, the method or approach used to change the targeted risk or protective factors does not have the intended effect.

The following is a summary of the scientific standards for establishing the effects of a violence prevention program.

Model

  • Rigorous experimental design (experimental or quasi-experimental)
  • Significant deterrent effects on:
    • Violence or serious delinquency (Level 1)
    • Any risk factor for violence with a large effect size (.30 or greater) (Level 2)
  • Replication with demonstrated effects
  • Sustainability of effects

Promising

  • Rigorous experimental design (experimental or quasi-experimental)
  • Significant deterrent effects on:
    • Violence or serious delinquency (Level 1)
    • Any risk factor for violence with an effect size of .10 or greater (Level 2)
  • Either replication or sustainability of effects

Does Not Work

  • Rigorous experimental design (experimental or quasi-experimental)
  • Significant evidence of null or negative effects on violence or known risk factors for violence
  • Replication, with the preponderance of evidence suggesting that the program is ineffective or harmful
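To make the decision logic explicit, the sketch below expresses these criteria as a simple rule in Python. It is an illustration only, not a tool used in this report: the Evaluation record, its field names, and the fallback "Insufficient evidence" label are assumptions introduced here, while the effect-size thresholds (.30 and .10) and the replication and sustainability requirements come from the summary above.

```python
# Schematic sketch of the classification criteria summarized above.
# The Evaluation fields and this function are hypothetical illustrations;
# only the thresholds (.30, .10) and the decision logic come from the summary.
from dataclasses import dataclass

@dataclass
class Evaluation:
    rigorous_design: bool          # experimental or quasi-experimental design
    reduces_violence: bool         # significant effect on violence/serious delinquency (Level 1)
    risk_effect_size: float        # effect size on a risk factor for violence (Level 2)
    replicated: bool               # effects demonstrated at more than one site
    sustained: bool                # effects persist at least a year post-treatment
    evidence_of_no_effect: bool    # significant null or negative findings

def classify(e: Evaluation) -> str:
    """Apply the Model / Promising / Does Not Work criteria."""
    if not e.rigorous_design:
        return "Insufficient evidence"  # cannot be rated without a rigorous design
    if e.evidence_of_no_effect and e.replicated:
        return "Does Not Work"
    # Model: deterrent effect (violence, or a risk factor with effect size >= .30),
    # plus both replication and sustainability.
    if (e.reduces_violence or e.risk_effect_size >= 0.30) and e.replicated and e.sustained:
        return "Model"
    # Promising: deterrent effect (violence, or a risk factor with effect size >= .10),
    # plus either replication or sustainability.
    if (e.reduces_violence or e.risk_effect_size >= 0.10) and (e.replicated or e.sustained):
        return "Promising"
    return "Insufficient evidence"

# Example: a replicated, sustained effect on a risk factor (effect size .35)
print(classify(Evaluation(True, False, 0.35, True, True, False)))  # -> "Model"
```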

Other standards have been proposed for youth violence prevention programs, particularly those intended for implementation on a national level. One of these is cost-effectiveness, a key consideration in program funding but not a scientific criterion for effectiveness. Unfortunately, there are no standardized cost criteria for violence prevention programs, so it is difficult to compare costs across programs (Elliott, 1998). Moreover, it is difficult to obtain reliable cost-benefit estimates for individual programs. Despite these obstacles, some researchers have conducted extensive reviews of the costs and benefits of violence and delinquency prevention and intervention programs (Greenwood, 1995; Greenwood et al., 1998; Karoly et al., 1998; Washington State Institute for Public Policy, 1999). Their findings will be discussed in the cost-effectiveness section of this chapter. This is an important and growing area of research.

Setting such stringent scientific standards automatically limits the number and types of programs that will be identified as effective in this report. The specific programs that can meet these standards will be determined in part by the nature of the program—the design must lend itself to scientific evaluation—and in part by whether funding has been made available for program evaluation. For instance, early childhood individual change programs are overrepresented in the list of effective programs. This fact is probably a result of the relatively large amount of funding allocated to the study of these programs and the relative ease with which experimental evaluations can be carried out. On the other hand, programs promoting change in the social structure, community-level programs, and programs that focus on environmental change more generally (in schools, neighborhoods, peer groups, and so on) are probably underrepresented in this report. Evaluation of such programs and strategies is more difficult and costly; therefore, fewer rigorous evaluations of these programs have been done.

Because of these limitations, the programs discussed in this report may not represent the overall balance of youth violence prevention programs currently being implemented in communities throughout the country. This shortcoming highlights the need for more research on program effectiveness and for the development of additional criteria and valid measures for assessing the effects of community- or school-based and environmental change programs. In addition, the imbalance should not be interpreted as an indication that such programs are less effective than programs that focus on individual change. Indeed, there is some evidence that school-based programs designed to change the social climate of the classroom or school are more effective than individual change programs (Gottfredson et al., in press).

