What Works Clearinghouse


WWC Procedures and Standards Handbook
Version 2.0 – December 2008

Appendix E – Intervention Rating Scheme

The following heuristics are applied to the outcome variable(s) identified by the principal investigator (PI) as relevant to the review. The PI may choose to ignore some variables if they are judged sufficiently peripheral or nonrepresentative and to consider only the remaining ones. Similarly, if the PI judges that there is one core variable with all the others secondary or subsidiary, only that one may be considered.

A. Definitions and Defaults

  • Strong and weak designs. A strong design is one that Meets Evidence Standards, whereas a weak design is one that Meets Evidence Standards with Reservations.
     
  • Effect size. A single effect size or, in the case of multiple measures of the specified outcome, either (1) the mean effect size or (2) the effect size for each individual measure within the domain.
     
  • Substantively important. The smallest positive value at or above which the effect is deemed substantively important with relatively high confidence for the outcome domain at issue. Effect sizes at least this large will be taken as a qualified positive effect even though they may not reach statistical significance in a given study. The suggested default value is a student-level effect size greater than or equal to 0.25 (see note 17). The PI may set a different default if explicitly justified in terms of the nature of the intervention or the outcome domain.
     
  • Statistical significance. A finding of statistical significance using a two-tailed t-test with α = .05 for a single measure or mean effect within each domain.
     
  • Accounting for clustering. A t-test applied to the effect size (or mean effect size in cases of multiple measures of the outcome) that incorporates an adjustment for clustering. This procedure allows the reviewer to test the effect size directly when a misaligned analysis is reported (see Appendix D). The suggested default intra-class correlation (ICC) value is .20 for achievement outcomes and .10 for behavioral and attitudinal outcomes. The PI may set different defaults if explicitly justified in terms of the nature of the research circumstances or the outcome domain.
     
  • Accounting for multiple comparisons. When multiple hypothesis tests are performed within a domain, the Benjamini-Hochberg procedure may be used to correct for multiple comparisons and identify statistically significant effects for individual measures (see Appendix E).
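
As an illustration of the multiple-comparisons default, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure at α = .05. The function name and the list-of-p-values input are assumptions made for this example, and the handbook's own appendix on the correction may differ in detail.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which measures remain significant after the standard
    Benjamini-Hochberg step-up correction (illustrative sketch only)."""
    m = len(p_values)
    # Rank the p-values from smallest to largest, keeping original positions.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every measure ranked at or below the cutoff is treated as significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for four measures within one domain.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
# -> [True, False, False, False]
```

Because the procedure is step-up, a measure can be retained even when its own p-value exceeds its rank threshold, provided a higher-ranked measure passes.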

B. Characterizing Study Effects

Statistically significant positive effect if any of the following is true:

If the analysis as reported by the study author is properly aligned:

     For a single outcome measure:

  • The effect reported is positive and statistically significant.

     For multiple outcome measures:

  • Univariate statistical tests are reported for each outcome measure and at least half of the effects are positive and statistically significant and no effects are negative and statistically significant.
     
  • Univariate statistical tests are reported for each outcome measure and the effect for at least one measure within the domain is positive and statistically significant and no effects are negative and statistically significant, accounting for multiple comparisons.
     
  • The mean effect for the multiple measures of the outcome is positive and statistically significant.
     
  • The omnibus effect for all the outcome measures together is reported as positive and statistically significant on the basis of a multivariate statistical test.

If the analysis as reported by the study author is not properly aligned:

     For a single outcome measure:

  • The effect reported is positive and statistically significant, accounting for clustering.

     For multiple outcome measures:

  • Univariate statistical tests are reported for each outcome measure and the effect for at least one measure within the domain is positive and statistically significant and no effects are negative and statistically significant, accounting for clustering and multiple comparisons.
     
  • The mean effect for the multiple measures of the outcome is positive and statistically significant, accounting for clustering.
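
The univariate rules above for multiple outcome measures can be sketched as a small check. This is an illustrative reading, not the WWC's own implementation: the Measure record, its field names, and the function name are hypothetical, and the caller is assumed to have already applied the multiple-comparisons correction (and the clustering correction for a misaligned analysis) when setting sig_corrected. The mean-effect and omnibus criteria are separate tests and are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    effect_size: float    # student-level effect size for this measure
    sig_unadjusted: bool  # significant at alpha = .05 as reported by the author
    sig_corrected: bool   # significant after the applicable corrections


def significant_positive_multiple(measures, aligned):
    """Apply the univariate rules for multiple outcome measures (sketch)."""
    if not measures:
        return False

    pos_raw = sum(1 for m in measures if m.effect_size > 0 and m.sig_unadjusted)
    neg_raw = sum(1 for m in measures if m.effect_size < 0 and m.sig_unadjusted)
    pos_cor = sum(1 for m in measures if m.effect_size > 0 and m.sig_corrected)
    neg_cor = sum(1 for m in measures if m.effect_size < 0 and m.sig_corrected)

    # Rule available only for aligned analyses: at least half of the effects
    # are positive and significant, and none is negative and significant.
    at_least_half = aligned and 2 * pos_raw >= len(measures) and neg_raw == 0

    # Rule available in both cases: at least one positive significant effect
    # and no negative significant effect after correction. Assumption: the
    # "no negative" condition is also evaluated after correction.
    corrected_rule = pos_cor >= 1 and neg_cor == 0

    return at_least_half or corrected_rule


# Hypothetical domain with four measures, two significant before correction.
domain = [
    Measure(0.30, True, False),
    Measure(0.10, True, False),
    Measure(0.20, False, False),
    Measure(-0.05, False, False),
]
print(significant_positive_multiple(domain, aligned=True))   # -> True
print(significant_positive_multiple(domain, aligned=False))  # -> False
```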

Substantively important positive effect if the single or mean effect is not statistically significant, as just described, and either of the following is true:

     For a single outcome measure:

  • The effect size reported is positive and substantively important.

     For multiple outcome measures:

  • The mean effect size reported is positive and substantively important.

Indeterminate effect if the single or mean effect is neither statistically significant nor substantively important, as described earlier.

Substantively important negative effect if the single or mean effect is not statistically significant, as described earlier, and either of the following is true:

     For a single outcome measure:

  • The effect size reported is negative and substantively important.

     For multiple outcome measures:

  • The mean effect size reported is negative and substantively important.

Statistically significant negative effect if no statistically significant or substantively important positive effect has been detected and any of the following is true:

If the analysis as reported by the study author is properly aligned:

     For a single outcome measure:

  • The effect reported is negative and statistically significant.

     For multiple outcome measures:

  • Univariate statistical tests are reported for each outcome measure and at least half of the effects are negative and statistically significant.
     
  • Univariate statistical tests are reported for each outcome measure and the effect for at least one measure within the domain is negative and statistically significant, accounting for multiple comparisons.
     
  • The mean effect for the multiple measures of the outcome is negative and statistically significant.
     
  • The omnibus effect for all the outcome measures together is reported as negative and statistically significant on the basis of a multivariate statistical test.

If the analysis as reported by the study author is not properly aligned:

     For a single outcome measure:

  • The effect reported is negative and statistically significant, accounting for clustering.

     For multiple outcome measures:

  • Univariate statistical tests are reported for each outcome measure and the effect for at least one measure within the domain is negative and statistically significant, accounting for clustering and multiple comparisons.
     
  • The mean effect for the multiple measures of the outcome is negative and statistically significant, accounting for clustering.
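
Taken together, the five categories in this section form a decision sequence, sketched below under stated assumptions. The function name, argument names, and returned labels are hypothetical. The sketch assumes the significance flags have already been determined with the criteria above, that the 0.25 default applies symmetrically to negative effect sizes, and that a statistically significant negative finding is checked before a substantively important negative one.

```python
def characterize_effect(effect_size, sig_positive, sig_negative, threshold=0.25):
    """Characterize a study's effect in one outcome domain (sketch).

    effect_size  -- single or mean student-level effect size
    sig_positive -- True if any statistically significant positive criterion holds
    sig_negative -- True if any statistically significant negative criterion holds
    threshold    -- substantively important cutoff (suggested default 0.25)
    """
    if sig_positive:
        return "statistically significant positive"
    if effect_size >= threshold:
        return "substantively important positive"
    # Reached only when no significant or substantively important
    # positive effect has been detected, as the text requires.
    if sig_negative:
        return "statistically significant negative"
    if effect_size <= -threshold:
        return "substantively important negative"
    return "indeterminate"


print(characterize_effect(0.30, False, False))  # -> substantively important positive
print(characterize_effect(0.10, False, False))  # -> indeterminate
print(characterize_effect(-0.30, False, True))  # -> statistically significant negative
```

A PI who sets a different substantively important value for a domain would pass it as threshold rather than rely on the default.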

17 Note that this criterion is entirely based on student-level effect sizes. Cluster-level effect sizes are ignored for the purpose of the rating scheme because they are based on a different effect size metric than the student-level effect sizes and, therefore, are not comparable to student-level effect sizes. Moreover, cluster-level effect sizes are relatively rare, and there is not enough knowledge in the field yet to set a defensible minimum effect size for cluster-level effect sizes.
