What Works Clearinghouse


Effect Size Substantive Interpretation Guidelines:
Issues in the Interpretation of Effect Sizes

Jeff Valentine and Harris Cooper

Attempts to Characterize the Strength of Standardized Measures of Effect Size

There are several ways to characterize the strength of the standardized measures of effect size. Below we discuss the strengths and limitations of three of them.

Proportion of Variance Explained

One well-known method of characterizing the strength of a relationship involves calculating the proportion of variance explained. This is what happens when a researcher writes "The correlation between self-concept and achievement is r = +.30. Thus, self-concept accounts for 9% of the variance in achievement." The researcher arrived at the 9% number by squaring the correlation coefficient (.30 in the example). Squaring the correlation coefficient puts the estimate of the relationship in the context of the total variance in the outcome measure.
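
The arithmetic is simply squaring the correlation. A minimal sketch in Python, using the correlation value from the example above, makes this concrete:

    r = 0.30                     # correlation between self-concept and achievement
    variance_explained = r ** 2  # 0.09, i.e., 9% of the variance in achievement
    print(f"r = {r:.2f} accounts for {variance_explained:.0%} of the variance")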

There are problems with using this metric as a measure of the strength of an intervention's impact. Most importantly, the proportion of variance explained often seems low, and this can lead even experienced researchers to label efficacious interventions as ineffective (Rosenthal, 1984). We presume that more general audiences, such as policymakers and the public, will experience these same problems to an even greater degree.

In addition, proportion of variance explained can be applied, in a misleading fashion, to comparisons involving more than two groups. For example, one study might compare reading intervention A, reading intervention B, and a control intervention. A second study might compare reading intervention B, reading intervention C, and a control intervention. A proportion of variance explained estimate can be derived from both studies. However, comparing or combining these estimates would be misleading, because neither statistic reflects a single comparison between one reading intervention and a control group, so it is impossible to know which interventions differed from which.

Cohen’s Benchmarks

Cohen (1988) attempted to address the issue of interpreting effect size estimates relative to other effect sizes. He suggested some general definitions for small, medium, and large effect sizes in the social sciences. However, Cohen chose these quantities to reflect the typical effect sizes encountered in the behavioral sciences as a whole—he warned against using his labels to interpret relationship magnitudes within particular social science disciplines or topic areas. His general labels, however, illustrate how to go about interpreting relative effects.

Cohen labeled an effect size small if d = .20 or r = .10. He wrote, "Many effects sought in personality, social, and clinical-psychological research are likely to be small . . . because of the attenuation in validity of the measures employed and the subtlety of the issue frequently involved" (p. 13). Large effects, according to Cohen, are frequently "at issue in such fields as sociology, economics, and experimental and physiological psychology, fields characterized by the study of potent variables or the presence of good experimental control or both" (p. 13). Cohen suggested large magnitudes of effect were d = .80 or r = .50. Medium-sized effects were placed between these two extremes, that is, d = .50 or r = .30.
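
A short, illustrative sketch of these benchmarks in Python (the cutoffs are Cohen's general values for d; the function name is ours, and the caution in the next paragraph about applying the labels within particular disciplines still applies):

    def cohens_label(d: float) -> str:
        """Return Cohen's (1988) general label for a standardized mean difference d.

        The thresholds (.20, .50, .80) are Cohen's broad benchmarks for the
        behavioral sciences as a whole, not discipline-specific standards.
        """
        d = abs(d)
        if d < 0.20:
            return "smaller than Cohen's 'small'"
        elif d < 0.50:
            return "small"
        elif d < 0.80:
            return "medium"
        return "large"

    print(cohens_label(0.35))  # prints: small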

The discussion above implies a caution against using Cohen's benchmarks as generic descriptors of effect size magnitude. Because some areas, like education, are likely to have smaller effect sizes than others, applying Cohen's labels may be misleading.

Proportion of Distribution Overlap

Cohen (1988) proposed another method for characterizing effect sizes by expressing them in terms of distribution overlap, called U3. This statistic describes the percentage of scores in the lower-meaned group that are exceeded by the average score in the higher-meaned group. As an example, assume that high school students who do homework outperform students who do not do homework, and that the effect size is d = +.20. For this effect size, U3 is approximately equal to 58. In this example, it would mean that if an average student left a high school in which all students did homework and moved to a high school in which no students did homework, then that student would move from the 50th percentile to the 58th percentile in achievement.
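
Under the usual assumptions behind U3 (normally distributed scores with equal variances in the two groups), the statistic is the standard normal cumulative distribution function evaluated at d. A minimal sketch in Python (the function name is ours):

    from statistics import NormalDist

    def u3(d: float) -> float:
        """Cohen's U3: the percentage of the lower-meaned group's scores that
        fall below the mean of the higher-meaned group, assuming normal
        distributions with equal variances."""
        return NormalDist().cdf(d) * 100

    print(f"U3 for d = +0.20: {u3(0.20):.1f}%")  # about 58%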
