To determine whether a program caused a particular outcome, a study’s research design
must be able to rule out alternative explanations. For example, an employment program
for low-income fathers may measure employment levels before and after program participation,
but changes in employment between the two points in time may be caused by factors other than the program.
Fathers who are motivated to attend the program may also be motivated to seek out jobs,
so their employment levels might increase over time regardless of program participation.
To measure the true effects of the program, we must also estimate the “counterfactual”—that is,
what would have happened in the absence of the program.
In the SFER, only studies that used
a comparison group whose characteristics were initially similar to those of the treatment group
are considered credible impact studies. The outcomes of the comparison group represent the counterfactual.
In the example above, the comparison group would be a group of similar fathers who did not participate in the program.
These fathers could be followed over the same period of time and used to establish what the program
participants’ outcomes would have been without the program. The differences at follow-up between this group
(those who did not participate in the program) and the treatment group (those who did) likely reflect
the effects of the program on employment, rather than the effects of other factors.
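At its core, the impact estimate described above is a difference in mean follow-up outcomes between the two groups. A minimal sketch, using invented data for illustration only:

```python
# Hypothetical illustration: estimating a program's impact as the
# difference in mean follow-up employment rates between the treatment
# and comparison groups. All numbers below are invented.
treatment_employed = [1, 0, 1, 1, 1, 0, 1, 1]   # 1 = employed at follow-up
comparison_employed = [1, 0, 0, 1, 0, 0, 1, 0]

def mean(xs):
    return sum(xs) / len(xs)

# The comparison group's mean stands in for the counterfactual.
impact_estimate = mean(treatment_employed) - mean(comparison_employed)
print(f"Estimated impact: {impact_estimate:.3f}")  # → Estimated impact: 0.375
```

This simple difference is a credible impact estimate only when the comparison group truly represents the counterfactual, which is the concern the following paragraphs address.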
Not all comparison groups provide equally plausible counterfactual comparisons, and this review
does not designate all studies with a comparison group as credible impact studies. In some cases,
studies use comparison groups that differ in important ways from the treatment groups. For example,
if a comparison group includes fathers with a lower level of educational attainment than those in the treatment group,
the fathers in the comparison group may have poorer employment prospects regardless of the program. In this case,
the comparison group is not a good representation of the counterfactual because the treatment-group fathers
and comparison-group fathers are different before the program begins.
A study design that randomly assigns participants to treatment or comparison groups is one of the best designs
for establishing causality. In a randomized controlled trial, fathers are assigned by chance to one of the two groups.
The key advantage of this design is that fathers in the treatment and comparison groups are similar, on average,
in all initial characteristics, whether they are measured (such as education or employment history) or unmeasured
(such as intrinsic motivation to get a job). If the treatment and comparison groups are very similar
at the beginning of the study, the comparison group will be an excellent representation of the counterfactual.
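A quick simulation illustrates why random assignment works: splitting a large sample at random tends to balance both measured and unmeasured baseline characteristics across the two groups. The traits and values below are invented for illustration:

```python
# Hypothetical simulation: random assignment balances measured and
# unmeasured baseline characteristics, on average. All values invented.
import random

random.seed(0)

# Each father has a measured trait (years of education) and an
# unmeasured one (motivation to find a job).
fathers = [
    {"education": random.gauss(12, 2), "motivation": random.random()}
    for _ in range(10_000)
]

# Assign by chance to treatment or comparison.
random.shuffle(fathers)
treatment, comparison = fathers[:5_000], fathers[5_000:]

def group_mean(group, key):
    return sum(f[key] for f in group) / len(group)

for key in ("education", "motivation"):
    gap = group_mean(treatment, key) - group_mean(comparison, key)
    print(f"{key}: baseline gap between groups = {gap:+.4f}")  # typically near zero
```

Note that the balance holds for motivation even though it was never used in the assignment; chance alone equalizes the groups on average, which is what makes the comparison group a strong counterfactual.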
To indicate the study’s quality for determining the effects of the program, we assigned a rating to every study
that includes participant outcomes. This rating reflects how much confidence readers should place
in the research design’s ability to determine whether the program, rather than other factors,
caused the reported outcomes. We took into account factors such as the use of a comparison group,
use of random assignment, and similarities between the treatment and comparison groups before the start of the program.
There are three rating categories: high, moderate, and low. (Studies that do not include participant outcomes
were unrated.) Only impact studies that used random assignment (randomized controlled trials) could receive a high rating.
Studies with a nonrandomly assigned comparison group (quasi-experimental design)
that was equivalent at baseline could receive a moderate rating.[1]
We assigned low ratings to studies that reported outcomes but did not use a comparison group
(such as pre/post designs), as well as to studies that had methodological problems.
See the table below for more details on the quality rating system.
[1] Regression discontinuity and single case designs also have strong internal (causal) validity, but we did not identify any relevant studies with these designs.
SUMMARY OF RATING CRITERIA

High rating

Randomized controlled trials:
- The sample was randomly assigned to at least two conditions (for example, treatment and comparison groups).
- The sample meets the What Works Clearinghouse (WWC)[a] standards for low levels of overall and differential attrition.
- The sample members were not reassigned after random assignment was conducted (for example, members assigned to the treatment group were not switched to the comparison group, or vice versa).
- There are no confounding factors. A confounding factor is present when one part of the design lines up exactly with either the treatment or the comparison group. An example would be a study in which all fathers in the treatment group are from one county and all fathers in the comparison group are from another county. In this case, we cannot distinguish between the effect of the program and the effect of county-related factors, such as access to other available services.
- The analysis includes statistical adjustments for selected measures (baseline measures of the outcomes, race/ethnicity, and socioeconomic status) if the treatment and comparison groups are not equivalent on these measures at baseline.

Quasi-experimental designs: Not applicable; these studies cannot receive a high rating because the sample was not randomly assigned.

Designs without a comparison group (such as pre/post designs): Not applicable; these studies cannot receive a high rating because there is no comparison group.

Moderate rating

Randomized controlled trials (either set of criteria):
- The sample members were not reassigned after random assignment was conducted.
- The sample meets the WWC standards for low levels of overall and differential attrition.
- There are no confounding factors.
- The treatment and comparison groups were not equivalent on selected baseline measures (baseline measures of the outcomes, race/ethnicity, or socioeconomic status), and the analysis does not include statistical adjustments.
OR
- The study has high rates of overall or differential attrition, or sample members were reassigned after random assignment was conducted.
- There are no confounding factors.
- There is baseline equivalence of the treatment and comparison groups on selected measures (baseline measures of the outcomes, race/ethnicity, and socioeconomic status).
- The analysis includes statistical adjustments for the selected measures.

Quasi-experimental designs:
- There are no confounding factors.
- There is baseline equivalence of the treatment and comparison groups on selected measures (baseline measures of the outcomes, race/ethnicity, and socioeconomic status).
- The analysis includes statistical adjustments for the selected measures.

Designs without a comparison group: Not applicable; these studies cannot receive a moderate rating because there is no comparison group.

Low rating
- A study received a low rating if it includes participant outcomes but does not meet the criteria for a high or moderate rating.

Unrated
- We did not rate studies that do not include participant outcomes.
[a] WWC is an initiative of the U.S. Department of Education’s Institute of Education Sciences, which reviews and evaluates education research. For more information, visit http://ies.ed.gov/ncee/wwc/.
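The rating criteria above can be read as a decision procedure. As a rough sketch only (the field names below are invented, and the review itself applies these criteria through structured expert assessment, not software), the logic might look like:

```python
# Hypothetical sketch of the rating logic summarized in the table above.
# The field names are invented for illustration.
def rate_study(study: dict) -> str:
    """Return 'high', 'moderate', 'low', or 'unrated' for a study record."""
    if not study["has_participant_outcomes"]:
        return "unrated"
    if not study["has_comparison_group"] or study["has_confounding_factors"]:
        return "low"   # e.g., pre/post designs, or a confounded comparison
    equiv_and_adjusted = (study["baseline_equivalent"]
                          and study["statistical_adjustments"])
    if study["randomly_assigned"]:
        clean = (study["low_attrition"]
                 and not study["reassigned_after_randomization"])
        if clean:
            # High if equivalence holds or the analysis adjusts for the
            # selected baseline measures; otherwise moderate.
            return ("high" if study["baseline_equivalent"]
                    or study["statistical_adjustments"] else "moderate")
        # Attrition or reassignment problems: moderate only with demonstrated
        # baseline equivalence plus statistical adjustments.
        return "moderate" if equiv_and_adjusted else "low"
    # Quasi-experimental designs top out at a moderate rating.
    return "moderate" if equiv_and_adjusted else "low"

example = {
    "has_participant_outcomes": True,
    "has_comparison_group": True,
    "has_confounding_factors": False,
    "randomly_assigned": True,
    "low_attrition": True,
    "reassigned_after_randomization": False,
    "baseline_equivalent": True,
    "statistical_adjustments": True,
}
print(rate_study(example))  # → high
```

For instance, flipping only `randomly_assigned` to False in the example record would yield a moderate rating, matching the quasi-experimental column of the table.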