
National Evaluation of The Even Start Family Literacy Program: 1998

Chapter 9

Insights into Evaluation of the Impact of the Even Start Program

At one level, the modifications to the Even Start program, and the corresponding refinements in the ESIS data collection forms, illustrate the value of maintaining clear connections between the program under review and the methods employed for that review. Yet the outcome or impact portion of the evaluation did not change over the course of the program's second four years. Several key issues have surfaced: identifying appropriate educational or other progress indicators; ensuring consistency and quality of data collection; administering tests; using comparison groups; and matching data collection to actual participation patterns. Each of these is discussed below.

Identifying Appropriate Outcomes

Not surprisingly, some of the changes in the participant population have had consequences for the second national evaluation. When decisions were made over five years ago about the appropriate educational and developmental measures to include in the current evaluation, for example, the proportion of Hispanic participants was considerably lower than it is now, in 1998. Because the Even Start program is designed to improve parents' literacy in English, the evaluation focused on assessments of adults' progress in English. Over the past several years, however, the steady increase in the number of Hispanic participants has translated into an increasing number of adults whose progress has not been assessed in any language. Additionally, there is growing consensus in the field of second language acquisition that facility in a second language requires some minimal proficiency in a first language (August and Hakuta, 1997). While measures of functional literacy (such as the CASAS) can assess progress for some low-literate adults, the progress of families with very limited educational experience has not been adequately assessed.

Given what we know about patterns of participation, what are the appropriate outcomes? Progress measured on standardized assessments such as the TABE may be inappropriate for adults who enter the program having completed fewer than six years of formal schooling, or for whom attaining a GED represents a multi-year process.

Ensuring Consistency and Quality of Data Collection

In retrospect, the strategies used to collect outcome data were not as effective as planned. Sample Study staff were trained in test administration and scoring only once. It is now clear that personnel changes at the project level resulted in inconsistently trained Sample Study staff and, consequently, in inconsistent data quality. Additionally, there was an assumption that all Even Start projects would attend annual program and evaluation conferences that would include sessions devoted specifically to evaluation. In the first national evaluation, such annual evaluation-focused conferences provided feedback to all projects about the data they were submitting as part of the national evaluation, and also provided introductory or refresher training in test administration and data entry to program staff as necessary. The absence of such conferences has had at least two consequences: first, Sample Study project staff have not been consistently trained in either test administration or data entry; and second, neither Sample Study nor other projects have participated in program-wide conversations about the use of evaluation data. Both of these consequences have obvious implications for evaluation. In order for the national evaluation to examine the Even Start program, the relationship between data quality and credible findings must be clearly understood at all levels, from the state to the local project staff responsible for recording and submitting project- and participant-level data. The evaluation began to provide local projects with summary data from their own projects and their own states (as well as national-level data) in order to make such comparisons useful at the local level.

Test Administration

The selection of instruments reflected a concern that tests be relatively easy to administer and score for people with varied experience in testing. The Sample Study relied upon local project staff to administer tests and to record test scores. Although all Sample Study projects sent staff to training in the fall of 1994, those staff ranged from inexperienced to expert in their comfort with test administration. Since then, there has been turnover at the project level, and the recorded scores from test administrations have reflected a lack of experience in scoring tests correctly. It is clear that if project staff are responsible for administering and scoring tests, the instruments should be easy to use, the use of local testing data at the national level should be understood, and project staff should be required to attend regular training whenever local project personnel change.

Use of Comparison Groups

The Sample Study did not use a program-comparison group design, and as a result, changes in test scores for Even Start participants cannot be compared with those of non-participants. For some measures, we have been able to fall back upon findings from the In-Depth Study component of the first national evaluation, but because the measures have changed, such reference points have not consistently been available. The Sample Study design would have been much stronger had there been a comparison group against which to assess the progress of Even Start participants.

Matching the Data Collection Schedule to Actual Participation Patterns

The Sample Study measurement design assumed that data could be collected over two program years, in three waves: the first at intake, the second sometime later during the same program year (assuming entry in the fall months), and the third sometime during the second year of participation. The decision to collect data on this schedule reflected, among other factors, the experience of the first national evaluation, when evaluation contractor staff themselves located and visited the participating families in the In-Depth Study to administer assessments (even when families were no longer active Even Start participants). In the Sample Study, however, most families remained actively involved in Even Start for far less than two program years, and only approximately 10 percent of Sample Study participants were available for a third wave of data collection. Tying data collection to actual patterns of participation is critical.

-###-


