I. Introduction
Monitoring and evaluation are critical components of the U.S.
Department of Agriculture’s (USDA) pilot Global Food for Education (GFE)
program. In response to the need to monitor and evaluate implementation by
private voluntary organizations (PVO’s), USDA’s Foreign Agricultural Service
(FAS) asked its International Cooperation and Development (ICD) program area to
hire qualified staff to manage and design a program to effectively accomplish
this task. For statistical technical assistance with sampling and analysis, ICD
asked USDA’s National Agricultural Statistics Service (NASS) to help design a
plan to adequately fulfill this requirement.
The GFE plan (designed by NASS) relies heavily on survey
design and statistical sampling to accomplish its objectives effectively within
the limited resources. The general approach is to identify what estimate(s) of
some characteristic(s) of the target population are required. For GFE, the
strategy is to define the objective and design a methodology that will
efficiently monitor the program performance by the PVO’s, and collect
appropriate data for evaluation purposes to quantify program effectiveness.
To best accomplish the design objectives, the target
population of interest is limited to the schools selected by the PVO’s
participating in the GFE feeding program. This is the target population about
which all inferences will be made. Considering that GFE is a pilot
program, non-GFE schools can be excluded as they represent a nonparticipating
sub-set of the total population of schools.
FAS’ Export Credits program area is responsible for
administering GFE through a series of agreements in partnership with a PVO. In
some instances, a country may have several GFE programs, each with a separate
and independent PVO. Due to the different and difficult nature associated with
each participating country, each FAS agreement is unique.
The primary objective of a survey design is to specify the
methodology used for making inferences about the target population. For GFE, the
task is to specify a methodology that meets the monitoring and evaluation
needs only. The first basic requirement is to treat each PVO project as a
separate and unique population domain (entity) for sampling, analysis, and the
required inferences about population characteristics. By definition, a pilot program is usually
reduced in scope and scale. Lessons learned from the pilot will be applied to
subsequent program expansions for improved program execution. In the case of the
pilot GFE, the program execution is a dynamic situation with a steep learning
curve. Each country/PVO agreement varies considerably, which naturally tends to
increase management needs for USDA administration. In some cases, the original
agreement required amending due to unforeseen conditions and/or circumstances.
Taking all factors into consideration and based on the pilot nature of GFE, the
design for monitoring and evaluation of each country/PVO project is best handled
as a separate case study.
Consistent with the nature of any pilot program, the
monitoring and evaluation component will be streamlined for each case study
based on budgetary constraints as well as on the limited number of trained and
experienced in-country field personnel and resources. Each case study requires
defining a subset of the target population, the case study domain, which
consists of the schools participating in that country/PVO feeding program. The
number of schools in this domain is the population size (Ni) for the i-th
agreement. Due to the limited monitoring and evaluation budget for each case
study, the number of sample schools is limited to about 20 and will be referred
to as the sample size (ni) for the i-th
country/PVO project.
For the case study’s small sample to be more statistically
representative of the target population, the sample methodology design needs to
specify selection of a sample of schools using a purposeful sample technique.
This is intended to limit bias and ensure that the small number of schools selected in the
sample is sufficiently representative to allow the necessary inferences for
measurement of program effectiveness. If and when GFE expands beyond the pilot
stage, it will be important to allocate sufficient resources to an adequate
number of sample schools to establish an efficient monitoring and evaluation
program. With adequate resources supporting a fully operational GFE, the
requirements and specifications for survey design will need to be more demanding
to achieve an acceptable level of statistical confidence and precision. The
results of the GFE pilot will be necessary and useful for determining a future
adequate sample size for an operational program.
II. Sampling
Purposeful sampling becomes a powerful tool for the
statistician and investigator when dealing with typically small sample sizes
associated with case studies. In the context of GFE methodology in conjunction
with a stratification matrix strategy, a purposeful sample is basically a random
sample of schools that is more representative of the target population than a
totally random sample with a small sample size. To further increase the
efficiency of a purposeful sample, the stratification uses a matrix to
sub-divide the target population (Ni). The matrix dimensions are the
important target population characteristics, or "factors," chosen to control
the variability inherent in the factors affecting the school feeding programs
associated with GFE. The matrix approach facilitates selection of a purposeful
sample of schools (nijk) from each matrix cell, that is, from row j and
column k of the sub-divided target population (Ni).
The sum of nijk sampled schools equals the total number of
schools (ni...) selected for monitoring and evaluation.
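The relationship between cell counts and totals can be written compactly. The block below is a sketch that assumes the flattened symbols in this paper (Ni, Nijk, nijk) carry subscripts, and that the dots in the total denote summation over the row index j and the column index k; both readings are assumptions about the original typesetting.

```latex
N_{i} = \sum_{j}\sum_{k} N_{ijk},
\qquad
n_{i\cdot\cdot} = \sum_{j}\sum_{k} n_{ijk} \approx 20
```

Here the first identity follows from each school belonging to exactly one cell, and the value of about 20 is the sample size limit stated in the Introduction.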
The use of the matrix approach for GFE sampling methodology is
an effective mechanism to collect representative data objectively with a small
sample in order to measure the program’s effectiveness. Matrix factors are the
most important elements within each country/PVO project and have the potential
to contribute differences in program effectiveness. It is anticipated that the
matrices will differ considerably from project to project, even within a single
country, when PVO’s have uniquely different feeding programs in different
areas of the country. The matrix factors will be identified by the GFE regional
coordinators during the initial phase of their work as the program, field staff,
and participatory government/private agencies are fully defined.
Once the matrix factors are defined for each country/PVO
project, each ijk-th target population school is systematically assigned
to one and only one of the ijk-th cells of the matrix. The total count of
schools in each cell, Nijk, becomes the sub-target population
size. During the analysis phase, the sub-target population
count, Nijk, will be used as the model weight when calculating the
indicators used to measure program performance.
The sub-population count, Nijk, is used as
the basis for allocation of the purposeful sample, nijk,
within the matrix. Generally, the allocation will be proportional with a minimum
of two schools selected from each matrix cell. The sample school selection
process within each matrix cell will use systematic random sampling that
requires the schools in each cell to be ranked and arrayed by student population
size.
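The allocation rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the cell labels, the cell counts, and the one-school-at-a-time rounding rule are all assumptions, since the paper does not specify how fractional proportional shares are rounded.

```python
def allocate_sample(cell_counts, total_sample=20, min_per_cell=2):
    """Allocate a fixed sample across matrix cells.

    cell_counts maps a cell label to the number of schools in that cell.
    Every cell first receives the minimum (or all of its schools if it has
    fewer); the rest is handed out in proportion to cell size.
    """
    total_pop = sum(cell_counts.values())
    alloc = {c: min(min_per_cell, n) for c, n in cell_counts.items()}
    if sum(alloc.values()) > total_sample:
        raise ValueError("too many cells for the available sample size")

    def shortfall(cell):
        # Distance between the proportional share and the current allocation.
        return total_sample * cell_counts[cell] / total_pop - alloc[cell]

    # Give the remaining schools, one at a time, to the cell furthest below
    # its proportional share that still has unsampled schools.
    while sum(alloc.values()) < total_sample:
        open_cells = [c for c in cell_counts if alloc[c] < cell_counts[c]]
        if not open_cells:
            break  # every school in the population is already sampled
        alloc[max(open_cells, key=shortfall)] += 1
    return alloc


# Hypothetical project with four matrix cells and 100 schools in total.
example = allocate_sample({"A": 50, "B": 30, "C": 15, "D": 5})
print(example)
```

Because small cells are raised to the minimum of two, the larger cells can end up slightly below a strictly proportional share; that trade-off is inherent in combining a minimum with a fixed total.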
To determine whether the feeding programs achieved their
program goals, the GFE methodology will use three measurement criteria
(indicators): (1) enrollment, (2) attendance, and (3) performance. With
operational programs, normally the target population variability will dictate
the sample size necessary to achieve a certain precision of the estimate for the
desired indicators. Statistically, the inherent target population variability
can be controlled to a certain extent through stratification and classification
factors for placing schools into groupings that are more homogeneous within
groups than between groups or cells of the matrix.
When creating the matrix for each country/PVO project, the
regional coordinators need to consider logical school groupings based on
structure and environmental factors. Control of these factors is necessary
because of the impact they can potentially have on program performance and
success.
III. Background on WFP Sampling Methodology
The World Food Program (WFP) has prepared a paper that
describes the approach used for calculation of the sample size for its School
Feeding Baseline Survey. The WFP has based its survey design on a stratified
simple random sample approach and will sample a total of 3,700 schools in 23
countries, or roughly 161 sample schools per country. The actual country sample
sizes range from the smallest (60 schools) to the largest (388 schools). If one
makes the assumption that the issues facing WFP in these 23 countries are not
statistically different in school characteristics from those schools in
countries participating in GFE, then one would expect that comparable sample
sizes would be appropriate for GFE if USDA/FAS implements the same WFP survey
design.
The FAS plan developed for the GFE monitoring and evaluation
component is somewhat different from that implemented by WFP. Limited resources
require tailoring the GFE survey design to produce comparable results more
efficiently. The solution is to make each country and PVO a separate case study
using an appropriate, purposeful sample of schools stratified using a matrix of
factors to control the target population variability. In the case of WFP, it has
chosen to use two different independent samples in its design—one for the
Baseline Survey and a different sample of schools for its Follow-up Survey.
As with any start-up program, the WFP survey design sample
methodology paper discusses the possibility that it may be necessary to adjust
the Follow-up Survey sample size. As stated in its paper: "This can occur
for instance when the indicators observed in the baseline survey showed
different levels from those that were used when calculating the required sample
sizes prior to the baseline survey. This would mean that the sample size used in
the baseline survey would be too small to satisfy the precision requirements for
the evaluation effort if used for the follow-up survey." Based on the
proposed GFE case study design described in this paper, making sample size
adjustments is not relevant.
IV. The GFE Approach to Evaluation of PVO Projects
The GFE monitoring and evaluation approach in this pilot
program is limited by available resources. While the GFE methodology is
statistically sound and defensible, limited resources require adoption of a plan
using a small, purposeful sample size tailored to a case study design requiring
more stringent controls on sampling frame construction.
Rather than using the WFP’s survey design based on two
independent samples, the GFE case study approach requires that a simplified
repeat visitation for the Follow-up Survey be completed for each of the Baseline
Survey sample schools. The repeat-sample design approach eliminates inherent
survey variability in the indicators due to differences by chance alone
associated with the use of two independent samples of different schools used for
the Baseline and Follow-up Surveys.
Details for implementation of a case study design for GFE will
follow and build on the general discussion at the beginning of this paper.
Detailed instructions will be developed as additional information becomes
available for field supervision by the GFE regional coordinators. It is
important to keep in mind that references to the WFP survey design are being
used only as a basis for comparison, and such reference should not be considered
in any way as making the WFP design a standard of comparison.
V. GFE Methodology Guidelines for PVO Evaluation/Monitoring
There are basic guidelines that should be established for a
design of the GFE case study methodology and for determining the optimum
purposeful sample strategies. The following line items summarize the best
approach for examination and determination of each country’s critical design
factors; i.e., they assess the as-yet-unknown varying conditions,
infrastructure, and environmental issues.
- The WFP form template of questions is used as the basis for
developing the data collection form. For countries where specific data are
not applicable, the corresponding questions should be dropped from the form used in that
country. At a minimum, enrollment and attendance data will be collected.
- Each GFE/PVO country project is unique and should be
evaluated separately to determine the most efficient design and appropriate
sample size.
- If a PVO has collected "baseline data," this
information can be useful if identical information was obtained from each
participating school. This information, however, is not the GFE baseline
data needed for evaluation, which must be collected using questions derived
from the WFP form template. Even when the same information is requested,
asking for it with a slightly different question can produce a different
response. Thus, it is important that WFP and USDA use the same form template
and that the final questionnaire used in each country reproduce the questions
exactly as they appear on the form template. This is another reason that PVO
baseline data cannot be used as the basis for GFE baseline data.
- PVO baseline data could be useful for
"classifying" each of the participating schools in the GFE program
for sampling and estimation purposes. WFP classified each school as either a
"new" school or an "existing" school, the idea being
that existing school enrollment would have already increased from some lower
baseline prior to the school-feeding program. If the purpose is to entice
enrollment, it would be problematic to compare such a school with a "new" school that has
no prior feeding program. The WFP strategy is to summarize these two groups
separately so that the analysis of the new schools’ overall performance
will be most advantageously reflected in the report.
- If resources are available for only a very small sample of
program schools, the WFP suggested using more than the two classification
criteria described in the preceding item, because a small sample
will not provide the same level of precision that the WFP has targeted;
i.e., measure change with a precision of 10-20 percent at the 0.05
significance level (95 percent confidence). WFP suggested using a matrix approach with additional
classification criteria—pre-school, primary schools (using the official
government definition), and boarding schools. This approach requires
scrutinizing the PVO information on each school to determine the relevant
classification criteria. These would be used only if the information is
available on every school in the program. Since the agreement signed with
each PVO could have its own unique characteristics that could affect survey
design and sampling, this process is required for each GFE/PVO project.
- Sample sizes and sample selection procedures should be
determined on a country/project by country/project basis once the population
counts are determined for each cell (Ni) in a country’s
classification matrix.
VI. Detailed Discussion on GFE/PVO Project Sampling
Each country PVO project will have its own unique
characteristics that will require tailoring the sampling design and data
collection form to best accommodate the particular differences associated with
each country PVO project. The basic data that must be collected relate
specifically to the need to estimate the three measurement criteria
(indicators). The WFP template needs to be scrutinized to ensure that only data
that are needed are being collected and that the data collected will allow
accurate estimation of the measurement criteria (indicators).
There are two general approaches to sampling: random
selection, and purposeful selection. Generally, a random sample is used to make
inferences about population characteristics and estimates of population totals,
averages, ratios, etc. Purposeful samples are often used for expediency or to
provide a cost-efficient indication of certain population characteristics, but
will not produce unbiased estimates of population totals, averages, ratios, etc.
For the purposes of GFE/PVO evaluation, a purposeful sample would accommodate
the lower level of resources available for data collection, while providing a
valid measure of change for the first two desired indicators. This is true when
the survey design includes repeated sampling of identical observations to
measure any possible change in population level of the desired indicators.
Random sampling is commonly used because it produces
statistically sound population estimates. But a scientific basis does not
guarantee that any particular random sample will produce unbiased, accurate, and precise
estimates. One never knows whether an estimate from a random sample is accurate,
but one can calculate a confidence interval: a range of values that, at a
stated level of confidence, is expected to contain the true population value. Random
samples will generally be less efficient as the sample size decreases. The
advantage of a purposeful sample is that it will give a statistically defensible
estimate of percentage change when calculated using repeated sampling of matched
observations; i.e., repeat visits to identical schools. If one takes two random
samples at two different periods of time, the ability of results to measure the
true change in the population over the time period between surveys can be
problematic. While each survey will make an independent estimate of the
population characteristic, one does not know for sure whether any difference in
level between the survey estimates is a true population level change or a change
due to the difference in the different sample elements that compose each
independent sample. Revisiting the same schools removes this second source of
difference, which is why the repeat sample is applied together with
purposeful sampling under GFE.
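The comparison between independent and repeat samples can be expressed with a standard variance identity. The symbols are illustrative and do not appear in the original paper: the two terms being differenced are the baseline and follow-up estimates of an indicator.

```latex
\operatorname{Var}(\bar{y}_{2} - \bar{y}_{1})
  = \operatorname{Var}(\bar{y}_{1}) + \operatorname{Var}(\bar{y}_{2})
  - 2\,\operatorname{Cov}(\bar{y}_{1}, \bar{y}_{2})
```

With two independent samples the covariance term is zero, so the sampling variances of the two surveys simply add. When the same schools are revisited, the two estimates are usually positively correlated, and the covariance term reduces the variance of the estimated change.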
To make a small sample more effective,
it is essential to consider a strategy to stratify or classify the population (N)
into smaller and more homogeneous sub-populations using a matrix with X- and Y-axis
criteria to classify each school in the population (N) into one and
only one of the matrix cells. Such a survey design allows making valid
inferences about the percent change in the desired indicators at the
national level as long as the proper weights are applied to the estimates for
each cell in the matrix. The weights are calculated
using the number of schools in each cell.
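One possible reading of this weighting is sketched below in Python. It assumes that each cell's sample mean is expanded by the number of schools in the cell to estimate a total, and that percent change is computed from the weighted baseline and follow-up totals. The paper does not spell out the formula, so the function name, data layout, and figures are hypothetical.

```python
def weighted_percent_change(cells):
    """Estimate percent change in an indicator from matched school visits.

    cells is a list of dicts, one per matrix cell, each holding:
      "N"        - number of schools in the cell (the weight)
      "baseline" - indicator values for the sampled schools, first survey
      "followup" - values for the same schools, follow-up survey
    """
    base_total = 0.0
    follow_total = 0.0
    for cell in cells:
        n = len(cell["baseline"])
        if n == 0 or n != len(cell["followup"]):
            raise ValueError("each cell needs matched baseline/follow-up values")
        # Expand each cell's sample mean by the cell's population count.
        base_total += cell["N"] * sum(cell["baseline"]) / n
        follow_total += cell["N"] * sum(cell["followup"]) / n
    return 100.0 * (follow_total - base_total) / base_total


# Hypothetical enrollment figures for two cells with two sample schools each.
cells = [
    {"N": 30, "baseline": [100, 120], "followup": [110, 130]},
    {"N": 10, "baseline": [200, 220], "followup": [210, 250]},
]
print(round(weighted_percent_change(cells), 2))
```

Weighting totals rather than averaging the per-cell percent changes keeps large cells from being under-represented when the allocation is not strictly proportional.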
The following steps will be applied for each country/PVO
project:
- Decide on the classification criteria and assign each of
the total N schools participating in GFE into its appropriate Ni
strata or cells. The number of classification criteria is understandably
important. For example, WFP has deemed it necessary to use two
classification criteria—existing and new schools.
- Select a sample of schools (n). The target sample size is a total of 20
schools. As a general rule of
thumb, the minimum sample size per cell is two. A general approach to
allocation of the total sample to cells, given that schools in each cell
are homogeneous, would be a proportional scheme based on the number of
schools in each cell. If the number of samples is sufficient, then a random
sample of ni schools (ni = two minimum)
can be selected from each stratum or matrix cell i. Depending on
the type of classification data available and its quality, the schools in
each stratum could be ranked so that a small sample size would provide a more
representative, purposeful sample. This decision will be made on a
country/project-by-country/project basis. The representativeness of the purposeful
sample and the accuracy of the indicators are both contingent
on careful selection of the ni schools and proper weighting of
the summarized data.
- Tailor the form template for each country to collect the
appropriate data needed and available for the baseline survey calculations
for the ni sample schools. The decision to collect four
months of data for specific data items was a decision by WFP to best
estimate the baseline from which to measure future change, or measure the
effectiveness of the program. The GFE/PVO methodology is to collect data for
measurement of the baseline (first survey), and to resurvey the identically
sampled schools and collect corresponding data (follow-up survey). WFP
suggested selecting four months during the school year that reflected
seasonal trends in attendance to best estimate the baseline. Likewise, those
same four months of data will be collected during the school year under the
program to allow accurate measurement of the feeding programs’ effect on
the education program.
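The within-cell selection in the second step can be illustrated with a short Python sketch of systematic random sampling from a ranked list. It assumes schools are ranked by student population size, as described in the Sampling section; the function name and the school records are hypothetical.

```python
import random


def systematic_sample(schools, n, rng=None):
    """Select n schools from one matrix cell by systematic random sampling.

    schools is a list of (name, enrollment) pairs. The list is ranked by
    enrollment, a random start is drawn within the first interval, and one
    school is then taken from each successive interval.
    """
    if not 0 < n <= len(schools):
        raise ValueError("n must be between 1 and the number of schools")
    rng = rng or random.Random()
    ranked = sorted(schools, key=lambda school: school[1])
    interval = len(ranked) / n
    start = rng.random() * interval
    last = len(ranked) - 1
    # min() guards against a floating-point index one past the end.
    return [ranked[min(int(start + i * interval), last)] for i in range(n)]


# Hypothetical cell of ten schools with enrollments from 100 to 190.
cell = [(f"School {i}", 100 + 10 * i) for i in range(10)]
print(systematic_sample(cell, 2, random.Random(1)))
```

Because the list is ranked before selection, a sample of two always contains one smaller and one larger school, which is the sense in which ranking makes a very small sample more representative.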
WFP also qualifies the baseline survey to encompass the last
complete academic year. Traumatic events in the country at any time during that
last complete academic year can cause participation in the educational program
to be uncharacteristic or atypical for that year and can cause problems with
analysis and interpretation of the resulting indicators. The same holds true for
traumatic events that might occur during the feeding program academic year.
Collecting data for more than one prior full academic year for baseline purposes
was discussed by WFP as a solution to tempering the effects that traumatic
events can have on indicator analysis.
VII. GFE Project Matrix Construction and Sampling
With the onset of project implementation, the regional
coordinators investigated the conditions in GFE participating countries and
other factors that might impact project effectiveness to develop a sampling
matrix for unique classification of each participating school. Since each
participating country’s project is unique and operates under different
conditions, the matrices should be tailored differently to meet each country’s
specific conditions and project needs. Generally, the regional coordinators
tailored each matrix to obtain as much information about the schools’ program
and operational characteristics as could be obtained from a sample of twenty
schools. Due to the limited number of samples, it was important to reduce the
number of identified factors to an absolute minimum to maintain the number of
matrix cells at a reasonable number (10 or fewer).
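The limit of about ten cells is consistent with simple arithmetic: a sample of 20 schools with a minimum of two per cell supports at most ten cells. A small Python sketch of that check follows; the factor names and their two-level splits are hypothetical and are not taken from any actual project matrix.

```python
from itertools import product


def matrix_cells(factors):
    """Return every combination of factor levels, one tuple per matrix cell."""
    return list(product(*factors.values()))


def max_cells(sample_size=20, min_per_cell=2):
    """Largest number of cells that still allows the minimum sample per cell."""
    return sample_size // min_per_cell


# Hypothetical factors, each split into two levels.
factors = {
    "location": ["rural", "urban"],
    "school_status": ["new", "existing"],
    "parent_association": ["yes", "no"],
}
cells = matrix_cells(factors)
print(len(cells), "cells; limit is", max_cells())
```

Adding a fourth two-level factor would double the cell count to 16 and exceed the limit, which illustrates why the number of factors had to be kept to a minimum.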
To illustrate the use of the matrix for sampling purposes, the
following two examples will detail the process used by the regional coordinators
to first create the matrix and then select the sample of schools:
1. The first example is Bosnia, where a complex set of factors
was considered in the country for matrix creation. Due to recent armed conflict,
one of the most important considerations was social vulnerability, which could
potentially affect program implementation. Similarly, whether schools were rural
or urban affects the ability of the PVO to effectively execute its feeding
activities. Third, whether participating schools had a Parent-Teacher
Association (PTA) was deemed an extremely important factor in the school’s
ability to execute and support its programs. These three major factors were
considered important at the onset when little information was readily available.
The whole purpose of the matrix is to ensure that representative data will be
collected from the small sample to determine the project’s effectiveness.
The matrix used for sampling the Bosnia/Catholic Relief
Services (CRS) program schools is below. Within each cell of the matrix are two
numbers. The first number is the population of schools or total number in the
CRS feeding program classified with that cell’s characteristics. The second
number is the number of sample schools selected from the total population for
that cell. In all cases, some manner of random selection was used by the
regional coordinators to select the actual sample schools from each cell.
Sample for Bosnia/CRS Project Schools