Program Operations Guidelines for STD Prevention
Program Evaluation

TYPES OF EVALUATION

The evolution of evaluation research has led to a proliferation of evaluation methods and approaches, each of which has its role depending on what is being evaluated. Evaluations reflect many different scientific philosophies (Alkin, 1990). Because all programs have a set of guiding principles concerning how they should operate and how they achieve their outcomes, all interventions have a "program theory"; what is often lacking is an explicit recognition of that theory (Weiss, 1997; McClintock, 1990). In daily operations, the theory behind STD prevention and control activities is easily overlooked because many of these activities have been used routinely for years. Managers may therefore need to think about and identify the scientific and rational reasons why activities are conducted. STD prevention programs should be based on scientific evidence, and evaluation should examine how the program in practice differs from the program in theory.

Evaluation activities should also follow the program's developmental stages. In general, there is a natural developmental sequence that intervention programs follow, and the evaluation activities should match the development level of the intervention appropriately. The program stage will determine the level of effort and the methods to be used. See Appendix C for uses and types of evaluation.

Formative Evaluation

When new programs, new interventions, new procedures, or new elements of existing programs are proposed, formative evaluation is indicated. Formative evaluations in the pre-implementation and design phase of a project emphasize needs assessment, and their data gathering may involve extensive community analysis or community identification procedures in addition to inquiry into the program setting and existing clientele. Formative evaluations are designed to help identify needs or gaps in service that the new program should address, or to answer other key questions (e.g., What is the most efficient way to recruit participants? What types of program activities are desired? What are consumer preferences for different STD test procedures?) (Wylle, 1992; Tessmer, 1994).

EXAMPLE: It is assumed that female clients would prefer urine-based testing over tests involving pelvic examinations; however, until formative information is obtained, program planners cannot be sure they are offering the intervention most acceptable to and desired by clients.

Thus, formative evaluations are conducted to collect data that provide information about the intervention being delivered. This is not just process information (how many tests will be done) but also information about how clients react or respond to the intervention.

EXAMPLE: Formative data might show whether female clients prefer a urine test because it is quick and does not require being undressed and undergoing an invasive procedure, or whether they feel their "test" is more complete when a pelvic examination or Pap smear is done.

Evaluability Assessment

When the evaluation of existing programs is desired, an evaluability assessment should be conducted. An evaluability assessment will determine to what extent an evaluation is possible (Smith, 1989, 1990; Smith, 1981; Fisher, 1982). In conducting an evaluability assessment, the evaluator must be able to clarify program goals and objectives, determine the extent to which the goals and objectives can be achieved, determine what data are available or could be collected to assess program activities, determine the program performance measures and whether they can be gathered at a reasonable cost, and explain how the results will be used. In addition, the evaluator should be able to identify the programmatic activities responsible for bringing about the intended results (Wholey, 1994). If the program cannot be adequately described in this way, program managers should focus on gathering the appropriate information and clarifying goals and objectives before any other evaluation tasks are undertaken.

EXAMPLE: The STD prevention program has obtained the assistance of outside experts to evaluate its efforts to increase screening in adolescent females in managed care settings. However, on examination, the evaluators learn that the program's stated objective was to "educate providers in all managed care settings on the need for screening." Further, they learn that the program did not specify the number of providers targeted, the number who received training, the type and extent of the managed care settings, or the number of adolescent females screened before the intervention, and that it had no way to ascertain the number now being screened. Thus, this aspect of the program cannot be evaluated unless additional data are gathered.

Recommendations

  • A formative evaluation should be conducted when a new intervention or program is undertaken or when a different way of conducting an intervention is developed.
  • An evaluability assessment should be conducted when planning an evaluation of any portion of an existing program.

Process Evaluation

As programs develop there is a need to assess how well the implementation of the program is going and, if needed, to make corrections. At these stages, many evaluation questions could be asked, all having to do with monitoring and assessing how the program is being implemented. Answering these questions involves process evaluation. Process evaluations include documenting actual program functioning (Dehar, 1993; Finnegan, 1989), measuring exposure to and diffusion of the interventions (Fortmann, 1982; Hausman, 1992; Steckler, 1992), and identifying barriers to implementation (Demers, 1992). Process evaluation includes the identification of the target population, a description of the services delivered, the use of resources, and the qualifications and experiences of the personnel delivering those services (NIDA, 1991). It involves determining what services were actually delivered, to whom, and with what level of resources.

EXAMPLE: Process evaluation of the effort to increase screening in adolescent females would include, at a minimum, the number of adolescent females in the population and the number screened before and as a part of the intervention, the tests used, and a description of the providers.

Documenting program functioning is important for two reasons. If the program is working well, there will be interest in replicating the program in other locations that serve similar or other populations. If the program is not working well, it is of tremendous use to know exactly how the program failed, in which component, and in what population (Chen, 1990).

EXAMPLE: Program A conducted a formative evaluation and determined that female clients really do prefer urine-based screening for chlamydia, and based on prevalence data, a plan was developed to test 90% of the target population. However, the process information showed that halfway through the intervention period, only 10% of the women had been tested. Instead of concluding that the effort was a failure, additional qualitative information was gathered, which showed that the drop-off point for urine specimens was too public and women felt embarrassed at leaving urine specimens where everyone could see them.

There are also program monitoring tasks which must be conducted before an outcome or impact evaluation can take place. Program monitoring tasks are concerned with documenting actual program functioning. Several major questions posed in this evaluation component are:

• Which elements of the program actually have been implemented?

Usually the practical problem here is that there are no data readily available to answer the question. When that occurs, the "answer" may be a guess rather than supported by evidence.

EXAMPLE: One of the program's surveillance objectives is for all laboratories in the area to report all positive syphilis serologies within a specified time. Unless the program staff can document how many laboratories there are, how many do serologies, and how often results are reported, that aspect of the program can only be estimated.

• What are the types and volume of treatments or services actually provided to clients?

This question is important to answer both for accountability purposes and also to assist in the development of an outcome evaluation subsequent to program implementation.

EXAMPLE: If the program is concerned with preventing congenital syphilis, it is not only necessary to have laboratory data on syphilis serologies, but it is also necessary to know how many pregnant women there are, how many receive testing for syphilis, and at what stage of pregnancy they are tested.

• What are the characteristics of program participants?

It is important to determine if the recipients of program services resemble the intended "target group" as identified in the program design and development stage. An effective intervention administered to a non-target group may be just as useless as an ineffective intervention administered to a targeted group.

EXAMPLE: If the STD prevention program has determined that most congenital syphilis cases have occurred in newborns of adolescents, but syphilis testing occurs mostly in adult women with private insurance, then the target population is not being reached.

Program monitoring can function as quality assurance of activities. Managers and staff should develop tools to ensure that the daily operations are functioning as they should. Corrections are more easily made when detected early and are less likely to create long-term, large scale damage to program progress.

EXAMPLES: Program monitoring may include chart reviews, direct observation of interviews and counseling sessions, routine analysis of laboratory reporting, and analysis of screening procedures and results.

An increased focus on accountability by funding sources has also increased requirements for evidence that a program is delivering what was paid for. Regular feedback from monitoring can be one of the most powerful tools a program manager has for documenting the operational effectiveness of a project, justifying staff, defending the continued existence of the program and even requesting additional support.

Finally, the information gained through program monitoring is necessary to determine which (if any) aspect of the program is appropriate for impact evaluation. The reason for this should be obvious, but it is often overlooked in the rush to evaluate program impact: programs (or components of programs) that do not exist, or do not exist as intended, should not be evaluated for impact (Rossi, 1998).

Outcome Evaluation

When process evaluation shows that the program was implemented properly, there is often interest in measuring the effectiveness of the actual program (Mohr, 1995). Outcome evaluation is concerned with the end results of STD prevention interventions that have an effect on the health of populations. Criteria for using outcomes for evaluation include: (1) being objective, in that outcomes can be observed; (2) being measurable in ways that are reliable and valid; (3) being attributable to the intervention delivered; and (4) being sensitive to the degree of change expected from the intervention. For STD prevention programs there are a number of different outcomes that can be measured: biological, behavioral, cognitive, economic, and health status. The ultimate outcome is a change in morbidity or mortality. Because the expertise and time commitments needed to conduct outcome evaluations are often not available to STD prevention programs within health departments, such evaluations may be done by outside evaluators. Outcome evaluation typically requires some understanding of research design; key points are discussed below so that managers can work effectively with evaluators.

In some cases, it may be relevant to consider outcomes that are not directly measurable (for example, some of the sequelae of PID typically occur years after the initial chlamydial or gonorrheal infection). Such outcomes may still be worthwhile to consider, especially for purposes of economic evaluations. In such cases, it may be advisable to use estimates from published literature of the rates at which outcomes occur and to vary the rates over a reasonable range (as an example, PID is estimated to occur in 10%-40% of untreated gonorrheal and chlamydial infections; the effectiveness of the program in preventing PID could be assessed at each end of that range, plus some figure in the middle, such as 20% or 25%). This technique is known as sensitivity analysis, and it can also be used with figures that are known and measurable to determine how program performance may be affected if circumstances change. For example, an on-site syphilis screening program may not be justifiable given the current rate of positive tests, but might be worthwhile to conduct if syphilis incidence increased from current levels (Haddix, 1996; Gold, 1996).
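
To illustrate, the following minimal sketch (in Python, using hypothetical figures that are not drawn from any actual program) varies the assumed PID progression rate across the 10%-40% range cited above and recomputes the estimated program effect at each value.

    # Sensitivity analysis sketch: all figures are hypothetical and for
    # illustration only. The assumed rate at which untreated chlamydial or
    # gonorrheal infections progress to PID is varied over a plausible range.
    infections_found_and_treated = 200   # hypothetical annual count

    for pid_rate in (0.10, 0.20, 0.25, 0.40):
        pid_cases_prevented = infections_found_and_treated * pid_rate
        print(f"Assumed PID rate {pid_rate:.0%}: "
              f"about {pid_cases_prevented:.0f} cases of PID prevented")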

The terms outcome and impact have been used in conflicting ways in the past. However, one useful distinction is this: outcome evaluation measures the effectiveness of an intervention on the target population, whereas impact evaluation attempts to measure the total effect of a prevention program on the community as a whole (NIDA, 1991). In this document we will use the term "outcome".

"Outcome" implies measures of effectiveness of an actual program. To assess outcomes, it is first essential to define in specific quantitative terms what the intended program effect is. To carry out a credible assessment of outcomes, it is then essential to design a scientific study, as rigorous and systematic as resources allow.

Defining Program Effect

To define the effect of the program it is necessary to define measurable goals. This is often difficult or impossible because theoretical goals of the program must be connected to empirical, measurable indicators in the real world. Programs without measurable goals cannot be rigorously evaluated.

Designing the Study

Designing an appropriate outcome or impact study is complicated; evaluators must build into the evaluation plan the ability to infer unambiguously that, if a change is recorded in outcome measurements, the change is due to the actions of the program and not to other external or internal influences.

External influences, often called confounders, are a potential alternative explanation for program outcomes. If the design is not well developed, it is easy to jump to inaccurate conclusions (i.e., concluding that the intervention had an effect when in reality there is little or no relationship).

EXAMPLE: Reduced STD morbidity might actually be due to the effects of increased screening and treatment programs or education rather than the intervention being evaluated.

Internal influences also need to be considered. For instance, peer counseling programs may purposely or inadvertently recruit adolescents who are already motivated to change behaviors. When testing for STDs in this group shows a lower prevalence than in similar adolescents, the results probably give an unrealistically high estimate of program efficacy compared to the adolescents who did not volunteer for the program.

Randomized Trials

The evaluation design that is considered to produce the strongest evidence that a program intervention or activity contributed to change is the randomized control trial (RCT). The rationale for this design is well established. In brief, the essence of a randomized trial lies in the random assignment of subjects to be exposed to the intervention or to be a control (not exposed to the intervention). By using the rule of chance, intervention and control groups are, on average, comparable before exposure. Because of this initial equivalence, if outcome differences between those who do and do not receive the intervention are statistically detected, they are highly likely to be due to the operation or processes of the intervention.
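
As a concrete illustration of the random assignment step, the minimal sketch below (Python, with hypothetical client IDs) splits an enrolled group into intervention and control arms by chance alone.

    # Sketch of random assignment for an RCT. Client IDs are hypothetical.
    # Because chance alone determines group membership, the two groups are,
    # on average, comparable before the intervention is delivered.
    import random

    client_ids = list(range(1, 201))   # 200 hypothetical enrollees
    random.shuffle(client_ids)

    midpoint = len(client_ids) // 2
    intervention_group = client_ids[:midpoint]
    control_group = client_ids[midpoint:]

    print(f"{len(intervention_group)} assigned to intervention, "
          f"{len(control_group)} assigned to control")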

Quasi-experiments

Although randomized trials provide the strongest evidence about a program's effectiveness, they may not be feasible to implement. RCTs are costly, time consuming, can be subject to methodological flaws, and may not be considered ethical to conduct if withholding an intervention from one group may adversely affect opportunities for improved health status. Thus, evaluators turn to the analysis of quasi-experiments, defined generally as any research design that does not use random assignment to deliberately construct an initial equivalence between groups. Quasi-experimental designs use a control group that is separate from the experimental group but not randomized. When randomized trials are not possible and quasi-experiments are substituted in their place, specific design features usually have to be instituted to rule out each alternative explanation to the hypothesis of treatment effects.

Economic Evaluation

Economic evaluation considers both the outcomes of a program and the cost of producing those outcomes. In some cases, the most effective program may also have the lowest cost, but it is not necessarily true that the lowest-cost option is the most cost effective. It is also possible that the program that produces the most units of a given outcome may be impractical to implement because it is so costly that it diverts too many resources from other uses, or requires more resources than are available. An example is provided at the end of this subsection.

To conduct an economic evaluation, it is necessary to know what resources are used in a program, and what these resources cost. In some cases, the costs are not direct (i.e., they don't have to be paid), but indirect (such as an opportunity cost, which is the cost of using a resource in a given program that could be used elsewhere). This process involves measuring or estimating the value of facilities, equipment, personnel, and other resources used. Sometimes patient time commitments and travel costs are relevant, as well (Drummond, 1987). Adequately determining appropriate costs can be difficult, and should not be undertaken without the help of someone familiar with economic analyses (Rossi, 1998).

Which costs are included in the analysis depends upon the perspective chosen. The broadest perspective is societal, which includes all costs borne by all parts of society, including local programs, the health care system as a whole, and patients. More limited perspectives are also often used; these do not consider the costs borne by some groups in the economy.

EXAMPLE: Client travel costs and time costs for clinic visits would not be relevant from a health care system perspective, because the health care system does not pay for them, but they would be relevant from a societal perspective, which includes all costs. The perspective should be appropriate for the particular issue being analyzed (Haddix, 1996).
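
The sketch below (Python, with entirely hypothetical cost categories and amounts) shows how the total cost of the same screening program differs when tallied from a health care system perspective versus a societal perspective.

    # Perspective sketch: all cost categories and amounts are hypothetical.
    annual_costs = {
        "clinic staff":         30000,
        "test kits":            12000,
        "clinic overhead":       8000,
        "client travel":         4000,   # borne by clients, not the health system
        "client time off work":  6000,   # borne by clients, not the health system
    }

    health_system_items = {"clinic staff", "test kits", "clinic overhead"}

    health_system_total = sum(v for k, v in annual_costs.items()
                              if k in health_system_items)
    societal_total = sum(annual_costs.values())   # societal view counts all costs

    print(f"Health care system perspective: ${health_system_total:,}")
    print(f"Societal perspective:           ${societal_total:,}")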

Types of Economic Evaluation

Cost Analysis

The simplest form of economic evaluation is a cost analysis. Because it considers only the costs, however, it is a partial economic evaluation (Drummond, 1987). To conduct a cost analysis the costs of a program must be determined, making sure to collect all relevant costs for the perspective being used (Haddix, 1996).

EXAMPLE: The STD prevention program might determine the cost of screening for chlamydia in family planning clinics, or the cost to follow women who tested positive for chlamydia in a private medical facility to get them treated.

It is important to conduct cost analyses when appropriate. However, at a minimum, the state/local health department should calculate the cost per service unit for each of its major prevention programs (the 'service unit' will depend on the program; for example, in an STD clinic, costs could be expressed as dollars per patient visit; dollars per gonorrhea, syphilis, or chlamydia test; or dollars per infection identified and treated).
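
A cost-per-service-unit calculation is straightforward arithmetic; the sketch below (Python, using hypothetical clinic figures) illustrates the kinds of unit costs the recommendation refers to.

    # Cost-per-service-unit sketch. All figures are hypothetical.
    annual_program_cost = 250000    # staffing, supplies, laboratory fees, overhead
    patient_visits = 10000
    tests_performed = 7500
    infections_identified_and_treated = 600

    print(f"Cost per patient visit:      ${annual_program_cost / patient_visits:,.2f}")
    print(f"Cost per test:               ${annual_program_cost / tests_performed:,.2f}")
    print(f"Cost per infection treated:  "
          f"${annual_program_cost / infections_identified_and_treated:,.2f}")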

Once costs are determined, there are three common methods used for comparing the costs and consequences of different interventions: cost effectiveness, cost-utility, and cost-benefit analysis.

Cost Effectiveness Analysis (CEA)

CEA divides the net cost of a program by the outcomes produced by the program. The outcomes chosen are generally the health effects targeted by the program, such as cases of disease prevented or lives saved. The result will be expressed as the net cost per unit of outcome.

EXAMPLE: In comparing programs that promote the detection of chlamydia, the unit of measure for the CEA might be "cases of PID averted".

This differs from the per-unit cost analysis presented in the previous section in that the cost savings associated with the adverse outcomes averted or with the desirable outcomes produced are incorporated into the net cost. This is the most commonly used type of economic analysis in the health field, and is especially well-suited to comparing different interventions or programs that share the same outcome (Haddix, 1996). The interventions can be ranked in order of increasing effectiveness, and the cost effects of moving from one intervention to the next most effective one can be easily determined. It is less effective in comparing interventions that produce different outcomes, because it does not provide a common outcome measure.

EXAMPLE: CEA would be more helpful in comparing two chlamydia screening programs than in comparing a chlamydia screening program with a cancer prevention program.
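
A minimal sketch of the net-cost arithmetic follows (Python, with hypothetical figures that are not taken from the worked example later in this section): the savings from averted PID cases are subtracted from the program cost before dividing by the number of cases prevented.

    # Cost-effectiveness sketch. All figures are hypothetical.
    program_cost = 55000                 # annual screening program cost
    pid_cases_prevented = 8
    treatment_cost_per_pid_case = 3000   # assumed cost of treating one PID case

    net_cost = program_cost - pid_cases_prevented * treatment_cost_per_pid_case
    cost_per_case_prevented = net_cost / pid_cases_prevented

    print(f"Net cost: ${net_cost:,}")
    print(f"Cost per case of PID averted: ${cost_per_case_prevented:,.0f}")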

Cost-Utility Analysis (CUA)

CUA is similar to CEA, except that the program outcomes are measured in common terms across interventions, most commonly quality-adjusted life years (QALYs) (Haddix, 1996; Farnham, 1996). With this approach, interventions that produce different outcomes (such as chlamydia prevention and cancer prevention) can be compared: the different outcomes are translated into QALYs, and it is then theoretically possible to determine the most efficient use of resources to produce the maximum amount of health. However, actually determining the QALYs gained by preventing a case of infection is not a straightforward task, and QALY measures for STD outcomes are not well developed. CUA is most commonly used for programs with significant non-health benefits and is often used to determine whether or not to fund a program (Farnham, 1996).
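
The sketch below (Python) shows the shape of the calculation; the QALY gain per case averted is a purely illustrative assumption, since, as noted above, QALY measures for STD outcomes are not well developed.

    # Cost-utility sketch. The QALY weight is an illustrative assumption.
    net_cost = 31000
    pid_cases_prevented = 8
    qalys_gained_per_case_averted = 0.5   # hypothetical QALY gain per PID case

    total_qalys_gained = pid_cases_prevented * qalys_gained_per_case_averted
    print(f"Cost per QALY gained: ${net_cost / total_qalys_gained:,.0f}")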

Cost-Benefit Analysis (CBA)

CBA is also similar to CEA, except that it places a monetary value on the outcomes of programs. In the above example of CEA, the monetary value per case of PID averted would be determined and factored into the net cost. In theory, this is the broadest form of analysis because it can be determined whether the benefits of a program justify its costs. However, in practice it is also limited to a comparison of those specific costs and benefits that can easily be expressed in terms of money (Drummond, 1987). Cost benefit analysis often presents controversial questions, such as, "What is the value of saving a life?" or, "Is the life of an older person worth as much as the life of a younger person?" Determining the answers to these questions is not straightforward, and no clear consensus methodology has emerged. Because of these difficulties, CEA and CUA are more often used in health programs (Farnham, 1996).
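
For comparison, a cost-benefit calculation under similar hypothetical figures would assign a dollar value to each case averted (the valuation itself being the contentious step) and report a net benefit.

    # Cost-benefit sketch. The dollar value placed on a case of PID averted
    # is a hypothetical valuation, which is precisely the controversial step.
    program_cost = 55000
    pid_cases_prevented = 8
    dollar_value_per_case_averted = 9000

    total_benefit = pid_cases_prevented * dollar_value_per_case_averted
    net_benefit = total_benefit - program_cost
    print(f"Net benefit: ${net_benefit:,}")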

EXAMPLE: The following hypothetical data illustrate the concepts presented above regarding economic analyses.

A program manager wishes to evaluate the gonorrhea screening program at one of the program's clinics. All women under 25 years of age are routinely tested when they present to the clinic. After collecting the costs for staffing, supplies, testing equipment, and clinic overhead, it is determined that the program currently costs $50,000 per year. This is the cost analysis of the program. It is further determined that 2,500 tests are performed each year, for a cost of $20 per test. The screening program leads to the detection and treatment of 50 cases of gonorrhea, and is estimated to prevent 10 cases of PID per year.

The manager wants to compare the current screening program, which routinely tests all women under 25 years of age, with two possible alternatives: selective screening based upon a risk assessment, and expanded universal screening of all women under 35 years of age. Comparing the costs and outcomes of these alternatives is a cost effectiveness analysis (CEA). After adding up the costs and subtracting the savings from the cases of PID averted, the costs and outcomes of the three alternatives are:

Testing Approach        Net Cost    Cases of PID Prevented    Cost per Case Prevented
Risk Assessment         $30,000      9                        $3,333
Test All < 25 Years     $40,000     10                        $4,000
Test All < 35 Years     $66,000     12                        $5,500

Which program is "best" will be determined in part by the resources that the health department can devote to screening. The risk assessment approach is not necessarily the most cost effective, despite having the lowest net cost; it also prevents the fewest cases of PID. Similarly, while testing all women under 35 years of age prevents the most cases of PID, its cost per case prevented is highest and may require a level of funding that is unavailable. If testing all women under 35 years of age had the lowest net cost, it would unequivocally be the most cost effective, because it would be the lowest-cost program that also prevented the largest number of cases of PID. Even when CEA does not provide a clear-cut best choice, it gives policy makers information that can help them make resource allocation decisions.
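
The arithmetic behind this example can be laid out directly; the sketch below (Python) reproduces the cost per test for the current program and the cost per case of PID prevented for each alternative shown in the table.

    # Reproduces the hypothetical screening comparison above.
    current_program_cost = 50000
    tests_per_year = 2500
    print(f"Cost per test (current program): "
          f"${current_program_cost / tests_per_year:.2f}")

    alternatives = [
        ("Risk Assessment",     30000,  9),
        ("Test All < 25 Years", 40000, 10),
        ("Test All < 35 Years", 66000, 12),
    ]

    for name, net_cost, cases_prevented in alternatives:
        print(f"{name:<22} net cost ${net_cost:>6,}, "
              f"{cases_prevented} cases prevented, "
              f"${net_cost / cases_prevented:,.0f} per case prevented")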

Recommendation

  • At a minimum, programs should calculate the cost per service unit for each of their major prevention programs.

Measures

Issues pertaining to data collection and measurement are relevant for all levels of evaluation. The selection of indicators, instruments, measures, and data sources depends largely on the purpose and stage of the program. One of the most basic considerations in selecting indicators is that they reflect the central goals and objectives of the program (NIDA, 1991). There are alternative methods that can be used to gather data for specific indicators, and these vary in reliability, validity, depth, and cost. The method chosen should reflect the priority being given to the indicator and the resources available for the evaluation. For example, if the process by which adolescents are believed to gain access to health care needs to be thoroughly analyzed, the question may be approached by means of focus groups or in-depth individual interviews, a high-cost approach because of the use of highly trained moderators or interviewers. However, if only a cursory picture is required, then a few questions in a process evaluation might be sufficient (NIDA, 1991).

Evaluation activities in recent years have gone beyond basic budget and staff monitoring to count program outputs, such as services delivered to clients. Measurement of some outputs, such as counting the number of women screened and treated for chlamydia, captures the intended result of the program (Newcomer, 1997). Assessment of service delivery at the local level is not new, but linking the measures or indicators to program mission, setting performance targets, and regularly reporting on the achievement of target levels are relatively new features in performance measurement (Newcomer, 1997). The website www.cdc.gov/nchstp/dstd/hedis.htm includes information on the HEDIS chlamydia measure and software developed to evaluate resource allocations.

Performance measurement is an inclusive term that may refer to the routine measurement of program inputs, outputs, intermediate results, or eventual outcomes (Newcomer, 1997). Performance measurement "consists of the systematic description and judgment of programs and, to the extent feasible, systematic assessment of the extent to which they have the intended results" (Wholey, 1994; Newcomer, 1997).

In-depth program evaluations are usually done by outside organizations such as contractors and universities, while performance measurement is often done by the programs themselves. The ability to measure true outcomes is limited. For instance, it is very difficult to measure the prevention of congenital syphilis in a population; therefore, many agencies and programs measure a related indicator as a substitute, e.g., the number of pregnant women screened for syphilis (Hatry, 1997).

Beginning October 1, 1997, Federal departments and agencies were required to prepare strategic plans which were forwarded to the Office of Management and Budget, to the President, and on to Congress. These plans are a part of the Government Performance and Results Act (GPRA) of 1993. GPRA compares actual performance with the goal levels that were set by the agency's annual performance plans. The goal levels set by CDC, and the achievement thereof, are in part dependent on the achievement of state and local STD prevention program goals and objectives.

 




Content Source: Division of STD Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention