Introduction to impact evaluation

Quantifying and explaining the effects of interventions is at the heart of the evaluation of socio-economic development programmes.  For policy makers to make informed decisions, it is important to understand what works or what does not, as well as why, for whom and in which contexts.  This is a formidable list of questions, and the available analytical methods provide at best tentative and incomplete answers to most of them.  Thus it is of fundamental importance to clarify which methods can answer which questions, under which circumstances.

 

Two distinct sets of questions (and methods)

Two conceptually distinct sets of questions tend to emerge when it comes to assessing the effects of public policies: one deals primarily with the quantification of effects, the other with their explanation.  Accordingly, the sections of Sourcebook 2 dealing with impact evaluation methods are organized along the following lines:

•        Methods primarily devoted to establishing whether a given intervention produces the desired effects on some pre-established dimension of interest.  The overarching goal is to answer a “does it make a difference?” question by identifying and estimating causal effects through counterfactual methods.

•        Methods primarily devoted to understanding why an intervention produces intended and unintended effects, for whom and in which context. The goal is to answer the “why it works?” question by identifying the theory of change behind the programme and assessing its success by comparing theory with actual implementation.

We want to stress the term “primarily”.  Identifying and estimating causal effects requires some theory, while comparing theory and implementation requires some quantification. However, these remain two distinct questions.  It would be counterproductive, at this stage of the development and utilization of these methods, to force a synthesis between the two sets of questions and related methods. 

 

Claims of cognitive superiority vs. intellectual honesty

A clear-cut separation should help prevent antagonism, which is rife when proponents of alternative methods vie for the attention of the same policy makers and compete for the same resources.  Claims of the alleged intellectual superiority of one set of methods over the other are the most deleterious manifestation of such rivalry. They should be discouraged by openly rewarding the opposite attitude: the intellectually honest admission of the drawbacks, limitations and pitfalls of the analytical tools each side is able to deploy in answering questions about the what and why of the effectiveness of policy.  Rhetorical claims of cognitive superiority should be left to the bygone era of the fruitless “paradigm wars”. What the two camps mostly have in common is how little they truly understand about the effects of public policies.

While the two sets of methods should be kept separate methodologically, policy makers should use the results of both as they see fit: “Even assuming that the counterfactual methods proved that a certain intervention worked and could even put a number on this, this is still a finding about one intervention under certain circumstances. We will need our more qualitative, "traditional" evaluation techniques to understand to which interventions these findings can be transferred and what determines the degree of transferability” (Stryczynski, 2009). Joint utilization is up to the user of the information, but it does not imply joint production.

 

Counterfactual impact evaluation (CIE) vs. Theory-based impact evaluation (TBIE)

The central question of CIE is rather narrow (how much difference does a treatment make?), and its answers are typically numbers, or more often differences, to which it is plausible to give a causal interpretation based on empirical evidence and some assumptions.  Is the difference observed in the outcome after the implementation of the intervention caused by the intervention itself, or by something else?  Answering this question in a credible way is nevertheless a very challenging task.

The CIE approach to evaluation is useful for many policy decisions, because: (i) it gives easily interpretable information; (ii) it is an essential ingredient for cost-benefit and cost-effectiveness calculations; (iii) it can be broken down into separate numbers for subgroups, provided that the subgroups are defined in advance. 

Howard White (2009), an advocate of TBIE, recognizes the importance of the following aspects of CIE: “Criticisms of reporting an average treatment effect should not be overstated. Heterogeneity matters, as does understanding the context in which a particular impact has occurred. But it will rarely be the case that the average treatment effect (usually both the treatment of the treated and the intention to treat) is not of interest. Indeed it is very likely to be the main parameter of interest. It would be misleading to report significance, or not, a particular sub-group if the average treatment effect had the opposite sign. Moreover the average treatment effect is the basis for cost effectiveness calculations”.

To sum up, “how much difference does a treatment make” is an important, relevant, methodologically sound evaluation question.  Yet it remains extremely challenging to answer, as the chapters on the various CIE approaches will openly document.  But it is certainly not the only question.

The importance of TBIE stems from the fact that a great deal of other information, besides the quantifiable causal effect, is useful to policy makers in making decisions and being accountable to citizens.  The question of why a set of interventions produces effects, intended as well as unintended, for whom and in which context, is as relevant and important as the “made a difference” question, and at least as challenging.

This approach produces a narrative rather than a number.  It therefore cannot be used for cost-benefit calculations, it is not communicated as quickly and schematically, and it is not backed by a comparable set of statistical tools.  To some observers it thus appears less scientific, less “objective”.  But it can provide a precious and rare commodity: insights into why things work, or don’t.  Above all, it is based on the very powerful idea that the essential ingredient is not a counterfactual (“how things would have been without”) but rather a theory of change (“how things should logically work to produce the desired change”).  The centrality of the theory of change justifies calling this approach theory-based impact evaluation.

 

Attribution vs. contribution

According to the Guide, causal questions are those that “strive to understand and assess relations of cause and effect (how and to what extent is what occurred attributable to the programme?)”.  Thus, this notion of causality is centred on the idea of “attribution”. Causal questions related to the attribution of programme impacts appear frequently in the context of socio-economic development policy.  For example, does aid to small and medium enterprises increase their survival or alter their hiring practices?  Does investment in a new transport infrastructure eliminate bottlenecks and reduce travelling times?  The ultimate objective in asking these questions is to learn whether the intervention works: which interventions produce the desired effect? Or, as seen from a different perspective, to what extent are the observed changes truly caused by the intervention?

In TBIE, causality is often framed as a problem of contribution, not attribution.  Often quoted is causal contribution analysis (Mayne, 2001; Leeuw, 2003), which aims to demonstrate whether or not the evaluated intervention is one of the causes of observed change. Contribution analysis relies upon chains of logical arguments that are verified through careful fieldwork. Rigour in causal contribution analysis involves systematically identifying and investigating alternative explanations for observed impacts.[1] This includes being able to rule out implementation failure as an explanation of lack of results, and developing testable hypotheses and predictions to identify the conditions under which interventions contribute to specific impacts.

 

It is not “complexity” driving the difference...

A common perception is that “counterfactual impact evaluation” is suited for “simple” interventions, while “theory-based impact evaluation” is necessary for complex interventions.  The following citation is emblematic of this position.  “There is today more than ever a ‘continuum’ of interventions. At one end of the continuum are relatively simple projects characterized by single ‘strand’ initiatives with explicit objectives, carried out within a relatively short timeframe, where interventions can be isolated, manipulated and measured. For these types of interventions, experimental and quasi-experimental designs may be appropriate for assessing causal relationships, along with attention to the other tasks of impact evaluation. At the other end of the continuum are comprehensive programs with an extensive range and scope (increasingly at country, regional or global level), with a variety of activities that cut across sectors, themes and geographic areas, and emergent specific activities.” (Leeuw and Vaessen, 2009).

This “division of labour” is by and large a misconception.  The “why it works” question is also relevant for relatively simple projects characterized by single ‘strand’ initiatives with explicit objectives. Indeed, the “why it works” question might stand a better chance of finding an answer in these situations than in comprehensive programmes with an extensive range and scope, with a variety of activities that cut across sectors, themes and geographic areas. The very idea that complex situations are easily understood by complex methods is simply wrong: complexity is a problem for all.

 

...it is rather the disciplines lurking behind

The CIE approach to evaluation is backed by a formidable stock of methodological tools. Over the last three decades the statistical/econometric/epidemiological community has produced a rather sophisticated conceptual apparatus to deal with causal inference: the potential outcome or counterfactual approach. Quantifying effects requires establishing a counterfactual, that is, reconstructing what would have happened in the absence of the intervention. This apparently simple idea turns out to be very powerful.

A section of Sourcebook 2 is devoted to the methods needed to answer this type of question. The logic of causal explanation adopted by these methods is referred to as “counterfactual logic”.  Its centerpiece is the notion of the causal effect of an intervention, defined as the difference between the outcome observed after an intervention has taken place and the outcome that would have occurred in the absence of the intervention. The latter is not observed and must be recovered from other data.
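The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Firm` class, its field names and all the numbers are hypothetical and invented for the example, and the counterfactual outcome is simply written down, whereas in a real evaluation it is never observed and has to be estimated from other data. The sketch contrasts the causal effect (outcome with the intervention minus outcome without it) with the simple before-after difference, which also absorbs changes that would have happened anyway.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Firm:
    """Hypothetical assisted firm; all figures are invented for illustration."""
    outcome_before: float        # outcome measured before the intervention
    outcome_if_treated: float    # outcome after the intervention (observed)
    outcome_if_untreated: float  # outcome without it (counterfactual, never observed in practice)


def causal_effect(firms):
    """Average difference between the observed and the counterfactual outcome."""
    return mean(f.outcome_if_treated - f.outcome_if_untreated for f in firms)


def before_after_difference(firms):
    """Average observed change over time, whatever its cause."""
    return mean(f.outcome_if_treated - f.outcome_before for f in firms)


# Three hypothetical firms (e.g. number of employees).
firms = [
    Firm(outcome_before=10.0, outcome_if_treated=15.0, outcome_if_untreated=13.0),
    Firm(outcome_before=20.0, outcome_if_treated=24.0, outcome_if_untreated=23.0),
    Firm(outcome_before=30.0, outcome_if_treated=36.0, outcome_if_untreated=33.0),
]

print("Causal effect:", causal_effect(firms))                       # 2.0
print("Before-after difference:", before_after_difference(firms))  # 5.0
```

In this made-up example the firms would have grown by 3 units on average even without support, so the before-after difference (5) overstates the causal effect (2). The practical difficulty, which the counterfactual methods address in different ways, is that `outcome_if_untreated` is not available for the assisted firms and must be approximated, for instance with data on comparable non-assisted firms.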

On the other hand, the field of theory-based impact evaluation is not lacking in the number of proposed methods. The literature on TBIE methodology is riddled with labels representing different (and sometimes not so different) methodological approaches. TBIE is backed by a vast array of qualitative, naturalistic, participatory and hermeneutic methods. However, these have not developed into a powerful and validated set of tools comparable to the one CIE can draw upon.

Perhaps the most visible approach is Realist evaluation (Pawson and Tilley, 1997; Pawson, 2002), which has devoted considerable energy to stressing its epistemological differences from CIE. It proposes a different understanding of causality, based on a “generative” notion centred on the identification of causal mechanisms, rather than the mere “successionist” view typical of the counterfactual approach. The basic idea of Realist evaluation is that different contexts may yield different reactions to the same intervention, and putting in place alternative mechanisms may produce different results.

Another example is the GTZ Impact Model, developed by the German Agency for Technical Cooperation (GTZ), which shows an ‘attribution gap’ between the direct benefits (which can be demonstrated through project level monitoring and evaluation) and the indirect, longer term development results (observed changes) of the intervention. Impact pathway evaluation represents a set of hypotheses about what needs to happen for the intervention outputs to be transformed, over time, into impact on highly aggregated development indicators.

Finally, participatory evaluation approaches are built on the principle that stakeholders should be involved in some or all stages of the evaluation. In the case of impact evaluation this includes aspects such as the determination of objectives, indicators to be taken into account, as well as stakeholder participation in data collection and analysis.

 

With counterfactual and theory-based impact evaluation, where do “impact indicators” fit in?

Over the last 10-15 years EU regional policy has brought the idea of evidence-based policy making and evaluation to quite a wide audience.  Regions, Member States and evaluators spend a good deal of their time trying to set quantified objectives at the beginning of consecutive programming periods and reporting against them annually. For the ERDF alone, Member States used some 21,000 indicators in the 2000-06 programming period, measuring "outputs, results and impacts".  However, the notion of impact used in such a massive reporting system is different from the one underlying either Counterfactual or Theory-based impact evaluation. The difference is primarily one of scale.

Evaluation of Cohesion Policy has tended to be at the programme level. Such large-scale, complex programmes are not amenable to impact evaluation, under either notion, and traditional descriptive evaluations do not provide the scope to examine in detail and generate the evidence on what works and why.  Cohesion policy programmes typically include a high number of interventions of very different nature and reach a very substantial financial scale. It seems virtually impossible to apply counterfactual methods to programmes as a whole. So it is necessary to decompose the programmes into their operational instruments ("measure level" or even below). Even then, the sheer number of individual actions would force evaluators to concentrate on selected parts of programmes and to apply the methods to those parts only.  The new approach for the 2007-2013 funding period, "ongoing evaluation", presents an opportunity for Member States to evaluate impact at the level of interventions, using CIE and TBIE.

Nevertheless, good management needs good reporting and good accounting. Policy makers do want to know what has been produced with Structural Funds money (e.g. how many kilometres of roads were built). And creating "simple" reporting based on output is a real challenge in an EU with 27 Member States, more than 400 programmes, a vast area of eligible actions and the absence of common definitions (Stryczynski, 2009).

If this line of argument is acceptable for output and result indicators, because they need to be constantly aggregated at higher levels, a different argument must be made for “impact” indicators.  The notion of impact attached to impact indicators is a misnomer.  They should be called target indicators, because their purpose is to indicate along which dimension the policy is expected to make progress.  Progress towards objectives is a crucial piece of information.  But it does not have any causal interpretation, nor is it particularly amenable to the in-depth inquiry of theory-based impact evaluation.

 

Selected references

Leeuw F. [2003], Reconstructing Program Theories: Methods Available and Problems to be Solved, in «American Journal of Evaluation», n. 24(1), pp. 5-20.

Leeuw F., Vaessen J. [2009], Impact Evaluations and Development: NONIE guidance on impact evaluation, Network of Networks on Impact Evaluation (NONIE).

Mayne J. [2001], Addressing Attribution through Contribution Analysis: Using Performance Measures Sensibly, in «Canadian Journal of Program Evaluation», n. 16(1), pp. 1-24.

Pawson R. [2002], Evidence-based Policy: The Promise of ‘Realist Synthesis’, in «Evaluation», n. 8(3), pp. 340-358.

Pawson R. and Tilley N. [1997], Realistic Evaluation, Sage Publications, Thousand Oaks, CA.

Stryczynski K. [2009], Rigorous impact evaluation: the magic bullet for evaluation of Cohesion Policy?, European Commission, Bruxelles.

White H. [2009], Theory-Based Impact Evaluation: Principles And Practice, Working Paper n. 3, International Initiative for Impact Evaluation, New Delhi.

[1] A radical point of departure between the two approaches is the very concept of “observed impacts”.  CIE contends that impacts are not observable, being the difference between something observable and a hypothetical state.  TBIE, somewhat less clearly, contends that impacts can be observed.

Last update: 24/11/2009