THE LOGIC OF COUNTERFACTUAL IMPACT EVALUATION
The Introduction to Impact Evaluation identified two separate sets of questions, one dealing primarily with the quantification of effects, the other with their explanation. The first relies on counterfactual methods, the second on theory-based methods. In this section we deal exclusively with the first set of methods, devoted to quantifying whether a given intervention produces the desired effects on some pre-established dimension of interest.
Questions related to the sign and magnitude of programme impacts arise frequently in the evaluation of socio-economic development programmes. Do R&D subsidies increase the level of R&D expenditure by subsidized firms? Do targeted ERDF funds increase per capita income of the assisted areas? Do urban renewal programmes contribute to the economic development of urban neighbourhoods? Does support to SMEs increase their employment levels? Does investment in new public infrastructure increase housing values?
In other words, the evaluation problem has to do with the “attribution” of the change observed to the intervention that has been implemented. Is the change due to the policy or would it have occurred anyway? Answering these questions is not as straightforward as it might seem. The challenge in quantifying effects is to find a credible approximation to what would have occurred in the absence of the intervention, and to compare it with what actually happened. The difference is the estimated effect, or impact, of the intervention on the particular outcome of interest (be it per capita GDP, R&D expenditure, housing values or employment levels).
Effects, impacts, and counterfactuals
A note on terminology is necessary. Unlike in other evaluation settings, here impacts and effects are perfect synonyms. There is no meaningful difference between the two terms: they both refer to the notion of “causal effect”, that is, the difference between the outcome observed after an intervention has taken place and the outcome that would have occurred in the absence of the intervention. The popular distinction between “effects” as immediate results and “impacts” as long-run, or wider, effects has no meaning in this context.
The counterfactual situation is purely hypothetical, thus can never be directly observed. For the same reason, an effect can never be directly observed, nor can an impact (impact indicators notwithstanding). By contrast, effects and impacts can be inferred, as long as the available data allows a credible way to approximate the counterfactual.
There are two basic ways to approximate the counterfactual: (i) using the outcome observed for non-beneficiaries; or (ii) using the outcome observed for beneficiaries before they are exposed to the intervention. However, caution must be used in interpreting these differences as the “effect” of the intervention.
The use of these basic comparisons is often invoked in the Guide. For example on page 31 “the provision of support to companies to invest in new equipments could be evaluated by tracking the performance of supported companies and comparing this with the performance of an appropriately identified control group of companies not receiving support.”
On page 144 the following example suggests using as counterfactual what is observed before the intervention: “The simplest method is to use the initial situation ("baseline") as the counterfactual. For example, 100 SMEs receive investment support, between them they increase their capital stock from EUR20 million to EUR30 million. In this simple scenario, EUR20m is the baseline and EUR30m-EUR20m = Eur10 million is the estimated impact of assistance."
Extreme caution is needed to interpret the observed differences as “effects”
These observed differences (over time, across individuals) indeed show “objective facts”: for example, the performance of the supported firms is superior to that of the non-supported firms; the capital stock has increased after the support. What is problematic, and often dubious, is the causal interpretation of these differences. Such an interpretation is crucial for decision makers: only differences that have a plausible causal interpretation reveal “what works”. For example, how much of the difference in outcomes between supported and non-supported companies is due to the support received? And how much of the difference is instead due to the way that differently performing companies sort themselves, in or out, when deciding whether to apply for support?
Impact evaluation is essentially about interpreting differences in a causal sense. The challenge facing the evaluator is to avoid giving a causal interpretation to differences that are due to other factors, not to the intervention. It is necessary to identify the possible sources of bias arising in each specific situation and indicate which methods can overcome these biases, under which assumptions. This is the essence of counterfactual impact evaluation.
Identifying effects from before-after comparisons of beneficiaries
Let us take the first of the basic comparisons, the before-after difference. When the same units are observed both before and after they are exposed to an intervention, the fundamental evaluation problem is that the observed change could be due to the intervention as well as to “other changes” occurring during the same period.
The problem can be formally illustrated by the following decomposition:
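A sketch of this decomposition in symbols, using E for the effect of the intervention and $O_{B-A}$ for the other changes occurring between the two periods, as in the rest of this section. The outcome symbols $Y_{\text{before}}$ and $Y_{\text{after}}$ are notation assumed here for illustration.

```latex
\[
\underbrace{Y_{\text{after}} - Y_{\text{before}}}_{\text{observed change}}
  \;=\;
  \underbrace{E}_{\text{effect of the intervention}}
  \;+\;
  \underbrace{O_{B-A}}_{\text{other changes (maturation, natural evolution)}}
\]
```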

In particular, maturation and natural evolution imply that the social or economic phenomena the intervention is trying to affect evolve naturally over time, in ways that are independent of the intervention. For example, the socio-economic situation of urban neighbourhoods tends to evolve over time, for better or for worse. Thus, the differences observed before and after an urban renewal programme will incorporate both the (possible) effect of the programme and the results of such maturation or natural evolution.
The identification of causal effects from before-after comparisons is generally very problematic. Other than assuming away the problem—assuming temporal stability, that is, that there is no maturation or natural evolution—there is often little that can be done.
Before-after differences do not reveal the true effect of the intervention, unless we assume complete stability of other factors. Formally
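In symbols, with $Y_{\text{before}}$ and $Y_{\text{after}}$ as assumed notation for the outcomes observed before and after the intervention, the stability assumption and its consequence can be sketched as:

```latex
\[
O_{B-A} = 0
\quad\Longrightarrow\quad
E = Y_{\text{after}} - Y_{\text{before}}
\]
```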

The different meanings of the terms “observe”, “assume” and “infer” should be stressed. We observe the before-after difference in outcomes; we can assume $O_{B-A} = 0$, which would allow us to infer that E equals the observed difference. The assumption $O_{B-A} = 0$ would be called the “identifying assumption”, because it would be crucial in giving a causal interpretation to the observed difference.
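The role of the identifying assumption can be illustrated with a small simulation. This is a minimal sketch on invented data: the sample size, the true effect and the size of the natural change are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000            # number of beneficiary units (hypothetical)
true_effect = 2.0     # E: the effect we would like to recover
natural_change = 3.0  # O_{B-A}: change that would have occurred anyway

# Outcome before the intervention
y_before = rng.normal(loc=20.0, scale=5.0, size=n)
# Outcome after = before + natural evolution + effect + noise
y_after = y_before + natural_change + true_effect + rng.normal(0.0, 1.0, size=n)

# What the evaluator actually observes
before_after_diff = float(np.mean(y_after - y_before))

print(f"observed before-after difference: {before_after_diff:.2f}")
print(f"true effect E:                    {true_effect:.2f}")
print(f"bias of the before-after estimate: {before_after_diff - true_effect:.2f}")
```

In this setup the before-after difference recovers E only if `natural_change` is set to zero, which is exactly what the assumption $O_{B-A} = 0$ states.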
Identifying effects by comparing beneficiaries and non-beneficiaries
By far the most common strategy to estimate the causal effect of an intervention is to exploit the fact that some “units” have been exposed to the intervention and others have not, according to some selection mechanism or rule.
For example, eligible enterprises might or might not apply for state aid to finance R&D projects; unemployed workers might or might not participate in a retraining programme after a plant closure; urban neighbourhoods might or might not receive funding for urban renewal projects. Although the existence of universal policies cannot be ignored, they are relatively rare in the case of cohesion policies. In most cases, it is possible to find units that are not exposed to the policy. For simplicity, we consider only the case of a simple binary treatment, where units either receive the treatment implied by the policy or they do not.
The outcomes observed among beneficiaries can be compared to those among non-beneficiaries (assuming the outcomes can be measured for both groups with the same instrument). However, this difference does not by itself reveal the true effect of the intervention on the outcome: it cannot necessarily be interpreted in a causal sense. The causal interpretation depends on the nature of the process that leads some units to be exposed to the intervention, while others are not.
The observed difference can always be thought of as the sum of two components: the true effect of the policy and the difference created by the selection process itself. Neither one can actually be observed; we can only make guesses about them. The following decomposition is fundamental to show the logic behind the impact evaluation methods illustrated in this section of the Sourcebook.
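A sketch of this decomposition in symbols, using E for the true effect and $S_{T-NT}$ for the difference due to selection, as in the rest of this section. The symbols $\bar{Y}_{T}$ and $\bar{Y}_{NT}$, for the average outcome among beneficiaries (treated) and non-beneficiaries (non-treated), are notation assumed here.

```latex
\[
\underbrace{\bar{Y}_{T} - \bar{Y}_{NT}}_{\text{observed difference}}
  \;=\;
  \underbrace{E}_{\text{true effect}}
  \;+\;
  \underbrace{S_{T-NT}}_{\text{selection bias}}
\]
```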

For example, in the case of the support given to firms to invest in new equipment, the difference between the performance of supported and non-supported firms can be decomposed into the true causal effect (possibly zero) of the support and the differences due to the selection process that sorts companies into applicants and non-applicants, and then sorts applicants into recipients and non-recipients. It is very likely that supported and non-supported firms would differ in terms of performance even if the former had not received the support.
The difference observed between beneficiaries and non-beneficiaries does not reveal (identify) the true effect of the intervention unless the selection bias is zero. Formally:
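In symbols, with $\bar{Y}_{T}$ and $\bar{Y}_{NT}$ as assumed notation for the average outcomes of beneficiaries and non-beneficiaries, the condition can be sketched as:

```latex
\[
S_{T-NT} = 0
\quad\Longrightarrow\quad
E = \bar{Y}_{T} - \bar{Y}_{NT}
\]
```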

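Again, the line of reasoning is the following: we only observe the difference in outcomes between beneficiaries and non-beneficiaries; we can assume $S_{T-NT} = 0$, which would allow us to infer that E equals the observed difference. Then $S_{T-NT} = 0$ would be called the “identifying assumption”.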
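How selection bias enters the comparison can be illustrated with a small simulation. This is a minimal sketch on invented data: the unobserved "quality" variable, the self-selection rule and all parameter values are hypothetical. Because the data are simulated, the outcome without support is known for every firm, which is never the case with real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
true_effect = 1.0  # E (hypothetical)

# Unobserved firm "quality" drives both performance and the decision to apply
quality = rng.normal(0.0, 1.0, size=n)
treated = (quality + rng.normal(0.0, 1.0, size=n)) > 0.0

# Outcome each firm would have WITHOUT support (known only in a simulation)
y0 = 10.0 + 2.0 * quality + rng.normal(0.0, 1.0, size=n)
# Observed outcome: supported firms gain the true effect
y = np.where(treated, y0 + true_effect, y0)

# Observed difference between beneficiaries and non-beneficiaries
observed_diff = float(y[treated].mean() - y[~treated].mean())
# Selection bias S_{T-NT}: difference that would exist even without support
selection_bias = float(y0[treated].mean() - y0[~treated].mean())

print(f"observed difference: {observed_diff:.2f}")
print(f"true effect E:       {true_effect:.2f}")
print(f"selection bias S:    {selection_bias:.2f}")
```

By construction the observed difference equals the true effect plus the selection bias, so the comparison overstates the effect whenever better-performing firms are more likely to apply.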
But how does one eliminate selection bias? Eliminating selection bias represents the major challenge in conducting impact evaluations and it has received a lot of attention from statistical, economic and sociological methodologists. A range of methods and techniques are available to (attempt to) deal with it. Knowledge of the selection process is crucial in order to choose the best methods. The methods presented in this section of the Sourcebook have a common goal: to recover the true effect of an intervention on the beneficiaries by forcing $S_{T-NT}$ to be as close as possible to zero.
The ideal strategy to eliminate selection bias: randomization
The ideal strategy to eliminate selection bias is to randomly select who becomes a beneficiary and who becomes a non-beneficiary. In this case we know selection bias is zero. Formally:
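A sketch in symbols, with $\bar{Y}_{T}$ and $\bar{Y}_{NT}$ as assumed notation for the average outcomes of beneficiaries and non-beneficiaries. With random assignment the two groups differ only by chance, so the equality holds on average over repeated assignments rather than exactly in any single sample.

```latex
\[
\text{random assignment}
\quad\Longrightarrow\quad
S_{T-NT} = 0
\quad\Longrightarrow\quad
E = \bar{Y}_{T} - \bar{Y}_{NT}
\]
```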

Unfortunately, randomization is rarely a feasible option for cohesion policies, because it requires that control over “who receives what” is given to the evaluator, who in turn gives it to chance. However, cohesion policies are first and foremost interventions that assign resources to local actors. Randomly assigning resources to local actors for the purpose of evaluation is politically infeasible, because it contradicts the very nature of the allocation process to disadvantaged areas.
At a more disaggregated level, when local actors allocate the resources to specific initiatives or projects, randomization can be used in order to learn “what works”. The learning generated by the use of randomization could motivate some local actors to adopt it despite its difficulties. Randomization as an evaluation strategy is now widely used in the context of developing countries. On the other hand, even when politically feasible, randomization still encounters many limitations (and more detractors than one would expect on the basis of these limitations alone).
Randomization produces impact estimates that are internally valid, but are difficult to generalize: such generalization is key to the usefulness of the result for policy-making. Experiments are often costly and require close monitoring to ensure that they are effectively administered. The potential for denying treatment can pose ethical questions that are politically sensitive. These may reduce the chances of an experiment being considered as a means of evaluating a programme and may also increase the chances of those responsible for delivery of the programme being reluctant to cooperate.
Randomization requires careful planning of interventions, an early involvement of the evaluator and a degree of stability of the environment in which the experiment is taking place: all features that are rarely present in the public sector of EU Member States. Randomization also requires that the intervention is fairly simple, while cohesion policies are traditionally complex, because they address multifaceted, multilevel problems. Complexity is an obstacle to evaluation in general, and to knowledge more generally, but in the case of randomization the clash between methods and circumstances is particularly evident.
There are also practical problems that can bias the estimates. It may be that the implementation of the experiment itself alters the framework within which the programme operates. This is known as ‘randomisation bias’ and can arise for a number of reasons. For instance, if random exclusion from a programme demotivates those who have been randomised out, they may perform more poorly than they might otherwise have done, thus artificially boosting the apparent advantages of participation.
Another endemic problem with experiments is non-compliance. This can take the form of no-shows (those assigned to treatment who drop out before it is completed, sometimes even before it starts) or of crossovers (those assigned to control who manage to receive treatment anyway). With both no-shows and crossovers, non-experimental methods can be used to retrieve the desired parameters. However, this is a second-best position since experiments are designed specifically to avoid this sort of adjustment. Moreover, it is worth noting that the problems of programme no-shows and crossovers are not unique to experiments, although experiments may exacerbate the second problem by creating a pool of people who want to participate but were refused by the randomisation process.
To conclude, any credible strategy for evaluating the impact of cohesion policy must include in its arsenal a number of non-experimental methods and techniques (also referred to as “quasi-experimental”).
The non-experimental strategies to reduce/eliminate selection bias
The general strategy pursued by the evaluator using non-experimental methods can be represented by the following expression:

The following section illustrates four main non-experimental strategies to correct for selection bias and recover the causal effect of the intervention. We examine them in turn.
A. The difference-in-differences identification strategy
Difference-in-differences, or double differencing, is based on the precondition that outcome data (for example, firm sales) are available for beneficiaries and non-beneficiaries (assisted and non-assisted firms), both before and after the intervention (say, the year preceding and the year following the receipt of assistance).

As a consequence, we are also able to observe the pre-intervention difference in outcomes between beneficiaries and non-beneficiaries. Effects are obtained by subtracting this pre-intervention difference from the post-intervention difference. The identifying assumption is that selection bias is constant over time, so that $S_{T-NT} = S_{T-NT|t-1}$.
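The double difference can be sketched in symbols as follows. The notation $\bar{Y}_{T,t}$ and $\bar{Y}_{NT,t}$ for the average outcomes of beneficiaries and non-beneficiaries in period $t$ is assumed here; $t$ is the post-intervention period and $t-1$ the pre-intervention period.

```latex
\begin{align*}
\bar{Y}_{T,t} - \bar{Y}_{NT,t} &= E + S_{T-NT}
  && \text{(post-intervention difference)} \\
\bar{Y}_{T,t-1} - \bar{Y}_{NT,t-1} &= S_{T-NT|t-1}
  && \text{(pre-intervention difference)} \\
\left(\bar{Y}_{T,t} - \bar{Y}_{NT,t}\right)
  - \left(\bar{Y}_{T,t-1} - \bar{Y}_{NT,t-1}\right)
  &= E + S_{T-NT} - S_{T-NT|t-1} = E
  && \text{if } S_{T-NT} = S_{T-NT|t-1}
\end{align*}
```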

The result of the double difference can be interpreted as a causal effect only if the pre-post trend for non-beneficiaries is a good approximation for the (counterfactual) trend among beneficiaries. The plausibility of this assumption can be tested if more periods of pre-intervention data are available.
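The double difference can be illustrated with a small simulation. This is a minimal sketch on invented data in which the identifying assumption holds by construction: the gap between the two groups is constant over time and both share the same trend. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
true_effect = 1.5    # E (hypothetical)
common_trend = 0.8   # change over time shared by both groups
selection_gap = 3.0  # constant gap between the groups (selection bias)

treated = rng.random(n) < 0.5
base = 10.0 + selection_gap * treated

# Outcomes in the pre-intervention and post-intervention periods
y_pre = base + rng.normal(0.0, 1.0, size=n)
y_post = base + common_trend + true_effect * treated + rng.normal(0.0, 1.0, size=n)

# Single differences between beneficiaries and non-beneficiaries
post_diff = float(y_post[treated].mean() - y_post[~treated].mean())
pre_diff = float(y_pre[treated].mean() - y_pre[~treated].mean())

# Double difference: post-intervention gap minus pre-intervention gap
did = post_diff - pre_diff

print(f"post-intervention difference: {post_diff:.2f}")
print(f"pre-intervention difference:  {pre_diff:.2f}")
print(f"difference-in-differences:    {did:.2f}  (true effect {true_effect:.2f})")
```

If the two groups were given different trends in this sketch, the double difference would absorb the gap in trends and would no longer recover the true effect.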
Difference-in-differences in details
B. The matching identification strategy
The matching strategy is based on the possibility of observing all the relevant characteristics X of both beneficiaries and non-beneficiaries, and of picking the non-beneficiaries that “look like” the beneficiaries along these characteristics.

Once the matching is performed, the effect of the intervention is identified by the remaining difference in outcomes between beneficiaries and matched non-beneficiaries, under the assumption that matching has also eliminated selection bias.

The plausibility of the elimination of selection bias by matching cannot be tested: it becomes more credible as more and more X’s related to the selection process are observable.
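The logic of matching can be illustrated with a small simulation. This is a minimal sketch on invented data, using a single observed characteristic and nearest-neighbour matching with replacement. It is a simplification chosen for illustration, not the propensity score procedure described in the detailed chapter. Selection depends only on the observed characteristic, so the identifying assumption holds by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000
true_effect = 2.0  # E (hypothetical)

# One observed characteristic X that drives both selection and the outcome
x = rng.normal(0.0, 1.0, size=n)
p_treat = 1.0 / (1.0 + np.exp(-1.5 * x))
treated = rng.random(n) < p_treat
y = 5.0 + 3.0 * x + true_effect * treated + rng.normal(0.0, 1.0, size=n)

# Naive comparison of beneficiaries and non-beneficiaries
naive = float(y[treated].mean() - y[~treated].mean())

x_t, y_t = x[treated], y[treated]
x_c, y_c = x[~treated], y[~treated]

# For each beneficiary, find the non-beneficiary closest on X
# (nearest-neighbour matching with replacement)
nearest = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)

# Average difference between beneficiaries and their matched non-beneficiaries
matched = float(np.mean(y_t - y_c[nearest]))

print(f"naive difference:   {naive:.2f}")
print(f"matched difference: {matched:.2f}  (true effect {true_effect:.2f})")
```

If selection also depended on a characteristic left out of `x`, the matched difference would remain biased, and nothing in the data would reveal it.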
Propensity score matching in details
C. The discontinuity identification strategy
The strategy is based on the idea of discontinuity in treatment around a threshold, which applies mainly to those situations in which some units are made eligible for the intervention and others are made ineligible by some well defined rule, typically some administrative rule. The two groups are similar in other respects, but they are (sharply) divided according to their position with respect to a threshold, indicated with C*: those on one side of the threshold are exposed to the policy, those on the other side are not.
The essential idea for identifying the effect is that around C* one has a situation similar to randomization. Let us consider a neighbourhood of C*. Formally:

The effect of the treatment (around the threshold) is obtained by the difference in outcomes around the threshold. The identifying assumption (more credible than most) is that selection bias is zero around the threshold.
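A sketch in symbols, where $N(C^{*})$ denotes a neighbourhood of the threshold and $\bar{Y}_{T}$, $\bar{Y}_{NT}$ the average outcomes of exposed and non-exposed units. All three symbols are notation assumed here.

```latex
\[
S_{T-NT} = 0 \;\text{ within } N(C^{*})
\quad\Longrightarrow\quad
E_{\text{at } C^{*}} =
  \left(\bar{Y}_{T} - \bar{Y}_{NT}\right)\Big|_{N(C^{*})}
\]
```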

It should be noted that the estimated effect is a local effect: it is more credible (internal validity) but less generalizable (external validity).
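The local nature of the estimate can be illustrated with a small simulation. This is a minimal sketch on invented data with a sharp eligibility rule. The score variable, the threshold, the bandwidth and all parameter values are hypothetical, and a simple difference in means within a narrow window stands in for the regression-based estimators normally used in practice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
true_effect = 1.0  # E (hypothetical)
cutoff = 0.0       # C*: eligibility threshold
bandwidth = 0.05   # half-width of the neighbourhood of C*

# Variable on which the administrative rule is based
score = rng.uniform(-1.0, 1.0, size=n)
# Sharp rule: units at or above the threshold are exposed to the policy
treated = score >= cutoff
# The outcome depends on the score itself, so the two groups differ anyway
y = 2.0 + 1.5 * score + true_effect * treated + rng.normal(0.0, 0.5, size=n)

# Comparison using all units: contaminated by the dependence of y on score
naive = float(y[treated].mean() - y[~treated].mean())

# Comparison restricted to units close to the threshold
near = np.abs(score - cutoff) <= bandwidth
local = float(y[near & treated].mean() - y[near & ~treated].mean())

print(f"difference using all units:        {naive:.2f}")
print(f"difference around the threshold:   {local:.2f}  (true effect {true_effect:.2f})")
```

A narrower window reduces the residual bias from the dependence of the outcome on the score, but leaves fewer observations, so the estimate becomes noisier.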
Discontinuity design in details
D. The instrumental variables identification strategy
The fourth strategy is based on the idea of involuntary variation (instrumental variables, in the official jargon): situations in which the receipt of treatment is partially determined by an extraneous factor. As will be apparent in the specific chapter, this identification strategy is notably more complex. The point of departure is that the structural effect of interest E cannot be recovered with any strategy based on the adjustment of S: there is no way of forcing $S_{T-NT}$ to go to zero.

However the existence of the extraneous factor Z, which influences participation (to keep things simple we assume this to be binary), allows a way around the problem. One actually needs two identifying assumptions. The first is that the extraneous factor has an influence on T, in the sense that those with Z=1 participate in the policy with higher probability than those with Z=0. Thus we can write the effect of Z on T as

The second assumption is that the true effect of Z on the outcome can be recovered without any bias. This can be written as:

Thus Z induces two effects: one on the outcome, one on participation. Neither effect is of much interest from a policy perspective: we are interested in the effect E of participation. It can be shown that E can be obtained as the ratio of the two effects of Z:
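A sketch of this ratio in symbols, commonly known as the Wald estimator. The notation is assumed here: $\bar{Y}_{Z=1}$ and $\bar{Y}_{Z=0}$ are the average outcomes, and $\bar{T}_{Z=1}$ and $\bar{T}_{Z=0}$ the shares of participants, among units with Z=1 and Z=0 respectively.

```latex
\[
E \;=\;
\frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}
     {\bar{T}_{Z=1} - \bar{T}_{Z=0}}
\;=\;
\frac{\text{effect of } Z \text{ on the outcome}}
     {\text{effect of } Z \text{ on participation}}
\]
```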

The proof of this result requires some algebra and is difficult to convey intuitively.
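Although the proof is not reproduced here, the result can be checked numerically. This is a minimal sketch on invented data, assuming a binary extraneous factor Z, an effect of participation that is the same for all units, and hypothetical parameter values. Z influences participation but has no direct influence on the outcome.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
true_effect = 2.0  # E (hypothetical, identical for all units)

# Extraneous binary factor Z, unrelated to the units' characteristics
z = rng.random(n) < 0.5
# Unobserved characteristic that drives both participation and the outcome
ability = rng.normal(0.0, 1.0, size=n)

# Participation T depends on the unobserved characteristic and is encouraged by Z
t = (ability + 1.0 * z + rng.normal(0.0, 1.0, size=n)) > 0.5
# Outcome: Z affects y only through participation
y = 1.0 + true_effect * t + 2.0 * ability + rng.normal(0.0, 1.0, size=n)

# Naive comparison of participants and non-participants (selection bias)
naive = float(y[t].mean() - y[~t].mean())

# The two effects of Z
effect_z_on_y = float(y[z].mean() - y[~z].mean())
effect_z_on_t = float(t[z].mean() - t[~z].mean())

# Ratio of the two effects
ratio = effect_z_on_y / effect_z_on_t

print(f"naive difference:         {naive:.2f}")
print(f"effect of Z on outcome:   {effect_z_on_y:.3f}")
print(f"effect of Z on treatment: {effect_z_on_t:.3f}")
print(f"ratio of the two effects: {ratio:.2f}  (true effect {true_effect:.2f})")
```

The ratio is only as stable as its denominator: when Z has a weak influence on participation, small sampling errors in the two effects translate into large errors in the estimate.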
Instrumental variables in details