Difference-in-differences


Description and purposes of the tool

The impact of a policy on an outcome can be estimated by computing a double difference, one over time (before-after) and one across subjects (between beneficiaries and non-beneficiaries). In its simplest form, this method requires only aggregate data on the outcome variable: no covariates or microdata are strictly necessary. If sample averages are available for beneficiaries and non-beneficiaries for at least two time periods, the difference-in-differences (DID) method produces impact estimates that are in principle more plausible than those based on a single difference (either over time or between groups). However, some untestable assumptions are still needed to identify impacts through double differencing.

There are two ways to explain how double differencing produces impact estimates. The most intuitive is to start with the difference in outcomes between beneficiaries and non-beneficiaries, measured after the intervention has taken place (for example, the difference in average employment between supported and non-supported SMEs a year after the support has been provided). As seen in the introductory chapter, such a difference does not reveal the effect of the intervention, since beneficiaries tend to differ from non-beneficiaries even in the absence of the intervention. This is what we called selection bias. Now suppose we also have data on the outcome variable for beneficiaries and non-beneficiaries observed before the intervention takes place. Subtracting the pre-intervention difference in outcomes from the post-intervention difference eliminates one kind of selection bias, namely the kind related to time-invariant individual characteristics. In other words, if what differentiates beneficiaries and non-beneficiaries is fixed in time, subtracting the pre-intervention difference eliminates selection bias and produces a plausible estimate of the impact of the intervention.
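This reasoning can be written as a short derivation. The additive model below (a group effect μ_g, a period effect λ_t common to both groups, and a constant impact τ for the treated group after the intervention) is an illustrative assumption, not part of the original text; the common λ_t is precisely what makes the time-invariant selection bias μ_T − μ_C cancel.

```latex
\begin{align*}
\bar{Y}_{gt} &= \mu_g + \lambda_t + \tau D_{gt}, \quad D_{T1} = 1,\; D_{T0} = D_{C0} = D_{C1} = 0 \\
\bar{Y}_{T1} - \bar{Y}_{C1} &= \tau + (\mu_T - \mu_C) && \text{post: impact plus selection bias} \\
\bar{Y}_{T0} - \bar{Y}_{C0} &= \mu_T - \mu_C && \text{pre: selection bias only} \\
(\bar{Y}_{T1} - \bar{Y}_{C1}) - (\bar{Y}_{T0} - \bar{Y}_{C0}) &= \tau
\end{align*}
```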

 

A stylized example

URBAN I and II were Community Initiatives funded through the Structural Funds to promote regeneration in urban areas suffering from high unemployment, high levels of poverty and social exclusion, and poor environmental conditions.[1] Evaluating the success of these programmes involves answering causal questions, such as “did the urban regeneration programmes produce a positive effect on the socio-economic conditions of the areas involved?” The difference-in-differences method can provide an answer as long as the outcome of interest can be measured both before and after the implementation of the urban regeneration programme in a representative sample of both participating and non-participating urban areas.

Let us take the impact on the unemployment rate: it is estimated by subtracting the difference observed between the two groups before the intervention from the difference observed after the intervention. Figure 1 provides a graphical illustration of this interpretation of the difference-in-differences method. On the horizontal axis we have time, with two points, one before and one after the urban regeneration initiative was implemented: say, 2000 and 2006, as in the URBAN II initiative. On the vertical axis we put the unemployment rate. Each of the four circles in the graph represents an average: two are taken in 2000 and two in 2006, respectively among the 70 urban areas that received funding for urban regeneration and among a sample of 70 comparable areas, located in the same cities but not given any funding.[2]

Obviously, the difference observed between the two groups of areas in 2006 is not the impact of the programme: this difference could be caused entirely by the selection process, that is, by areas with higher unemployment rates having better chances of being admitted into the programme. If taken as an indication of programme impact, the difference shown in the graph would represent a disappointing result: that URBAN produces no useful impact on the labour market, because after the intervention the unemployment rate is higher in the funded areas than in the unfunded ones.

The fallacy of this interpretation becomes evident once one uses data on the unemployment rate observed before the intervention. Figure 1 shows that in 2000 the difference in the unemployment rate between the two groups of areas was even larger than in 2006. It is the reduction in the unemployment rate gap that can be interpreted as the impact of the programme.

However, the validity of this conclusion depends on a crucial assumption: that in the absence of URBAN, the trend among funded areas would have been similar to that of the unfunded areas. Graphically, this is tantamount to drawing a dotted line parallel to the trend observed among unfunded areas, but starting where the funded areas are in 2000. This dotted line points to a square in 2006: this is the counterfactual, our estimate of what would have happened to the unemployment rate in URBAN areas had URBAN not been implemented.
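The construction of the dotted line can be sketched in a few lines of Python. The four unemployment rates used below are hypothetical values chosen only to mimic the pattern described above (funded areas higher in both years, with a narrower gap in 2006); they are not URBAN data, and the function name is illustrative.

```python
def did_counterfactual(treated_pre, treated_post, control_pre, control_post):
    """Return (counterfactual, impact) for a two-group, two-period DID.

    The counterfactual shifts the treated group's pre-period level by the
    change observed in the control group: the dotted line drawn parallel
    to the control trend, starting from the treated group's 2000 value.
    """
    counterfactual = treated_pre + (control_post - control_pre)
    impact = treated_post - counterfactual
    return counterfactual, impact


# Hypothetical unemployment rates (%) for illustration only; not URBAN data.
cf, impact = did_counterfactual(treated_pre=20.0, treated_post=16.0,
                                control_pre=12.0, control_post=11.0)
print(cf, impact)  # 19.0 -3.0
```

Because the outcome is an unemployment rate, a negative impact here means the programme is credited with a three-point reduction relative to the counterfactual square.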

 

An alternative explanation

An alternative way to explain how double differencing identifies the impact of a policy is to start from the change observed over time among beneficiaries. This change cannot be interpreted as the impact of the policy, because many other factors and processes unfolding over time, besides the intervention, might have caused it. One way to take this “natural dynamics” into account is to compute the change over time observed among non-beneficiaries during the same period. Subtracting the change observed among non-beneficiaries from that observed among beneficiaries produces an estimate of the impact of the programme. It is the same estimate as that shown in Figure 1, because it depends on the same crucial assumption: that in the absence of the intervention the trend among the two groups of areas would have been the same. This different view of the same result is illustrated in Figure 2.

The results cannot differ from the previous ones: the four points did not move, and the dotted line is parallel to the same solid line and thus leads to the same counterfactual. What is different is the line of reasoning used to interpret the data. In the first case, one stresses selection bias and attempts to correct it by subtracting pre-intervention differences. In the second case, one stresses the other type of distortion, due to natural dynamics, and attempts to correct it by subtracting the change observed among non-beneficiaries. In both cases, one really makes the same assumption: that of “parallelism” between what actually happened and what would have happened without the policy.
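The equivalence of the two readings can be checked directly. The sketch below writes each line of reasoning as its own function; the averages passed in are hypothetical and serve only as an illustration, since the two expressions are algebraically identical for any four numbers.

```python
def did_across_groups(treated_pre, treated_post, control_pre, control_post):
    """First reading: post-period gap minus pre-period gap (corrects selection bias)."""
    return (treated_post - control_post) - (treated_pre - control_pre)


def did_over_time(treated_pre, treated_post, control_pre, control_post):
    """Second reading: treated change minus control change (corrects natural dynamics)."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical averages for illustration only.
values = dict(treated_pre=20.0, treated_post=16.0, control_pre=12.0, control_post=11.0)
print(did_across_groups(**values), did_over_time(**values))  # -3.0 -3.0
```

Rearranging the four terms of either expression yields the other, which is why both readings necessarily lead to the same counterfactual.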


 

[1]The first round of the URBAN programme was launched in 1994 and ran until 1999. URBAN I supported 118 European cities in 15 Member States and had a Community contribution of €950 million. Its successor, URBAN II, supported 70 programmes across 14 countries and received €754 million from the European Regional Development Fund (ERDF).

[2]ECOTEC (2009), in an attempt to apply DID to the URBAN II programme, compared the unemployment rate of the URBAN II area with the rate for the city as a whole.

Circumstances in which it is applied

The applicability of the DID method requires that the outcome be replicable over time, that is, that equivalent measurements can be taken repeatedly in successive time periods, independently of the existence of the policy. Many, if not most, outcomes relevant for public policy are replicable over time for the same units, such as the sales or profits of firms, the income of individuals or the consumption of households. When the measures are taken on the same units over time, we have panel data.

Some outcomes have only one meaningful realization for each individual unit, such as the duration of unemployment after a job loss, or the weight of babies at birth. In these cases replicability can be obtained at a more aggregate level by using successive cohorts of individuals experiencing the same event. For example, successive cohorts of individuals entering unemployment will produce distinct estimates of the average duration of unemployment.
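The two data situations lead to slightly different computations, sketched below under hypothetical record layouts (the tuple formats and function names are illustrative assumptions, not taken from any particular dataset). With panel data each unit's own change can be computed first; with successive cohorts only the four group-period means can be compared.

```python
from collections import defaultdict
from statistics import mean


def did_from_panel(records):
    """DID from a balanced panel; records are (unit, treated, pre_value, post_value).

    Each unit's own change is computed first; the DID is the mean change among
    treated units minus the mean change among untreated units.
    """
    treated = [post - pre for _, d, pre, post in records if d]
    control = [post - pre for _, d, pre, post in records if not d]
    return mean(treated) - mean(control)


def did_from_cohorts(records):
    """DID from repeated cross-sections; records are (treated, period, outcome).

    Different individuals (e.g. successive cohorts entering unemployment) are
    observed in each period, so only the four cell means can be compared.
    """
    cells = defaultdict(list)
    for d, period, y in records:
        cells[(bool(d), period)].append(y)
    m = {key: mean(vals) for key, vals in cells.items()}
    return (m[(True, "post")] - m[(True, "pre")]) - (m[(False, "post")] - m[(False, "pre")])


# Hypothetical data for illustration only.
panel = [("a", 1, 10.0, 13.0), ("b", 1, 12.0, 14.0), ("c", 0, 11.0, 12.0), ("d", 0, 9.0, 10.0)]
cohorts = [(1, "pre", 30.0), (1, "pre", 34.0), (1, "post", 26.0), (1, "post", 28.0),
           (0, "pre", 20.0), (0, "pre", 22.0), (0, "post", 19.0), (0, "post", 21.0)]
print(did_from_panel(panel))      # 1.5
print(did_from_cohorts(cohorts))  # -4.0
```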

Another issue relevant for the applicability of DID is whether data on the outcome variable are routinely collected as part of official statistics, such as the unemployment rate and per capita GDP, or must instead be collected ad hoc. In the latter case, a serious obstacle to the applicability of DID often comes from the fact that nobody thought of collecting such data before the intervention, particularly at the level of geographical detail that becomes relevant after the policy is implemented.[1] If comparable pre-intervention data are lacking, one can resort to retrospective measurement, taken after the policy is implemented but referring to both the pre-intervention and the post-intervention periods. The danger of such a strategy is contamination between measures that refer to different time periods but are collected in the same interview.

The applicability of the method also requires that the intervention be of a discrete (binary) nature: one needs units that are exposed and units that are not exposed to the policy. Interventions of a continuous nature cannot be easily analysed with this method.[2]

 

[1]ECOTEC (2009) documents the difficulties in obtaining unemployment rate data for urban areas for the years 2000 and 2006.

[2]The reader is referred to the discussion in Chapter 5 of Angrist and Pischke (2008) on many issues relevant to DID, such as the comparison with fixed-effects models, the use of covariates, and extensions to multiple periods and continuous treatments.

The main steps involved

In order to illustrate the steps involved, a real application of the DID method will be used as an example: it is taken from an evaluation of the impact of the Structural Funds in Sweden during the period 1995 to 1999 (ITPS 2004). The study was sponsored by the Swedish Institute for Growth Policy Studies and conducted by Oxford Research and the University of Umeå: it consists of a “comparison between the group of municipalities that have been recipients of structural fund projects with the group of municipalities that have not received structural funds”.

Step 1.   Defining the outcome variable(s)

The analysis can be conducted for as many outcome variables as there are data for. The Swedish study focuses “on the trends in three goal indicators (per capita income, employment and population) in order to see the effects the structural funds have had in the relatively poorest Swedish municipalities”. The analysis is then extended to cover intermediate outcomes, to explore the mechanisms behind the effects (or the lack thereof). We will report results for one outcome variable, the annual growth in per capita income, because the study conducts most of the analysis with respect to this variable.

 

Step 2.   Defining the time dimension

In the Swedish study “the two periods that are compared are the period 1990–1995 and the period 1995–1999. The first period ends in the year the geographical programme was introduced and the second period includes the entire period of time covered by the geographical programme. The periods have been selected in such a way that they cover approximately the same length of time.” While the latter is not a requirement, it is important that the choice of periods clearly distinguishes a “before” the intervention period and an “after” period.

 

Step 3.   Computing the double difference

The basic analysis is simply a matter of computing averages for the two groups in the two time periods, thus obtaining values corresponding to the four circles displayed in Figures 1 and 2. These averages are best displayed in the following format, with the groups being compared on the rows and the time periods in the columns. The simple differences are found in the two margins, while the “difference between the differences” is shown in the bottom-right cell of the table.

The table can be read in two different ways, in line with the two interpretations discussed earlier. If one reads the columns first, the focus is on the differences between the two groups of municipalities. It turns out that the two groups did not differ much in terms of per capita income growth in the five years leading up to the 1995-99 Structural Funds intervention. They differ more sharply after the policy is enacted, in the sense that the non-supported municipalities experience higher growth in per capita income. The DID estimate is thus the negative post-intervention difference minus the almost-zero pre-intervention difference, which yields a negative DID estimate.

The importance of double differencing can be more fully appreciated if one reads the table by rows. The first row taken by itself would lead one to conclude that the intervention is extremely effective: an average growth rate of 2.35 percent has become a more substantial 4.45 percent, almost double. However, the other municipalities fared even better, with a 2.80 percentage-point increase in the rate of growth, against 2.10 points for the supported ones. The DID estimate is obviously the same as before, -0.70 percentage points.
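The arithmetic behind the table can be sketched as follows. The figures 2.35 and 4.45 and the 2.80-point change are quoted in the text; 2.28 is the intercept α reported in Step 5, and the non-supported 1995-99 average (2.28 + 2.80) is implied rather than quoted. This is therefore a reconstruction from rounded figures, not the report's own table, and the function name is illustrative.

```python
# Annual growth in per capita income (%), Swedish municipalities.
# 2.35, 4.45 and the 2.80-point change are quoted in the text; 2.28 is the
# regression intercept alpha from Step 5; the non-supported 1995-99 value is
# implied (2.28 + 2.80), not quoted.
means = {
    ("supported", "1990-95"): 2.35,
    ("supported", "1995-99"): 4.45,
    ("not supported", "1990-95"): 2.28,
    ("not supported", "1995-99"): 2.28 + 2.80,
}


def did_table(m, groups=("supported", "not supported"), periods=("1990-95", "1995-99")):
    """Return the row changes, column gaps and double difference of a 2x2 table."""
    g1, g0 = groups
    pre, post = periods
    row_change = {g: m[(g, post)] - m[(g, pre)] for g in groups}   # read by rows
    col_gap = {p: m[(g1, p)] - m[(g0, p)] for p in periods}        # read by columns
    did = row_change[g1] - row_change[g0]
    return row_change, col_gap, did


rows, cols, did = did_table(means)
for g, change in rows.items():
    print(f"{g:>14}: change over time {change:+.2f}")
for p, gap in cols.items():
    print(f"{p:>14}: supported minus not supported {gap:+.2f}")
print(f"DID estimate: {did:+.2f}")
```

Reading by columns gives a pre-intervention gap of about +0.07 and a post-intervention gap of about -0.63; their difference coincides with the row-based DID of -0.70.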

The following is the comment in the report: “Where per capita income is concerned, the result is that the municipalities in receipt of support were the more successful of the two groups during the period 1990 to 1995 when they did not receive any support. However, during the period 1995 to 1999 municipalities in receipt of support were significantly less successful compared to municipalities not in receipt of support. Even if development trends are positive in both groups, it is the group not in receipt of support that is most successful. The difference-in-difference rating is –0.70 which shows that annual growth in municipalities in receipt of support is 0.70 percentage points lower than in municipalities not in receipt of support. This is thus a sign of an increasing difference between the two types of municipalities.”

The report is careful not to attach a strong causal interpretation to this conclusion, speaking only of an “increasing difference between the two types of municipalities”. In other parts of the report we find stronger statements, for example: “The main conclusion of the evaluation is that it is not possible to trace any effects of the EC’s geographical programmes on overall regional development. During the period the programmes were studied, the regional differences have tended to intensify rather than be leveled out.”

It must be stressed that any causal interpretation rests on a single, untestable assumption: that in the absence of the programme the supported municipalities would have followed the same growth trend as the non-supported ones. In this particular case, this assumption seems implausible. Most likely the supported municipalities were on a lower growth path than the non-supported ones. If this were the case, what seems to be a negative impact could well turn into a zero impact, or a positive one.
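How the conclusion depends on this assumption can be made explicit with a simple sensitivity calculation. Suppose, as a purely hypothetical parameter, that without the programme the supported municipalities would have gained `trend_gap` points less growth than the non-supported ones. Their counterfactual change is then the non-supported change minus `trend_gap`, so the implied impact is the DID estimate plus `trend_gap`.

```python
def adjusted_impact(did_estimate, trend_gap):
    """Impact implied if, absent the programme, the supported group's growth
    would have changed `trend_gap` points less than the non-supported group's.

    Plain DID assumes trend_gap = 0; a lower counterfactual path for the
    supported group (positive trend_gap) raises the implied impact.
    """
    return did_estimate + trend_gap


# Hypothetical trend gaps, in percentage points of annual growth.
for gap in (0.0, 0.35, 0.70, 1.00):
    print(f"trend gap {gap:.2f} -> implied impact {adjusted_impact(-0.70, gap):+.2f}")
```

A gap of 0.70 points is enough to turn the estimated -0.70 into a zero impact, and any larger gap makes it positive; the data in the simple two-period design cannot tell which gap is realistic.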

 

Step 4.   Relaxing the assumption of “parallelism”

There are two possible extensions of the simple DID method: both require the availability of “more data” in order to relax the parallelism assumption. If one had outcome data for more pre-intervention time periods (in the example, for the previous five years, from 1985 to 1989), one could directly test the hypothesis that the growth paths of the two groups were the same in the absence of the intervention. If it turned out that the growth paths were indeed different, this information could be incorporated into the analysis.
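One common way to use an extra pre-intervention period is sketched below: a "placebo" DID computed between the two pre-intervention periods, which should be close to zero if the trends are parallel, and a trend-adjusted DID that subtracts it from the main estimate. The adjustment assumes, as an illustrative choice, that the differential change observed before the programme would have repeated itself unchanged afterwards; the values used are hypothetical.

```python
def placebo_did(t_early, t_pre, c_early, c_pre):
    """DID between two pre-intervention periods; near zero if trends are parallel."""
    return (t_pre - t_early) - (c_pre - c_early)


def trend_adjusted_did(t_early, t_pre, t_post, c_early, c_pre, c_post):
    """Main DID minus the pre-intervention placebo DID.

    Assumes (illustratively) that the differential change seen between the two
    pre-intervention periods would have recurred without the programme.
    """
    main = (t_post - t_pre) - (c_post - c_pre)
    return main - placebo_did(t_early, t_pre, c_early, c_pre)


# Hypothetical group averages for three periods (early pre, pre, post).
args = dict(t_early=3.0, t_pre=2.0, t_post=2.5, c_early=2.0, c_pre=2.0, c_post=2.5)
print(placebo_did(args["t_early"], args["t_pre"], args["c_early"], args["c_pre"]))  # -1.0
print(trend_adjusted_did(**args))  # 1.0
```

In this made-up example the simple DID is zero, but the treated group was already falling behind before the intervention, so the trend-adjusted estimate is positive.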

The alternative to more outcome data is data on other variables that both influence the outcome variable and are correlated with treatment status. However, incorporating other variables entails a big loss in terms of simplicity: it requires a shift from simple and intuitive differences between means to a regression model estimated on microdata.

 

Step 5.   Using regression to replicate the DID results

Let us first see how the results shown in Table 1 can be obtained through a regression model; then we will add covariates to the model. Using the same data that produced the DID estimate, one can easily estimate the following regression equation:

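$$Y_{it} = \alpha + \beta S_i + \gamma P_t + \delta \, (S_i \times P_t) + \varepsilon_{it}$$

where Y_{it} is the annual growth in per capita income of municipality i in period t, S_i equals 1 for municipalities that received Structural Funds support and 0 otherwise, P_t equals 1 for the 1995-99 period and 0 for 1990-95, and ε_{it} is an error term.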

The following are the estimates reported in the study:

By comparing the estimates in Table 2 with those in Table 1, one can easily see that the regression exactly reproduces the estimates produced by the differences in means. More precisely, the estimate of α, 2.28, corresponds to the average income growth for the municipalities without support in the 1990-95 period. The initial difference between the two groups is reproduced by β, a statistically insignificant 0.071. By contrast, the pre-post difference for the municipalities without support, 2.80, reproduced by γ, is highly significant. Finally, the DID impact estimate corresponds to δ and turns out to be negative and significant.
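The exact correspondence between regression coefficients and differences in means can be verified on any dataset. The sketch below uses NumPy's least-squares solver on synthetic data (the coefficients used to generate the data loosely echo the magnitudes in the text but are not the Swedish data); because the model has one parameter per group-period cell, its coefficients reproduce the four cell means exactly, whatever the noise.

```python
import numpy as np


def did_regression(y, supported, post):
    """OLS of y on a constant, the group dummy, the period dummy and their product.

    Returns (alpha, beta, gamma, delta). The model is saturated for a 2x2
    design, so the fitted values equal the four cell means.
    """
    X = np.column_stack([np.ones_like(y), supported, post, supported * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 50  # hypothetical number of municipalities per group-period cell
supported = np.repeat([0.0, 0.0, 1.0, 1.0], n)
post = np.repeat([0.0, 1.0, 0.0, 1.0], n)
# Synthetic outcome for illustration only; not the Swedish data.
y = 2.0 + 0.1 * supported + 2.8 * post - 0.7 * supported * post + rng.normal(0, 1, 4 * n)

alpha, beta, gamma, delta = did_regression(y, supported, post)


def cell_mean(s, p):
    """Average outcome in one group-period cell."""
    return y[(supported == s) & (post == p)].mean()


did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(np.isclose(delta, did_means), np.isclose(alpha, cell_mean(0, 0)))  # True True
```

With real panel data the point estimates are unaffected, but the standard errors would normally need to account for repeated observations of the same municipality.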

Why then go to the trouble of estimating a regression, if the results are identical to those obtained by simple differences? The main reason is that other variables can be added to the right-hand side of the equation, allowing a different way of relaxing the stringent parallelism assumption.

 

Step 6.   Including covariates into the regression

The Swedish study adds two covariates to the regression model. The first, defined as a cycle indicator, is the percentage change in the proportion of the population employed in the private sector in the municipality; the second, defined as a structural indicator, is the percentage change in the proportion of the population aged 25–64 in the municipality. According to the report, these variables are intended to “test whether any periodical and/or structural changes have taken place between the two periods of time that can possibly better explain regional development than support from the EC’s geographical programmes”.

The addition of the two variables to the model (including their interaction terms with the existing regressors) changes the estimate of the Structural Funds impact from negative and significant to negative but not significantly different from zero, as shown in Table 3.
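The mechanism can be illustrated with a small synthetic example. In the sketch below the outcome depends only on a covariate (think of a cyclical indicator) whose level moves differently across the four group-period cells, and there is no programme effect at all. The covariate is centred and interacted with the group dummy, the period dummy and their product, an assumption meant to mirror the "interaction terms" mentioned above, so δ is the DID evaluated at the mean of the covariate. The exact specification of the Swedish study is not reproduced here.

```python
import numpy as np


def did_with_covariate(y, supported, post, x=None):
    """Return the DID coefficient (delta), optionally controlling for covariate x.

    x is centred at its overall mean and interacted with the group dummy, the
    period dummy and their product, so delta is the DID at the mean of x.
    """
    cols = [np.ones_like(y), supported, post, supported * post]
    if x is not None:
        xc = x - x.mean()
        cols += [xc, xc * supported, xc * post, xc * supported * post]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[3]


# Hypothetical data: three units per cell, no programme effect, but the
# covariate's average differs across cells.
supported = np.repeat([0.0, 0.0, 1.0, 1.0], 3)
post = np.repeat([0.0, 1.0, 0.0, 1.0], 3)
x = np.repeat([0.0, 2.0, 1.0, 1.0], 3) + np.tile([-1.0, 0.0, 1.0], 4)
y = 1.0 + 0.5 * x  # the outcome depends on x only

raw = did_with_covariate(y, supported, post)
adjusted = did_with_covariate(y, supported, post, x)
print(round(raw, 6), abs(adjusted) < 1e-9)
```

The simple DID is about -1.0, entirely produced by the covariate, while the covariate-adjusted δ is essentially zero: the same kind of shift the Swedish study reports when its two indicators are added.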

In the words of the Swedish report “there are no significant differences in the extended model between the two groups of municipalities in the first period. The difference-in-difference estimate is still negative but it is not significant.”

The other interaction terms, not shown in Table 3, allow the evaluators to assess how the effects of the two explanatory variables vary with funding status and over time. The report continues: “If we look more closely at the two explanatory variables, it can be seen for example that the variable proportion of private sector employees is the driving force for income growth in the group of municipalities not in receipt of support, particularly in the period 1995 to 1999, the higher economic activity in Sweden during these years seem to have benefited these municipalities.” From the complex interaction structure between the explanatory variables and the treatment and period indicators, we calculated that the effect of a percentage-point increase in private employment has the following pattern:

Effect of the cycle indicator on per capita income growth, by group and period

Group              Unfunded   Unfunded   Funded    Funded
Period             1990-95    1995-99    1990-95   1995-99
Cycle indicator     0.027      0.308     -0.019    -0.164

It can be seen that the effect of the cycle indicator is sizeable and positive only for the municipalities not in receipt of support, in the period 1995-99. 
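How cell-specific effects like these follow from interaction terms can be sketched as below. The sketch assumes, as a hypothetical reading of the model described above, that the covariate's slope in each cell is θ0 + θS·S + θP·P + θSP·S·P; it then recovers the interaction terms implied by the cycle-indicator table above. The report's actual coefficients are not reproduced here, and the function names are illustrative.

```python
def cell_slopes(theta0, theta_s, theta_p, theta_sp):
    """Covariate slope in each group-period cell of a fully interacted DID model."""
    return {
        ("unfunded", "1990-95"): theta0,
        ("unfunded", "1995-99"): theta0 + theta_p,
        ("funded", "1990-95"): theta0 + theta_s,
        ("funded", "1995-99"): theta0 + theta_s + theta_p + theta_sp,
    }


def interaction_terms(slopes):
    """Invert cell_slopes: recover (theta0, theta_s, theta_p, theta_sp)."""
    u0 = slopes[("unfunded", "1990-95")]
    u1 = slopes[("unfunded", "1995-99")]
    f0 = slopes[("funded", "1990-95")]
    f1 = slopes[("funded", "1995-99")]
    return u0, f0 - u0, u1 - u0, (f1 - f0) - (u1 - u0)


# Cycle-indicator effects from the table above.
cycle = {
    ("unfunded", "1990-95"): 0.027,
    ("unfunded", "1995-99"): 0.308,
    ("funded", "1990-95"): -0.019,
    ("funded", "1995-99"): -0.164,
}
theta = interaction_terms(cycle)
print([round(t, 3) for t in theta])
```

The implied terms are roughly 0.027, -0.046, 0.281 and -0.426: the large negative triple interaction θSP is the regression counterpart of the observation that the cyclical boost of 1995-99 reached only the unfunded municipalities.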

As far as the structural indicator is concerned, the report states that “where the variable proportion of the population in the age group 25–64 years is concerned, the picture is more diffuse. Where the municipalities in receipt of support are concerned, in the period 1990 to 1995 there was a weakly significant negative relationship between this proportion of the population and income growth, while for the period 1995–1999 there was a weakly significant positive relationship.”

 

Effect of the structural indicator on per capita income growth, by group and period

Group                  Unfunded   Unfunded   Funded    Funded
Period                 1990-95    1995-99    1990-95   1995-99
Structural indicator    0.185     -0.176     -0.262     0.387

The report concludes with the following heroic explanation: “One possible interpretation of this can be that the support disbursed during the period 1995 to 1999 has made it possible to convert a larger proportion of population in working age to growth into per capita income while this was not possible during the period during which the municipalities did not receive support.”

Strengths and limitations of the approach

Despite its wide applicability, the difference-in-differences method is not the magic bullet of impact evaluation some claim it to be. On the positive side, it does not require complex data structures: aggregate data on policy outcomes, collected before and after the intervention, are enough. As one applies the method in practice, however, its limitations start to become clear.

On the practical side, the need for pre-intervention outcome data often represents an insurmountable obstacle, most often because of a lack of planning in data collection. On the more conceptual side, the simplicity of the method comes at a price in terms of assumptions: the crucial identifying assumption is that the counterfactual trend is the same for treated and non-treated units. This assumption can only be tested (and relaxed if violated) if more data are available.

In making explicit the trade-off between data and assumptions the DID method represents a great tool for teaching the logic of non-experimental methods.  Its greatness is significantly reduced when the method is actually used to derive impact estimates.

Selected references

Angrist J., Pischke J.S. [2008], Mostly Harmless Econometrics: An Empiricist's Companion, Princeton, NJ, Princeton University Press.

Bertrand M., Duflo E. and Mullainathan S. [2004], How Much Should We Trust Differences-in-Differences Estimates?, in «The Quarterly Journal of Economics», 2004, vol. 119, n. 1, pp. 249-275.

Card D., Krueger A. [1994], Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, in «The American Economic Review», 1994, vol. 84, n. 4, pp. 772-793.

Card D., Krueger A. [1997], Myth and Measurement: The New Economics of the Minimum Wage, Princeton, NJ, Princeton University Press.

ITPS [2004], The EC Regional Structural Funds impact in Sweden 1995-1999: A quantitative analysis, Swedish Institute for Growth Policy Studies, University of Umeå, Department of Geography.

Last update: 01/12/2009